Postgres-XL: PostgreSQL-based open-source scalable Database Cluster,

Last Update:2015-04-24 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Postgres-XL: PostgreSQL-based open-source scalable Database Cluster,
Postgres-XL: PostgreSQL-based open-source scalable Database Cluster

In the past year, IOE has become more and more popular in the industry. Many traditional enterprises have also placed their "IOE" plans on their desktops. I have always wondered why these non-Internet companies (such as banks) are motivated to do this? The empty slogans such as "self-control" and "rejuvenation of national science and technology" on the tall are ignored. Where is the real motivation? "Security", "cost", "Internet architecture" ...... and so on, the only thing that seems reliable is the Technical Advancement of Internet architecture. What about nonsense? What is the trend of the times you have taken care! To put it bluntly, some time ago, we were also considering what kind of database could be "tailistic", which could replace Oracle database as the data storage of business systems. This database system must be open-source, support SQL, and support ACID, and the workload for porting business applications is small. Then, it is found that PostgreSQL meets the requirements. From the perspective of application transplantation, the workload is much less than that of MySQL.
Recently, on Weibo, the MySQL party began to dispute with the PostgreSQL party. When talking about the workload of migrating Oracle to PostgreSQL, M fans shouted: "In fact, Oracle does not necessarily need to rewrite applications on a large scale, it depends entirely on the degree of dependence on the proprietary features of the database. Generally, for large-scale Internet applications, considering scalability, very complex features are not used, changing a database is far less difficult than general enterprise applications. This is the part of the rewrite ". What I want to say: Brother! Have you ever seen SQL-Embedded C program files? Have you ever seen many applications that use PL/SQL stored procedures? Many old systems write business programs like this. Just for the moment, MySQL is not powerful in this regard. Restructuring the business system is the responsibility of a large number of enterprises, not all of which can afford it.

Yesterday I read the public version of "operator de-O analysis", the keynote speech delivered by Zhejiang mobile at the China Database Technology Conference. I felt that the de-O key points and difficulties mentioned in the speech were in place, of course, from the perspective of traditional enterprises, does not represent the perspective of BAT and other Internet companies.
Let's go back and talk about replacing PostgreSQL with Oracle.
Foreign companies that specifically use and expand PostgreSQL and provide alternative Oracle solution services, such as EnterpriseDB :"Incluisedb is the leading worldwide provider of your S software and services that enable your ISES to reduce their reliance on costly proprietary solutions and slash their database spend by 80 percent or more.
With powerful performance and security enhancements for PostgreSQL, sophisticated management tools for global deployments and database compatibility, EnterpriseDB software supports both mission and non-mission critical enterprise applications. more than 2,500 extends ISES, governments and other organizations worldwide use extends isedb software, support, training and professional services to integrate open source software into their existing data infrastructures.

Based in Bedford, MA, EnterpriseDB is backed by strategic private investors."
In addition to the Internet also saw a successful case of the Japanese Telecommunications Company (NTT) using PostgreSQL to o ppt: https://www.pgcon.org/2011/schedule/attachments/203_NTT_Case_307.pdf, but the new problem has come, can PostgreSQL scale horizontally to cope with the database operation pressure of high concurrency and large transaction volume systems? As a result, I continued to work on the Internet and found a seemingly good open-source implementation over the past two days: ipvs-XL. (Whether the specific performance can meet the requirements is unknown yet)
Now, we will briefly introduce ipvs-XL using the online materials that have been sorted out.

Postgres-XL features

Open Source: www.postgres-xl.org
Postgres-XL is called Postgres eXtensible Lattice. It is a product of TransLattice and its acquired database technology company-StormDB. It is open source after the core part of StormDB is reconstructed.
Open-source protocols use the loose "Mozilla Public License" License to allow mixed use of open-source code and closed-source code.
Complete ACID support
Horizontally scalable Relational Database Service (RDBMS)
- Supports OLAP applications and adopts the MPP (Massively Parallel Processing: Large-scale Parallel Processing system) architecture mode.
- Supports OLTP applications and scalable read/write performance (note that the top priority is OLAP !!!)
- Cluster-level ACID features
- Multi-tenant security
- It can also be used for Distributed Key-Value storage.
Transaction processing and data analysis and processing hybrid Database
Supports a wide range of SQL statement types, such as associating subqueries
Supports most PostgreSQL SQL statements
Distributed Multi-version Concurrency Control (MVCC: Multi-version Concurrency Control)
JSON and XML formats are supported.

Functions missing from ipvs-XL

Built-in high availability Mechanism
- High Availability with external mechanisms, such as Corosync/Pacemaker
- Space for improving functions in the future
Simplicity of adding node/re-sharding data (re-shard)
- The table is locked during redistribution.
- You can use pre-shard to create multiple data nodes on the same physical server. Each node stores one data shard.
  When data is re-distributed, move some data nodes out.
Some foreign keys and uniqueness constraints

Ipvs-XL Architecture

Based on the open-source project S-XC
- XL has added MPP, allowing direct communication between data nodes, exchanging complex cross-node Association queries for related data information, and reducing the workload of the Coordinator.
Coordinator)
- Database connection point of the application
- Analyze query statements and generate execution plans
Multiple Data nodes (DataNode)
- Actual data storage
- Data is automatically scattered to all data nodes in the cluster.
- Local Query
- A query is concurrently queried on all related nodes.
Global Transaction Manager (GTM: Global Transaction Manager)
- Provides the consistency view between transactions
- Deploy GTM Proxy instances to improve performance

Coordinator)

Processes client network connections, which are Database Access Points
Analyze the query statement, generate the execution plan, and pass the plan to the data node for actual execution.
Perform final processing on the intermediate query result set returned by the Data Node
Two-phase commit of management transactions (2 PC)
Store Global directory information

Data Node (DataNode)

Store tables and index data
Only the Coordinator connects to the Data Node
Execute the query that is passed under the Coordinator
One-to-one communication connection can be established between two data nodes to exchange information about distributed table Association queries.

Global Transaction Manager (GTM)

Process required MVCC tasks
- Transaction IDs Transaction ID
- Snapshots data snapshot, used by MVCC
Manage global data values
- Timestamps Timestamp
- Sequences sequence object
There is only one GTM node in the cluster, and there will be spof issues. Solution: configure the StranBy Hot Standby node to ensure high availability

Deploy GTM Proxy to solve possible GTM performance bottlenecks

GTM Proxy

Coordinator and DataNode)TogetherRun
The backend (Coordinator and data node) replaces GTM and interacts with it directly. It acts as the intermediary between the backend and GTM.
Group GTM requests and submit multiple requests to GTM at a time.
Obtain the range of transaction ids (XIDs)
Get Data snapshots
For example, each of the 10 processes requests a transaction id.
- Each of them is connected to the local GTM Proxy.
- GTM Proxy sends a request to GTM and requests 10 xids at a time
- GTM locks the procarray data structure and assigns 10 xids.
- GTM returns the XID range
- GTM unmutex lock

Postgres-XL Data Distribution
Postgres-XL data distribution has two modes: Replicated Table and Distributed Table ). (These two terms are very familiar. It seems that MySQL-based open-source column-store MPP database InfoBright is also the two methods. The same is true for a domestic MPP database with a basic concept similar to InfoBright .)

Create table my_table (...) Distribute byhash (col) | MODULO (col) | ROUNDROBIN | REPLICATION [to node (nodename [, nodename…])]

Replicated Table)

Benefit for read-only and read-write tables with few
Dimensions used for data warehouse
If the coordinator and data nodes are deployed on the same server one by one, Local Data Reading is performed to reduce network transmission.
It is not applicable to tables with frequent writes.
Copy each record to all the data nodes in the cluster.

Distributed Table)

Useful for tables with write frequency
Useful fact tables for Data Warehouses
Each record is stored on only one data node.
Available sharding policy methods
- Hash
- Round Robin
- Modulo

Ipvs-XL availability

No spof
- The Global Transaction Manager adopts the hot standby mode (is it not a single point of failure if there is hot standby ?)
- Load Balancing among multiple Coordinator
- Data nodes use stream replication to copy data to the standby Node
HoweverAll of this is manual at present... (mainly about stream replication? Hand-made, let's talk about it ~)

Ipvs-XL Performance

Transaction Processing type (OLTP Transaction) Performance

Testing uses a TPC-W (DBT-1) benchmarking model for e-commerce applications.
The coordination layer adds 30% of the overhead: on a single node (CPU 4 cores), only 70% of the performance of PostgreSQL is compared to simply using PostgreSQL.
When the cluster Scale is expanded to 10 nodes, the performance should be improved 7 times theoretically compared with the single-node PostgreSQL, which is 6-times in fact.
As the number of cluster nodes increases, the number of opened transactions, snapshot space usage, and visibility check all increase.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Postgres-XL: PostgreSQL-based open-source scalable Database Cluster,

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Postgres-XL: PostgreSQL-based open-source scalable Database Cluster,

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support