Postgres-XL: PostgreSQL-based open-source scalable Database Cluster,

Source: Internet
Author: User

Postgres-XL: PostgreSQL-based open-source scalable Database Cluster,
Postgres-XL: PostgreSQL-based open-source scalable Database Cluster


In the past year, IOE has become more and more popular in the industry. Many traditional enterprises have also placed their "IOE" plans on their desktops. I have always wondered why these non-Internet companies (such as banks) are motivated to do this? The empty slogans such as "self-control" and "rejuvenation of national science and technology" on the tall are ignored. Where is the real motivation? "Security", "cost", "Internet architecture" ...... and so on, the only thing that seems reliable is the Technical Advancement of Internet architecture. What about nonsense? What is the trend of the times you have taken care! To put it bluntly, some time ago, we were also considering what kind of database could be "tailistic", which could replace Oracle database as the data storage of business systems. This database system must be open-source, support SQL, and support ACID, and the workload for porting business applications is small. Then, it is found that PostgreSQL meets the requirements. From the perspective of application transplantation, the workload is much less than that of MySQL.
Recently, on Weibo, the MySQL party began to dispute with the PostgreSQL party. When talking about the workload of migrating Oracle to PostgreSQL, M fans shouted: "In fact, Oracle does not necessarily need to rewrite applications on a large scale, it depends entirely on the degree of dependence on the proprietary features of the database. Generally, for large-scale Internet applications, considering scalability, very complex features are not used, changing a database is far less difficult than general enterprise applications. This is the part of the rewrite ". What I want to say: Brother! Have you ever seen SQL-Embedded C program files? Have you ever seen many applications that use PL/SQL stored procedures? Many old systems write business programs like this. Just for the moment, MySQL is not powerful in this regard. Restructuring the business system is the responsibility of a large number of enterprises, not all of which can afford it.

Yesterday I read the public version of "operator de-O analysis", the keynote speech delivered by Zhejiang mobile at the China Database Technology Conference. I felt that the de-O key points and difficulties mentioned in the speech were in place, of course, from the perspective of traditional enterprises, does not represent the perspective of BAT and other Internet companies.
Let's go back and talk about replacing PostgreSQL with Oracle.
Foreign companies that specifically use and expand PostgreSQL and provide alternative Oracle solution services, such as EnterpriseDB :"Incluisedb is the leading worldwide provider of your S software and services that enable your ISES to reduce their reliance on costly proprietary solutions and slash their database spend by 80 percent or more.
With powerful performance and security enhancements for PostgreSQL, sophisticated management tools for global deployments and database compatibility, EnterpriseDB software supports both mission and non-mission critical enterprise applications. more than 2,500 extends ISES, governments and other organizations worldwide use extends isedb software, support, training and professional services to integrate open source software into their existing data infrastructures.

Based in Bedford, MA, EnterpriseDB is backed by strategic private investors.
"
In addition to the Internet also saw a successful case of the Japanese Telecommunications Company (NTT) using PostgreSQL to o ppt: https://www.pgcon.org/2011/schedule/attachments/203_NTT_Case_307.pdf, but the new problem has come, can PostgreSQL scale horizontally to cope with the database operation pressure of high concurrency and large transaction volume systems? As a result, I continued to work on the Internet and found a seemingly good open-source implementation over the past two days: ipvs-XL. (Whether the specific performance can meet the requirements is unknown yet)
Now, we will briefly introduce ipvs-XL using the online materials that have been sorted out.

Postgres-XL features

  • Open Source: www.postgres-xl.org
    Postgres-XL is called Postgres eXtensible Lattice. It is a product of TransLattice and its acquired database technology company-StormDB. It is open source after the core part of StormDB is reconstructed.
    Open-source protocols use the loose "Mozilla Public License" License to allow mixed use of open-source code and closed-source code.
  • Complete ACID support
  • Horizontally scalable Relational Database Service (RDBMS)
    • Supports OLAP applications and adopts the MPP (Massively Parallel Processing: Large-scale Parallel Processing system) architecture mode.
    • Supports OLTP applications and scalable read/write performance (note that the top priority is OLAP !!!)
    • Cluster-level ACID features
    • Multi-tenant security
    • It can also be used for Distributed Key-Value storage.
  • Transaction processing and data analysis and processing hybrid Database
  • Supports a wide range of SQL statement types, such as associating subqueries
  • Supports most PostgreSQL SQL statements
  • Distributed Multi-version Concurrency Control (MVCC: Multi-version Concurrency Control)
  • JSON and XML formats are supported.
Functions missing from ipvs-XL
  • Built-in high availability Mechanism
    • High Availability with external mechanisms, such as Corosync/Pacemaker
    • Space for improving functions in the future
  • Simplicity of adding node/re-sharding data (re-shard)
    • The table is locked during redistribution.
    • You can use pre-shard to create multiple data nodes on the same physical server. Each node stores one data shard.
      When data is re-distributed, move some data nodes out.
  • Some foreign keys and uniqueness constraints

Ipvs-XL Architecture

  • Based on the open-source project S-XC
    • XL has added MPP, allowing direct communication between data nodes, exchanging complex cross-node Association queries for related data information, and reducing the workload of the Coordinator.
  • Coordinator)
    • Database connection point of the application
    • Analyze query statements and generate execution plans
  • Multiple Data nodes (DataNode)
    • Actual data storage
    • Data is automatically scattered to all data nodes in the cluster.
    • Local Query
    • A query is concurrently queried on all related nodes.
  • Global Transaction Manager (GTM: Global Transaction Manager)
    • Provides the consistency view between transactions
    • Deploy GTM Proxy instances to improve performance

Coordinator)
  • Processes client network connections, which are Database Access Points
  • Analyze the query statement, generate the execution plan, and pass the plan to the data node for actual execution.
  • Perform final processing on the intermediate query result set returned by the Data Node
  • Two-phase commit of management transactions (2 PC)
  • Store Global directory information




Data Node (DataNode)

  • Store tables and index data
  • Only the Coordinator connects to the Data Node
  • Execute the query that is passed under the Coordinator
  • One-to-one communication connection can be established between two data nodes to exchange information about distributed table Association queries.
Global Transaction Manager (GTM)

  • Process required MVCC tasks
    • Transaction IDs Transaction ID
    • Snapshots data snapshot, used by MVCC
  • Manage global data values
    • Timestamps Timestamp
    • Sequences sequence object
  • There is only one GTM node in the cluster, and there will be spof issues. Solution: configure the StranBy Hot Standby node to ensure high availability

  • Deploy GTM Proxy to solve possible GTM performance bottlenecks

GTM Proxy
  • Coordinator and DataNode)TogetherRun
  • The backend (Coordinator and data node) replaces GTM and interacts with it directly. It acts as the intermediary between the backend and GTM.
  • Group GTM requests and submit multiple requests to GTM at a time.
  • Obtain the range of transaction ids (XIDs)
  • Get Data snapshots
  • For example, each of the 10 processes requests a transaction id.
    • Each of them is connected to the local GTM Proxy.
    • GTM Proxy sends a request to GTM and requests 10 xids at a time
    • GTM locks the procarray data structure and assigns 10 xids.
    • GTM returns the XID range
    • GTM unmutex lock

Postgres-XL Data Distribution
Postgres-XL data distribution has two modes: Replicated Table and Distributed Table ). (These two terms are very familiar. It seems that MySQL-based open-source column-store MPP database InfoBright is also the two methods. The same is true for a domestic MPP database with a basic concept similar to InfoBright .)
Create table my_table (...) Distribute byhash (col) | MODULO (col) | ROUNDROBIN | REPLICATION [to node (nodename [, nodename…])]



Replicated Table)
  • Benefit for read-only and read-write tables with few
  • Dimensions used for data warehouse
  • If the coordinator and data nodes are deployed on the same server one by one, Local Data Reading is performed to reduce network transmission.
  • It is not applicable to tables with frequent writes.
  • Copy each record to all the data nodes in the cluster.

Distributed Table)
  • Useful for tables with write frequency
  • Useful fact tables for Data Warehouses
  • Each record is stored on only one data node.
  • Available sharding policy methods
    • Hash
    • Round Robin
    • Modulo

Ipvs-XL availability


  • No spof
    • The Global Transaction Manager adopts the hot standby mode (is it not a single point of failure if there is hot standby ?)
    • Load Balancing among multiple Coordinator
    • Data nodes use stream replication to copy data to the standby Node
  • HoweverAll of this is manual at present... (mainly about stream replication? Hand-made, let's talk about it ~)

Ipvs-XL Performance

Transaction Processing type (OLTP Transaction) Performance

  • Testing uses a TPC-W (DBT-1) benchmarking model for e-commerce applications.
  • The coordination layer adds 30% of the overhead: on a single node (CPU 4 cores), only 70% of the performance of PostgreSQL is compared to simply using PostgreSQL.
  • When the cluster Scale is expanded to 10 nodes, the performance should be improved 7 times theoretically compared with the single-node PostgreSQL, which is 6-times in fact.
  • As the number of cluster nodes increases, the number of opened transactions, snapshot space usage, and visibility check all increase.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.