? POSTGRES-XL: PostgreSQL-based open source extensible DB cluster

Source: Internet
Author: User


? POSTGRES-XL: PostgreSQL-based open source extensible DB cluster


The recent year the industry to the "IoE" more calls louder, many traditional enterprises have to go to the "IoE" program on the desktop. I always wonder what motivates these non-internet companies, such as banks, to do such things. Tall on the "self-control", "Revitalization of National Science and Technology" and other empty slogan first not to tube, the real power where? "Security", "cost", "Internet Architecture" ... And so on, and so on, the only thing that seems reliable is the advanced technology of Internet architecture. What a lot of nonsense, the general trend you Tube it!  at the same time, the previous period also considered what can be "take doctrine" database, can replace the Oracle database as a business system of data storage. The database system must be open source, support SQL, support acid, and the business application porting is less work. box to box, and finally found that PostgreSQL meets the requirements, from the application porting the workload is much smaller than the use of MySQL.
on recent Weibo, the MySQL party began to clash with PostgreSQL, talking about the hours of Oracle porting to PostgreSQL, and M's fans shouted: " in fact, go to O does not have to be large-scale rewrite application Ah, completely depends on the database proprietary characteristics of the degree, in general, for larger Internet applications, because the scale of scalability, do not use a very complex features, the exchange of a database is far less difficult than the general enterprise application. Even the part of the rewrite . " I want to say: Elder brother! Have you ever seen a C program file embedded in SQL?  Ever seen a lot of applications that use PL/SQL stored procedures? Many old systems write business programs that way. Just MySQL in this respect temporarily not to force, restructuring the business system that amount of responsibility, Alexander, not what the enterprise can bear.

yesterday read the Zhejiang Mobile in China Database technology conference keynote speech "operator to O analysis" of the public version, I feel that the key points and difficulties are in place, of course, is standing in the perspective of traditional enterprises, do not represent bat and other Internet companies on the high angle.
again, go back and talk about PostgreSQL instead of O.
There are also companies that specialize in using and expanding PostgreSQL to provide alternative Oracle solution services, such as: Enterprisedb: " Enterprisedb is the leading worldwide provider of Postgres Software and services that enable enterprises to reduce their reliance on costly proprietary solutions and slash their dat Abase spend by percent or more. 
With powerful performance and security enhancements for PostgreSQL, sophisticated management tools for global deployments and database compatibility, Enterprisedb software supports both mission and non-mission critical enterprise applications. More than 2,500 enterprises, Governments and other organizations worldwide use ENTERPRISEDB software, support, training an D Professional Services to integrate open source software into their existing data infrastructures.

Based in Bedford, MA, Enterprisedb are backed by strategic private investors.
"
also on the internet there is a Japanese telecommunications company (NTT) using PostgreSQL to O successful case of the Ppt:https://www.pgcon.org/2011/schedule/attachments/203_ntt_case _307.pdf But the new question comes again, can postgresql scale horizontally to cope with the database operating pressure of high-concurrency large-volume systems? So continue to work online, these two days to find a look good open-source implementation: POSTGRES-XL. (whether the specific performance can meet the requirements, there is no actual test is unknown)
now, with a little finishing on the online information, a brief introduction of the next POSTGRES-XL.

? POSTGRES-XL Functional Characteristics
  • Open Source: www.postgres-xl.org
  • POSTGRES-XL is all called Postgres extensible Lattice, is Translattice company and its acquisition of database technology company –stormdb Products, is the Stormdb core part of the re-open source.
    The open source protocol uses a loose "Mozilla public License" license that allows open source code to be mixed with closed sources.
  • Full acid support
  • Scale-out relational database (RDBMS)
    • Supports OLAP applications using the MPP (Massively Parallel Processing: massively parallel processing system) architecture model
    • Supports OLTP applications with scalable read and write performance (note that the first bit is olap!!! )
    • ACID properties at the cluster level
    • Multi-tenancy security
    • can also be used as a distributed key-value storage
  • Transaction processing and data analysis processing mixed-type database
  • Supports rich SQL statement types, such as: correlated subqueries
  • Support for most PostgreSQL SQL statements
  • Distributed multi-versioning concurrency control (mvcc:multi-version Concurrency control)
  • Supports JSON and XML formats
? POSTGRES-XL Missing features
    • Built-in high-availability mechanism
      • Use external mechanisms to achieve high likelihood, such as: Corosync/pacemaker
      • Space for future function enhancement
    • Increased ease of node/re-sharding data (Re-shard)
      • Lock table during data redistribution (redistribution)
      • It can be solved by pre-shard (Pre-shard), establishing multiple data nodes on the same physical server, each node storing one data shard.
        When the data is re-distributed, some data nodes can be moved out
    • Some foreign keys, uniqueness constraint functions

?POSTGRES-XL Architecture

    • Based on open source project Postgres-xc
      • XL adds MPP to allow direct communication between data nodes, exchanging complex cross-node correlated query related data information, and reducing coordinator load.
    • Multiple Coordinators (Coordinator)
      • Database connection points for applications
      • Parse query statements, generate execution plan
    • Multiple data nodes (DataNode)
      • Real-World data storage
      • Data is automatically scattered across the data nodes in the cluster
      • Execute Query locally
      • A query that queries on all related nodes in parallel
    • Global transaction manager (Gtm:global Transaction Manager)
      • Provides a consistent view between transactions
      • Deploy the GTM proxy instance to improve performance

Coordinator (Coordinator)
    • Processing client network connection, is the access point of the database
    • Parses a query statement, generates an execution plan, and passes the plan to the data node for actual execution
    • Perform final processing on the query intermediate result set returned by the data node
    • Management Transaction Two phase commit (2PC)
    • Store Global Catalog Information




Data node (DataNode)

    • Storing table and index data
    • Only the coordinator connects to the data node
    • Query executed by the Coordinator
    • Two data nodes can establish a one-to-two communication connection, exchange information related to Distributed Table Association query
      Global transaction manager (GTM)

    • handling required MVCC tasks
      • transaction IDs   transaction ID
      • snapshots           data snapshot, MVCC using
    • managing Global data values
      • timestamps       timestamp
      • sequences         Sequence object
    • The complete group has only one GTM node, There will be a single point of failure. Solution: Configure the Stranby hot standby node to ensure high availability

    • Resolve potential GTM Performance bottlenecks by deploying GTM proxies
              
    GTM Proxy
      
  • with the coordinator ( Coordinator ) and Data nodes (DataNode) to run together
  • The backend (Coordinator, data node) replaces the GTM with it, interacting directly with it as a middleman between the back end and the GTM
  • The request for the GTM is grouped together, and multiple requests are submitted once to the GTM
  • Get transaction IDs (XIDs) scope
  • Get a snapshot of a data
  • For example: 10 processes requesting a transaction ID, respectively
    • Each of them is connected to a local GTM Proxy
    • GTM Proxy send request to GTM, request 10 XID at a time
    • GTM Locks The Procarray data structure and assigns 10 XID
    • GTM Returns XID Range
    • GTM Unlock Process Mutex

POSTGRES-XL Data Distribution
There are two modes of POSTGRES-XL data distribution: Copy table (replicated table), distribution table (distributed tables). (These two nouns are very familiar, looks like the MySQL-based open source Columnstore MPP database Infobright is also the same way, some basic concept and infobright similar to the high-quality MPP database. )
CREATE TABLE my_table (...) Distribute by HASH (col) | MODULO (col) | Roundrobin | REPLICATION [to NODE (Nodename[,nodename ...])

            

Copying tables (replicated table)
    • Benefits for read-only and read-write-less tables
    • Dimension tables that are sometimes beneficial to the data Warehouse
    • If the coordinator and the data node are deployed to the same server on a one-to-two, local data reads, reducing network transfers
    • Serious non-applicability to frequently written tables
    • Each row of records is copied to all data nodes in the cluster, one copy per node

Distribution tables (distributed table)
    • Table for Write frequency
    • Fact table for the Data Warehouse
    • Each row of records exists in only one data node
    • Available Shard Policy methods
      • Hash
      • Round Robin
      • Modulo

POSTGRES-XL Availability


    • There is no single point of failure
      • The global transaction manager uses hot standby (is it not a single point of failure?) )
      • Load balancing between multiple coordinators
      • Data nodes use streaming replication to replicate data to the standby node
    • But, however , all of this is now handmade ..... (mainly about streaming replication?) Handmade, speak a hair ah ~)

POSTGRES-XL Performance

transaction-processing (OLTP Transaction) Performance

    • Test Using for e-commerce applications tpc-w (DBT-1) benchmark test model.
    • The coordination layer adds 30% overhead: On a single node (CPU4 core), only 70% of PostgreSQL performance is compared to simply using PostgreSQL directly.
    • When the cluster size expands to 10 nodes, it should theoretically get 7 times times the performance improvement compared with the single node PostgreSQL, actually reaching 6-6.4 times times.
    • As the number of cluster nodes increases, the number of open transactions, snapshot space occupancy, and visibility checks increases.


? POSTGRES-XL: PostgreSQL-based open source extensible DB cluster


Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.