Greenplum Distributed Framework Structure

1. Basic Architecture

Greenplum (GPDB) is a typical shared-nothing distributed database system. GPDB has a central control node (Master) that coordinates the entire system and multiple database instances (Segments) distributed across the cluster. The Master is the access entry of the GPDB system: it handles client connections and SQL commands and coordinates the work of the Segments. The Segments are responsible for storing and processing user data. Each Segment is an independent PostgreSQL instance; the Segments are distributed on different physical hosts and work collaboratively.

Master node and Segment nodes

In GPDB, data is distributed across all Segments without overlap, either by hashing each record on a distribution key or by distributing records randomly. Only the Master interacts directly with users and client programs, so to the user the GPDB system looks and behaves like a single database.
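The distribution policy is chosen per table at creation time. A minimal sketch, assuming two hypothetical tables, sales and web_clicks; the DISTRIBUTED BY and DISTRIBUTED RANDOMLY clauses are standard Greenplum DDL:

    -- Hash distribution: rows are assigned to Segments by hashing the
    -- distribution key, so rows with the same customer_id land on the
    -- same Segment.
    CREATE TABLE sales (
        sale_id     bigint,
        customer_id bigint,
        amount      numeric(12,2)
    ) DISTRIBUTED BY (customer_id);

    -- Random (round-robin) distribution: rows are spread evenly across
    -- Segments with no particular key.
    CREATE TABLE web_clicks (
        click_time  timestamp,
        url         text
    ) DISTRIBUTED RANDOMLY;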

The Global System Catalog is stored on the Master, but the Master stores no user data; user data is stored only on the Segments. The Master is responsible for authenticating clients, processing incoming SQL commands, distributing work among the Segments, combining the results returned by the Segments, and presenting the final result to the client program.

User tables and their indexes are distributed across the Segments, and each Segment stores only the portion of the data that belongs to it. You cannot bypass the Master and access a Segment directly; the system can only be accessed through the Master. In the hardware configuration recommended by GPDB, each effective CPU core corresponds to one Segment. For example, if a physical host has two dual-core CPUs, that host is configured with four primary instances (Primary Segments).
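The actual Segment layout of a running cluster can be checked from the catalog. A small sketch, assuming the gp_segment_configuration catalog table (column names may vary slightly between Greenplum versions):

    -- Count primary Segment instances per physical host.
    -- role = 'p' marks primaries; content >= 0 excludes the Master (content = -1).
    SELECT hostname, count(*) AS primary_segments
    FROM   gp_segment_configuration
    WHERE  role = 'p' AND content >= 0
    GROUP  BY hostname
    ORDER  BY hostname;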

Network Connection

The Interconnect is an important component of GPDB. When a user executes a query, every Segment has to do part of the processing, so control information and data must be transmitted efficiently between physical hosts. The role of the network layer is to provide inter-host communication, data transfer, and backup. By default the Interconnect uses UDP; GPDB adds its own packet verification on top of UDP, so its reliability matches that of TCP while its performance and scalability are significantly better.
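The transport in use can be inspected through a server configuration parameter. A sketch, assuming the gp_interconnect_type parameter (typical values are udpifc for the checksummed UDP transport and tcp; the exact names and defaults depend on the Greenplum version):

    -- Show which Interconnect transport the cluster is configured to use.
    SHOW gp_interconnect_type;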

2. Query Execution Mechanism

After the system starts, a user connects to the Master host through a client program (such as psql) and submits query statements. GPDB creates processes to handle the query. The process on the Master is called the Query Dispatcher (QD); the QD creates and dispatches the query plan and gathers and presents the final result. The processes on the Segments are called Query Executors (QEs); each QE completes its own part of the work and exchanges intermediate results with the other worker processes.

Query plan generation and distribution

The query is received and processed by the Master (acting as the QD). The QD builds the original query syntax tree according to the defined lexical and syntax rules. In the query analysis stage, the QD converts the syntax tree into a query tree. It then enters the query rewrite stage, where the query tree is transformed according to rules predefined in the system. Finally, the QD calls the optimizer, which takes the rewritten query tree and performs logical and physical optimization on it. GPDB uses a cost-based optimization strategy: it evaluates several candidate execution plans and picks the most efficient one. The optimizer must, however, consider the cluster as a whole and account in each candidate plan for the cost of moving data between nodes. At this point the QD has produced a parallel or targeted query plan (determined by the query statement). The Master then dispatches the query plan to the relevant Segments for execution, and each Segment is responsible only for operations on its local data. Most operations, such as table scans, joins, aggregations, and sorts, are executed in parallel across the Segments, with each Segment executing its part independently of the others. (Once the execution plan is determined, a join, for example, is performed separately on each node, and each node joins only its local data.)
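The shape of the dispatched plan can be seen with EXPLAIN. A sketch using the hypothetical sales table from above; the plan text is illustrative only, and the exact operators, slice numbering, and costs will vary with the data, the Segment count, and the Greenplum version:

    EXPLAIN SELECT customer_id, sum(amount)
    FROM   sales
    GROUP  BY customer_id;

    -- Illustrative output shape (4 primary Segments):
    --  Gather Motion 4:1  (slice1; segments: 4)
    --    ->  HashAggregate
    --          Group Key: customer_id
    --          ->  Seq Scan on sales
    -- The Gather Motion at the top collects the per-Segment results on the Master.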

Query execution

GPDB adopts the shared-nothing architecture. To achieve maximum parallelism, the query plan is split wherever data needs to be moved between nodes, dividing a query into multiple slices. Each slice covers a different part of the processing: the operations of one step are performed, data is moved, and then the next step proceeds. During query execution, each Segment creates several worker processes according to the slice boundaries in the query plan and executes the query in parallel. The process corresponding to a slice handles only its own part of the work, and that work is executed only on its Segment. The slices form a tree structure that makes up the entire query plan. The processes on different Segments that handle the same slice of the query plan are collectively called a gang. When the work of the current gang is done, its data is passed upward, until the whole query plan has been completed. Communication between Segments goes through GPDB's network layer component, the Interconnect.

The QE starts an independent process for each slice and executes multiple operators within that process. Each operator represents a specific database operation, such as a table scan, join, aggregation, or sort. Within the process corresponding to a single slice on a Segment, the execution operators are invoked top-down, while data flows bottom-up.

Unlike a typical single-node database, GPDB has a special operator: motion. A motion operation moves data between Segments during query processing. Motions come in two forms, broadcast motion and redistribute motion. It is precisely the motion operators that split the query plan into slices: the process for the slice above reads the data that the processes of the slice below broadcast or redistribute, and then continues the computation on that data.
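A join whose key is not the distribution key of both tables makes the motions visible in the plan. A sketch, adding a hypothetical returns table; the plan text is again illustrative only:

    -- Hypothetical second table, distributed on a different key than the join column.
    CREATE TABLE returns (
        return_id bigint,
        sale_id   bigint,
        reason    text
    ) DISTRIBUTED BY (return_id);

    EXPLAIN SELECT s.sale_id, r.reason
    FROM   sales s
    JOIN   returns r ON r.sale_id = s.sale_id;

    -- Illustrative output shape: since neither table is distributed on the
    -- join key, the planner inserts motions, e.g.
    --  Gather Motion 4:1  (slice3; segments: 4)
    --    ->  Hash Join
    --          Hash Cond: (s.sale_id = r.sale_id)
    --          ->  Redistribute Motion 4:4  (slice1; segments: 4)
    --                Hash Key: s.sale_id
    --                ->  Seq Scan on sales s
    --          ->  Hash
    --                ->  Redistribute Motion 4:4  (slice2; segments: 4)
    --                      Hash Key: r.sale_id
    --                      ->  Seq Scan on returns r
    -- Each motion boundary marks the edge of a slice; a small table might
    -- instead be sent to every Segment with a Broadcast Motion.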

Like PostgreSQL, Greenplum fetches and processes data tuple by tuple: a single tuple is pulled as needed, processed, and then the next qualifying tuple is fetched, until all qualifying tuples have been handled. The motion operators between slices also send and receive data as tuples, with the lower slice buffering its output, which forms a producer-consumer model without blocking the overall query pipeline. Finally, the query results of each Segment are likewise sent to the Master through a motion; after the Master finishes the final processing, the query result is returned.

3. Fault Tolerance Mechanism

Node mirroring and fault tolerance

GPDB supports configuring mirror nodes for Segments. A Primary Segment and its corresponding Mirror Segment are placed on different physical hosts, and the same physical host can host Primary Segments and Mirror Segments belonging to different instances at the same time. Data is replicated from a Primary Segment to its Mirror Segment synchronously at the file level; the Mirror Segment does not directly take part in database transactions or control operations.
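The primary/mirror pairing can be read from the same gp_segment_configuration catalog used above. A sketch; column values such as role = 'p'/'m' and status = 'u'/'d' follow the usual Greenplum conventions but may differ between versions:

    -- List each Segment instance with its role and current state.
    -- The same content value appears twice: once for the primary (role = 'p')
    -- and once for the mirror (role = 'm').
    SELECT content, role, preferred_role, status, hostname, port
    FROM   gp_segment_configuration
    WHERE  content >= 0
    ORDER  BY content, role;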

When a Primary Segment becomes unreachable, the system automatically switches over to its corresponding Mirror Segment, which takes over the role of the Primary Segment. As long as the remaining available Segments can still guarantee data integrity, the primary/mirror scheme keeps the GPDB system available as a whole even when some Segments or physical hosts are down.

The specific switching process is as follows: whenever the Master's fault tolerance service (FTS) finds that it cannot connect to a Primary Segment, it marks the failure status in the GPDB system catalog table and activates the corresponding Mirror Segment to take over from the original Primary Segment and continue the subsequent work. After the failed Primary Segment has been repaired, it can be switched back while the system is running.
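A quick way to see whether any Segment has been marked as failed is again the configuration catalog. A sketch under the same assumptions as above (status = 'd' meaning down; repairing and switching back the failed Primary is typically done with the gprecoverseg utility outside of SQL):

    -- Segments currently marked as down by the fault tolerance service.
    SELECT content, role, preferred_role, hostname, port
    FROM   gp_segment_configuration
    WHERE  status = 'd';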
