MySQL High-availability--PXC Introduction

Last Update:2018-07-27 Source: Internet

Author: User

PXC Introduction:

Galera Cluster is a high-availability clustering solution for MySQL: a MySQL cluster with the Galera plugin integrated. Galera replication is a MySQL data synchronization scheme provided by Codership. It is highly available and easy to scale, and it provides synchronous replication and read/write access across multiple MySQL nodes, which guarantees high availability of the database service and strong data consistency.

PXC is a nearly complete MySQL high-availability cluster solution. Compared with the more traditional cluster architectures based on master-slave replication, such as MHA and MM+keepalived, the most prominent feature of Galera Cluster is that it solves the long-criticized replication latency problem and can basically achieve real-time synchronization. The nodes are peers of one another, and Galera Cluster is itself a multi-master architecture. Galera Cluster cares most about data consistency: a transaction is either executed on all nodes or on none. Its implementation mechanism makes it very strict about consistency, which ensures the data consistency of the MySQL cluster.

There are two Galera Cluster distributions. The names differ, but the essence is the same: both use Galera Cluster. One is MariaDB Cluster, implemented in MariaDB, the newer database created by a MySQL founder. The other is Percona XtraDB Cluster, abbreviated PXC, implemented by Percona, the well-known MySQL service and tool provider.

Building a PXC architecture requires at least 3 MySQL instances to form a cluster. The three instances are not in master-slave mode; each one is a master, so the three are peers with no subordinate. This is called a multi-master architecture. When a client reads or writes data, it does not matter which instance it connects to: the data read is the same. After a write to any instance, the cluster itself synchronizes the newly written data to the other instances. The nodes do not share any data, so this is a highly redundant architecture.

--Features of Galera Cluster (7 points):

①: Multi-master architecture: A true multi-point read-write cluster; data read or written at any time is up to date;

②: Synchronous replication: Data is synchronized between the nodes of the cluster without delay, so no data is lost if a database goes down;

③: Concurrent replication: Nodes can apply data in parallel, which gives better performance;

④: Failover: Because multi-point writes are supported, failover is easy in the event of a database failure;

⑤: Hot plug: If a database goes down during service, the unavailable time is very small as long as the monitoring program detects the failure quickly enough. A failed node has little impact on the rest of the cluster;

⑥: Automatic node cloning: When a node is added or returns from maintenance downtime, neither incremental nor base data has to be provided by manual backup. Galera Cluster automatically pulls the data from an online node, and the cluster eventually becomes consistent;

⑦: Transparent to the application: Cluster maintenance is transparent to the application, which barely notices it;

--PXC principle:

The following 4 port numbers are most commonly used by PXC:

3306 - The port on which the database serves clients.

4444 - The port for SST requests (SST, State Snapshot Transfer, is the transfer of a full backup of the database).

4567 - The port for communication between group members.

4568 - The port for IST (Incremental State Transfer, the incremental counterpart of SST).
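As a rough illustration of the port list above, the following sketch probes the four ports on one node with plain TCP connections. The helper names `port_is_open` and `check_node` are made up for this example, and the sketch assumes the default port numbers are in use. Ports 4444 and 4568 are typically only listened on while a transfer is in progress, so a closed result for them on an idle node is not necessarily a problem.

```python
import socket

# Default PXC ports as listed in the article; the role labels are descriptive only.
PXC_PORTS = {
    3306: "client connections",
    4444: "SST",
    4567: "group communication",
    4568: "IST",
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_node(host, ports=PXC_PORTS):
    """Probe every port in `ports` on `host` and return {port: reachable}."""
    return {port: port_is_open(host, port) for port in ports}


if __name__ == "__main__":
    # "127.0.0.1" is a placeholder; replace it with the address of a cluster node.
    for port, reachable in check_node("127.0.0.1").items():
        print(port, PXC_PORTS[port], "open" if reachable else "closed")
```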

PXC Operation Flow:

First, the client initiates a transaction, which executes locally; when execution finishes, the client issues a commit. Before the commit takes effect, the node broadcasts the resulting replication write set, obtains a global transaction ID, and transfers both to the other nodes. Each node validates the write set by merging the data: if no conflicting data is found, it performs the apply_cb and commit_cb actions; otherwise the transaction is cancelled. On the originating node, the commit is executed and OK is returned once validation passes; if validation fails, the transaction is rolled back. In a production cluster of at least 3 nodes, if one node fails validation because of a data conflict, the inconsistent node is kicked out of the cluster and automatically executes the shutdown command.
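The validation step in the flow above can be pictured with a toy model. This is only a sketch of the general "first committer wins" idea, not Galera's actual certification code: the classes `WriteSet` and `Certifier`, the string keys, and the `last_seen` field are all invented for the illustration. A write set fails validation when one of its keys was changed by a transaction that was ordered after the point the sender had seen.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WriteSet:
    keys: frozenset   # rows touched by the transaction (hypothetical key format)
    last_seen: int    # highest global sequence number the sender had applied


@dataclass
class Certifier:
    seqno: int = 0                                    # last global sequence number handed out
    last_writer: dict = field(default_factory=dict)   # key -> seqno of last certified write

    def certify(self, ws):
        """Assign the next global seqno and return (seqno, passed)."""
        self.seqno += 1
        # Conflict: some key was modified after the sender's snapshot.
        conflict = any(self.last_writer.get(k, 0) > ws.last_seen for k in ws.keys)
        if conflict:
            return self.seqno, False   # the transaction must be rolled back
        for k in ws.keys:
            self.last_writer[k] = self.seqno
        return self.seqno, True        # the transaction may be applied and committed


if __name__ == "__main__":
    c = Certifier()
    first = WriteSet(frozenset({"t1:1"}), last_seen=0)
    second = WriteSet(frozenset({"t1:1"}), last_seen=0)   # same row, same snapshot
    print(c.certify(first))
    print(c.certify(second))
```

Because every node processes the same write sets in the same global order, every node running this check would reach the same verdict for each transaction, which is the property the flow above relies on.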

Advantages of PXC:

①: Achieve high availability and strong data consistency for MySQL DB cluster architecture.

②: Provides a true multi-node read-write cluster.

③: Mitigates the replication delay of traditional master-slave replication and basically achieves real-time synchronization.

④: Newly added nodes are provisioned automatically with no manual backup, which makes maintenance easy.

⑤: Because every node accepts writes, database failover is easy.

Disadvantages of PXC:

①: Adding a new node is expensive: the full data set has to be copied, and the overhead of SST is high.

②: Every update transaction must pass global validation before it is executed on the other nodes. Cluster performance is limited by the worst-performing node, often called the short-board (weakest-link) effect.

③: Because data consistency must be guaranteed, lock conflicts are more serious under concurrent writes to multiple nodes.

④: There is write amplification: every write is carried out on all nodes.

⑤: Only tables that use the InnoDB storage engine are supported.

⑥: There is no table-level locking. Executing a DDL statement locks the entire cluster and cannot be killed (use an OSC tool, that is, online DDL, instead).

⑦: All tables must contain a primary key; otherwise data operations will report errors.

Points to note when building PXC:

First, plan the number of nodes in the cluster: keep it between a minimum of 3 and a maximum of 8. At least 3 nodes are required to prevent split-brain, because split-brain occurs only with two nodes. The symptom of split-brain is that any command entered returns "unknown command". Nodes in the cluster switch state when a new node joins, when a node fails, when synchronization fails, and so on.
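The reason two nodes are not enough can be shown with a little majority arithmetic. The sketch below assumes every node carries equal weight and that a partition keeps serving only if it can still see a strict majority of the cluster; the function names are made up for this example.

```python
def has_quorum(reachable_nodes, cluster_size):
    """True if the reachable partition holds a strict majority of the cluster."""
    return 2 * reachable_nodes > cluster_size


def survives_single_failure(cluster_size):
    """True if the remaining nodes still hold a majority after one node is lost."""
    return has_quorum(cluster_size - 1, cluster_size)


if __name__ == "__main__":
    for size in (2, 3, 5):
        print(size, "nodes:", "survives" if survives_single_failure(size) else "loses quorum")
```

With 2 nodes, each side of a network break sees exactly half of the cluster, so neither side has a majority. With 3 nodes, the side that still has 2 members keeps the majority and continues to serve.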

--Node state change phase:

Open: The node starts successfully and attempts to connect to the cluster.

Primary: The node is already in the cluster. This is the state in which a donor is selected for data synchronization when a new node joins.

Joiner: The state in which the node waits to receive the synchronization data files.

Joined: The node has completed data synchronization and is trying to catch up with the progress of the cluster.

Synced: The state in which the node serves normally, indicating that it has been synchronized and is consistent with the cluster's progress.

Donor: The state of the node while it provides full data to a newly joined node.

Note: The donor node is the contributor of the data. If a new node joins the cluster and a large amount of data must be transferred by SST, the transfer may drag down the performance of the entire cluster. Therefore, in a production environment, full SST is acceptable when the amount of data is small, but it is not recommended when the amount of data is very large. In that case, consider establishing a master-slave relationship first and then joining the node to the cluster.
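A monitoring script can read a node's state from its status variables. The sketch below is an assumption-laden illustration: it takes a dictionary such as one built from the rows of `SHOW GLOBAL STATUS LIKE 'wsrep_%'`, uses the numeric `wsrep_local_state` values as commonly documented for Galera (1 to 4), and treats a node as ready only when it is Synced and part of the Primary component. The state names exposed by the status variable differ slightly from the list above, and the function name `describe_node` is invented for this example.

```python
# Numeric values of wsrep_local_state as commonly documented for Galera.
WSREP_LOCAL_STATES = {
    1: "Joining",
    2: "Donor/Desynced",
    3: "Joined",
    4: "Synced",
}


def describe_node(status):
    """Return (state_name, ready) from a {variable_name: value} status mapping."""
    code = int(status["wsrep_local_state"])
    state = WSREP_LOCAL_STATES.get(code, "Unknown")
    in_primary = status.get("wsrep_cluster_status") == "Primary"
    return state, (state == "Synced" and in_primary)


if __name__ == "__main__":
    # Sample values typed in by hand, not captured from a real server.
    sample = {"wsrep_local_state": "2", "wsrep_cluster_status": "Primary"}
    print(describe_node(sample))
```

A load balancer health check could use the `ready` flag to keep traffic away from a node that is still joining or is busy acting as a donor.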

PXC has two modes of node data transfer: SST, the full transfer, and IST, the incremental transfer.

SST can be done with three methods: xtrabackup, mysqldump, and rsync. IST does not use these methods; it sends the cached write sets that the joining node is missing. In a production environment, full SST can be used when the data volume is not large, but generally only with the xtrabackup method.

A particularly important module in PXC is Gcache. Its core function is to cache the most recent write sets on each node. When a new node is added, the missing increment can be passed to it from this cache without using SST, which lets nodes join the cluster more quickly. The parameters involved are as follows:

gcache.size: The size of the cache that holds write-set increments. The default is 128MB, and it is set through the wsrep_provider_options parameter. Adjusting it to the 2GB-4GB range is recommended so that there is enough space to cache more incremental information.

gcache.mem_size: The size of the in-memory cache in Gcache. Moderate resizing can improve the performance of the entire cluster.

gcache.page_size: If memory is not sufficient (Gcache is full), write sets are written directly to page files on disk; this parameter can be understood as the size of those page files.
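To make the sizing advice concrete, the sketch below does two things: it formats a `wsrep_provider_options` value from a dictionary, and it estimates how long a node can stay offline and still be caught up from the cache rather than by SST. Both helper names are made up for this example, the write rate is a figure the operator has to measure, and the estimate ignores per-entry overhead in the cache.

```python
def provider_options(options):
    """Join Galera provider options into one 'name=value; name=value' string."""
    return "; ".join(f"{name}={value}" for name, value in options.items())


def ist_window_seconds(gcache_bytes, writeset_bytes_per_second):
    """Rough time span of write sets that fit in a cache of the given size."""
    if writeset_bytes_per_second <= 0:
        raise ValueError("write rate must be positive")
    return gcache_bytes / writeset_bytes_per_second


if __name__ == "__main__":
    opts = provider_options({"gcache.size": "2G"})
    print(f'wsrep_provider_options="{opts}"')

    # Assumed workload: 1 MiB of write sets per second against a 2 GiB cache.
    window = ist_window_seconds(2 * 1024**3, 1024**2)
    print(f"approximate IST window: {window / 60:.0f} minutes")
```

If the maintenance windows in a given environment are longer than the estimated window, the cache is likely too small for incremental catch-up and a full transfer should be expected.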

--PXC mode of operation:

Galera works as follows: when a node writes a transaction, it broadcasts the transaction to the other nodes, and these "other nodes" include the sender itself. In other words, a node also receives the transactions it sent out. After receiving one and producing the GTID, the node simply ignores it and does not apply it a second time.

--Concurrency control mechanism of Galera:

Concurrency control is mainly done in the interface galera_pre_commit. This is one of the most important interfaces of Galera, and it implements the key replication and validation logic. Currently, the concurrency control in this interface covers the following points:

①: Data replication:

In the current version of Galera, write-set data is broadcast asynchronously through ASIO. Sending is serial and forms a critical section: before each send the data still has to be split into fragments, and after each send the sender has to wait for a GTID value. To ensure data consistency, the send operation therefore has to be serial;

②: Write set validation:

All GTIDs entering the processing area must arrive in order. Because GTIDs are generated sequentially, only one transaction can be processed at a time; put plainly, processing is serial;

The operation managed at this level of concurrency control is mainly validation, so validation is serial;

③: Write set apply

④: Transaction Commit

For this level of concurrency control, the default setting is 3 and the recommended setting is also 3, which means serial commit. As a result, the binlogs produced on all nodes are exactly the same, whether on the master or on a slave;
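Serial commit in global order can be pictured as a gate that lets transactions through strictly by sequence number, however their threads happen to be scheduled. The following is a toy illustration using Python threads, not Galera's implementation; the class `CommitMonitor` and the function `commit_in_order` are invented names, and the sketch assumes the sequence numbers are exactly 1..n with no gaps.

```python
import threading


class CommitMonitor:
    """Lets callers pass one at a time, in strictly increasing seqno order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next = 1   # seqno allowed to commit next

    def enter(self, seqno):
        with self._cond:
            # Block until it is this transaction's turn.
            self._cond.wait_for(lambda: self._next == seqno)

    def leave(self, seqno):
        with self._cond:
            self._next = seqno + 1
            self._cond.notify_all()


def commit_in_order(seqnos):
    """Start one thread per seqno and return the order in which they committed."""
    monitor = CommitMonitor()
    committed = []

    def worker(seqno):
        monitor.enter(seqno)
        committed.append(seqno)   # stands in for the real commit
        monitor.leave(seqno)

    threads = [threading.Thread(target=worker, args=(s,)) for s in seqnos]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return committed


if __name__ == "__main__":
    print(commit_in_order([3, 1, 2]))
```

Because every node lets transactions through the gate in the same order, each node writes the same sequence of events, which is why the binlogs end up identical.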

--Galera interfaces:

---galera_init:

The purpose of this interface is to initialize a Galera node. It is the first wsrep interface called by a PXC node: it is called when the server starts, and it initializes all required parameters and environment variables (for example the cluster name, the instance address, and whether binlog replication is needed).

---galera_connect:

This is the second interface to be called. Its purpose is to join the current node to the cluster. Before the node joins the cluster, the callback wsrep_view_handler_cb is called to determine whether the data of the joining node is in sync with the cluster.

---galera_recv:

The purpose of this interface is to block while receiving the data sent by other nodes and by this node itself, and to call the replication apply function to perform the replication. (This interface can run in parallel: there are as many galera_recv calls as there are threads set by the parameter wsrep_slave_threads.)

---galera_pre_commit:

This interface is one of the most important interfaces of Galera. Its role has two parts: first it broadcasts the write set of the specified transaction to all nodes of the cluster, and then it performs validation. If validation succeeds, control returns to the upper layer, which continues with the database transaction commit. This interface is called when the database transaction commits, and by the time it is called the local transaction must have finished executing;

---galera_replay_trx:

This interface is used when, during validation, a database lock conflict causes another thread to call galera_abort_pre_commit on the current operation, which forces the current thread to abort. Because the write set has already been replicated to the other nodes, this node must still complete the transaction. Through this interface the write set of the transaction is applied once more, which is why it is called replay;

---galera_append_key:

This interface serves Galera validation: the object being validated is the write set, and the keys that make up the write set are added through this interface;

---galera_append_data:

This interface appends the binlog content generated by the current transaction. After the keys pass validation, this data is executed on the slave nodes to synchronize the data;

---galera_post_commit:

This interface is used to actually commit the transaction. Its functions include: updating the value of the status parameter wsrep_last_committed, which indicates that the current transaction has actually been committed; updating the value of the parameter wsrep_local_commits, which indicates that a transaction was committed locally and successfully; and checking whether the current validation write-set buffer can be purged;

---galera_to_execute_start:

This interface is specifically designed to handle the execution of DDL statements;

---galera_to_execute_end:

This interface plays the same role as galera_post_commit. The two interfaces galera_to_execute_start and galera_to_execute_end appear as a pair and handle a different set of statements. The main job of this interface is to leave the commit critical section so that other transactions can continue to commit;

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
