Research and analysis of MySQL Group replication

Source: Internet
Author: User
Tags failover savepoint uuid

Three modes of MySQL replication

MySQL currently has three replication modes: asynchronous mode, semi-synchronous mode, and group copy mode, first understand how the three modes work.

1. MySQL Asynchronous Replication (asynchronous replication)

Asynchronous replication is the oldest and most currently used replication mode for MySQL, and asynchronous replication provides a simple master-slave replication method that consists of a master repository (master) and a standby (one, or more), where the main library executes and commits the transaction, after which (hence, it is called async). These transactions are re-executed from the library (based on statement) or change data content (row-based), and the master library does not detect its synchronization from the library. In the case of high server load and heavy service pressure, the master-slave delay has been criticized. The work flow diagram is as follows:


650) this.width=650; "src=" http://s1.51cto.com/wyfs02/M00/8D/0F/wKiom1iETPDRfhGXAAAsLZQQ7d0234.jpg "title=" 1.jpg " alt= "Wkiom1ietpdrfhgxaaaslzqq7d0234.jpg"/>

2, MySQL semisynchronous Replication (semi-synchronous replication)

MySQL5.5 version on the basis of one-step synchronization, in the form of plug-ins to achieve a variant of the synchronization scheme, called the semi-synchronous (Semi-sync replication). This plugin adds a synchronous process on the source asynchronous replication: Notifies the main library when a change to the main library is received from the library (that is, a transaction). There are two types of operations on the main library: After receiving this notification, the commit transaction is received, and the session is released after acceptance. Both of these approaches are determined by the specific configuration on the main library. When the main library does not receive the change notification time-out from the library, the semi-synchronous replication automatically switches to asynchronous synchronization, which greatly guarantees the consistency of the data (at least one from the library), but decreases in performance, especially in the case of network instability, and switching back and forth between the semi-synchronous and synchronous, affecting the normal business. The workflow diagram is as follows:

650) this.width=650; "src=" http://s1.51cto.com/wyfs02/M00/8D/0C/wKioL1iETQmgVfrqAAAvqf06-U8978.jpg "title=" 2.jpg " alt= "Wkiol1ietqmgvfrqaaavqf06-u8978.jpg"/>

3. Group Replication (copy)

Whether asynchronous or semi-synchronous replication, is one of the main under a slave or multiple slave mode, under high concurrency under high load, there is a delay, if the main node is abnormal, then there will be inconsistent data, data can be lost in the financial database is not tolerated. In this case, there is a need for a pattern to address these problems. In the MySQL5.7.17 version, with these expectations, the new replication mode group is duplicated and GA is generated (the test data in this article is based on MySQL5.7.17).

The Workflow flowchart for group replication is as follows:

650) this.width=650; "src=" http://s1.51cto.com/wyfs02/M01/8D/0F/wKiom1iETR7CHwkBAAA4JSHNy8w560.jpg "title=" 3.jpg " alt= "Wkiom1ietr7chwkbaaa4jshny8w560.jpg"/>

Second, the working principle of group replication

MySQL group replication is a MySQL plug-in that builds on existing MySQL replication infrastructure, leverages binary logging, row-based logging, and global transaction identifiers. It integrates the current MySQL framework, such as performance mode, plug-ins, and service infrastructure.

Group Replication is based on the Distributed consistency algorithm (variant of the Paxos Protocol), a group that allows some nodes to hang up, so long as the majority of nodes are still alive and communication is not a problem, the group can still provide services to the outside, It is a technology that is used in fault-tolerant systems. Group Replication (replication groups) are made up of multiple servers (nodes) that can communicate with each other. At the communication layer, Group replication implements a series of mechanisms, such as atomic messages (atomic message delivery) and full-order messages (total ordering of messages). These atomization, abstract mechanisms provide strong support for more advanced database replication scenarios. MySQL Group replication is based on these technologies and concepts to achieve a multi-master full-update replication protocol. In short, a group replication is a set of nodes, each of which can perform transactions independently, while read and write transactions are reconciled after the other nodes in the group commit. Therefore, when a transaction is ready to commit, an atomic broadcast is automatically made within the group informing the other nodes what has changed/what transaction has been performed. This kind of atomic broadcasting means that the transaction is kept in the same order on each node. This means that each node receives the same transaction log in the same order, so each node repeats the transaction logs in the same order, and eventually the entire group remains in a completely consistent state. However, there is a possibility of resource contention between transactions executed on different nodes. This behavior is easily seen on two different concurrent transactions. If you have two concurrent transactions on different nodes and you update the same row of data, resource contention will occur. In this case, the group replication determines that the first committed transaction is a valid transaction, which is replayed throughout the group, and then the committed transaction is either interrupted directly or rolled back, and discarded at the end. Therefore, this is also a non-shared replication scheme, where each node holds a complete copy of the data.

From the principle of its work, it can be seen that the Group replication consistency algorithm based on the Paxos protocol verifies that the transaction execution is in conflict, then executes the transaction sequentially, achieves the final data consistency, also means that some nodes can have a delay. can set multi-master simultaneous write and single-master write, by setting the Group_replication_single_primary_mode to control whether it is multi-master or single-master, the official recommendation of single-master write, allow delay, but too large delay, will trigger the current limit rule (configurable), The entire cluster becomes slow and performance is compromised.

Third, the program structure of the group replication

At the bottom of MySQL, Gr adds an additional API layer to achieve the required functionality. On the program structure, GRAPI is divided into three main parts:

The 1:capture tracks the context of the currently executing transaction.

2:applier performs a remote transaction transfer to the local database.

3:recovery is responsible for node recovery in distributed environment, as well as related data back-up, failure processing and so on.

Below these main API tiers, there is a unified replication protocol logic processing layer, which is primarily a unified application layer of various calls. In the lower level, it is a more general-purpose distributed communication layer, in the invocation of convenience, distributed communication has been used to provide the use of the api,api below, is the Paxos implementation of the Distributed communication protocol component, which together with other nodes in the cluster to form a virtual conceptualization of the distributed cluster.

650) this.width=650; "src=" http://s2.51cto.com/wyfs02/M01/8D/0C/wKioL1iETXfRl0y0AAB_BrKMu_I739.jpg "title=" 4.jpg " alt= "Wkiol1ietxfrl0y0aab_brkmu_i739.jpg"/>

Iv. message Compression (msg Compression)

This compression is mainly referred to as MySQL Bin-log compression, the compression algorithm used is LZ4. When network bandwidth is a bottleneck, message compression can provide up to 30-40% throughput improvements at the group communication level, which is especially important in groups with large network transmission pressures. LZ4 can support multi-threaded environment, get higher compression and decompression speed. The following are the compression and decompression conditions of the compression algorithm LZ4:

650) this.width=650; "src=" http://s1.51cto.com/wyfs02/M02/8D/0C/wKioL1iETbbDfuR3AABWGWloRVM019.jpg "title=" 5.jpg " alt= "Wkiol1ietbbdfur3aabwgwlorvm019.jpg"/>

Compression occurs at the group communication engine level, before the data is handed over to the group communication thread, so it occurs in the context of the MySQL user session thread. A transactional payload can be compressed before it is sent to a group, and is decompressed when it is received. Compression is conditional and depends on the configured thresholds. Compression is enabled by default, and it does not require that all server nodes in the group have a compression mechanism enabled, and when a message is received, the member examines the message envelope to verify that it is compressed, and if necessary, the member to decompress the transaction and then pass it to the upper layer.

Compression is enabled by default, with a threshold of 1000000 bytes (1MB). The compression threshold, in bytes, can be set to a value greater than the default. In this case, only the transaction with a payload greater than the threshold value is compressed. The following is an example of how to set a compression threshold.

STOP group_replication; SET GLOBAL group_replication_compression_threshold= 2097152; START group_replication;

This sets the compression threshold to 2MB. If a transaction generates a copy message that has more than 2MB of valid content, such as a binary log transaction entry that is greater than 2MB, it is compressed. The Disable compression setting threshold is 0. Note: Modifying this threshold requires a restart of the group replication.

The message compression flowchart is as follows:

650) this.width=650; "src=" http://s2.51cto.com/wyfs02/M02/8D/0F/wKiom1iEThqTq3V3AAB7UnvL09o470.jpg "title=" 6.jpg " alt= "Wkiom1iethqtq3v3aab7unvl09o470.jpg"/>

V. Requirements and limitations of group replication 1, limitations and requirements

1. All data involved must be within the table of the InnoDB storage engine.

2. All tables must have a clear primary key definition.

3. The network address only supports IPV4.

4. Requires a low latency, high bandwidth network.

5. The current cluster limit allows up to 9 nodes.

6. Binlog must be enabled.

7. The Binlog format must be in row format.

8. The Gtid mode must be turned on.

9. Copying related information must be stored using the table.

10. The transaction Write collection (Transaction write set extraction) must be open. (This is currently a conflict with savepoint, which is why mysqldump cannot back up the GR instance)

One. Log slave updates must be turned on.

Binlog's checksum is not currently supported.

13. SavePoint cannot be used because of interference in transaction write collection.

The SERIALIZABLE isolation level is not currently supported.

15. For the same object, on different instances of the cluster, executing the DDL (even conflicting DDL) in parallel is feasible, but leads to errors in data consistency, and the current phase does not support the simultaneous execution of the DDL of the same object at multiple nodes.

16. FOREIGN KEY Cascade constraint operation The current implementation is not fully supported and is not recommended for use.

2. Related configuration of group replication

Depending on the requirements and limitations of group replication, the following settings configure replication according to MySQL group replication requirements:

server_id = 1
Gtid_mode = On
Enforce_gtid_consistency = On
Master_info_repository = TABLE
Relay_log_info_repository = TABLE
Binlog_checksum = NONE
Log_slave_updates = On
Log_bin = Binlog
Binlog_format = ROW

At this point the My.cnf file ensures that the server is configured and instructs the replication infrastructure to instantiate a given configuration. The following sections configure the group replication settings for the server. The specific parameters are relatively simple, do not repeat here, see the official note:

Transaction_write_set_extraction = XXHASH64
Loose-group_replication_group_name = "Aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
Loose-group_replication_start_on_boot = Off
loose-group_replication_local_address = "127.0.0.1:24901"
Loose-group_replication_group_seeds = "127.0.0.1:24901,127.0.0.1:24902,127.0.0.1:24903"
Loose-group_replication_bootstrap_group = Off

The specific group replication installation deployment is relatively simple, both on-line and official instructions are explained, here does not elaborate installation deployment this piece.

Vi. multiple and single master modes for group replication (multi-primary or Single-primary mode)

Group replication is divided into multiple and single-master modes, the default is single-master mode, and is also the official recommended group replication mode. You cannot use both modes in a single cluster, such as one configured in multi-master mode and the other in single-master mode. To switch between modes, you need to restart the cluster with a different operation configuration. Regardless of the deployment pattern, group replication does not handle client failover, which must be handled by the application itself, the connector, or the middleware framework (such as proxies or routers).

1. Single main mode

In this mode, the group has a single primary instance that is set to read-write mode, the primary node is usually the first server to resolve the group, the other nodes in the group are automatically set to read-only mode (that is, super-read-only), and all other joined nodes are automatically recognized by the master node and set to self-read-only.

In single-master mode, some checks that are deployed in multi-host mode are disabled because the system enforces only one write node at a time. For example, it is allowed to make changes to a table with a cascading foreign key, which is not allowed in multi-master mode. In the case of a primary node failure, the automatic election mechanism selects the next master node. Select the next master node by sorting the remaining nodes by dictionary order (using their UUID) and selecting the first node in the list. If the master node is removed from the group, the selection is performed and the new master node is selected from the remaining nodes in the group, which sorts the node UUID in dictionary order and selects the first one to execute. Once the new master node is selected, the other nodes are set to the slave node, from which the node is read-only. Such as:

650) this.width=650; "src=" http://s1.51cto.com/wyfs02/M00/8D/0C/wKioL1iETwOiW4a4AABQXins6sE273.jpg "title=" 7.jpg " alt= "Wkiol1ietwoiw4a4aabqxins6se273.jpg"/>

2. Multi-Master Mode

In multi-master mode, there is no single main mode concept, nor is there an election procedure, because no node plays any special role. When you join a group, all servers are set to read-write mode.

When deployed in multi-master mode, statements are checked to ensure that they are compatible with the schema. The following checks are made when deploying group replication in multi-master mode:

1: If a transaction is executed under the Serializable isolation level, its submission will fail when it synchronizes with the group.

2: If a transaction is executed against a foreign key with a cascading constraint, the transaction cannot be committed while synchronizing with the group.

These checks can be disabled by setting the option Group_replication_enforce_update_everywhere_checks to false. When deployed in single primary mode, this option must be set to FALSE. Such as:

650) this.width=650; "src=" http://s5.51cto.com/wyfs02/M01/8D/0F/wKiom1iETx-zQ7eRAABWNL0h7r0294.jpg "title=" 8.jpg " alt= "Wkiom1ietx-zq7eraabwnl0h7r0294.jpg"/>

Vii. operation and maintenance related issues 1, failover issues

Currently MySQL does not officially publish connection group replication dedicated clients (such as MONGODB connection replication set of clients), in the actual application if a failure, the client needs to handle. For single-master mode, if the primary node fails, the client needs to determine who the new master node is, and then switch the write to the new master, basically the same as the master-slave switchover of the current asynchronous synchronization, and the new master node is automatically generated by the cluster, is not controllable; multi-master mode requires node availability checking at the client. Automatically use other available nodes when one of the write nodes is not available.

In the actual production, the combination of two sets of replication mode failover, you can use the multi-master mode, specify one of the nodes as the primary node, the other nodes are placed as read-only nodes, so that the main node failure, the new master node can be controlled.

2, big business support issues

The current version tests concurrency for Big Data operations and DDL operations, killing large transactions, and having the chance to make the cluster unusable; in INSERT INTO ... select......limit ... This big transaction support is not good, may cause the cluster to use, the multi-master mode DDL operation needs all nodes in the cluster to be online state to execute, in error and recovering state has the chance to cause the cluster to block, serious when the cluster is not available.

3, Backup problems

When performing a database backup on one of the nodes of a group replication cluster, regardless of the use of mysqldump (this cannot use the--single-transaction parameter, The mysqldump backup cluster data is not recommended in production, or the QPS using Xtrabackup drops by 40%, and the backup node basically stops reading and writing. When you test a backup file to import data, the multi-master mode is slower than the single-master mode. It is recommended to use Group replication + asynchronous replication for database backups on asynchronous replicated slave nodes.

4. Binary Log Removal issues

Because the group replication synchronization or is based on the binary log to synchronize, clean up a node bin-log, you must determine whether the log file is still in use, if in use, then absolutely cannot delete, if deleted, then the entire cluster direct error.

5, synchronization delay problem

Currently the MySQL5.7.17 version does not have the ability to visually view the synchronization delay of a node or how much delay, regardless of the time or number of things, this open MySQL debug mode, you can get to the node delay transaction situation.

The delay of group replication has an impact on the cluster, and once a delay occurs (the default is 25,000 transactions delayed), the flow control is initiated, each cycle can attenuate the current 10% until the cluster is unavailable (but the cluster node status is online), and the single node slows down the entire cluster.

Each node in the cluster validates and applies the transactions submitted by that group, and statistics about the checksum application process are useful for understanding how the application queue grows, how many conflicts have been found, how many transactions have been checked, where transactions have been committed, and so on. Table Performance_schema.replication_group_member_stats provides information about the transaction authentication process, but no deferred information. The relevant fields are explained as follows:

Field

Describe

Channel_name

The name of the group replication channel.

member_id

Represents the member server UUID that is currently connected to. Each node in the group has a different value, which is a unique key because it is unique to each member.

Count_transactions_in_queue

The number of transactions in the queue waiting for the conflict detection check. Once the conflicts are checked and they pass the check, they will be queued for application.

Count_transactions_checked

Indicates the number of transactions that have been checked for conflicts.

count_conflicts_detected

Represents the number of transactions not checked by conflict detection.

Count_transactions_validating

Represents the current size of the conflict Detection database (a validated database per transaction).

Transactions_committed_all_members

Represents a transaction that has been successfully committed on all members of the current view. This is updated at a fixed time interval.

Last_conflict_free_transaction

Displays the transaction identifier for the last non-conflict check.


6. Data consistency issues

Whether it is more write or single-write, is not strong consistency, are allowed to delay, he after verifying the transaction is not a conflict after the current broadcast to the various nodes and determine the individual nodes to enter the next thing after the collision detection, at this time each node just get the execution sequence of all transactions, to ensure that the final order of the transaction execution, In order to ensure the final consistency of the data, but the same moment is not strong consistency.

7, node fault brain crack problem

The more nodes the more performance loss, the more suitable three nodes. Node failure may have a brain fissure and other problems: such as 5 nodes distributed in two rooms, machine room network broken down into two parts, 2 clusters of the machine room is not available, 3 nodes of the availability, and three nodes of the computer room network problems, at this time if you want to make two nodes of the computer room, need to re-cluster the two nodes, Three nodes will not be able to recover to two nodes, one of the three nodes down, the other two normal nodes are available, the failure node restart is not joined to the cluster, at this time the node with a single instance can read and write, there will be a brain fissure.

8. Network Latency issues

The TC command is used during testing to simulate network latency:

TC Qdisc Add dev eth0 root netem delay 50ms 10ms Increase network latency 50ms,10ms about floating

TC Qdisc del Dev eth0 root netem delay 50ms 10ms Delete network delay

Tested Network Latency Comparison group replication MySQL's QPS: Network latency Settings 50ms and Normal, QPS lowers by at least 1/3, or even 1/2, and network latency has a significant impact on performance. Here are the test cases:

650) this.width=650; "src=" http://s2.51cto.com/wyfs02/M02/8D/0C/wKioL1iET6CBGu73AAMC_wLtuOI521.jpg "title=" 4.jpg " alt= "Wkiol1iet6cbgu73aamc_wltuoi521.jpg"/>

9. Elasticity expansion Problem

The MySQL official website mentions the elastic auto-scaling of group replication, which is not realistic in production because of actual testing. The elastic expansion that can be used for production requires a new cluster, the data in the cluster is fully synchronized by the cluster, but because the group replication is synchronized based on the binary log, it is not possible to completely retain the entire binary log in the production, and the new node needs to back up the full-volume data of the cluster first. Then, according to the synchronization location to follow the transaction to achieve consistent data after the node state online status, in fact, and before the asynchronous synchronous building master-slave. And the official tip is that if the delay in recovery is too large, it may not be possible to reach the most recent data position correctly.

10, client connectivity issues

In the official note about fault handling there is a sentence: Group replication does not handle client failover, it must be handled by the application itself, the connector or the middleware framework (such as an agent or router). Officials have repeatedly stressed that MySQL group replication provides a highly available, highly resilient, and reliable MySQL service, so does the official provide a set of client components similar to the MongoDB replica set to support that?

The current solution is to switch to asynchronous replication, either by using a domain name switch or by implementing a highly available set of client connections. But for the time being the most efficient is to combine their own business, modify the group replication fault processing source code, when the detection of a write node failure, combined with their own domain name switch to deal with. But this requires the ability of the DBA to develop the source code, relative requirements are relatively high.

11. Find the primary node IP problem

In single-master mode, the IP address of the main library cannot be obtained intuitively, and the UUID of the master node can be obtained by using the following command:

Mysql> SELECT variable_value from performance_schema.global_status WHERE variable_name = ' group_replication_primary _member ';

+ -------------------------------------- +

| Variable_value |

+ -------------------------------------- +

| 69e1a3b8-8397-11e6-8e67-bf68cbc061a4 |

+ -------------------------------------- +

1 rows (0, 00 seconds)

Using the SELECT * from Performance_schema.replication_group_members, you can view the member_host to which the UUID corresponds, and member_host refers to the host name, You need to specify Report-host as the IP address in the MySQL configuration file, so that you can correlate the two tables to the IP address of the main library.

650) this.width=650; "src=" http://s2.51cto.com/wyfs02/M00/8D/0F/wKiom1iET_6x3v9HAAEaSRdAjs4738.jpg "title=" 5.jpg " alt= "Wkiom1iet_6x3v9haaeasrdajs4738.jpg"/>

Viii. Summary

From the test situation, to the big business support is not enough, operation and maintenance management is not friendly, related group replication of the matching monitoring, client and other imperfect, there are some problems can be circumvented and curve to solve, a part of the need for source level support; In performance and pxc contrast, to better than PXC, This is different from the respective copy protocol.

MySQL group replication provides a highly available, highly resilient, and reliable MySQL service designed to create a financial-grade MySQL cluster. In the case of ignoring the network delay, it is easy to realize the multi-live and off-site call to write the library, this is the business more expect. Group replication is a future development trend of MySQL, I believe in the future version will be more perfect, look forward to the mature version.

Reference Document: Http://dev.mysql.com/doc/refman/5.7/en/group-replication.html

650) this.width=650; "src=" http://s3.51cto.com/wyfs02/M00/8D/0C/wKioL1iEUHHi3tJZAAAmOzv8Il8660.jpg "title=" 6.jpg " alt= "Wkiol1ieuhhi3tjzaaamozv8il8660.jpg"/>


This article is from "Wang Wei" blog, please be sure to keep this source http://wangwei007.blog.51cto.com/68019/1893703

Research and analysis of MySQL Group replication

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.