Transcript: Replication Technology in MySQL 8.0, mysql8.0

Source: Internet
Author: User

Transcript: Replication Technology in MySQL 8.0, mysql8.0

Song Libing, R & D Engineer of Oracle MySQL, gave a speech on "replication technology in MySQL-8.0" at the recent seventh data technology carnival, this paper introduces the development direction of asynchronous Replication and Group Replication in MySQL-8.0, as well as new technologies and new features. We will share it with you again, hoping to provide some guidance for you.


Introduction and framework of Replication


01

Definition



02

A simple framework of MySQL replication technology



First, in the replication environment, there are two servers that generate binary logs in the first server. Generally, this server becomes the master, and the other server copies the binary logs on the master, then, the application of logs generates the same database as the master database. This is the basic theory of replication. The basic process is as follows:



When the application executes SQL statements on the master database, these operations will be captured by the database and written to binary log in the form of event, and stored as files. Through the communication module, these events will be sent to the swill, and the receiving thread on the swill will receive these events and store them to the Redo log. Then, the read thread will read these events, execute in the replication database in parallel to generate the same database as the master database.


Binary log is the basis for MySQL replication. These MySQL logs are called logical logs, which record the SQL statement level and operate the row data in the table, it does not care about how data is stored in the engine, and what the storage format is.



Binary log has two modes

One is ROW format.

One is the statement format.


The statement format is well understood, that is, when an SQL statement is executed, the statement is stored in the event in the form of text, and then executed in slave as an SQL statement. The ROW format is a DML statement for the operation table. The ROW format does not record the original statement, but records the content of the involved rows into the event. Compared with the two, the ROW format is more secure, while statement sometimes depends on the environment. For example, a random number function is used, the execution results of these functions on the master and slave may be inconsistent. The ROW format represents the final result and the final data is directly copied. Therefore, this problem will not occur.


In additionThe ROW format does not require SQL parsing to be executed on the slave, and the performance will be better. In version 8.0, the default format is the ROW format..


Event is composed of SQL statements at the statement level. A transaction generally consists of multiple events. Generally, there is a CTID event at the beginning of the transaction, which is a globally unique idnumber. Next it is a begin event, followed by an event generated by a series of statements, finally, it is composed of a commit event.


For ddl statements, there is no begin or commit.


We can see that when storing binary logs, a transaction is the smallest storage unit, and all the statements in the transaction are sequential, which is continuously stored in the log, there will be no event interspersed with two transactions. In addition to the event generated by the transaction, some events are used for control.


03

Three replication Modes



MySQL introduced asynchronous replication a long time ago, starting with version 3.23.Asynchronous replication is used for MySQL and ensures the high availability of MySQL Data.. Till now, we have been using the asynchronous replication mode. With more and more users and more application scenarios, we have also put forward higher requirements for the replication technology. In the previous asynchronous replication mode, the transaction execution process is completely isolated from the statement transmission process. The master only performs its operations, instead of worrying about whether the data is transmitted to the slave. This mode may cause problems in some scenarios where high data reliability is required, and may occur. The user finds that all my data has been submitted, however, if the master node fails to switch to the slave node, the user data does not exist.


Therefore, in version 5.5, semi-synchronous replication is introduced. In the extreme of semi-synchronization, the transaction execution process is related to the event transmission process. On the master, the transaction will not be committed immediately after the bin log is written. The transaction will be committed only after the event generated by the transaction has been copied to another node and the other nodes have responded, in this way, the data submitted by all users can be viewed on other nodes, so that the reliability is higher.


In version 5.7.17, a new replication mode called group replication is introduced, which is also retained in version 8.0. Compared with semi-synchronous replication, GR and asynchronous replication are also introduced, there are three obvious advantages:

First, higher reliability, Mainly reflected in the GR does not show split-brain. Or, when split-brain occurs in the network, no inconsistent data is generated, ensuring data consistency;


Second, GR can be written moreSupports simultaneous Writing of multiple or all nodes. However, inconsistency does not occur.


Third, the concept of GR groupingTherefore, there are many groups of self-management functions. In the first two replication methods, a third-party tool is required to play a program to maintain operations between nodes, such as when performing a failover, in GR, all these management tasks can be completed by yourself. Therefore, it is particularly suitable for the current large-scale data use of MySQL clusters.


Typical use Cases


With the development of the Internet and the development of cloud computing, most of MySQL is now providing services through clusters. Therefore, clusters have many new requirements and features when using MySQL. I will introduce the use cases of MySQL clusters in several categories.


01

Replicate


This is the most basic architecture, that is, the high-availability architecture.



High Availability previously used asynchronous replication and semi-synchronous replication. With the release of 8.0, the GR solution became increasingly mature, we can expect more users to use group replication as the most basic high-availability unit. We also recommend using GR because of the three obvious advantages mentioned above.


02

Automatic



Second, when using clusters on a large scale, we also hope to reduce labor costs and improve efficiency through automation. In GR, it contains many automated functions. This is mainly because of the concept of group. In the entire cluster, each node knows that it is in a group and can also know the status information of all other nodes, because they communicate with each other. In this case,If the master node is down, other nodes will be aware of it as soon as possible, and the new node will be selected as the master node..


There are two multi-write modes for GR. The single-group mode is mainly used to replace the asynchronous mode.. In asynchronous replication, if the system goes down, you still need to use tools or programs to detect the system goes down, but GR can perceive it on its own. The two nodes are not online at the same time or the data is inconsistent.


03

Data integration-Integrate


A large amount of data is stored in the database, and more enterprises are doing data analysis. In MySQL GR, one solution is to export data from binary Log, import the data to other platforms for analysis. In this process, we add a lot of metadata information to help users better understand the meaning of the data. This facilitates data conversion.


04

Remote Disaster Recovery


If the physical distance span is large, it can be transmitted through asynchronous replication. For large volumes of data reading, the performance is also greatly improved.


GR is a basic high-availability architecture. On this basis, you can expand the performance of some read data. However, this is subject to certain restrictions.


Because of the management of groups, if you want to copy the data in the group or copy data from other nodes to the group, it is supported, therefore, it is convenient to combine asynchronous Replication and Group Replication.


Of course, MySQL has been working on a complete solution. Therefore, InnoDB Cluster has been launched. The components of InnoDB include shell, rooter, there are also highly available clusters with group replication as the core. Because GR knows the concept of a group, it integrates the concept of Cluster in shell, including Cluster management functions and commands, therefore, it is easy to deploy the InnoDB d cluster, and implement load balancing, read/write splitting, and other functions through the rooter. This constitutes a complete database cluster system without any third-party tools.



New features in 8.0


Next we will introduce the new features of replication technology in 8.0.


It mainly includes the following aspects:



01

Metadata in binary log



The role of metadata is to make it easier for users to extract data, and to replicate MySQL itself.For example, Character Set conversion can be performed to accurately determine whether the data types of master and slave are consistent. Of course, these functions are not fully implemented yet, currently, more metadata information is stored in binary logs.


These metadata are mainly divided into two types.

One is to add metadata in the GTID event.. The other is the length of the transaction, that is, the length of all events contained in the transaction. The timestamp in the commit event is also added. The time error here is millisecond-level, so that we can monitor the delay of data replication.

The second type is table map event.It mainly adds the specific type of each field, whether there is a symbol, the character set to which it belongs, and also includes the column name, primary key, enumerated and set string values. Will be recorded in the metadata information. In this way, it is very convenient to know the Data Type accurately when extracting data.


02

New features in operations


The most important thing here is multi-element replication,The use of multi-element replication is mainly used for data aggregation.There is a scenario where I only need a part of the data in a table or a part of the table. If I need to do aggregation, I need to filter the table first, but in 8.0, the filtering function has been greatly improved. You can set your own filter policies for each channel.


For example, in the figure, there are three tables on node A and other tables on Node B. The rule can be set separately as follows: Data replication from node A to Node B, and all three tables are copied, for the channel from B to C, the rule is specified as a table that does not copy the user.


WhileGroup Replication protects the data system and ensures that each node is not accidentally updated..

A process is required to add a node to the group before it takes effect. First, you must add nodes to the network so that nodes can perceive each other before joining the group. It is possible that the node has not been added to the group after the network is added. At this time, the application has discovered the node or sent the information directly to the new node, at this time, data inconsistency may occur.


Therefore, to ensure data consistency between nodes in the group, you must set the attribute to read-only before adding the node to the network. After adding the node to the network, it will be based on its own role, automatically switch the role, that is, you do not need to manually modify the read-only attribute. For example, if it is a server Load balancer role in the group, it will keep read-only. If it is selected as the master, it will switch to read-write.


The same principle is true when a node leaves the group. It automatically switches its attributes. That is to say, when it is no longer in the group, it does not matter if the administrator forgets to delete it, because its attribute will be automatically switched to read-only, even if the data is sent to it, it won't receive either.


There are many machines in the cluster. The configuration, geographical location, and other information of these machines may be different. At this time, users may have different data operation requirements, for example, you need a specific geographic location or a specific configuration. If you want to select these nodes with the same attributes as a Group, you can set the weight of each node by adding a feature in 8.0. this parameter is Group Replication Election Weights, you can configure an independent weight for each node. During the election, the node with the highest weight will be selected as the master node.



03

Throttling mechanism of sentence Group Management



In the cluster, we hope that the data between nodes will be synchronized without delay. However, data latency may be caused by some unexpected situations. Therefore, we add more parameters. You only need to adjust and configure the parameters. When the cluster finds that data is not synchronized, it automatically performs traffic control.


04

New monitoring features



We mentioned just now that the commit timestamp is added to the GTID event. This timestamp is millisecond-level, so we can monitor replication latency in milliseconds. There are two timestamps. The first one is the timestamp corresponding to the event that initially generated the transaction. When all nodes complete the transaction, the timestamp is retained when recording the binary log. In addition, a timestamp is recorded on the node. Therefore, we can know the latency between two nodes through comparison. This allows you to conveniently monitor the latency of data transmission between two nodes.



In terms of node data synchronization between groups, generally, some business logic is complex. For example, when a node is added to a group, data replication is performed on a new node, it is done through asynchronous replication. The monitoring of Group replication is also enhanced in 8.0. Some fields are added to the replication Group Member table to check the status information of all nodes on any node. It can better monitor the information of group members, including the master node, slave node, and version information of all nodes.


05

New performance features


One of the most important features in performance is the writeset-based concurrency policy.

Writeset is essentially based on the primary key, or a row-based concurrency policy.. If two transactions modify data of different rows, they can be executed concurrently due to different primary keys. At this time, execution will have an obvious feature. When there is only one thread on the master, the concurrent execution performance on the slave will be higher than that on the master, and there will be only a few threads on the master, when the concurrency is not large, the performance page on the slave remains good because the transaction on a thread does not affect each other as long as it is modified by different rows. Therefore, the concurrency mechanism based on writeset is executed in the maximum concurrency mode on the slave.


In some user scenarios, we may want to avoid concurrent execution in the same session. It may conflict with some business logic. Therefore, there is another mechanism based on writeset. Determines whether the transaction is a transaction of the same session. If it is for the same session, it will be executed in sequence.


The biggest benefit of writeset is:When slave needs to chase master data or create a new slave to synchronize the previous data, it can greatly improve the performance.


The following is a performance comparison test that does not use or use writeset.



In general, when the number of threads on the master node is not large, writeset will improve the performance.


06

A new feature supported by the JSON document


JSON is stored in MySQL in the form of block or text fields. It is usually large, but we often update only a small part of it during update. Previously, during the update process, the entire field is stored in binary log and then modified. In this way, the occupied space is relatively large. If the application fully uses JSON, it will cause a great waste of space. In 8.0, before adding the content to the binary log, the system will detect the number of updates to this field. In binary log, only a part is recorded.



It can be seen that the space savings are very obvious. The following is the performance test result of partial JSON: the performance varies according to the proportion of the JSON object modification.



07

Other new features


MySQL subsequent version plan


Final Goal



For more information, please follow the public account and reply to the keyword "DTC" to get the guest speech video.


With the development of technology, the value of data in enterprises has become increasingly prominent. The data technology carnival sponsored by ACoG and Yun and emo shares cutting-edge information and dry goods technologies around core technologies in the data and database fields, the road to enterprise transformation and strategic direction invite you to explore the value of data and create a better future together! The 8 th data technology carnival will kick off on March 13, November 16, 2018. It's a wonderful time to come!



Related reading:

From commercial to open-source: Best practices for migrating DB2 to MySQL

Analysis of Common Faults and structure optimization cases in traditional MySQL Replication

Copy From master to Group Replication

MySQL Group Replication Study Notes

In-depth analysis of Engine Features of the Group Replication Kernel

Download Resources

Public Account: Data and cloud (OraNews) reply keyword acquisition

'Dedicated DTC', Dig DTC Conference PPT

'Dbalife', "DBA's Day" Poster

'Dba04', DBA Manual 4, classic chapter ebook

'Internal', Oracle RAC PPT

'122arch', Oracle 12.2 architecture Diagram

'Drawing oow', Oracle OpenWorld documents

'Prection', Lecture hall Course Materials

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.