MySQL replication principle and actual operation process

Source: Internet
Author: User

The following articles mainly describe the MySQL replication principle. As well as the introduction of the actual operation process of MySQL replication, as well as the advantages and disadvantages of the MySQL replication implementation level. The following is a detailed description of the article, I hope you will gain some benefits.

1. Copy a process

Mysql replication) is an asynchronous replication, from a Mysql instace called the Master) to another Mysql instance called the Slave ). The entire replication operation is completed by three processes, two of which are in the SlaveSql process and IO process), and the other is in the MasterIO process.

To implement replication, you must first enable the binary logbin-log function on the Master side. Otherwise, it cannot be implemented. Because the whole MySQL replication process is actually the process where Slave obtains the log from the Master end and then executes the operations recorded in the log in full order on itself.

The basic process of replication is as follows:

1) log content after the IO process on the Slave connects to the Master and requests the log content from the specified location of the specified log file or from the first log;

2) After the Master receives a request from the Slave IO process, it reads the log information after the specified log Location Based on the Request Information by the IO process responsible for replication, the IO process returned to Slave. Besides the information contained in the log, the returned information also includes the name of the bin-log file on the Master end and the location of the bin-log;

3) after the Slave IO process receives the information, it will add the received log content to the end of the relay-log file on the Slave end in sequence, and record the file name and location of the bin-log on the Master end to the master-info file, so that the next read can clearly show the High-Speed Master "I need to send the log content from the location of a bin-log to me ";

4) after the Slave SQL process detects the newly added content in relay-log, it will immediately parse the relay-log content into the executable content during actual execution on the Master side, and execute it on your own.

In fact, the replication Implementation of Mysql in the old version is not completed by two processes on the Slave end, but by one process. However, it was found that there were major risks and performance problems, mainly as follows:

First, a process copies the bin-log, parses the log, and runs it as a serial process. The performance is limited, the latency of asynchronous replication is also relatively long.

In addition, after the Slave side obtains the bin-log from the Master side, it needs to parse the log Content and then execute it on its own. In this process, the Master may produce a large number of changes and claim a large number of logs. If an irreparable error occurs in the Master storage at this stage, all changes made at this stage will never be recovered. If the load on the Slave side is relatively high, the process may take a long time.

To solve this risk and improve the replication performance, Mysql later versions will replace the replication on the Slave end with two processes. Yahoo! Jeremy Zawodny, an engineering engineer ". This not only solves the performance problem, but also shortens the asynchronous delay time and reduces the possible data loss.

Of course, even after these two threads are processed, there is still the possibility of slave Data latency and data loss. After all, this replication is asynchronous. As long as data changes are not in one thing, these problems will exist. To completely avoid these problems, you can only use the mysql cluster to solve them.

However, the mysql cluster is a memory database solution. You need to load all the data to the memory, which requires a lot of memory, it is not very practical for general applications.

2. Replication implementation level

Mysql replication can be based on a Statement level) or a Row level record. You can set this Mysql replication level in MySQL configuration parameters, the setting of different replication levels will affect the bin-log records on the Master end to different forms.

Row Level: the log records the modified form of each Row of data, and then modifies the same data on the slave end.

Advantage: in row level mode, bin-log does not record the context-related information of the executed SQL statement, but only needs to record the record that has been modified, to what it is. Therefore, the row-level log records the details of each row of data modification, which is easy to understand. In addition, stored procedures, functions, and trigger calls and triggers cannot be correctly copied in certain situations.

Disadvantage: When all statements executed at row level are recorded in logs, they are all recorded with modifications recorded in each line. This may produce a large amount of log Content, for example, there is an update sentence:

Update product set owner_member_id = 'B' where owner_member_id = 'A'. After execution, what is recorded in the log is not the bin-log recorded by mysql in the form of an event corresponding to this update statement), but the changes of each record updated by this statement, in this way, many events are recorded and updated.

Naturally, the amount of bin-log logs will be large. Especially when statements such as alter table are executed, the log volume generated is astonishing. Because Mysql treats alter table and other table structure change statements in a way that each record of the entire table needs to be changed. In fact, the whole table is rebuilt. Each record of the table is recorded in the log.

Statement Level: Each SQL Statement that modifies data is recorded in the bin-log of the master. When the Server Load balancer replicates, the SQL process parses the same SQL statement that was previously executed on the master to execute it again.

Advantages: statement level first solves the disadvantages of row level. It does not need to record the changes of each row of data, reduces the bin-log volume, saves IO, and improves performance. Because it only needs to record the details of the statements executed on the Master and the context information during statement execution.

Disadvantage: because it is a recorded execution statement, in order to make these statements run correctly on the slave end, it must also record some information about each statement during execution, that is, the context message to ensure that all statements can get the same results as those executed on the master node when the slave terminal cup is executed.

In addition, because Mysql is developing fast now and many new features are continuously added, mysql replication has encountered great challenges. The more complex the content involved during natural replication, the more common the bug is. In statement level, many cases have been found to cause mysql replication problems, mainly when some specific functions or functions are used for data modification,

For example, the sleep () function cannot be actually copied in some versions, and the last_insert_id () function is used in the stored procedure, which may lead to an inconsistent id between the slave and the master. Since the row level records changes based on each row, no similar problems will occur.

From the official documents, we can see that Mysql has only been in statement-based replication mode until Mysql 5.1.5 starts to support row-level replication. Since Mysql 5.0, Mysql replication has solved the problem that many old versions cannot be copied correctly. However, the emergence of stored procedures poses a new challenge to Mysql replication.

In addition, according to the official documentation, Mysql has provided the third replication mode except Statement Level and Row Level since version 5.1.8: Mixed, which is actually the combination of the first two modes. In Mixed mode, Mysql will differentiate the log format to be logged based on each specific SQL Statement executed, that is, select one between Statement and Row. The Statment level in the new version is the same as before, and only the statements executed are recorded.

The row level mode of the new Mysql squadron has also been optimized. Not all modifications are recorded by row level. For example, when the table structure is changed, it will be recorded in statement mode, if the SQL statement is indeed a statement that modifies data such as update or delete, changes to all rows will be recorded.

3. Copy common architecture

More than 90% of the Mysql replication environment is an architecture model with one or more Server Load balancer instances on the Master node. It is mainly used for database Expansion Solutions for applications with high read pressure. Because only the pressure on the master and slave is not too high, especially the load on the slave end), the latency of asynchronous replication is usually very small. Especially after the replication mode on the slave side is changed to two processes for processing, the latency on the slave side is reduced.

The benefit is that for applications that do not require special sensitivity to real-time data, you only need to use cheap pc servers to expand the number of slave instances, distribute the read pressure to multiple slave machines to solve the read Pressure bottleneck on the database. This solves the database pressure Bottlenecks of many small and medium-sized websites to a large extent, and even some large websites are using similar solutions to solve database bottlenecks.

The architecture of one Master node with multiple slave instances is very simple. The implementation of multiple slave instances is not much different from that of a single slave. On the Master side, the Server Load balancer does not care how many Server Load balancer instances are connected to the master. If a Server Load balancer process passes connection authentication, it requests binlog information from the Master, it will read its binlog information according to the requirements of the connected io process and return it to the slave IO process. For slave configuration details, I have already made it clear in the Mysql official documentation, and even introduced a variety of slave configuration methods.

Mysql does not support the architecture of one Slave instance that belongs to multiple masters. That is to say, a server Load balancer instance can only accept the synchronization source of one master. I have heard that patch can improve this function, but it has never been practiced. The reason why Mysql AB does not implement such a function is mainly to solve the conflict.

Mysql can also be built into dual master mode, that is to say, the two Mysql instances are the Master of each other and the Slave of the other. However, this architecture usually only provides services at one end to avoid conflicts. Even if the modifications executed on both sides are sequential, the asynchronous implementation mechanism of MySQL replication may also overwrite the modifications made later, it is like the following situation:

Time Point Mysql A Mysql B

1. Update the y record of Table x to 10.

2. Update the y record of Table x to 20.

3. obtain and apply the log. Updating the y record of Table x to 10 does not meet the expectation)

4. Obtain B logs, update table y, and set the record to 20 to meet the expectation)

In this way, the data in database B is not the expected result, and the data on both sides of database A and database B is inconsistent. The above problem can be avoided unless the write operation is fixed at both ends of A and B according to certain conditions, so as to avoid cross-writing.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.