Mysql replication, mysql master-slave Replication

Source: Internet
Author: User

Mysql replication, mysql master-slave Replication

I. Significance of Replication

Mysql replication is the foundation for building large-scale and high-performance MySql-based applications. We can configure one or more slave databases for the server for data synchronization; the replication function is not only conducive to the construction of high-performance applications, but also the basis for high availability, scalability, disaster recovery, backup, and data warehousing.

 

Ii. replication method

Mysql supports statement-based replication, row-based replication, and hybrid replication. There are also three binlog formats: STATEMENT, ROW, and MIXED.

(1) Statement-based replication (SBR)

Each SQL statement that modifies data is recorded in the binlog. The advantage is that you do not need to record the changes of each SQL statement and data row, which reduces the binlog volume, saves IO, and improves performance. The disadvantage is that data inconsistency (such as RAND (), UUID (), stored procedure, trigger, etc.) in the master-slave may occur in some cases)

(2) Row-based replication (RBR)

Instead of recording the context information of each SQL statement, you need to record which data has been modified and changed to what type. In addition, in some specific situations, stored procedures and triggers cannot be correctly copied in statement-based replication mode. The disadvantage is that a large number of logs will be generated, especially when alter table is used, which will cause the log to soar and cannot accurately judge the SQL statements executed, in addition, the replication fails when the schema of the table is changed in the slave database, for example, the last column is not added or the column is deleted.

(3) hybrid replication (MRB)

For mixed use of the preceding two modes, MySQL selects the log storage method based on the executed SQL STATEMENT. For general replication, the STATEMENT mode is used to save the binlog, for operations that cannot be copied in STATEMENT mode, use ROW mode to save binlog.

 

Iii. Copy-Related Files

Mysql-bin.index: When binary logs are enabled on the server, a binary log with the same name but. index is a suffix file used to record binary log files on the disk. The "index" here does not refer to the index of the table, instead, each row of the file contains a binary file name. Mysql depends on this file. mysql cannot identify the binary file unless it is recorded in this file.

Mysql-relay-bin-index: The index file for relay logs works similarly to the mysql-bin.index

Master.info: this file is used to save the information required by the slave database to the master database. The format is saved text. The recorded information may vary with mysql versions. This file cannot be deleted, otherwise, the slave database cannot connect to the master database after being restarted. In addition, this file records the password of the copied user in the form of text, so pay attention to the permission control of this file.

Relay-log.info: This file contains the binary log and relay log coordinates of the current slave database (for example, the location where the slave database is copied in the master database), and does not need to delete this file, otherwise, after the slave database is restarted, it will not be able to know where to start the replication, which may cause replay of executed statements.

 

Iv. Principles of Replication

1. The master database records data changes in Binary logs (these records are called Binary Log events)

2. The slave database starts a working process called an I/O thread. A common client connection is established to the master database through the I/O thread. The slave database also starts an SQL thread.

3. Start a special binlog dump thread on the master database (this thread does not have the corresponding SQL command)

4. the binary dump thread on the master database reads the events in the binary log on the master database and sends them to the slave database through a socket connection, the I/O thread on the slave database records the received events to the relay log. The binary dump thread on the master database does not poll the events. If the process catches up with the master database, it enters the sleep state until the main database sends a semaphore to notify it of the occurrence of a new event.

5. the SQL thread of the slave database executes the last step. The thread reads the event from the relay log and runs it in the slave database to update the data of the slave database. When the SQL thread catches up with the I/O thread, the relay log is usually already in the system cache, so the overhead of the relay log is very low. The SQL thread execution event can also be configured to determine whether to write the event to the binary log of the slave database.

 

V. Replication scenarios

1. synchronous Replication

MySQL Cluster (NDB) adopts synchronous replication to ensure strong data consistency in the Cluster.
Its memory storage engine is based on the shared-nothing architecture. It has limited application scenarios and is rarely used by businesses.

2. asynchronous replication

Data is returned when it is written to the master database. logs are pulled from the database through the IO thread, and then asynchronously played back through the SQL thread.
Advantage: You can write data to the master database without the cost of data replication.
Disadvantage: Service Data Reading is inconsistent. When the master database crashes, the slave database data is inconsistent with the master database.
Application Scenario: businesses with low requirements for data read consistency

3. semi-synchronous Replication

1) Semi-sync

MySQL 5.5 introduces semi-synchronous replication (semisync) to ensure that at least one slave is consistent with that of the master.
Http://dev.mysql.com/doc/refman/5.6/en/replication-semisync.html
After the master writes the data, it sends the binlog to the slave. semi-synchronous replication does not require the slave to be executed. After the slave receives the log, it sends the ack. After the master receives the first ack, the transaction ends.

However, this method may cause master-slave data inconsistency: when the master innodb commit is successfully executed and the binlog is synchronized to the crash before the slave, data inconsistency will occur.

 

2) group semi-sync

Semi-synchronous replication is adopted, and the same data center first returns ack due to the high probability of a slave database. In this way, cross-Data Center disaster tolerance becomes an empty talk. As a result, the development of a group semi-sync (semisync + group slave), divided different data centers into different groups, each data center (group) has at least one slave database to return ack, the transaction is complete.

 

Advantage: semi-Sync and group semi-sync maximize Data Consistency

Disadvantages:
Introduce performance problems caused by synchronization (waiting for ack)
If a single data center fails, the transaction will be hang and need to be degraded to asynchronous
Data inconsistency still exists
The master database crashes between commit and binlog synchronization.
Semi-sync degrades to asynchronous after timeout (10000 ms by default)

3) lossless semi-sync

MySQL 5.5 and 5.6 semi-sync causes data inconsistency (commit first and then sync). MySQL 5.7.2 introduces lossless (loessness) Semi-sync, that is, the data is written to the slave relay log and then commit.
Https://dev.mysql.com/doc/refman/5.7/en/replication-semisync.html
Http://my-replication-life.blogspot.com/2013/09/loss-less-semi-synchronous-replication.html

 

 

6. parallel replication

1. Database-level parallel replication

Mysql5.6

Io_thread: pulls the binlog from the master database according to the binlog dump protocol, and transfers the binlog to the local relaylog;

Coordinator_thread: reads the relay log, distributes the read binlog events to each worker thread in transaction units for execution, and executes the binlog event as needed.

Worker_thread (SQL _thread): executes the binlog event allocated to each thread, which does not affect each other.

Multithreading principle:

SQL _thread distribution is based on the database name of the current transaction Lock operation. If the transaction is cross-sharded, the distribution will continue until all the allocated Database Change transactions are executed.

 

2. parallel replication based on GroupCommit

In MySQL, another parallel method is added to the implementation of parallel replication. That is, when the master database is in the second stage of orderd_commit, The binlogs of the same batch of commit are tagged with the same seqno, transactions with the same timestamp can be executed in the standby database at the same time. This simplifies the logic of parallel replication and breaks the limit that the same database of MySQL cannot be copied in parallel. During the execution of the slave database, transactions with the same seqno can be concurrently executed in the slave database, with mutual interference and no need to bind information, the last batch of seqno transactions can be executed only after the previous batch of seqno transactions have been executed.

Advantage: support for both the sulfate and RBR

Disadvantage: the larger the transaction, the more DML operations, the smaller the chance that the master database can submit at the same time, and the smaller the degree of parallelism that the slave database replies

In normal business pressure mode, there are not many transactions committed by the master database at the same time, and the time required for historical backup or data chasing after the schema change in the slave database cannot be significantly shortened.

 

3. Table-level parallel replication

Change the basic Db-based distribution of the original mysql5.6 to db_name + table_name. Different db_name + table_name can be distributed to different worker_thread for execution.

 

4. Row-level parallel replication

The binlog in Mysql Row format records all the field information of each Row. Therefore, you can retrieve the primary key or unique key of each Row and check the conflict through db_name + table_name + primary_key, non-conflicting transactions can be executed in parallel to achieve row-level parallel replication. Disadvantages: Only RBR is supported.

 

7. Other knowledge points

1. server id

Server_id is defined in my. in cnf, server_id = xxx. A unique server ID must be specified. The default server ID is usually 1 (this is related to the version, which is not allowed in some mysql versions ). Using the default value may cause a conflict with the ID of another server, so make sure it is unique and unchanged.

To prevent infinite loop replication during Mysql replication, when the SQL thread reads the relay log, it will lose the event with the same server ID as the server ID recorded in the event, this breaks the infinite loop in the replication process. It is important to break the infinite loop in some replication topology structures, such as the master-master replication structure.

2. redo log and bin log

Unlike oracle, the synchronization between the mysql master database and the standby database is implemented through binlog, while the redo log is only used as the crash recovery of the mysql instance.

Mysql gave up the redo synchronization policy at 4.x and introduced binlog synchronization. One important reason is to be compatible with other non-transaction storage engines. Otherwise, there is no way to perform master-slave synchronization.

Redo log synchronization is a physical synchronization method. It is simple and direct. It transfers the modified physical part to the slave database for execution. The master and slave databases share the same LSN, as long as the LSN is the same, at the same time, only the master or slave database can accept write requests. The binlog synchronization method is logical replication and can be divided into statement or row modes, where statement records SQL statements, the Row mode records the modified and modified records, that is, the pre-image and post-image. The slave database pulls the binlog through the binlog dump protocol and then runs it in the slave database. If the pulled binlog is an SQL statement, the Standby database follows the same logic as the primary database. If it is in row format, the storage engine is called to perform corresponding modifications.

 

 

PS: We recommend a good friend's public account, a girl who is thinking about or thinking about how to operate a public account every day ~

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.