MySQL master-slave replication principle and architecture, mysql master-slave architecture
1 copy Overview
The built-in replication function of Mysql is the basis for building large and high-performance applications. This distribution mechanism replicates data from a Mysql host to another Server Load balancer instance, and execute it again. During the replication process, one server acts as the master server, and one or more other servers act as slave servers. The master server writes a new binary log file and maintains an index of the file to track log loops. These logs can record updates sent to the slave server. When an slave server connects to the master server, it notifies the master server of the last successful update location read from the server in the log. The slave server receives any updates from that time, blocks them, and waits for the master server to notify you of new updates.
Note that when you perform a copy, all updates to the tables in the copy must be performed on the master server. Otherwise, you must be careful to avoid conflicts between updates to tables on the master server and those on the slave server.
1.1 mysql supports the following replication types:
(1) Statement-based replication: SQL statements executed on the master server and the same statements executed on the slave server. MySQL uses statement-based replication by default, which is more efficient. The row-based replication option is automatically selected once the exact replication fails.
(2): Row-based replication: Copies the changed content instead of executing the command on the slave server. It is supported from mysql5.0.
(3) hybrid replication: Statement-based replication is used by default. If statement-based replication fails, row-based replication is used.
1.2. Problems solved by Replication
MySQL replication technology has the following features:
(1) Data distribution)
(2) load balancing)
(3) backup (Backups)
(4) High availability and fault tolerance High availability and failover
1.3 How replication works
In general, there are three steps for replication:
(1) The master will record changes to binary logs (these records are called binary log events and binary log events );
(2) slave copies the binary log events of the master to its relay log );
(3) slave redo the events in the relay log, which will change and reflect its own data.
Describes the replication process:
The first part of this process is that the master records binary logs. Before each Transaction Completes data update, the master will record these changes in the second log. MySQL writes the transaction into the binary log serially, even if the statements in the transaction are cross-executed. After the event is written to the binary log, the master notifies the storage engine to submit the transaction.
The next step is to copy the binary log of the master to its own relay log. First, slave starts a working thread-I/O thread. The I/O thread opens a normal connection on the master and starts the binlog dump process. The Binlog dump process reads the event from the binary log of the master. If it has already followed the master, it will sleep and wait for the master to generate a new event. The I/O thread writes these events to the log.
The last step of SQL slave thread processing this process. The SQL thread reads the event from the relay log, replays the event, and updates the slave data so that it is consistent with the data in the master. As long as the thread is consistent with the I/O thread, the relay log is usually located in the OS cache, so the overhead of the relay log is very small.
In addition, there is also a working thread in the master: Like other MySQL connections, opening a connection in the master will also enable the master to start a thread. There is an important limitation in the replication process-replication is serialized on the slave, that is, the parallel update operations on the master cannot be performed in parallel on the slave.
2. Copy configurations
There are twoMySQL databaseServer Master, Server Load balancer, and Master are the Master servers. Server Load balancer is the slave server. In the initial state, the data information in the Master and Server Load balancer is the same. When the data in the Master node changes, slave also changes accordingly, so that the master and slave data information is synchronized to achieve the purpose of backup.
Key points:
The media responsible for transmitting various modification actions on the master and slave servers is the binary change log of the master server, which records the various modification actions to be transmitted to the slave server. Therefore, the master server must activate the binary log function. The slave server must have the permission to connect to the master server and request the master server to transmit the binary change log to it.
Environment:
The MySQL database versions of Master and slave are 5.0.18.
OS: unbuntu 11.10
IP Address: 10.100.0.100
2.1 create a replication account
1. Create a backup account in the Master database: each slave uses the standard MySQL user name and password to connect to the master. The replication slave permission is granted to the user who performs the REPLICATION operation. The username and password are stored in the master.info text file.
The command is as follows:
Mysql> grant replication slave, RELOAD, super on *.*
TO backup @ '10. 100.0.200'
Identified by '20140901 ';
Create an account backup and allow login only from 10.100.0.200. The password is 1234.
(If the new and old password algorithms of mysql are different, you can set: set password for 'backup '@ '10. 100.0.200' = old_password ('200 '))
2.2 copy data
(If you have completely installed the mysql Master/Slave server, this step is not required. Because the newly installed master and slave have the same data)
Shut down the Master server and copy the data in the Master to server B to synchronize the data in the Master and Server Load balancer. Make sure that before all the settings are completed,Do not perform write operations on the Master and slave servers, so that the data in the two databases must be the same!
2.3 configure master
Next, configure the master, including opening the binary log and specifying a unique servr ID. For example, add the following values to the configuration file:
Server-id = 1
Log-bin = mysql-bin
Server-id: ID value of master server
Log-bin: Binary Change Daily Value
Restart the master and run show master status. The output is as follows:
2.4. The configuration of slaveSlave is similar to that of the master. You also need to restart the MySQL of slave. As follows:
Log_bin = mysql-bin
Server_id = 2
Relay_log = mysql-relay-bin
Log_slave_updates = 1
Read_only = 1
Server_id is required and unique. Slave does not need to enable binary logs, but in some cases, it must be set. For example, if slave is another slave master, bin_log must be set. Here, we enable binary logs and display the name (the default name is hostname, but if the hostname is changed, the problem will occur ).
Relay_log configures the relay log. log_slave_updates indicates that slave writes the replication event into its own binary log (which will be useful later ).
Some people have enabled the slave binary log, but have not set log_slave_updates, and then check whether the slave Data has changed. This is an incorrect configuration. Therefore, use read_only whenever possible, which prevents data changes (except for special threads ). However, read_only is very useful, especially for applications that need to create tables on slave. 2.5 When slave is enabled, the slave is connected to the master and the events in the binary log of the master are redone. Instead of using the configuration file for this operation, you should use the change master to statement. This statement can completely replace the modification TO the configuration file, and it can specify different masters for slave, you do not need to stop the server. As follows:
Mysql> change master to MASTER_HOST = 'server1 ',
-> MASTER_USER = 'repl ',
-> MASTER_PASSWORD = 'p4ssword ',
-> MASTER_LOG_FILE = 'mysql-bin.000001 ',
-> MASTER_LOG_POS = 0;
The value of MASTER_LOG_POS is 0 because it is the start position of the log.
You can use the show slave status statement to check whether the slave settings are correct:
Mysql> show slave status \ G
* *************************** 1. row ***************************
Slave_IO_State:
Master_Host: server1
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 4
Relay_Log_File: mysql-relay-bin.000001
Relay_Log_Pos: 4
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: No
Slave_ SQL _Running: No
... Omitted...
Seconds_Behind_Master: NULL
Slave_IO_State, Slave_IO_Running, and Slave_ SQL _Running are No
It indicates that the replication process has not started. The log location is 4 rather than 0, because 0 is the start position of the log file, not the log location. In fact, MySQL knows that the first event is located at 4.
To start replication, you can run:
Mysql> start slave;
Run show slave status to view the output result:
Mysql> show slave status \ G
* *************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: server1
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 164
Relay_Log_File: mysql-relay-bin.000001
Relay_Log_Pos: 164
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_ SQL _Running: Yes
... Omitted...
Seconds_Behind_Master: 0
Here we mainly look:
Slave_IO_Running = Yes
Slave_ SQL _Running = Yes
Slave's I/O and SQL threads have started to run, and Seconds_Behind_Master is no longer NULL. The log location is increased, meaning that some events are obtained and executed. If you modify the log file on the master, you can view the changes in the location of various log files on the slave, and you can also see the changes in the database data.
You can view the status of the master and slave threads. On the master, you can see the connection created by the slave I/O thread:
Enter show processlist \ G on the master;
Mysql> show processlist \ G * *************************** 1. row *************************** Id: 1 User: root Host: localhost: 2096 Db: test Command: Query Time: 0 State: NULL Info: show processlist * *************************** 2. row *************************** Id: 2 User: repl Host: localhost: 2144 Db: NULL Command: Binlog Dump Time: 1838 State: Has sent all binlog to slave; waiting for binlog to be updated Info: NULL 2 rows in set (0.00 sec) |
Row 2 is the connection to the slave I/O thread.
Run this statement on the slave server:
Mysql> show processlist \ G * *************************** 1. row *************************** Id: 1 User: system user Host: Db: NULL Command: Connect Time: 2291 State: Waiting for master to send event Info: NULL * *************************** 2. row *************************** Id: 2 User: system user Host: Db: NULL Command: Connect Time: 1852 State: Has read all relay log; waiting for the slave I/O thread to update it Info: NULL * *************************** 3. row *************************** Id: 5 User: root Host: localhost: 2152 Db: test Command: Query Time: 0 State: NULL Info: show processlist 3 rows in set (0.00 sec) |
Row 1 is the I/O thread state, and row 2 is the SQL thread state.
2.5 Add a new slave server
If the master has been running for a long time, you want to synchronize the data of the newly installed slave, even if it does not have the data of the master.
There are several ways to start slave from another service, such as copying data from the master, cloning from the other slave, and starting from the latest backup. Three things are required for synchronization between Slave and master:
(1) Data snapshots at a certain time point of the master;
(2) The current log file of the master and the byte offset when the snapshot is generated. These two values can be called log file coordinates (log file coordinate), because they determine the location of a binary log, you can use the show master status Command to find the coordinates of the log file;
(3) binary log file of the master node.
You can use the following methods to create an slave:
(1) cold copy (cold copy)
Stop the master, copy the master files to slave, and restart the master. The disadvantage is obvious.
(2) hot copy (warm copy)
If you only use the MyISAM table, you can use mysqlhotcopy to copy the table, even if the server is running.
(3) Use mysqldump
Using mysqldump to get a data snapshot can be divided into the following steps:
<1> lock table: if you have not locked the table, you should lock the table to prevent other connections from modifying the database. Otherwise, the data you get may be inconsistent. As follows:
Mysql> flush tables with read lock;
<2> Use mysqldump to create a dump of the database you want to copy on another connection:
Shell> mysqldump -- all-databases -- lock-all-tables> dbdump. db
<3> release the lock on the table.
Mysql> unlock tables;
3. For more information about replication, we have discussed some basic information about replication. Next we will discuss about replication in depth. 3.1 Statement-Based Replication MySQL 5.0 and earlier versions only support Statement-Based replication (also called logical Replication and logical replication), which is not common in databases. The master records the query of changed data. Then, slave reads the event from the relay log and runs it. These SQL statements are the same as the statements executed by the master.
This method is easy to implement. In addition, statement-based replication of binary logs can be well compressed, and the log data volume is small, consuming less bandwidth-for example, A query that updates GB Data requires only dozens of bytes of binary logs. Mysqlbinlog is very convenient for statement-based log processing.
However, statement-based replication is not as simple as it looks, because some query statements depend on specific conditions of the master. For example, the master and slave may have different times. The format of MySQL binary logs is not only a query statement, but also metadata information, such as the current timestamp. Even so, some statements, such as the current user function, cannot be copied correctly. In addition, stored procedures and triggers are also a problem.
Another problem is that statement-based replication must be serialized. This requires a lot of special code, such as the next-key lock of InnoDB. Not all storage engines Support statement-based replication. 3.2. Record-Based Replication (Row-Based Replication) MySQL adds record-Based Replication to record actual data changes in binary logs, which is similar to other DBMS implementations. This method has advantages and disadvantages. The advantage is that it can work correctly on any statement, and some statements are more efficient. The main drawback is that binary logs may be large and not intuitive. Therefore, you cannot use mysqlbinlog to view binary logs.
For some statements, record-based replication can work more effectively, such:
Mysql> insert into summary_table (col1, col2, sum_col3)
-> SELECT col1, col2, sum (col3)
-> FROM enormous_table
-> Group by col1, col2;
Assume that there are only three unique combinations of col1 and col2. However, this query scans many rows in the original table, but returns only three records. In this case, record-based replication is more efficient.
On the other hand, the following statements make statement-based replication more effective:
Mysql> UPDATE enormous_table SET col1 = 0;
In this case, the cost of using record-based replication is very high. Because the two methods cannot be well handled in all cases, MySQL 5.1 supports dynamic exchange before statement-based replication and record-based replication. You can control it by setting the binlog_format session variable. 3.3 Apart from binary logs and relay log files, copy-related files also have other copy-related files. (1) Once binary logs are enabled on the mysql-bin.index server, a file with the same name as the binary log file but ended with. index is generated. It is used to track which binary log files exist on the disk. MySQL uses it to locate binary log files. Its content is as follows (on my machine ):
(2) mysql-relay-bin.index the functionality of this file is similar to that of the mysql-bin.index, but it is for relay logs, not binary logs. The content is as follows:
. \ Mysql-02-relay-bin.000017
. \ Mysql-02-relay-bin.000018 (3) master.info save information about the master. Do not delete it. Otherwise, you cannot connect to the master after the slave is restarted. The content is as follows (on my machine ):
The I/O thread updates the master.info file. The content is as follows (on my machine ):
. \ Mysql-02-relay-bin.000019 254 Mysql-01-bin.000010 286 0 52813 |
(4) The relay-log.info contains information about the current binary log and relay log in the slave. 3.4 send replication events to other slave when log_slave_updates is set, you can let slave Act as the master of other slave. In this case, slave writes the SQL thread-executed events to its own binary log. Then, its slave can get these events and execute them. As follows:
3.5 Replication filtering allows you to copy only a portion of data on the server. There are two Replication filtering methods: filter events in binary logs on the master; filter events in relay logs on slave. As follows:
4. The architecture of replication of common topology replication has the following basic principles:
(1) Each slave can have only one master;
(2) Each slave can have only one unique server ID;
(3) Each master can have many slave instances;
(4) If you set log_slave_updates, slave can be the master of other slave, thus spreading master updates.
MySQL does not support multi-master Replication-that is, one slave can have multiple masters. However, through some simple combinations, we can build a flexible and powerful replication architecture.
4.1 single master and multi-slave are the simplest scenarios where a replication system is composed of one master and one slave. Slave does not communicate with each other and can only communicate with the master.
In practical application scenarios, MySQL replication over 90% is an architecture mode where one Master node is replicated to one or more Slave instances. It is mainly used for database Expansion Solutions for applications with high read pressure. As long as the pressure on the Master and Slave is not too high (especially on the Slave end), the latency of asynchronous replication is usually very small. Especially since the replication method on the Slave end is changed to two threads for processing, the delay Problem on the Slave end is reduced. The benefit is that the real-time data requirements are not especially Critical. You only need to use a cheap pcserver to expand the number of Slave and distribute the read pressure to multiple Slave machines, the read performance bottleneck of the database can be solved by dispersing the read pressure of a single database server. After all, the read Pressure in most database application systems is much higher than the write pressure. This solves the database pressure Bottlenecks of many small and medium-sized websites to a large extent, and even some large websites are using similar solutions to solve database bottlenecks.
As follows:
This structure can be used if the number of write operations is small and the number of read operations is very high. You can distribute read operations to other slave instances to reduce the pressure on the master node. However, when slave increases to a certain number, slave's load on the master and network bandwidth will become a serious problem.
This structure is simple, but flexible enough to meet the needs of most applications. Some suggestions:
(1) Different slave plays different roles (for example, using different indexes or different storage engines );
(2) Use a Server Load balancer instance as the slave master and only copy the slave database;
(3) Use a remote slave for disaster recovery;
It should be clear that multiple Slave nodes can be copied from a Master node. Some may wonder if the Slave node can be copied from multiple Master nodes? At least for the time being, MySQL cannot be achieved. It is unclear whether MySQL will support MySQL in the future.
MySQL does not support the replication architecture of a server Load balancer node from multiple Master nodes, mainly to avoid conflicts and prevent data conflicts between multiple data sources, the inconsistency of the final data. However, I have heard that some related patches have been developed to allow MySQL to support the replication of a Slave node from multiple Master nodes as data sources. This is also the benefit of the open source nature of MySQL.
4.2 Active Mode Master-Master (Master-Master in Active-Active Mode) Master-Master copies two servers, both master and slave of another server. In this way, any change made by either party will be copied and applied to the database of the other party.
Some readers may have a worry. After the replication environment is set up, won't circular replication between two MySQL instances? In fact, MySQL has long thought of this, so the current MySQL server-id is recorded in MySQL BinaryLog, and this parameter must be explicitly specified when we set up MySQLReplication, in addition, the server-id parameter values of Master and Slave must be different in order to successfully build MySQL replication. Once the value of server-id is available, MySQL can easily determine which MySQLServer was originally generated for a change, so it is easy to avoid loop replication. Moreover, if we do not enable the BinaryLog option (-- log-Slave-update) for recording slave, MySQL will not record the changes in the replication process to BinaryLog, you don't have to worry about the possibility of loop replication.
Active Master-Master replication has some special functions. For example, both geographically distributed parts require their own writable data copies. The biggest problem with this structure is the update conflict. Assume that a table has only one row (one column) and its value is 1. If the two servers execute the following statement simultaneously:
Run the following command on the first Server:
Mysql> UPDATE tbl SET col = col + 1;
Run the following command on the second server:
Mysql> UPDATE tbl SET col = col * 2;
So what is the result? One server is 4 and the other server is 3, but this does not produce errors.
In fact, MySQL does not support multi-master Replication supported by other DBMS ), this is a huge limitation of the MySQL replication function (the difficulty of Multi-master servers lies in resolving update conflicts). However, if you have such requirements, you can use MySQL Cluster, by combining Cluster and Replication, you can build a powerful and high-performance database platform. However, you can simulate the replication of multiple master servers in other ways.
4.3 Active-Passive Mode Master-Master (Master-Master in Active-Passive Mode) This is the master-master structure changes, it avoids the disadvantages of the M-M, in fact, this is a fault tolerance and high availability system. The difference is that one service can only perform read-only operations.
4.4 cascade replication architecture Master-Slaves
In some application scenarios, the read/write pressure may be significantly different, And the read pressure may be extremely high. A Master may need 10 or more server load balancers to support the read injection pressure. At this time, the Master will be more difficult, because there are more SlaveIO threads connected only, so when the write pressure is a little higher, the Master node consumes a lot of resources because of replication, which can easily lead to replication latency.
How can this problem be solved? At this time, we can use MySQL to record the changed BinaryLog information generated by replication on the Slave end, that is, enable the-log-slave-update option. Second-level (or more) replication reduces the pressure on the Master node due to replication. That is to say, we first use a few MySQL instances to replicate from the Master. We call these machines the first-level Slave cluster, then, other Slave instances are copied from the first-level Slave cluster. The server Load balancer cluster that is replicated from the first-level Server Load balancer. If necessary, we can continue to add more levels of replication. In this way, we can easily control the number of Slave attached to each MySQL. This architecture is called the Master-Slaves architecture.
This multi-layer cascade replication architecture easily solves the risk that the Master node becomes a bottleneck because there are too many Slave instances. Demonstrate the Replication architecture of multi-layer cascade Replication.
Of course, if conditions permit, I would suggest you split them into multiple Replication clusters.
The above bottlenecks. After all, Slave does not reduce the write volume, and all Slave actually applies all data change operations without any write IO reduction. On the contrary, the more Slave, the more I/O writes to the entire cluster. We do not have a very obvious feeling. It is just because it is scattered across multiple machines, so it is not very easy to express.
In addition, the cascade level of replication is increased, and the same change is required to pass to the bottom-layer Slave, which may lead to a longer delay.
If we solve this problem by splitting the cluster, it may be much better. Of course, splitting the Cluster also requires more complex technologies and more complex application system architecture.
4.5 The Master-Master with Slaves structure with slave servers has the advantage of providing redundancy. Geographically distributed replication structure, which does not have a single node failure problem, and can also put read-intensive requests on slave.
To a certain extent, cascade replication solves the bottleneck caused by the excessive number of Slave instances attached to the Master, however, it does not solve the problem that manual maintenance and exceptions may exist after switchover. This naturally extends the Replication architecture that combines DualMaster with cascade Replication. I call it the Master-Slaves architecture.
Compared with the Master-Slaves architecture, the difference is only that the first-level Slave cluster is replaced with a separate Master, as a backup Master, then copy the Slave to a Slave cluster.
This architecture combines DualMaster with cascade Replication,The biggest benefit is that the write operation of the Master cannot be affected by the replication of the Slave cluster, at the same time, when the Master needs to be switched, Replication will basically not be redeployed. However, this architecture also has a drawback, that is, the Standby Master may become a bottleneck, because if the subsequent Slave cluster is large, the standby Master may become a bottleneck due to too many SlaveIO thread requests.Of course, when the standby Master does not provide any read service, the bottleneck may not be very high. If a bottleneck occurs, you can perform cascade replication again after the standby Master, build a multi-layer Slave cluster. Of course, the more levels of cascade replication, the more obvious the data delay that may occur in the Slave cluster. Therefore, before using cascade replication, you also need to evaluate the impact of data delay on the application system.