MySQL multi-threaded replication encounters Error_code: 1872 solution, mysqlerror_code
I encountered a problem in the production environment last week and did not dare to share it with my friends.
Cause: due to a power failure in the IDC room (it is estimated that the excavator has touched it), all servers are restarted, affecting the MySQL database. Let's take a look at the database problems:
Database Version: MySQL 5.7.10
Problem Performance
: The following error is reported during slave replication:Slave SQL for channel ”: Slave failed to initialize relay log info structure from the repository, Error_code: 1872
Why didn't we implement crash safe with the standard MySQL configuration file template of Inside Jun? In fact, this is mainly caused by multi-threaded replication (MTS. I do not know MySQL 5.7, even MySQL 5.6 will also encounter problems.
In the MTS scenario, the following two problems may occur:
Gap transaction: the transaction to be executed is played back (apply) first.
Exec_Master_Log_Pos location is incorrect: the transaction may have been committed but the location has not been updated (this problem does not exist for single-thread replication)
Gap transactions are easy to understand, because both database-level and logical_clock-based MTS may have the following scenarios:
Due to MTS, subsequent transactions may be executed earlier than the previous transactions. For example, transactions tx2 and tx4 may have been committed, but transactions tx1 and tx3 have not yet been committed. In this case, there is a gap transaction. In the MTS scenario based on logical_clock, you can configure Parametersslave_preserve_commit_order=1To ensure the order of submission.
On the other hand, Exec_Master_Log_Pos is not accurate at this time. When a crash occurs, the master info still records the starting position of the tx1 transaction (see the part on the right ). Remember, even if the slave_preserve_commit_order parameter is set to 1, the accuracy of Exec_Master_Log_Pos cannot be ensured in MTS scenarios, which is called gap-free low-watermark. In the MTS scenario, updating the table slave_realy_info_log is not a transaction (this requires a good understanding ).
However, in the MTS scenario, a new transaction table slave_worker_info is introduced to indicate the location where each thread is updated in the event of a crash. Its playback with the Worker thread is a transaction. Therefore, when restoring MySQL, you can use Exec_Master_Log_Pos to compare it with the Master_log_pos column in The slave_worker_info table to determine whether to play back the current transaction.
Before MySQL 5.7.13, you must manually perform the following operations when the server goes down. If you directly perform the change master to operation, the above 1872 error may be triggered:
START SLAVE UNTIL SQL_AFTER_MTS_GAPS;START SLAVE SQL_THREAD;
Because the MySQL version on the server is 5.7.10, the DBA tries TO solve the replication problem by running the command change master. In MySQL 5.7.13, MySQL automatically fixes the above problems. Simply put, even if a server goes down, the replication operation status can be restored accurately and automatically.
However, when Inside was upgraded to MySQL 5.7.15, it encountered a big pitfall. Let's keep it for the next time.