Last week we hit a problem in our production environment. Rather than keep it to myself, I'd like to share a quick write-up with everyone.
Cause: a power failure in the IDC machine room (presumably an excavator hit a cable somewhere) caused all the servers to restart, affecting the MySQL databases. Here is the problem the database ran into at the time:
Database version: MySQL 5.7.10
Symptom: the slave reported the following replication error:
Slave SQL for channel '': Slave failed to initialize relay log info structure from the repository, Error_code: 1872
With our standard MySQL configuration template, how could replication not be crash safe? The root cause is multi-threaded replication (MTS). And it is not just MySQL 5.7 — MySQL 5.6 can run into this problem as well.
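For reference, the crash-safe replication settings commonly recommended for MySQL 5.7 look roughly like the sketch below. This is an illustrative fragment, not the exact template mentioned in the post; tune worker count and log names to your environment.

```ini
[mysqld]
# Store replication metadata in transactional InnoDB tables
master_info_repository    = TABLE
relay_log_info_repository = TABLE
# Discard local relay logs and re-fetch from the master after a crash
relay_log_recovery = ON
# Multi-threaded replication (MTS)
slave_parallel_type    = LOGICAL_CLOCK
slave_parallel_workers = 8
# Make workers commit in master order (requires binary logging on the slave)
slave_preserve_commit_order = 1
log_bin           = mysql-bin
log_slave_updates = ON
```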
In an MTS scenario, the following two problems can occur:
Gap transactions: a later transaction is applied (played back) before an earlier one
Inaccurate Exec_master_log_pos: a transaction may already be committed while the position has not yet been updated (single-threaded replication does not have this problem)
Gap transactions are easy to understand: whether MTS schedules by database or by LOGICAL_CLOCK, the following scenario can occur:
Because of MTS, a later transaction may be applied before an earlier one. In the figure above, tx2 and tx4 may already be committed while tx1 and tx3 are not; these are called gap transactions. In LOGICAL_CLOCK-based MTS, the user can set the parameter slave_preserve_commit_order=1 to guarantee that transactions commit in order.
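As a sketch, enabling ordered commits on a running replica might look like the following (this assumes LOGICAL_CLOCK parallel replication with binary logging already enabled on the slave, which slave_preserve_commit_order requires, and that the SQL thread can be briefly stopped):

```sql
-- slave_preserve_commit_order is dynamic, but the SQL thread must be stopped
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_preserve_commit_order = ON;
START SLAVE SQL_THREAD;
```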
Exec_master_log_pos, on the other hand, can be inaccurate: when a crash occurs, the replication position info may still record where transaction tx1 begins (see the right-hand part of the figure above). Note that even with slave_preserve_commit_order set to 1, MTS still cannot guarantee that Exec_master_log_pos is accurate; this position is called the gap-free low-watermark. The reason is that in the MTS scenario the update of the table mysql.slave_relay_log_info is not transactional with the workers' commits (this point deserves careful thought).
However, MTS introduced a new transactional table, mysql.slave_worker_info, which records how far each worker thread had applied when the outage occurred; its update is committed in the same transaction as the worker's apply. On restart, MySQL can compare Exec_master_log_pos against the Master_log_pos column of slave_worker_info to decide whether a given transaction still needs to be replayed.
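To see the two sides of this comparison yourself, you can inspect the metadata tables directly (read-only queries; they assume master_info_repository and relay_log_info_repository are set to TABLE so these tables are populated):

```sql
-- Coordinator's view: the (possibly stale) low-watermark position
SELECT Master_log_name, Master_log_pos
FROM mysql.slave_relay_log_info;

-- Per-worker view: how far each worker thread actually applied
SELECT Id AS worker_id, Master_log_name, Master_log_pos
FROM mysql.slave_worker_info;
```

After a crash under MTS, the per-worker positions can be ahead of the coordinator's position, which is exactly the gap the recovery logic has to reconcile.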
Prior to MySQL 5.7.13, the following commands had to be run manually after such an outage; running CHANGE MASTER TO directly instead could trigger error 1872:
START SLAVE UNTIL SQL_AFTER_MTS_GAPS;
START SLAVE SQL_THREAD;
Our problem arose because the server was running MySQL 5.7.10 and the DBA tried to repair replication directly with CHANGE MASTER TO. Since MySQL 5.7.13, this situation is repaired automatically: even after a crash, replication can be restored to a correct running state accurately and without manual intervention.
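After any such recovery, a quick sanity check (not from the original post, just standard practice) is to confirm both replication threads are running and the executed position is advancing:

```sql
SHOW SLAVE STATUS\G
-- Verify: Slave_IO_Running: Yes, Slave_SQL_Running: Yes,
-- Last_SQL_Errno: 0, and Exec_Master_Log_Pos advancing over time.
```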
However, while upgrading to MySQL 5.7.15 we ran into another modest pitfall — that story will be saved for a future share.