* MHA failover (master down): the complete switching process
----------------------------------------------------------------------------------------------------
-Monitor the master's status and confirm whether it has crashed.
-Once the crash is confirmed, save the master's binlog and copy it to the manager; the master can be forcibly shut down to avoid split-brain.
-Find the latest slave (the one with the largest read_master_log_pos) and determine the new master.
-Generate the differential relay log from the latest slave, add the binlog events that no slave has read yet, apply both to the new master, and note the offset.
-(concurrent) For every other slave, generate its differential relay log and binlog and apply them to that slave.
-Start replication on each slave from the new master at the recorded offset.
Key logic in the source code
--------------------------------------------------------------------------------------------------------
* * Read configuration
* * Check configuration
-Check the version number of apply_diff_relay_logs
-Connect to all servers and read their status (to learn which one is the old master)
-Check that the crashed master passed in as a parameter matches the address of the old master; otherwise terminate the switchover
-Check that the old master is in the offline host list; otherwise stop the switchover
-Check that the MySQL service really cannot be reached
-Check that all online slaves point to the old master
-Check whether any slave whose failure must not be ignored is offline
-Check whether the last switchover failed
-Check the time interval between the last switchover and the current one; if it is too short, terminate
-Acquire the "switchover lock" on all slaves
-Ensure that the slave SQL thread has been started on every slave
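
Several of the checks above amount to simple comparisons. The following is a minimal Python sketch, not MHA's own code (MHA is written in Perl). The function name `preflight_errors`, its arguments, and the 8-hour default interval are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def preflight_errors(dead_master, old_master, offline_hosts, slave_masters,
                     last_failover, now, min_interval=timedelta(hours=8)):
    """Return a list of reasons to abort; an empty list means go ahead.

    slave_masters maps each online slave to the master it replicates from.
    last_failover is None when no earlier failover is recorded.
    The 8-hour default is an assumption for this sketch.
    """
    errors = []
    if dead_master != old_master:
        errors.append("dead master given as a parameter is not the old master")
    if old_master not in offline_hosts:
        errors.append("old master is not in the offline host list")
    strays = sorted(s for s, m in slave_masters.items() if m != old_master)
    if strays:
        errors.append("slaves not pointing at the old master: " + ", ".join(strays))
    if last_failover is not None and now - last_failover < min_interval:
        errors.append("last failover was too recent")
    return errors


now = datetime(2024, 1, 1, 12, 0)
print(preflight_errors("m1", "m1", {"m1"}, {"s1": "m1", "s2": "m1"},
                       now - timedelta(hours=1), now))
```

Collecting all reasons instead of stopping at the first one makes the abort message more useful to whoever reads the manager log.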
* * If GTID auto-positioning is supported but not enabled, should apply_diff_relay_logs be forced to disable log_bin??
* * Forced shutdown
-(concurrent) Force-stop the slave IO thread on every slave
-Check SSH reachability from the manager to the host of the crashed master
-Run master_ip_failover_script to make sure the IP of the crashed master's host is deactivated, preventing split-brain; otherwise terminate the switchover
-If stopping the slave IO thread fails on any online slave, terminate the switchover
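
The concurrent stop, with the rule that a single failure aborts the switchover, could look roughly like this. It is a Python sketch under assumptions: `stop_io_threads` is a made-up name, and `stop_io_thread` is a hypothetical caller-supplied function that would issue `STOP SLAVE IO_THREAD` on one slave.

```python
from concurrent.futures import ThreadPoolExecutor


def stop_io_threads(slaves, stop_io_thread):
    """Run stop_io_thread(slave) on every slave in parallel.

    stop_io_thread is a caller-supplied (hypothetical) function that issues
    STOP SLAVE IO_THREAD on one slave and returns True on success.
    Returns the list of slaves on which the stop failed.
    """
    def attempt(slave):
        try:
            return bool(stop_io_thread(slave))
        except Exception:
            # Any error while stopping counts as a failure for that slave.
            return False

    with ThreadPoolExecutor(max_workers=max(1, len(slaves))) as pool:
        results = list(pool.map(attempt, slaves))
    return [s for s, ok in zip(slaves, results) if not ok]


failed = stop_io_threads(["s1", "s2", "s3"], lambda host: host != "s2")
if failed:
    print("abort switchover, IO thread still running on:", failed)
```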
* * Identify the slave with the smallest replication lag (the latest slave) and the slave with the largest replication lag (the oldest slave)
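
As an illustration of this step, the latest and oldest slaves can be found by comparing master_log_file:read_master_log_pos pairs. This Python sketch assumes binlog names end in a numeric sequence (for example `mysql-bin.000010`); the dict layout and function names are hypothetical.

```python
def read_position(slave):
    """Sort key: (binlog sequence number, byte offset) read by the IO thread."""
    seq = int(slave["master_log_file"].rsplit(".", 1)[1])
    return (seq, slave["read_master_log_pos"])


def latest_and_oldest(slaves):
    """Return (latest slave, oldest slave) by IO-thread read position."""
    return max(slaves, key=read_position), min(slaves, key=read_position)


slaves = [
    {"host": "s1", "master_log_file": "mysql-bin.000009", "read_master_log_pos": 500},
    {"host": "s2", "master_log_file": "mysql-bin.000010", "read_master_log_pos": 120},
    {"host": "s3", "master_log_file": "mysql-bin.000010", "read_master_log_pos": 90},
]
latest, oldest = latest_and_oldest(slaves)
print(latest["host"], oldest["host"])
```

Parsing the sequence number avoids relying on string comparison of file names, which only works when every name is zero-padded to the same width.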
* * Save the binlog of the old master, starting from the read position of the latest slave's IO thread.
-If the host of the crashed master cannot be reached over SSH, the binlog from read_master_log_pos to the tail is lost
-If SSH can connect, run save_binary_logs --command=save and copy the saved binlog to the manager; this piece is called read_to_tail.
* * Based on the read positions of the latest and oldest slaves, and on which slave failures can be ignored, decide which slave serves as the baseline for relay log and binlog compensation
-If all slaves have the same read position, skip this step
-SSH to the latest slave and run apply_diff_relay_logs --command=find to check whether its relay log contains the read position of the oldest slave.
-If no slave can serve as the baseline for compensation, terminate the switchover
* * Select the new master (the new master is not necessarily the latest slave; see the description under the online switchover)
* * Recover the new master
-If the new master's read position is behind the latest slave, SSH to the latest slave and run apply_diff_relay_logs --command=generate_and_send,
which extracts from the latest slave's relay log the events between the new master's read position and the latest slave's read position. This piece is called read_to_latest and ends at
$latest_slave->{master_log_file}:$latest_slave->{read_master_log_pos}
-Copy read_to_tail (the log from the latest slave's read position to the tail of the old master's binlog) to the new master.
-If the new master is not the latest slave, or read_to_tail was saved, apply the diff logs.
--First wait for the relay log already on the new master to be replayed, then stop the slave SQL thread
--Read the latest replication status
--Over SSH, run save_binary_logs --command=save to recover exec_to_read from the new master's own relay log
--Over SSH, run apply_diff_relay_logs --command=apply to import all three compensation logs generated earlier (exec_to_read, read_to_latest, read_to_tail).
-Run the master_ip_failover --command=start script on the manager host to activate the IP of the new master.
-Turn off read-only on the new master so that it accepts writes.
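
To make the three compensation logs concrete, here is a small Python sketch that decides which of them a server needs and in which order they are applied. It is an illustration only: positions are simplified to (binlog sequence number, offset) tuples in the old master's coordinates, and the function name and arguments are assumptions.

```python
def compensation_plan(exec_pos, read_pos, latest_read_pos, tail_saved):
    """List the diff logs a server needs, in the order they are applied.

    exec_pos:        position already executed by the SQL thread
    read_pos:        position already read by the IO thread
    latest_read_pos: read position of the latest slave
    tail_saved:      True if read_to_tail was saved from the old master
    """
    plan = []
    if exec_pos < read_pos:
        plan.append("exec_to_read")    # from the server's own relay log
    if read_pos < latest_read_pos:
        plan.append("read_to_latest")  # from the latest slave's relay log
    if tail_saved:
        plan.append("read_to_tail")    # from the old master's saved binlog
    return plan


print(compensation_plan((10, 100), (10, 200), (10, 300), True))
print(compensation_plan((10, 300), (10, 300), (10, 300), False))
```

The order matters: each log starts where the previous one ends, so applying them out of sequence would skip or repeat events.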
* * Recover all slaves (similar to recovering the new master on its own)
-(concurrent) Relay log compensation: generate read_to_latest
-(concurrent) Copy the previously generated read_to_tail to each slave, apply the diff logs, point the slave at the new master, and start replication
-Run RESET SLAVE on the new master
* MHA online master switching process
--------------------------------------------------------------------------------------------------------------------------------------
sudo /usr/bin/masterha_master_switch --master_state=alive --conf=/etc/masterha/app1.cnf --new_master_host=192.168.128.130 --new_master_port=3309 --orig_master_is_new_slave
* * Identify the old master.
-Read the MHA configuration file;
-Connect to all database servers and read their status;
--(concurrent) Connect to every slave and check that the MySQL service is running; if a machine is down, stop the switchover.
--Walk through each slave and collect everything available, such as: MySQL server version, whether GTID is enabled, whether log-bin is enabled,
whether it is read-only, replication-related system variables, and status variables.
--Tally the server information: offline servers, online servers, online slaves, failed slaves, etc.
--Compare the MySQL versions of all slaves to find the oldest and newest versions.
-Verify which server is currently the real master.
--Count the online servers tagged "not_slave"; there must be exactly one, otherwise terminate the switchover.
--From where the slaves point, work out which masters exist (three-tier replication, master-slave-slave, is supported).
The real master must be online and writable; if no master is writable, or two are, terminate the switchover.
-Determine whether the switchover supports GTID.
-Check that every online slave has a replication account with the REPLICATION SLAVE privilege;
-Run FLUSH TABLES on the old master if necessary;
-Acquire the "monitoring lock" on the old master;
-Acquire the "switchover lock" on all slaves;
-Check the replication health of all online slaves;
--Read the current replication status;
--Determine whether there is a problem (are the IO and SQL threads running, how far behind is the data)
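
The "exactly one real master, and it must be writable" rule can be sketched in a few lines of Python. The dict keys (`host`, `online`, `not_slave`, `read_only`) and the function name are assumptions for illustration; the sketch also ignores the three-tier case, where an intermediate master is itself a slave.

```python
def find_current_master(servers):
    """Return the host of the single online, writable, non-slave server.

    Raises RuntimeError when the topology is ambiguous or not writable,
    which corresponds to terminating the switchover.
    """
    candidates = [s for s in servers if s["online"] and s["not_slave"]]
    if len(candidates) != 1:
        raise RuntimeError(
            f"expected exactly one non-slave server, found {len(candidates)}")
    master = candidates[0]
    if master["read_only"]:
        raise RuntimeError(f"{master['host']} is not writable")
    return master["host"]


servers = [
    {"host": "m1", "online": True, "not_slave": True, "read_only": False},
    {"host": "s1", "online": True, "not_slave": False, "read_only": True},
    {"host": "s2", "online": True, "not_slave": False, "read_only": True},
]
print(find_current_master(servers))
```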
* * Identify the new master.
-Identify the latest slave;
--Compare master_log_file:read_master_log_pos.
-Select the new master;
--Identify the preferred slaves: online and tagged candidate_master.
--Identify the slaves to ignore: tagged no_master, or log_bin not enabled, or MySQL version is not the oldest, or lagging too far behind the latest slave.
--Selection order: preferred list, then latest-slave list, then all-slaves list, always excluding the ignore list.
-Check that the replication filter rules of the new and old master are consistent;
--binlog_do_db, binlog_ignore_db, replicate_do_table and so on.
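
The selection order described above (preferred list, then latest list, then everyone, minus the ignore list) can be sketched as follows. This Python example is a simplification under assumptions: the dict keys and function name are hypothetical, and the version and replication-lag conditions of the ignore list are left out.

```python
def select_new_master(slaves, latest_hosts):
    """Pick a new master host, or None if no slave qualifies.

    Each slave dict has: host, online, candidate_master, no_master, log_bin.
    latest_hosts is the set of hosts holding the most recent data.
    """
    # Ignore list: tagged no_master, or binary logging disabled.
    ignored = {s["host"] for s in slaves if s["no_master"] or not s["log_bin"]}
    online = [s for s in slaves if s["online"] and s["host"] not in ignored]
    preferred = [s for s in online if s["candidate_master"]]
    latest = [s for s in online if s["host"] in latest_hosts]
    # Try each list in priority order and take its first member.
    for group in (preferred, latest, online):
        if group:
            return group[0]["host"]
    return None


slaves = [
    {"host": "s1", "online": True, "candidate_master": False, "no_master": False, "log_bin": True},
    {"host": "s2", "online": True, "candidate_master": True, "no_master": False, "log_bin": True},
    {"host": "s3", "online": True, "candidate_master": True, "no_master": True, "log_bin": True},
]
print(select_new_master(slaves, latest_hosts={"s1"}))
```

Note that a candidate_master slave wins even when it is not the latest one, which is why the new master may first need the read_to_latest compensation log.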
* * Reject updates to prevent split-brain.
-Call the master_ip_online_change script with the stop subcommand. On the new master, set read-only;
on the old master, disable log_bin at session level, wait gracefully for all SQL threads to exit, and set read-only.
-If necessary, lock all tables on the old master and check whether the binlog has stopped advancing.
Once the binlog has stopped advancing, note the offset.
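
One way to check that the binlog has stopped advancing is to poll the master's position until two consecutive reads agree. This is a Python sketch, not the script's actual logic; `get_position` is a hypothetical caller-supplied function that would return the current (file, pos), for example from `SHOW MASTER STATUS`.

```python
import time


def wait_until_binlog_stops(get_position, checks=3, interval=0.1):
    """Poll get_position() until it returns the same value twice in a row.

    Returns the stable (file, pos), or None if the position was still
    moving after the given number of checks.
    """
    previous = get_position()
    for _ in range(checks):
        time.sleep(interval)
        current = get_position()
        if current == previous:
            return current
        previous = current
    return None


# Simulated positions: one more write lands, then the binlog is quiet.
samples = iter([("mysql-bin.000010", 100),
                ("mysql-bin.000010", 120),
                ("mysql-bin.000010", 120)])
print(wait_until_binlog_stops(lambda: next(samples), interval=0))
```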
* * Re-read the running status of all online slaves.
* * The new master applies all events from the old master.
-On the new master, run master_pos_wait, then note the file:pos of the new master's binlog.
-Call the master_ip_online_change script with the start subcommand. On the new master, turn off read-only so that it accepts writes.
* * (concurrent) Every slave applies all events from the old master and is then pointed at the new master.
-master_pos_wait
-change_master_and_start_slave
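
The two steps above map onto standard MySQL statements: `MASTER_POS_WAIT()` to catch up with the old master, then `CHANGE MASTER TO` and `START SLAVE` to repoint. The Python sketch below only builds the SQL text; the function names are hypothetical, the replication user `repl` is a made-up example, and the password clause is left out. Values are interpolated directly, so it is suitable only for trusted input.

```python
def catch_up_sql(old_file, old_pos, timeout=0):
    """Statement that blocks until the slave has applied old_file:old_pos.

    A timeout of 0 means wait without a time limit.
    """
    return f"SELECT MASTER_POS_WAIT('{old_file}', {int(old_pos)}, {int(timeout)})"


def repoint_sql(host, port, user, new_file, new_pos):
    """Statements that point a slave at the new master and restart replication."""
    return [
        "STOP SLAVE",
        f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_PORT={int(port)}, "
        f"MASTER_USER='{user}', MASTER_LOG_FILE='{new_file}', "
        f"MASTER_LOG_POS={int(new_pos)}",
        "START SLAVE",
    ]


print(catch_up_sql("mysql-bin.000010", 120))
for stmt in repoint_sql("192.168.128.130", 3309, "repl", "mysql-bin.000003", 154):
    print(stmt)
```

The file:pos passed to `repoint_sql` is the new master's own binlog position noted earlier, not the old master's.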
MHA source code analysis: high-availability MySQL