MHA GTID-based failover code analysis
This post supplements the article below and describes how MHA processes a GTID-based failover.
http://blog.chinaunix.net/uid-20726500-id-5700631.html
MHA uses GTID-based failover only when all three of the following conditions are met (see the get_gtid_status function):
- gtid_mode = 1 (ON) on all nodes
- Executed_Gtid_Set is not empty on any node
- Auto_Position = 1 on at least one node
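The three conditions can be restated as a small predicate. This is a minimal sketch, not MHA's Perl code: the `NodeStatus` record and `use_gtid_failover` function are hypothetical, and it assumes the per-node values have already been read from each server.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeStatus:
    """Hypothetical snapshot of one node; field names mirror the MySQL variables."""
    host: str
    gtid_mode: bool                       # True when gtid_mode = ON (1)
    executed_gtid_set: str                # Executed_Gtid_Set, "" when empty
    auto_position: Optional[int] = None   # Auto_Position from SHOW SLAVE STATUS; None on a master


def use_gtid_failover(nodes: List[NodeStatus]) -> bool:
    """Return True only if all three conditions listed above hold."""
    if not nodes:
        return False
    # 1. Every node has GTID mode enabled.
    if not all(n.gtid_mode for n in nodes):
        return False
    # 2. No node has an empty Executed_Gtid_Set.
    if not all(n.executed_gtid_set.strip() for n in nodes):
        return False
    # 3. At least one node replicates with Auto_Position = 1.
    return any(n.auto_position == 1 for n in nodes)


if __name__ == "__main__":
    gtids = "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-5"
    nodes = [NodeStatus("db1", True, gtids), NodeStatus("db2", True, gtids, 1)]
    print(use_gtid_failover(nodes))  # True
```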
GTID-based MHA failover
- MHA::MasterFailover::main()
- -> do_master_failover
- Phase 1: Configuration Check Phase
- -> check_settings:
- check_node_version: check the MHA version on each node
- connect_all_and_read_server_status: check that the MySQL instance on each node can be connected to
- get_dead_servers/get_alive_servers/get_alive_slaves: double-check the status of each node
- start_sql_threads_if: check whether Slave_SQL_Running is Yes; if not, start the SQL thread
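The start_sql_threads_if check boils down to filtering slaves on one SHOW SLAVE STATUS column. A minimal sketch, assuming the status rows are already fetched into dictionaries; `sql_thread_fixups` is a hypothetical helper, and `START SLAVE SQL_THREAD` is the MySQL statement that starts only the SQL thread.

```python
from typing import Dict, List


def sql_thread_fixups(slaves: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the hosts whose SQL thread is not running.

    `slaves` maps host -> SHOW SLAVE STATUS columns (hypothetical shape).
    """
    return sorted(
        host for host, status in slaves.items()
        if status.get("Slave_SQL_Running") != "Yes"
    )


if __name__ == "__main__":
    status = {
        "db2": {"Slave_SQL_Running": "Yes"},
        "db3": {"Slave_SQL_Running": "No"},
    }
    for host in sql_thread_fixups(status):
        print(f"{host}: START SLAVE SQL_THREAD")  # prints "db3: START SLAVE SQL_THREAD"
```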
- Phase 2: Dead Master Shutdown Phase: in our setup, its only effect is stopping the IO threads
- -> force_shutdown($dead_master):
- stop_io_thread: stop the IO thread on all slaves (so they stop replicating from the dead master)
- force_shutdown_internal (in practice this just runs master_ip_failover_script/shutdown_script from the configuration file; if they are not configured, nothing is executed):
- master_ip_failover_script: if a VIP is used, this script is called first to take the VIP away from the dead master
- shutdown_script: run it if one is configured
- Phase 3: Master Recovery Phase
- -> Phase 3.1: Getting Latest Slaves Phase (obtain latest slave)
- read_slave_status: obtain the binlog file/position of each slave
- check_slave_status: run SHOW SLAVE STATUS to obtain the following fields of each slave:
- Slave_IO_State, Master_Host,
- Master_Port, Master_User,
- Slave_IO_Running, Slave_SQL_Running,
- Master_Log_File, Read_Master_Log_Pos,
- Relay_Master_Log_File, Last_Errno,
- Last_Error, Exec_Master_Log_Pos,
- Relay_Log_File, Relay_Log_Pos,
- Seconds_Behind_Master, Retrieved_Gtid_Set,
- Executed_Gtid_Set, Auto_Position,
- Replicate_Do_DB, Replicate_Ignore_DB, Replicate_Do_Table,
- Replicate_Ignore_Table, Replicate_Wild_Do_Table,
- Replicate_Wild_Ignore_Table
- identify_latest_slaves:
- Compare Master_Log_File/Read_Master_Log_Pos across the slaves to find the latest slave
- identify_oldest_slaves:
- Compare Master_Log_File/Read_Master_Log_Pos across the slaves to find the oldest slave
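Finding the latest and oldest slaves is an ordering on the pair (binlog file, position). A minimal sketch, assuming all slaves report file names with the same prefix and a numeric suffix such as `mysql-bin.000012`; the function names are hypothetical and this is not MHA's implementation. Ties are kept, since several slaves can stand at the same position.

```python
from typing import Any, Dict, List, Tuple


def binlog_coord(status: Dict[str, Any]) -> Tuple[int, int]:
    """Turn Master_Log_File/Read_Master_Log_Pos into a comparable (sequence, position) pair.

    Assumes names like 'mysql-bin.000012' that share one prefix.
    """
    seq = int(str(status["Master_Log_File"]).rsplit(".", 1)[1])
    return seq, int(status["Read_Master_Log_Pos"])


def latest_and_oldest(slaves: Dict[str, Dict[str, Any]]) -> Tuple[List[str], List[str]]:
    """Return (latest hosts, oldest hosts); equally positioned slaves are all kept."""
    coords = {host: binlog_coord(st) for host, st in slaves.items()}
    if not coords:
        return [], []
    newest, oldest = max(coords.values()), min(coords.values())
    return (sorted(h for h, c in coords.items() if c == newest),
            sorted(h for h, c in coords.items() if c == oldest))


if __name__ == "__main__":
    slaves = {
        "db2": {"Master_Log_File": "mysql-bin.000012", "Read_Master_Log_Pos": 4500},
        "db3": {"Master_Log_File": "mysql-bin.000012", "Read_Master_Log_Pos": 120},
        "db4": {"Master_Log_File": "mysql-bin.000011", "Read_Master_Log_Pos": 9000},
    }
    print(latest_and_oldest(slaves))  # (['db2'], ['db4'])
```

Comparing the numeric suffix rather than the raw string keeps the ordering correct even if a suffix ever grows past its zero padding.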
- -> Phase 3.2: Determining New Master Phase
- get_most_advanced_latest_slave: among the latest slaves, find the one that has executed the furthest (by Relay_Master_Log_File, Exec_Master_Log_Pos)
- select_new_master: select the new master node.
- If a preferred node is specified, one of the active preferred nodes becomes the new master.
- If the latest server is too far behind (e.g. its SQL thread is stopped for online backups), it should not be used as the new master; its relay logs should be fetched instead. Even if a preferred master is configured, it does not become the master if it is far behind.
- get_candidate_masters:
- nodes configured with candidate_master > 0 in the configuration file
- get_bad_candidate_masters:
# The following servers cannot be master:
# - dead servers
# - servers with no_master set in the conf file (i.e. DR servers)
# - servers with log_bin disabled
# - servers whose major version is not the oldest
# - servers with too much replication delay (the binlog position difference between the slave and the master is greater than 100000000)
- Search the candidate_master slaves that have received the latest relay log events
- If none is found, search all candidate_master slaves
- If none is found, search all slaves that have received the latest relay log events
- If none is found, search all slaves
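The exclusion rules and the four-step search order above can be sketched together. This is a simplified illustration, not MHA's code: the `Slave` fields are hypothetical stand-ins for values MHA reads from the config file and SHOW SLAVE STATUS, `delay_bytes` is taken as an already-computed input rather than derived, and the sketch simply returns the first match in list order (MHA's own tie-breaking is not covered here).

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_DELAY_BYTES = 100_000_000  # threshold quoted in the text above


@dataclass
class Slave:
    # Hypothetical fields, one per rule in get_bad_candidate_masters.
    host: str
    alive: bool = True
    candidate_master: bool = False    # candidate_master > 0 in the config file
    no_master: bool = False           # no_master set in the config file
    log_bin: bool = True
    oldest_major_version: bool = True
    delay_bytes: int = 0              # binlog position gap, computed elsewhere
    has_latest_relay_log: bool = False


def is_bad(s: Slave) -> bool:
    """True if any get_bad_candidate_masters rule excludes this slave."""
    return (not s.alive or s.no_master or not s.log_bin
            or not s.oldest_major_version or s.delay_bytes > MAX_DELAY_BYTES)


def select_new_master(slaves: List[Slave]) -> Optional[str]:
    """Try the four searches in order; the first matching slave wins."""
    pool = [s for s in slaves if not is_bad(s)]
    searches = [
        lambda s: s.candidate_master and s.has_latest_relay_log,
        lambda s: s.candidate_master,
        lambda s: s.has_latest_relay_log,
        lambda s: True,
    ]
    for match in searches:
        for s in pool:
            if match(s):
                return s.host
    return None


if __name__ == "__main__":
    slaves = [
        Slave("db2", has_latest_relay_log=True),
        Slave("db3", candidate_master=True, delay_bytes=200_000_000),  # too far behind
        Slave("db4", candidate_master=True),
    ]
    print(select_new_master(slaves))  # db4
```

In the example, db3 is a candidate but is excluded for delay, and db4 wins over db2 because the "all candidate_master slaves" search runs before the "latest relay log" search over all slaves.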
- -> Phase 3.3: New Master Recovery Phase
- recover_master_gtid_internal:
- wait_until_relay_log_applied (on the new master)
- stop_slave
- If the new master is not the slave with the latest relay log:
- $latest_slave->wait_until_relay_log_applied: wait until the latest slave's Exec_Master_Log_Pos equals its Read_Master_Log_Pos
- change_master_and_start_slave($target, $latest_slave): make the new master replicate from the latest slave
- wait_until_in_sync($target, $latest_slave): wait until the new master has caught up with the latest slave
- save_from_binlog_server:
- Iterate over all binlog servers and run save_binary_logs --command=save to fetch their binlogs
- apply_binlog_to_master:
- Apply the binlogs fetched from the binlog servers (if any) to the new master
- If master_ip_failover_script is set, call $master_ip_failover_script --command=start to bring up the VIP.
- If skip_disable_read_only is not set, set read_only = 0.
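Conceptually, the new master has caught up with the latest slave once every GTID the latest slave has executed is also in the new master's Executed_Gtid_Set. The sketch below illustrates that containment test in Python; it is not how MHA's wait_until_in_sync is implemented (MySQL also offers the server-side `GTID_SUBSET()` function). It assumes the untagged MySQL 5.6/5.7 GTID format `uuid:1-5:7,uuid2:1-3`; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


def parse_gtid_set(text: str) -> Dict[str, List[Interval]]:
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: [(1, 5), (7, 7)], ...}."""
    result: Dict[str, List[Interval]] = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        for r in ranges:
            lo, _, hi = r.partition("-")
            result.setdefault(uuid.lower(), []).append((int(lo), int(hi or lo)))
    return result


def _merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals, e.g. (1, 5) and (6, 9) become (1, 9)."""
    merged: List[Interval] = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def gtid_subset(a: str, b: str) -> bool:
    """True if every transaction in GTID set `a` is also in `b`."""
    big = {u: _merge(iv) for u, iv in parse_gtid_set(b).items()}
    for uuid, intervals in parse_gtid_set(a).items():
        covered = big.get(uuid, [])
        for lo, hi in intervals:
            if not any(c_lo <= lo and hi <= c_hi for c_lo, c_hi in covered):
                return False
    return True


if __name__ == "__main__":
    uuid = "3e11fa47-71ca-11e1-9e33-c80aa9429562"
    latest, target = f"{uuid}:1-100", f"{uuid}:1-90"
    print(gtid_subset(latest, target))  # False: 91-100 are still missing on the new master
    print(gtid_subset(target, latest))  # True
```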
- Phase 4: Slaves Recovery Phase
- recover_slaves_gtid_internal
- -> Phase 4.1: Starting Slaves in parallel
- Run change_master_and_start_slave on all slave instances.
- If wait_until_gtid_in_sync is set, use "SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS(?, 0)" to wait for each slave to catch up.
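In GTID mode, repointing a slave does not need a binlog file/position: `MASTER_AUTO_POSITION=1` lets the slave negotiate its position from its own Executed_Gtid_Set. The sketch below only builds the SQL text and does not execute it. It is an assumption about what change_master_and_start_slave roughly amounts to, not the exact statements MHA sends; the helper names and arguments are hypothetical, and it assumes the slaves wait for the new master's Executed_Gtid_Set.

```python
from typing import List


def sql_str(value: str) -> str:
    """Quote a value as a MySQL string literal (assumes backslash escapes are enabled)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def slave_recovery_sql(host: str, port: int, user: str, password: str,
                       new_master_gtids: str) -> List[str]:
    """Statements that point one slave at the new master using GTID auto-positioning."""
    return [
        "STOP SLAVE",
        (f"CHANGE MASTER TO MASTER_HOST={sql_str(host)}, MASTER_PORT={int(port)}, "
         f"MASTER_USER={sql_str(user)}, MASTER_PASSWORD={sql_str(password)}, "
         "MASTER_AUTO_POSITION=1"),
        "START SLAVE",
        # Optional wait, mirroring the wait_until_gtid_in_sync call quoted above.
        f"SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS({sql_str(new_master_gtids)}, 0)",
    ]


if __name__ == "__main__":
    gtids = "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-100"
    for stmt in slave_recovery_sql("db4", 3306, "repl", "secret", gtids):
        print(stmt + ";")
```

Because the position comes from the GTID sets, the same statement works for every slave regardless of how far each one had read, which is why all slaves can be started in parallel.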
- Phase 5: New Master Cleanup Phase
- reset_slave_on_new_master
- Clean up the new master, which in practice means resetting its slave info, i.e. removing its old replication settings.
At this point, the entire master failover process is complete.
Online switchover with GTID enabled follows the same process as without GTID (the only difference is the CHANGE MASTER statement that is executed), so it is omitted here.