Code analysis of MHA failover and online switching


Last Update: 2016-07-12 Source: Internet

Author: User




Some time ago my colleague Shen Rongshing worked through the code flow of MHA failover and online switching; with their consent, it is reposted here. The text follows.

This article is based on MySQL 5.5, so it does not cover GTID. MHA switches masters in one of two ways: failover, used when the original master is down, and rotate, used for online switching. Each is explained separately below.

Process flow of failover

MHA::MasterFailover::main()
->do_master_failover
Phase 1: Configuration Check Phase
-check_settings:
check_node_version: check the MHA version information
connect_all_and_read_server_status: verify that the MySQL instance on each node can be connected to
get_dead_servers/get_alive_servers/get_alive_slaves: double-check the status of each node
start_sql_threads_if: check whether Slave_SQL_Running is Yes; if not, start the SQL thread
Phase 2: Dead Master Shutdown Phase: in our setup, the only thing this phase does is stop the IO threads
force_shutdown($dead_master):
stop_io_thread: stop the IO thread on every slave (so they stop replicating from the master)
force_shutdown_internal (this actually runs master_ip_failover_script/shutdown_script from the configuration file; if neither is set, nothing is run):
master_ip_failover_script: if a VIP is used, switch the VIP first
shutdown_script: if a shutdown script is set, execute it
Phase 3: Master Recovery Phase
Phase 3.1: Getting Latest Slaves Phase
read_slave_status: get the binlog file/position of each slave
check_slave_status: run "SHOW SLAVE STATUS" to collect the following fields for each slave:
Slave_IO_State, Master_Host,
Master_Port, Master_User,
Slave_IO_Running, Slave_SQL_Running,
Master_Log_File, Read_Master_Log_Pos,
Relay_Master_Log_File, Last_Errno,
Last_Error, Exec_Master_Log_Pos,
Relay_Log_File, Relay_Log_Pos,
Seconds_Behind_Master, Retrieved_Gtid_Set,
Executed_Gtid_Set, Auto_Position,
Replicate_Do_DB, Replicate_Ignore_DB, Replicate_Do_Table,
Replicate_Ignore_Table, Replicate_Wild_Do_Table,
Replicate_Wild_Ignore_Table
identify_latest_slaves:
find the latest slave by comparing Master_Log_File/Read_Master_Log_Pos across all slaves
identify_oldest_slaves:
find the oldest slave by comparing Master_Log_File/Read_Master_Log_Pos across all slaves
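Identifying the latest and oldest slaves comes down to ordering (Master_Log_File, Read_Master_Log_Pos) pairs. The following is a minimal Python sketch of that comparison, not MHA's actual Perl code. It assumes binlog names of the form mysql-bin.000281, so it compares the numeric suffix first and the byte position second. The `SlaveStatus` class and the host names are hypothetical. A `pos_cmp` result of 0 corresponds to the "same binlog position" case checked in Phase 3.3.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SlaveStatus:
    """Hypothetical subset of SHOW SLAVE STATUS for one slave."""
    host: str
    master_log_file: str      # Master_Log_File
    read_master_log_pos: int  # Read_Master_Log_Pos


def binlog_key(log_file: str, pos: int) -> Tuple[int, int]:
    """Sort key for a binlog coordinate.

    Assumes names like 'mysql-bin.000281': the numeric suffix orders the
    files, and the byte position orders events inside one file.
    """
    return int(log_file.rsplit(".", 1)[1]), pos


def pos_cmp(file_a: str, pos_a: int, file_b: str, pos_b: int) -> int:
    """Three-way comparison: -1 if a is behind b, 0 if equal, 1 if ahead."""
    a, b = binlog_key(file_a, pos_a), binlog_key(file_b, pos_b)
    return (a > b) - (a < b)


def identify_latest_and_oldest(
    slaves: List[SlaveStatus],
) -> Tuple[List[SlaveStatus], List[SlaveStatus]]:
    """Return (latest slaves, oldest slaves); slaves at equal positions are grouped."""
    keys = [binlog_key(s.master_log_file, s.read_master_log_pos) for s in slaves]
    newest, oldest = max(keys), min(keys)
    latest = [s for s, k in zip(slaves, keys) if k == newest]
    oldest_slaves = [s for s, k in zip(slaves, keys) if k == oldest]
    return latest, oldest_slaves


if __name__ == "__main__":
    slaves = [
        SlaveStatus("db2", "mysql-bin.000281", 5000),
        SlaveStatus("db3", "mysql-bin.000281", 107),
        SlaveStatus("db4", "mysql-bin.000280", 9000),
    ]
    latest, oldest = identify_latest_and_oldest(slaves)
    print([s.host for s in latest], [s.host for s in oldest])  # ['db2'] ['db4']
```

Grouping ties matters because several slaves can share the newest position, and any of them can then serve as a source of relay logs.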
-Phase 3.2: Saving Dead Master's Binlog Phase:
save_master_binlog:
If the dead master is reachable over SSH, take the following branch:
save_master_binlog_internal: (copy the binlog on the dead master using the MHA Node script save_binary_logs)
save_binary_logs --command=save --start_file=mysql-bin.000281 --start_pos=107 --binlog_dir=/opt/mysql/data/binlog --output_file=/opt/mha/log/saved_master_binlog_from_10.27.177.245_3306_20160108211857.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55
generate_diff_binary_log:
concat_all_binlogs_from:
dump_binlog: dump the binlog file to the target file, reading it in binary mode (binmode)
dump_binlog_header_fde: read from offset 0 to position-1
dump_binlog_from_pos: starting at position, dump the binlog file to the target file
file_copy:
copy the binlog file generated above to the manager_workdir directory on the manager node
If the dead master cannot be reached over SSH, the transactions on the master that were not replicated to any slave are lost
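generate_diff_binary_log has to produce a file that mysqlbinlog can still parse: the binlog header and Format Description Event (FDE) come first, followed by every event from the start position onward. Below is a simplified Python sketch of that idea. It assumes the v4 binlog format (4-byte magic `\xfebin`, 19-byte common event header, FDE type code 15) and a file already read into memory. It copies only the magic and the FDE from the head of the file, which may differ from how MHA's own dump_binlog_header_fde handles the bytes before the start position. `fde_end_offset` and `diff_binlog` are hypothetical names.

```python
import struct

BINLOG_MAGIC = b"\xfebin"       # first 4 bytes of every v4 binlog file
FORMAT_DESCRIPTION_EVENT = 15    # type code of the FDE


def fde_end_offset(binlog: bytes) -> int:
    """Offset just past the Format Description Event at the head of the file.

    v4 common event header (little-endian), starting right after the magic:
    timestamp(4) type_code(1) server_id(4) event_length(4) next_pos(4) flags(2)
    """
    if binlog[:4] != BINLOG_MAGIC:
        raise ValueError("not a binlog file")
    type_code = binlog[4 + 4]  # skip magic and timestamp
    if type_code != FORMAT_DESCRIPTION_EVENT:
        raise ValueError("first event is not a Format Description Event")
    # skip magic, timestamp, type_code and server_id to reach event_length
    (event_length,) = struct.unpack_from("<I", binlog, 4 + 9)
    return 4 + event_length


def diff_binlog(binlog: bytes, start_pos: int) -> bytes:
    """Header + FDE, followed by all events from start_pos to end of file."""
    header_end = fde_end_offset(binlog)
    if start_pos < header_end:
        raise ValueError("start_pos points inside the header")
    return binlog[:header_end] + binlog[start_pos:]
```

For a real file, read the bytes with `open(path, "rb")`. Binary mode matters here for the same reason dump_binlog uses binmode: the output must be byte-for-byte identical to the source events.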
-Phase 3.3: Determining New Master Phase
find_latest_base_slave:
find_latest_base_slave_internal:
pos_cmp($oldest_mlf, $oldest_mlp, $latest_mlf, $latest_mlp)
Determine whether the latest and the oldest slave are at the same binlog position; if they are, no relay log needs to be synchronized
apply_diff_relay_logs --command=find --latest
Check whether the latest slave still has the relay log events that the oldest slave is missing; if it does not, failover cannot continue and fails
The search itself is simple: read the latest slave's relay log files backwards until the oldest slave's file/position is found
select_new_master: select the new master node
If a preferred node is specified, one of the active preferred nodes will be the new master.
If the latest server is too far behind (e.g. its SQL thread was stopped for an online backup),
we should not use it as the new master, but we should still fetch relay logs from it. Even if a preferred
master is configured, it does not become the master if it is far behind.
get_candidate_masters:
the nodes configured with candidate_master>0 in the configuration file
get_bad_candidate_masters:
# The following servers cannot be master:
# - Dead servers
# - no_master set in the conf file (e.g. DR servers)
# - log_bin is disabled
# - Major version is not the oldest
# - Too much replication delay (the gap between the slave's and the master's binlog positions is greater than 100000000)
Search the candidate_master slaves that have received the latest relay log events
If not found:
Search all candidate_master slaves
If not found:
Search all slaves that have received the latest relay log events
If not found:
Search all slaves
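The bad-candidate filter and the four-step search in Phase 3.3 can be sketched in Python as follows. This is an illustrative model, not MHA's code. The `Node` fields are hypothetical stand-ins for what MHA reads from the configuration file and from each server. `pos_gap` is assumed to be the byte gap between the slave's and the master's binlog positions. Preferred-node handling beyond candidate_master is left out.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_POS_GAP = 100_000_000  # delay threshold from the get_bad_candidate_masters list


@dataclass
class Node:
    """Hypothetical view of one slave as seen by the manager."""
    host: str
    alive: bool = True
    candidate_master: bool = False  # candidate_master>0 in the config file
    no_master: bool = False         # no_master set in the config file
    log_bin: bool = True
    major_version: str = "5.5"
    pos_gap: int = 0                # assumed byte gap behind the master
    has_latest_relay_log: bool = False


def version_key(version: str):
    return tuple(int(part) for part in version.split("."))


def is_bad_candidate(node: Node, oldest_major: str) -> bool:
    """Mirror of the get_bad_candidate_masters rules."""
    return (
        not node.alive
        or node.no_master
        or not node.log_bin
        or node.major_version != oldest_major
        or node.pos_gap > MAX_POS_GAP
    )


def select_new_master(nodes: List[Node]) -> Optional[Node]:
    alive = [n for n in nodes if n.alive]
    if not alive:
        return None
    oldest_major = min((n.major_version for n in alive), key=version_key)
    usable = [n for n in nodes if not is_bad_candidate(n, oldest_major)]
    # Search order: candidates with the latest relay log, any candidate,
    # any slave with the latest relay log, then any remaining slave.
    searches = (
        lambda n: n.candidate_master and n.has_latest_relay_log,
        lambda n: n.candidate_master,
        lambda n: n.has_latest_relay_log,
        lambda n: True,
    )
    for matches in searches:
        for node in usable:
            if matches(node):
                return node
    return None
```

Applying the filter once up front means every search step sees the same usable set. This matches the outline, where the bad-candidate rules apply regardless of which search step finds a node.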
-Phase 3.4: New Master Diff Log Generation Phase
recover_relay_logs:
Check whether the new master is the latest slave; if not, use apply_diff_relay_logs --command to generate the differential relay log
and send it to the new master
recover_master_internal:
send the dead master's binlog saved in Phase 3.2 to the new master
-Phase 3.5: Master Log Apply Phase
recover_slave:
apply_diff:
0. wait_until_relay_log_applied: wait for the new master to finish executing its relay log
1. Check whether Exec_Master_Log_Pos == Read_Master_Log_Pos;
if they are not equal, use save_binary_logs --command=save to generate the differential log
2. Call apply_diff_relay_logs to recover the new master, where:
2.1 The log to recover consists of three parts:
exec_diff: the difference between Exec_Master_Log_Pos and Read_Master_Log_Pos
read_diff: the relay log difference between the new master and the latest slave
binlog_diff: the binlog difference between the latest slave and the dead master
apply_diff_relay_logs actually uses the mysqlbinlog command to do the recovery
If a VIP is used, master_ip_failover_script is called to fail the VIP over
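Steps 0 and 1 of apply_diff amount to polling the new master's SHOW SLAVE STATUS until the SQL thread has executed everything the IO thread has read. A hedged Python sketch of such a loop follows. `fetch_status` is a hypothetical callable that returns one SHOW SLAVE STATUS row as a dict, and the timeout and polling interval are assumptions, not MHA defaults. If the loop gives up, the remaining gap between Exec_Master_Log_Pos and Read_Master_Log_Pos corresponds to the exec_diff part described above.

```python
import time
from typing import Any, Callable, Dict


def sql_thread_caught_up(status: Dict[str, Any]) -> bool:
    """True when the SQL thread has executed everything the IO thread has read."""
    return (
        status["Relay_Master_Log_File"] == status["Master_Log_File"]
        and status["Exec_Master_Log_Pos"] == status["Read_Master_Log_Pos"]
    )


def wait_until_relay_log_applied(
    fetch_status: Callable[[], Dict[str, Any]],
    timeout: float = 60.0,
    interval: float = 1.0,
) -> bool:
    """Poll until caught up; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while True:
        if sql_thread_caught_up(fetch_status()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The file names are compared as well as the positions because Exec_Master_Log_Pos and Read_Master_Log_Pos are offsets into possibly different master binlog files.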
Phase 4: Slaves Recovery Phase
-Phase 4.1: Starting Parallel Slave Diff Log Generation Phase
Generate the diff log between each slave and the new master, and copy it to each slave's working directory.
-Phase 4.2: Starting Parallel Slave Log Apply Phase
recover_slave:
recover each slave, the same way as in Phase 3.5
change_master_and_start_slave:
use CHANGE MASTER TO to point these slaves at the new master, and finally start replication (START SLAVE)
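change_master_and_start_slave comes down to a few SQL statements run on each slave. The Python sketch below only builds the statement text. The leading STOP SLAVE is an assumption (CHANGE MASTER TO requires the replication threads to be stopped), and the `quote` helper is a minimal escape for illustration. The host, user, password, and binlog coordinates used with it are hypothetical. In the real flow, the log file and position would come from SHOW MASTER STATUS on the new master after it has been recovered.

```python
from typing import List


def quote(value: str) -> str:
    """Minimal MySQL string literal: escape backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def change_master_and_start_slave_sql(
    host: str, port: int, user: str, password: str, log_file: str, log_pos: int
) -> List[str]:
    """Statements that repoint one slave at the new master and restart replication."""
    change_master = (
        "CHANGE MASTER TO "
        f"MASTER_HOST={quote(host)}, MASTER_PORT={int(port)}, "
        f"MASTER_USER={quote(user)}, MASTER_PASSWORD={quote(password)}, "
        f"MASTER_LOG_FILE={quote(log_file)}, MASTER_LOG_POS={int(log_pos)}"
    )
    # Assumed order: stop the threads, repoint, then start replication again.
    return ["STOP SLAVE", change_master, "START SLAVE"]
```

For example, `change_master_and_start_slave_sql("10.0.0.12", 3306, "repl", "secret", "mysql-bin.000001", 107)` returns three statements that a client would execute in order on one slave.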
Phase 5: New Master Cleanup Phase
reset_slave_on_new_master
Cleaning up the new master simply means resetting its slave info, which removes its old replication settings. At this point the entire master failover process is complete.

Process flow of rotate

MHA::MasterRotate::main()
-do_master_online_switch:
Phase 1: Configuration Check Phase
identify_orig_master
connect_all_and_read_server_status:
connect_check: first run a connection check to make sure the MySQL service on each server is working
connect_and_get_status: get server_id, mysql_version, log_bin and other information about each MySQL instance
This step also plays an important role in identifying the current master: it executes SHOW SLAVE STATUS,
and if the output is empty, that node is the current master.
validate_current_master: get the master node information and verify that the configuration is correct
If any server is down, exit rotate
Check whether the master is alive; if it is dead, exit rotate
check_repl_priv:
check whether the user has replication privileges
get_monitor_advisory_lock: make sure no other monitor process is currently running against the master
Execute: SELECT GET_LOCK('MHA_Master_High_Availability_Monitor', ?) AS Value
get_failover_advisory_lock: make sure no other failover process is currently running against the slave
Execute: SELECT GET_LOCK('MHA_Master_High_Availability_Failover', ?) AS Value
check_replication_health:
Execute SHOW SLAVE STATUS to determine current_slave_position/has_replication_problem
has_replication_problem specifically checks: IO thread, SQL thread, and Seconds_Behind_Master (1s)
get_running_update_threads:
use SHOW PROCESSLIST to check whether any threads are currently performing updates; if so, exit the switch
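Two of the rotate checks are easy to model. The first identifies the original master as the only server whose SHOW SLAVE STATUS is empty. The second decides whether a slave has a replication problem. The following Python sketch assumes status rows are dicts keyed by SHOW SLAVE STATUS column names. The 1-second threshold comes from the outline, while treating a NULL Seconds_Behind_Master as a problem and comparing with `>` are assumptions. The function names mirror the outline but are hypothetical re-implementations.

```python
from typing import Any, Dict, List


def identify_orig_master(slave_status_by_host: Dict[str, List[Dict[str, Any]]]) -> str:
    """The master is the one server whose SHOW SLAVE STATUS returned no rows."""
    masters = [host for host, rows in slave_status_by_host.items() if not rows]
    if len(masters) != 1:
        raise RuntimeError(f"expected exactly one master, found {masters}")
    return masters[0]


def has_replication_problem(status: Dict[str, Any], max_delay: int = 1) -> bool:
    """IO thread, SQL thread and Seconds_Behind_Master checks."""
    if status.get("Slave_IO_Running") != "Yes" or status.get("Slave_SQL_Running") != "Yes":
        return True
    delay = status.get("Seconds_Behind_Master")
    # NULL (None) means the slave is not currently replicating.
    return delay is None or int(delay) > max_delay
```

Raising an error when zero or several servers have an empty slave status mirrors the outline's behavior of exiting rotate when the topology does not look as expected.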
identify_new_master
set_latest_slaves: the current slaves are treated as the latest slaves
select_new_master: select the new master node, using the same rules and search order as in failover Phase 3.3 (get_candidate_masters, get_bad_candidate_masters, then the four-step search)

Phase 2: Rejecting Updates Phase
reject_update: lock tables to stop binlog writes
If master_ip_online_change_script is set in the MHA configuration file, the script is executed to disable writes on the current master.
This script must be set when a VIP is used.
reconnect: make sure the current connection to the master is working
lock_all_tables: execute FLUSH TABLES WITH READ LOCK to lock the tables
check_binlog_stop: run SHOW MASTER STATUS twice in a row to determine whether binlog writes have stopped
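check_binlog_stop can be pictured as follows. After FLUSH TABLES WITH READ LOCK, read SHOW MASTER STATUS twice and require the (File, Position) pair to be unchanged. This Python sketch assumes a hypothetical `fetch_master_status` callable that returns the SHOW MASTER STATUS row as a dict. The pause between the two reads is an assumption.

```python
import time
from typing import Any, Callable, Dict


def check_binlog_stop(
    fetch_master_status: Callable[[], Dict[str, Any]], pause: float = 1.0
) -> bool:
    """True if the binlog coordinates did not move between two reads."""
    first = fetch_master_status()
    time.sleep(pause)
    second = fetch_master_status()
    return (first["File"], first["Position"]) == (second["File"], second["Position"])
```

Both the file name and the position are compared, so a binlog rotation between the two reads also counts as "still writing".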

read_slave_status:
get_alive_slaves:
check_slave_status: run "SHOW SLAVE STATUS" to collect, for each slave, the same fields as in failover Phase 3.1
switch_master:
switch_master_internal:
master_pos_wait: call SELECT MASTER_POS_WAIT() and wait until the new master has caught up with the original master
get_new_master_binlog_position: execute SHOW MASTER STATUS
Allow write access on the new master:
call master_ip_online_change_script --command=start ... to point the VIP to the new master
disable_read_only:
execute on the new master: SET GLOBAL read_only=0
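MASTER_POS_WAIT(log_name, log_pos[, timeout]) blocks on a slave until its SQL thread has applied events up to the given master coordinates. It returns the number of events it waited for, -1 on timeout, and NULL if, for example, the SQL thread is not running or the arguments are incorrect. The Python sketch below pairs the query, written with the `%s` placeholder style of PEP 249 drivers such as PyMySQL, with an interpretation of the result. The function names and outcome labels are hypothetical.

```python
from typing import Optional, Tuple


def master_pos_wait_query(log_file: str, log_pos: int, timeout: int) -> Tuple[str, tuple]:
    """Query and parameters for a PEP 249 cursor.execute() call."""
    return "SELECT MASTER_POS_WAIT(%s, %s, %s)", (log_file, log_pos, timeout)


def interpret_master_pos_wait(result: Optional[int]) -> str:
    """Map the function's return value to an outcome label."""
    if result is None:
        return "error"      # SQL thread not running, bad arguments, or other error
    if result == -1:
        return "timeout"    # position not reached within the timeout
    return "caught_up"      # result = number of events waited for (may be 0)
```

With a driver cursor, the pair can be passed directly as `cursor.execute(*master_pos_wait_query(file, pos, 30))`. Here `file` and `pos` are the original master's coordinates, taken after its tables were locked in Phase 2.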
switch_slaves:
switch_slaves_internal:
change_master_and_start_slave
change_master:
start_slave:
unlock_tables: execute UNLOCK TABLES on the original master
Phase 5: New Master Cleanup Phase
reset_slave_on_new_master
release_failover_advisory_lock

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
