Code analysis of MHA failover and online switching




Some time ago my colleague Shen Rongshing worked through the code flow of MHA failover and online switching. With the author's consent, I am reposting it here. The original text follows.

This article is based on MySQL 5.5, so it does not cover GTID. MHA performs a master switch in one of two ways, failover and rotate. Failover applies when the original master has gone down; rotate is used for online switching. Each is explained below.



Process flow of failover


  MHA::MasterFailover::main()
  -> do_master_failover
  Phase 1: Configuration Check Phase
    check_settings:
      check_node_version: checks the MHA version information
      connect_all_and_read_server_status: verifies that the MySQL instance on each node can be connected to
      get_dead_servers/get_alive_servers/get_alive_slaves: double-checks the status of each node
      start_sql_threads_if: checks whether Slave_SQL_Running is Yes on each slave; if not, starts the SQL thread
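
For illustration, the start_sql_threads_if decision can be sketched in Python. This is not MHA's Perl code. The function name `sql_thread_fixups` and the input layout are hypothetical: each host name maps to its SHOW SLAVE STATUS row as a dict. The sketch connects to nothing; it only decides which slaves would receive `START SLAVE SQL_THREAD`.

```python
from typing import Dict, List

def sql_thread_fixups(slave_status: Dict[str, Dict[str, str]]) -> List[str]:
    """Return "host: statement" entries for slaves whose SQL thread is stopped.

    slave_status maps a host name to its SHOW SLAVE STATUS row,
    e.g. {"db2": {"Slave_SQL_Running": "No"}}.
    """
    fixups = []
    for host, row in sorted(slave_status.items()):
        if row.get("Slave_SQL_Running") != "Yes":
            # Only the SQL thread is started; the IO thread is left untouched
            fixups.append(f"{host}: START SLAVE SQL_THREAD")
    return fixups
```

For example, a slave reporting `Slave_SQL_Running: No` yields one entry, while healthy slaves yield nothing.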

  Phase 2: Dead Master Shutdown Phase (in our setup its only real effect is stopping the IO threads)
    force_shutdown($dead_master):
      stop_io_thread: stops the IO thread on every slave, cutting them off from the dead master
      force_shutdown_internal (in effect this runs master_ip_failover_script/shutdown_script from the configuration file; if neither is set, nothing is run):
        master_ip_failover_script: if a VIP is used, the VIP is switched first
        shutdown_script: if a shutdown script is set, it is executed
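
The order of Phase 2 can be summarized as a plan of actions. Here is a minimal sketch, assuming the configuration is a plain dict keyed by the MHA parameter names master_ip_failover_script and shutdown_script. The function name `shutdown_plan` and the "manager" label for where the scripts are invoked are illustrative assumptions. The arguments MHA passes to the scripts are deliberately left out.

```python
from typing import Dict, List, Optional, Tuple

def shutdown_plan(slaves: List[str],
                  config: Dict[str, Optional[str]]) -> List[Tuple[str, str]]:
    """Return (target, action) pairs in the order Phase 2 performs them."""
    # stop_io_thread: every slave stops reading from the dead master first
    plan = [(host, "STOP SLAVE IO_THREAD") for host in slaves]
    # force_shutdown_internal: each script runs only if it is configured
    for key in ("master_ip_failover_script", "shutdown_script"):
        script = config.get(key)
        if script:
            plan.append(("manager", script))
    return plan
```

If neither script is configured, the plan consists of the IO thread stops alone. This matches the remark that, in our setup, stopping the IO threads is the only effect of this phase.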

  Phase 3: Master Recovery Phase
    Phase 3.1: Getting Latest Slaves Phase (find the latest slave)
      read_slave_status: gets the binlog file/position read by each slave
        check_slave_status: runs SHOW SLAVE STATUS to get the following fields for each slave:
          Slave_IO_State, Master_Host,
          Master_Port, Master_User,
          Slave_IO_Running, Slave_SQL_Running,
          Master_Log_File, Read_Master_Log_Pos,
          Relay_Master_Log_File, Last_Errno,
          Last_Error, Exec_Master_Log_Pos,
          Relay_Log_File, Relay_Log_Pos,
          Seconds_Behind_Master, Retrieved_Gtid_Set,
          Executed_Gtid_Set, Auto_Position,
          Replicate_Do_DB, Replicate_Ignore_DB, Replicate_Do_Table,
          Replicate_Ignore_Table, Replicate_Wild_Do_Table,
          Replicate_Wild_Ignore_Table
      identify_latest_slaves:
        finds the latest slave(s) by comparing Master_Log_File/Read_Master_Log_Pos across all slaves
      identify_oldest_slaves:
        finds the oldest slave(s) by comparing Master_Log_File/Read_Master_Log_Pos across all slaves
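
Finding the latest and oldest slaves comes down to ordering (Master_Log_File, Read_Master_Log_Pos) pairs. Here is a minimal sketch, assuming all slaves replicate from the same master, so their binlog file names share one base name. `binlog_key` and `latest_and_oldest` are hypothetical helpers. The sketch compares the numeric suffix rather than the raw file name, so the ordering stays correct even if the suffix outgrows its zero padding.

```python
from typing import Dict, List, Tuple

def binlog_key(log_file: str, pos: int) -> Tuple[int, int]:
    """Turn ('mysql-bin.000281', 107) into a sortable (281, 107)."""
    # The numeric suffix orders the files; the position orders events within one
    return int(log_file.rsplit(".", 1)[1]), pos

def latest_and_oldest(status: Dict[str, dict]) -> Tuple[List[str], List[str]]:
    """Return (latest hosts, oldest hosts) from SHOW SLAVE STATUS rows."""
    keys = {host: binlog_key(row["Master_Log_File"], int(row["Read_Master_Log_Pos"]))
            for host, row in status.items()}
    newest, oldest = max(keys.values()), min(keys.values())
    latest_hosts = sorted(h for h, k in keys.items() if k == newest)
    oldest_hosts = sorted(h for h, k in keys.items() if k == oldest)
    return latest_hosts, oldest_hosts
```

Several slaves can tie as "latest" or "oldest", so both results are lists.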

    Phase 3.2: Saving Dead Master's Binlog Phase
      save_master_binlog:
        If the dead master is reachable over SSH, the following branch runs:
          save_master_binlog_internal: copies the binlog on the dead master using the node script save_binary_logs, e.g.
            save_binary_logs --command=save --start_file=mysql-bin.000281 --start_pos=107 --binlog_dir=/opt/mysql/data/binlog --output_file=/opt/mha/log/saved_master_binlog_from_10.27.177.245_3306_20160108211857.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55
            generate_diff_binary_log:
              concat_all_binlogs_from:
                dump_binlog: dumps the binlog file to the target file, reading it in binmode
                  dump_binlog_header_fde: reads from offset 0 to position-1 (the header and format description event)
                  dump_binlog_from_pos: dumps the binlog file from position onward to the target file
          file_copy:
            copies the binlog file generated above to the manager_workdir directory on the manager node
        If the dead master cannot be reached over SSH, transactions on the master that were not yet replicated to any slave are lost
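
The dump_binlog_header_fde / dump_binlog_from_pos pair can be illustrated on raw bytes. This sketch assumes the v4 binlog format used by MySQL 5.x. A file starts with a 4-byte magic number, followed by the format description event (FDE). The FDE's 19-byte common header stores the event length at byte offset 9. In MySQL 5.5 the first event after the FDE normally starts at offset 107. The helper names are hypothetical, and the sketch ignores details a real tool must handle, such as event checksums in later versions.

```python
import struct

BINLOG_MAGIC = b"\xfebin"  # every binlog file starts with these 4 bytes

def fde_end(data: bytes) -> int:
    """Offset just past the format description event (FDE) at offset 4."""
    if data[:4] != BINLOG_MAGIC:
        raise ValueError("not a binlog file")
    # v4 common header: timestamp(4) type(1) server_id(4) event_length(4) ...
    (event_length,) = struct.unpack_from("<I", data, 4 + 9)
    return 4 + event_length

def dump_from_pos(data: bytes, start_pos: int) -> bytes:
    """Magic + FDE (the header part), then every byte from start_pos on."""
    header_end = fde_end(data)
    if start_pos < header_end:
        raise ValueError("start_pos lies inside the binlog header")
    return data[:header_end] + data[start_pos:]

def dump_binlog_file(src: str, dst: str, start_pos: int) -> None:
    # Both files are opened in binary mode, like the binmode read above
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as out:
        out.write(dump_from_pos(data, start_pos))
```

Keeping the FDE in front of the copied events lets the output still be read as a self-contained binlog by tools such as mysqlbinlog.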

    Phase 3.3: Determining New Master Phase
      find_latest_base_slave:
        find_latest_base_slave_internal:
          pos_cmp($oldest_mlf, $oldest_mlp, $latest_mlf, $latest_mlp) (Master_Log_File/Read_Master_Log_Pos of the oldest and latest slaves)
            checks whether the latest and oldest slaves are at the same binlog position; if they are, no relay log synchronization is needed
          apply_diff_relay_logs --command=find --latest
            checks whether the latest slave has the relay log events that the oldest slave is missing; if none of the latest slaves has them, failover fails
            The search itself is simple: it reads the latest slave's relay log files backwards until it finds the oldest slave's file/position
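
The base-slave search in find_latest_base_slave_internal can be modeled in Python as well. This is a simplified sketch. Each slave's relay log is represented as a hypothetical list of master (file, position) pairs it holds events for. The real apply_diff_relay_logs --command=find parses the relay log files themselves. When the oldest and latest positions are equal, the sketch simply returns the first latest slave.

```python
from typing import Dict, List, Optional, Tuple

Pos = Tuple[str, int]  # (Master_Log_File, Read_Master_Log_Pos)

def pos_cmp(a: Pos, b: Pos) -> int:
    """Return -1, 0 or 1, comparing the file suffix first, then the position."""
    ka = (int(a[0].rsplit(".", 1)[1]), a[1])
    kb = (int(b[0].rsplit(".", 1)[1]), b[1])
    return (ka > kb) - (ka < kb)

def find_latest_base_slave(oldest: Pos, latest: Pos, latest_slaves: List[str],
                           relay_positions: Dict[str, List[Pos]]) -> Optional[str]:
    """Pick a latest slave whose relay log still contains the oldest position."""
    if pos_cmp(oldest, latest) == 0:
        # All slaves stopped at the same point: no relay log sync is needed
        return latest_slaves[0] if latest_slaves else None
    for host in latest_slaves:
        # Read this slave's relay log positions backwards, newest first
        for pos in reversed(relay_positions.get(host, [])):
            if pos == oldest:
                return host
    return None  # no latest slave has the events: the failover fails
```

Scanning backwards from the newest position stops as soon as the oldest slave's coordinates are found, so recent relay logs are examined first.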

      select_new_master: selects the new master node
        If a preferred node is specified, one of the live preferred nodes becomes the new master.
        If the latest server is too far behind (e.g. its SQL thread was stopped for an online backup),
        it should not be used as the new master, although its relay logs should still be fetched from it. Even if a preferred
        master is configured, it does not become master if it is far behind.
        get_candidate_masters:
          returns the nodes configured with candidate_master>0 in the configuration file
        get_bad_candidate_masters:
          # The following servers can not be master:
          # - Dead servers
          # - no_master set in conf files (e.g. DR servers)
          # - log_bin is disabled
          # - Major version is not the oldest
          # - Too much replication delay (binlog position gap between slave and master greater than 100000000)
        Search candidate_master slaves which have received the latest relay log events
        If not found: search all candidate_master slaves
        If not found: search all slaves which have received the latest relay log events
        If not found: search all slaves
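
The exclusion rules and the four-step search order of select_new_master translate into a short Python sketch. Host names are plain strings, and the per-slave flags (alive, no_master, log_bin, major, gap) use a hypothetical dict layout. `gap` stands for the binlog position gap to the master, checked against the 100000000 limit quoted above. Major versions are compared as tuples such as (5, 5). Preferred-host handling is left out of this sketch.

```python
from typing import Dict, List, Optional, Set

MAX_GAP = 100_000_000  # binlog position gap limit from the list above

def bad_candidates(slaves: Dict[str, dict], max_gap: int = MAX_GAP) -> Set[str]:
    """Hosts that must never be promoted, per get_bad_candidate_masters."""
    oldest_major = min(s["major"] for s in slaves.values())
    return {
        host
        for host, s in slaves.items()
        if not s["alive"]                  # dead server
        or s["no_master"]                  # no_master set (e.g. DR servers)
        or not s["log_bin"]                # binary logging disabled
        or s["major"] != oldest_major      # major version is not the oldest
        or s["gap"] > max_gap              # too much replication delay
    }

def select_new_master(latest: List[str], alive: List[str],
                      candidates: Set[str], bad: Set[str]) -> Optional[str]:
    """Return the first host from the four-step search that is not bad."""
    tiers = [
        [h for h in latest if h in candidates],  # candidates with the latest relay log
        [h for h in alive if h in candidates],   # any candidate_master slave
        latest,                                   # any slave with the latest relay log
        alive,                                    # any slave at all
    ]
    for tier in tiers:
        for host in tier:
            if host not in bad:
                return host
    return None  # no slave qualifies
```

Because the bad set is checked in every tier, a far-behind or no_master host is never chosen, even when it is the only latest slave.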

    Phase 3.4: New Master Diff Log Generation Phase
      recover_relay_logs:
        checks whether the new master is the latest slave; if not, uses the apply_diff_relay_logs command to generate a differential log and send it to the new master
      recover_master_internal:
        sends the dead master's binlog saved in Phase 3.2 to the new master

    Phase 3.5: Master Log Apply Phase
      recover_slave:
        apply_diff:
          0. wait_until_relay_log_applied: waits for the new master to finish executing its relay log
          1. Checks whether Exec_Master_Log_Pos == Read_Master_Log_Pos;
             if they differ, uses save_binary_logs --command=save to generate the differential log
          2. Calls apply_diff_relay_logs to recover the new master, where:
             2.1 the log to be applied consists of three parts:
                 exec_diff: the difference between Exec_Master_Log_Pos and Read_Master_Log_Pos
                 read_diff: the relay log difference between the new master and the latest slave
                 binlog_diff: the binlog difference between the latest slave and the dead master
             apply_diff_relay_logs actually performs the recovery by calling the mysqlbinlog command
      If a VIP is used, master_ip_failover_script must be called to fail over the VIP
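
The three pieces applied in Phase 3.5 can be pictured as consecutive ranges. This simplified sketch assumes every position refers to the same master binlog file, so positions compare as plain integers. The parameter names and the `Segment` type are hypothetical. Empty ranges are dropped. This mirrors step 1, where nothing needs to be generated when Exec_Master_Log_Pos equals Read_Master_Log_Pos.

```python
from typing import List, NamedTuple

class Segment(NamedTuple):
    name: str
    source: str   # where the events come from
    start: int    # master binlog positions (single file assumed)
    end: int

def recovery_segments(exec_pos: int, read_pos: int,
                      latest_read_pos: int, dead_master_end: int) -> List[Segment]:
    """Ordered pieces to apply on the new master; empty pieces are dropped.

    exec_pos/read_pos: the new master's Exec_/Read_Master_Log_Pos
    latest_read_pos:   the latest slave's Read_Master_Log_Pos
    dead_master_end:   end of the binlog saved from the dead master
    """
    parts = [
        Segment("exec_diff", "own relay log", exec_pos, read_pos),
        Segment("read_diff", "latest slave relay log", read_pos, latest_read_pos),
        Segment("binlog_diff", "saved dead master binlog", latest_read_pos, dead_master_end),
    ]
    return [p for p in parts if p.end > p.start]
```

Applying the segments in this order replays everything from where the new master stopped executing up to the dead master's last event.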

  Phase 4: Slaves Recovery Phase
    Phase 4.1: Starting Parallel Slave Diff Log Generation Phase
      generates a diff log between each slave and the latest slave, and copies it to each slave's working directory

    Phase 4.2: Starting Parallel Slave Log Apply Phase
      recover_slave:
        recovers each slave the same way as in Phase 3.5
      change_master_and_start_slave:
        uses CHANGE MASTER TO to point these slaves at the new master, then starts replication (START SLAVE)
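
The statements behind change_master_and_start_slave can be sketched as follows. The CHANGE MASTER TO options used are standard MySQL replication options. The binlog file and position are assumed to be the new master's coordinates after recovery; the sketch takes them as arguments instead of querying the new master. The `quote` helper is a minimal illustration that assumes the default sql_mode (backslash escapes enabled). It is not a complete SQL escaping routine.

```python
from typing import List

def quote(value: str) -> str:
    """Minimal SQL string literal quoting (backslash and single quote only)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

def change_master_sql(host: str, port: int, user: str, password: str,
                      log_file: str, log_pos: int) -> List[str]:
    """Statements to point one slave at the new master and start replicating."""
    return [
        "CHANGE MASTER TO "
        f"MASTER_HOST={quote(host)}, MASTER_PORT={port}, "
        f"MASTER_USER={quote(user)}, MASTER_PASSWORD={quote(password)}, "
        f"MASTER_LOG_FILE={quote(log_file)}, MASTER_LOG_POS={log_pos}",
        "START SLAVE",
    ]
```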

  Phase 5: New Master Cleanup Phase
    reset_slave_on_new_master
      Cleaning up the new master simply means resetting its slave info, which removes its old replication settings. At this point the whole master failover process is complete.


Process flow of rotate

  MHA::MasterRotate::main()
  -> do_master_online_switch:
  Phase 1: Configuration Check Phase
    identify_orig_master
      connect_all_and_read_server_status:
        connect_check: first checks connectivity to make sure the MySQL service on each server is up
        connect_and_get_status: gets server_id/mysql_version/log_bin and other information from each MySQL instance.
          This step also identifies the current master: it runs SHOW SLAVE STATUS,
          and a node whose output is empty is the current master.
      validate_current_master: gets the master's information and checks that the configuration is correct
        Checks whether any server is down; if so, exits rotate
        Checks whether the master is alive; if it is dead, exits rotate
    check_repl_priv:
      checks whether the user has replication privileges
    Get monitor_advisory_lock to ensure that no other monitor process is currently running against the master
      Executes: SELECT GET_LOCK('MHA_Master_High_Availability_Monitor', ?) AS Value
    Get failover_advisory_lock to ensure that no other failover process is currently running against the slaves
      Executes: SELECT GET_LOCK('MHA_Master_High_Availability_Failover', ?) AS Value
    check_replication_health:
      Executes SHOW SLAVE STATUS to determine current_slave_position/has_replication_problem
      has_replication_problem specifically checks the IO thread, the SQL thread and Seconds_Behind_Master (1s)
    get_running_update_threads:
      uses SHOW PROCESSLIST to check whether any threads are currently running updates; if so, the switch is aborted
    identify_new_master
      set_latest_slaves: treats the current slaves as the latest slaves
      select_new_master: selects the new master node
        If a preferred node is specified, one of the live preferred nodes becomes the new master.
        If the latest server is too far behind (e.g. its SQL thread was stopped for an online backup),
        it should not be used as the new master, although its relay logs should still be fetched from it. Even if a preferred
        master is configured, it does not become master if it is far behind.
        get_candidate_masters:
          returns the nodes configured with candidate_master>0 in the configuration file
        get_bad_candidate_masters:
          # The following servers can not be master:
          # - Dead servers
          # - no_master set in conf files (e.g. DR servers)
          # - log_bin is disabled
          # - Major version is not the oldest
          # - Too much replication delay (binlog position gap between slave and master greater than 100000000)
        Search candidate_master slaves which have received the latest relay log events
        If not found: search all candidate_master slaves
        If not found: search all slaves which have received the latest relay log events
        If not found: search all slaves
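
Several of the Phase 1 checks of rotate can be sketched in Python. The function names and dict layouts are hypothetical. GET_LOCK returning 1 on success, 0 on timeout and NULL on error is documented MySQL behavior. Treating an unknown Seconds_Behind_Master as a problem is an assumption of this sketch, not something the text states.

```python
from typing import Dict, List, Optional

def find_current_master(slave_status: Dict[str, Optional[dict]]) -> List[str]:
    """Hosts whose SHOW SLAVE STATUS came back empty are treated as the master."""
    return sorted(host for host, row in slave_status.items() if not row)

def has_replication_problem(row: dict, max_delay: int = 1) -> bool:
    """IO thread, SQL thread and Seconds_Behind_Master (1s) checks."""
    if row.get("Slave_IO_Running") != "Yes" or row.get("Slave_SQL_Running") != "Yes":
        return True
    delay = row.get("Seconds_Behind_Master")
    # Assumption of this sketch: an unknown delay (NULL -> None) counts as a problem
    return delay is None or int(delay) > max_delay

def lock_acquired(value: Optional[int]) -> bool:
    """Interpret SELECT GET_LOCK(...) AS Value: 1 = acquired, 0 = timeout, NULL = error."""
    return value == 1
```

A caller would typically expect exactly one host from find_current_master; that expectation is also an assumption here.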

  Phase 2: Rejecting Updates Phase
    reject_update: locks the tables so that nothing more is written to the binlog
      If master_ip_online_change_script is set in the MHA configuration file, the script is executed to disable writes on the current master.
      This script must be set when a VIP is used.
      reconnect: makes sure the current connection to the master is healthy
      lock_all_tables: executes FLUSH TABLES WITH READ LOCK to lock the tables
      check_binlog_stop: runs SHOW MASTER STATUS twice in a row to confirm that binlog writes have stopped
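
check_binlog_stop boils down to comparing two consecutive SHOW MASTER STATUS results. Here is a minimal sketch, assuming a caller-supplied function that returns the (File, Position) pair. The names are hypothetical, and any pause MHA may take between the two calls is not modeled.

```python
from typing import Callable, Tuple

BinlogPos = Tuple[str, int]  # (File, Position) from SHOW MASTER STATUS

def binlog_stopped(show_master_status: Callable[[], BinlogPos]) -> bool:
    """Run SHOW MASTER STATUS twice; identical results mean writes have stopped."""
    first = show_master_status()
    second = show_master_status()
    return first == second
```

Comparing the file name as well as the position also catches the case where the binlog rotated between the two calls.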

    read_slave_status:
      get_alive_slaves:
      check_slave_status: runs SHOW SLAVE STATUS to get the following fields for each slave:
        Slave_IO_State, Master_Host,
        Master_Port, Master_User,
        Slave_IO_Running, Slave_SQL_Running,
        Master_Log_File, Read_Master_Log_Pos,
        Relay_Master_Log_File, Last_Errno,
        Last_Error, Exec_Master_Log_Pos,
        Relay_Log_File, Relay_Log_Pos,
        Seconds_Behind_Master, Retrieved_Gtid_Set,
        Executed_Gtid_Set, Auto_Position,
        Replicate_Do_DB, Replicate_Ignore_DB, Replicate_Do_Table,
        Replicate_Ignore_Table, Replicate_Wild_Do_Table,
        Replicate_Wild_Ignore_Table
    switch_master:
      switch_master_internal:
        master_pos_wait: runs SELECT MASTER_POS_WAIT() and waits until the new master has caught up with the original master
        get_new_master_binlog_position: executes SHOW MASTER STATUS
        Allow write access to the new master:
          calls master_ip_online_change_script --command=start ... to point the VIP at the new master
        disable_read_only:
          executes on the new master: SET GLOBAL read_only=0
    switch_slaves:
      switch_slaves_internal:
        change_master_and_start_slave
          change_master:
          start_slave:
    unlock_tables: executes UNLOCK TABLES on the original master
  Phase 5: New Master Cleanup Phase
    reset_slave_on_new_master
    release_failover_advisory_lock
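
The order of operations in switch_master and switch_slaves can be written down as a list of (where, what) steps. This is an illustrative sketch with the following assumptions:

- `orig_pos` is the original master's SHOW MASTER STATUS result once writes have stopped.
- `CHANGE MASTER TO ...` is left as a placeholder.
- The `...` after --command=start stands for the script arguments elided in the text.

The MASTER_POS_WAIT return values in the comment follow the MySQL manual.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, str]  # (where it runs, what it runs)

def master_pos_wait_sql(log_file: str, log_pos: int, timeout: int) -> str:
    # MASTER_POS_WAIT returns the number of events waited for, -1 on timeout,
    # or NULL if the SQL thread is not running or an error occurs
    return f"SELECT MASTER_POS_WAIT('{log_file}', {log_pos}, {timeout})"

def switch_steps(orig_master: str, new_master: str, slaves: List[str],
                 orig_pos: Tuple[str, int], timeout: int,
                 online_change_script: Optional[str] = None) -> List[Step]:
    """Ordered steps of an online switch, as described in the outline above."""
    log_file, log_pos = orig_pos
    steps: List[Step] = [
        (new_master, master_pos_wait_sql(log_file, log_pos, timeout)),
        (new_master, "SHOW MASTER STATUS"),
    ]
    if online_change_script:
        # "..." stands for the remaining script arguments, elided in the text
        steps.append(("manager", f"{online_change_script} --command=start ..."))
    steps.append((new_master, "SET GLOBAL read_only=0"))
    for slave in slaves:
        steps.append((slave, "CHANGE MASTER TO ..."))
        steps.append((slave, "START SLAVE"))
    # The original master stays locked until every slave has been switched
    steps.append((orig_master, "UNLOCK TABLES"))
    return steps
```

Keeping UNLOCK TABLES as the last step reflects the outline: the original master only releases its global read lock after the new master is writable and the slaves point at it.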
