MHA Source Code Analysis for High-Availability MySQL

Last Update:2016-09-06 Source: Internet

Author: User

* MHA: the complete failover (master down) switching process

----------------------------------------------------------------------------------------------------
-Monitor the master's status and confirm whether it has crashed.
-Once the crash is confirmed, save the master's binlog and copy it to the manager node; the master can be forcibly shut down to avoid split-brain.
-Find the slave with the most recent data (that is, the largest read_master_log_pos) and determine the new master.
-Generate the differential relay log from the latest slave, add the binlog events that no slave had read, apply both to the new master, and record the resulting offset.
-(Concurrently) generate the differential relay log and binlog for each of the other slaves and apply them to each slave.
-Start replication on every slave from the recorded offset on the new master.
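
The "find the most recent data" step can be sketched as a comparison of binlog coordinates. This is a minimal Python illustration rather than MHA's own Perl code; the dictionary layout, the host names, and the assumption that binlog files are named `<base>.<number>` are all hypothetical.

```python
import re

def binlog_coordinate(status):
    """Turn a slave status dict into a sortable (file_index, position) pair."""
    # Assumption: binlog files are named <base>.<number>, e.g. mysql-bin.000012.
    match = re.search(r"\.(\d+)$", status["Master_Log_File"])
    if match is None:
        raise ValueError("unexpected binlog file name: %r" % status["Master_Log_File"])
    return (int(match.group(1)), int(status["Read_Master_Log_Pos"]))

def find_latest_slaves(slaves):
    """Return every slave that shares the most advanced read position."""
    best = max(binlog_coordinate(s) for s in slaves)
    return [s for s in slaves if binlog_coordinate(s) == best]

# Hypothetical sample data: slave3 has a large offset but in an older file.
slaves = [
    {"host": "slave1", "Master_Log_File": "mysql-bin.000012", "Read_Master_Log_Pos": 4500},
    {"host": "slave2", "Master_Log_File": "mysql-bin.000012", "Read_Master_Log_Pos": 9120},
    {"host": "slave3", "Master_Log_File": "mysql-bin.000011", "Read_Master_Log_Pos": 99999},
]
latest = find_latest_slaves(slaves)
print([s["host"] for s in latest])
```

The file index is compared before the offset, so a large offset in an older binlog file never wins over a smaller offset in a newer one.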

Key logic in the source code

--------------------------------------------------------------------------------------------------------
* * Read the configuration
* * Check the configuration
-Check the version of apply_diff_relay_logs
-Connect to all servers and read their status (to identify the old master)
-Check that the dead master passed in as a parameter matches the old master's address; otherwise terminate the switchover
-Check that the old master is in the list of dead hosts; otherwise terminate the switchover
-Check that the MySQL service really cannot be reached
-Check that all online slaves point to the old master
-Check whether any slave whose failure must not be ignored is offline
-Check whether the last switchover failed
-Check the interval between the last switchover and the current one; if it is too short, terminate
-Acquire the "switchover lock" on all slaves
-Ensure that the SQL thread has been started on all slaves
* * If GTID auto-positioning is supported but not enabled, should apply_diff_relay_logs be forced to disable log_bin??
* * Forced shutdown
-(Concurrently) force-stop the IO thread on all slaves
-Check SSH reachability from the manager node to the host where the dead master resides
-Run master_ip_failover_script to make sure the dead master's IP is deactivated, preventing split-brain; otherwise terminate the switchover
-If the IO thread fails to stop on any online slave, terminate the switchover
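
The concurrent IO-thread stop, with its rule that one failure aborts the switchover, might look roughly like this. It is a Python sketch, not MHA's implementation: `stop_io_thread` is a hypothetical caller-supplied function (for example one that issues `STOP SLAVE IO_THREAD` on the given host), and `SwitchoverAborted` is an illustrative exception name.

```python
from concurrent.futures import ThreadPoolExecutor

class SwitchoverAborted(Exception):
    """Raised when the switchover must not continue (illustrative name)."""

def stop_io_threads(slaves, stop_io_thread):
    """Stop the IO thread on every slave in parallel; abort if any stop fails.

    stop_io_thread is a hypothetical function that takes a host name and
    returns True on success.
    """
    def attempt(slave):
        try:
            return slave, bool(stop_io_thread(slave))
        except Exception:
            # A connection error counts as a failed stop.
            return slave, False

    with ThreadPoolExecutor(max_workers=max(1, len(slaves))) as pool:
        results = list(pool.map(attempt, slaves))

    failed = [slave for slave, ok in results if not ok]
    if failed:
        raise SwitchoverAborted("IO thread did not stop on: " + ", ".join(failed))
    return [slave for slave, _ in results]
```

All slaves are attempted before the result is judged, so the error message can name every slave that failed rather than only the first.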

* * Identify the slave with the smallest replication lag (the latest slave) and the slave with the largest lag (the oldest slave)
* * Save the old master's binlog, starting from the latest position read by the latest slave's IO thread.
-If the dead master's host cannot be reached over SSH, the binlog from read_master_log_pos to the tail is lost
-If SSH can connect, run save_binary_logs --command=save and copy the saved binlog to the manager node; this piece is called read_to_tail.
* * Based on the read positions of the latest and oldest slaves, and on which slave failures can be ignored, decide which slave serves as the baseline for relay log and binlog compensation
-If the read positions of all slaves are identical, skip this step
-Connect over SSH to the latest slave and run apply_diff_relay_logs --command=find to check whether its relay log contains the oldest slave's read position.
-If no slave can serve as the compensation baseline, terminate the switchover
* * Select the new master (the new master is not necessarily the latest slave; see the description under "online switching")
* * Recover the new master
-If the new master's read position is behind the latest slave's, connect over SSH to the latest slave and run apply_diff_relay_logs --command=generate_and_send
to extract, from the latest slave's relay log, the events between the new master's read position and the latest slave's read position; this piece is called read_to_latest and ends at
$latest_slave->{master_log_file}:$latest_slave->{read_master_log_pos}
-Copy the read_to_tail log (from the latest slave's read position to the tail of the old master's binlog) to the new master.
-If the new master is not the latest slave, or read_to_tail has been saved, apply the diff logs.
--First wait for the relay log already on the new master to finish replaying, then stop the slave SQL thread
--Read the latest replication status
--Over SSH, run save_binary_logs --command=save to recover exec_to_read from the new master's own relay log
--Over SSH, run apply_diff_relay_logs --command=apply to import all three compensation logs generated above.
-Run the master_ip_failover --command=start script on the manager node to activate the new master's IP.
-Turn off read-only on the new master so that it accepts writes.
* * Recover all slaves (similar to recovering the new master)
-(Concurrently) compensate from relay logs by generating read_to_latest
-(Concurrently) copy the previously generated read_to_tail to each slave, apply the diff logs, point the slave at the new master, and start replication
-Run RESET SLAVE on the new master
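
The three compensation logs named above (exec_to_read, read_to_latest, read_to_tail) cover consecutive ranges of the old master's binlog. The Python sketch below shows how the ranges fit together; it assumes, for simplicity, that all positions are integer offsets within a single binlog file, and the function and parameter names are hypothetical.

```python
def plan_compensation(exec_pos, read_pos, latest_read_pos, master_tail_pos=None):
    """List the log pieces a server must apply, in order, to catch up.

    All positions are offsets in the old master's binlog (simplifying
    assumption: one binlog file). master_tail_pos is None when the dead
    master was unreachable and read_to_tail could not be saved.
    """
    pieces = []
    if exec_pos < read_pos:
        # Events already in this server's own relay log but not yet executed.
        pieces.append(("exec_to_read", exec_pos, read_pos))
    if read_pos < latest_read_pos:
        # Events that only the latest slave has received.
        pieces.append(("read_to_latest", read_pos, latest_read_pos))
    if master_tail_pos is not None and latest_read_pos < master_tail_pos:
        # Events no slave received, saved from the dead master's binlog.
        pieces.append(("read_to_tail", latest_read_pos, master_tail_pos))
    return pieces

print(plan_compensation(exec_pos=800, read_pos=1000,
                        latest_read_pos=1500, master_tail_pos=1700))
```

For the latest slave itself the read_to_latest range is empty, and when the dead master's host is unreachable the plan simply ends at the latest slave's read position.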

* MHA online master switching process

---------------------------------------------------------------------------------------------------------------------------------------
sudo /usr/bin/masterha_master_switch --master_state=alive --conf=/etc/masterha/app1.cnf --new_master_host=192.168.128.130 --new_master_port=3309 --orig_master_is_new_slave

* * Identify the old master.
-Read the MHA configuration file;

-Connect to all database servers and read their status;
-(Concurrently) connect to every slave and check whether its MySQL service is running; if any machine is down, stop the switchover.
-Traverse each slave and collect all available information, such as the MySQL version, whether GTID is enabled, whether log-bin is enabled,
whether it is read-only, and the replication-related system variables and status variables.
-Tally server information: dead servers, alive servers, alive slaves, failed slaves, and so on.
-Compare the MySQL versions of all slaves to find the oldest and the newest.
-Verify which server is currently the real master.
-Count the alive servers tagged "not_slave"; there must be exactly one, otherwise the switchover is terminated.
-Follow where each slave points to find out which masters exist (three-tier replication, master-slave-slave, is supported).
The real master must be online and writable; if no master is writable, or two are writable, terminate the switchover.
-Determine whether the switchover supports GTID.

-Check that every online slave has a replication account with the REPLICATION SLAVE privilege;
-Run FLUSH TABLES on the old master if necessary;
-Acquire the "monitoring lock" on the old master;
-Acquire the "switchover lock" on all slaves;
-Check the replication health of all online slaves;
-Read the current replication status;
-Determine whether there is a problem (whether the IO and SQL threads are running, and how far the data lags behind)
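
The "which server is the real master" check can be sketched as two counts: exactly one alive server that is not a slave, and exactly one writable server among those that other servers replicate from. This is an illustrative Python sketch; the dictionary keys (`alive`, `master_host`, `read_only`) are hypothetical and do not mirror MHA's internal data structures.

```python
def find_real_master(servers):
    """Return the single writable master, or raise if the topology is ambiguous."""
    alive = [s for s in servers if s["alive"]]

    # A server with no master of its own is "not a slave".
    not_slaves = [s for s in alive if s["master_host"] is None]
    if len(not_slaves) != 1:
        raise RuntimeError("expected exactly one non-slave server, found %d" % len(not_slaves))

    # Any host that some slave points at is a master of some tier.
    pointed_at = {s["master_host"] for s in alive if s["master_host"] is not None}
    writable = [s for s in alive if s["host"] in pointed_at and not s["read_only"]]
    if len(writable) != 1:
        raise RuntimeError("expected exactly one writable master, found %d" % len(writable))
    return writable[0]

# Hypothetical three-tier chain: db1 -> db2 -> db3.
servers = [
    {"host": "db1", "alive": True, "master_host": None, "read_only": False},
    {"host": "db2", "alive": True, "master_host": "db1", "read_only": True},
    {"host": "db3", "alive": True, "master_host": "db2", "read_only": True},
]
print(find_real_master(servers)["host"])
```

In the three-tier example db2 is also a master (of db3), but it is read-only, so db1 remains the only writable master.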

* * Identify the new master.
-Identify the slave with the latest data;
-Compare master_log_file:read_master_log_pos.
-Select the new master;
-Identify the preferred slaves: online and tagged candidate_master.
-Identify the slaves to exclude: those tagged no_master, those without log_bin enabled, those whose MySQL version is not the oldest, and those lagging too far behind the latest slave.
-Selection order: the preferred list, then the latest-slave list, then the list of all slaves, always excluding the exclude list.
-Check that the replication filter rules of the new and old masters are consistent;
-binlog_do_db, binlog_ignore_db, replicate_do_table and so on.
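
The selection order described above (preferred list, then latest slaves, then all slaves, always skipping excluded slaves) can be sketched as follows. This is a Python illustration with hypothetical field names (`oldest_version`, `too_delayed` and so on), and it assumes the input list contains only online slaves.

```python
def select_new_master(slaves, latest_hosts):
    """Pick the new master host, or return None if every slave is excluded.

    slaves: list of dicts describing online slaves (hypothetical layout).
    latest_hosts: hosts that hold the most recent data.
    """
    excluded = {
        s["host"] for s in slaves
        if s["no_master"] or not s["log_bin"]
        or not s["oldest_version"] or s["too_delayed"]
    }
    preferred = [s["host"] for s in slaves if s["candidate_master"]]
    everyone = [s["host"] for s in slaves]

    # Try each list in priority order; the first non-excluded host wins.
    for pool in (preferred, latest_hosts, everyone):
        for host in pool:
            if host not in excluded:
                return host
    return None

# Hypothetical data: s1 is preferred but has log_bin off, s3 is tagged no_master.
slaves = [
    {"host": "s1", "candidate_master": True, "no_master": False,
     "log_bin": False, "oldest_version": True, "too_delayed": False},
    {"host": "s2", "candidate_master": False, "no_master": False,
     "log_bin": True, "oldest_version": True, "too_delayed": False},
    {"host": "s3", "candidate_master": False, "no_master": True,
     "log_bin": True, "oldest_version": True, "too_delayed": False},
]
print(select_new_master(slaves, latest_hosts=["s3", "s2"]))
```

Because the exclusion set is applied to every list, a candidate_master tag never overrides a missing binlog or a no_master tag.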

* * Reject updates to prevent split-brain.
-Call the master_ip_online_change script with the stop subcommand. On the new master, set read-only;
on the old master, disable log_bin at the session level, wait gracefully for all SQL threads to exit, and set read-only.
-If necessary, lock all tables on the old master and check whether the binlog has stopped advancing.
Once the binlog has stopped advancing, record its offset.
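
Checking that the binlog has stopped advancing amounts to reading the binlog position repeatedly until two consecutive reads agree. The Python sketch below illustrates the idea; `read_position` is a hypothetical callable (for instance one that wraps `SHOW MASTER STATUS`), and the retry count and interval are arbitrary assumptions, not MHA's values.

```python
import time

def wait_until_binlog_stops(read_position, checks=3, interval=0.1):
    """Return the (file, pos) pair once two consecutive reads are identical.

    read_position: hypothetical callable returning the current (file, pos).
    Raises RuntimeError if the position keeps moving for all checks.
    """
    previous = read_position()
    for _ in range(checks):
        time.sleep(interval)
        current = read_position()
        if current == previous:
            # No new events were written between the two reads.
            return current
        previous = current
    raise RuntimeError("binlog is still advancing")
```

The returned pair is the offset to record: it is the position at which the old master stopped accepting writes.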

* * Re-read the running status of all online slaves.

* * The new master applies all events from the old master.
-On the new master, run master_pos_wait, then record the file:pos of the new master's binlog.
-Call the master_ip_online_change script with the start subcommand. On the new master, turn off read-only so that it accepts writes.

* * (Concurrently) each slave applies all events from the old master and is then pointed at the new master.
-master_pos_wait
-change_master_and_start_slave
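
In SQL terms, the two steps for each slave correspond roughly to the statements built below. This Python sketch only assembles the statement strings; the function name, the replication user, and the sample coordinates are hypothetical, the password clause is omitted, and real code should take values from trusted configuration rather than formatting arbitrary input into SQL.

```python
def slave_switch_statements(old_file, old_pos, new_host, new_port,
                            new_file, new_pos, repl_user):
    """Build the statements a slave runs to follow the new master.

    old_file/old_pos: final binlog coordinates of the old master.
    new_file/new_pos: binlog coordinates recorded on the new master.
    """
    return [
        # Block until this slave has applied everything from the old master.
        "SELECT MASTER_POS_WAIT('%s', %d)" % (old_file, old_pos),
        "STOP SLAVE",
        # Re-point replication at the new master's recorded coordinates.
        "CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%d, MASTER_USER='%s', "
        "MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d"
        % (new_host, new_port, repl_user, new_file, new_pos),
        "START SLAVE",
    ]

for statement in slave_switch_statements(
        "mysql-bin.000007", 2048, "192.168.128.130", 3309,
        "mysql-bin.000002", 154, "repl"):
    print(statement)
```

The host and port reuse the values from the masterha_master_switch command shown earlier; everything else is sample data.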

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
