MHA Online Switching process

Source: Internet
Author: User
Tags failover

MHA on-line switchover is MHA in addition to automatic monitoring switching provided by another way, more for such as hardware upgrades, MySQL database migration and so on. This way provides fast switching and graceful blocking writes, regardless of shutting down the original server, the entire switching process at 0.5-2s time around, greatly reducing the down time. Online Master switch starts only when all of the following conditions are met:

1. IO threads   on all slaves is running//run on all slave IO threads. 2. SQL threads on all slaves is running  //sql thread is running properly on all slave. 3. Seconds_behind_master on all slaves is less or equal than--running_updates_limit Seconds  //In all slaves seconds_b Ehind_master to be less than or equal to  running_updates_limit seconds 4. On master, none of update queries take more than--running_updates_limit seconds in the show processlist output  //At the Lord , there are no update query operations more than running_updates_limit seconds on show processlist output results.

The reasons for these limitations are for security reasons, and to switch to the new main library as soon as possible.

1. Verify that Masterha_manager is currently enabled (recommended to stop)

[Email protected] app2]# masterha_check_status--conf=/data/masterha/app1/app1.cnfapp1 (pid:6769) is running (0:ping_ OK), Master:192.168.0.50[[email protected] app2]#

2. Verifying the slave io_threads, Sql_threads, Seconds_behind_master

[Email protected] masterha]$ mysql-uroot-p123456-h192.168.0.60-e ' show slave status \g ' |grep-e ' slave_io_running| Slave_sql_running| Seconds_behind_master "warning:using a password on the command line interface can be insecure.             Slave_io_running:yes            slave_sql_running:yes        seconds_behind_master:0      slave_sql_running_state:slave Have read all relay log; Waiting for the slave I/O thread to update it[[email protected] masterha]$

3. Implement online switching

[[email protected] masterha]# masterha_master_switch--conf=/data/masterha/app1/app1.cnf--master_state=alive --new_master_host=192.168.0.60--orig_master_is_new_slave--running_updates_limit=10000--interactive=0Sat Jul 16 09:11:00-[INFO] mha::masterrotate version 0.56.Sat Jul 09:11:00-[info] starting online master switch: Sat Jul 09:11:00-[INFO] Sat 09:11:00-[INFO] * Phase 1:configuration Check Phase.  Sat Jul 09:11:00-[INFO] Sat 09:11:00-[WARNING] Global configuration file/etc/masterha_default.cnf Not found. Skipping.sat Jul 09:11:00-[INFO] Reading application default configuration from/data/masterha/app1/app1.cnf. Sat Jul 09:11:00-[INFO] Reading server configuration from/data/masterha/app1/app1.cnf. Sat Jul 09:11:00-[INFO] GTID Failover mode = 0Sat Jul 09:11:00-[INFO] current Alive master:192.168.0. 192.168.0.50:3306 Sat 09:11:00-[info] Alive Slaves:satJul 09:11:00-[info] 192.168.0.60 (192.168.0.60:3306) version=5.6.29-log (oldest major Version between slaves) Log-bin:enabledsat Jul 09:11:00-[info] replicating from 192.168.0.50 (192.168.0.50:3306) Sat 16 09:11:00 2  016-[INFO] Primary candidate for the new Master (Candidate_master is set) Sat Jul 09:11:00-[INFO] Executing FLUSH No_write_to_binlog TABLES. This could take a long time. Sat Jul 09:11:00-[info] OK. Sat Jul 09:11:00-[info] Checking MHA is not monitoring or doing failover.Sat Jul 09:11:00-[error][/usr/share/perl5/vendor_perl/mha/masterrotate.pm, ln142] Getting advisory Lock FAI Led on the current master. MHA Monitor runs on the current master. Stop MHA Manager/monitor and try again. Sat Jul 09:11:00-[error][/usr/share/perl5/vendor_perl/mha/managerutil.pm, ln177] Got error:at/usr/bin/masterh A_master_switch Line 53[[Email protected] masterha]#Stop the MHA and test[Email protected] masterha]# masterha_stop--conf=/data/masterha/app1/app1.cnfstopped App1 successfully. [2]-Exit 1 nohup masterha_manager--conf=/data/masterha/app1/app1.cnf 2>&1(WD:/DATA/MASTERHA/APP2) (wd now:/DATA/MASTERHA) [[email protected] masterha]#

4. Re-implement online switchover

[[email protected] masterha]# masterha_master_switch--conf=/data/masterha/app1/app1.cnf--master_state=alive --new_master_host=192.168.0.60--orig_master_is_new_slave--running_updates_limit=10000--interactive=0Sat Jul 16 09:15:03-[INFO] mha::masterrotate version 0.56.Sat Jul 09:15:03-[info] starting online master switch: Sat Jul 09:15:03-[INFO] Sat 09:15:03-[INFO] * Phase 1:configuration Check Phase.  Sat Jul 09:15:03-[INFO] Sat 09:15:03-[WARNING] Global configuration file/etc/masterha_default.cnf Not found. Skipping.sat Jul 09:15:03-[INFO] Reading application default configuration from/data/masterha/app1/app1.cnf. Sat Jul 09:15:03-[INFO] Reading server configuration from/data/masterha/app1/app1.cnf. Sat Jul 09:15:03-[INFO] GTID Failover mode = 0Sat Jul 09:15:03-[INFO] current Alive master:192.168.0. 192.168.0.50:3306 Sat 09:15:03-[info] Alive Slaves:satJul 09:15:03-[info] 192.168.0.60 (192.168.0.60:3306) version=5.6.29-log (oldest major Version between slaves) Log-bin:enabledsat Jul 09:15:03-[info] replicating from 192.168.0.50 (192.168.0.50:3306) Sat 16 09:15:03 2  016-[INFO] Primary candidate for the new Master (Candidate_master is set) Sat Jul 09:15:03-[INFO] Executing FLUSH No_write_to_binlog TABLES. This could take a long time. Sat Jul 09:15:03-[info] OK. Sat Jul 09:15:03-[info] Checking MHA is not monitoring or doing failover. Sat Jul 09:15:03-[INFO] Checking replication Health on 192.168.0.60..Sat Jul 09:15:03-[info] OK. Sat Jul 09:15:03-[info] 192.168.0.60 can be new master. Sat Jul 09:15:03-[info] from:192.168.0.50 (192.168.0.50:3306) (current master) +--192.168.0.60 ( 192.168.0.60:3306) to:192.168.0.60 (192.168.0.60:3306) (new master) +--192.168.0.50 (192.168.0.50:3306) Sat 16 09:15:03-[INFO] Checking whether 192.168.0.60 (192.168.0.60:3306) is OK for the new master. Sat Jul 09:15:03-[info] OK. Sat Jul 09:15:03-[info] 192.168.0.50 (192.168.0.50:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing change MASTER to a dummy host. Sat Jul 09:15:03-[info] 192.168.0.50 (192.168.0.50:3306): Resetting slave pointing to the dummy host. Sat Jul 09:15:03-[INFO] * * Phase 1:configuration Check Phase completed. Sat Jul 09:15:03-[INFO] Sat 09:15:03-[INFO] * Phase 2:rejecting updates Phase. Sat Jul 09:15:03-[INFO] Sat 09:15:03-[WARNING] master_ip_online_change_script is not defined. Skipping disabling writes on the current master. Sat Jul 09:15:03-[info] Locking all tables on the orig master to reject updates from everybody (including root): Sat Jul 09:15:03-[info] executing FLUSH TABLES with READ LOCK. Sat Jul 09:15:03-[info] OK. Sat Jul 16 09:15:03 2016-[INFO] Orig Master Binlog:pos is Mysql-bin.000009:40355591.sat Jul 09:15:03 – [info] waiting to execute all relay logs O n 192.168.0.60 (192.168.0.60:3306): Sat Jul 09:15:03-[info] master_pos_wait (mysql-bin.000009:40355591) completed on 192.168.0.60 (192.168.0.60:3306 ). Executed 0 events. Sat Jul 09:15:03-[info] done. Sat Jul 09:15:03-[info] Getting new master ' s binlog name and position. Sat Jul 09:15:03-[info] Mysql-bin.000006:120sat Jul 09:15:03-[info] All other slaves should start re Plication from here. Statement should be:change MASTER to master_host= ' 192.168.0.60 ', master_port=3306, master_log_file= ' mysql-bin.000006 ' , master_log_pos=120, master_user= ' repl ', master_password= ' xxx '; Sat Jul 09:15:03-[INFO] Sat 09:15:03-[INFO] * Switching slaves in parallel. Sat Jul 09:15:03-[INFO] Sat 09:15:03-[INFO] Unlocking all tables on the orig Master:sat Jul 16 09: 15:03-[INFO] ExecuTing UNLOCK TABLES. Sat Jul 09:15:03-[info] OK. Sat Jul 09:15:03-[info] starting orig Master as a new slave. Sat Jul 09:15:03-[info] resetting slave 192.168.0.50 (192.168.0.50:3306) and starting replication from the new M Aster 192.168.0.60 (192.168.0.60:3306).Sat Jul 09:15:03-[info] Executed change MASTER. Sat Jul 09:15:14-[error][/usr/share/perl5/vendor_perl/mha/server.pm, ln784] Slave could not being started on 192.16 8.0.50 (192.168.0.50:3306)! Check slave status. Sat Jul 09:15:14-[error][/usr/share/perl5/vendor_perl/mha/server.pm, ln862] starting slave io/sql thread on 192. 168.0.50 (192.168.0.50:3306) failed! Sat Jul 09:15:14-[error][/usr/share/perl5/vendor_perl/mha/masterrotate.pm, ln573] failed! Sat Jul 09:15:14-[error][/usr/share/perl5/vendor_perl/mha/masterrotate.pm, ln602] switching master to 192.168.0 . (192.168.0.60:3306) did, but switching slaves partially failed. [Email protected] masterha]# 

The log judgment of the master and slave machine itself may be caused by the non-mapping of IP and hostname in the master-slave machine. Modify hosts

Host/etc/hosts127.0.0.1 MyDB01/etc/hosts127.0.0.1 MyDB02 modified from the machine/etc/hosts[[email protected] ~]# more/etc/ hosts192.168.0.60  MyDB02192.168.0.50  MyDB01

The previous operation was completely successful, resulting in two machines being a dual master architecture. Manual switchover is adjusted to the original schema one from the primary one. Do a check before switching online:

[Email protected] app1]# MASTERHA_CHECK_REPL--conf=/data/masterha/app1/app1.cnfsat Jul 10:24:49-[WARNING] Glo Bal configuration file/etc/masterha_default.cnf not found. Skipping.sat Jul 10:24:49-[INFO] Reading application default configuration from/data/masterha/app1/app1.cnf. Sat Jul 10:24:49-[INFO] Reading server configuration from/data/masterha/app1/app1.cnf. Sat Jul 10:24:49-[INFO] mha::mastermonitor version 0.56.Sat Jul 10:24:49-[INFO] GTID Failover mode = 0 Sat Jul 10:24:49-[info] Dead Servers:sat Jul 10:24:49-[info] Alive Servers:Sat Jul 10:24:49-[info] 192.168.0.50 (192.168.0.50:3306) Sat 10:24:49-[INFO] 192.168.0.60 (19 2.168.0.60:3306)Sat Jul 10:24:49-[info] Alive Slaves:sat Jul 10:24:49-[info] 192.168.0.60 (192.168.0.60:3306) Versio N=5.6.29-log (oldest major version between slaves) Log-bin:enabledsat Jul 10:24:49-[INFO] replicating from 1  92.168.0.50 (192.168.0.50:3306) Sat 10:24:49-[INFO] Primary candidate for the new Master (Candidate_master is set)Sat Jul 10:24:49-[INFO] current Alive master:192.168.0.50 (192.168.0.50:3306)Sat Jul 10:24:49-[info] Checking slave configurations. Sat Jul 10:24:49-[info] Read_only=1 is isn't set on slave 192.168.0.60 (192.168.0.60:3306). Sat Jul 10:24:49-[INFO] Checking replication filtering settings. Sat Jul 10:24:49-[info] binlog_do_db=, binlog_ignore_db= sat 10:24:49-[INFO] Replication Filter ing check OK. Sat Jul 10:24:49-[INFO] GTID (with Auto-pos) are not Supportedsat Jul 10:24:49-[INFO] starting SSH conn Ection tests. Sat Jul 10:24:50-[INFO] All SSH connection tests passed successfully. Sat Jul 10:24:50-[info] Checking MHA Node version. Sat Jul 10:24:51-[info] Version check OK. Sat Jul 10:24:51-[info] Checking SSH publickey authentication settings on the current master. Sat Jul 10:24:51-[info] healthcheck:ssh to 192.168.0.50 is reachable. Sat Jul 10:24:51-[info] Master MHA Node version is 0.56.Sat Jul 10:24:51-[info] Checking Recovery script configurations on 192.168.0.50 (192.168.0.50:3306). Sat Jul 10:24:51-[info] executing command:save_binary_logs--command=test--start_pos=4--binlog_dir=/data/my Sql/3306/binlog--output_file=/data/masterha/app1/save_binary_logs_test--manager_version=0.56--start_file=   mysql-bin.000010 Sat Jul 10:24:51-[info] connecting to [email protected] (192.168.0.50:22):    Creating/data/masterha/app1 if not exists.  Ok.   Checking output directory is accessible or not:  Ok. Binlog found At/data/mysql/3306/binlog, up to Mysql-bin.000010sat Jul 10:24:52-[info] Binlog setting check done .  Sat Jul 10:24:52-[info] Checking SSH publickey authentication and Checking recovery script configurations on all Alive slave servers. Sat Jul 10:24:52-[info] executing command:apply_diff_relay_logs--command=test--slave_user= ' root '--slave_h ost=192.168.0.60--slave_ip=192.168.0.60--slave_port=3306--workdir=/data/masterha/app1--target_vErsion=5.6.29-log--manager_version=0.56--relay_log_info=/data/mysql/3306/data/relay-log.info--relay_dir=/data/ mysql/3306/data/--slave_pass=xxxsat Jul 10:24:52-[info] connecting to [email protected] (192.168.0.60:22   )..    Checking Slave recovery Environment settings.    Opening/data/mysql/3306/data/relay-log.info ... ok. Relay Log found At/data/mysql/3306/binlog, up to relay-bin.000002 temporary Relay log file is/data/mysql/3306/binlog/ RELAY-BIN.000002 testing MySQL connection and privileges.    Done. Testing Mysqlbinlog Output:    Done. Cleaning up test file (s): Done. Sat Jul 10:24:53-[INFO] slaves settings check done. Sat Jul 10:24:53-[info] 192.168.0.50 (192.168.0.50:3306) (current master) +--192.168.0.60 (192.168.0.60:3306) SAT Jul 10:24:53-[INFO] Checking replication Health on 192.168.0.60..Sat Jul 10:24:53-[info] OK. Sat Jul 10:24:53-[WARNING] master_ip_failover_script is not defined. Sat Jul 16 10:24:-[WARNING] shutdown_script is not defined. Sat Jul 10:24:53-[INFO] Got exit code 0 (not master dead).MySQL Replication Health is OK.

5. Implement the Switchover

[Email protected] app1]# masterha_master_switch--conf=/data/masterha/app1/app1.cnf--master_state=alive--new_ master_host=192.168.0.60--orig_master_is_new_slave--running_updates_limit=10000--interactive=0Sat Jul 16 10:26:59 * [INFO] mha::masterrotate version 0.56.Sat Jul 10:26:59-[info] starting online master switch: Sat Jul 10:26:59-[INFO]Sat Jul 10:26:59-[INFO] * Phase 1:configuration Check Phase.Sat Jul 10:26:59-[INFO] Sat 10:26:59-[WARNING] Global configuration file/etc/masterha_default.cnf Not found. Skipping.sat Jul 10:26:59-[INFO] Reading application default configuration from/data/masterha/app1/app1.cnf. Sat Jul 10:26:59-[INFO] Reading server configuration from/data/masterha/app1/app1.cnf. Sat Jul 10:26:59-[INFO] GTID Failover mode = 0Sat Jul 10:26:59-[INFO] current Alive master:192.168.0. 192.168.0.50:3306 Sat 10:26:59-[info] Alive Slaves:sat Jul 10:26:59-[info] 192.168.0.60 (192.1     68.0.60:3306) Version=5.6.29-log (oldest major Version between slaves) Log-bin:enabledsat Jul 10:26:59-[INFO] Replicating from 192.168.0.50 (192.168.0.50:3306) Sat. 10:26:59-[INFO] Primary candidate for the new Mas ter (Candidate_master is set) Sat Jul 10:26:59-[info] executing FLUSH no_write_to_binlog TABLES. This could take a long time. Sat Jul 10:26:59-[iNFO] OK. Sat Jul 10:26:59-[info] Checking MHA is not monitoring or doing failover. Sat Jul 10:26:59-[INFO] Checking replication Health on 192.168.0.60..Sat Jul 10:26:59-[info] OK. Sat Jul 10:26:59-[info] 192.168.0.60 can be new master. Sat Jul 10:26:59-[info] from:192.168.0.50 (192.168.0.50:3306) (current master) +--192.168.0.60 ( 192.168.0.60:3306) to:192.168.0.60 (192.168.0.60:3306) (new master) +--192.168.0.50 (192.168.0.50:3306) Sat 16 10:26:59-[INFO] Checking whether 192.168.0.60 (192.168.0.60:3306) is OK for the new master. Sat Jul 10:26:59-[info] OK. Sat Jul 10:26:59-[info] 192.168.0.50 (192.168.0.50:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing change MASTER to a dummy host. Sat Jul 10:26:59-[info] 192.168.0.50 (192.168.0.50:3306): Resetting slave pointing to the dummy host.Sat Jul 10:26:59-[INFO] * * Phase 1:configuration Check Phase completed.Sat Jul 10:26:59-[INFO]Sat Jul 10:26:59-[INFO] * Phase 2:rejecting updates Phase.Sat Jul 10:26:59-[INFO] Sat 10:26:59-[WARNING] master_ip_online_change_script is not defined. Skipping disabling writes on the current master. Sat Jul 10:26:59-[info] Locking all tables on the orig master to reject updates from everybody (including root): Sat Jul 10:26:59-[info] executing FLUSH TABLES with READ LOCK. Sat Jul 10:26:59-[info] OK. Sat Jul 10:26:59-[info] Orig master Binlog:pos is Mysql-bin.000010:120.sat Jul 10:26:59-[INFO] Waitin G to execute all relay logs on 192.168.0.60 (192.168.0.60:3306): Sat Jul 10:27:00-[info] master_pos_wait (mysql-bin.000010:120) completed on 192.168.0.60 (192.168.0.60:3306). Executed 0 events. Sat Jul 10:27:00-[info] done. Sat Jul 10:27:00-[info] Getting new master ' s binlog name and position. Sat Jul 10:27:00-[info] Mysql-bin.000008:239sat Jul 10:27:00-[info] All other slaves should start re Plication from here. Statement ShoUld be:change MASTER to master_host= ' 192.168.0.60 ', master_port=3306, master_log_file= ' mysql-bin.000008 ', Master_log_ pos=239, master_user= ' repl ', master_password= ' xxx ';  Sat Jul 10:27:00-[INFO]Sat Jul 10:27:00-[INFO] * Switching slaves in parallel.Sat Jul 10:27:00-[INFO] Sat 10:27:00-[INFO] Unlocking all tables on the orig Master:sat Jul 16 10: 27:00-[INFO] executing UNLOCK TABLES. Sat Jul 10:27:00-[info] OK. Sat Jul 10:27:00-[info] starting orig Master as a new slave. Sat Jul 10:27:00-[info] resetting slave 192.168.0.50 (192.168.0.50:3306) and starting replication from the new M Aster 192.168.0.60 (192.168.0.60:3306). Sat Jul 10:27:00-[info] Executed change MASTER. Sat Jul 10:27:00-[info] Slave started. Sat Jul 10:27:00-[info] all new slave servers switched successfully. Sat Jul 10:27:00-[INFO]Sat Jul 10:27:00-[INFO] * Phase 5:new master cleanup Phase.Sat Jul 10:27:00-[INFO] Sat 10:27:00-[INFO] 192.168.0.60:resetting slave info succeeded. Sat Jul 10:27:00-[info] Switching master to 192.168.0.60 (192.168.0.60:3306) completed successfully. [Email protected] app1]#

After switching, note the order of the master and slave servers in the configuration file (--conf=/data/masterha/app1/app1.cnf), MHA may be the default first server to master

MHA Online Switching process

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.