"MySQL" "High Availability" simple analysis of MHA switching logic from the Masterha_master_switch tool

Source: Internet
Author: User
Tags failover log log

Summary: Masterha_master_switch is integrated as a switch tool in the MHA package,

Install: After compiling and installing MHA Manager, binary executable program Masterha_master_switch will be generated in/usr/local/bin/.

Use:

$masterha_master_switch --helpUsage:    # For master failover    masterha_master_switch --master_state=dead    --global_conf=/etc/masterha_default.cnf    --conf=/usr/local/masterha/conf/app1.cnf --dead_master_host=host1    # For online master switch    masterha_master_switch --master_state=alive    --global_conf=/etc/masterha_default.cnf    --conf=/usr/local/masterha/conf/app1.cnf    See online reference    (http://code.google.com/p/mysql-master-ha/wiki/masterha_master_switch)    for details.

Here, I'm used to putting a master-slave configuration in app1.cnf and changing it to a business-related name, such as Mainbusiness.cnf

Analysis:

目标:获取masterha_master_switch的在线切换逻辑环境:MHA manager 192.168.1.8       MHA node1+MySQL5.7+GTID 192.168.1.109+PORT3109     主       MHA node1+MySQL5.7+GTID 192.168.1.110+PORT3110     从

Configuration file Contents:

Manager_workdir=/data/mha/mainbusiness #设置MHA的工作目录manager_log =/data/mha/mainbusiness/manager.log # MHA Manager's log output remote_workdir=/data/mha/#预设MHA node-side working directory master_binlog_dir=/data/mysql/310                                                                                       9/log/,/data/mysql/3110/log/#预设MHA The Binlog directory on the node side                                                                    #secondary_check_script = Masterha_secondary_check-s 192.168.1.109-s 192.168.1.110 secondary_check_script= masterha_secondary_check-s 192.168.1.109-s 192.168.                                       1.110--user=root--master_host=192.168.1.109--master_port=3109 ping_interval=1 Detection interval for #设置MHA Manager (1 seconds) [server1]hostname=mysql-cent7-ip001109ip=192.168.1.109port=3109ssh_user=rootssh_port=                                    22candidate_master=1 #设置该节点是否可以提升为主, 1 Yes, 0 no check_repl_delay=0 #发生故障后是否检查本实例主从落后程度, 0 No,1 is [server2]hostname=mysql-cent7-ip001110ip=192.168.1.110port=3110ssh_user=rootssh_port=22candidate_master=1    #设置该节点是否可以提升为主, 1 is yes, 0 no check_repl_delay=0 #发生故障后是否检查本实例主从落后程度, 0 no, 1 is Execute on MHA Manager: $masterha _master_switch--master_state=alive--conf=/etc/mha/mainbusiness.cnf--orig_master_is_ New_slave#--master_state indicates that the online switch #--orig_master_is_new_slave specifies that the original master be hung from the library to the new Lord

MHA Manager-side output is as follows

#################### #输出段1 ###########################[info] mha::masterrotate version 0.57. [INFO]        Starting online master switch: #开始在线切换 [INFO] [INFO] * Phase 1:configuration Check Phase. #阶段1, check the configuration [info] [WARNING] Global configuration file/etc/masterha_default.cnf not found.             Skipping.               #我这里没有使用全局参数文件, there will be error skipped, it's OK [info] Reading application default configuration from/etc/mha/mainbusiness.cnf. #程序从mainBusiness文件中读取配置 [INFO] Reading server configuration From/etc/mha/mainbusi Ness.cnf. [INFO]  GTID failover mode = 1 #启用GTID故障转移模式 [INFO] current Alive master:mysql-cent7-ip001109 (192.168.1.109:3109) #列出当前存活的主实例 [INFO] Alive Slaves: #列出当前存活的从实例 [info] mysql-cent7-ip001110 (192.168.1.110:3110) Vers Ion=5.7.19-log (oldest major version between slaves) Log-bin:enabled[info] GTID on #从采用了GTID mode [INFO] replicating from 192.168.1.109 (192.168.1.109:3109) [info] Primary candidate for the new Master (Candidate_master is set) It's better to execute FLUSH no_write_to_binlog On the master before switching. Is it OK to execute on mysql-cent7-ip001109 (192.168.1.109:3109)? (yes/no): #处于事务一致性考虑, the program asks whether to temporarily close the non-transactional table

The original master,3109 end is displayed as follows

#####################输出段1###########################SET wait_timeout=86400;                                #设置连接超时时间,防止切换时翻车SELECT @@global.server_id As Value;                    SELECT VERSION() AS Value;                             #获取用于复制的server—idSELECT @@global.gtid_mode As Value;                    #获取自身是否使用了GTID复制SHOW GLOBAL VARIABLES LIKE ‘log_bin‘;                  #检查自身是否启用了binlogSHOW MASTER STATUS;                                    #获取自身的事务执行情况SELECT @@global.datadir AS Value;                      #获取自身数据文件的存储位置SELECT @@global.slave_parallel_workers AS Value;       #检查是否采用了多线程复制SHOW SLAVE STATUS;                                     #获取自身作为从库时的事务执行情况SELECT @@global.read_only As Value;                    #获取自身是否开启了只读SELECT @@global.relay_log_purge As Value;              #检查自身是否开启了relay log自动清除

The original slave,3110 end is displayed as follows

SELECT @@global.server_id As Value;SELECT VERSION() AS Value;SELECT @@global.gtid_mode As Value;SHOW GLOBAL VARIABLES LIKE ‘log_bin‘;SHOW MASTER STATUS;SELECT @@global.datadir AS Value;SELECT @@global.slave_parallel_workers AS Value;SHOW SLAVE STATUS;SELECT @@global.read_only As Value;SELECT @@global.relay_log_purge As Value;SELECT @@global.relay_log_info_repository AS Value;     #差异处,获取自身relay信息保存形式(table)SELECT Relay_log_name FROM mysql.slave_relay_log_info;  #差异处,获取正在使用的relay文件名称SELECT @@global.datadir AS Value;                       SHOW SLAVE STATUS;SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = ‘repl‘;  #差异处,检查复制用户是否具有复制权限

The first part summarizes:

        读取配置文件,确认主从关系与复制方式;        根据主从关系复制方式,连接主库:设置必要参数,获取的复制详细信息/                            连接从库:获取复制的详细信息,获取relay信息,获取repl账号并确认权限

After the MHA manager side output confirms the input Yes

The MHA manager side output is as follows:

#################### #输出段2 ###########################[info] executing FLUSH no_write_to_binlog TABLES. This could take a long time.               [INFO] OK.  #注意, this shows that the non-transactional table is closed, but there is no related statement in the above grab statement [info] Checking MHA is not monitoring or doing failover. #检查MHA是否工作, the switch requires MHA Manager to stop running [info] Checking replication Health on mysql-cent7-ip001110. #检查主从健康程度 [INFO] OK. [INFO]                    Searching new master from slaves. #开始在从库中选取一个新主 [INFO] candidate Masters from the configuration file: #列出候选从库 [info] mysql-cent7-ip001109 (192.168.1.1   09:3109) Version=5.7.19-log Log-bin:enabled[info] GTID on #检查GTID开启情况 [INFO] mysql-cent7-ip001110 (192.168.1.110:3110) version=5.7.19-log (oldest major Version between slaves) log-bin:enabled[ Info] GTID on #检查GTID开启情况 [info] replicating from 192.168.1.109 (192.168.1. 109:3109) [INFO] Primary candidate for the new Master (Candidate_master is set) [info] non-Candidate Masters:[info] searching from candidate_master slaves which has received the latest relay log events: [INFO] #在所有从库中选取relay log The latest one as the new Main library from:mysql-cent7-ip001109 (192.168.1.109:3109) (current master) +-- mysql-cent7-ip001110 (192.168.1.110:3110) to:mysql-cent7-ip001110 (192.168.1.110:3110) (new master) +-- MYSQL-CENT7-IP001109 (192.168.1.109:3109) starting master switch from mysql-cent7-ip001109 (192.168.1.109:3109) to mysql-cent7-ip001110 (192.168.1.110:3110)? (yes/no): #程序询问是否可以进行切换了

The master,3109 end is displayed as follows

#####################输出段2###########################USE `unknown_database`;FLUSH NO_WRITE_TO_BINLOG TABLES;                                         -- 锁非事务表SELECT GET_LOCK(‘MHA_Master_High_Availability_Monitor‘, ‘0‘) AS Value;   -- 加一个模拟锁,防止出现切换程序多开的问题,这样多开的程序以为获取不到同名锁就会失败退出,#GET_LOCK(str,time) -- str为锁的名称,0表示持续锁,5.7.5之后可以存在多个名称不同的GET_LOCKSHOW PROCESSLIST;

The original slave,3110 end is displayed as follows

#####################输出段2###########################USE `unknown_database`;SELECT GET_LOCK(‘MHA_Master_High_Availability_Failover‘, ‘0‘) AS Value;SHOW SLAVE STATUS;SHOW SLAVE STATUS;

The second part summarizes:

        1.确认可以保存关闭非事务表后,关闭非事务表,加模拟锁防止切换程序多开造成切换异常        2.从候选从库中选出拥有最新数据的从库,并将其设为要切换到的主库

After the MHA manager-side output confirms that it can be switched

The MHA manager side output is as follows:

#####################输出段3###########################[info] Checking whether MySQL-Cent7-IP001110(192.168.1.110:3110) is ok for the new master..[info]  ok.   #检查上一步中被选定的新主库的是否真正可以成为主[info] MySQL-Cent7-IP001109(192.168.1.109:3109): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.          #没有检查到原来主(现在被作为从)作为从库的残留信息,不管3721先将其挂到一个空主上[info] MySQL-Cent7-IP001109(192.168.1.109:3109): Resetting slave pointing to the dummy host.[info] ** Phase 1: Configuration Check Phase completed.[info] # 配置检查阶段结束[info] * Phase 2: Rejecting updates Phase..[info] #master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO):#切换程序检测到主机虚拟IP地址切换的地址没有定义,如果只切换主从身份的话,应用还会写到原来的主上,需要设置只读。询问是否真的要切换,这里我们只是做切换实验,先观察下两个实例的输出,然后直接切换即可。

The original master,3109 end is displayed as follows

USE `unknown_database`;FLUSH NO_WRITE_TO_BINLOG TABLES;                                         #强制保存并关闭非事务型表,防止事务不一致SELECT GET_LOCK(‘MHA_Master_High_Availability_Monitor‘, ‘0‘) AS Value;   SHOW PROCESSLIST;SHOW SLAVE STATUS;CHANGE MASTER TO MASTER_HOST=‘dummy_host‘;                               #将原主作为从切换到一个莫须有的主机上SHOW SLAVE STATUS;RESET SLAVE /*!50516 ALL */;                                             #尝试清除自己之前残存的slave属性的信息,若本机不为主时,需要!SELECT RELEASE_LOCK(‘MHA_Master_High_Availability_Monitor‘) As Value;    

The original slave,3110 end is displayed as follows

USE `unknown_database`;SELECT GET_LOCK(‘MHA_Master_High_Availability_Failover‘, ‘0‘) AS Value;SHOW SLAVE STATUS;SHOW SLAVE STATUS;SHOW PROCESSLIST;

The third part summarizes:

    1.检查选定的从库的详细信息,确认真的是否可以作为新的主库    2.处理从库之间的数据比对,若只有一个主和从,则将原主库先挂到一个莫须有的空主上                            若有两个以上的从库,则需清理从库原来的slave记录

MHA Manager-Side output confirmation, after forced switchover:

The MHA manager side output is as follows:

#################### #输出段4 ###########################[info] Locking all tables on the orig master to reject updates from EV Erybody (including root): [info] executing FLUSH TABLES with READ LOCK.   [INFO] OK. #所有的表都禁止写操作 [INFO] Orig master Binlog:pos is 3109binlog.000070:536.  [INFO] Waiting to execute all relay logs on mysql-cent7-ip001110 (192.168.1.110:3110): [INFO] master_pos_wait (3109binlog.000070:536) completed on mysql-cent7-ip001110 (192.168.1.110:3110). Executed 0 events. [INFO] done. #将从库已经获得, but the transaction has not yet been executed to itself, and the other has not been followed from the library [info] Getting new master ' s binlog name and position. [INFO] 3110binlog.000049:536[info] All and slaves should start replication from here. Statement should be:change MASTER to master_host= ' mysql-cent7-ip001110 or 192.168.1.110 ', master_port=3110, Master_auto _position=1, master_user= ' repl ', master_password= ' xxx '; [INFO] Setting read_only=0 on mysql-cent7-ip001110 (192.168.1.110:3110).   [INFO] OK. #即将所有其他从库都用change Master's statement switches to the new master, which is the original from 110 [info] [INFO] * Switching slaves in parallel. [INFO] #即将并行切换所有从库 [INFO] Unlocking all tables on the orig master: [info] executing UNLOCK tables.   [INFO] OK. #释放原主库的表锁 [INFO] starting orig Master as a new slave.  [INFO] resetting slave mysql-cent7-ip001109 (192.168.1.109:3109) and starting replication from the new master mysql-cent7-ip001110 (192.168.1.110:3110):  [INFO] Executed change MASTER.  [INFO] Slave started. [INFO] All new slave servers switched successfully.        [INFO] [INFO] * Phase 5:new master cleanup Phase. [INFO] #即将清理新从库的主从信息, switch to main library [info] mysql-cent7-ip001110:resetting slave info succeeded. [INFO]              Switching master to mysql-cent7-ip001110 (192.168.1.110:3110) completed successfully. #切换完成

The new slave,3109 end appears as follows:

#####################输出段4###########################USE ``;SELECT CONNECTION_ID() AS Value;SET wait_timeout=86400;SET GLOBAL read_only=1;                           #将自己设为只读SHOW MASTER STATUS;UNLOCK TABLES;                                    #释放表锁CHANGE MASTER TO MASTER_HOST=‘192.168.1.110‘, MASTER_PORT=3110, MASTER_USER=‘repl‘, MASTER_PASSWORD=‘123456‘, MASTER_AUTO_POSITION=1; #把自己指向MHA选定的新主START SLAVE;SHOW SLAVE STATUS;                                #差异处,启动slave身份SELECT RELEASE_LOCK(‘MHA_Master_High_Availability_Failover‘) As Value;   #释放锁止同应用的并发锁

The new master,3110 end appears as follows:

#################### #输出段4 ########################## #SHOW SLAVE STATUS;      SELECT master_pos_wait (' 3109binlog.000070 ', ' 536 ', 0) as Result;                                              #检查自己是否执行完差异事务STOP SLAVE Sql_thread;                                                  #停止从库SQL线程SHOW SLAVE STATUS;                                                 SHOW MASTER STATUS;                                 SELECT @ @global. read_only as Value;                                 SELECT @ @global. read_only as Value;                                             SET GLOBAL read_only=0; #取消只读状态USE ';                                            SELECT Unix_timestamp (); SELECT @ @GLOBAL.                                          server_id;                          SET @master_heartbeat_period = 30000001024;              SET @master_binlog_checksum = @ @global. binlog_checksum;                                     SELECT @master_binlog_checksum; SELECT @ @GLOBAL.                                          Gtid_mode; SELECT @@GLOBAL.                                        Server_uuid;            SET @slave_uuid = ' 28ea40ab-9bbd-11e7-8cd1-000c29c31069 ';                                                         #写入从库的UUIDSTOP SLAVE;                                              #停止自身作为从库的身份SHOW SLAVE STATUS;                                        RESET SLAVE/*!50516 all */; #清除自身作为从库的所有记录SHOW SLAVE STATUS; SELECT release_lock (' Mha_master_high_availability_failover ') as Value; #释放锁止同应用的并发锁

Part IV summarizes:

    1.检查选定的新主实例relay日志是否存在未执行完的原主事务,并应用到新主    原主库:锁止,记录主从信息,释放锁,将自己指向新的主,启动从库身份    新主库:  自己执行完未来得及执行完的事务后停止自己从库身份,取消只读,清除自身作为从的记录    最后释放切换程序的并发锁

"MySQL" "High Availability" simple analysis of MHA switching logic from the Masterha_master_switch tool

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.