Summary: Masterha_master_switch is integrated as a switch tool in the MHA package,
Install: After compiling and installing MHA Manager, binary executable program Masterha_master_switch will be generated in/usr/local/bin/.
Use:
$masterha_master_switch --helpUsage: # For master failover masterha_master_switch --master_state=dead --global_conf=/etc/masterha_default.cnf --conf=/usr/local/masterha/conf/app1.cnf --dead_master_host=host1 # For online master switch masterha_master_switch --master_state=alive --global_conf=/etc/masterha_default.cnf --conf=/usr/local/masterha/conf/app1.cnf See online reference (http://code.google.com/p/mysql-master-ha/wiki/masterha_master_switch) for details.
Here, I'm used to putting a master-slave configuration in app1.cnf and changing it to a business-related name, such as Mainbusiness.cnf
Analysis:
目标:获取masterha_master_switch的在线切换逻辑环境:MHA manager 192.168.1.8 MHA node1+MySQL5.7+GTID 192.168.1.109+PORT3109 主 MHA node1+MySQL5.7+GTID 192.168.1.110+PORT3110 从
Configuration file Contents:
Manager_workdir=/data/mha/mainbusiness #设置MHA的工作目录manager_log =/data/mha/mainbusiness/manager.log # MHA Manager's log output remote_workdir=/data/mha/#预设MHA node-side working directory master_binlog_dir=/data/mysql/310 9/log/,/data/mysql/3110/log/#预设MHA The Binlog directory on the node side #secondary_check_script = Masterha_secondary_check-s 192.168.1.109-s 192.168.1.110 secondary_check_script= masterha_secondary_check-s 192.168.1.109-s 192.168. 1.110--user=root--master_host=192.168.1.109--master_port=3109 ping_interval=1 Detection interval for #设置MHA Manager (1 seconds) [server1]hostname=mysql-cent7-ip001109ip=192.168.1.109port=3109ssh_user=rootssh_port= 22candidate_master=1 #设置该节点是否可以提升为主, 1 Yes, 0 no check_repl_delay=0 #发生故障后是否检查本实例主从落后程度, 0 No,1 is [server2]hostname=mysql-cent7-ip001110ip=192.168.1.110port=3110ssh_user=rootssh_port=22candidate_master=1 #设置该节点是否可以提升为主, 1 is yes, 0 no check_repl_delay=0 #发生故障后是否检查本实例主从落后程度, 0 no, 1 is Execute on MHA Manager: $masterha _master_switch--master_state=alive--conf=/etc/mha/mainbusiness.cnf--orig_master_is_ New_slave#--master_state indicates that the online switch #--orig_master_is_new_slave specifies that the original master be hung from the library to the new Lord
MHA Manager-side output is as follows
#################### #输出段1 ###########################[info] mha::masterrotate version 0.57. [INFO] Starting online master switch: #开始在线切换 [INFO] [INFO] * Phase 1:configuration Check Phase. #阶段1, check the configuration [info] [WARNING] Global configuration file/etc/masterha_default.cnf not found. Skipping. #我这里没有使用全局参数文件, there will be error skipped, it's OK [info] Reading application default configuration from/etc/mha/mainbusiness.cnf. #程序从mainBusiness文件中读取配置 [INFO] Reading server configuration From/etc/mha/mainbusi Ness.cnf. [INFO] GTID failover mode = 1 #启用GTID故障转移模式 [INFO] current Alive master:mysql-cent7-ip001109 (192.168.1.109:3109) #列出当前存活的主实例 [INFO] Alive Slaves: #列出当前存活的从实例 [info] mysql-cent7-ip001110 (192.168.1.110:3110) Vers Ion=5.7.19-log (oldest major version between slaves) Log-bin:enabled[info] GTID on #从采用了GTID mode [INFO] replicating from 192.168.1.109 (192.168.1.109:3109) [info] Primary candidate for the new Master (Candidate_master is set) It's better to execute FLUSH no_write_to_binlog On the master before switching. Is it OK to execute on mysql-cent7-ip001109 (192.168.1.109:3109)? (yes/no): #处于事务一致性考虑, the program asks whether to temporarily close the non-transactional table
The original master,3109 end is displayed as follows
#####################输出段1###########################SET wait_timeout=86400; #设置连接超时时间,防止切换时翻车SELECT @@global.server_id As Value; SELECT VERSION() AS Value; #获取用于复制的server—idSELECT @@global.gtid_mode As Value; #获取自身是否使用了GTID复制SHOW GLOBAL VARIABLES LIKE ‘log_bin‘; #检查自身是否启用了binlogSHOW MASTER STATUS; #获取自身的事务执行情况SELECT @@global.datadir AS Value; #获取自身数据文件的存储位置SELECT @@global.slave_parallel_workers AS Value; #检查是否采用了多线程复制SHOW SLAVE STATUS; #获取自身作为从库时的事务执行情况SELECT @@global.read_only As Value; #获取自身是否开启了只读SELECT @@global.relay_log_purge As Value; #检查自身是否开启了relay log自动清除
The original slave,3110 end is displayed as follows
SELECT @@global.server_id As Value;SELECT VERSION() AS Value;SELECT @@global.gtid_mode As Value;SHOW GLOBAL VARIABLES LIKE ‘log_bin‘;SHOW MASTER STATUS;SELECT @@global.datadir AS Value;SELECT @@global.slave_parallel_workers AS Value;SHOW SLAVE STATUS;SELECT @@global.read_only As Value;SELECT @@global.relay_log_purge As Value;SELECT @@global.relay_log_info_repository AS Value; #差异处,获取自身relay信息保存形式(table)SELECT Relay_log_name FROM mysql.slave_relay_log_info; #差异处,获取正在使用的relay文件名称SELECT @@global.datadir AS Value; SHOW SLAVE STATUS;SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = ‘repl‘; #差异处,检查复制用户是否具有复制权限
The first part summarizes:
读取配置文件,确认主从关系与复制方式; 根据主从关系复制方式,连接主库:设置必要参数,获取的复制详细信息/ 连接从库:获取复制的详细信息,获取relay信息,获取repl账号并确认权限
After the MHA manager side output confirms the input Yes
The MHA manager side output is as follows:
#################### #输出段2 ###########################[info] executing FLUSH no_write_to_binlog TABLES. This could take a long time. [INFO] OK. #注意, this shows that the non-transactional table is closed, but there is no related statement in the above grab statement [info] Checking MHA is not monitoring or doing failover. #检查MHA是否工作, the switch requires MHA Manager to stop running [info] Checking replication Health on mysql-cent7-ip001110. #检查主从健康程度 [INFO] OK. [INFO] Searching new master from slaves. #开始在从库中选取一个新主 [INFO] candidate Masters from the configuration file: #列出候选从库 [info] mysql-cent7-ip001109 (192.168.1.1 09:3109) Version=5.7.19-log Log-bin:enabled[info] GTID on #检查GTID开启情况 [INFO] mysql-cent7-ip001110 (192.168.1.110:3110) version=5.7.19-log (oldest major Version between slaves) log-bin:enabled[ Info] GTID on #检查GTID开启情况 [info] replicating from 192.168.1.109 (192.168.1. 109:3109) [INFO] Primary candidate for the new Master (Candidate_master is set) [info] non-Candidate Masters:[info] searching from candidate_master slaves which has received the latest relay log events: [INFO] #在所有从库中选取relay log The latest one as the new Main library from:mysql-cent7-ip001109 (192.168.1.109:3109) (current master) +-- mysql-cent7-ip001110 (192.168.1.110:3110) to:mysql-cent7-ip001110 (192.168.1.110:3110) (new master) +-- MYSQL-CENT7-IP001109 (192.168.1.109:3109) starting master switch from mysql-cent7-ip001109 (192.168.1.109:3109) to mysql-cent7-ip001110 (192.168.1.110:3110)? (yes/no): #程序询问是否可以进行切换了
The master,3109 end is displayed as follows
#####################输出段2###########################USE `unknown_database`;FLUSH NO_WRITE_TO_BINLOG TABLES; -- 锁非事务表SELECT GET_LOCK(‘MHA_Master_High_Availability_Monitor‘, ‘0‘) AS Value; -- 加一个模拟锁,防止出现切换程序多开的问题,这样多开的程序以为获取不到同名锁就会失败退出,#GET_LOCK(str,time) -- str为锁的名称,0表示持续锁,5.7.5之后可以存在多个名称不同的GET_LOCKSHOW PROCESSLIST;
The original slave,3110 end is displayed as follows
#####################输出段2###########################USE `unknown_database`;SELECT GET_LOCK(‘MHA_Master_High_Availability_Failover‘, ‘0‘) AS Value;SHOW SLAVE STATUS;SHOW SLAVE STATUS;
The second part summarizes:
1.确认可以保存关闭非事务表后,关闭非事务表,加模拟锁防止切换程序多开造成切换异常 2.从候选从库中选出拥有最新数据的从库,并将其设为要切换到的主库
After the MHA manager-side output confirms that it can be switched
The MHA manager side output is as follows:
#####################输出段3###########################[info] Checking whether MySQL-Cent7-IP001110(192.168.1.110:3110) is ok for the new master..[info] ok. #检查上一步中被选定的新主库的是否真正可以成为主[info] MySQL-Cent7-IP001109(192.168.1.109:3109): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host. #没有检查到原来主(现在被作为从)作为从库的残留信息,不管3721先将其挂到一个空主上[info] MySQL-Cent7-IP001109(192.168.1.109:3109): Resetting slave pointing to the dummy host.[info] ** Phase 1: Configuration Check Phase completed.[info] # 配置检查阶段结束[info] * Phase 2: Rejecting updates Phase..[info] #master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO):#切换程序检测到主机虚拟IP地址切换的地址没有定义,如果只切换主从身份的话,应用还会写到原来的主上,需要设置只读。询问是否真的要切换,这里我们只是做切换实验,先观察下两个实例的输出,然后直接切换即可。
The original master,3109 end is displayed as follows
USE `unknown_database`;FLUSH NO_WRITE_TO_BINLOG TABLES; #强制保存并关闭非事务型表,防止事务不一致SELECT GET_LOCK(‘MHA_Master_High_Availability_Monitor‘, ‘0‘) AS Value; SHOW PROCESSLIST;SHOW SLAVE STATUS;CHANGE MASTER TO MASTER_HOST=‘dummy_host‘; #将原主作为从切换到一个莫须有的主机上SHOW SLAVE STATUS;RESET SLAVE /*!50516 ALL */; #尝试清除自己之前残存的slave属性的信息,若本机不为主时,需要!SELECT RELEASE_LOCK(‘MHA_Master_High_Availability_Monitor‘) As Value;
The original slave,3110 end is displayed as follows
USE `unknown_database`;SELECT GET_LOCK(‘MHA_Master_High_Availability_Failover‘, ‘0‘) AS Value;SHOW SLAVE STATUS;SHOW SLAVE STATUS;SHOW PROCESSLIST;
The third part summarizes:
1.检查选定的从库的详细信息,确认真的是否可以作为新的主库 2.处理从库之间的数据比对,若只有一个主和从,则将原主库先挂到一个莫须有的空主上 若有两个以上的从库,则需清理从库原来的slave记录
MHA Manager-Side output confirmation, after forced switchover:
The MHA manager side output is as follows:
#################### #输出段4 ###########################[info] Locking all tables on the orig master to reject updates from EV Erybody (including root): [info] executing FLUSH TABLES with READ LOCK. [INFO] OK. #所有的表都禁止写操作 [INFO] Orig master Binlog:pos is 3109binlog.000070:536. [INFO] Waiting to execute all relay logs on mysql-cent7-ip001110 (192.168.1.110:3110): [INFO] master_pos_wait (3109binlog.000070:536) completed on mysql-cent7-ip001110 (192.168.1.110:3110). Executed 0 events. [INFO] done. #将从库已经获得, but the transaction has not yet been executed to itself, and the other has not been followed from the library [info] Getting new master ' s binlog name and position. [INFO] 3110binlog.000049:536[info] All and slaves should start replication from here. Statement should be:change MASTER to master_host= ' mysql-cent7-ip001110 or 192.168.1.110 ', master_port=3110, Master_auto _position=1, master_user= ' repl ', master_password= ' xxx '; [INFO] Setting read_only=0 on mysql-cent7-ip001110 (192.168.1.110:3110). [INFO] OK. #即将所有其他从库都用change Master's statement switches to the new master, which is the original from 110 [info] [INFO] * Switching slaves in parallel. [INFO] #即将并行切换所有从库 [INFO] Unlocking all tables on the orig master: [info] executing UNLOCK tables. [INFO] OK. #释放原主库的表锁 [INFO] starting orig Master as a new slave. [INFO] resetting slave mysql-cent7-ip001109 (192.168.1.109:3109) and starting replication from the new master mysql-cent7-ip001110 (192.168.1.110:3110): [INFO] Executed change MASTER. [INFO] Slave started. [INFO] All new slave servers switched successfully. [INFO] [INFO] * Phase 5:new master cleanup Phase. [INFO] #即将清理新从库的主从信息, switch to main library [info] mysql-cent7-ip001110:resetting slave info succeeded. [INFO] Switching master to mysql-cent7-ip001110 (192.168.1.110:3110) completed successfully. #切换完成
The new slave,3109 end appears as follows:
#####################输出段4###########################USE ``;SELECT CONNECTION_ID() AS Value;SET wait_timeout=86400;SET GLOBAL read_only=1; #将自己设为只读SHOW MASTER STATUS;UNLOCK TABLES; #释放表锁CHANGE MASTER TO MASTER_HOST=‘192.168.1.110‘, MASTER_PORT=3110, MASTER_USER=‘repl‘, MASTER_PASSWORD=‘123456‘, MASTER_AUTO_POSITION=1; #把自己指向MHA选定的新主START SLAVE;SHOW SLAVE STATUS; #差异处,启动slave身份SELECT RELEASE_LOCK(‘MHA_Master_High_Availability_Failover‘) As Value; #释放锁止同应用的并发锁
The new master,3110 end appears as follows:
#################### #输出段4 ########################## #SHOW SLAVE STATUS; SELECT master_pos_wait (' 3109binlog.000070 ', ' 536 ', 0) as Result; #检查自己是否执行完差异事务STOP SLAVE Sql_thread; #停止从库SQL线程SHOW SLAVE STATUS; SHOW MASTER STATUS; SELECT @ @global. read_only as Value; SELECT @ @global. read_only as Value; SET GLOBAL read_only=0; #取消只读状态USE '; SELECT Unix_timestamp (); SELECT @ @GLOBAL. server_id; SET @master_heartbeat_period = 30000001024; SET @master_binlog_checksum = @ @global. binlog_checksum; SELECT @master_binlog_checksum; SELECT @ @GLOBAL. Gtid_mode; SELECT @@GLOBAL. Server_uuid; SET @slave_uuid = ' 28ea40ab-9bbd-11e7-8cd1-000c29c31069 '; #写入从库的UUIDSTOP SLAVE; #停止自身作为从库的身份SHOW SLAVE STATUS; RESET SLAVE/*!50516 all */; #清除自身作为从库的所有记录SHOW SLAVE STATUS; SELECT release_lock (' Mha_master_high_availability_failover ') as Value; #释放锁止同应用的并发锁
Part IV summarizes:
1.检查选定的新主实例relay日志是否存在未执行完的原主事务,并应用到新主 原主库:锁止,记录主从信息,释放锁,将自己指向新的主,启动从库身份 新主库: 自己执行完未来得及执行完的事务后停止自己从库身份,取消只读,清除自身作为从的记录 最后释放切换程序的并发锁
"MySQL" "High Availability" simple analysis of MHA switching logic from the Masterha_master_switch tool