標籤:including 詳細 沒有 binlog long second getting write 之間
簡介:masterha_master_switch作為一個切換工具被整合在MHA程式包中,
安裝:編譯安裝MHA manager後會在/usr/local/bin/中產生二進位可執行程式masterha_master_switch。
使用:
$masterha_master_switch --helpUsage: # For master failover masterha_master_switch --master_state=dead --global_conf=/etc/masterha_default.cnf --conf=/usr/local/masterha/conf/app1.cnf --dead_master_host=host1 # For online master switch masterha_master_switch --master_state=alive --global_conf=/etc/masterha_default.cnf --conf=/usr/local/masterha/conf/app1.cnf See online reference (http://code.google.com/p/mysql-master-ha/wiki/masterha_master_switch) for details.
在這裡,我習慣將一套主從的配置都放在app1.cnf中,並且更改為業務相關的名稱,如mainBusiness.cnf
分析:
目標:擷取masterha_master_switch的線上切換邏輯環境:MHA manager 192.168.1.8 MHA node1+MySQL5.7+GTID 192.168.1.109+PORT3109 主 MHA node1+MySQL5.7+GTID 192.168.1.110+PORT3110 從
設定檔內容:
manager_workdir=/data/mha/mainBusiness #設定MHA的工作目錄manager_log=/data/mha/mainBusiness/manager.log #MHA manager的日誌輸出remote_workdir=/data/mha/ #預設MHA node端的工作目錄master_binlog_dir= /data/mysql/3109/log/,/data/mysql/3110/log/ #預設MHA node端的binlog目錄 #secondary_check_script= masterha_secondary_check -s 192.168.1.109 -s 192.168.1.110 secondary_check_script= masterha_secondary_check -s 192.168.1.109 -s 192.168.1.110 --user=root --master_host=192.168.1.109 --master_port=3109 ping_interval=1 #設定MHA manager的活動訊號間隔時間(1秒)[server1]hostname=MySQL-Cent7-IP001109ip=192.168.1.109port=3109ssh_user=rootssh_port=22candidate_master=1 #設定該節點是否可以提升為主,1為是,0否check_repl_delay=0 #發生故障後是否檢查本執行個體主從落後程度,0否,1是[server2]hostname=MySQL-Cent7-IP001110ip=192.168.1.110port=3110ssh_user=rootssh_port=22candidate_master=1 #設定該節點是否可以提升為主,1為是,0否check_repl_delay=0 #發生故障後是否檢查本執行個體主從落後程度,0否,1是 在MHA manager端上執行:$masterha_master_switch --master_state=alive --conf=/etc/mha/mainBusiness.cnf --orig_master_is_new_slave#--master_state 指明線上切換#--orig_master_is_new_slave 指定原先的主作為從庫掛到新的主上
MHA manager端輸出如下
#####################輸出段1###########################[info] MHA::MasterRotate version 0.57.[info] Starting online master switch.. #開始線上切換[info] [info] * Phase 1: Configuration Check Phase.. #階段1,檢查配置[info] [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.. #我這裡沒有使用全域參數檔案,會有報錯跳過,沒關係[info] Reading application default configuration from /etc/mha/mainBusiness.cnf.. #程式從mainBusiness檔案中讀取配置 [info] Reading server configuration from /etc/mha/mainBusiness.cnf..[info] GTID failover mode = 1 #啟用GTID容錯移轉模式[info] Current Alive Master: MySQL-Cent7-IP001109(192.168.1.109:3109) #列出當前存活的主執行個體[info] Alive Slaves: #列出當前存活的從執行個體[info] MySQL-Cent7-IP001110(192.168.1.110:3110) Version=5.7.19-log (oldest major version between slaves) log-bin:enabled[info] GTID ON #從採用了GTID模式[info] Replicating from 192.168.1.109(192.168.1.109:3109)[info] Primary candidate for the new Master (candidate_master is set)It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on MySQL-Cent7-IP001109(192.168.1.109:3109)? (YES/no): #處於事務一致性考慮,程式詢問是否臨時關閉非事務型的表
原master,3109端顯示如下
#####################輸出段1###########################SET wait_timeout=86400; #設定連線逾時時間,防止切換時翻車SELECT @@global.server_id As Value; SELECT VERSION() AS Value; #擷取用於複製的server—idSELECT @@global.gtid_mode As Value; #擷取自身是否使用了GTID複製SHOW GLOBAL VARIABLES LIKE ‘log_bin‘; #檢查自身是否啟用了binlogSHOW MASTER STATUS; #擷取自身的事務執行情況SELECT @@global.datadir AS Value; #擷取自身資料檔案的儲存位置SELECT @@global.slave_parallel_workers AS Value; #檢查是否採用了多線程複製SHOW SLAVE STATUS; #擷取自身作為從庫時的事務執行情況SELECT @@global.read_only As Value; #擷取自身是否開啟了唯讀SELECT @@global.relay_log_purge As Value; #檢查自身是否開啟了relay log自動清除
原slave,3110端顯示如下
SELECT @@global.server_id As Value;SELECT VERSION() AS Value;SELECT @@global.gtid_mode As Value;SHOW GLOBAL VARIABLES LIKE ‘log_bin‘;SHOW MASTER STATUS;SELECT @@global.datadir AS Value;SELECT @@global.slave_parallel_workers AS Value;SHOW SLAVE STATUS;SELECT @@global.read_only As Value;SELECT @@global.relay_log_purge As Value;SELECT @@global.relay_log_info_repository AS Value; #差異處,擷取自身relay資訊儲存形式(table)SELECT Relay_log_name FROM mysql.slave_relay_log_info; #差異處,擷取正在使用的relay檔案名稱SELECT @@global.datadir AS Value; SHOW SLAVE STATUS;SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = ‘repl‘; #差異處,檢查複製使用者是否具有複製許可權
第一部分小結:
讀取設定檔,確認主從關係與複製方式; 根據主從關係複製方式,串連主庫:設定必要參數,擷取的複製詳細資料/ 串連從庫:擷取複製的詳細資料,擷取relay資訊,擷取repl帳號並確認許可權
接MHA manager端輸出確認輸入yes後
MHA manager端輸出如下:
#####################輸出段2###########################[info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..[info] ok. #注意,這裡雖然顯示關閉了非事務表,但是上面的抓包語句裡面並沒有出現相關語句[info] Checking MHA is not monitoring or doing failover.. #檢查MHA是否工作,切換時要求MHA manager停止運行[info] Checking replication health on MySQL-Cent7-IP001110.. #檢查主從健康程度[info] ok.[info] Searching new master from slaves.. #開始在從庫中選取一個新主[info] Candidate masters from the configuration file: #列出候選從庫[info] MySQL-Cent7-IP001109(192.168.1.109:3109) Version=5.7.19-log log-bin:enabled[info] GTID ON #檢查GTID開啟情況[info] MySQL-Cent7-IP001110(192.168.1.110:3110) Version=5.7.19-log (oldest major version between slaves) log-bin:enabled[info] GTID ON #檢查GTID開啟情況[info] Replicating from 192.168.1.109(192.168.1.109:3109)[info] Primary candidate for the new Master (candidate_master is set)[info] Non-candidate masters:[info] Searching from candidate_master slaves which have received the latest relay log events..[info] #在所有從庫中選取relay log最新的一個作為新的主庫From:MySQL-Cent7-IP001109(192.168.1.109:3109) (current master) +--MySQL-Cent7-IP001110(192.168.1.110:3110)To:MySQL-Cent7-IP001110(192.168.1.110:3110) (new master) +--MySQL-Cent7-IP001109(192.168.1.109:3109)Starting master switch from MySQL-Cent7-IP001109(192.168.1.109:3109) to MySQL-Cent7-IP001110(192.168.1.110:3110)? (yes/NO): #程式詢問是否可以進行切換了
master,3109端顯示如下
#####################輸出段2###########################USE `unknown_database`;FLUSH NO_WRITE_TO_BINLOG TABLES; -- 鎖非事務表SELECT GET_LOCK(‘MHA_Master_High_Availability_Monitor‘, ‘0‘) AS Value; -- 加一個類比鎖,防止出現切換程式多開的問題,這樣多開的程式以為擷取不到同名鎖就會失敗退出,#GET_LOCK(str,time) -- str為鎖的名稱,0表示持續鎖,5.7.5之後可以存在多個名稱不同的GET_LOCKSHOW PROCESSLIST;
原slave,3110端顯示如下
#####################輸出段2###########################USE `unknown_database`;SELECT GET_LOCK(‘MHA_Master_High_Availability_Failover‘, ‘0‘) AS Value;SHOW SLAVE STATUS;SHOW SLAVE STATUS;
第二部分小結:
1.確認可以儲存關閉非事務表後,關閉非事務表,加類比鎖防止切換程式多開造成切換異常 2.從候選從庫中選出擁有最新資料的從庫,並將其設為要切換到的主庫
接MHA manager端輸出確認可以切換後
MHA manager端輸出如下:
#####################輸出段3###########################[info] Checking whether MySQL-Cent7-IP001110(192.168.1.110:3110) is ok for the new master..[info] ok. #檢查上一步中被選定的新主庫的是否真正可以成為主[info] MySQL-Cent7-IP001109(192.168.1.109:3109): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host. #沒有檢查到原來主(現在被作為從)作為從庫的殘留資訊,不管3721先將其掛到一個空主上[info] MySQL-Cent7-IP001109(192.168.1.109:3109): Resetting slave pointing to the dummy host.[info] ** Phase 1: Configuration Check Phase completed.[info] # 配置檢查階段結束[info] * Phase 2: Rejecting updates Phase..[info] #master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO):#切換程式檢測到主機虛擬IP地址切換的地址沒有定義,如果只切換主從身份的話,應用還會寫到原來的主上,需要設定唯讀。詢問是否真的要切換,這裡我們只是做切換實驗,先觀察下兩個執行個體的輸出,然後直接切換即可。
原master,3109端顯示如下
USE `unknown_database`;FLUSH NO_WRITE_TO_BINLOG TABLES; #強制儲存並關閉非事務型表,防止事務不一致SELECT GET_LOCK(‘MHA_Master_High_Availability_Monitor‘, ‘0‘) AS Value; SHOW PROCESSLIST;SHOW SLAVE STATUS;CHANGE MASTER TO MASTER_HOST=‘dummy_host‘; #將原主作為從切換到一個莫須有的主機上SHOW SLAVE STATUS;RESET SLAVE /*!50516 ALL */; #嘗試清除自己之前殘存的slave屬性的資訊,若本機不為主時,需要!SELECT RELEASE_LOCK(‘MHA_Master_High_Availability_Monitor‘) As Value;
原slave,3110端顯示如下
USE `unknown_database`;SELECT GET_LOCK(‘MHA_Master_High_Availability_Failover‘, ‘0‘) AS Value;SHOW SLAVE STATUS;SHOW SLAVE STATUS;SHOW PROCESSLIST;
第三部分總結:
1.檢查選定的從庫的詳細資料,確認真的是否可以作為新的主庫 2.處理從庫之間的資料比對,若只有一個主和從,則將原主庫先掛到一個莫須有的空主上 若有兩個以上的從庫,則需清理從庫原來的slave記錄
接MHA manager端輸出確認,強制切換後:
MHA manager端輸出如下:
#####################輸出段4###########################[info] Locking all tables on the orig master to reject updates from everybody (including root):[info] Executing FLUSH TABLES WITH READ LOCK..[info] ok. #所有的表都禁止寫操作[info] Orig master binlog:pos is 3109binlog.000070:536.[info] Waiting to execute all relay logs on MySQL-Cent7-IP001110(192.168.1.110:3110)..[info] master_pos_wait(3109binlog.000070:536) completed on MySQL-Cent7-IP001110(192.168.1.110:3110). Executed 0 events.[info] done.#將從庫已經獲得,但還未來得及執行的事務應用到自身,和其他沒有跟上的從庫[info] Getting new master`s binlog name and position..[info] 3110binlog.000049:536[info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=‘MySQL-Cent7-IP001110 or 192.168.1.110‘, MASTER_PORT=3110, MASTER_AUTO_POSITION=1, MASTER_USER=‘repl‘, MASTER_PASSWORD=‘xxx‘;[info] Setting read_only=0 on MySQL-Cent7-IP001110(192.168.1.110:3110)..[info] ok. #即將所有其他從庫都用change master的語句切換到新的主,即原來的從110上[info] [info] * Switching slaves in parallel..[info] #即將並行切換所有從庫[info] Unlocking all tables on the orig master: [info] Executing UNLOCK TABLES..[info] ok. #釋放原主庫的表鎖[info] Starting orig master as a new slave..[info] Resetting slave MySQL-Cent7-IP001109(192.168.1.109:3109) and starting replication from the new master MySQL-Cent7-IP001110(192.168.1.110:3110)..[info] Executed CHANGE MASTER.[info] Slave started.[info] All new slave servers switched successfully.[info] [info] * Phase 5: New master cleanup phase..[info] #即將清理新從庫的主從資訊,切換到主庫[info] MySQL-Cent7-IP001110: Resetting slave info succeeded.[info] Switching master to MySQL-Cent7-IP001110(192.168.1.110:3110) completed successfully. #切換完成
新slave,3109端顯示如下:
#####################輸出段4###########################USE ``;SELECT CONNECTION_ID() AS Value;SET wait_timeout=86400;SET GLOBAL read_only=1; #將自己設為唯讀SHOW MASTER STATUS;UNLOCK TABLES; #釋放表鎖CHANGE MASTER TO MASTER_HOST=‘192.168.1.110‘, MASTER_PORT=3110, MASTER_USER=‘repl‘, MASTER_PASSWORD=‘123456‘, MASTER_AUTO_POSITION=1; #把自己指向MHA選定的新主START SLAVE;SHOW SLAVE STATUS; #差異處,啟動slave身份SELECT RELEASE_LOCK(‘MHA_Master_High_Availability_Failover‘) As Value; #釋放鎖止同應用的並發鎖
新master,3110端顯示如下:
#####################輸出段4###########################SHOW SLAVE STATUS;SELECT MASTER_POS_WAIT(‘3109binlog.000070‘,‘536‘,0) AS Result; #檢查自己是否執行完差異事務STOP SLAVE SQL_THREAD; #停止從庫SQL線程SHOW SLAVE STATUS; SHOW MASTER STATUS; SELECT @@global.read_only As Value; SELECT @@global.read_only As Value; SET GLOBAL read_only=0; #取消唯讀狀態USE ``;SELECT UNIX_TIMESTAMP(); SELECT @@GLOBAL.SERVER_ID; SET @master_heartbeat_period= 30000001024; SET @master_binlog_checksum= @@global.binlog_checksum; SELECT @master_binlog_checksum; SELECT @@GLOBAL.GTID_MODE; SELECT @@GLOBAL.SERVER_UUID; SET @slave_uuid= ‘28ea40ab-9bbd-11e7-8cd1-000c29c31069‘; #寫入從庫的UUIDSTOP SLAVE; #停止自身作為從庫的身份SHOW SLAVE STATUS; RESET SLAVE /*!50516 ALL */; #清除自身作為從庫的所有記錄SHOW SLAVE STATUS;SELECT RELEASE_LOCK(‘MHA_Master_High_Availability_Failover‘) As Value; #釋放鎖止同應用的並發鎖
第四部分總結:
1.檢查選定的新主執行個體relay日誌是否存在未執行完的原主事務,並應用到新主 原主庫:鎖止,記錄主從資訊,釋放鎖,將自己指向新的主,啟動從庫身份 新主庫: 自己執行完未來得及執行完的事務後停止自己從庫身份,取消唯讀,清除自身作為從的記錄 最後釋放切換程式的並發鎖
【MySQL】【高可用】從masterha_master_switch工具簡單分析MHA的切換邏輯