MySQL High Availability: Dual-Master Failover Solution

Preface (added when this article was revised for the second time): some readers raised questions and opinions about this article, which I took seriously; they also prompted me to make this setup more practical and safer. After thinking it over, I changed some of the conditions and actions in the scripts. Only after those modifications did I feel comfortable letting friends use this setup.


Main purpose:

Use a dual-master architecture together with keepalived to remove the single point of failure (SPOF) of the master database in a MySQL master-slave setup, and use a specific query statement to make a finer-grained, more realistic judgment about the master database's availability.

Basic Ideas:

Set up DB1 and DB2 as an active-passive dual-master pair: DB1 is active and DB2 is passive. Keepalived manages a VIP, which is set to the IP address originally used by DB1, so the switchover is transparent to the application code.

Three prerequisites:

Add"Log_slave_updates = 1";

And the "backup machine" passes the"Read_only"Parameter to achieve read-only features other than the root user;

Create the test. test table in the two databases and insert several pieces of data for the test script.
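A minimal sketch of what these prerequisites can look like in practice (the my.cnf entries and the test data below are illustrative, not taken from the original environment):

### /etc/my.cnf on both nodes, in the [mysqld] section:
#   log_slave_updates = 1    # each master also writes replicated events into its own binlog
#   read_only = 1            # only on the passive node; does not restrict users with SUPER (e.g. root)

### create the table the check script queries and seed it with a few rows
mysql -uroot -p -e "create database if not exists test;
  create table if not exists test.test (id int primary key);
  insert ignore into test.test values (1),(2),(3);"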

Under normal conditions the VIP sits on DB1. Keepalived periodically calls a check script to test MySQL availability: a low-privilege user connects to the MySQL server and runs a simple query, and the result decides whether MySQL is considered available.
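The article does not show how that check account is created; a minimal, hypothetical example (user name, host, and password are placeholders) might be:

### on both nodes: an account that can only read the table used by the check query
mysql -uroot -p -e "grant select on test.test to 'monitor'@'localhost' identified by 'xxxx';"

Note that the scripts further below reuse a single mysql_con account both for the query and for SET GLOBAL read_only and the batch KILL, which require the SUPER privilege, so the account actually configured there needs more than SELECT.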

If the query cannot be executed:

1. After the first failed check, examine the MySQL service status:

  1. If the service is down, switch immediately: stop keepalived on DB1 so that the VIP drifts to DB2. The notify_master mechanism of keepalived on DB2 then triggers a script that switches MySQL from the passive state (read-only) to the active state (read/write) and sends a notification email.

  2. If the service is still running (the check may have failed because of some transient factor), wait 30 s and check a second time; this is the tolerated window for failures caused by short-lived issues, in the spirit of avoiding a switchover whenever possible. If the second check also fails:

2. Execute a series of switchover actions:

  1. Set MySQL on DB1 to read_only mode (to prevent any further writes from coming in).

  2. Kill the current client threads. I was initially worried that killing threads could affect statements in flight, but the official documentation on the MySQL shutdown process shows that a normal shutdown performs the same step, so this is safe. Then sleep for 2 s to give the kill commands time to complete (see the official documentation for details on how KILL works).

  3. Stop keepalived on DB1 so that DB2 takes over the VIP. The notify_master mechanism of keepalived on DB2 then triggers the script that switches MySQL from the passive state (read-only) to the active state (read/write) and sends a notification email.

3. After the administrator repairs DB1, switch the master role back to DB1. The idea of the script is as follows:

NOTE: switching the master role back involves a brief interruption, so we recommend performing this step during off-peak hours.

  1. Set the read_only attribute of DB2 to 1.

  2. Kill the client threads on DB2 and restart keepalived on DB2 so that the VIP drifts back to DB1.

  3. Make sure DB1 has caught up with all updates from DB2, then remove the read_only attribute on DB1 (see the sketch below).
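Step 3 comes down to comparing Read_Master_Log_Pos with Exec_Master_Log_Pos in SHOW SLAVE STATUS on DB1, exactly as the switch-back script further below does; a standalone sketch of that check (credentials and DB1's address are placeholders) looks like this:

### reports "caught up" only when every binlog event DB1 has read from DB2 has also been applied locally
pos=`mysql -urepl -pxxxx -h192.168.1.x -e "show slave status\G" | grep "Master_Log_Pos" | awk '{printf("%s ",$NF)}'`
read_pos=`echo $pos | awk '{print $1}'`
exec_pos=`echo $pos | awk '{print $2}'`
if [ "$read_pos" = "$exec_pos" ]; then echo "DB1 caught up"; else echo "DB1 still applying relay log"; fi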


About "data consistency" and "switching time ":

After two consecutive check failures, setting the read_only attribute on the primary MySQL and killing the user threads guarantees that no writes land on DB1 before DB2 takes over the service, which avoids data inconsistency between the two masters. The switchover time is essentially deterministic:

30 s (the wait between the two checks) + 2 s (waiting for the kill commands) + about 1 s (keepalived moving the VIP): the total will not exceed 35 s.


That is the general idea; the scripts below show the concrete implementation.

Keepalived configuration on DB1:

! Configuration File for keepalived

vrrp_script chk_mysql {
    script "/etc/keepalived/check_mysql.sh"
    interval 30    # the check interval is deliberately long: Redis sits in front of the database as a cache,
                   # so a one- or two-minute database interruption has little impact on overall availability
}

vrrp_instance VI_1 {
    state BACKUP             # both nodes use BACKUP; the priority below decides who starts as MASTER
    interface em2
    virtual_router_id 51
    priority 100
    advert_int 1
    nopreempt                # after a failover, do not take the VIP back automatically once this keepalived recovers
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_script {
        chk_mysql
    }
    virtual_ipaddress {
        192.168.1.5/24
    }
}


The content of the /etc/keepalived/check_mysql.sh script is as follows (the main decision logic lives here):

#!/bin/sh
### if the previous run of this check script is still executing, skip this round
if [ `ps -ef | grep -w "$0" | grep "/bin/sh" | grep "?" | grep -v "grep" | wc -l` -gt 2 ]; then
    # in theory this count should be 1, but in practice it comes out as 2
    exit 0
fi

### connection helper (a shell function is used so it also works in a non-interactive shell)
mysql_con() { mysql -uxxxx -pxxxx "$@"; }

### the availability check: a simple query against the test table
excute_query() {
    mysql_con -e "select * from test.test;" 2>> /etc/keepalived/logs/check_mysql.err
}

### the query fails and the mysql service itself is down: stop keepalived right away
service_error() {
    echo -e "`date "+%F %H:%M:%S"` ----- mysql service error, now stop keepalived -----" >> /etc/keepalived/logs/check_mysql.err
    /sbin/service keepalived stop &> /etc/keepalived/logs/check_mysql.err
    echo -e "\n@@@@@@@@@@@@@@@@@@@@\n" >> /etc/keepalived/logs/check_mysql.err
}

### the query fails but the mysql service is still running: retry after 30 s
query_error() {
    echo -e "`date "+%F %H:%M:%S"` ----- query error, but mysql service OK, retry after 30s -----" >> /etc/keepalived/logs/check_mysql.err
    sleep 30
    excute_query
    if [ $? -ne 0 ]; then
        echo -e "`date "+%F %H:%M:%S"` ----- still can't execute query -----" >> /etc/keepalived/logs/check_mysql.err

        ### set the read_only attribute on DB1
        echo -e "`date "+%F %H:%M:%S"` ----- set read_only=1 on DB1 -----" >> /etc/keepalived/logs/check_mysql.err
        mysql_con -e "set global read_only=1;" 2>> /etc/keepalived/logs/check_mysql.err

        ### kill the current client connections
        echo -e "`date "+%F %H:%M:%S"` ----- kill current client threads -----" >> /etc/keepalived/logs/check_mysql.err
        rm -f /tmp/kill.sql &> /dev/null
        ### a handy trick for killing threads in batch:
        mysql_con -e 'select concat("kill ",id,";") from information_schema.PROCESSLIST where command="Query" or command="Execute" into outfile "/tmp/kill.sql";'
        mysql_con -e "source /tmp/kill.sql"
        sleep 2    ### give the kill commands some time to take effect

        ### stop keepalived so the VIP drifts to DB2
        echo -e "`date "+%F %H:%M:%S"` ----- stop keepalived -----" >> /etc/keepalived/logs/check_mysql.err
        /sbin/service keepalived stop &> /etc/keepalived/logs/check_mysql.err
        echo -e "\n@@@@@@@@@@@@@@@@@@@@\n" >> /etc/keepalived/logs/check_mysql.err
    else
        echo -e "`date "+%F %H:%M:%S"` ----- query OK after 30s -----" >> /etc/keepalived/logs/check_mysql.err
        echo -e "\n@@@@@@@@@@@@@@@@@@@@\n" >> /etc/keepalived/logs/check_mysql.err
    fi
}

### the check starts here: run the query
excute_query
if [ $? -ne 0 ]; then
    /sbin/service mysql status &> /dev/null
    if [ $? -ne 0 ]; then
        service_error
    else
        query_error
    fi
fi
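Before handing the script over to keepalived, it helps to run it by hand once; a small example, assuming the paths used above (note the log directory must exist):

### the script logs to /etc/keepalived/logs/, so create it first
mkdir -p /etc/keepalived/logs
chmod +x /etc/keepalived/check_mysql.sh
/bin/sh /etc/keepalived/check_mysql.sh; echo "exit code: $?"
tail -n 20 /etc/keepalived/logs/check_mysql.err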


Keepalived configuration on DB2:

! Configuration File for keepalived

vrrp_instance VI_1 {
    state BACKUP
    interface em2
    virtual_router_id 51
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    notify_master /etc/keepalived/yy_master_mysql.sh    # script run when keepalived finds itself promoted to MASTER
    virtual_ipaddress {
        192.168.1.5/24
    }
}

The content of the /etc/keepalived/yy_master_mysql.sh script (triggered by notify_master):

#!/bin/bash
### executed when keepalived detects that this node has switched to the MASTER state
mysql_con() { mysql -uxxxx -pxxxx "$@"; }

echo -e "`date "+%F %H:%M:%S"` ----- keepalived changed to MASTER -----" >> /etc/keepalived/logs/state_change.log

### Wait until all binlog received from the master has been executed locally.
### (This still cannot fully prove that this node has caught up with the master: replication lag is
### usually in the SQL thread, but there is no absolute guarantee that the IO thread has no lag at all.
### At least it ensures that every binlog event already read has been applied locally.)
pos=`mysql_con -e "show slave status\G" | grep "Master_Log_Pos" | awk '{printf("%s\t",$NF)}'`
read_pos=`echo $pos | awk '{print $1}'`
exec_pos=`echo $pos | awk '{print $2}'`
until [ "$read_pos" = "$exec_pos" ]
do
    echo -e "`date "+%F %H:%M:%S"` ----- Exec_Master_Log_Pos is behind Read_Master_Log_Pos, wait... -----" >> /etc/keepalived/logs/state_change.log
    sleep 1
    # re-read the positions so the loop can terminate
    pos=`mysql_con -e "show slave status\G" | grep "Master_Log_Pos" | awk '{printf("%s\t",$NF)}'`
    read_pos=`echo $pos | awk '{print $1}'`
    exec_pos=`echo $pos | awk '{print $2}'`
done

### then remove the read_only attribute
echo -e "`date "+%F %H:%M:%S"` ----- set read_only=0 on DB2 -----" >> /etc/keepalived/logs/state_change.log
mysql_con -e "set global read_only=0;" 2>> /etc/keepalived/logs/state_change.log

echo "DB2 keepalived changed to MASTER, online DB server changed to DB2" | /bin/mailx -s "DB2 keepalived change to MASTER" xxxx@xxxx.com 2>> /etc/keepalived/logs/state_change.log
echo -e "@@@@@@@@@@@@@@@@@@@@\n" >> /etc/keepalived/logs/state_change.log
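A rehearsal step that is not part of the original write-up, but easy to do during a test window: stop keepalived on DB1 and watch DB2 pick up the VIP and run the notify script (interface and VIP as in the configurations above):

### on DB1: simulate the final step of the failure handling
/sbin/service keepalived stop

### on DB2: the VIP should now be bound to em2 and the state change should be logged
ip addr show em2 | grep 192.168.1.5
tail -f /etc/keepalived/logs/state_change.log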

The script change_to_backup.sh for manually switching back to DB1 on DB2:

#!/bin/sh
### manually switch the master database back to DB1 (run on DB2)
mysql_con() { mysql -uxxxx -pxxxx "$@"; }

echo -e "`date "+%F %H:%M:%S"` ----- change to BACKUP manually -----" >> /etc/keepalived/logs/state_change.log

echo -e "`date "+%F %H:%M:%S"` ----- set read_only=1 on DB2 -----" >> /etc/keepalived/logs/state_change.log
mysql_con -e "set global read_only=1;" 2>> /etc/keepalived/logs/state_change.log

### kill the current client connections
echo -e "`date "+%F %H:%M:%S"` ----- kill current client threads -----" >> /etc/keepalived/logs/state_change.log
rm -f /tmp/kill.sql &> /dev/null
### the same trick for killing threads in batch:
mysql_con -e 'select concat("kill ",id,";") from information_schema.PROCESSLIST where command="Query" or command="Execute" into outfile "/tmp/kill.sql";'
mysql_con -e "source /tmp/kill.sql" 2>> /etc/keepalived/logs/state_change.log
sleep 2    ### give the kill commands some time to take effect

### restart keepalived on DB2 so that the VIP drifts back to DB1
echo -e "`date "+%F %H:%M:%S"` ----- make VIP move to DB1 -----" >> /etc/keepalived/logs/state_change.log
/sbin/service keepalived restart &> /etc/keepalived/logs/state_change.log

### make sure DB1 has caught up; repl is the replication account, -h is followed by DB1's intranet IP
pos=`mysql -urepl -pxxxx -h192.168.1.x -e "show slave status\G" | grep "Master_Log_Pos" | awk '{printf("%s\t",$NF)}'`
read_pos=`echo $pos | awk '{print $1}'`
exec_pos=`echo $pos | awk '{print $2}'`
until [ "$read_pos" = "$exec_pos" ]
do
    echo -e "`date "+%F %H:%M:%S"` ----- DB1 Exec_Master_Log_Pos ($exec_pos) is behind Read_Master_Log_Pos ($read_pos), wait... -----" >> /etc/keepalived/logs/state_change.log
    sleep 1
    # re-read the positions so the loop can terminate
    pos=`mysql -urepl -pxxxx -h192.168.1.x -e "show slave status\G" | grep "Master_Log_Pos" | awk '{printf("%s\t",$NF)}'`
    read_pos=`echo $pos | awk '{print $1}'`
    exec_pos=`echo $pos | awk '{print $2}'`
done

### then remove the read_only attribute on DB1
echo -e "`date "+%F %H:%M:%S"` ----- set read_only=0 on DB1 -----" >> /etc/keepalived/logs/state_change.log
ssh -p xxxx 192.168.1.x 'mysql -uxxxx -pxxxx -e "set global read_only=0;"' 2>> /etc/keepalived/logs/state_change.log

echo "DB2 keepalived changed to the BACKUP state, online database switched to DB1" | /bin/mailx -s "DB2 keepalived change to BACKUP" xxx@xxxx.com 2>> /etc/keepalived/logs/state_change.log
echo -e "@@@@@@@@@@@@@@@@@@@@\n" >> /etc/keepalived/logs/state_change.log
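Switching back is then just a matter of running this script on DB2 during an off-peak window and following the log, for example:

/bin/sh /etc/keepalived/change_to_backup.sh
tail -f /etc/keepalived/logs/state_change.log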


Logs:

DB1 mysql service fault:

(screenshot: http://www.68idc.cn/help/uploads/allimg/151111/12103364a-0.jpg)

DB1 mysql service running but the query fails:

(screenshot: http://www.68idc.cn/help/uploads/allimg/151111/1210332V6-1.jpg)

DB2 during one switchover:

(screenshot: http://www.68idc.cn/help/uploads/allimg/151111/12103343B-2.jpg)

DB2 executes the script to manually switch back to DB1:

(screenshot: http://www.68idc.cn/help/uploads/allimg/151111/12103320c-3.jpg)


Summary: This solution is suitable for small and medium-sized enterprises. It removes the single point of failure (SPOF) of the master node in a master-slave architecture, and additional slave databases can still be attached to implement read/write splitting.
