MySQL master database High Availability-Dual-master single-active failover Solution

Last Update:2016-06-27 Source: Internet

Author: User

Tags egrep

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

MySQL master database High Availability-Dual-master single-active failover Solution

Preface: (PS: Preface was added later when this article was modified.) Some bloggers raised some questions and opinions about this article and paid attention to it, it also prompted me to make this set of things more practical and safer. Later I thought about it and made some changes to some conditions and behaviors in the script. After several modifications, I finally dared to let my friends use this set of things.

Main purpose:

The dual-master architecture and keepalived are used to solve single point of failure (spof) of the master database in the MySQL master-slave structure. specific query statements are used to provide more fine-grained and more realistic judgment on the availability of the master database.

Basic Ideas:
Make DB1 and DB2 active-passive mode Dual-master structure: DB1 active and DB2 passive. Use the keepalived VIP to set the VIP address to the IP address of the original db1. this ensures that the transformation process is transparent to the code.
Three prerequisites:

"Log_slave_updates = 1" must be added to the configuration files of the two MySQL instances ";
The "slave" uses the "read_only" parameter to implement read-only features other than the root user;
Create the test. test table in the two databases and insert several pieces of data for the test script.
Normally, the VIP address is in DB1 and uses the keepalived call script to regularly check mysql service availability (a low-Permission user connects to the mysql server and executes a simple query to determine whether mysql is available based on the returned results)

If the query cannot be executed:

1. Check the service status after the first failed Detection ,:

If the service is abnormal, execute the switch: Disable keepalived of DB1 to make the VIP drift to DB2. Through the notify_master mechanism of keepalived on DB2, trigger the script to remove mysql from the passive state (read-only) switch to the active status (read/write) and send a notification email.
If the service is normal (the monitoring may fail due to some temporary factors), wait 30 s for the second check, which is the tolerable time for the check failure caused by transient/short-term factors, in line with the principle of "can be unrealistic. If the second check still fails

2. Start to execute a series of switching actions

Set MySQL of DB1 to read_only mode (to prevent writing, please continue to enter)
Kill the thread of the current client. I was worried that the kill thread would affect Data Execution. Later I checked the official document "mysql shutdown process" and found that one step of the normal mysql shutdown process was also like this, so here you can rest assured. Then sleep 2 and give the kill command some time (for more information about the kill command mechanism, refer to the official explanation)
Disable keepalived of DB1 so that DB2 takes over the VIP. Through the keepalived policy_master mechanism on DB2, the script is triggered
Mysql switches from passive (read-only) to active (read/write) and sends a notification email.

3. After the Administrator fixes DB1, the master database is switched back to db1. The script idea is as follows:

NOTE: If switching between the master and slave nodes is involved, there will be an interruption time. Therefore, we recommend that you perform this step during off-peak periods.

Set the read_only attribute of DB2 to 1.
Kill the client thread on DB2 and restart keepalived of DB2 to make VIP drift to DB1.
Make sure DB1 keeps up with DB2 updates and removes the read_only attribute on DB1.

About "data consistency" and "switching time ":

After two consecutive failures, you can set the read_only attribute for the primary MySQL and kill the user thread to ensure that no write operation is performed on DB1 before DB2 takes over the service, so as to avoid master-slave data inconsistency. The switching time is basically deterministic:

30 s (two detection intervals) + 2 s (wait for the kill command time) + about 1 s (keepalived switch VIP), the total time will not exceed 35 s.

The above is the general idea. You can see the following script for specific implementation.

Keepalived configuration on DB1

! Configuration File for keepalived

Vrrp_script chk_mysql {
Script "/etc/keepalived/check_mysql.sh"
Interval 30 # Here, my Check interval Settings are relatively long, because we have redis cache in front of our database, and the interruption of the database within one or two minutes has little impact on the overall availability. This is also the "reassurance" that I did not adopt a mature solution and made it myself"
}
Vrrp_instance VI_1 {
State BACKUP # the following priority is used to distinguish between the MASTER and BACKUP. Only in this way can the nopreempt under be effective.
Interface em2
Virtual_router_id 51
Priority100
Advert_int 1
Nopreempt # prevent automatic switch back to the master database after the master keepalived recovers
Authentication {
Auth_type PASS
Auth_pass 1111
}
Track_script {
Chk_mysql
}

Virtual_ipaddress {
192.168.1.5/24
}
}

The content of the/etc/keepalived/check_mysql.sh script is as follows (the main judgment logic is here)
#! /Bin/sh

### Determine if the last check script has not been executed, exit this execution
If ['ps-ef | grep-w "$0" | grep "/bin/sh *" | grep "? "| Grep "? "| Grep-v" grep "| wc-l '-gt 2]; then # theoretically it should be 1, but the experiment result is 2.
Exit 0
Fi

Alias mysql_con = 'mysql-uxxxx-pxxx'

### Define a function to determine whether mysql is available
Function excute_query {
Mysql_con-e "select * from test. test;" 2>/etc/keepalived/logs/check_mysql.err
}

### Define the processing function when the query cannot be executed and the mysql service is abnormal
Function service_error {
Echo-e "'date" + % F % H: % M: % S "'----- mysql service error, now stop keepalived -----">/etc/keepalived/logs/check_mysql.err
/Sbin/service keepalived stop &>/etc/keepalived/logs/check_mysql.err
Echo-e "\ n @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ \ n ">/etc/keepalived/logs/check_mysql.err
}

### Define normal functions for mysql services that cannot perform queries
Function query_error {
Echo-e "'date" + % F % H: % M: % S "'----- query error, but mysql service OK, retry after 30 s ----- ">/etc/keepalived/logs/check_mysql.err
Sleep 30
Excute_query
If [$? -Ne 0]; then
Echo-e "'date" + % F % H: % M: % S "'----- still can't execute query -----">/etc/keepalived/logs/check_mysql.err

### Set the read_only attribute for DB1
Echo-e "'date" + % F % H: % M: % S "'----- set read_only = 1 on DB1 -----">/etc/keepalived/logs/check_mysql.err
Mysql_con-e "set global read_only = 1;" 2>/etc/keepalived/logs/check_mysql.err

### Kill the current client connection
Echo-e "'date" + % F % H: % M: % S "'----- kill current client thread -----">/etc/keepalived/logs/check_mysql.err
Rm-f/tmp/kill. SQL &>/dev/null
### Here is actually a tips for killing threads in batches
Mysql_con-e 'select concat ("kill", id, ";") from information_schema.PROCESSLIST where command = "Query" or command = "Execute" into outfile "/tmp/kill. SQL ";'
Mysql_con-e "source/tmp/kill. SQL"
Sleep 2 ### give kill an execution and buffer time
### Disable keepalived on the local machine
Echo-e "'date" + % F % H: % M: % S "'----- stop keepalived -----">/etc/keepalived/logs/check_mysql.err
/Sbin/service keepalived stop &>/etc/keepalived/logs/check_mysql.err
Echo-e "\ n @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ \ n ">/etc/keepalived/logs/check_mysql.err
Else
Echo-e "'date" + % F % H: % M: % S "'----- query OK after 30 s -----">/etc/keepalived/logs/check_mysql.err
Echo-e "\ n @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ \ n ">/etc/keepalived/logs/check_mysql.err
Fi
}

### Check Start: query execution
Excute_query
If [$? -Ne 0]; then
/Sbin/service mysql status &>/dev/null
If [$? -Ne 0]; then
Service_error
Else
Query_error
Fi
Fi

Keepalived configuration on DB2:
! Configuration File for keepalived

Vrrp_instance VI_1 {
State BACKUP
Interface em2
Virtual_router_id 51
Priority 90
Advert_int 1
Authentication {
Auth_type PASS
Auth_pass 1111
}
Notify_master/etc/keepalived/notify_master_mysql.sh # This Command tells keepalived to execute the script after it turns itself into a MASTER.
Virtual_ipaddress {
192.168.1.5/24
}
}

/Etc/keepalived/yy_master_mysql.sh script content:
#! /Bin/bash
### When keepalived detects that the local machine is switched to the MASTER state, execute this script

Change_log =/etc/keepalived/logs/state_change.log
Alias mysql_con = 'mysql-uroot-pxxxx-e "show slave status \ G;" 2>/dev/null'

Echo-e "'date" + % F % H: % M: % S "'----- keepalived change to MASTER -----" >>$ change_log

Slave_info (){
### Define a function to obtain the position, running, and log_file information of slave.
### Determine the data to be obtained based on the parameters following the Function
If [$1 = slave_status]; then
Slave_stat = 'mysql _ con | egrep-w "Slave_IO_Running | Slave_ SQL _Running "'
Slave_IO_Running = 'echo $ slave_stat | awk '{print $2 }''
Slave_ SQL _Running = 'echo $ slave_stat | awk '{print $4 }''
Elif [$1 = log_file-a $2 = pos]; then
Log_file_pos = 'mysql _ con | egrep-w "Master_Log_File | Read_Master_Log_Pos | Exec_Master_Log_Pos "'
Master_Log_File = 'echo $ log_file_pos | awk '{print $2 }''
Read_Master_Log_Pos = 'echo $ log_file_pos | awk '{print $4 }''
Exec_Master_Log_Pos = 'echo $ log_file_pos | awk '{print $6 }''
Fi
}

Action (){
### The action that should be performed during the switchover
Echo-e "'date" + % F % H: % M: % S "'----- set read_only = 0 on DB2 -----" >>$ change_log

### Remove the read_only attribute
Mysql_con-e "set global read_only = 0;" 2 >>$ change_log

Echo "DB2 keepalived is converted to the MASTER state, and the online database is switched to DB2" |/bin/mailx-s "DB2 keepalived change to MASTER "\
Lijiankai@dm.com 2> $ change_log

Echo-e "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @ \ n ">>$ change_log
}

Slave_info slave_status
If [$ Slave_IO_Running = Yes-a $ Slave_ SQL _Running = Yes]; then
I = 0 # A counter
Slave_info log_file pos
### Determine whether all the binlogs received from the master are executed locally (it is still unable to completely determine whether the slave database has caught up with the master database, because io_thread cannot be completely guaranteed without delay (the probability of slave database lagging behind caused by network transmission problems is very small)
Until [$ Read_Master_Log_Pos = $ Exec_Master_Log_Pos]
Do
If [$ I-lt 10]; then # The time limit for waiting for exec_pos to catch up with read_pos is 10 s.
Echo-e "'date" + % F % H: % M: % S "'----- Master_Log_File = $ Master_Log_File. Exec_Master_Log_Pos ($ Exec_Master_Log_Pos) is behind Read_Master_Lo
G_Pos ($ Read_Master_Log_Pos), wait... "> $ change_log # output the message to the log, waiting for exec_pos = read_pos
I = $ ($ I + 1 ))
Sleep 1
Slave_info log_file pos
Else
Echo-e "The waits time is more than 10 s, now force change. Master_Log_File = $ Master_Log_File Read_Master_Log_Pos = $ Read_Master_Log_Pos Exec_Ma
Ster_Log_Pos = $ Exec_Master_Log_Pos "> $ change_log
Action
Exit 0
Fi
Done
Action

Else
Slave_info log_file pos
Echo-e "DB2's slave status is wrong, now force change. Master_Log_File = $ Master_Log_File Read_Master_Log_Pos = $ Read_Master_Log_Pos Exec_Master_Log_Po
S = $ Exec_Master_Log_Pos "> $ change_log
Action
Fi

The script change_to_backup.sh for manually switching back to DB1 on DB2:
#! /Bin/sh
### Manually switch the master database back to DB1

Alias mysql_con = 'mysql-uxxxx-pxxx'

Echo-e "'date" + % F % H: % M: % S "'----- change to BACKUP manually -----">/etc/keepalived/logs/state_change.log
Echo-e "'date" + % F % H: % M: % S "'----- set read_only = 1 on DB2 -----">/etc/keepalived/logs/state_change.log
Mysql_con-e "set global read_only = 1;" 2>/etc/keepalived/logs/state_change.log

### Kill the current client connection
Echo-e "'date" + % F % H: % M: % S "'----- kill current client thread -----">/etc/keepalived/logs/state_change.log
Rm-f/tmp/kill. SQL &>/dev/null
### Here is actually a tips for killing threads in batches
Mysql_con-e 'select concat ("kill", id, ";") from information_schema.PROCESSLIST where command = "Query" or command = "Execute" into outfile "/tmp/kill. SQL ";'
Mysql_con-e "source/tmp/kill. SQL" 2>/etc/keepalived/logs/state_change.log
Sleep 2 ### give kill an execution and buffer time

### Make sure DB1 has caught up. The repl below is the account used for replication, and-h is followed by the Intranet IP address of DB1
Pos = 'mysql-urepl-pxxxx-h192.168.1.x-e "show slave status \ G;" | grep "Master_Log_Pos" | awk '{printf ("% s ", $ NF "\ t ")}''
Read_pos = 'echo $ pos | awk '{print $1 }''
Exec_pos = 'echo $ pos | awk '{print $2 }''
Until [$ read_pos = $ exec_pos]
Do
Echo-e "'date" + % F % H: % M: % S "'----- DB1 Exec_Master_Log_Pos ($ exec_pos) is behind Read_Master_Log_Pos ($ read_pos ), wait ...... ">/etc/keepalived/logs/state_change.log
Sleep 1
Done

### Remove the read_only attribute of DB1
Echo-e "'date" + % F % H: % M: % S "'----- set read_only = 0 on DB1 -----">/etc/keepalived/logs/state_change.log
Ssh 192.168.1.x 'mysql-uxxxx-pxxxx-e "set global read_only = 0;" &/etc/init. d/keepalived start '2>/etc/keepalived/logs/state_change.log

### Restart keepalived of DB2 to make VIP drift to DB1
Echo-e "'date" + % F % H: % M: % S "'----- make VIP move to DB1 -----">/etc/keepalived/logs/state_change.log
/Sbin/service keepalived restart &>/etc/keepalived/logs/state_change.log

Echo "DB2 keepalived to BACKUP state, online database switch to DB1" |/bin/mailx-s "DB2 keepalived change to BACKUP" xxx@xxxx.com 2>/etc/keepalived/logs/state_change.log

Echo-e "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @ \ n ">/etc/keepalived/logs/state_change.log

Logs:
DB1 mysql service fault:

DB1 mysql service is normal and query failed:

One DB2 switchover process:

DB2 executes the script to manually switch back to DB1:

Summary: Compared with MHA or MMM technologies, MMM features simplicity and reduces the complexity of implementation and maintenance. It also securely solves the single point of failure (spof) of master nodes in the master and slave nodes. On this basis, the architecture such as read/write splitting can also be added from the slave database. The disadvantage is that the dual master is still single-active, and DB2 is only used as hot backup.

This article permanently updates the link address:

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More