High-availability clusters based on keepalived to build MySQL

Source: Internet
Author: User
Tags db2 switches syslog egrep

MySQL's high-availability scenarios typically include the following:

Keepalived+ Dual master, Mha,mmm,heartbeat+drbd,pxc,galera Cluster

The more commonly used are keepalived+, MHA and PXC.

For small companies, it is generally recommended to use keepalived+ dual master, simple.

Let's deploy it here.

Configure the Environment:

Role host IP host name operating system version software version

VIP 192.168.244.10

Master1 192.168.244.145 master1 CentOS7.1 MySQL 5.6.26,keepalived v1.2. 13

Master2 192.168.244.146 master2 CentOS7.1 MySQL 5.6.26,keepalived v1.2. 13

First, configure the MySQL dual-master replication environment

1. Modify the configuration file

The configuration for replication in Master1 is as follows:

[Mysqld]log-bin=mysql-binserver-id=1log_slave_updates=1

Master2

[Mysqld]log-bin=mysql-binserver-id=2log_slave_updates=1read_only=1

2. Create a replication user

Created in Master1:

CREATE USER ' repl ' @ ' 192.168.244.146 ' identified by ' MySQL '; GRANT REPLICATION SLAVE on * * to ' repl ' @ ' 192.168.244.146 ';

Created in Master2:

CREATE USER ' repl ' @ ' 192.168.244.145 ' identified by ' MySQL '; GRANT REPLICATION SLAVE on * * to ' repl ' @ ' 192.168.244.145 ';

3. Execute change MASTER to statement

Because the MySQL master-slave replication cluster is built from scratch, it is not necessary to obtain a global read lock to get the location of the binary log file, which is confirmed directly according to the output of Show master status.

Execute on Master1:

Change MASTER to  master_host= ' 192.168.244.146 ',  master_user= ' Repl ',  master_password= ' MySQL '  , Master_log_file= ' mysql-bin.000004 ',  master_log_pos=64729;

Execute on Master2:

Change MASTER to  master_host= ' 192.168.244.145 ',  master_user= ' Repl ',  master_password= ' MySQL '  , Master_log_file= ' mysql-bin.000003 ',  master_log_pos=68479;

4. Execute the start slave statement on two nodes and see if the replication is successful through show slave status\g.

Success criteria:

Slave_IO_Running:YesSlave_SQL_Running:Yes

Second, the configuration keepalived

1. Installing keepalived

# yum Install-y keepalived

Of course, you can also directly compile the official source package.

2. Modify the keepalived configuration file

Master1

[Email protected] ~]# vim/etc/keepalived/keepalived.conf

Vrrp_script Chk_mysql {    script "/etc/keepalived/check_mysql.sh"    interval         #设置检查间隔时长, can be set according to their own needs} Vrrp_instance vi_1 {State    BACKUP        #通过下面的priority来区分MASTER和BACKUP, and only so, the bottom nopreempt is effective    interface eno16777736    virtual_router_id    advert_int 1    nopreempt           #防止切换到从库后, Primary keepalived automatically switches back to the main library after recovery    authentication {        auth_type PASS        auth_pass 1111    }    track_script {        Chk_mysql    }         virtual_ipaddress {        192.168.244.10/24    }}

For a detailed description of the parameters of Keepalived, refer to: lvs+keepalived building Mycat High-availability load Balancing cluster

Among them,/etc/keepalived/check_mysql.sh content is as follows:

#!/bin/bash # # #判断如果上次检查的脚本还没执行完, then exit this execution if [' Ps-ef|grep-w ' $ "|grep-v" grep "|wc-l '-gt 2];then exit 0fi mysql_con= ' mysql-uroot-p123456 ' error_log= "/etc/keepalived/logs/check_mysql.err" # # #定义一个简单判断mysql是否可用的函数function Excute_ Query {${mysql_con}-E "SELECT 1;" 2>> ${error_log}} # # #定义无法执行查询, and when the MySQL service exception handler function service_error {echo-e "' Date" +%f%h:%m:%s "'-----My    SQL Service Error,now Stop keepalived-----">> ${error_log} service keepalived stop &>> ${error_log} echo "DB1 keepalived has stopped" |mail-s "DB1 keepalived has stopped, please deal with it in time! "[email protected] 2>> ${error_log} echo-e" \ n--------------------------------------------------------- \ n ">> ${error_log}} # # #定义无法执行查询, but MySQL service normal handler function query_error {echo-e" ' Date "+%f%h:%m:%s" '-----qu ery error, but MySQL service OK, retry after 30s-----">> ${error_log} sleep excute_query if [$?-ne 0 ];then echo-e "' Date" +%f%h:%m:%s "'-----StillCan ' t execute query-----">> ${error_log} # # #对DB1设置read_only属性 echo-e" ' Date "+%f%h:%m:%s" '--- --set read_only = 1 on DB1-----">> ${error_log} mysql_con-e" set global read_only = 1; " 2>> ${error_log} # # #kill掉当前客户端连接 echo-e "' Date" +%f%h:%m:%s "'-----Kill current Client thread--  ---">> ${error_log} rm-f/tmp/kill.sql &>/dev/null # # #这里其实是一个批量kill线程的小技巧 MYSQL_CON-E ' Select Concat ("Kill", ID, ";") from INFORMATION_SCHEMA.        Processlist where command= "Query" or command= "Execute" into outfile "/tmp/kill.sql"; ' Mysql_con-e "Source/tmp/kill.sql" Sleep 2 # # #给kill一个执行和缓冲时间 # # #关闭本机keepalived echo-e "' d Ate "+%f%h:%m:%s" '-----Stop keepalived-----">> ${error_log} service keepalived stop &>> ${er Ror_log} echo "DB1 keepalived has stopped" |mail-s "DB1 keepalived has stopped, please deal with it in time! "[email protected] 2>> ${error_log} echo-E "\ n---------------------------------------------------------\ n" >> ${error_log} else echo-e "' Date" +%f %h:%m:%s "'-----query OK after 30s-----" >> ${error_log} echo-e "\ n------------------------------------ ---------------------\ n ">> ${error_log} fi} # # #检查开始: Execute Query excute_queryif [$?-ne 0];then Service mysqld Stat US &>/dev/null if [$?-ne 0];then service_error Else query_error Fifi

Through specific query statements to determine the availability of the database service, if the query fails, then determine the state of the mysqld process itself, if not normal, then directly stop the keepalived of the current node, the VIP transfer to another node, if normal, then wait 30s, execute the query statement again, Or failed, set the current master node to Read_Only, kill the current client connection, and then stop the current keepalived.

Master2

[Email protected] ~]# vim/etc/keepalived/keepalived.conf

! Configuration File for keepalived vrrp_instance vi_1 {State    BACKUP    interface eno16777736    virtual_router_ ID    advert_int 1    authentication {        auth_type PASS        auth_pass 1111    }    Notify _master/etc/keepalived/notify_master_mysql.sh    #此条指令告诉keepalived发现自己转为MASTER后执行的脚本    virtual_ipaddress {        192.168.244.10/24    }}

Among them, the contents of/etc/keepalived/notify_master_mysql.sh are as follows:

#!/bin/bash## #当keepalived监测到本机转为MASTER状态时, execute the script change_log=/etc/keepalived/logs/state_change.logmysql_con= ' mysql-uroot-p123456 ' echo-e "' Date" +%f%h:%m:%s "'-----keepalived Change to MASTER-----" >> $change _log slave_        Info () {# # #统一定义一个函数取得slave的position, running, and log_file, etc. # #根据函数后面所跟参数来决定取得哪些数据 if [$ = Slave_status];then Slave_stat= ' ${mysql_con}-e "show slave status\g;" | Egrep-w "Slave_io_running| Slave_sql_running "' slave_io_running= ' echo $slave _stat|awk ' {print $} ' slave_sql_running= ' echo $slave _stat |awk ' {print $4} ' elif [$ = log_file-a $ = pos];then log_file_pos= ' ${mysql_con}-e ' show slave status\g; |egrep-w "master_log_file| read_master_log_pos| relay_master_log_file| Exec_master_log_pos "' master_log_file= ' echo $log _file_pos|awk ' {print $} ' read_master_log_pos= ' echo $log _f Ile_pos|awk ' {print $4} ' relay_master_log_file= ' echo $log _file_pos|awk ' {print $6} ' exec_master_log_pos= ' E Cho $log _fIle_pos|awk ' {print $8} ' fi} action () {# # #经判断 ' action should be & can be ' toggled ' echo-e ' date ' +%f%h:%m:%s ' '-----set re ad_only = 0 on DB2-----">> $change _log # # #解除read_only属性 ${mysql_con}-E" set global read_only = 0; " 2>> $change _log echo "DB2 keepalived to master State, online database switches to DB2" |mail-s "DB2 keepalived changes to master" [email& Nbsp;protected] 2>> $change _log echo-e "---------------------------------------------------------\ n" >>        $change _log} slave_info slave_statusif [$Slave _sql_running = Yes];then i=0 #一个计数器 slave_info log_file Pos # # #判断从master接收到的binlog是否全部在本地执行 (it is still not entirely certain that the main library has been chased from the library, because there is no guarantee that the Io_thread has no latency (the probability of falling from the library due to network transmission problems is very small) until [$Master _log_ File = $Relay _master_log_file-a $Read _master_log_pos = $Exec _master_log_pos] do if [$i-lt];then #将等待 Exec_pos catch up with Read_pos time limit is 10s echo-e "' Date" +%f%h:%m:%s "'-----relay_master_log_file= $Relay _master_log_fil e,exec_master_log_pos= $Exec _master_log_pos is behind master_log_file= $Master _log_file,read_master_log_pos= $Read _master_log_pos, wait ... ">> $        Change_Log #输出消息到日志, waiting for Exec_pos=read_pos i=$ (($i + 1)) Sleep 1 slave_info log_file POS Else Echo-e "The waits time is more than 10s,now force change. master_log_file= $Master _log_file read_master_log_pos= $Read _master_log_pos relay_master_log_file= $Relay _master_ Log_file exec_master_log_pos= $Exec _master_log_pos ">> $change _log action exit 0 fi D One action else Slave_info log_file pos echo-e "DB2 ' s slave status is Wrong,now force change. master_log_file= $Master _log_file read_master_log_pos= $Read _master_log_pos relay_master_log_file= $Relay _master_ Log_file exec_master_log_pos= $Exec _master_log_pos ">> $change _log Actionfi

The logic of the whole script is to let from the exec_master_log_pos as far as possible to catch up with the Read_master_log_pos, it gives the 10s limit, if still did not catch up, then directly set the Master2 (by lifting the Read_Only property), In fact, this is still open to question, such as the 10s limit is reasonable, or must be exec_master_log_pos=read_master_log_pos to switch.

When the master returns to normal, how to cut the VIP from Master2 back to Master1?

#!/bin/bash## #手动执行将主库切换回DB1的操作 mysql_con= ' mysql-uroot-p123456 ' echo-e "' Date" +%f%h:%m:%s "'-----Change to BACKUP Manually-----">>/etc/keepalived/logs/state_change.logecho-e" ' Date "+%f%h:%m:%s" '-----Set read_only = 1 on DB2-----">>/etc/keepalived/logs/state_change.log$mysql_con-e" set global read_only = 1; "  2>>/etc/keepalived/logs/state_change.log # #kill掉当前客户端连接echo-E "' Date" +%f%h:%m:%s "'-----Kill current Client Thread-----">>/etc/keepalived/logs/state_change.logrm-f/tmp/kill.sql &>/dev/null### Here is actually a small tip for a bulk kill thread $mysql_con-e ' select Concat ("Kill", ID, ";") from INFORMATION_SCHEMA. Processlist where command= "Query" or command= "Execute" into outfile "/tmp/kill.sql"; ' $mysql _con-e "Source/tmp/kill.sql" 2>>/etc/keepalived/logs/state_change.logsleep 2 # # #给kill一个执行和缓冲时间 # # #确保DB1 Has been caught, the following repl for the copy of the account used,-H followed by DB1 's intranet iplog_file_pos= ' Mysql-urepl-pmysql-h192.168.244.145-e "show slave status\g;" | Egrep-w "master_log_file| Read_master_log_pos| relay_master_log_file| Exec_master_log_pos "' master_log_file= ' echo $log _file_pos|awk ' {print $} ' read_master_log_pos= ' echo $log _file_pos| awk ' {print $4} ' relay_master_log_file= ' echo $log _file_pos|awk ' {print $6} ' exec_master_log_pos= ' echo $log _file_pos| awk ' {print $8} ' until [$Read _master_log_pos = $Exec _master_log_pos-a $Master _log_file = $Relay _master_log_file]do E Cho-e "' Date" +%f%h:%m:%s "'-----DB1 exec_master_log_pos ($exec _pos) is behind Read_master_log_pos ($read _pos), wait:    .... ">>/etc/keepalived/logs/state_change.log Sleep 1done # # #然后解除DB1的read_only属性echo-E" ' Date "+%f%h:%m:%s" ' -----Set read_only = 0 on DB1-----">>/etc/keepalived/logs/state_change.logssh 192.168.244.145 ' Mysql-uroot-p1 23456-E "Set Global read_only = 0;" &&/etc/init.d/keepalived start ' 2>>/etc/keepalived/logs/state_change . Log # # #重启DB2的keepalived使VIP漂移到DB1echo-E "' Date" +%f%h:%m:%s "'-----make VIP move to DB1-----" >>/etc/keepaLived/logs/state_change.log/sbin/service keepalived Restart &>>/etc/keepalived/logs/state_change.log echo "DB2 keepalived to backup status, online database switch to DB1" |mail-s "DB2 keepalived Change to Backup" [email protected] 2>>/E Tc/keepalived/logs/state_change.log echo-e "--------------------------------------------------\ n" >>/etc/ Keepalived/logs/state_change.log

Summarize:

1./etc/keepalived/check_mysql.sh and/etc/keepalived/notify_master_mysql.sh must have executable permissions.

If the former does not have permission to execute, then Master1 will not bind VIP, the log directly prompts the following information:

May 14:37:09 Master1 keepalived_vrrp[3165]: vrrp_instance (vi_1) Entering BACKUP statemay-14:37:09 master1 keepalived _VRRP[3165]: VRRP sockpool: [IfIndex (2), Proto (), unicast (0), FD (10,11)]may 14:37:50 Master1 keepalived_vrrp[3165] : Vrrp_instance (Vi_1) now on FAULT state

2. There are two modes in keepalived, namely the Master->backup mode and the Backup->backup mode, what is the difference between the two modes?

In Master->backup mode, once the main library is down, the virtual IP will automatically drift to the slave library, and when the main library is repaired, after the keepalived is started, the virtual IP will be grabbed, even if you set up nopreempt (not preemption) way to preempt the IP action will also occur. In Backup->backup mode, when the main library is down, the virtual IP will automatically drift to the slave library, and when the master recovers and restarts the keepalived service, it will not preempt the new master's virtual IP, even if the priority level is higher than the priority from the library and does not preempt the IP. In order to reduce the number of IP drift, we usually take the repaired main library as a repository for the new main library in production.

3. This article is based on the MySQL main library high availability-dual main single-live fault automatic switching scheme, combined with their own understanding of MySQL collation. The original script directly executes a bit of a problem, the idea is a bit flawed, so combined with their own actual environment, re-modified a.

4. In the course of testing, the following points need to be noted:

The logic of the 1> master1 detection script is that if the MySQL service is not available, the keepalived is turned off through the service keepalived Stop command, but during the actual test, there is even a service keepalived Stop command, the keepalived process still does not stop, causing the MySQL service to be unavailable, but the VIP does not drift to master2.

Optimization scheme: After the service keepalived stop, wait for 5s, again detect the status of Keepalived, if the keepalived is not closed, then kill directly.

The 2> keepalived log is output to/var/log/messages by default, which is not easy to see. How do I customize the keepalived log output file?

If the service is started, modify the/etc/sysconfig/keepalived file

If not, specify the above parameters when you start, such as:

Modify/etc/syslog.conf

# keepalived-s 0 local0.*                                                /var/log/keepalived.log

Restart Syslog

RHEL 5&6:service syslog Restart

RHEL 7:service rsyslog Restart

High-availability clusters based on keepalived to build MySQL

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.