Automatic MHA failover based on a consul architecture
Introduction
For a long time we did not enable the masterha_manager automatic switchover script in production. Network jitter, such as a faulty network cable or an unstable rack switch, can make the database unreachable from the monitoring host even when the database itself is healthy. For example, restarting the NIC of the machine that runs the detection script does not mean the database has a problem. We therefore cannot conclude that the database is inaccessible based on checks from a single node.
Fortunately, consul's cluster features let us add a multi-point detection mechanism across an n-node cluster. (I prefer consul over etcd because consul provides a DNS interface.) If more than half of the detection points find a database problem, we consider the database inaccessible, and only then do we call the masterha_manager switchover script, as shown below:
 <checkmysql>    <checkmysql>    <checkmysql>
      |               |               |
 +---------+     +---------+     +---------+
 | consul1 |     | consul2 |     | consul3 |
 +---------+     +---------+     +---------+
          \           |           /
           \          |          /
            \         |         /
             \        |        /
          +----------------------+
          |   http api && acl    |
          +----------------------+
                      |
                      |
          +----------------------+
          |   consul-template    | ----> < mysqlxxx.tpl > ---> <mysqlxxx.conf>
          +----------------------+
                      |
         +--------------------------+
         | masterha_manager_consul  |
         +--------------------------+
checkmysql is deployed on every consul server so that MySQL can be checked from multiple points. If MySQL is healthy, checkmysql sets the key mysql/mysqlxxxx/node-consul to 1; otherwise it sets the key to 0. Here node-consul defaults to the hostname of the current host.
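The key layout described above can be illustrated with a short Python sketch. This is not the actual checkmysql script, which is written in Perl. The sketch assumes the Consul KV HTTP API (`PUT /v1/kv/<key>?token=...`) on `localhost:8500`. The helper names `status_key`, `status_value`, `needs_update` and `put_status` are hypothetical. Matching the "no change" log lines shown later, it only writes when the value changes.

```python
import socket
import urllib.request
from typing import Optional


def status_key(tag: str, node: Optional[str] = None) -> str:
    """Build the per-checkpoint key, e.g. "mysql/mysql3308/cz-test2".

    node defaults to the local hostname, mirroring the node-consul default.
    """
    return f"mysql/{tag}/{node or socket.gethostname()}"


def status_value(mysql_ok: bool) -> str:
    """Return "1" when this checkpoint can reach MySQL, "0" when it cannot."""
    return "1" if mysql_ok else "0"


def needs_update(previous: Optional[str], current: str) -> bool:
    """Write only when the value changes; otherwise just log "no change"."""
    return previous != current


def put_status(key: str, value: str, token: str,
               base_url: str = "http://localhost:8500") -> bool:
    """Store value under key with the Consul KV HTTP API (PUT /v1/kv/<key>)."""
    req = urllib.request.Request(
        f"{base_url}/v1/kv/{key}?token={token}",
        data=value.encode(),
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=3) as resp:
        # Consul replies with the JSON body "true" when the write succeeds.
        return resp.read().strip() == b"true"
```

A check loop built on this sketch would compute `status_value` once per interval and compare it with the last value written. It would call `put_status(status_key("mysql3308"), value, token)` only when `needs_update` returns True.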
After checkmysql has written its results, we use the consul-template tool with the template file mysqlxxx.tpl to watch all of these keys. Whenever any value changes, consul-template renders the configuration file mysqlxxxx.conf and then calls the masterha_manager_consul script to handle the switchover.
The masterha_manager_consul script overrides the MHA::HealthCheck::wait_until_unreachable method to avoid endless loop detection. If fewer than half of the detection points consider the database abnormal, the script exits; otherwise it starts a child process to perform the switchover.
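The majority rule described above can be sketched in a few lines of Python. This is an illustration only, not the Perl code in masterha_manager_consul. The function name `should_switch` is hypothetical, and the sketch assumes each checkpoint reports "1" (MySQL reachable) or "0" (unreachable), as described earlier. A switch is triggered only when strictly more than half of the checkpoints report a failure.

```python
from typing import Mapping


def should_switch(statuses: Mapping[str, str]) -> bool:
    """Return True when strictly more than half of the checkpoints see MySQL as down.

    statuses maps a checkpoint name (e.g. "cz-test1") to "1" or "0".
    """
    if not statuses:
        # No checkpoint data at all: never switch on missing information.
        return False
    failures = sum(1 for value in statuses.values() if value.strip() == "0")
    return failures * 2 > len(statuses)
```

With three checkpoints, two "0" values trigger a switch. A single "0", for example from one checkpoint whose NIC is restarting, does not.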
Note:
masterha_manager_consul is modified from MHA 0.56. Whether it switches automatically during the day or also at night is controlled by the night option. We recommend deploying the consul servers on different switches or racks.
Instructions for use
For the code, see the mha_manager_consul project. Its overall structure is as follows:
mha_manager_consul
├── bin
│   ├── checkmysql
│   └── masterha_manager_consul
├── conf
│   ├── db.cnf
│   └── template-config
├── consul
│   ├── acl
│   │   ├── policy.ano
│   │   └── policy.key
│   ├── conf
│   │   └── consul.conf
│   └── conf.d
│       └── server.json
├── README.md
└── template
    └── mysql3308.tpl
Test Environment
We continue to use the previous test environment:
| IP         | OS         | Hostname | Version     |
|------------|------------|----------|-------------|
| 10.0.21.5  | CentOS 6.5 | cz-test1 | Consul v0.8 |
| 10.0.21.7  | CentOS 6.5 | cz-test2 | Consul v0.8 |
| 10.0.21.17 | CentOS 6.5 | cz-test3 | Consul v0.8 |
All the operations below assume that the consul cluster has already been installed.
Remarks
Before running checkmysql, we need to set an ACL policy so that others cannot access sensitive consul information. The token parameter in the commands below is the acl_master_token option from the consul main configuration file. The file policy.ano restricts anonymous users' access to the mysql/* keys, and policy.key allows access to the mysql.* keys. The token generated here, dcb5b583-cd36-d39d-2b31-558bebf86502, is the one with key permissions.
See the consul ACL documentation for more on access control.
# curl -X PUT --data @policy.ano "http://localhost:8500/v1/acl/update?token=e95597e0-4045-11e7-a9ef-b6ba84687927"
{"ID":"anonymous"}
# curl -X PUT --data @policy.key "http://localhost:8500/v1/acl/update?token=e95597e0-4045-11e7-a9ef-b6ba84687927"
{"ID":"dcb5b583-cd36-d39d-2b31-558bebf86502"}
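The contents of policy.ano and policy.key are not shown here, so the following Python sketch is a hypothetical way to generate them. It assumes the legacy ACL system used by Consul 0.8, whose `PUT /v1/acl/update` endpoint takes a JSON body with `ID`, `Name`, `Type` and `Rules` fields. The rule strings (deny anonymous access to the `mysql/` key prefix, allow the checkmysql token to write it), the token name and the helper names are assumptions, not the project's actual files.

```python
import json

# Assumed rule sets in the legacy Consul ACL syntax: anonymous requests may
# not touch keys under mysql/, while the checkmysql token may write them.
ANON_RULES = 'key "mysql/" { policy = "deny" }'
KEY_RULES = 'key "mysql/" { policy = "write" }'


def acl_payload(acl_id: str, name: str, rules: str) -> str:
    """JSON body for the legacy PUT /v1/acl/update endpoint."""
    return json.dumps({"ID": acl_id, "Name": name, "Type": "client", "Rules": rules})


def write_policy(path: str, payload: str) -> None:
    """Save a payload so it can be sent with curl --data @<path>."""
    with open(path, "w") as f:
        f.write(payload)
```

For example, `write_policy("policy.ano", acl_payload("anonymous", "Anonymous Token", ANON_RULES))` would produce a file that the first curl command above could send.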
checkmysql
Run the script on every consul server node. The token parameter is the token produced by the ACL step above, and tag is the instance section configured in db.cnf. Start it with the following command:
perl checkmysql --conf db.cnf --verbose --tag mysql3308 --token dcb5b583-cd36-d39d-2b31-558bebf86502
[2017-06-08T10:09:14] mysql/mysql3308/cz-test2 with value 1 no change
[2017-06-08T10:09:15] mysql/mysql3308/cz-test2 with value 1 no change
cz-test2 is the current hostname and corresponds to node-consul above.
Remarks
If your MySQL master serves traffic through a VIP, it is best to set the host option in db.cnf to the VIP address.
consul-template
checkmysql updates the relevant consul keys. Whenever any checkmysql instance changes a key value, consul-template generates a new mysqlxxx.conf file from the template file and then calls the masterha_manager_consul script. For the consul-template configuration, see template-config. Start it with the following command:
# consul-template -config config
2017/05/25 10:11:13 [DEBUG] (logging) enabling syslog on LOCAL5
The template file mysqlxxxx.tpl is as follows:
# node3308
cz-test1:1
cz-test2:1
cz-test3:1
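The file above holds one `node:value` line per checkpoint, so the consuming side first has to parse it. The following Python sketch shows that parsing step under this assumed format. The function names are hypothetical and not taken from masterha_manager_consul.

```python
from typing import Dict, List


def parse_status_file(text: str) -> Dict[str, str]:
    """Parse "node:value" lines into a dict, e.g. {"cz-test1": "1"}.

    Blank lines and comment lines such as "# node3308" are skipped.
    """
    statuses: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        node, sep, value = line.rpartition(":")
        if sep:  # ignore lines without a colon
            statuses[node.strip()] = value.strip()
    return statuses


def down_nodes(statuses: Dict[str, str]) -> List[str]:
    """Checkpoints that currently report MySQL as unreachable ("0")."""
    return sorted(node for node, value in statuses.items() if value == "0")
```

For the file shown above, `parse_status_file` returns `{"cz-test1": "1", "cz-test2": "1", "cz-test3": "1"}`, and `down_nodes` on that result is an empty list.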
If fewer than half of the checkpoints detect a MySQL exception, consul-template prints the following message:
[2017-06-08T10:24:15] status ok, skip switch..
Otherwise, it prints an error message and calls the masterha_manager_consul script:
[2017-05-25T10:24:48] status error, need switch..
Wed May 24 10:24:48 2017 - [info] Reading default configuration from /etc/masterha/app_default.cnf..
......
conf.d/server.json
Note that the consul-template address option is set to "consul.service.consul:8500". Suppose the address option pointed at the IP address of only one consul server. During network fluctuations consul-template could then lose its connection to that server and stop watching the key values. consul-template does retry, but with a single IP address it is hard to obtain the key values reliably. The conf.d/server.json configuration exposes the IP address of every consul server through DNS, as shown below:
# dig @10.0.21.5 consul.service.consul
......
;; QUESTION SECTION:
;consul.service.consul.         IN      A

;; ANSWER SECTION:
consul.service.consul.  0       IN      A       10.0.21.7
consul.service.consul.  0       IN      A       10.0.21.5
consul.service.consul.  0       IN      A       10.0.21.17
If a single consul server fails, the connection automatically moves to a healthy consul server.
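The benefit of publishing several A records can be illustrated with a small Python sketch. The client resolves the name to every address and uses the first server that accepts a TCP connection on port 8500. This only illustrates the idea; consul-template's own connection and retry handling may differ. The function names are hypothetical, and using `consul.service.consul` directly assumes the host resolves names through consul's DNS interface.

```python
import socket
from typing import Callable, Iterable, List, Optional


def resolve_all(name: str, port: int) -> List[str]:
    """All IPv4 addresses for name, in resolver order, without duplicates."""
    addrs: List[str] = []
    for info in socket.getaddrinfo(name, port, socket.AF_INET, socket.SOCK_STREAM):
        addr = info[4][0]
        if addr not in addrs:
            addrs.append(addr)
    return addrs


def tcp_probe(addr: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to addr:port succeeds within the timeout."""
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(addrs: Iterable[str], port: int,
                    probe: Callable[[str, int], bool] = tcp_probe) -> Optional[str]:
    """Return the first address whose probe succeeds, or None if all fail."""
    for addr in addrs:
        if probe(addr, port):
            return addr
    return None
```

A call such as `first_reachable(resolve_all("consul.service.consul", 8500), 8500)` skips a consul server that is down and returns the next one that answers.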
Master-slave switchover test
Let's simply shut down the master instance and check the output of each tool.
Shut down the master
After the master is shut down, the checkmysql scripts start updating their status. Once more than half of the checkpoints report a failure, the masterha_manager_consul script is called to perform the master-slave switchover. In the checkmysql output, the key value changes to 0:
[2017-06-08T18:16:43] mysql/mysql3308/cz-test2 with value 1 no change
DBI connect('mysql_read_default_file=./db.cnf;mysql_read_default_group=mysql3308','',...) failed: Can't connect to MySQL server on '10.0.21.7' (111) at checkmysql line 56
[2017-06-08T18:16:44] set 0 with key mysql/mysql3308/cz-test2 ok
DBI connect('mysql_read_default_file=./db.cnf;mysql_read_default_group=mysql3308','',...) failed: Can't connect to MySQL server on '10.0.21.7' (111) at checkmysql line 56
[2017-06-08T18:16:45] mysql/mysql3308/cz-test2 with value 0 no change
The mysql3308.conf configuration file changes to the following:
# node3308
cz-test1:0
cz-test2:0
cz-test3:0
consul-template shows the following:
# consul-template -config config
12:11:13 [DEBUG] (logging) enabling syslog on LOCAL5
[2017-05-24T12:16:48] status error, need switch..
# the script has determined that more than half of the checkpoints cannot reach the database
Wed Jun 08 12:16:48 2017 - [info] Reading default configuration from /etc/masterha/app_default.cnf..
Wed Jun 08 12:16:48 2017 - [info] Reading application default configuration from /etc/masterha/app_56.conf..
Wed Jun 08 12:16:48 2017 - [info] Updating application default configuration from /usr/bin/init_conf_loads..
......
If no more than half of the checkpoints report a failure, consul-template shows the following:
[2017-06-08T12:24:15] status ok, skip switch..
MHA switchover log
The MHA switchover log contains the following information. The location of the log file depends on your MHA configuration:
Wed Jun 08 12:45:37 2017 - [info] Starting master failover..
Wed Jun 08 12:45:37 2017 - [info]
From:
10.0.21.7(10.0.21.7:3308) (current master)
 +--10.0.21.17(10.0.21.17:3308)

To:
10.0.21.17(10.0.21.17:3308) (new master)
......
Master failover to 10.0.21.17(10.0.21.17:3308) completed successfully.
Wed Jun 08 12:45:41 2017 - [info] Sending mail..
Summary
Overall, the consul-based architecture is relatively cumbersome and not as simple or convenient as single-node detection. For core databases, however, consistency should come first, and multi-point detection greatly improves the switchover mechanism. The masterha_manager script itself only checks in a loop from a single point and starts the switchover after three errors, with the interval increasing between checks. Network fluctuations, network switch failures, or a busy database host can therefore trigger unexpected switchovers, and multi-point detection avoids this instability. Once the consul cluster is deployed, it can also serve other services that need consensus-based decisions, so the extra setup effort need not be a major concern.