Automatic MHA failover based on a consul architecture (consulmha)

Source: Internet
Author: User

Introduction

For a long time, we did not enable the masterha_manager automatic switchover script in production, because network jitter (a bad network cable, an unstable cabinet switch) can make the database unreachable from the monitoring host. For example, restarting the NIC of the machine that runs the detection script does not mean that the database has a problem. A check from a single node is therefore not enough to conclude that the database is inaccessible.

Fortunately, the clustering features of consul help here (I prefer consul to etcd because consul provides a DNS interface). In an environment of n cluster nodes we add a multi-point detection mechanism: if more than half of the detection points detect a database problem, we consider the database inaccessible and call the masterha_manager script to switch over, as shown in the following diagram:

       <checkmysql>         <checkmysql>         <checkmysql>
            |                    |                    |
       +---------+          +---------+          +---------+
       | consul1 |          | consul2 |          | consul3 |
       +---------+          +---------+          +---------+
                  \              |              /
                   \             |             /
                    \            |            /
                     \           |           /
                     +----------------------+
                     |   http api && acl    |
                     +----------------------+
                                 |
                                 |
                     +----------------------+
                     | consul-template      | ----> < mysqlxxx.tpl >  --->  <mysqlxxx.conf>
                     +----------------------+                                      |
                                                                      +--------------------------+
                                                                      | masterha_manager_consul  |
                                                                      +--------------------------+

 

checkmysql is deployed on every consul server so that MySQL can be checked from multiple points. If MySQL is healthy, checkmysql sets the key mysql/mysqlxxxx/node-consul to 1; otherwise it sets the key to 0. Here node-consul defaults to the hostname of the current host.
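As a rough illustration of the key layout described above, the following Python sketch builds (but does not send) the kind of write a detection point could issue against the consul KV HTTP endpoint (PUT /v1/kv/<key>). The real checkmysql is a Perl script whose internals are not shown in this article, so the function name build_status_request, the agent address and the example token are assumptions made only for this example.

```python
import socket
import urllib.request
from urllib.parse import quote, urlencode

CONSUL_HTTP = "http://localhost:8500"  # assumption: a consul agent on the local host


def build_status_request(tag, healthy, token, node=None):
    """Build (without sending) the KV write for one detection point."""
    # node-consul defaults to the hostname of the current host
    node = node or socket.gethostname()
    key = f"mysql/{tag}/{node}"
    url = f"{CONSUL_HTTP}/v1/kv/{quote(key)}?{urlencode({'token': token})}"
    body = b"1" if healthy else b"0"  # 1 = MySQL reachable, 0 = not reachable
    return urllib.request.Request(url, data=body, method="PUT")


# Hypothetical values, used only to show the resulting request.
req = build_status_request("mysql3308", healthy=True, token="example-token", node="cz-test2")
print(req.get_method(), req.full_url)
```

Sending the request would be a call to urllib.request.urlopen(req); it is left out here so that the sketch runs without a consul agent.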

After checkmysql has run its check, the consul-template tool watches all of these keys through the template file mysqlxxx.tpl. When any key changes, it generates the configuration file mysqlxxxx.conf and then calls the masterha_manager_consul script, which decides whether to start the switchover.

The masterha_manager_consul script overrides the method MHA::HealthCheck::wait_until_unreachable to avoid its endless detection loop. If no more than half of the detection points consider the database abnormal, the script exits the current call. Otherwise, it starts a child process to begin the switchover.
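The majority rule itself is small enough to sketch in a few lines. The Python function below is only an illustration of the decision described above, not the code of masterha_manager_consul (which is Perl); the name needs_switch and the choice to never switch on empty input are assumptions.

```python
def needs_switch(statuses):
    """Return True only if more than half of the detection points report 0.

    statuses maps a detection point name to 1 (MySQL healthy) or 0 (unreachable).
    """
    if not statuses:
        return False  # assumption: with no data at all, never switch
    failed = sum(1 for value in statuses.values() if value == 0)
    return failed * 2 > len(statuses)


# One of three detection points sees a failure: no switchover.
print(needs_switch({"cz-test1": 1, "cz-test2": 0, "cz-test3": 1}))
# Two of three detection points see a failure: switchover.
print(needs_switch({"cz-test1": 0, "cz-test2": 0, "cz-test3": 1}))
```

With an even number of detection points, an exact half is not a majority under this rule, which is one reason to run an odd number of consul servers.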

Note:

masterha_manager_consul is modified from MHA v0.5.6. By default, automatic switchover is active from the current day into the next day; the night option controls this behavior. If you run more than one consul server, we recommend deploying them on different switches or in different cabinets.

Instructions for use

The code is in mha_manager_consul; its overall structure is as follows:

mha_manager_consul
├── bin
│   ├── checkmysql
│   └── masterha_manager_consul
├── conf
│   ├── db.cnf
│   └── template-config
├── consul
│   ├── acl
│   │   ├── policy.ano
│   │   └── policy.key
│   ├── conf
│   │   └── consul.conf
│   └── conf.d
│       └── server.json
├── README.md
└── template
    └── mysql3308.tpl

Test Environment

Continue to use the previous test environment:

IP           OS           Hostname   Version
10.0.21.5    CentOS 6.5   cz-test1   consul v0.8
10.0.21.7    CentOS 6.5   cz-test2   consul v0.8
10.0.21.17   CentOS 6.5   cz-test3   consul v0.8

All of the operations below assume that the consul cluster is already installed.

Remarks

Before running checkmysql, we need to set ACL policies so that sensitive consul information cannot be accessed by others. The token parameter in the commands below is the acl_master_token option from the main consul configuration file. The file policy.ano contains the policy that restricts anonymous access to the mysql/* keys, and policy.key contains the policy that allows access to mysql.*. The token generated here with the key permission is dcb5b583-cd36-d39d-2b31-558bebf86502. See the consul ACL documentation to learn more about access control.

# curl -X PUT --data @policy.ano http://localhost:8500/v1/acl/update?token=e95597e0-4045-11e7-a9ef-b6ba84687927
{"ID":"anonymous"}
# curl -X PUT --data @policy.key http://localhost:8500/v1/acl/update?token=e95597e0-4045-11e7-a9ef-b6ba84687927
{"ID":"dcb5b583-cd36-d39d-2b31-558bebf86502"}

checkmysql

Run the checkmysql script on every consul server node. The token parameter is the token generated in the ACL step above, and tag is the name of the instance configured in db.cnf. Start it with the following command:

perl checkmysql --conf db.cnf --verbose --tag mysql3308 --token dcb5b583-cd36-d39d-2b31-558bebf86502
[2017-06-08T10:09:14] mysql/mysql3308/cz-test2 with value 1 no change
[2017-06-08T10:09:15] mysql/mysql3308/cz-test2 with value 1 no change

cz-test2 is the hostname of the current host and corresponds to node-consul.

Remarks

If your MySQL master provides service through a VIP, it is best to set the host option in db.cnf to the VIP address.

Consul-template

After checkmysql updates the related consul keys, whenever any checkmysql instance changes a key value, consul-template generates a new mysqlxxx.conf file from the template file and then calls the masterha_manager_consul script. For details about the consul-template configuration, see template-config. Start it with the following command:

# consul-template -config config
2017/05/25 10:11:13 [DEBUG] (logging) enabling syslog on LOCAL5

The mysqlxxxx.tpl template file generates output like the following:

# node3308
cz-test1:1
cz-test2:1
cz-test3:1
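The generated file lists one detection point per line in the form hostname:value, with a comment line on top. A minimal Python sketch of reading that format into a dictionary follows; the function name parse_status_file is hypothetical, and the actual parsing inside masterha_manager_consul is not shown in this article.

```python
def parse_status_file(text):
    """Parse lines such as 'cz-test1:1' into {'cz-test1': 1}."""
    statuses = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines such as '# node3308'
        node, sep, value = line.rpartition(":")
        if not sep:
            continue  # ignore lines that do not follow the hostname:value form
        statuses[node] = int(value)
    return statuses


sample = "# node3308\ncz-test1:1\ncz-test2:1\ncz-test3:1\n"
print(parse_status_file(sample))
```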

If fewer than half of the detection points find MySQL abnormal, consul-template prints the following message:

[2017-06-08T10:24:15] status ok, skip switch..

Otherwise, it prints the error message and calls the masterha_manager_consul script:

[2017-05-25T10:24:48] status error, need switch..
Wed May 24 10:24:48 2017 - [info] Reading default configuration from /etc/masterha/app_default.cnf........

conf.d/server.json

See the address = "consul.service.consul:8500" option. If the address option is set to the IP address of only one consul server, then during network fluctuations consul-template may be unable to connect to that server to watch the key values. Although consul-template has a retry feature, it is hard to obtain the key values reliably through a single IP address. The conf.d/server.json configuration publishes the IP addresses of all consul servers under one DNS entry, as shown below:

# dig @10.0.21.5 consul.service.consul
......
......
;; QUESTION SECTION:
;consul.service.consul.     IN  A

;; ANSWER SECTION:
consul.service.consul.  0   IN  A   10.0.21.7
consul.service.consul.  0   IN  A   10.0.21.5
consul.service.consul.  0   IN  A   10.0.21.17

If a single consul server fails, requests automatically move to a healthy consul server.
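The benefit of one DNS name with several A records can be sketched in Python as follows. This is not how consul-template is implemented; it only illustrates the idea of trying each resolved address until one answers. The function names are hypothetical, and resolve_servers assumes that the host's resolver can answer queries for the .consul domain.

```python
import socket


def resolve_servers(name, port=8500):
    """Return the distinct IPv4 addresses behind one DNS name."""
    infos = socket.getaddrinfo(name, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def tcp_probe(address, port=8500, timeout=1.0):
    """Return True if a TCP connection to address:port succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(addresses, probe=tcp_probe):
    """Return the first address accepted by probe, or None if none responds."""
    for address in addresses:
        if probe(address):
            return address
    return None
```

A caller could combine the two as first_reachable(resolve_servers("consul.service.consul")). With a single hard-coded IP address there is nothing left to try once that server is unreachable.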

Master-slave switchover Test

Let's simply shut down the master instance and check the output of each tool.

Shut down the master

After the master is shut down, the checkmysql scripts start to update the status. Once more than half of them report a failure, the masterha_manager_consul script is called to perform the master-slave switchover. In the checkmysql output, the key value is set to 0:

[2017-06-08T18:16:43] mysql/mysql3308/cz-test2 with value 1 no change
DBI connect('mysql_read_default_file=./db.cnf;mysql_read_default_group=mysql3308','',...) failed: Can't connect to MySQL server on '10.0.21.7' (111) at checkmysql line 56
[2017-06-08T18:16:44] set 0 with key mysql/mysql3308/cz-test2 ok
DBI connect('mysql_read_default_file=./db.cnf;mysql_read_default_group=mysql3308','',...) failed: Can't connect to MySQL server on '10.0.21.7' (111) at checkmysql line 56
[2017-06-08T18:16:45] mysql/mysql3308/cz-test2 with value 0 no change
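The log suggests that checkmysql writes to consul only when the status actually changes and otherwise reports "no change". The Python sketch below mimics that pattern. It is an assumption drawn from the log lines, not the script's real code, and the function name report_status and the write callback are hypothetical.

```python
def report_status(key, previous, current, write):
    """Write the new value only when it differs from the last one seen."""
    if current == previous:
        return f"{key} with value {current} no change"
    write(key, current)  # hypothetical callback that stores the value in consul
    return f"set {current} with key {key} ok"


store = {}  # stands in for the consul KV store in this example
key = "mysql/mysql3308/cz-test2"
previous = None
for current in (1, 1, 0, 0):
    print(report_status(key, previous, current, store.__setitem__))
    previous = current
```

Writing only on change keeps the watched keys quiet while MySQL is stable, so consul-template has nothing to regenerate until a detection point actually flips.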

The mysql3308.conf configuration file changes to the following:

# node3308
cz-test1:0
cz-test2:0
cz-test3:0

consul-template shows the following:

# consul-template -config config
12:11:13 [DEBUG] (logging) enabling syslog on LOCAL5
[2017-05-24T12:16:48] status error, need switch..    # the script determines that more than half of the detection points cannot access the database
Wed Jun 08 12:16:48 2017 - [info] Reading default configuration from /etc/masterha/app_default.cnf..
Wed Jun 08 12:16:48 2017 - [info] Reading application default configuration from /etc/masterha/app_56.conf..
Wed Jun 08 12:16:48 2017 - [info] Updating application default configuration from /usr/bin/init_conf_loads..
......

  

If no more than half of the detection points report a failure, consul-template displays the following:

[2017-06-08T12:24:15] status ok, skip switch..

MHA switching log

The MHA switchover log contains the following information; the location of the log file depends on your MHA configuration:

Wed Jun 08 12:45:37 2017 - [info] Starting master failover..
Wed Jun 08 12:45:37 2017 - [info]
From:
10.0.21.7(10.0.21.7:3308) (current master)
 +--10.0.21.17(10.0.21.17:3308)

To:
10.0.21.17(10.0.21.17:3308) (new master)
......
Master failover to 10.0.21.17(10.0.21.17:3308) completed successfully.
Wed Jun 08 12:45:41 2017 - [info] Sending mail..

  

Summary

In general, a consul-based architecture is relatively cumbersome and not as simple and convenient as single-node detection. For core databases, however, consistency should come first, and multi-point detection greatly improves the switchover mechanism. The masterha_manager script by itself only performs detection in a loop and starts the switchover after three errors occur (with increasing intervals). When the network fluctuates, a switch fails, or the database host is busy, this can trigger unexpected operations. Multi-point detection avoids this kind of instability. Once the consul cluster is deployed, it can also serve other services that need a consistent decision, so the extra complexity is not something to worry about too much.
