High Availability of MySQL Data

For most applications, MySQL is the most critical data storage center, so we have to face the problem of how to make MySQL provide an HA service. When the master goes down, how to ensure that as little data as possible is lost, and how to detect the failure quickly and perform the corresponding failover, are all things we need to think about carefully. Here, I will talk about the MySQL proxy and toolsets work we have done during this period, and the MySQL HA solution we will use in our projects at this stage and in the future.

Replication

To ensure that MySQL data is not lost, replication is a good solution, and MySQL also provides a powerful replication mechanism. We just need to know that, for performance reasons, replication is asynchronous by default: committed data is not synchronized to the slave immediately. If the master goes down at that moment, we may still face the risk of losing data.

To solve this problem, we can use semi-synchronous replication. The principle of semi-synchronous replication is very simple: when the master finishes committing a transaction, it does not return until at least one slave that supports semi-sync confirms that it has received the event and written it to its relay log. In this way, even if the master goes down, at least one slave still has the complete data.
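For illustration, here is a minimal sketch of turning semi-synchronous replication on from a small Go helper, assuming the standard rpl_semi_sync plugins that ship with MySQL; the hosts, DSNs, and timeout value are made up:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql" // MySQL driver
)

// enableSemiSync loads the semisync plugin and switches it on.
// The statements are the documented rpl_semi_sync ones; the DSN,
// credentials, and timeout below are only illustrative.
func enableSemiSync(dsn string, isMaster bool) error {
	db, err := sql.Open("mysql", dsn)
	if err != nil {
		return err
	}
	defer db.Close()

	var stmts []string
	if isMaster {
		stmts = []string{
			"INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so'",
			"SET GLOBAL rpl_semi_sync_master_enabled = 1",
			// Fall back to asynchronous replication if no slave
			// acknowledges within 1 second.
			"SET GLOBAL rpl_semi_sync_master_timeout = 1000",
		}
	} else {
		stmts = []string{
			"INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'",
			"SET GLOBAL rpl_semi_sync_slave_enabled = 1",
		}
	}
	for _, s := range stmts {
		// INSTALL PLUGIN fails if the plugin is already loaded; a real
		// tool would check that first, a sketch just reports the error.
		if _, err := db.Exec(s); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := enableSemiSync("root:pass@tcp(master:3306)/", true); err != nil {
		log.Fatal(err)
	}
	if err := enableSemiSync("root:pass@tcp(slave1:3306)/", false); err != nil {
		log.Fatal(err)
	}
}
```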

However, semi-synchronous replication does not guarantee that no data will ever be lost. If the master crashes after committing a transaction but before sending it to a slave, that data can still be lost. Compared with traditional asynchronous replication, though, semi-synchronous replication greatly improves data safety. More importantly, it is not slow: the author of MHA said that they use semi-synchronous replication in Facebook's production environment (here), so we really don't need to worry about its performance unless our business volume has completely exceeded Facebook's or Google's. As mentioned in that article, loss-less semi-synchronous replication has been used since MySQL 5.7, so the probability of losing data is very small.

If you really want to ensure that data will not be lost, a better method at this stage is to use Galera, a MySQL cluster solution, which guarantees that data is not lost through the strategy of writing three copies at the same time. I have no experience with Galera, but I know that some companies in the industry use it in production, and performance should not be a problem. However, Galera is quite invasive to the MySQL code and may not suit those of us who are obsessive about clean code :-)

We can also use DRBD to replicate MySQL data. The official MySQL documentation describes this in detail, but I have not used this solution. The author of MHA wrote about some problems with using DRBD, for reference only.

In subsequent projects, I will give priority to the semi-synchronous replication solution. If the data is really important, we will consider using Galera.

Monitor

We mentioned above that we use the replication mechanism to ensure that, as far as possible, data on the master is not lost when it goes down, but we cannot wait several minutes before we even know that the master has a problem. Therefore, a good set of monitoring tools is essential.

When the master goes down, the monitor can quickly detect it and carry out subsequent processing, for example notifying the administrator by email, or notifying a daemon to quickly perform failover.

Generally, we use keepalived or heartbeat to monitor a service, so that when the master goes down, we can easily switch to the standby machine. However, they still cannot immediately detect that the service itself is unavailable. We currently use keepalived, but going forward I prefer to use zookeeper to handle monitoring and failover for the entire MySQL cluster.

For any MySQL instance, we have a corresponding agent program. The agent is placed on the same machine as the MySQL instance and periodically pings it to check its availability; at the same time, the agent registers itself in zookeeper through an ephemeral node. With this, we can know whether MySQL is down, mainly in the following situations:

1. The whole machine goes down: MySQL and the agent die together, and the agent's connection to zookeeper is broken.
2. MySQL goes down: the agent finds that the ping fails and actively closes its connection to zookeeper.
3. The agent itself goes down, while MySQL is still running.

In all three cases above we can consider that something is wrong with that MySQL machine, and zookeeper can perceive it immediately: the agent's connection to zookeeper is broken, zookeeper triggers the corresponding children changed event, and the control service that watches this event can handle it accordingly. For example, in the first two cases the control service can automatically perform a failover, while in the third case it may do nothing and simply wait for something like crontab or supervisord on that machine to restart the agent automatically.
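As a rough sketch of what such an agent might look like, assuming the community github.com/samuel/go-zookeeper/zk and github.com/go-sql-driver/mysql packages (the zookeeper path, addresses, and ping interval are illustrative, not the exact setup described above):

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql"
	"github.com/samuel/go-zookeeper/zk"
)

func main() {
	// Connect to the zookeeper ensemble (addresses are illustrative).
	conn, _, err := zk.Connect([]string{"zk1:2181", "zk2:2181", "zk3:2181"}, 5*time.Second)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Register this MySQL instance as an ephemeral node; it disappears
	// automatically when the agent (or the whole machine) dies.
	// The parent path is assumed to exist already.
	path := "/mysql/cluster1/192.168.1.10:3306"
	if _, err := conn.Create(path, []byte("alive"), zk.FlagEphemeral, zk.WorldACL(zk.PermAll)); err != nil {
		log.Fatal(err)
	}

	// The local MySQL instance this agent watches.
	db, err := sql.Open("mysql", "monitor:pass@tcp(127.0.0.1:3306)/")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Ping MySQL periodically; if it stops answering, close the
	// zookeeper session so the ephemeral node vanishes and the
	// control service gets notified.
	for range time.Tick(3 * time.Second) {
		if err := db.Ping(); err != nil {
			log.Printf("mysql unreachable: %v, closing zookeeper session", err)
			conn.Close()
			return
		}
	}
}
```

The ephemeral node is the key design choice here: whether the machine dies, MySQL dies, or only the agent dies, the node disappears and the watchers learn about it without any polling on their side.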

The advantage of zookeeper is that it can conveniently monitor the entire cluster: it can instantly learn about changes across the whole cluster, trigger event notifications for the services interested in them, and coordinate multiple services to do the related processing at the same time. keepalived or heartbeat either cannot do this, or doing it with them is too troublesome.
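For completeness, the control-service side can be sketched as a children watch on the agents' parent node; again this assumes the github.com/samuel/go-zookeeper/zk package, and the failover call is only a placeholder:

```go
package main

import (
	"log"
	"time"

	"github.com/samuel/go-zookeeper/zk"
)

func main() {
	conn, _, err := zk.Connect([]string{"zk1:2181"}, 5*time.Second)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	path := "/mysql/cluster1"
	known := map[string]bool{}

	for {
		// ChildrenW returns the current agents plus a channel that
		// fires on the next children-changed event.
		children, _, events, err := conn.ChildrenW(path)
		if err != nil {
			log.Fatal(err)
		}

		current := map[string]bool{}
		for _, c := range children {
			current[c] = true
		}
		// Any previously known instance that is now missing has lost
		// its agent (machine down, MySQL down, or agent down).
		for c := range known {
			if !current[c] {
				log.Printf("instance %s disappeared, consider failover", c)
				// failover(c) // placeholder for the real handling
			}
		}
		known = current

		<-events // block until the cluster membership changes again
	}
}
```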

The problem with zookeeper is that deployment is complicated, and after a failover it is also troublesome for the application to obtain the latest database address.

For the deployment problem, we only need to ensure that each MySQL instance runs together with an agent, and fortunately docker makes this very simple. For the second problem, the database address changing after a failover, we can notify the application to dynamically update its configuration, use a VIP, or solve it with a proxy.

Although zookeeper has many advantages, zookeeper may not be the best choice if your business is not complex, such as only one master and one slave. Maybe keepalived is enough.

Failover

Through the monitor, we can easily detect that MySQL is down and notify the corresponding service to perform failover. Suppose we have such a MySQL cluster: A is the master, B and C are its slaves. When A crashes, we need to perform a failover. Which of B and C should we choose as the new master?

The principle is very simple: whichever slave has the most recent data from the original master is chosen as the new master. We can use the SHOW SLAVE STATUS command to find out which slave has the newest data; we only need to compare the two key fields Master_Log_File and Read_Master_Log_Pos. These two values indicate how far into the master's binlog the slave has read: the slave with the larger binlog file index and the larger position has more data and can be promoted to master. We will not discuss here the case where multiple slaves could be promoted.
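A minimal sketch of this comparison, assuming the usual mysql-bin.NNNNNN binlog naming; in a real tool the file and position for each slave would come from SHOW SLAVE STATUS:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// binlogPos is the replication position a slave has read from the old
// master: the Master_Log_File and Read_Master_Log_Pos fields of
// SHOW SLAVE STATUS.
type binlogPos struct {
	host string
	file string // e.g. "mysql-bin.000012"
	pos  uint64
}

// fileIndex extracts the numeric suffix of a binlog file name, so that
// "mysql-bin.000012" compares greater than "mysql-bin.000009".
func fileIndex(file string) int {
	i := strings.LastIndex(file, ".")
	n, _ := strconv.Atoi(file[i+1:]) // error ignored in this sketch
	return n
}

// newer reports whether a has read further into the master's binlog than b.
func newer(a, b binlogPos) bool {
	if fileIndex(a.file) != fileIndex(b.file) {
		return fileIndex(a.file) > fileIndex(b.file)
	}
	return a.pos > b.pos
}

func main() {
	// Illustrative values for slaves B and C.
	slaves := []binlogPos{
		{"B", "mysql-bin.000012", 4207},
		{"C", "mysql-bin.000011", 9932},
	}
	best := slaves[0]
	for _, s := range slaves[1:] {
		if newer(s, best) {
			best = s
		}
	}
	fmt.Printf("promote %s to master\n", best.host)
}
```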

Continuing the previous example, if B is promoted to master, we need to re-point C to the new master B and restart replication. We do this with CHANGE MASTER TO, but how do we know from which file and which position in B's binlog to start copying?

GTID

To solve this problem, MySQL 5.6 introduced the concept of GTID, that is, uuid:gid. uuid is the UUID of the MySQL server, which is globally unique, and gid is an incremental transaction id. With these two items we can uniquely identify a transaction recorded in the binlog. With GTID, failover becomes much easier to handle.

In the preceding example, suppose the last GTID that B has read from A is 3E11FA47-71CA-11E1-9E33-C80AA9429562:23, while C's is 3E11FA47-71CA-11E1-9E33-C80AA9429562:15. When C is pointed at the new master B, we can use the GTID to know that as soon as we find the event with GTID 3E11FA47-71CA-11E1-9E33-C80AA9429562:15 in B's binlog, C can start copying from the position of the next event. Although this binlog search is still sequential, and a bit inefficient and brute-force, it is much better than guessing which file name and position to use.
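If GTID mode is enabled on the servers, MySQL 5.6 can even do this lookup by itself: re-pointing C at B with MASTER_AUTO_POSITION = 1 lets the two servers negotiate the starting point from their GTID sets. A hedged sketch, with made-up hosts and credentials:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

// repointToNewMaster re-points a slave (here C) at the newly promoted
// master (here B). With GTID mode enabled, MASTER_AUTO_POSITION = 1 lets
// the servers work out the correct binlog file and position themselves.
func repointToNewMaster(slaveDSN, masterHost string, masterPort int) error {
	db, err := sql.Open("mysql", slaveDSN)
	if err != nil {
		return err
	}
	defer db.Close()

	stmts := []string{
		"STOP SLAVE",
		fmt.Sprintf(
			"CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%d, "+
				"MASTER_USER='repl', MASTER_PASSWORD='repl_pass', "+
				"MASTER_AUTO_POSITION = 1",
			masterHost, masterPort),
		"START SLAVE",
	}
	for _, s := range stmts {
		if _, err := db.Exec(s); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Illustrative DSN for slave C and address of the new master B.
	if err := repointToNewMaster("root:pass@tcp(c-host:3306)/", "b-host", 3306); err != nil {
		log.Fatal(err)
	}
}
```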

Google has also had a Global Transaction ID patch for a long time, but it only uses an incremental integer. LedisDB borrows its idea to implement failover, although Google now seems to be gradually migrating to MariaDB.

MariaDB's GTID implementation is different from MySQL 5.6's, which is actually rather troublesome: for my MySQL tool set go-mysql, it means writing two different sets of code to handle GTID. Whether to support MariaDB will be decided later.

Pseudo GTID
