MySQL high availability solution: MHA-based automatic failover Cluster
Comparison of common high availability solutions for MySQL (from the official manual)
Only MySQL Cluster and DRBD + Heartbeat are available for automatic database failover. These two HA solutions do not depend on Replication.
However, the configuration and maintenance of MySQL Cluster (NDB) is complex, and is not as stable and easy to use as Replication. Most companies may not consider this solution. The additional performance consumption of DRBD is relatively large, it is about 20%-30%, which greatly reduces availability.
Therefore, designing an HA solution based on Replication is the best choice for us.
MySQL supports unidirectional and asynchronous replication. One server acts as the master server during the replication process, and one or more other servers act as slave servers. MySQL5.5 introduces a semi-synchronous replication feature that ensures data consistency and redundancy between the master server and at least one slave server in the access chain.
Because of the Asynchronization of MySQL Replication (up to semi-sync), the HA architecture of simple MySQL Replication + Heartbeat can only complete IP failover, but cannot complete database failover, that is, Data Consistency cannot be guaranteed.
How to ensure data consistency in the replication architecture during failover
1. Find the most successful synchronization slave server (that is, the slave server closest to the master server data ).
2. If the host is still accessible, retrieve the data difference between the latest slave and the host from the master server.
3. Operate on each slave server to determine which events they lack and supplement them separately.
4. After upgrading the latest slave server to the master server, point the other slave server back to the new master server.
The above MHA can be implemented.
MHA (Master HA) is an open-source MySQL high-availability tool that can implement automated Master server failover Based on MySQL Master-slave replication.
However, although MHA tries to save binary logs from the down master server, it is not always feasible. For example, if the master server hardware fails or cannot be accessed through ssh, MHA cannot save binary logs and only performs failover to lose the latest data.
Semi-synchronous replication can greatly reduce the risk of data loss. MHA can be combined with semi-synchronous replication. If only one slave has received the latest binary log, MHA can apply the latest binary log to all other slave servers, therefore, they maintain consistency with each other.
Implementation Scheme of Auto Failover Cluster
On the basis of Replication, MHA can help us implement database failover, but for a complete set of automatic failover clusters, this is far from enough. We also need to implement the following requirements:
1. When the databases in the cluster fail over, the virtual IP addresses that provide external services are also transferred;
2. The MHA management process needs to run in the form of a daemon and a monitoring mechanism to ensure the normal operation of the MHA management process;
3. There is a monitoring mechanism to ensure that when the host fails, MHA can determine the successful Failover;
4. When the faulty host recovers, it can return to the cluster and become a new Server Load balancer instance to automatically implement re-synchronization;
5. Because the backup policy is different between the host and the slave, after the Failover, the scheduling in cron is automatically adjusted (such as full backup ).
The complete automatic failover cluster includes:
MySQL Replication implementation: Data Synchronization
MHA (MasterHA) Implementation: Heartbeat detection and database failover
Implementation of Heartbeat's IP management module: IP failover
Customized management and monitoring scripts implemented by Perl: automatic re-synchronization and normal Cluster operation
Architecture Diagram
After the completion of the automatic failover cluster can meet our needs according to the plan, the test results in the internal test environment are as follows (the database server is a 4-Core + 8 GB memory vmwarevm, install centos5.8 + MySQL5.5.21): test scenario
Failover time consumption
Ensure data integrity
Automatic data re-synchronization
Host MySQL service stop response
<15 seconds
Yes
Yes
Host CentOS stop response
<40 seconds
Yes
Yes
Replication delay (500 mb of logs not transferred) + MySQL service stop response
4.2 ~ 4.5 m
Yes
Yes
Replication latency (unexecuted log 500 M) + MySQL service stop response
1.9 ~ 2.1 m
Yes
Yes
Replication delay (500 mb of unsent logs) + host CentOS stop response
<40 seconds
Data Loss
Unable to automatically re-sync due to data loss
Replication delay (unexecuted log 500 M) + host CentOS stop response)
2.5 ~ 2.7 m
Yes
It is the current automatic failover cluster that has been applied in our production environment for nearly half a year, and has conducted multiple switchover drills to run well.