Distributed data Storage-MySQL master-slave replication high-availability scenario

Source: Internet
Author: User
Tags diff failover zookeeper node server

In the previous articles, the high-availability scheme for MySQL databases, master-slave replication, latency generation of master-slave replication, latency Detection and latency solutions (not fundamentally addressed), this master-slave replication scheme guarantees data redundancy while simultaneously reading and writing to share system pressure but is not a highly available solution. Because the master node in the master-slave node is still a single point, the write failure in the application is caused by the primary node outage. Dual master replication is a good way to avoid a single point of failure of the primary node, but does not provide a unified access portal to achieve load balancing, if the master is down, you need to manually switch to another master, but not automatically switch. This article is to analyze the high availability of master-slave replication.

First, the basic concept introduction

    • Keepalived/heartbeat

is a software solution based on VRRP (Virtual Routing Redundancy Protocol) that can be used to achieve high availability of services to avoid a single point of failure. Keepalived is generally used for lightweight high availability, and does not require shared storage, typically used between two nodes, common with lvs+keepalived, nginx+keepalived combinations.

    • Mha/mmm

MHA is a MySQL failover scheme to ensure high availability of database systems. During downtime (typically within 10-30 seconds), complete failover, deploy MHA to avoid master-slave consistency issues, ease of installation, and no changes to existing deployments. Also support online switching, from the current running master switch to a new master, only a short period of time (0.5-2 seconds), this time only blocking the write operation, does not affect the read operation, convenient host hardware maintenance. On systems with high availability, data consistency requirements, MHA provides useful functionality to meet maintenance needs almost uninterrupted.

    • Lvs

(Linux virtual Server) is a highly available virtual server cluster system. It is mainly used for multi-server load balancing for the network layer. In a server cluster system built by LVS, the load balancing layer of the front end is called the director server, and the server group layer that serves the backend is called real server.

LVS has three modes of operation, namely Dr (direct Routing), TUN (tunneling IP Tunneling), NAT (network address translation). Where the Tun mode can support more real servers, but requires all server support IP Tunneling Protocol, Dr can also support the equivalent of real server, but need to ensure that the Director Server virtual network card and physical network card in the same network segment; Nat extensibility is limited, It is not possible to support more real servers because all request packages and reply packets require the Director Server to parse and regenerate, affecting efficiency. At the same time, LVS load balancer has 10 scheduling algorithms, namely RR, WRR, LC, WLC, LBLC, LBLCR, dh, sh, sed, NQ.

Second, high-availability scheme based on master-slave replication

1, dual-node Master + keepalived/heartbeat

Two nodes can be in a simple one -to-one mode, or dual-master mode, and placed in the same VLAN , after the master node fails, the use of keepalived/ Heartbeat's high-availability mechanism enables fast switching to the slave node.


    • uses keepalived as a high-availability scheme, it is best to set the two nodes to backup mode to avoid conflicts that can occur when the same data is written to two nodes because of an unexpected situation (such as a brain fissure ).
    • The
    • sets the auto_increment_increment (self-increment start value) and auto_increment_offset (increment step) of two nodes to different values. The purpose is to avoid accidental downtime of the master node, some binlog may not be copied to the slave on time to be applied, resulting in slave the new write data of the self-increment and the original master conflict, so it was staggered from the beginning; of course, If there is a suitable fault-tolerant mechanism to resolve the master-slave ID conflict, you can also not do this;
    • slave node server configuration is not too bad, or it is more likely to cause replication delays. As a slave server for a hot standby node, the hardware configuration cannot be lower than the master node;
    • If you are sensitive to latency issues, consider using the MARIADB branch version, or go directly to the latest MySQL 5.7 version. The use of multi-threaded replication can greatly reduce replication latency, and another alternative for which
    • is particularly sensitive to replication latency is the use of semi sync replication (which is called semi-synchronous replication) or the PXC scheme that is referred to later, with virtually no delay. However, the transaction concurrency performance will have a small degree of loss, need a comprehensive assessment to decide again; the detection mechanism of
    • keepalived needs to be properly perfected, not just to check if the mysqld process is alive, or if the MySQL service port is available, should also do further data writing or operation detection, to determine the response time, if the threshold is exceeded, you can start the switching mechanism; the
    • keepalived determines the latency of the slave when it is finalized to switch. The rules need to be set in advance in order to decide which strategies to switch or wait for in the event of a delay. Direct switchover may occur because replication delays some data cannot be queried and repeated writes,
    • keepalived or heartbeat itself cannot solve the problem of brain fissure , so in the case of service anomaly judgment, you can adjust the judgment script, The risk of a brain fissure problem can be reduced by supplemental detection of third-party nodes to determine whether or not to switch.
2, multi-node master-slave +mha/mmm

Multi-node master-Slave, can adopt a master Multi -Slave, or dual-master multi-slave mode. This mode, you can use MHA or MMM to manage the entire cluster, the current MHA application of the most, priority recommended MHA, the latest MHA has also supported the Gtid mode of MySQL 5.6.

MHA Architecture :

1) Node

MHA is based on the MySQL replication environment, where both the master role and the slave role, called node, are the object nodes that are monitored and managed. The MHA node package needs to be installed on the node server.

2) Manager

Managers in the MHA architecture are recommended to be deployed on a separate server and, of course, on a slave, but the slave should never be chosen as the new master, otherwise the MHA architecture after failover loses high availability. The manager server needs to install the MHA Manager package and complete a master configuration file. One manager can manage multiple sets of MySQL replication environments.

MHA Working principle :

The purpose of the MHA is to maintain the high availability of the master library in MySQL replication, which is best characterized by the ability to repair the difference logs between multiple slave, eventually keeping all slave data consistent, and then choosing a new master from which to act. and point the other slave at it.

The basic work flow is as follows:

1) the manager regularly monitors master, the monitoring interval is determined by the parameter ping_interval, the default is 3 seconds, can use its own monitoring function, can also call third-party software to monitor; The MHA itself provides two ways of monitoring: Select (Perform Select 1) and connect (create connection/disconnect), the default is select mode due to parameter ping_type.

2) When a master failure is detected, the SSH script is called to perform a check on all node, including the following:

    • Whether the MySQL instance can be connected;
    • Whether the master server can be SSH connected;
    • Check the status of SQL thread;
    • Check which servers are dead, which servers are active, and slave instances of the activity;
    • Check the configuration of slave instance and copy filter rules;
    • Finally, exit the monitoring script and return the code that represents the special meaning.

3) Start master failover, including the following sub-stages:

    • 1:configuration Check Phase

At this stage, if the SQL thread of a Slave instance is stopped, it is automatically started, and the active servers and slaves are confirmed again.

    • 2:dead Master Shutdown Phase

At this stage, first call Master_ip_failover_script, if HA is based on VIP implementation, then close the VIP, if based on the directory database implementation, then modify the mapping record. Then call the Shutdown_script script to force the shutdown of the host to avoid a brain fissure when the service restarts.

    • 3:master Recovery Phase

Also includes the following 3 sub-stages:

3.1:getting Latest Slaves Phase

Check each slave, get the nearest and oldest binary log file and position, and check the precedence of each slave as master, depending on the Candidate_master, No_master, [server_xxx] order, Binary log difference amount and other factors.

3.2:saving Dead Master ' s Binlog Phase

If the dead master server can still be connected via SSH, the binary log of dead master is extracted, and the starting point for extracting the logs is the latest binary log file and position obtained from the previous step, until the last event log, and the dead The files created in master's local working directory (determined by parameter remote_workdir) save these extracted logs and then copy the file to the manager server's working directory (as determined by the parameter manager_workdir). Of course, if the dead Master system cannot be connected, there is no difference in binary log. In addition, MHA also to the individual slave node health check, mainly SSH connectivity.

3.3:determining New Master Phase

Next, call the Apply_diff_relay_logs command to restore the slave difference log, which refers to relay log between each slave. Once the recovery is complete, all the slave data is consistent, and you can select new master based on the priority level.

3.4:new Master Diff Log Generation Phase

Here is the diff log between the Dead master and new master, which will be copied to the new master's working directory (REMOTE_WORKDIR) in the phase 3.2 saved binary log.

3.5:master Log Apply Phase

Restores the differential log from the previous copy to the new master, and can be restored manually if an error occurs. Then get new Master's Binlog name and position so that the other slave start copying from this new binlog name and position. Finally, the new master's write permission is turned on, and the READ_ONLY parameter is set to 0.

    • 4:slaves Recovery Phase

4.1:starting Parallel Slave Diff Log Generation Phase

Generate a diff log between slave and new slave, and copy the log to the working directory of each Slave, which dead part of the log that differs between master and new master because the slave is synchronized in phase 3.3.

4.2:starting Parallel Slave Log Apply Phase

Apply this section of the diff log on each slave, then point the slave to the new master by the Change Master to command, and finally start copying (start slave).

    • 5:new Master Cleanup phase

Cleaning up new master is simply resetting slave info, which cancels the original slave information.

The entire master failover process is complete.

To put it simply, the principle of work:

Save the binary log event from the master of the outage crash (Binlog events);

Identify slave that contain the latest updates;

Apply the difference of the trunk log (relay log) to the other slave;

Apply binary log events saved from master (Binlog events);

Promote a Slave for the new master;

Enable other slave to connect to the new master for replication;

advantages of MHA:

1) Master automatic monitoring and failover
In a master-slave replication environment that currently exists, MHA can monitor master host failures and failover automatically. Even if some slave do not accept the new relay log EVENTS,MHA will automatically recognize the difference relay log events from the latest slave, and apply the difference event to the other slaves. So all the slave are consistent. MHA Second Level failover (9-12 SEC monitoring to host failure, choose 7 seconds to turn off the power host to avoid brain fissures, then apply the difference relay logs, register to the new master, usually takes time 10-30 seconds that is total downtime). In addition, in the configuration file you can configure a slave priority to become master. Because MHA fixes the consistency between slave, DBAs don't have to deal with consistency issues. When the new master is migrated, the other slave are restored in parallel. Even if there are tens of thousands of slave, it will not affect the recovery master time, Slave also quickly completed. When one of the master crashes, MHA4 seconds complete failover, which is an active/passive cluster solution that cannot be completed.
2) Interactive (manual) master failover
MHA can be used to do only failover, without monitoring MASTER,MHA only as a failover interaction.
3) non-interactive failover
Non-interactive failover is also available (do not monitor master, automatic failover). This feature is useful, especially if you have installed other software monitoring master. For example, use Pacemaker (Heartbeat) to monitor master faults and VIP takeover with MHA failover and slave elevation.
4) switch master to different host online
In many cases, it is necessary to transfer master to other hosts (such as replacing the RAID controller, upgrading the master machine hardware, and so on). This is not a master crash, but planned maintenance must be done. Planned maintenance leads to downtime and must be recovered as quickly as possible. Fast Master switching and graceful blocking writes are required, and MHA provides this approach. Graceful master switch, block write operation within 0.5-2 seconds. In many cases, a downtime of 0.5-2 seconds is acceptable, and even if the Maintenance window is not scheduled. This means that when a faster machine needs to be replaced and a higher version is upgraded, the DBA can easily take action.
5) Master crash does not cause master-slave data inconsistency
When master crash, MHA automatically recognizes the difference between relay logevents between Slave and then applies to different slave, eventually all slave are synchronized. Combined with the use of semi-synchronous, there is virtually no data loss.
6) MHA deployment does not affect current environment settings
One of the most important design concepts of MHA is to use it as easily as possible. With 5.0+ and above, other HA scenarios need to change the MySQL deployment settings, MHA will not let DBAs do these deployment configurations, both synchronous and semi-synchronous environments are available. Start/stop/upgrade/downgrade/install/uninstall MHA without changing MySQL master/slave (e.g. start/stop). When you need to upgrade MHA to a new version, you do not need to stop MySQL, just update the MHA version, and then restart the Mhamanger. MHA support includes 5.0/5/1/5.5/5.6. Some HA scenarios require a specific MySQL version (such as Mysqlcluster,mysql withglobal transaction ID, etc.), and you may not want to migrate the app just for Masterha.
7) Do not add additional servers
MHA includes MHA Manager and MHA node. MHA node runs on each MySQL server, the manager can deploy a single machine, monitoring 100+ above master, the total number of servers will not be greatly increased. It is important to note that the manager can also run on a machine in the slaves.
8) No impact on performance
When monitoring Master,mha is only a few seconds (default 3 seconds) to send a ping packet, no large queries are sent. Master-slave replication performance is not affected
9) Apply any of the storage engines that MySQL applies to from the master
MySQL is not only suitable for transaction-safe InnoDB engines, MHA can be applied in the engine from which the master applies. Even with the legacy environment of the Mysiam engine, no migration is possible with MHA.

limitations of MHA:

    • The need to get SSH trust across nodes is a challenge for some corporate security systems, because if a node is compromised by hackers, other nodes will suffer as well;
    • The script that comes with it also needs to be supplemented, of course, the general use is sufficient.

MHA Connection Switching:

Usually there are two kinds of solutions: first, when the VIP is introduced, when the database is switched, the VIP switches, and most of the commonly used HA software is implemented by VIP, such as Oracle Rac,sql server cluster, heartbeat, etc. The second is to use the global Catalog database , the connection between the application and the database is mapped to a list, when the database switch, change this map list. MHA does not restrict the way users use it, but provides an interface through Master_ip_failover_script configuration parameters. If the VIP way to achieve, can introduce third-party software, such as keplived, can also write a VIP switch script.

3, Multi-node master-slave +zookeeper

In a large-scale node environment, using keepalived or MHA as MySQL's high-availability management is still somewhat complex or cumbersome. First of all, so many nodes without the configuration services to manage, it is necessary to be disorganized, online switching is easy to operate by mistake. In larger-scale environments, zookeeper management clusters are recommended for fast detection switching and easy node management.


Portal lvs+keepalived and MHA installation configuration:

Http://bestvivi.com/2015/09/09/MySQL%E4%B8%BB%E4%B8%BB%E5%A4%8D%E5%88%B6+LVS+Keepalived%E5%AE%9E%E7%8E%B0MySQL %e9%ab%98%e5%8f%af%e7%94%a8%e6%80%a7/


Distributed data Storage-MySQL master-slave replication high-availability scenario

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.