High-availability architecture is by now standard for Internet services: both the application tier and the database tier need to be highly available. Although Internet services are described as 7*24 uninterrupted, there are still times when a service is unavailable, for example when a web page will not open, Baidu search fails, or Weibo posts and WeChat messages cannot be sent. The usual way to measure high availability is the total time a service is unavailable in one year: three nines (99.9%) allows roughly 8.76 hours of downtime per year, while five nines (99.999%) allows only about 5 minutes of accumulated downtime per year. So although every company claims 7*24 uninterrupted service, very few can actually reach five nines, and some cannot at all; even the big Chinese Internet companies (BAT: Baidu, Alibaba, Tencent) have all had outages that stopped service. A system may contain many modules, such as front-end applications, caches, databases, search, message queues, and so on, and every module must be highly available for the whole system to be highly available. For database services, high availability is more complicated: being available to users means not only reachable but also correct, so when discussing a high-availability scheme for a database the data-consistency properties of the scheme are generally considered as well. This article mainly discusses high-availability schemes for MySQL, describing the characteristics, advantages, and disadvantages of each. It is a summary of the various schemes, offered as a starting point for discussion.
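For reference, the downtime budget follows directly from the availability target; a quick back-of-the-envelope calculation:

```python
# Downtime budget per year for a given availability target.
HOURS_PER_YEAR = 365 * 24  # 8760 hours

def downtime_per_year(availability: float) -> float:
    """Return the allowed downtime in hours per year."""
    return (1 - availability) * HOURS_PER_YEAR

print(downtime_per_year(0.999))    # three nines -> ~8.76 hours/year
print(downtime_per_year(0.99999))  # five nines  -> ~0.0876 hours, about 5.26 minutes/year
```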
1. Shared-storage scheme (SAN)
Scheme description: A SAN (Storage Area Network) simply allows different servers on the network to share the same storage, which decouples the database server from the storage. In normal operation the active server mounts the file system and serves traffic; if that server fails, the standby server mounts the same file system, performs the required recovery operations, and then starts MySQL. The shared-storage architecture is as follows:
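As a rough illustration of what the standby has to do during a takeover, here is a minimal sketch; the device path, mount point, and service name are placeholders, and a real setup would fence the failed node first.

```python
import subprocess

def takeover():
    """Sketch of a SAN takeover on the standby: mount the shared volume,
    then start MySQL, which runs InnoDB crash recovery on startup."""
    # The failed primary must be fenced (powered off / detached) before this,
    # otherwise both nodes could write to the shared volume and corrupt it.
    subprocess.run(["mount", "/dev/mapper/san-mysql", "/var/lib/mysql"], check=True)
    subprocess.run(["systemctl", "start", "mysqld"], check=True)

if __name__ == "__main__":
    takeover()
```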
Advantages:
1. Data loss caused by any component other than the storage itself is avoided.
2. Simple deployment and simple switching logic, transparent to the application.
3. Strong consistency between primary and standby is guaranteed, because there is only one copy of the data.
Limitations and disadvantages:
1. The shared storage is itself a single point; if the shared storage fails, data is lost.
2. SAN hardware is expensive.
2. Disk-replication scheme (DRBD)
Scheme description: DRBD (Distributed Replicated Block Device) is a disk-replication technology that achieves an effect similar to a SAN. DRBD is block-level synchronous replication implemented as a Linux kernel module: each block written on the primary is sent over the network and written to the peer's block device before the write is acknowledged. DRBD is similar to the SAN scheme in that there is a hot standby machine which, when it takes over, starts serving with the same data as the failed machine; the difference is that with DRBD the data is replicated rather than shared. The DRBD architecture is as follows:
Advantages:
1. Switching is transparent to the application.
2. Strong consistency between primary and standby, since replication is synchronous at the block level.
Limitations and disadvantages:
1. Write performance suffers, because every local disk write must also be synchronized to the peer server over the network.
2. Typically configured as two synchronized nodes, so scalability is relatively poor.
3. The standby cannot serve reads, which wastes resources.
3. Schemes based on master-slave replication (single-point write)
The two schemes above rely on underlying shared storage or disk replication to remove the single points of failure in the MySQL server and the disk. In real production environments, high availability more often relies on MySQL's own replication: one or more hot copies are maintained for the master, and service is switched to a hot copy when the master fails. The following schemes are all based on master-slave replication. They go from simple to complex, from less to more capable, and from easy to hard to implement, so you can choose the one that fits your actual situation.
3.1. Keepalived/Heartbeat
Scheme description:
Keepalived is an HA software whose job is to detect the state of servers (web servers, DB servers, etc.). Detection works by simulating network requests; the supported check types include HTTP_GET, SSL_GET, TCP_CHECK, SMTP_CHECK, MISC_CHECK, and so on. For a DB server the basic check is on IP and port (TCP_CHECK), but that may not be enough (for example the DB server could be read-only), so keepalived also supports custom check scripts. Keepalived verifies server state through these checks and, when a failed server is found, removes it from the system. The keepalived HA architecture is shown in the diagram below: keepalived is installed on both the master and the slave, and both are configured with the same VIP. The VIP layer hides the real IPs, and the application server obtains DB service by accessing the VIP. When the master fails, keepalived detects it and the slave takes over and continues to provide service, transparently to the application layer.
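As a sketch of the kind of custom check script mentioned above, assuming the PyMySQL driver; the DSN, heartbeat table, and user are illustrative, and keepalived treats a non-zero exit status as a failed check:

```python
#!/usr/bin/env python3
"""Illustrative MISC_CHECK-style script for keepalived: exit 0 if the local
MySQL can actually serve writes, non-zero otherwise."""
import sys
import pymysql  # assumed driver; any MySQL client library works

def master_is_healthy() -> bool:
    try:
        conn = pymysql.connect(host="127.0.0.1", port=3306,
                               user="ha_check", password="***", database="ha")
        with conn.cursor() as cur:
            cur.execute("SELECT @@global.read_only")
            if cur.fetchone()[0]:          # a read-only master cannot serve writes
                return False
            # Updating a heartbeat table proves writes really go through.
            cur.execute("REPLACE INTO heartbeat (id, ts) VALUES (1, NOW())")
        conn.commit()
        conn.close()
        return True
    except Exception:
        return False

sys.exit(0 if master_is_healthy() else 1)
```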
Advantages:
1. Simple to install and configure.
2. When the master fails, the slave takes over quickly, and the switch is transparent to applications.
Limitations and disadvantages:
1. The master and slave must be in the same network segment so that the VIP can float between them.
2. The built-in detection mechanism is weak; a custom script is needed to decide whether the master can really provide service, for example by updating a heartbeat table.
3. Data consistency is not guaranteed. MySQL replication is asynchronous by default, so if the master fails the slave may not have the latest data, resulting in data loss; the switching strategy therefore has to take slave lag into account. For scenarios requiring strong consistency, semi-synchronous replication (semi-sync) can be enabled to reduce data loss (see the sketch after this list).
4. The high availability of the keepalived software itself is not guaranteed.
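Several of the schemes below also lean on semi-synchronous replication to narrow the data-loss window. A minimal sketch of turning it on with the stock plugins, run once against the master and once against each slave; the host names are hypothetical and the plugin file names assume a Linux build of MySQL 5.5+:

```python
import pymysql

MASTER_SQL = [
    "INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so'",
    "SET GLOBAL rpl_semi_sync_master_enabled = 1",
    "SET GLOBAL rpl_semi_sync_master_timeout = 1000",  # ms before falling back to async
]
SLAVE_SQL = [
    "INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'",
    "SET GLOBAL rpl_semi_sync_slave_enabled = 1",
]

def run(host, statements):
    conn = pymysql.connect(host=host, user="root", password="***")
    with conn.cursor() as cur:
        for stmt in statements:
            cur.execute(stmt)
    conn.close()

run("master-host", MASTER_SQL)   # hypothetical host names
run("slave-host", SLAVE_SQL)
```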
3.2. MHA
Scheme description: MHA (Master High Availability) is a MySQL failover tool written in Perl by a Japanese MySQL expert. To keep the database highly available, MHA minimizes data loss by saving the binary log from the failed master so that the missing events can be applied to the slaves. MHA consists of two parts: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can be deployed on a separate machine to manage multiple master-slave clusters; MHA Node runs on every MySQL server and is mainly responsible for handling the binary logs during a switch so that as little data as possible is lost. MHA Manager periodically probes the master node in the cluster; when the master fails, it automatically promotes the slave with the most recent data to new master and then repoints all the other slaves to it. The entire failover process is completely transparent to the application. The MHA architecture is as follows:
MHA failover procedure (a position-comparison sketch follows the list):
A. A master exception is detected; after a series of checks it is finally confirmed that the master is down;
B. The configuration is checked and the status of each node in the current topology is listed;
C. The failed master is handled according to the configured scripts: the VIP is drifted away or its mysqld service is shut down;
D. All slaves compare their replication positions, the slave with the most recent data is identified, its position is compared against the master's binlog, and the binlog difference is copied to the management node;
E. The new master is selected from the candidate nodes; the new master compares its position with the most up-to-date slave and obtains the relay-log difference;
F. The management node copies the binlog difference to the new master; the new master applies the relay-log and binlog differences, records its final position, and starts accepting write requests (read_only=0);
G. Each of the other slaves compares its position with the most up-to-date slave and copies the resulting relay-log difference locally;
H. The management node copies the binlog difference to each slave, which compares Exec_Master_Log_Pos with Read_Master_Log_Pos to obtain the remaining difference log;
I. All difference logs are applied on each slave, which is then reset and pointed at the new master;
J. On the new master, RESET SLAVE is executed to clear the old replication information.
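The core of steps D through I is comparing replication positions among the slaves. The following is a simplified sketch of the idea, not MHA's actual implementation (which is in Perl); it assumes the PyMySQL driver and hypothetical host names, and it only identifies the promotion candidate from SHOW SLAVE STATUS output:

```python
import pymysql

SLAVES = ["slave1", "slave2", "slave3"]   # hypothetical host names

def slave_position(host):
    """Return (Master_Log_File, Read_Master_Log_Pos) for a slave."""
    conn = pymysql.connect(host=host, user="ha", password="***")
    with conn.cursor(pymysql.cursors.DictCursor) as cur:
        cur.execute("SHOW SLAVE STATUS")
        status = cur.fetchone()
    conn.close()
    return status["Master_Log_File"], status["Read_Master_Log_Pos"]

# The slave that has read the furthest into the master's binlog becomes the
# promotion candidate; the gap between it and the others is the "difference
# log" that MHA generates and applies before repointing every slave.
positions = {host: slave_position(host) for host in SLAVES}
latest = max(positions, key=lambda h: positions[h])
print("promotion candidate:", latest, positions[latest])
```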
Advantages:
1. The code is open source, making it easy to do secondary development for your own business scenarios.
2. During failover the difference logs among the slaves are repaired so that all slaves end up with consistent data; one of them is then chosen as the new master and the others are pointed to it.
3. You can flexibly choose either a VIP scheme or a global catalog database scheme (changing the master IP mapping) to perform the switch.
Disadvantages:
1. Strong consistency is not guaranteed, because it is not always possible to save the binary log from the failed master, for example when the master's disk is broken or SSH authentication fails.
2. Only a one-master, multi-slave architecture is supported; a replication cluster must have at least three database servers (one master and two slaves): one serves as master, one as standby master, and the other as a slave.
3. If the global catalog database scheme is used for switching, the application has to be aware of the change, so it is not transparent; to keep the switch transparent to the application you still have to rely on a VIP.
4. Not suitable for large-scale cluster deployment, and the configuration is relatively complex.
5. The high availability of the MHA manager node itself is not guaranteed.
3.3. High availability based on ZooKeeper
Scheme description:
As the previous discussion shows, neither the keepalived scheme nor the MHA scheme solves the high availability of the HA software itself, because HA is itself a single point. So what if we also run the HA software with multiple replicas? That raises new problems: 1. how do the HA instances keep their state strongly synchronized? 2. how do we make sure that multiple HA instances do not perform the switch at the same time? Both are essentially distributed-system consistency problems, so we can introduce a distributed consensus protocol such as Paxos or Raft for the HA software to ensure its availability. ZooKeeper is a typical publish/subscribe-style framework for distributed data management and coordination; by combining its rich node types with the Watcher event-notification mechanism, one can easily build the core functionality that many distributed applications need, such as data publish/subscribe, load balancing, distributed coordination/notification, cluster management, master election, distributed locks, and distributed queues. ZooKeeper is a big topic in itself and plenty of material is available online; here I mainly discuss how ZooKeeper solves the availability problem of the HA software. The architecture diagram is as follows:
In the diagram, an HA client is deployed on each MySQL node and reports the heartbeat state of its local node to ZooKeeper (hereafter ZK) in real time; for example, when the primary crashes, it notifies HA by modifying the corresponding node information on ZK. The HA nodes register Watcher events on ZK and are automatically notified when a ZK node changes; one or more HA nodes can be deployed, mainly for fault tolerance. Consistency between HA nodes is achieved through the ZooKeeper service, and a distributed lock ensures that multiple HA nodes do not switch the same master-slave pair at the same time. HA itself is stateless: all MySQL node state is stored on the ZooKeeper servers, and before switching, HA re-checks the MySQL node and only then performs the switch. The switching process after introducing ZooKeeper is as follows (a sketch of the ZK interaction follows the list):
A. The HA client detects a master exception, makes a series of checks, and finally decides that the master is down;
B. The HA client deletes the master's node information on ZK;
C. Thanks to the watch mechanism, HA is notified that the node has been deleted;
D. HA re-checks the MySQL node, for example by establishing a connection and updating the heartbeat table;
E. Once the exception is confirmed, HA performs the switch.
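A minimal sketch of the ZK interaction, assuming the kazoo client library; the paths, hosts, and re-check function are illustrative. The HA client registers an ephemeral node whose disappearance signals a possible failure, and the HA nodes serialize the switch with a distributed lock:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

# --- HA client side: register an ephemeral node for the local MySQL master.
# If this process (or its host) dies, the ZK session expires and the node is removed.
zk.create("/mysql_ha/cluster1/master", b"10.0.0.1:3306",
          ephemeral=True, makepath=True)

# --- HA node side: watch the master node and switch when it disappears.
def recheck_and_failover():
    """Placeholder: re-check the master directly, then promote a slave."""
    pass

@zk.DataWatch("/mysql_ha/cluster1/master")
def on_master_change(data, stat):
    if stat is None:                      # node deleted -> master may be down
        # Only one HA node may perform the switch at a time.
        with zk.Lock("/mysql_ha/cluster1/failover_lock", "ha-node-1"):
            recheck_and_failover()
```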
Now let's look at how this architecture keeps HA itself highly available.
(1). What happens if the HA client itself dies while the MySQL node is still healthy?
The MySQL node managed by that HA client can no longer keep its heartbeat with ZooKeeper, so the ZK service removes the node and HA notices the change and prepares a switch; before switching, however, it re-checks the node, finds that the MySQL node is actually fine, and does not switch.
(2). What happens when the network between the MySQL node and ZooKeeper is broken?
Because the HA client runs on the same host as the MySQL node, it can no longer report heartbeats to ZK, so ZK deletes the corresponding node information; HA then re-checks the node, the check still fails, and the switch is performed.
(3). What happens if an HA node dies?
Because HA is stateless and deployed with multiple replicas, one HA node going down does not affect the whole system.
Advantages:
1. The whole system, including the HA component itself, is highly available.
2. Strong master-slave consistency relies on MySQL itself (e.g. semi-sync) or on external tooling that backfills the difference logs, similar to MHA.
3. Scalability is very good; large-scale clusters can be managed.
Disadvantages:
1. Introducing ZK makes the whole system more complex.
4. Cluster-based schemes (multi-point write)
The schemes discussed in section 3 are basically the mainstream schemes used in the industry today, and their common characteristic is a single point of writing. Even if middleware is used for sharding, any given piece of data can still only be written on one node, so from this perspective those schemes are only pseudo-distributed. The two schemes discussed below are truly distributed: in theory the same data can be written on multiple nodes, similar to Oracle RAC or EMC Greenplum. In the MySQL world there are two main options: PXC, based on Galera, and NDB Cluster. MySQL Cluster is implemented on the NDB storage engine and has many usage restrictions, while PXC is based on the InnoDB engine; it also has restrictions, but since InnoDB is so widely used it has real reference value. As far as I know, Qunar uses PXC in its production environment. The architecture of PXC (Percona XtraDB Cluster) is as follows:
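A quick way to see whether a PXC node is actually part of a healthy cluster is to look at the wsrep status variables; a small sketch assuming the PyMySQL driver, with placeholder connection parameters:

```python
import pymysql

def wsrep_status(host):
    """Return a few key Galera/wsrep status variables for one PXC node."""
    conn = pymysql.connect(host=host, user="monitor", password="***")
    with conn.cursor() as cur:
        cur.execute("SHOW GLOBAL STATUS LIKE 'wsrep_%'")
        status = dict(cur.fetchall())
    conn.close()
    return {
        "cluster_size": status.get("wsrep_cluster_size"),      # number of nodes
        "cluster_status": status.get("wsrep_cluster_status"),  # should be 'Primary'
        "ready": status.get("wsrep_ready"),                    # 'ON' if serving
    }

print(wsrep_status("pxc-node-1"))   # hypothetical host name
```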
Advantages:
1. Quasi-synchronous replication
2. Multiple nodes can be read and written at the same time, enabling write scaling; a step beyond sharding schemes.
3. Automatic node Management
4. Strict Data consistency
5. High Availability of services
Disadvantages:
1. Only the InnoDB engine is supported.
2. All tables must have a primary key.
3. Writes are synchronized to all other nodes, so there is a write-amplification problem.
4. Heavily dependent on network stability; not suitable for long-distance synchronization.
5. Schemes based on middleware/proxy
Strictly speaking, middleware is not directly related to high availability, because the switching still happens at the database layer, but introducing a middle tier makes the switch more transparent to the application. Before middleware is introduced, all of the schemes above basically rely on VIP drift, or, if they do not rely on a VIP, cannot stay transparent to the application. By adding a middleware layer you can achieve both transparency to the application and high availability. In addition, the middle layer can also do sharding, which makes write scaling easier. There are many proxy options, such as MySQL's own MySQL Proxy and Fabric, and Alibaba's Cobar and TDDL. Taking Fabric as an example, the architecture is as follows:
Applications talk to the Fabric connector, which accesses the Fabric node over the XML-RPC protocol. The Fabric node relies on a backing store, which holds the metadata of the whole HA cluster. The connector reads that metadata from the backing store and caches it locally, which reduces the overhead of interacting with the management node every time a connection is established. The Fabric node manages multiple HA groups; each HA group has one primary and several secondaries (slaves). When the primary fails, the most suitable secondary is promoted to new primary and the remaining secondaries are all repointed to it. All of this is automated and invisible to the business; after an HA switch the connectors need to be notified so that they refresh their cached metadata.
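To make the connector's role concrete, here is an illustrative sketch of the idea, not the real Fabric connector API: cache the HA-group metadata obtained from the management node, route writes to the primary and reads to a secondary, and refresh the cache after a switch.

```python
import random

class HAGroupRouter:
    """Illustrative connector-side router: caches HA-group metadata fetched
    from the management node and routes statements accordingly."""

    def __init__(self, fetch_metadata):
        # fetch_metadata() stands in for a call to the management node (e.g.
        # over XML-RPC) returning {"primary": "host:port", "secondaries": [...]}.
        self._fetch = fetch_metadata
        self._meta = fetch_metadata()

    def route(self, readonly: bool) -> str:
        """Return the host:port a statement should be sent to."""
        if readonly and self._meta["secondaries"]:
            return random.choice(self._meta["secondaries"])
        return self._meta["primary"]

    def on_failover_notice(self):
        # After an HA switch, the management node notifies connectors so the
        # cached metadata is refreshed and writes go to the new primary.
        self._meta = self._fetch()

# Usage sketch with static metadata standing in for the backing store:
router = HAGroupRouter(lambda: {"primary": "10.0.0.1:3306",
                                "secondaries": ["10.0.0.2:3306", "10.0.0.3:3306"]})
print(router.route(readonly=True))
print(router.route(readonly=False))
```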
Advantages:
1. Switching is transparent to the application.
2. Scalable; easy to extend with sharding.
3. Can be deployed across data centers and switch between them.
Disadvantages:
1. A relatively new component, with few real production deployments so far.
2. The strong-consistency problem is not solved; strong consistency between master and standby still relies on MySQL itself (semi-sync) and on external rollback/backfill mechanisms.
Summary
This article described some typical MySQL high-availability architectures, including schemes based on shared storage, on disk replication, and on master-slave replication. For the master-slave replication schemes, keepalived, MHA, and the ZooKeeper-based approach were introduced in turn. Each scheme was examined in terms of continuous availability, strong data consistency, and transparency of the switch to applications. Personally, I think the schemes based on MySQL replication are the mainstream and are very mature; introducing middleware or ZooKeeper lets the system do better and support a larger scale, but it also places higher demands on development and operations. So when choosing a scheme, the decision should be based on the business scenario and the scale you operate at.