How to prevent split-brain in an HA cluster

Source: Internet
Author: User



1. Introduction

Split-brain refers to the situation in a highly available (HA) system where, after the link between the two nodes is lost, what was originally a single system splits into two independent nodes. The two nodes then start competing for the shared resources, resulting in system confusion and data corruption.

For the HA of a stateless service, split-brain does not matter; but for the HA of a stateful service (such as MySQL), split-brain must be strictly prevented. (Some production systems nevertheless configure stateful services the way a stateless service would be configured, with predictable results...)

2. How to prevent split-brain in an HA cluster

Two methods are generally used:
1. Arbitration
When the two nodes disagree, a third-party arbiter decides whose view to trust. The arbiter may be a lock service, a shared disk, or something else (a toy sketch follows this list).
2. Fencing
When the state of a node cannot be determined, the other node kills it through fencing, ensuring that the shared resources are fully released. This requires a reliable fence device.
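
To make the arbitration idea concrete, here is a toy sketch (not from the original article): before taking over, a node asks a third party for permission, modeled here as an exclusive lock on a file that both nodes can reach. The path is a made-up placeholder; a real cluster would use a proper lock service or quorum disk instead.

    import fcntl

    ARBITER_FILE = "/shared/cluster.lock"   # hypothetical path on shared storage
    _lock_fd = None                         # must stay open to keep the lock


    def allowed_to_take_over() -> bool:
        """Only the node that wins the exclusive lock may own the resources."""
        global _lock_fd
        _lock_fd = open(ARBITER_FILE, "w")
        try:
            fcntl.flock(_lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except BlockingIOError:             # somebody else already holds it
            _lock_fd.close()
            _lock_fd = None
            return False


    if __name__ == "__main__":
        if allowed_to_take_over():
            print("won the arbiter lock: this node keeps or takes the resources")
        else:
            print("lost the arbiter lock: this node must stand down")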

Ideally, neither of the above should be missing.
However, if the nodes do not use shared resources, for example a database HA based on master-slave replication, the fence device can be safely omitted and only arbitration kept. And many environments, such as cloud hosts, have no fence device available at all.

So can we instead dispense with arbitration and keep only the fence device?
No. When the two nodes lose contact with each other, they will try to fence each other at the same time. If the fencing method is reboot, the two machines will keep rebooting each other. If the fencing method is power-off, the likely outcome is that both nodes die, though possibly one survives. But if the two nodes lost contact because one node's network card failed, and the survivor happens to be that faulty node, the ending is just as tragic.

Therefore, a simple two-node cluster cannot prevent split-brain, no matter how it is configured.

3. Is it safe without a fence device?

Let's use PostgreSQL or MySQL data replication as an example to illustrate this question.
In a replication-based scenario, the master and slave nodes do not share resources, so both nodes being alive is not in itself a problem. The problem is that clients may still access the node that was supposed to be dead, which brings in the question of client routing.

There are several approaches to client routing: VIP-based, proxy-based, DNS-based, or simply having the client maintain a list of server addresses itself and decide which node is the master and which the slave. Whichever approach is used, the route must be updated when a master-slave switchover happens.

DNS-based routing is less reliable, because DNS records can be cached by clients and are hard to flush.

VIP-based routing has some pitfalls. If the node that should be dead does not give up its own VIP, it can cause disruption at any time (even if the new master has updated the ARP caches of all hosts with arping, once a host's ARP entry expires and it issues an ARP query, an IP conflict occurs). So the VIP can be regarded as a special kind of shared resource, and it must be removed from the faulty node. As for how to remove it, the simplest way is for the faulty node to remove the VIP itself once it detects that it has lost contact, provided it is still alive (if it is dead, there is nothing to remove). What if the process responsible for removing the VIP fails? Then a less reliable soft fence device (such as SSH) can be used.
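
A minimal sketch of such a soft fence, assuming the new master can still reach the old master over passwordless SSH; the host name, VIP and interface below are made-up placeholders:

    import subprocess

    OLD_MASTER = "old-master"    # hypothetical host name of the faulty node
    VIP = "192.0.2.10/24"        # hypothetical virtual IP
    IFACE = "eth0"               # hypothetical interface carrying the VIP


    def soft_fence(host: str) -> bool:
        """Strip the VIP from the faulty node; fall back to rebooting it."""
        # Least disruptive action first: just remove the VIP.
        drop_vip = subprocess.run(
            ["ssh", "-o", "ConnectTimeout=5", host,
             f"ip addr del {VIP} dev {IFACE}"])
        if drop_vip.returncode == 0:
            return True
        # Otherwise reboot the node so it releases everything it holds.
        reboot = subprocess.run(
            ["ssh", "-o", "ConnectTimeout=5", host, "systemctl reboot"])
        return reboot.returncode == 0


    if __name__ == "__main__":
        if not soft_fence(OLD_MASTER):
            # SSH itself failed, so the soft fence gives no guarantee here;
            # this is exactly why it counts as "less reliable".
            print("soft fence failed; arbitration or manual action is needed")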

Proxy-based routing is more reliable, because the proxy is the single entry point for the service: as long as the route is updated in that one place, clients cannot access the wrong node. But then the high availability of the proxy itself must also be considered.

As for the approach based on a server address list, the client needs to determine master and slave from the backend service itself (for example, by checking whether the PostgreSQL/MySQL session is in read-only mode). In that case, if there are two masters, the client gets confused. To prevent this, the original master should stop its own service once it detects that it has lost contact, for the same reason as removing the VIP above.
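
A minimal sketch of that check for PostgreSQL, assuming the psycopg2 driver is installed and using made-up host addresses and credentials; pg_is_in_recovery() returns false on a primary and true on a standby:

    import psycopg2

    CANDIDATES = ["10.0.0.1", "10.0.0.2"]    # hypothetical node addresses


    def find_master(hosts):
        masters = []
        for host in hosts:
            try:
                conn = psycopg2.connect(host=host, dbname="postgres",
                                        user="repl_check", connect_timeout=3)
                with conn, conn.cursor() as cur:
                    cur.execute("SELECT pg_is_in_recovery()")
                    in_recovery = cur.fetchone()[0]
                conn.close()
                if not in_recovery:
                    masters.append(host)
            except psycopg2.Error:
                continue                     # unreachable node: skip it
        if len(masters) != 1:
            # Zero masters: nothing writable. Two masters: a split-brain symptom.
            raise RuntimeError(f"expected exactly one master, found {masters}")
        return masters[0]


    if __name__ == "__main__":
        print("current master:", find_master(CANDIDATES))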

Therefore, to prevent the faulty node from causing damage, it should release its resources once it detects the failure; and to handle the case where the process responsible for releasing resources itself fails, a soft fence can be added. Under these premises, a cluster without a reliable physical fence device can also be considered safe.

4. Avoiding data loss after a master-slave switchover

Whether data is lost after a master-slave switchover and split-brain can be regarded as two different problems. Again, PostgreSQL and MySQL data replication serve as examples.

For PostgreSQL, if it is configured with synchronous streaming replication, data will not be lost regardless of whether the routing is correct. A client routed to the wrong node cannot write any data at all: its commits wait for feedback from the standby, and the standby, which now believes itself to be the master, will of course never acknowledge. This is not ideal for the client, but it gives the cluster monitoring software enough time to correct the routing error.
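
As a small sketch of how monitoring software might verify that synchronous replication is really in effect (assuming psycopg2 and a hypothetical monitoring account; not taken from the article itself):

    import psycopg2


    def replication_status(host):
        conn = psycopg2.connect(host=host, dbname="postgres",
                                user="repl_check", connect_timeout=3)
        with conn, conn.cursor() as cur:
            # Must be non-empty for synchronous replication to apply.
            cur.execute("SHOW synchronous_standby_names")
            names = cur.fetchone()[0]
            # sync_state is 'sync' for the standby whose feedback commits wait on.
            cur.execute("SELECT application_name, sync_state FROM pg_stat_replication")
            standbys = cur.fetchall()
        conn.close()
        return names, standbys


    if __name__ == "__main__":
        names, standbys = replication_status("10.0.0.1")   # hypothetical host
        print("synchronous_standby_names =", names)
        for app, state in standbys:
            print(f"standby {app}: {state}")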

For MySQL, even if it is configured for semi-synchronous replication, it may automatically degrade to asynchronous replication after a timeout. To prevent this degradation, you can set rpl_semi_sync_master_timeout to a very large value while keeping rpl_semi_sync_master_wait_no_slave on (the default). However, once the slave goes down, writes on the master will hang. The solutions are the same as for PostgreSQL: either configure one master with two slaves, so the master is fine as long as both slaves are not down at the same time, or have external cluster monitoring software dynamically switch between semi-synchronous and asynchronous replication (a sketch of these settings follows below).
If asynchronous replication is configured, it means you are already prepared to lose data. In that case, losing some data during a master-slave switchover is not a big deal, but the number of automatic switchovers must be controlled. For example, do not allow a master that has already been failed over to come back online automatically; otherwise, if the failover was triggered by network jitter, the master and slave roles will keep flipping back and forth, continually losing data and destroying consistency.
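
A minimal sketch of the semi-synchronous settings described above, assuming the PyMySQL driver, a made-up admin account, and that the rpl_semi_sync_master plugin is already installed; the timeout is just an example of an "oversized" value:

    import pymysql

    conn = pymysql.connect(host="10.0.0.1", user="admin",     # hypothetical
                           password="secret")
    with conn.cursor() as cur:
        # Maximum allowed value (~49 days), so the master effectively never
        # degrades to asynchronous replication on its own.
        cur.execute("SET GLOBAL rpl_semi_sync_master_timeout = 4294967295")
        # Keep waiting for an ACK even when no slave is currently connected
        # (this is the default, set here explicitly for clarity).
        cur.execute("SET GLOBAL rpl_semi_sync_master_wait_no_slave = ON")
        # Verify that semi-sync is actually active on the master.
        cur.execute("SHOW STATUS LIKE 'Rpl_semi_sync_master_status'")
        print(cur.fetchone())
    conn.close()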

5. How to implement the above strategy

You could implement a set of scripts that follow the above logic entirely from scratch. But I prefer to build on mature cluster software, such as Pacemaker + Corosync plus a suitable resource agent. I strongly advise against keepalived: it is not suited to HA for stateful services, and even if you bolt arbitration and fencing onto such a setup, it always feels awkward.

There are also a few things to pay attention to when using the Pacemaker + Corosync approach:
1) Understand the capabilities and principles of the resource agent
Only by understanding what a resource agent can do and how it works can you know which scenarios it suits. For example, the pgsql resource agent is fairly complete: it supports both synchronous and asynchronous streaming replication, can switch automatically between the two, and can guarantee no data loss under synchronous replication. The current MySQL resource agent, by contrast, is very weak: it uses neither GTID nor log compensation, so it easily loses data. Better not to use it and stick with MHA (though when deploying MHA you must still guard against split-brain).

2) Ensure quorum
Quorum can be regarded as Pacemaker's built-in arbitration mechanism: a majority of all the nodes in the cluster elects a coordinator, and all cluster instructions are issued by this coordinator, which neatly eliminates split-brain. For this mechanism to work effectively, the cluster needs at least 3 nodes, and no-quorum-policy should be set to stop, which is also its default value. (Many tutorials set no-quorum-policy to ignore for the convenience of demonstration; doing so in a production environment without any other arbitration mechanism is very dangerous!)
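
For example (assuming the cluster is managed with the pcs CLI; crmsh has an equivalent "crm configure property" command), the property can be set explicitly:

    import subprocess

    # A partition that has lost quorum should stop its resources rather than
    # keep running them; otherwise split-brain becomes possible.
    subprocess.run(["pcs", "property", "set", "no-quorum-policy=stop"],
                   check=True)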

But what if there are only 2 nodes?
One option is to borrow a machine to make up 3 nodes, and then add location constraints so that resources are never allocated to that borrowed node (a sketch of such a constraint follows below).
A second option is to combine several small clusters that cannot reach quorum on their own into one large cluster, again using location constraints to control where resources may be allocated.
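
A sketch of such a location constraint, again assuming the pcs CLI and made-up names for the resource ("pgsql-ha") and the filler node ("quorum-node"):

    import subprocess

    # Never place the resource on the filler node; it only contributes
    # its vote to quorum.
    subprocess.run(["pcs", "constraint", "location", "pgsql-ha",
                    "avoids", "quorum-node"],
                   check=True)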

But if you have many two-node clusters, you may not be able to find that many filler nodes, and you may not want to merge the two-node clusters into one large cluster either (for example, because it would be inconvenient to manage). In that case, consider a third method.
The third method is to configure a preemption resource, together with a colocation constraint between the service and this preemption resource: whichever node grabs the preemption resource provides the service. The preemption resource can be a lock service, for example one wrapped around ZooKeeper, or one written from scratch, as in the following example.
http://my.oschina.net/hanhanztj/blog/515065
(That example uses short HTTP connections; a more careful implementation would use heartbeat detection over a long-lived connection, so that the server can detect a broken connection and release the lock.)
However, this preemption resource itself must be highly available. You can make the service providing it highly available in its own right, or more simply deploy 3 lock services: one on each of the two nodes and a third on a dedicated quorum node, and treat acquiring at least 2 of the 3 locks as holding the lock (see the sketch below). This quorum node can then provide arbitration for many clusters (whereas a machine can only run one Pacemaker instance; otherwise a quorum node running N Pacemaker instances could do the same job). Still, unless you are forced to, prefer the previous approach, namely satisfying Pacemaker's own quorum; it is simpler and more reliable.
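
A minimal sketch of the "2 out of 3 locks" idea; the three endpoints and their /lock?holder=... API are hypothetical, merely mimicking an HTTP lock service like the one linked above, and a ZooKeeper recipe could be dropped in instead by replacing try_acquire():

    import urllib.error
    import urllib.request

    LOCK_ENDPOINTS = [                       # hypothetical lock services
        "http://node1:8000/lock",
        "http://node2:8000/lock",
        "http://quorum-node:8000/lock",
    ]


    def try_acquire(url: str, holder: str) -> bool:
        """Return True if this lock service granted the lock to `holder`."""
        try:
            with urllib.request.urlopen(f"{url}?holder={holder}", timeout=3) as r:
                return r.status == 200
        except (urllib.error.URLError, OSError):
            return False


    def acquire_majority(holder: str) -> bool:
        granted = sum(try_acquire(url, holder) for url in LOCK_ENDPOINTS)
        # Holding 2 of the 3 locks means no other node can also hold a majority.
        return granted >= 2


    if __name__ == "__main__":
        if acquire_majority("node1"):
            print("majority of locks held: safe to provide the service here")
        else:
            print("no majority: must not provide the service")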

6. References

http://blog.chinaunix.net/uid-20726500-id-4461367.html
http://my.oschina.net/hanhanztj/blog/515065
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Pacemaker_Explained/index.html
http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
http://mysqllover.com/?p=799
http://gmt-24.net/archives/1077
