An exploration of the split-brain problem in high-availability solutions (original)

Source: Internet
Author: User

Let's take a look at how Red Hat's documentation explains split-brain.
# What does "Split-brain" mean?
"Split brain" is a condition whereby two or more computers or groups of computers lose contact with one another but still act as if the cluster were intact. This is like having two governments trying to rule the same country. If multiple computers are allowed to write to the same file system without knowledge of what the other nodes are doing, it will quickly lead to data corruption and other serious problems.
Split-brain is prevented by enforcing quorum rules (which say that no group of nodes may operate unless they are in contact with a majority of all nodes) and fencing (which makes sure nodes outside of the quorum are prevented from interfering with the cluster).
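The quorum rule quoted above can be sketched in a few lines of Python. This is only a minimal illustration of the "strict majority" idea, not Red Hat's implementation; the function name `has_quorum` and its parameters are made up for this example.

```python
def has_quorum(reachable_nodes: int, total_nodes: int) -> bool:
    """Return True if a partition holds a strict majority of the cluster.

    reachable_nodes counts the local node plus every node it can still
    contact; total_nodes is the configured size of the whole cluster.
    Both names are hypothetical and chosen for this sketch only.
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= reachable_nodes <= total_nodes:
        raise ValueError("reachable_nodes must be between 0 and total_nodes")
    # Strict majority: more than half of all configured nodes.
    return 2 * reachable_nodes > total_nodes


if __name__ == "__main__":
    print(has_quorum(3, 5))  # the larger side of a 3/2 split
    print(has_quorum(1, 2))  # one half of a split two-node cluster
```

Note that in a two-node cluster neither half holds a strict majority after a split, which is why two-node setups usually need an extra tie-breaker.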
In a two-node hot-standby high-availability (HA) system, when the heartbeat line between the 2 nodes is disconnected, the HA system, which had been acting as one coordinated whole, splits into 2 independent nodes. Having lost contact with each other, each node assumes that the other one has failed. The HA software on the 2 nodes then behaves like a "split-brain patient" and instinctively fights over the shared resources and the application services, with serious consequences: either the shared resources get carved up and the service cannot come up on either side, or the service comes up on both sides and both nodes read and write the shared storage at the same time, corrupting data (a common example is errors in the online logs that a database cycles through).
The heartbeat software running on the standby host monitors the running state of the primary server over an Ethernet connection and automatically takes over the primary server's resources once it can no longer detect the primary's heartbeat. Typically, the heartbeat link between the primary and standby servers is a dedicated physical connection, which can be a serial cable or an Ethernet connection made with a crossover cable. Heartbeat can also monitor the primary server over multiple physical connections at once; the primary is considered healthy as long as the standby receives its "alive" messages over at least one of them. From a practical point of view, it is recommended to configure multiple independent physical connections for the heartbeat, so that the heartbeat link itself does not become a single point of failure.
1. Serial cable: considered slightly more secure than an Ethernet connection, because an attacker cannot run programs such as Telnet, ssh, or rsh over a serial link, which reduces the chance of getting from a compromised server into the standby server. However, a serial cable is limited in usable length, so the primary and standby servers must be very close to each other.
2. Ethernet connection: removes the length limitation of the serial cable. This connection can also be used to synchronize the file system between the primary and standby servers, which takes that traffic off the normal communication link.
For redundancy, heartbeat control messages should travel over two physical connections between the primary and standby servers. This avoids the situation in which a network or cable failure leads both nodes to believe they are the only active server and to compete for resources. This resource contention scenario is called "split-brain" or a "partitioned cluster". When the two nodes share the same physical device resources, split-brain can have quite dire consequences.
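The "alive as long as one link still delivers heartbeats" rule described above can be sketched as follows. This is a simplified illustration and not how any particular HA package implements it; the function name, the link names (`serial0`, `eth1`), and the timeout value are all assumptions made for the example.

```python
from typing import Mapping


def peer_is_alive(last_heartbeat: Mapping[str, float],
                  now: float,
                  timeout: float) -> bool:
    """Decide whether the peer should still be treated as alive.

    last_heartbeat maps a link name to the time (in seconds) at which the
    most recent heartbeat arrived over that link. The peer counts as alive
    if at least one link delivered a heartbeat within `timeout` seconds.
    """
    return any(now - seen <= timeout for seen in last_heartbeat.values())


if __name__ == "__main__":
    # Hypothetical state: the serial link is still fresh, eth1 has gone quiet.
    seen = {"serial0": 100.0, "eth1": 90.0}
    print(peer_is_alive(seen, now=103.0, timeout=5.0))  # serial0 is recent enough
    print(peer_is_alive(seen, now=110.0, timeout=5.0))  # both links are stale
```

With a single link, one cable fault is enough to make the function report the peer as dead even though it is still running, which is exactly the situation that triggers split-brain.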
To avoid split-brain, the following precautions can be taken:
1. Add redundant heartbeat lines, for example two separate lines, to minimize the chance of split-brain occurring.
2. Enable a disk lock. The node that is currently serving locks the shared disk, so that when split-brain occurs the other side cannot grab the shared disk resources at all. The disk lock has a drawback of its own, though: if the node holding the shared disk does not actively unlock it, the other node can never obtain the disk. In practice, if the serving node suddenly freezes or crashes, it cannot run the unlock command, and the standby node is then unable to take over the shared resources and application services. For this reason some HA software implements a "smart" lock: the serving node enables the disk lock only when it finds that all heartbeat lines are down (it can no longer detect the peer). The rest of the time the disk is not locked.
3. Set up an arbitration mechanism. For example, configure a reference IP (such as the gateway IP). When the heartbeat lines are completely disconnected, each of the 2 nodes pings the reference IP. A node that cannot reach it concludes that the break is on its own side: not only the heartbeat but also its outward-facing service link is down, so starting (or continuing) the application service would be pointless. It therefore gives up the contention voluntarily and lets the node that can ping the reference IP run the service. To be safer still, the node that cannot ping the reference IP simply reboots itself, which completely releases any shared resources it may still be holding.
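The "smart" disk lock and the reference-IP arbitration described above boil down to two small decisions. The Python sketch below captures only that decision logic under simplifying assumptions: the names `Action`, `should_lock_shared_disk`, and `arbitrate` are invented for this example, and the actual ping, locking, and reboot steps are left to the caller.

```python
from enum import Enum


class Action(Enum):
    NORMAL = "normal"                      # heartbeat is fine, nothing to arbitrate
    RUN_SERVICE = "run_service"            # this node may start or keep the service
    RELEASE_AND_REBOOT = "release_reboot"  # give up resources, then reboot


def should_lock_shared_disk(is_serving: bool, heartbeat_ok: bool) -> bool:
    """'Smart' lock: only the serving node locks the shared disk, and only
    once every heartbeat line is down. In normal operation nothing is locked."""
    return is_serving and not heartbeat_ok


def arbitrate(heartbeat_ok: bool, reference_reachable: bool) -> Action:
    """Decide what a node should do based on heartbeat and reference-IP state.

    reference_reachable is the result of pinging the reference IP (for
    example the gateway); how the ping is performed is up to the caller.
    """
    if heartbeat_ok:
        return Action.NORMAL
    if reference_reachable:
        # The break is probably on the peer's side; this node keeps serving.
        return Action.RUN_SERVICE
    # This node is the isolated one: stand down and release shared resources.
    return Action.RELEASE_AND_REBOOT


if __name__ == "__main__":
    print(arbitrate(heartbeat_ok=False, reference_reachable=True))
    print(arbitrate(heartbeat_ok=False, reference_reachable=False))
    print(should_lock_shared_disk(is_serving=True, heartbeat_ok=False))
```

One limitation of this simple scheme: if only the heartbeat lines fail and both nodes can still reach the reference IP, both will decide to run the service, so it is best combined with the disk lock or redundant heartbeat lines.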

References: http://surpassdream.blog.51cto.com/1347340/284974
http://www1.chinaunix.com/space.php?uid=25715911&do=blog&id=261403
This article is original; if you reproduce it, please credit the source and the author.
If there are any mistakes, please correct me.
