Split-brain in keepalived
In a high-availability (HA) system, when the heartbeat line connecting the two nodes is disconnected, the HA system, which originally acted as one coordinated whole, splits into two independent individuals. Having lost contact with each other, each node assumes that the other has failed. The HA software on the two nodes then competes for the shared resources and the application services, like a "split-brain" patient, and serious consequences follow: either the shared resources are carved up and the service cannot start on either side, or the service starts on both sides and both nodes read and write the shared storage at the same time, which corrupts data (a common example is errors in a database's online logs).
The following measures are generally agreed on for dealing with split-brain in an HA system:
1) Add redundant heartbeat lines, such as dual lines (so that the heartbeat link itself is highly available), to minimize the probability of split-brain;
2) Enable a disk lock. The node that is currently serving locks the shared disk, so that when split-brain occurs the other node cannot seize the shared disk resources. However, disk locking has a significant problem of its own: if the node holding the shared disk does not actively unlock it, the other node can never obtain the disk. In practice, if the serving node suddenly hangs or crashes, it cannot execute the unlock command, and the backup node cannot take over the shared resources and application services. For this reason, some HA designs use a "smart" lock: the serving node enables the disk lock only when it finds that all heartbeat lines are disconnected (that is, it can no longer detect its peer). Under normal conditions the disk is not locked.
3) Set up an arbitration mechanism. For example, configure a reference IP address (such as the gateway IP address). When the heartbeat line is completely disconnected, each of the two nodes pings the reference IP address. If the ping fails, the break is at the local end: not only the heartbeat link but also the local network link used for external service is down, so starting (or continuing to run) the application service would be useless. That node therefore gives up the competition voluntarily and lets the node that can ping the reference IP address start the service. To be safer, the node that cannot ping the reference IP address simply reboots itself, to completely release any shared resources it may still be holding.
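The arbitration rule in item 3 can be sketched as a small decision function. This is only a minimal illustration in Python, not keepalived's own mechanism: the names `Action`, `can_ping`, and `arbitrate` are hypothetical, and `can_ping` assumes the Linux `ping` options `-c` (packet count) and `-W` (reply timeout in seconds).

```python
import subprocess
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes of the arbitration check on one node."""
    KEEP_RUNNING = "keep_running"              # heartbeat is fine, nothing to decide
    CONTEND = "contend"                        # peer lost, but our own network is healthy
    RELEASE_AND_REBOOT = "release_and_reboot"  # we are the isolated side


def can_ping(ip: str, timeout_s: int = 1) -> bool:
    """Return True if a single ICMP echo to `ip` succeeds.

    Assumes the Linux `ping` flags -c and -W; other platforms differ.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def arbitrate(heartbeat_alive: bool, reference_reachable: bool) -> Action:
    """Decide what this node should do, following the rule in item 3."""
    if heartbeat_alive:
        # The peer is still visible, so no arbitration is needed.
        return Action.KEEP_RUNNING
    if reference_reachable:
        # Heartbeat is gone but the reference IP answers:
        # the fault is not on our side, so we may start the service.
        return Action.CONTEND
    # Neither the peer nor the reference IP is reachable: the break is local.
    # Give up the competition and reboot to release shared resources.
    return Action.RELEASE_AND_REBOOT
```

A node would call something like `arbitrate(heartbeat_alive, can_ping(reference_ip))`, where `reference_ip` is the gateway address chosen as the reference. Keeping the decision separate from the ping makes the rule easy to check without touching the network.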