1. Concurrency Control
In a cluster environment, key data is usually shared, for example stored on a shared disk, and every node has the same access rights to it. There must therefore be a mechanism to coordinate the nodes' access to that data. Oracle RAC uses the DLM (Distributed Lock Manager) to control concurrency among multiple instances.
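To make the idea concrete, here is a toy, single-process sketch of what a lock manager does: a resource (say, a data block) may be read by many instances at once but modified by only one, and every access must first be granted. The class and method names below are invented for illustration; Oracle's real DLM is distributed across the instances and far more sophisticated.

import threading
from collections import defaultdict

class ToyLockManager:
    # Toy stand-in for a DLM: grants shared or exclusive locks per resource.
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = defaultdict(int)   # resource -> number of shared holders
        self._writer = {}                  # resource -> node holding the exclusive lock

    def acquire(self, resource, node, exclusive=False):
        with self._cond:
            if exclusive:
                # Wait until nobody holds the resource in any mode.
                self._cond.wait_for(lambda: self._readers[resource] == 0
                                    and resource not in self._writer)
                self._writer[resource] = node
            else:
                # Shared access: wait only for an exclusive holder to finish.
                self._cond.wait_for(lambda: resource not in self._writer)
                self._readers[resource] += 1

    def release(self, resource, node, exclusive=False):
        with self._cond:
            if exclusive:
                del self._writer[resource]
            else:
                self._readers[resource] -= 1
            self._cond.notify_all()

# An "instance" must hold the exclusive lock before modifying a shared block.
dlm = ToyLockManager()
dlm.acquire("block-42", node="A", exclusive=True)
# ... modify the shared block ...
dlm.release("block-42", node="A", exclusive=True)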
2. Amnesia
The cluster configuration file is not stored centrally; each node keeps a local copy. While the cluster is running normally, the configuration can be changed on any node, and the change is automatically synchronized to the other nodes.
Consider this scenario in a two-node cluster: node A is shut down for routine maintenance, some configuration changes are then made on node B, node B is shut down, and node A is started. Because the changes made on node B were never synchronized to node A, node A comes up with the old configuration file. The changes are lost; this is called amnesia.
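A common safeguard against amnesia is to version every copy of the configuration and refuse to bring the cluster up from a stale copy. The minimal sketch below illustrates only that idea; the dictionary layout and function name are invented, and this is not how Oracle Clusterware actually stores its registry.

def latest_config(local_copy, peer_copies):
    # Return the configuration copy with the highest version number.
    return max([local_copy] + list(peer_copies), key=lambda cfg: cfg["version"])

# Node A comes back up with a stale copy (version 7); node B's copy is newer.
local = {"version": 7, "nodes": ["A", "B"]}
peers = [{"version": 9, "nodes": ["A", "B"]}]
assert latest_config(local, peers)["version"] == 9   # A must not start with version 7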
3. Split Brain
In a cluster, nodes learn each other's health status through a heartbeat mechanism so that they can coordinate their work. Now suppose only the heartbeat link fails while every node keeps running normally. Each node then concludes that the other nodes are down, that it is the sole survivor, and that it should take control of the entire cluster. Since the storage is shared, two sides writing to it independently spells data disaster. This situation is called split brain.
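In its simplest form a heartbeat monitor just records when each peer was last heard from, as in the sketch below (the message transport is omitted and the names are invented). Note that this check alone cannot distinguish "the peer is down" from "only the heartbeat path is down", which is exactly why the voting algorithm described next is needed.

import time

HEARTBEAT_TIMEOUT = 3.0   # seconds without a heartbeat before a peer is suspected dead
last_seen = {}            # peer name -> time its last heartbeat arrived

def on_heartbeat(peer):
    # Called whenever a heartbeat message arrives from a peer (transport omitted).
    last_seen[peer] = time.monotonic()

def suspected_down(peers):
    # Peers whose heartbeat has not been seen within the timeout window.
    now = time.monotonic()
    return [p for p in peers
            if now - last_seen.get(p, float("-inf")) > HEARTBEAT_TIMEOUT]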
The usual solution to this problem is a voting algorithm (quorum algorithm). It works as follows:
Every node in the cluster reports its health status to the others over the heartbeat, and each notification a node receives counts as one vote. In a three-node cluster, each node sees three votes during normal operation. If node A's heartbeat fails while node A itself is still running, the cluster splits into two partitions: node A on one side and the remaining two nodes on the other. One of the partitions must be evicted to keep the cluster healthy.
In the three-node case, after the heartbeat problem occurs, B and C form a partition with two votes while A has only one. By the voting algorithm, the partition formed by B and C obtains control of the cluster and A is evicted.
With only two nodes, the voting algorithm breaks down, because each side has exactly one vote. In this case a third device, the quorum device, is introduced. The quorum device is usually a shared disk, also called the quorum disk, and it represents one additional vote. When the heartbeat between the two nodes fails, both nodes race to claim the quorum disk's vote, and the first request to arrive wins. The node that obtains the quorum disk therefore holds two votes and survives, while the other node is evicted.
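The majority rule described above fits in a few lines. In the sketch below, every node contributes one vote and the quorum disk, when configured, contributes one more; a partition survives only if it holds a strict majority of all votes. The function name is invented for illustration.

def survives(partition_votes, total_votes):
    # A partition keeps running only if it holds more than half of all votes.
    return partition_votes > total_votes / 2

# Three-node cluster, heartbeat to A lost: {B, C} holds 2 of 3 votes, A holds 1.
assert survives(2, 3) and not survives(1, 3)        # A is evicted

# Two-node cluster with a quorum disk: 2 node votes + 1 disk vote = 3 in total.
# Whichever node claims the disk first holds 2 of the 3 votes and survives.
assert survives(1 + 1, 3) and not survives(1, 3)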
4. I/O Isolation (Fencing)
When a cluster runs into a split-brain situation, the voting algorithm decides which partition keeps control of the cluster. That alone is not enough: we must also guarantee that the evicted node can no longer operate on the shared data. This is the problem I/O fencing solves.
I/O fencing can be implemented in software or in hardware:
Software: for storage devices that support the SCSI reserve/release commands, the SG commands are used. A surviving node issues a SCSI reserve to "lock" the storage device. When the evicted node finds the device locked, it knows it has been driven out of the cluster and that something abnormal has happened, so it restarts itself to recover. This mechanism is also called suicide; Sun and VERITAS use it. The sketch after this list shows the decision logic.
Hardware: shoot the other node in the head (STONITH). This approach operates the power switch directly: when one node fails and the surviving node can detect it, the survivor sends a command over the serial port that controls the failed node's power switch, and the failed node is rebooted by cycling its power. This method requires hardware support.
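Here is a minimal sketch of the software ("suicide") path, under the assumption that some platform-specific SG/SCSI command is available for taking the reservation; that command is deliberately left as a placeholder, and only the decision logic is shown.

import subprocess

def scsi_reserve(device):
    # Placeholder for the platform-specific SCSI reserve call (e.g. via the sg
    # utilities). Should return True if this node now holds the reservation,
    # False if another node already holds it. Intentionally not implemented here.
    raise NotImplementedError

def io_fence_check(device):
    # Decision logic of the "suicide" mechanism described above.
    if scsi_reserve(device):
        return "reservation held; this node stays in the cluster"
    # The device is already reserved by the surviving partition, so this node
    # has been evicted. Restart so it can rejoin the cluster in a clean state.
    subprocess.run(["reboot"])   # requires root; shown only to illustrate the idea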
--- From Oracle RAC