Special Issues in the Oracle RAC Cluster Environment

Note: This article is excerpted from Zhang Xiaoming's "Big Talk Oracle RAC: Cluster, High Availability, Backup and Recovery".
Because a cluster requires multiple computers to work together, several new challenges must be addressed before the cluster environment can reach its ideal working state.
1. Concurrency control
In a cluster environment, critical data is usually stored on shared media, such as a shared disk. The members of the cluster are peers, and every node has the same access rights to the data, so there must be some mechanism to control each node's access to it.
In Oracle RAC, concurrency control between the instances is handled by the DLM (Distributed Lock Manager).
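To make the idea concrete, here is a minimal sketch in Python of the kind of arbitration a lock manager performs: readers may share a resource, but a writer needs exclusive access. This is an illustration only, not Oracle's actual DLM implementation; all names are invented for the example.

    # Toy lock manager: shared ("S") locks are compatible with each other,
    # an exclusive ("X") lock requires that no one else holds the resource.
    class ToyLockManager:
        def __init__(self):
            self.holders = {}  # resource -> list of (node, mode)

        def request(self, resource, node, mode):
            """Grant the lock if compatible with current holders, else refuse."""
            current = self.holders.setdefault(resource, [])
            if mode == "S" and all(m == "S" for _, m in current):
                current.append((node, mode))
                return True      # readers share the resource
            if mode == "X" and not current:
                current.append((node, mode))
                return True      # writer gets it only when no one else holds it
            return False         # incompatible: the requester must wait

        def release(self, resource, node):
            self.holders[resource] = [
                (n, m) for n, m in self.holders.get(resource, []) if n != node
            ]

    dlm = ToyLockManager()
    print(dlm.request("block#42", "node1", "S"))  # True: first reader
    print(dlm.request("block#42", "node2", "S"))  # True: readers share
    print(dlm.request("block#42", "node3", "X"))  # False: writer must wait
    dlm.release("block#42", "node1")
    dlm.release("block#42", "node2")
    print(dlm.request("block#42", "node3", "X"))  # True: now exclusive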
2. Amnesia
This problem arises when the cluster configuration file is not stored centrally and each node instead keeps a local copy. While the cluster is running normally, a user can change the configuration from any node, and the change is automatically synchronized to the other nodes.
But consider this scenario for a two-node cluster: node 1 is shut down for routine maintenance; some configuration is then modified on node 2; node 2 is shut down; node 1 is started. Because the changes made on node 2 were never synchronized to node 1, node 1 comes up working from the old configuration file. The modification is lost, hence the name "amnesia".
The simplest way to solve amnesia is to keep a single configuration file for the entire cluster: whichever node modifies the configuration writes to the same place, and every node reads the same configuration information.
Oracle RAC uses the OCR (Oracle Cluster Registry) disk file to solve this problem.
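The following toy sketch (hypothetical Python, purely to illustrate the failure mode) contrasts per-node configuration copies with a single shared registry in the spirit of the OCR:

    # "Amnesia" with per-node config copies vs. a single shared store.
    # Structure and values are invented for this example.
    local = {"node1": {"version": 1}, "node2": {"version": 1}}

    # Node 1 is down for maintenance; node 2 changes its local copy.
    local["node2"]["version"] = 2   # never synchronized to node 1

    # Node 2 shuts down, node 1 starts: it boots from the stale copy.
    print(local["node1"])           # {'version': 1} -> the change is "forgotten"

    # With one shared registry, every node reads the same copy, so the
    # change made while node 1 was down is still there when it starts.
    shared = {"version": 1}
    shared["version"] = 2           # written by node 2
    print(shared)                   # {'version': 2} seen by node 1 at startup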
3. Split Brain
Within a cluster, the nodes need a mechanism, the heartbeat, to learn each other's health status and ensure that they can coordinate their work. Suppose the heartbeat fails while the nodes themselves keep running. Each node then believes the other nodes are down, that it is the "sole survivor" of the whole cluster environment, and that it should take "control" of the cluster. Since the storage devices in a cluster are shared, every node claiming exclusive control is bound to destroy the integrity and consistency of the data. This means data disaster, and such a situation is called "split brain".
The usual way to solve this problem is a voting (quorum) algorithm, which works on the following principle:
Each node in the cluster uses the heartbeat mechanism to announce its "health status" to the others, and each such "notification" a node receives counts as one vote. In a three-node cluster, each node holds 3 votes during normal operation (its own plus one from each of the two other nodes). Now suppose the heartbeat of node 1 fails while node 1 itself is still running: the whole cluster splits into two partitions, node 1 alone on one side and nodes 2 and 3 on the other. One partition must now be evicted. Which should it be?
In the partition formed by nodes 2 and 3, each node holds two votes; node 1, alone in its own partition, holds only one. According to the voting algorithm, the small cluster formed by nodes 2 and 3 gains control, node 1 is kicked out, and the new cluster consisting of nodes 2 and 3 continues to provide service to the outside world.
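A minimal sketch of this majority rule (illustrative Python; real clusterware is of course far more involved):

    # Majority-vote arbitration between partitions after a heartbeat failure.
    # Each node contributes one vote to the partition it belongs to.
    def surviving_partition(partitions, total_votes):
        """Return the partition holding more than half of all votes, else None."""
        for part in partitions:
            if len(part) > total_votes / 2:
                return part
        return None   # no majority exists

    # Three-node cluster, heartbeat to node1 lost: partitions {1} and {2, 3}.
    winner = surviving_partition([{"node1"}, {"node2", "node3"}], total_votes=3)
    print(winner)   # nodes 2 and 3 survive; node1 is evicted

    # Two-node cluster split 1/1: neither side has a majority.
    print(surviving_partition([{"node1"}, {"node2"}], total_votes=2))  # None

Note the second call: with only two nodes, no partition can reach a majority, which is exactly the problem discussed next.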
If the cluster has only two nodes, this algorithm breaks down: each side has exactly one vote, so there is nothing to compare. This is why a third device, the quorum device, must be introduced.
The quorum device is usually a shared disk, also called the quorum disk, and it too represents one vote. When the heartbeat fails, both nodes race to claim the quorum disk's vote at the same time: the request that arrives first is satisfied, and the node that arrives later fails to obtain the vote. The node that grabs the quorum disk thus holds two votes, while the other node holds only one and is kicked out of the cluster.
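The race for the quorum disk amounts to an atomic "first requester wins" operation. Here is a sketch with a plain non-blocking mutex standing in for the disk (again illustrative only, not how a real quorum disk is accessed):

    # Two nodes racing for the quorum disk's single extra vote.
    import threading

    quorum_disk = threading.Lock()    # stands in for the shared quorum disk
    votes = {"node1": 1, "node2": 1}  # each node starts with its own vote

    def contend(node):
        if quorum_disk.acquire(blocking=False):  # first request wins the vote
            votes[node] += 1

    contend("node1")   # arrives first: node1 now holds 2 votes
    contend("node2")   # arrives second: acquisition fails, still 1 vote

    survivor = max(votes, key=votes.get)
    print(votes)       # {'node1': 2, 'node2': 1}
    print(survivor, "keeps control; the other node is evicted")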
In Oracle RAC, the voting disk is used to record the membership state of the nodes. When split brain occurs, the partition that wins the quorum gains control and the other partition is evicted.
4. IO Isolation (IO Fencing)
This problem is an extension of the split-brain problem: when split brain occurs, the cluster must be able to decide which nodes gain control and which are to be evicted. That is what the voting algorithm solves, as explained above.
But that alone is not enough; the cluster must also guarantee that the evicted node cannot operate on the shared data. Since that node may still be running, it could otherwise modify the shared data without restraint. This is the problem IO isolation (IO fencing) solves.
IO fencing can be implemented in hardware or in software. For storage devices that support the SCSI reserve/release commands, the SG commands can be used: the healthy node "locks" the storage device with the SCSI reserve command; when the failed node finds the storage device locked, it knows it has been evicted from the cluster, that is, that something abnormal has happened to it, and it must restart itself to return to a normal working state. This mechanism is also called suicide; Sun and Veritas both use it.
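A sketch of the "suicide" logic from the failed node's point of view. The two helper functions are hypothetical placeholders: a real implementation would query the SCSI reservation state of the disk (for example with the sg3_utils tools) and perform a real reboot.

    # Self-fencing ("suicide") watchdog on a node; illustrative sketch only.
    import time

    def storage_is_reserved_by_peer():
        """Placeholder: would check whether another node holds the SCSI
        reservation on the shared disk."""

    def reboot_self():
        """Placeholder: would restart this node so it rejoins the cluster
        in a clean state."""

    def fencing_watchdog(poll_seconds=1):
        while True:
            if storage_is_reserved_by_peer():
                # The surviving node locked the disk with SCSI reserve:
                # we were evicted, so restart instead of touching shared data.
                reboot_self()
            time.sleep(poll_seconds)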
STONITH (Shoot The Other Node In The Head) is another approach, which operates the power switch of the failed node directly. When one node detects that another has failed, it issues a command over the serial port to control the failed node's power switch, and the failed node is restarted by a brief power cut. This approach requires hardware support.
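A sketch of the STONITH idea, assuming a power switch that accepts simple ASCII commands over a serial line: the pyserial calls are real, but the port name, outlet numbering, and command strings are invented for illustration.

    # STONITH sketch: power-cycle a failed peer through a serial-attached
    # power switch. The command protocol here is hypothetical.
    import time
    import serial  # pyserial

    def stonith(port="/dev/ttyS0", outlet=1):
        with serial.Serial(port, baudrate=9600, timeout=2) as sw:
            sw.write(f"OFF {outlet}\r\n".encode())  # cut power to the peer
            time.sleep(5)                           # hold it down briefly
            sw.write(f"ON {outlet}\r\n".encode())   # power the peer back up

    # Called by a healthy node once it decides the peer must be fenced:
    # stonith(port="/dev/ttyS0", outlet=1)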
Oracle RAC uses a software approach: the failed node simply restarts itself. Whichever method is used, the purpose of IO fencing is the same: to ensure that the failed node can no longer access the shared data.