In Oracle RAC, cluster health is monitored at multiple levels and through several different mechanisms: heartbeat mechanisms and a voting algorithm are used to isolate faults. If a node fails the health checks, the faulty node is evicted from the cluster to prevent it from corrupting shared data. This article mainly describes the heartbeat mechanisms in Oracle RAC and how to adjust the heartbeat parameters.
First, OCSSD and CSS
OCSSD is a Linux or UNIX process that manages and provides Cluster Synchronization Services (CSS). It runs as the oracle user and provides the node membership management feature; if the process fails, the node restarts. The CSS service provides two heartbeat mechanisms: a network heartbeat and a disk heartbeat. Each heartbeat has a maximum allowed delay: the network heartbeat timeout is called misscount (MC), and the disk heartbeat timeout is called disktimeout (I/O timeout, IOT). Both parameters are in seconds, and by default misscount < disktimeout. The two heartbeat mechanisms are described separately below.
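For orientation, all three CSS timeouts can be read with the crsctl tool (a minimal sketch; $CRS_HOME stands for the Clusterware home, as in the commands later in this article):
$CRS_HOME/bin/crsctl get css misscount      # network heartbeat timeout, in seconds
$CRS_HOME/bin/crsctl get css disktimeout    # disk heartbeat (voting disk I/O) timeout, in seconds
$CRS_HOME/bin/crsctl get css reboottime     # reboot-time allowance, in seconds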
Second, the network heartbeat
As the name implies, the network heartbeat detects the state of a node through the private network. If a hardware or software problem prevents the cluster nodes from communicating over the private network for a certain period of time, a split-brain results. Because storage in a clustered environment is shared, the failed node must be isolated from the cluster at that point to avoid a data disaster. The specific behavior of the network heartbeat is described below:
Every second, a sending thread in ocssd.bin sends a network TCP heartbeat to itself and to all the other nodes, and a receiving thread in ocssd.bin receives the heartbeats.
If a packet is dropped or damaged on the network, TCP's error-correction mechanism retransmits it; Oracle itself does not retransmit. In ocssd.log you will see a WARNING message about the missing heartbeat if a node does not receive a heartbeat from another node for 15 seconds (50% of misscount). Another warning is reported in ocssd.log if heartbeats from the same node are missing for 22 seconds (75% of misscount), and again at 27 seconds (90% of misscount). When the heartbeat has been missing for 100% of misscount, i.e. 30 seconds, the node is evicted.
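To make that timeline concrete, here is a minimal shell sketch that computes the warning points for the default 30-second misscount (the percentages are those described above):
MISSCOUNT=30    # default network heartbeat timeout, in seconds
for PCT in 50 75 90; do
  echo "warning in ocssd.log after $((MISSCOUNT * PCT / 100))s (${PCT}% of misscount)"
done
echo "node evicted after ${MISSCOUNT}s (100% of misscount)"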
The timeout of this network heartbeat is called misscount, and it can be queried and modified with the crsctl tool.
$ crsctl get css misscount
CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.
The query above shows that misscount is 30 seconds: if the private network latency between nodes exceeds 30 seconds and a split-brain occurs, Oracle evicts the faulty node from the cluster.
How does Oracle find the faulty node? It decides by a voting algorithm. Below is an example description of such an algorithm, for reference, as it applies to Oracle RAC.
Each node in the cluster needs a heartbeat mechanism to communicate each other's "health state"; assume that each "notification" a node receives represents one vote. For a cluster of three nodes, each node has 3 votes during normal operation. When node A's heartbeat fails but node A itself is still running, the entire cluster splits into 2 partitions: node A on one side, and the remaining 2 nodes on the other. One partition must be eliminated to ensure the healthy operation of the cluster. In this three-node cluster, after the heartbeat problem occurs, B and C form one partition with 2 votes, while A has only 1 vote. According to the voting algorithm, the partition of B and C gains control of the cluster and A is evicted. If there are only 2 nodes, however, the voting algorithm fails, because each node has only 1 vote. A third device must then be introduced: the quorum device. A quorum device is typically a shared disk, also known as a quorum disk, and it also represents one vote. When the heartbeat between the 2 nodes fails, both nodes race for the quorum disk at the same time, and the first request to arrive is satisfied first. The node that reaches the quorum disk first therefore gets 2 votes, and the other node is evicted.
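The counting can be illustrated with a toy shell sketch (an illustration of the idea only, not Oracle's actual implementation): a partition's vote total is its node count, plus one vote if it wins the race to the quorum disk.
# Toy model: votes = nodes in the partition + 1 if it won the quorum disk.
partition_votes() {
  local nodes=$1 won_quorum_disk=$2
  echo $(( nodes + won_quorum_disk ))
}
# Two-node split-brain: node B reaches the quorum disk first.
A=$(partition_votes 1 0)   # node A: 1 node, lost the race -> 1 vote
B=$(partition_votes 1 1)   # node B: 1 node, won the race  -> 2 votes
[ "$B" -gt "$A" ] && echo "partition B keeps control; node A is evicted"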
Before 11gR2, once a node was isolated, the failed node was usually rebooted. In 11gR2, Clusterware first attempts to shut down all resources on that node and clean up the failed components in the cluster, that is, to restart only the failed components. Only if the cleanup of the failed components is unsuccessful is the node rebooted to force the cleanup.
Third, the disk heartbeat
A thread in ocssd.bin updates the voting disk every second.
If a node does not update the voting disks for disktimeout seconds, it is evicted.
However, ocssd.bin on the local node has logic that brings the node down itself if it gets I/O errors on more than a majority of the voting disks. A CRS reconfiguration also happens once misscount seconds have passed, and the local node is rebooted. As a result, you rarely see an eviction due to voting disk failure on 10.2.0.4 (this was more common in 10.2.0.1), because if writing to the voting disks is the problem, ocssd.bin will abort the node before it gets evicted by another node.
As described above, each node updates the voting disks every second, and the shared voting disks are used for the disk heartbeat check. If the ocssd process fails to update a voting disk for more than 200 seconds, which is the value set by disktimeout, Oracle considers that voting disk offline and writes an offline record for it to the Clusterware alert log. If the current node still has fewer offline voting disks than online ones, the node can survive; if the number of offline voting disks is greater than or equal to the number of online voting disks, Clusterware considers the disk heartbeat to be in trouble, the faulty node is evicted from the cluster, and an automatic repair process is performed. For example, with 3 voting disks, if node A has one voting disk offline, then offline disks (1) < online disks (2): Clusterware generates an offline record in the alert log but takes no action. If the node has 2 or more voting disks offline, then offline disks (2) > online disks (1), and node A is kicked out of the cluster.
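The voting disks and their states can be listed with crsctl. The command below is real; the output shown is only an illustrative sketch of the 11gR2 format, with placeholder values:
$CRS_HOME/bin/crsctl query css votedisk
##  STATE    File Universal Id    File Name   Disk group
 1. ONLINE   <file universal id>  (<path>)    [<diskgroup>]
Located <n> voting disk(s).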
Fourth, the reboottime parameter
It is also important to note the reboottime parameter, which is 3 seconds by default:
Default 3 seconds - the amount of time allowed for a node to complete a reboot after the CSS daemon has been evicted.
crsctl get css reboottime
#Author: Leshami
#Blog: http://blog.csdn.net/leshami
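With the 3-second default in place, the query returns the current value in a CRS-4678 message, along these lines (illustrative output):
CRS-4678: Successful get reboottime 3 for Cluster Synchronization Services.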
Fifth, heartbeat parameter adjustment
1) Modification method for 10.2.0.2 through 11.1.0.7
a) Shut down CRS on all but one node. For exact steps use note 309542.1.
b) Execute crsctl as root to modify the misscount:
$CRS_HOME/bin/crsctl set css misscount <n>             ### where <n> is the maximum private network latency in seconds
$CRS_HOME/bin/crsctl set css reboottime <r> [-force]   ### where <r> is seconds
$CRS_HOME/bin/crsctl set css disktimeout <d> [-force]  ### where <d> is seconds
c) Reboot the node where the adjustment was made.
d) Start all the other nodes that were shut down in step a).
e) Execute crsctl as root to confirm the change:
$CRS_HOME/bin/crsctl get css misscount
$CRS_HOME/bin/crsctl get css reboottime
$CRS_HOME/bin/crsctl get css disktimeout
2) Modification method for 11gR2
With 11gR2, these settings can be changed online without taking any node down:
a) Execute crsctl as root to modify the misscount:
$CRS_HOME/bin/crsctl set css misscount <n>             ### where <n> is the maximum private network latency in seconds
$CRS_HOME/bin/crsctl set css reboottime <r> [-force]   ### where <r> is seconds
$CRS_HOME/bin/crsctl set css disktimeout <d> [-force]  ### where <d> is seconds
b) Execute crsctl as root to confirm the change:
$CRS_HOME/bin/crsctl get css misscount
$CRS_HOME/bin/crsctl get css reboottime
$CRS_HOME/bin/crsctl get css disktimeout
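As a quick sanity check, the three values can also be read back in a single loop (a sketch, using the same $CRS_HOME variable as above):
for P in misscount reboottime disktimeout; do
  $CRS_HOME/bin/crsctl get css $P
done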