Using diagwait to collect diagnostic information about Oracle Clusterware node evictions
-- This article is excerpted from the Oracle MetaLink website (Document ID 1525761.1)
Applies to: Oracle Database - Enterprise Edition, versions 10.1.0.5 to 11.1.0.7 [Release 10.1 to 11.1], on the following platforms:
UnitedLinux Itanium
Linux x86
HP-UX PA-RISC (64-bit)
IBM AIX on POWER Systems (64-bit)
Oracle Solaris on SPARC (64-bit)
HP-UX Itanium
Red Hat Enterprise Linux Advanced Server x86-64 (AMD Opteron Architecture)
Red Hat Enterprise Linux Advanced Server Itanium
Oracle Solaris on x86-64 (64-bit)
Linux x86-64
Oracle Clusterware evicts a node from the cluster in the following circumstances:
The node is not pinging over the network heartbeat.
The node is not pinging the voting disk.
The node is hung or busy and cannot perform either of the above tasks.
In most cases, when a node is evicted, information that can be used to analyze the cause of the eviction is written to the logs. In some cases, however, this information is missing. The steps in this article apply when Clusterware is earlier than 11gR2 (11.2.0.1) and there is little or no information available to analyze the cause of a node eviction.
Starting with 11.2.0.1, customers do not need to set diagwait, because the Clusterware architecture has changed.
Why the information can be missing:
When a node is evicted while it is hung or starved of CPU, the operating system may not get the chance to flush log and trace data to the file system before the node reboots. Setting the diagwait attribute delays the reboot so that this trace information can be written to disk. With diagwait set, Clusterware waits 10 seconds (diagwait minus reboottime) before rebooting the node. This allows diagnostic data to be collected safely without increasing the risk of data corruption. After the underlying operating system scheduling problem has been fixed, you can remove the diagwait setting using the steps described below.
* Diagwait can be set on Windows, but it does not change the behavior there as it does on UNIX and Linux platforms.
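The 10-second wait mentioned above is the difference between two CSS settings. The worked example below assumes diagwait is set to the 13 seconds used in the procedure that follows. It also assumes reboottime is 3 seconds, which is the value implied by 13 minus 10. "margin" is our own label, not an Oracle parameter:

```latex
$$\text{margin} = \text{diagwait} - \text{reboottime} = 13\,\text{s} - 3\,\text{s} = 10\,\text{s}$$
```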
The diagwait attribute is available in 10.2.0.3 and later releases (including 10.2.0.4) and in 11.1.0.6 and later. On most platforms it is also available in 10.1.0.5. In other words, you can set diagwait in 10.1.0.5 and later, 10.2.0.3 and later, and 11.1.0.6 and later. If crsctl set css diagwait or crsctl get css diagwait returns "unrecognized parameter diagwait specified", your Clusterware version does not implement diagwait. In that case, apply the appropriate patch set before setting diagwait.
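The version check above can be scripted. The minimal sketch below uses diagwait_supported, a hypothetical helper that is not part of Oracle Clusterware. It reads crsctl output on stdin and looks for the exact message quoted above. We do not know whether crsctl writes that message to stdout or stderr, so the usage example merges both streams:

```sh
#!/bin/sh
# Hypothetical helper: reads the output of "crsctl get css diagwait" on stdin
# and reports whether this Clusterware release recognizes the parameter.
diagwait_supported() {
  grep -q 'unrecognized parameter diagwait specified'
  case $? in
    0) return 1 ;;   # message found: diagwait is not implemented
    1) return 0 ;;   # message absent: parameter is recognized
    *) return 2 ;;   # grep itself failed
  esac
}
```

Usage: crsctl get css diagwait 2>&1 | diagwait_supported || echo "apply the patch set first"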
Solution
Changing diagwait requires the Clusterware stack to be shut down on all nodes.
The steps for setting diagwait are as follows:
Run the following commands as the root user on every node:
# crsctl stop crs
# /bin/oprocd stop
Run the following command on all nodes to confirm that the Clusterware stack is down:
# ps -ef | egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"
This command should not return any processes other than the egrep command itself. If you continue with the next step while Clusterware is still running, the OCR may be corrupted. Do not proceed until the Clusterware stack is down on all nodes of the cluster.
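The check above can be wrapped so that it gives a pass/fail result on each node. In the minimal sketch below, crs_stack_is_down and CRS_DAEMON_REGEX are hypothetical names. The bracketed first letters in the pattern keep grep from matching its own command line. A plain egrep of ps output can otherwise list that line:

```sh
#!/bin/sh
# Daemons named in the article; "[c]rsd" matches "crsd" but not the literal
# text "[c]rsd" in grep's own argument list.
CRS_DAEMON_REGEX='[c]rsd\.bin|[o]cssd\.bin|[e]vmd\.bin|[o]procd'

# Reads "ps -ef" output on stdin and prints any Clusterware daemon lines.
# Returns 0 if none were found (stack down), 1 if any were, 2 if grep failed.
crs_stack_is_down() {
  grep -E "$CRS_DAEMON_REGEX"
  case $? in
    0) return 1 ;;
    1) return 0 ;;
    *) return 2 ;;
  esac
}
```

Usage on each node: ps -ef | crs_stack_is_down && echo "Clusterware is down on $(hostname)"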
On one node of the cluster, run the following command as the root user to set the diagwait parameter to 13 seconds:
# crsctl set css diagwait 13 -force
Run the following command to verify that diagwait was set successfully. It should return 13. If diagwait is not set, it returns the message "Configuration parameter diagwait is not defined".
# crsctl get css diagwait
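A minimal sketch of scripting this verification follows. It assumes that crsctl get css diagwait prints the value as the last field of its output. The article says only that the command "should return 13", so the exact output format is an assumption. diagwait_equals is a hypothetical helper:

```sh
#!/bin/sh
# Hypothetical helper: reads "crsctl get css diagwait" output on stdin and
# succeeds only if the last field of the last non-empty line equals $1.
# The "not defined" message ends in "defined", so it never matches a number.
diagwait_equals() {
  awk -v want="$1" 'NF { last = $NF } END { exit !(last == want) }'
}
```

Usage: crsctl get css diagwait | diagwait_equals 13 && echo "diagwait is 13"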
Run the following command on all nodes of the cluster to restart Clusterware:
# crsctl start crs
Run the following command on all nodes to verify that Clusterware is running properly:
# crsctl check crs
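The whole setting procedure above can be driven from one administration host. The minimal sketch below is not an Oracle tool and rests on several assumptions that do not come from the article:

- Root SSH equivalence exists to every node.
- NODES lists the node names.
- CRS_HOME is the same Clusterware home path on every node.
- run_on_node and set_diagwait are hypothetical helpers.

```sh
#!/bin/sh
NODES=${NODES:-"node1 node2"}   # hypothetical node names

# Run one command as root on a node over SSH (assumes root SSH equivalence).
run_on_node() {
  node=$1; shift
  ssh "root@$node" "$@"
}

set_diagwait() {
  value=$1
  [ -n "$CRS_HOME" ] || { echo "CRS_HOME is not set" >&2; return 1; }
  [ -n "$NODES" ] || { echo "NODES is empty" >&2; return 1; }
  crs=$CRS_HOME

  # Stop Clusterware and oprocd on every node.
  for n in $NODES; do
    run_on_node "$n" "$crs/bin/crsctl" stop crs || return 1
    run_on_node "$n" "$crs/bin/oprocd" stop || :   # ignore: may already be stopped
  done

  # Refuse to continue if any daemon is still running (risk of OCR corruption).
  # The ps output is captured first so that an SSH failure is not mistaken
  # for "no processes found".
  for n in $NODES; do
    procs=$(run_on_node "$n" ps -ef) || return 1
    if printf '%s\n' "$procs" | grep -E 'crsd\.bin|ocssd\.bin|evmd\.bin|oprocd'; then
      echo "Clusterware still running on $n; diagwait not changed" >&2
      return 1
    fi
  done

  # Set diagwait on one node only, then print the value for confirmation.
  for first in $NODES; do break; done
  run_on_node "$first" "$crs/bin/crsctl" set css diagwait "$value" -force || return 1
  run_on_node "$first" "$crs/bin/crsctl" get css diagwait

  # Restart Clusterware everywhere, then check it on every node.
  for n in $NODES; do
    run_on_node "$n" "$crs/bin/crsctl" start crs || return 1
  done
  for n in $NODES; do
    run_on_node "$n" "$crs/bin/crsctl" check crs || return 1
  done
}
```

Usage: CRS_HOME=/path/to/crs NODES="rac1 rac2" set_diagwait 13. The script sets diagwait only on the first node, which follows the instruction above to run the set command on one node of the cluster.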
Unsetting/removing diagwait
Do not remove the diagwait setting until the operating system scheduling problem has been corrected, because doing so can lead to node evictions. Setting diagwait delays node eviction (and cluster reconfiguration) by the diagwait value (13 seconds), so it does not affect most customers. To remove diagwait, follow the same steps as for setting it. Replace the crsctl set css diagwait command with the following command:
# crsctl unset css diagwait -force
(Note: the -force option is required to unset diagwait, because the Clusterware stack is down when you run this command.)
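The removal step can also be scripted with a built-in check. This minimal sketch runs on one node and assumes the stack has already been stopped and verified down on all nodes. It also assumes that, after removal, crsctl get css diagwait prints the "Configuration parameter diagwait is not defined" message quoted earlier. CRSCTL (an override hook that defaults to crsctl on PATH) and unset_diagwait are hypothetical names:

```sh
#!/bin/sh
CRSCTL=${CRSCTL:-crsctl}   # hypothetical override hook

# Remove diagwait, then confirm that crsctl reports it as not defined.
unset_diagwait() {
  "$CRSCTL" unset css diagwait -force || return 1
  if "$CRSCTL" get css diagwait 2>&1 | grep -q 'is not defined'; then
    return 0
  fi
  echo "diagwait still appears to be set" >&2
  return 1
}
```

After unset_diagwait succeeds, restart and check Clusterware on all nodes as in the setting procedure.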