The hangcheck-timer module and I/O fencing for Oracle 9i/10g/11gR1 on Linux. Hangcheck-timer is a kernel-level I/O fencing mechanism provided by Linux.
I. Official description
Refer to MOS:
9i, 10g, and 11gR1 RAC [ID 726833.1]
The hangcheck-timer module is required to run a supported configuration in Oracle Real Application Clusters environments on Linux, with Oracle releases 9i, 10g, or 11gR1 RAC. This note identifies and outlines the requirements needed to configure hangcheck-timer in an Oracle Enterprise Linux, Red Hat Linux, or SUSE Linux environment.
Note: hangcheck-timer is not required starting with Oracle Clusterware 11gR2.
Starting in release 9.2.0.2, Oracle RAC environments required a new I/O fencing model, named the hangcheck-timer module. This module was implemented to replace the Watchdog module, which provided similar fencing functionality. Hangcheck-timer was subsequently delivered as part of the standard kernel distribution for Linux kernel releases 2.4 and above.
Hangcheck-timer should be loaded at boot time. It monitors the Linux kernel for long operating system hangs that could affect the reliability of a RAC node. It runs in kernel mode and uses the Time Stamp Counter (TSC) to catch scheduling delays or node hangs. This is done by setting a timer, then checking, when the timer fires, whether it was delayed by more than the allowed margin of error. If the duration exceeds the allowed time of (hangcheck_tick + hangcheck_margin) seconds, the machine is restarted. Hangcheck-timer will not cause reboots to occur due to CPU starvation.
Hangcheck-timer requires three configuration parameters:
(1) hangcheck_tick: defines how often, in seconds, the hangcheck-timer checks the node for hangs. The default value is 60 seconds.
(2) hangcheck_margin: defines how much margin is allowed, in seconds, between expected scheduling and real scheduling time. The default value is 180 seconds.
(3) hangcheck_reboot: determines whether the hangcheck-timer restarts the node if the kernel fails to respond within the sum of the hangcheck_tick and hangcheck_margin parameter values. If the value of hangcheck_reboot is equal to or greater than 1, the hangcheck-timer module restarts the system. If hangcheck_reboot is set to zero, the hangcheck-timer module will not reboot the node, even if a hang is detected. The default value varies by kernel version: in the 2.4 kernel the default is 1; in 2.6 kernels the default is 0.
When hangcheck_reboot = 1 and the following condition is met, hangcheck-timer reboots the system: system hang time > (hangcheck_tick + hangcheck_margin)
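The reboot rule can be written out as a small shell check. This is only an illustrative sketch of the rule as the note states it, not code from the module; the function name hangcheck_would_reboot and its argument order are assumptions made for this example.

```shell
#!/bin/bash
# Illustrative only: mirrors the reboot rule described in the note.
# Args: hang_seconds hangcheck_tick hangcheck_margin hangcheck_reboot
hangcheck_would_reboot() {
    local hang=$1 tick=$2 margin=$3 reboot=$4
    # A restart happens only when hangcheck_reboot >= 1 and the hang
    # lasted longer than (hangcheck_tick + hangcheck_margin) seconds.
    if [ "$reboot" -ge 1 ] && [ "$hang" -gt $((tick + margin)) ]; then
        echo "reboot"
    else
        echo "no reboot"
    fi
}

hangcheck_would_reboot 12 1 10 1   # 12 > 1 + 10 and reboot enabled: prints "reboot"
hangcheck_would_reboot 12 1 10 0   # reboot disabled: prints "no reboot"
```

With hangcheck_reboot = 0 the hang is still detected, but the node keeps running, which is why the 2.6 kernel default of 0 has to be overridden explicitly.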
All hangcheck-timer default values should be explicitly overridden when loading the kernel module, based on the Oracle release as follows:
-- 9i (default "oracle misscount" of 220 seconds): hangcheck_tick=30 hangcheck_margin=180 hangcheck_reboot=1
-- 10g/11gR1 ("CSS misscount" of 30 or 60 seconds): hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1
You must always ensure that the cluster misscount setting is greater than the sum of hangcheck_tick + hangcheck_margin.
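As a quick sanity check of this rule, the comparison can be scripted. The sketch below is an assumption-laden illustration: check_misscount is a hypothetical helper name, and the misscount value is passed in by hand rather than read from the cluster.

```shell
#!/bin/bash
# Hypothetical helper: verifies misscount > hangcheck_tick + hangcheck_margin.
# Args: misscount hangcheck_tick hangcheck_margin (all in seconds)
check_misscount() {
    local misscount=$1 tick=$2 margin=$3
    local sum=$((tick + margin))
    if [ "$misscount" -gt "$sum" ]; then
        echo "OK: misscount $misscount > tick + margin $sum"
    else
        echo "BAD: misscount $misscount <= tick + margin $sum"
        return 1
    fi
}

check_misscount 220 30 180   # 9i values quoted in this note
check_misscount 30 1 10      # 10g/11gR1 values quoted in this note
```

On 10g/11gR1 the current value is usually obtained with `crsctl get css misscount`; feeding that number into the function is left to the reader, since the output format differs between releases.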
When running Oracle Clusterware on Linux, hangcheck-timer should always be configured on each RAC cluster node, as the functionality of this module is required to provide I/O fencing and ensure that no stray writes occur from an evicted node in a RAC cluster. To verify whether the hangcheck-timer module is running on a node, execute as the root or oracle user:
# /sbin/lsmod | grep hangcheck
hangcheck-timer 2672 0
If the hangcheck-timer module is loaded (running), you will see output similar to the above. When hangcheck-timer is not loaded, no output is generated and the command prompt is returned to the user.
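The same check can be wrapped so that it reports a clear yes/no answer instead of relying on empty output. This is a minimal sketch; the function name hangcheck_loaded is an assumption, and the pattern accepts both the hangcheck-timer and hangcheck_timer spellings because the name shown by lsmod can differ between kernel versions.

```shell
#!/bin/bash
# Reads lsmod-style output on stdin and succeeds if a hangcheck module is listed.
# Hypothetical helper name; accepts both "hangcheck-timer" and "hangcheck_timer".
hangcheck_loaded() {
    grep -Eq '^hangcheck[-_]timer[[:space:]]'
}

# Typical use on a node (lsmod is assumed to live in /sbin, as in the note).
if /sbin/lsmod 2>/dev/null | hangcheck_loaded; then
    echo "hangcheck-timer is loaded"
else
    echo "hangcheck-timer is NOT loaded"
fi
```

Taking the listing on stdin keeps the function easy to try against saved lsmod output from another node.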
In an Oracle Enterprise Linux, Red Hat 4/5, or SUSE 9/10 environment, the hangcheck-timer module is loaded using the modprobe command:
# modprobe hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1
To ensure the module is loaded at boot time, you should also place the same command in the appropriate local command execution file (e.g. /etc/rc.d/rc.local, or /etc/init.d/boot.local). In earlier releases, hangcheck-timer was loaded using insmod in place of modprobe. Consult your release-specific documentation to determine which initialization method is required.
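One way to add the command to the boot script without duplicating it on repeated runs is sketched below. The function name add_hangcheck_to_rc, the /sbin/modprobe path, and the 10g/11gR1 parameter values baked into the line are assumptions for illustration; the example writes to a scratch file, not to the real rc.local.

```shell
#!/bin/bash
# Hypothetical helper: appends the modprobe line to an rc file only once.
# $1: path of the rc file (e.g. /etc/rc.d/rc.local on Red Hat style systems)
add_hangcheck_to_rc() {
    local rc_file=$1
    # Assumed values: the 10g/11gR1 settings quoted in this note.
    local line='/sbin/modprobe hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1'
    touch "$rc_file"
    # -F fixed string, -x whole-line match, -q quiet: append only if absent.
    if ! grep -Fxq -- "$line" "$rc_file"; then
        echo "$line" >> "$rc_file"
    fi
}

# Dry run against a scratch file; point it at the real rc file as root when ready.
add_hangcheck_to_rc "${TMPDIR:-/tmp}/rc.local.example"
```

For 9i the line would carry hangcheck_tick=30 and hangcheck_margin=180 instead.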
Hangcheck-timer logs a message to the system messages log when a failure is detected and a node restart is initiated by the module:
(1) When hangcheck-timer reboots the node, it may leave the message "Hangcheck: hangcheck is restarting the machine" in /var/log/messages.
(2) If you see the message "Hangcheck: hangcheck value past margin!" in /var/log/messages, a reboot was required but was not performed, because hangcheck_reboot was not set to 1. If this message is seen, you must reload the hangcheck module as described earlier in this note, with the hangcheck_reboot value set to 1.
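Both messages can be looked for in one pass over the log. The following is a rough sketch: scan_hangcheck_log is a hypothetical helper, the search strings are the two messages quoted above, and reading /var/log/messages normally requires root.

```shell
#!/bin/bash
# Hypothetical helper: reports which hangcheck messages appear in a log file.
# $1: log file to scan (defaults to /var/log/messages)
scan_hangcheck_log() {
    local log=${1:-/var/log/messages}
    if grep -q 'Hangcheck: hangcheck is restarting the machine' "$log" 2>/dev/null; then
        echo "restart initiated by hangcheck-timer"
    fi
    if grep -q 'Hangcheck: hangcheck value past margin' "$log" 2>/dev/null; then
        echo "hang detected but no reboot: check hangcheck_reboot"
    fi
}

scan_hangcheck_log /var/log/messages
```

No output means neither message was found, or the log file could not be read.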
Note:
Bug 6125546 can prevent hangcheck-timer from rebooting the node on RHEL4 (fixed in 2.6.9.56 or RHEL4.6).