The hangcheck-timer Module: I/O Fencing for Oracle 9i/10g/11gR1 on Linux

Source: Internet
Author: User
The hangcheck-timer module provides I/O fencing for Oracle 9i/10g/11gR1 RAC on Linux. It is a kernel-level I/O fencing mechanism provided by Linux.

I. Official Website Description

Refer to MOS:

9i, 10g, and 11gR1 RAC [ID 726833.1]

The hangcheck-timer module is required to run a supported configuration in Oracle Real Application Clusters environments on Linux with Oracle releases 9i, 10g, or 11gR1 RAC. This note identifies and outlines the requirements for configuring hangcheck-timer in an Oracle Enterprise Linux, Red Hat Linux, or SUSE Linux environment.

Note: hangcheck-timer is not required starting with Oracle Clusterware 11gR2.

Starting with release 9.2.0.2, Oracle RAC environments require a new I/O fencing model, the hangcheck-timer module. This module was implemented to replace the Watchdog module, which provided similar fencing functionality. hangcheck-timer was subsequently delivered as part of the standard kernel distribution for Linux kernel releases 2.4 and above.

hangcheck-timer should be loaded at boot time. It monitors the Linux kernel for long operating system hangs that could affect the reliability of a RAC node. It runs in kernel mode and uses the Time Stamp Counter (TSC) to catch scheduling delays or node hangs. It does this by setting a timer and then checking, when the timer fires, whether it was delayed by more than the allowed margin of error. If the delay exceeds (hangcheck_tick + hangcheck_margin) seconds, the machine is restarted. hangcheck-timer will not cause reboots due to CPU starvation.

hangcheck-timer requires three configuration parameters:

(1) hangcheck_tick: defines how often, in seconds, hangcheck-timer checks the node for hangs. The default value is 60 seconds.

(2) hangcheck_margin: defines how much margin is allowed, in seconds, between the expected scheduling time and the real scheduling time. The default value is 180 seconds.

(3) hangcheck_reboot: determines whether hangcheck-timer restarts the node if the kernel fails to respond within the sum of the hangcheck_tick and hangcheck_margin values. If hangcheck_reboot is equal to or greater than 1, the hangcheck-timer module restarts the system. If hangcheck_reboot is set to zero, the module will not reboot the node, even if a hang is detected. The default value varies by kernel version: in the 2.4 kernel the default is 1; in 2.6 kernels the default is 0.

When hangcheck_reboot = 1 and the following condition holds, hangcheck-timer reboots the system: system hang time > (hangcheck_tick + hangcheck_margin)
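The condition above can be sketched as a small shell function. This is only an illustration of the decision rule as the note describes it, not the kernel module's actual code; the function name and the three result strings are made up for this example.

```shell
# Illustrative only: mirrors the rule "hang time > tick + margin" from the note.
# Arguments: observed hang (seconds), hangcheck_tick, hangcheck_margin,
# hangcheck_reboot. The name hangcheck_action is hypothetical.
hangcheck_action() {
    hang=$1; tick=$2; margin=$3; reboot=$4
    if [ "$hang" -le $((tick + margin)) ]; then
        echo "ok"          # delay is within the allowed window
    elif [ "$reboot" -ge 1 ]; then
        echo "reboot"      # window exceeded and rebooting is enabled
    else
        echo "log-only"    # window exceeded, but hangcheck_reboot is 0
    fi
}
```

For example, `hangcheck_action 12 1 10 1` reports a reboot because 12 is greater than 1 + 10, while the same hang with hangcheck_reboot set to 0 is only logged.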

All hangcheck-timer default values should be explicitly overridden when loading the kernel module, based on the Oracle release, as follows:

-- 9i (default "oracle misscount" of 220 seconds): hangcheck_tick=30 hangcheck_margin=180 hangcheck_reboot=1

-- 10g/11gR1 ("CSS misscount" set to 30 or 60 seconds): hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1

You must always ensure that the cluster misscount setting is greater than the sum of hangcheck_tick + hangcheck_margin.
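As a quick sanity check of this rule, a helper along the following lines could be used before loading the module. It is a minimal sketch; the function name is hypothetical, and the values are passed in by hand rather than read from the cluster.

```shell
# Succeeds (exit status 0) only when misscount > hangcheck_tick + hangcheck_margin.
# Arguments: misscount, hangcheck_tick, hangcheck_margin (all in seconds).
# The name misscount_ok is made up for this example.
misscount_ok() {
    misscount=$1; tick=$2; margin=$3
    [ "$misscount" -gt $((tick + margin)) ]
}

# The two recommended settings from the note, checked against their misscount:
misscount_ok 220 30 180 && echo "9i settings are consistent"
misscount_ok 30 1 10 && echo "10g/11gR1 settings are consistent"
```

On 10g and later, the current value can usually be read with `crsctl get css misscount`; treat that as something to confirm against your release's documentation.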

When running Oracle Clusterware on Linux, hangcheck-timer should always be configured on each RAC cluster node, because this module provides the I/O fencing that ensures no stray writes occur from an evicted node in a RAC cluster. To verify that the hangcheck-timer module is running on a node, execute the following as the root or oracle user:

# /sbin/lsmod | grep hangcheck

hangcheck-timer 2672 0

If the hangcheck-timer module is loaded (running), you will see output similar to the above. When hangcheck-timer is not loaded, no output is generated and the command prompt is returned to the user.
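For use in a script, the same check can be wrapped in a function that reads lsmod-style output on standard input and reports the result through its exit status. This is a sketch with a hypothetical function name. It accepts both `hangcheck-timer` and `hangcheck_timer`, on the assumption that the separator shown by lsmod may differ between kernel versions.

```shell
# Reads lsmod-style output on stdin; exit status 0 means a hangcheck module
# line was found. The name hangcheck_loaded is made up for this example.
hangcheck_loaded() {
    # -E: extended regex, -q: no output, only the exit status matters
    grep -Eq '^hangcheck[-_]timer[[:space:]]'
}
```

A typical call would be `/sbin/lsmod | hangcheck_loaded && echo loaded`.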

In an Oracle Enterprise Linux, Red Hat 4/5, or SUSE 9/10 environment, the hangcheck-timer module is loaded using the modprobe command:

# modprobe hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1

To ensure the module is loaded at boot time, you should also place the same command in the appropriate local startup file (e.g. /etc/rc.d/rc.local or /etc/init.d/boot.local). In earlier releases, hangcheck-timer was loaded using insmod in place of modprobe. Consult your release-specific documentation to determine which initialization method is required.
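One way to add the command to a startup file without duplicating it on repeated runs is sketched below. The function and variable names are hypothetical, the `/sbin/` prefix on modprobe is an assumption about where the binary lives, and the parameter values are the 10g/11gR1 ones from this note. Adjust them for 9i.

```shell
# Command line to be placed in the local startup file (10g/11gR1 values).
# The /sbin/modprobe path is an assumption; HANGCHECK_CMD is a made-up name.
HANGCHECK_CMD='/sbin/modprobe hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1'

# Appends the command to the file given as $1 unless an identical line exists.
ensure_hangcheck_at_boot() {
    rc_file=$1
    # -F: fixed string, -x: match the whole line, -q: quiet
    if ! grep -Fxq -- "$HANGCHECK_CMD" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$HANGCHECK_CMD" >> "$rc_file"
    fi
}
```

It would be run as root, for example `ensure_hangcheck_at_boot /etc/rc.d/rc.local` on Red Hat style systems or `ensure_hangcheck_at_boot /etc/init.d/boot.local` on SUSE.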

hangcheck-timer logs to the system messages log when a failure is detected and a node restart is initiated by the module:

(1) When hangcheck-timer reboots the node, it may leave the message "Hangcheck: hangcheck is restarting the machine" in /var/log/messages.

(2) If you see the message "Hangcheck: hangcheck value past margin!" in /var/log/messages, a reboot was required but was not performed because hangcheck_reboot was not set to 1. If this message is seen, you must reload the hangcheck-timer module as described earlier in this note, with hangcheck_reboot set to 1.
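The two log messages can be told apart with a small filter such as the one below. It is a sketch only: the function name and the three result labels are invented for this example, and it searches for the strings "Hangcheck: hangcheck is restarting the machine" and "Hangcheck: hangcheck value past margin!" in whatever log text is supplied on standard input.

```shell
# Reads a messages log on stdin and prints one of three labels.
# The name hangcheck_log_status and the labels are hypothetical.
hangcheck_log_status() {
    awk '
        /Hangcheck: hangcheck is restarting the machine/ { restart = 1 }
        /Hangcheck: hangcheck value past margin!/         { past = 1 }
        END {
            if (restart)   print "rebooted"                 # module restarted the node
            else if (past) print "hang-detected-no-reboot"  # hangcheck_reboot was not 1
            else           print "clean"                    # neither message present
        }'
}
```

A typical call would be `hangcheck_log_status < /var/log/messages`, run as a user that can read the log.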

Note: Bug 6125546 can prevent hangcheck-timer from rebooting the node on RHEL4 (fixed in 2.6.9.56 or RHEL4.6).
