Functions of oprocd and hangcheck-timer in Linux

Source: Internet
Author: User

I. hangcheck-timer

From Oracle 9.2.0.2.0 through the latest 11.1, Oracle recommends using an I/O fencing module called hangcheck-timer when building RAC on Linux. The module monitors whether the node's Linux kernel has hung. If the hang lasts long enough, Oracle considers it a threat to the stability of the RAC node and restarts the node. The module has three parameters: hangcheck_tick, hangcheck_margin, and hangcheck_reboot. If the kernel does not respond within the combined time of hangcheck_tick and hangcheck_margin, hangcheck-timer decides whether to restart the system based on the value of hangcheck_reboot: a value greater than or equal to 1 means restart; 0 means do not restart. In the 2.6 kernel the default value is 0. In that case the alarm "Hangcheck: hangcheck value past margin!" appears in the system log, indicating that the margin was exceeded while hangcheck_reboot was not set to 1: the system should have been restarted but was not.
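
The decision rule described above can be sketched in a few lines of Python. This is only an illustration of the logic as this article states it, not the kernel module's code; the function name hangcheck_decision and the sample values 30 and 180 are assumptions chosen for the example.

```python
def hangcheck_decision(stall_seconds, hangcheck_tick, hangcheck_margin, hangcheck_reboot):
    """Return what hangcheck-timer would do for a kernel stall of the given length.

    Follows the rule in the text: a stall longer than tick + margin is a hang;
    hangcheck_reboot >= 1 restarts the node, 0 only logs the alarm.
    """
    limit = hangcheck_tick + hangcheck_margin
    if stall_seconds <= limit:
        return "ok"
    if hangcheck_reboot >= 1:
        return "restart"
    return "log only"


# Illustrative values only; they are not defaults taken from the article.
print(hangcheck_decision(100, 30, 180, 1))  # within tick + margin
print(hangcheck_decision(250, 30, 180, 1))  # past the margin, reboot enabled
print(hangcheck_decision(250, 30, 180, 0))  # past the margin, reboot disabled
```

The third call corresponds to the "hangcheck value past margin!" case: the threshold was exceeded, but the node is left running.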

II. oprocd

On the Linux platform, Oracle Clusterware 10.2.0.4 and later introduce a new process, the Oracle Clusterware Process Monitor Daemon (oprocd), to monitor the system status and the health of each node in the cluster, as was already done on UNIX systems that do not use third-party cluster software. Let's take a look at what oprocd is.

In 10.2.0.4 on Linux, oprocd runs alongside hangcheck-timer. It is not tied to the hangcheck-timer module; it is spawned by the init.cssd process and runs as the root user. The oprocd process is locked in memory and monitors the cluster node it runs on, detecting hardware or driver freezes on that machine in order to provide I/O fencing (which is different from the interrupt fencing function provided by SCSI). If a machine freezes long enough for the cluster to evict the node, the node must forcibly restart itself. Otherwise, after the cluster has reconfigured the lock resources of the failed node, the failed node could still issue unsafe I/O operations against the shared data files. To provide this function, oprocd performs a check and then goes to sleep. If it is not woken up within the expected time, oprocd restarts the local node.
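
The check-then-sleep idea can be illustrated with a short Python sketch. It is a simplified reading of the paragraph above, not oprocd's actual implementation: the names overslept and watch_once are hypothetical, the sketch only reports a decision instead of rebooting, and the values 1000 and 500 are the timeout and margin figures this article gives for oprocd's startup parameters.

```python
import time


def overslept(elapsed_ms, timeout_ms=1000, margin_ms=500):
    """True if a sleep meant to last timeout_ms ran late by more than margin_ms."""
    return (elapsed_ms - timeout_ms) > margin_ms


def watch_once(timeout_ms=1000, margin_ms=500, sleep=time.sleep, clock=time.monotonic):
    """Sleep for timeout_ms and report whether the wake-up was too late.

    sleep and clock are injectable so the logic can be exercised without
    actually freezing a machine. A real watchdog would restart the node
    where this sketch returns True.
    """
    start = clock()
    sleep(timeout_ms / 1000.0)
    elapsed_ms = (clock() - start) * 1000.0
    return overslept(elapsed_ms, timeout_ms, margin_ms)


# A wake-up 200 ms late is tolerated; one 600 ms late is not.
print(overslept(1200))
print(overslept(1600))
```

A monotonic clock is used so that adjustments to the wall-clock time are not mistaken for a freeze.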

Note: oprocd is not present in a third-party cluster environment. Because no third-party cluster solution has passed certification on the Linux platform, oprocd is always present in 10.2.0.4 on Linux.

oprocd takes two parameters at startup:
-t: the timeout, in milliseconds. Default value: 1000 (oprocd_default_timeout = 1000)
-m: the margin, that is, the acceptable delay before a restart, in milliseconds. Default value: 500 (oprocd_default_margin = 500)

Setting diagwait to 13 is recommended. This lengthens the acceptable delay before a restart so that more log information can be written to disk.
By default, the -m margin is 500:
$ ps -efl | grep oprocd
0 S root 6444 3080 0 78 0 - 636 - Apr15 ? 00:00:00 /bin/sh /etc/init.d/init.cssd oprocd
4 S root 7255 6444 0 -40 - - 516 - Apr15 ? 00:00:00 /u01/app/crs11g/bin/oprocd run -t 1000 -m 500 -f

When diagwait is set to 13, the -m margin is increased. The output below shows that with diagwait set to 13, the -m value is 10000:
$ ps -efl | grep oprocd
0 S root 6444 3080 0 78 0 - 636 - Apr15 ? 00:00:00 /bin/sh /etc/init.d/init.cssd oprocd
4 S root 7255 6444 0 -40 - - 516 - Apr15 ? 00:00:00 /u01/app/crs11g/bin/oprocd run -t 1000 -m 10000 -f
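
To check which values a running oprocd was started with, the two numbers can be pulled out of the command column of the ps output. The Python sketch below assumes a command line of the form "oprocd run -t <ms> -m <ms> -f", with lowercase flags separated by spaces; the helper name oprocd_params is hypothetical.

```python
import shlex


def oprocd_params(cmdline):
    """Extract the -t (timeout) and -m (margin) values, in ms, from an oprocd command line."""
    tokens = shlex.split(cmdline)
    params = {}
    for flag, name in (("-t", "timeout_ms"), ("-m", "margin_ms")):
        if flag in tokens:
            idx = tokens.index(flag)
            if idx + 1 < len(tokens):
                params[name] = int(tokens[idx + 1])
    return params


# Command column as it would appear with diagwait set to 13
print(oprocd_params("/u01/app/crs11g/bin/oprocd run -t 1000 -m 10000 -f"))
```

A margin of 500 in the result suggests the default is in effect; 10000 matches the diagwait = 13 case shown above.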

III. Relationship between the two

oprocd and hangcheck-timer run at the same time on the Linux platform and provide different detection mechanisms. When they cause a node restart, the messages recorded in the system log differ:
A restart caused by oprocd records "SysRq : Resetting".
A restart caused by hangcheck-timer records "Hangcheck: hangcheck is restarting the machine".

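When reviewing system logs after an unexpected restart, the two messages can be told apart with a small script. The Python sketch below is one possible way to do it: the function name restart_cause and the sample log lines are made up for illustration, and the matching is case-insensitive and tolerant of spacing because the exact formatting of the messages may vary.

```python
import re

# Message fragments taken from the article, matched loosely.
PATTERNS = (
    ("oprocd", re.compile(r"sysrq\s*:\s*resetting", re.IGNORECASE)),
    ("hangcheck-timer", re.compile(r"hangcheck is restarting the machine", re.IGNORECASE)),
)


def restart_cause(log_lines):
    """Return which mechanism the first matching log line points to, or None."""
    for line in log_lines:
        for cause, pattern in PATTERNS:
            if pattern.search(line):
                return cause
    return None


# Hypothetical log excerpts, for illustration only.
print(restart_cause(["Apr 15 13:02:11 node1 kernel: SysRq : Resetting"]))
print(restart_cause(["Apr 15 13:02:11 node1 kernel: Hangcheck: hangcheck is restarting the machine."]))
```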