[Kernel Document Series] NMI Watchdog

Qin Baiyi

Qinchenggang@sict.ac.cn

 

[The NMI watchdog is supported on both x86 and x86-64 architectures]

Does your system lock up frequently? After a lockup, does it stop responding to the keyboard entirely? Do you want to help us debug such problems? If you answered "yes" to all of these questions, this document is for you.

On much x86/x86-64 hardware we can use a mechanism called the "NMI watchdog" (NMI: Non-Maskable Interrupt, an interrupt that is serviced even when the system is locked up). This mechanism can be used to debug hard kernel lockups. By triggering NMIs periodically, the kernel can monitor whether any CPU has locked up and print debugging information when one has.

To use the NMI watchdog, the kernel must first have APIC support. For SMP kernels, APIC support is compiled in automatically. For UP (uniprocessor) kernels, you must enable CONFIG_X86_UP_APIC (Processor type and features -> Local APIC support on uniprocessors) or CONFIG_X86_UP_IOAPIC (Processor type and features -> IO-APIC support on uniprocessors). On a uniprocessor system without an IO-APIC, use CONFIG_X86_UP_APIC. On a uniprocessor system with an IO-APIC, use CONFIG_X86_UP_IOAPIC. [Note: some kernel debugging options may disable the NMI watchdog, for example Kernel Stack Meter or Kernel Tracer.]

On x86-64 systems, APIC support is always compiled into the kernel.
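
The configuration rules above can be sketched as a small check against the text of a kernel .config file. This is only an illustration: the function names `parse_config` and `apic_ready` and the `arch`, `smp`, and `has_ioapic` parameters are hypothetical and chosen for this example; only the two CONFIG option names come from the text.

```python
def parse_config(text):
    """Return the set of options set to 'y' in a kernel .config text."""
    enabled = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_FOO is not set" and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return enabled


def apic_ready(config_text, arch="x86", smp=False, has_ioapic=False):
    """Apply the rules from the text: x86-64 and SMP kernels always have
    APIC support; a UP x86 kernel needs the option matching its hardware."""
    if arch == "x86_64" or smp:
        return True
    needed = "CONFIG_X86_UP_IOAPIC" if has_ioapic else "CONFIG_X86_UP_APIC"
    return needed in parse_config(config_text)
```

For a uniprocessor machine without an IO-APIC, `apic_ready(open(".config").read())` would report whether CONFIG_X86_UP_APIC is set; the location of the .config file depends on your build tree.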

When the local APIC is used (nmi_watchdog=2), the watchdog needs the first performance counter register, so that register cannot be used for other purposes (such as high-precision performance profiling). The oprofile and perfctr drivers automatically disable the local APIC NMI watchdog.

You can enable the NMI watchdog with the boot parameter "nmi_watchdog=N" (written without spaces around the "="). For example, add the following line to the relevant entry in lilo.conf:

append="nmi_watchdog=1"

For SMP machines and UP machines with an IO-APIC, use nmi_watchdog=1. For UP machines without an IO-APIC, use nmi_watchdog=2; this works only on some processors. If in doubt, boot with nmi_watchdog=1 and check the NMI count in /proc/interrupts. If the count is zero, reboot with nmi_watchdog=2 and check the NMI count again. If it is still zero, you have a real problem: your processor is probably not supported by the NMI watchdog.
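
The check described above can be sketched in Python. The helper names `nmi_counts` and `next_step` are hypothetical, and the parsing assumes the NMI row of /proc/interrupts starts with "NMI:" followed by one count per CPU, optionally followed by a description; the exact layout may differ between kernel versions.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counts from /proc/interrupts text,
    or an empty list if there is no NMI row."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # trailing description text, if any
                counts.append(int(field))
            return counts
    return []


def next_step(current_mode, counts):
    """Follow the procedure in the text for the mode we booted with."""
    if any(c > 0 for c in counts):
        return "watchdog is running"
    if current_mode == 1:
        return "reboot with nmi_watchdog=2"
    return "processor may not be supported"
```

On a live system the input would come from `open("/proc/interrupts").read()`, and `current_mode` is the value of N you booted with.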

A "lockup" is defined as follows: if any CPU in the system fails to service its periodic local timer interrupt for more than 5 seconds, the NMI handler generates an oops and kills the current process. This is a "controlled crash" (controlled in the sense that the kernel can still print information when it happens), and it can be used to debug lockups. Whenever a lockup occurs, an oops is printed automatically after about 5 seconds. If the kernel prints nothing, then the crash is so severe (for example, hardware-wise) that even the NMI cannot be serviced, or the crash has left the kernel unable to print messages.
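
The detection rule can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel's code: the class `LockupDetector`, the parameter `nmi_hz` (watchdog NMIs per second), and the idea of comparing timer tick counts between NMIs are all assumptions made for this example; only the 5-second limit comes from the text.

```python
class LockupDetector:
    """Toy model of the watchdog state for one CPU (hypothetical)."""

    def __init__(self, nmi_hz=1, timeout_s=5):
        # Number of consecutive NMIs without timer progress before reporting.
        self.threshold = nmi_hz * timeout_s
        self.last_ticks = 0
        self.stalled_nmis = 0

    def on_nmi(self, timer_ticks):
        """Called on each watchdog NMI with the CPU's local timer tick count.
        Returns True once the CPU is considered locked up."""
        if timer_ticks != self.last_ticks:
            # The timer interrupt ran since the last NMI: the CPU is alive.
            self.last_ticks = timer_ticks
            self.stalled_nmis = 0
            return False
        self.stalled_nmis += 1
        return self.stalled_nmis >= self.threshold
```

Because the check runs from an NMI, it still executes when ordinary interrupts are blocked, which is why a CPU that has stopped servicing timer interrupts can still be noticed.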

When using the local APIC, note that the frequency at which NMIs are generated depends on the current system load. For lack of a better clock source, the local APIC NMI watchdog uses the "cycles unhalted" event. As you may have guessed, this counter does not advance while the CPU is halted, which happens often when the processor is idle. If your system locks up anywhere other than in the "hlt" instruction, the watchdog will trigger soon, because a "cycles unhalted" event occurs on every clock cycle. If, unfortunately, the processor locks up in "hlt", the "cycles unhalted" event will never occur and the watchdog will not trigger. This is a shortcoming of the local APIC watchdog; unfortunately, there is no event that counts clock ticks all the time. The IO-APIC watchdog does not have this shortcoming because it is driven by an external timer. However, its NMI frequency is much higher, which has a more noticeable impact on system performance.
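
The shortcoming of the "cycles unhalted" clock source can be shown with a small simulation. The function `watchdog_nmis` and its `period` parameter (unhalted cycles per NMI) are hypothetical and serve only to illustrate the behavior described above.

```python
def watchdog_nmis(states, period):
    """Count the NMIs fired by a counter that advances only on unhalted cycles.

    states: iterable of booleans, True when the CPU is halted in that cycle.
    period: number of unhalted cycles between NMIs (hypothetical value).
    """
    counter = 0
    nmis = 0
    for halted in states:
        if halted:
            continue  # a halted CPU produces no "cycles unhalted" events
        counter += 1
        if counter == period:
            nmis += 1
            counter = 0
    return nmis
```

A CPU spinning in a busy loop keeps producing NMIs (`watchdog_nmis([False] * 100, 10)` fires ten times), while a CPU stuck in "hlt" produces none (`watchdog_nmis([True] * 100, 10)` fires zero times), so the second kind of lockup goes undetected.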

On x86, nmi_watchdog is disabled by default, so you have to enable it with a boot-time parameter.
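
To confirm that the parameter was actually passed at boot, you could inspect the kernel command line, which Linux exposes in /proc/cmdline. The helper name `boot_nmi_mode` is hypothetical, and the sketch only recognizes the plain numeric form "nmi_watchdog=N" used in this document.

```python
import re


def boot_nmi_mode(cmdline):
    """Return N from "nmi_watchdog=N" in a kernel command line,
    or None if the parameter is absent or malformed."""
    match = re.search(r"(?:^|\s)nmi_watchdog=(\d+)(?:\s|$)", cmdline)
    return int(match.group(1)) if match else None
```

A call such as `boot_nmi_mode(open("/proc/cmdline").read())` returns None when the watchdog was not requested. A parameter written with spaces around the "=" is not recognized by this helper, and it would not reach the kernel as a single parameter either.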

While the system is running, you can disable the NMI watchdog by writing "0" to the file /proc/sys/kernel/nmi_watchdog. Writing "1" to the same file re-enables it. Even so, you still need to boot with the "nmi_watchdog=" parameter.
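
The runtime switch can be wrapped in two small helpers. The function names are hypothetical, and the `path` parameter exists only so the sketch can be tried against an ordinary file; on a real system, writing to the /proc file requires root. Treating any content other than "0" as "enabled" is an assumption of this example.

```python
def set_nmi_watchdog(enabled, path="/proc/sys/kernel/nmi_watchdog"):
    """Write "1" (enable) or "0" (disable) to the control file."""
    with open(path, "w") as f:
        f.write("1" if enabled else "0")


def get_nmi_watchdog(path="/proc/sys/kernel/nmi_watchdog"):
    """Return False if the control file contains "0", True otherwise."""
    with open(path) as f:
        return f.read().strip() != "0"
```

For example, `set_nmi_watchdog(False)` pauses the watchdog around a profiling run that needs the first performance counter, and `set_nmi_watchdog(True)` turns it back on afterwards.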

Note: in kernels earlier than 2.4.2-ac18, the NMI-oopser is enabled unconditionally on x86 SMP platforms.
