On a 3.10.0-327 kernel, crash reports the following:
      KERNEL: vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 48
        DATE: Wed Oct 18 20:37:18 2017
      UPTIME: 1 days, 09:43:06
LOAD AVERAGE: 13.42, 10.66, 9.48
       TASKS: 7329
    NODENAME: host-10-229-143-10
     RELEASE: 3.10.0-327.22.2.el7.x86_64
     VERSION: #1 SMP Fri Sep 15:13:08 CST 2017
     MACHINE: x86_64  (2199 Mhz)
      MEMORY: 383.6 GB
       PANIC: "Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 10"
         PID: 24023
     COMMAND: "fas_readwriter"
        TASK: ffff882f460a2e00  [THREAD_INFO: ffff882f10c44000]
         CPU: 10
       STATE: TASK_RUNNING (PANIC)
The task is in the R state (TASK_RUNNING) and has held the CPU for a long time without being switched out. Usually this means that, after disabling preemption, the task kept running its work, or it entered an endless loop or went to sleep while preemption was disabled. This often leaves several CPUs deadlocked against one another and the whole system behaving abnormally.
crash> bt
PID: 24023  TASK: ffff882f460a2e00  CPU: 10  COMMAND: "fas_readwriter"
 #0 [ffff882fbfd459c8] machine_kexec at ffffffff81051c5b
 #1 [ffff882fbfd45a28] crash_kexec at ffffffff810f3ec2
 #2 [ffff882fbfd45af8] panic at ffffffff816326d1
 #3 [ffff882fbfd45b78] watchdog_overflow_callback at ffffffff8111d0e2
 #4 [ffff882fbfd45b88] __perf_event_overflow at ffffffff811608d1
 #5 [ffff882fbfd45c00] perf_event_overflow at ffffffff811613a4
 #6 [ffff882fbfd45c10] intel_pmu_handle_irq at ffffffff81032628
 #7 [ffff882fbfd45e60] perf_event_nmi_handler at ffffffff81642bcb
 #8 [ffff882fbfd45e80] nmi_handle at ffffffff81642319
 #9 [ffff882fbfd45ec8] do_nmi at ffffffff81642430
#10 [ffff882fbfd45ef0] end_repeat_nmi at ffffffff81641753
    [exception RIP: put_compound_page+336]
    RIP: ffffffff81178b60  RSP: ffff882f10c47d80  RFLAGS: 00000006
    RAX: 006016c60138402c  RBX: ffffea0123302a40  RCX: 0000000000000022
    RDX: 0000000000000246  RSI: 000000000a6a9000  RDI: ffffea0123300000
    RBP: ffff882f10c47d98   R8: ffff882f10c47dc8   R9: ffff882f10c47d74
    R10: ffff880000000298  R11: 000000000a6aa000  R12: ffffea0123300000
    R13: 0000000000000246  R14: 0000000000000000  R15: ffffea0123302a40
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
#11 [ffff882f10c47d80] put_compound_page at ffffffff81178b60
#12 [ffff882f10c47da0] put_page at ffffffff81178bac
#13 [ffff882f10c47db0] get_futex_key at ffffffff810e3c86
#14 [ffff882f10c47e08] futex_wake at ffffffff810e3f1a
#15 [ffff882f10c47e70] do_futex at ffffffff810e6a12
#16 [ffff882f10c47f08] sys_futex at ffffffff810e6f20
#17 [ffff882f10c47f80] system_call_fastpath at ffffffff81649909
First, a hard lockup is usually triggered when interrupts stay disabled for too long, so look along the stack for code that disables interrupts. Common examples are local_irq_disable(), local_irq_save(), and the interrupt-disabling spinlock variants such as spin_lock_irq() and spin_lock_irqsave().
Following the stack, get_futex_key() contains this code:
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
    page_head = page;
    if (unlikely(PageTail(page))) {
        put_page(page);
        /* serialize against __split_huge_page_splitting() */
        local_irq_disable();    /* <- interrupts are disabled here */
        /* <- __get_user_pages_fast() is called with interrupts off */
        if (likely(__get_user_pages_fast(address, 1, !ro, &page) == 1)) {
            page_head = compound_head(page);
            /*
             * page_head is valid pointer but we must pin
             * it before taking the PG_lock and/or
             * PG_compound_lock. The moment we re-enable
             * irqs __split_huge_page_splitting() can
             * return and the head page can be freed from
             * under us. We can't take the PG_lock and/or
             * PG_compound_lock on a page that could be
             * freed from under us.
             */
            if (page != page_head) {
                get_page(page_head);
                put_page(page);
            }
            local_irq_enable();
        } else {
            local_irq_enable();
            goto again;
        }
    }
#else
    page_head = compound_head(page);
    if (page != page_head) {
        get_page(page_head);
        put_page(page);
    }
#endif
Next, check whether CONFIG_TRANSPARENT_HUGEPAGE is enabled:
grep CONFIG_TRANSPARENT_HUGEPAGE /boot/config-3.10.0-327.22.2.el7.x86_64
CONFIG_TRANSPARENT_HUGEPAGE=y
The option is enabled. To confirm that the transparent hugepage branch was really compiled in, disassemble get_futex_key (for example with crash's dis command) and check that it contains a call to __get_user_pages_fast.
Next, we need to work out why the call from put_page() into put_compound_page() never returns.
void put_page(struct page *page)
{
    if (unlikely(PageCompound(page)))
        put_compound_page(page);
    else if (put_page_testzero(page))
        __put_single_page(page);
}
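The system's /proc/sys/vm/nr_hugepages is set to 0, so no static hugetlbfs pages are reserved; the compound page involved here is therefore most likely a transparent huge page.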
Linux crash hard lockup troubleshooting record