When operating a Linux system you will sometimes run into an infinite loop in a user-space process, with symptoms such as an unresponsive system and hung processes. How should such problems be handled? This article walks through how to analyze and deal with an infinite loop in a user-space process.
1, problem symptoms
The business process (a multithreaded user-space process) hangs, the operating system becomes unresponsive, and the system log shows nothing abnormal. Judging from the process's kernel-mode stacks, all of its threads appear to be stuck at the following point in the kernel:
[root@vmc116 ~]# cat /proc/27007/task/11825/stack
[<ffffffff8100baf6>] retint_careful+0x14/0x32
[<ffffffffffffffff>] 0xffffffffffffffff
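Checking the stack of each thread by hand becomes tedious when a process has many threads. The following is a minimal Python sketch, not the author's tooling, that tallies the topmost kernel frame across all threads of a process. It assumes the usual `[<address>] symbol+0xoffset/0xsize` line format of /proc/<pid>/task/<tid>/stack and that the files are readable (this normally requires root). The function names are hypothetical.

```python
import os
import re
from collections import Counter

# Matches lines such as "[<ffffffff8100baf6>] retint_careful+0x14/0x32".
FRAME_RE = re.compile(r"\[<([0-9a-fA-F]+)>\]\s+(\S+)")


def top_frame(stack_text):
    """Return the symbol part of the first frame in a stack dump, or None."""
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            return match.group(2)
    return None


def tally_top_frames(stacks):
    """Count how many stack dumps share the same topmost frame."""
    return Counter(top_frame(text) for text in stacks)


def read_thread_stacks(pid):
    """Yield the kernel stack text of every thread of pid."""
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stack")) as handle:
                yield handle.read()
        except OSError:
            # The thread exited in the meantime, or access was denied.
            continue
```

Calling `tally_top_frames(read_thread_stacks(27007))` would return a counter keyed by frame. If almost every thread reports the same frame, as described here, the tally makes that visible at a glance.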
2, problem analysis
1) Kernel stack analysis
The kernel stacks show every thread blocked at retint_careful, which is part of the interrupt return path. The relevant assembly code is as follows:
entry_64.S
ret_from_intr:
    DISABLE_INTERRUPTS(CLBR_NONE)
    TRACE_IRQS_OFF
    decl PER_CPU_VAR(irq_count)
    /* Restore saved previous stack */
    popq %rsi
    CFI_DEF_CFA rsi,SS+8-RBP /* reg/off reset after def_cfa_expr */
    leaq ARGOFFSET-RBP(%rsi), %rsp
    CFI_DEF_CFA_REGISTER rsp
    CFI_ADJUST_CFA_OFFSET RBP-ARGOFFSET
...
retint_careful:
    CFI_RESTORE_STATE
    bt $TIF_NEED_RESCHED,%edx
    jnc retint_signal
    TRACE_IRQS_ON
    ENABLE_INTERRUPTS(CLBR_NONE)
    pushq_cfi %rdi
    SCHEDULE_USER
    popq_cfi %rdi
    GET_THREAD_INFO(%rcx)
    DISABLE_INTERRUPTS(CLBR_NONE)
    TRACE_IRQS_OFF
    jmp retint_check
This is the path taken when a user-space process is interrupted while running in user mode and the interrupt then returns. Combining retint_careful+0x14/0x32 with a disassembly confirms that the blocking point is actually:
SCHEDULE_USER
This in effect calls schedule() to reschedule. In other words, on the interrupt return path the thread was found to need rescheduling (TIF_NEED_RESCHED was set), so a reschedule takes place here.
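To relate a frame such as retint_careful+0x14/0x32 to a disassembly, it helps to split it into its parts: the offset (0x14) is the distance from the start of the symbol, and the second number (0x32) is the symbol's total size. The following Python sketch decodes one stack line and derives the start and end addresses of the symbol. The helper name `decode_frame` is hypothetical. Addresses in a stack trace are typically return addresses, so the instruction of interest is usually the call immediately before the reported offset.

```python
import re

# "[<address>] symbol+0xoffset/0xsize"
ENTRY_RE = re.compile(
    r"\[<([0-9a-fA-F]+)>\]\s+([^\s+]+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)"
)


def decode_frame(line):
    """Split a kernel stack line into address, symbol, offset and size."""
    match = ENTRY_RE.search(line)
    if match is None:
        raise ValueError(f"not a symbol+offset/size frame: {line!r}")
    address = int(match.group(1), 16)
    offset = int(match.group(3), 16)
    size = int(match.group(4), 16)
    start = address - offset  # address of the first byte of the symbol
    return {
        "symbol": match.group(2),
        "address": address,
        "offset": offset,
        "size": size,
        "start": start,
        "end": start + size,  # first byte after the symbol
    }
```

For the frame in this article, `decode_frame("[<ffffffff8100baf6>] retint_careful+0x14/0x32")` gives a start address of 0xffffffff8100bae2, which is the range to look at in the disassembled kernel image.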
One question remains: why is there no stack frame for schedule() on the stack?
Because schedule() is called directly from assembly code here, no stack frame is pushed and no context is saved for it.
2) Analysis of state information
The output of top shows that the threads in question are actually in the R (running) state, that the CPUs are almost completely exhausted, and that nearly all of the CPU time is spent in user mode:
[root@vmc116 ~]# top
top - 09:42:23 up, 2:21, users, load average: 84.08, 84.30, 83.62
Tasks: 1037 total, running, 952 sleeping, 0 stopped, 0 zombie
Cpu(s): 97.6%us, 2.2%sy, 0.2%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32878852k total, 32315464k used, 563388k free, 374152k buffers
Swap: 35110904k total, 38644k used, 35072260k free, 28852536k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27074 root 0 5316m 163m 14m R 10.2 0.5 321:06.17 Z_itask_templat
27084 root 0 5316m 163m 14m R 10.2 0.5 296:23.37 Z_itask_templat
27085 root 0 5316m 163m 14m R 10.2 0.5 337:57.26 Z_itask_templat
27095 root 0 5316m 163m 14m R 10.2 0.5 327:31.93 Z_itask_templat
27102 root 0 5316m 163m 14m R 10.2 0.5 306:49.44 Z_itask_templat
27113 root 0 5316m 163m 14m R 10.2 0.5 310:47.41 Z_itask_templat
25730 root 0 5316m 163m 14m R 10.2 0.5 283:03.37 Z_itask_templat
30069 root 0 5316m 163m 14m R 10.2 0.5 283:49.67 Z_itask_templat
13938 root 0 5316m 163m 14m R 10.2 0.5 261:24.46 Z_itask_templat
16326 root 0 5316m 163m 14m R 10.2 0.5 150:24.53 Z_itask_templat
6795 root 0 5316m 163m 14m R 10.2 0.5 100:26.77 Z_itask_templat
27063 root 0 5316m 163m 14m R 9.9 0.5 337:18.77 Z_itask_templat
27065 root 0 5316m 163m 14m R 9.9 0.5 314:24.17 Z_itask_templat
27068 root 0 5316m 163m 14m R 9.9 0.5 336:32.78 Z_itask_templat
27069 root 0 5316m 163m 14m R 9.9 0.5 338:55.08 Z_itask_templat
27072 root 0 5316m 163m 14m R 9.9 0.5 306:46.08 Z_itask_templat
27075 root 0 5316m 163m 14m R 9.9 0.5 316:49.51 Z_itask_templat
...
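The per-thread state and the split between user and kernel CPU time that top reports can also be read directly from /proc/<pid>/task/<tid>/stat. The sketch below is a rough illustration under the field layout documented in proc(5): field 3 is the state, and fields 14 and 15 are utime and stime in clock ticks. The function names are hypothetical.

```python
import os


def parse_stat(stat_line):
    """Extract tid, command, state, utime and stime from a /proc stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' rather than on whitespace.
    head, _, tail = stat_line.rpartition(")")
    tid_text, _, comm = head.partition("(")
    fields = tail.split()
    return {
        "tid": int(tid_text),
        "comm": comm,
        "state": fields[0],        # field 3: R, S, D, Z, ...
        "utime": int(fields[11]),  # field 14: clock ticks spent in user mode
        "stime": int(fields[12]),  # field 15: clock ticks spent in kernel mode
    }


def user_share(entry):
    """Fraction of a thread's CPU time that was spent in user mode."""
    total = entry["utime"] + entry["stime"]
    return entry["utime"] / total if total else 0.0


def running_threads(pid):
    """Return parsed stat entries for the threads of pid in state R."""
    result = []
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as handle:
                entry = parse_stat(handle.read())
        except OSError:
            continue  # the thread exited while we were iterating
        if entry["state"] == "R":
            result.append(entry)
    return result
```

Threads that stay in state R with a user share close to 1.0 across repeated samples fit the pattern described here: they are spinning in user-space code rather than waiting in the kernel.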