How to Deal with a User-Mode Process Stuck in an Infinite Loop on Linux

Source: Internet
Author: User
Tags: system log

When operating a Linux system, you may occasionally run into a user-mode process stuck in an infinite loop: the system stops responding, processes hang, and so on. How should such problems be handled? This article walks through the analysis of one such case.

1. Symptoms

The business process (a multithreaded user-mode program) hangs, the operating system becomes unresponsive, and the system log shows nothing abnormal. The process's kernel-mode stacks show that all of its threads are stuck at the same point in the kernel:

[root@vmc116 ~]# cat /proc/27007/task/11825/stack
[<ffffffff8100baf6>] retint_careful+0x14/0x32
[<ffffffffffffffff>] 0xffffffffffffffff
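Reading one thread's stack at a time is tedious for a process with many threads. The following Python sketch, which is illustrative and not taken from the original analysis, reads /proc/<pid>/task/<tid>/stack for every thread and counts threads by their innermost kernel frame, so a uniform hang like this one shows up as a single dominant entry. It assumes a Linux kernel that exposes the per-task stack file (reading it usually requires root); the function names are hypothetical.

```python
import os
from collections import Counter


def top_frame(stack_text):
    """Return the innermost frame (first non-empty line) of a
    /proc/<pid>/task/<tid>/stack dump, e.g. 'retint_careful+0x14/0x32',
    or None if the dump is empty."""
    for line in stack_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Lines look like "[<ffffffff8100baf6>] retint_careful+0x14/0x32";
        # keep only the symbol part after the bracketed address.
        _, sep, symbol = line.partition("] ")
        return symbol if sep else line
    return None


def group_threads_by_top_frame(pid):
    """Count the threads of `pid` by innermost kernel frame.

    PermissionError is deliberately not caught: reading the stack file
    usually needs root, and the caller should see that failure."""
    counts = Counter()
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stack")) as f:
                counts[top_frame(f.read())] += 1
        except (FileNotFoundError, ProcessLookupError):
            # The thread exited between listdir() and reading its stack.
            continue
    return counts
```

Calling group_threads_by_top_frame(27007) on the machine above would be expected to report most threads under retint_careful+0x14/0x32.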

2. Analysis

1) Kernel stack analysis

The kernel stacks show that all threads are blocked in retint_careful, which is part of the interrupt-return path. The relevant assembly, from entry_64.S, is as follows:

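ret_from_intr:
    DISABLE_INTERRUPTS(CLBR_NONE)
    TRACE_IRQS_OFF
    decl PER_CPU_VAR(irq_count)

    /* Restore saved previous stack */
    popq %rsi
    CFI_DEF_CFA rsi,SS+8-RBP        /* reg/off reset after def_cfa_expr */
    leaq ARGOFFSET-RBP(%rsi), %rsp
    CFI_DEF_CFA_REGISTER rsp
    CFI_ADJUST_CFA_OFFSET RBP-ARGOFFSET
...
retint_careful:
    CFI_RESTORE_STATE
    bt $TIF_NEED_RESCHED,%edx       /* %edx = pending work flags: is a reschedule requested? */
    jnc retint_signal               /* no: go to retint_signal (signals and other work) */
    TRACE_IRQS_ON
    ENABLE_INTERRUPTS(CLBR_NONE)    /* re-enable interrupts before scheduling */
    pushq_cfi %rdi                  /* save %rdi across the call */
    SCHEDULE_USER                   /* call the scheduler (schedule_user or schedule, depending on config) */
    popq_cfi %rdi
    GET_THREAD_INFO(%rcx)           /* reload the thread_info pointer */
    DISABLE_INTERRUPTS(CLBR_NONE)
    TRACE_IRQS_OFF
    jmp retint_check                /* re-check the work flags */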

This is the path a user-mode thread takes when it returns to user mode after being interrupted. Disassembling retint_careful and locating the offset given by retint_careful+0x14/0x32 confirms that the blocking point is actually SCHEDULE_USER.
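In the notation retint_careful+0x14/0x32, the first number is the offset of the saved address within retint_careful and the second is the function's total size, so the thread sits 20 bytes into a 50-byte routine. Because the saved address is a return address, the instruction just before offset 0x14 is the call that has not yet returned, which is why disassembling retint_careful from the matching vmlinux (for example with gdb or objdump) points at SCHEDULE_USER. The helper below only decodes the notation; it is an illustrative sketch with hypothetical names, not a tool used in the original analysis.

```python
import re
from typing import NamedTuple


class Frame(NamedTuple):
    symbol: str
    offset: int  # bytes from the start of the symbol to the saved return address
    size: int    # total length of the symbol in bytes


# "symbol+0xOFF/0xSIZE", optionally followed by " [module]" for module code.
_FRAME_RE = re.compile(
    r"^(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[[\w-]+\])?$"
)


def parse_frame(text):
    """Split a kernel stack frame such as 'retint_careful+0x14/0x32'."""
    m = _FRAME_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a symbol+offset/size frame: {text!r}")
    return Frame(m["sym"], int(m["off"], 16), int(m["size"], 16))


frame = parse_frame("retint_careful+0x14/0x32")
print(frame)  # Frame(symbol='retint_careful', offset=20, size=50)
```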

SCHEDULE_USER ends up calling schedule(). In other words, when the thread reached the interrupt-return path, it was found to be marked for rescheduling (TIF_NEED_RESCHED was set), so the kernel switched to another task at this point.
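The bt/jnc pair at the top of retint_careful tests a single bit of the thread's pending work flags held in %edx. The following Python model of that branch is a simplification for illustration only, not kernel code; the bit numbers (TIF_SIGPENDING = 2, TIF_NEED_RESCHED = 3) are assumed from the x86 thread_info.h of that kernel generation and should be checked against the running kernel's headers.

```python
# Bit numbers assumed from older x86 thread_info.h; verify for your kernel.
TIF_SIGPENDING = 2
TIF_NEED_RESCHED = 3


def retint_careful_step(work_flags):
    """Model one pass through retint_careful for the given pending-work
    flags (the value the assembly holds in %edx)."""
    if work_flags & (1 << TIF_NEED_RESCHED):
        # bt sets CF=1, so jnc is not taken: enable interrupts, call the
        # scheduler, then jump back to retint_check to re-examine the flags.
        return "schedule"
    # bt sets CF=0, so jnc jumps to retint_signal to handle other work.
    return "retint_signal"


print(retint_careful_step(1 << TIF_NEED_RESCHED))  # schedule
print(retint_careful_step(1 << TIF_SIGPENDING))    # retint_signal
```

A runnable thread that is repeatedly preempted takes the "schedule" branch on each such interrupt return, which is consistent with every thread showing the retint_careful+0x14 frame.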

One question remains: why does no schedule()-level frame appear on the stack?

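schedule is called directly from assembly code rather than from a C function, so the entry path sets up no C-level stack frame or context save for it. In addition, the stack-trace code behind /proc/<pid>/stack skips scheduler functions (those placed in the .sched.text section, such as schedule()), so the innermost frame that remains is the return address inside retint_careful.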

2) Thread state analysis

The top output shows that the affected threads are in the R (runnable) state, the CPUs are almost completely saturated, and nearly all of that CPU time is spent in user mode:

[root@vmc116 ~]# top
top - 09:42:23 up, 2:21, users, load average: 84.08, 84.30, 83.62
Tasks: 1037 total, running, 952 sleeping, 0 stopped, 0 zombie
Cpu(s): 97.6%us, 2.2%sy, 0.2%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32878852k total, 32315464k used, 563388k free, 374152k buffers
Swap: 35110904k total, 38644k used, 35072260k free, 28852536k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27074 root 0 5316m 163m 14m R 10.2 0.5 321:06.17 Z_itask_templat
27084 root 0 5316m 163m 14m R 10.2 0.5 296:23.37 Z_itask_templat
27085 root 0 5316m 163m 14m R 10.2 0.5 337:57.26 Z_itask_templat
27095 root 0 5316m 163m 14m R 10.2 0.5 327:31.93 Z_itask_templat
27102 root 0 5316m 163m 14m R 10.2 0.5 306:49.44 Z_itask_templat
27113 root 0 5316m 163m 14m R 10.2 0.5 310:47.41 Z_itask_templat
25730 root 0 5316m 163m 14m R 10.2 0.5 283:03.37 Z_itask_templat
30069 root 0 5316m 163m 14m R 10.2 0.5 283:49.67 Z_itask_templat
13938 root 0 5316m 163m 14m R 10.2 0.5 261:24.46 Z_itask_templat
16326 root 0 5316m 163m 14m R 10.2 0.5 150:24.53 Z_itask_templat
6795 root 0 5316m 163m 14m R 10.2 0.5 100:26.77 Z_itask_templat
27063 root 0 5316m 163m 14m R 9.9 0.5 337:18.77 Z_itask_templat
27065 root 0 5316m 163m 14m R 9.9 0.5 314:24.17 Z_itask_templat
27068 root 0 5316m 163m 14m R 9.9 0.5 336:32.78 Z_itask_templat
27069 root 0 5316m 163m 14m R 9.9 0.5 338:55.08 Z_itask_templat
27072 root 0 5316m 163m 14m R 9.9 0.5 306:46.08 Z_itask_templat
27075 root 0 5316m 163m 14m R 9.9 0.5 316:49.51 Z_itask_templat
...
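The same picture can be checked without top by reading /proc/<pid>/task/<tid>/stat for each thread: the state letter (field 3) shows whether a thread is runnable, and utime and stime (fields 14 and 15, in clock ticks) show whether its CPU time is spent in user or kernel mode. This is a minimal illustrative sketch with hypothetical function names; it locates the end of the comm field by its last ")" because thread names may contain spaces or parentheses.

```python
import os
from collections import Counter


def parse_stat(line):
    """Parse a /proc/.../stat line into (state, utime, stime).

    The fields after the ')' that closes comm start at field 3 (state);
    utime and stime are fields 14 and 15, measured in clock ticks."""
    rest = line[line.rindex(")") + 1:].split()
    state = rest[0]          # field 3
    utime = int(rest[11])    # field 14
    stime = int(rest[12])    # field 15
    return state, utime, stime


def summarize_threads(pid):
    """Count the threads of `pid` per state and total their user/system ticks."""
    states = Counter()
    user = system = 0
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                state, utime, stime = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError):
            # The thread exited between listdir() and reading its stat file.
            continue
        states[state] += 1
        user += utime
        system += stime
    return states, user, system
```

For the process in this article one would call summarize_threads(27007); runnable threads are counted under "R", and a user-mode loop shows up as user ticks far exceeding system ticks.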


The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.
