Solution to process Kernel stack overflow caused by XFS

System Environment

  • System Version: CentOS release 6.5
  • Kernel version: 2.6.32-431.20.3.el6.x86_64
  • File System: XFS
Problem Description

The system panics and prints the following calltrace:

    kvm: 16396: cpu1 unhandled wrmsr: 0x391 data 2000000f
    BUG: scheduling while atomic: qemu-system-x86/27122/0xffff8811
    BUG: unable to handle kernel paging request at 00000000dd7ed3a8
    IP: [<ffffffff81058e5d>] task_rq_lock+0x4d/0xa8
    PGD 0
    Oops: 0000 [#1] SMP
    last sysfs file: /sys/devices/pci0000:00/0000:00:02.2/0000:04:00.0/host0/target0:2:1/0:2:1/block/sdb/queue/logical_block_size
    ...
    [<ffffffff81058e5d>] ? task_rq_lock+0x4d/0xa0
    [<ffffffff8106195c>] ? try_to_wake_up+0x3c/0x3e0
    [<ffffffff81061d55>] ? wake_up_process+0x15/0x20
    [<ffffffff810a0f62>] ? __up+0x2a/0x40
    [<ffffffffa03394c2>] ? xfs_buf_unlock+0x32/0x90 [xfs]
    [<ffffffffa030297f>] ? xfs_buf_item_unpin+0xcf/0x1a0 [xfs]
    [<ffffffffa032f18c>] ? xfs_trans_committed_bulk+0x29c/0x2b0 [xfs]
    [<ffffffff81069f15>] ? enqueue_entity+0x125/0x450
    [<ffffffff81060aa3>] ? perf_event_task_sched_out+0x33/0x70
    [<ffffffff81069973>] ? dequeue_entity+0x113/0x2e0
    [<ffffffffa032326d>] ? xlog_cil_committed+0x3d/0x100 [xfs]
    [<ffffffffa031f79d>] ? xlog_state_do_callback+0x15d/0x2b0 [xfs]
    [<ffffffffa031f96e>] ? xlog_state_done_syncing+0x7e/0xb0 [xfs]
    [<ffffffffa03200e9>] ? xlog_iodone+0x59/0xb0 [xfs]
    [<ffffffffa033ae50>] ? xfs_buf_iodone_work+0x0/0x50 [xfs]
    [<ffffffffa033ae76>] ? xfs_buf_iodone_work+0x26/0x50 [xfs]


Error Tracking

BUG: unable to handle kernel paging request at 00000000dd7ed3a8

The address dd7ed3a8 lies in user space and would not normally be accessed by the kernel, so this can be identified as a kernel BUG.

IP: [<ffffffff81058e5d>] task_rq_lock+0x4d/0xa8

Because kdump was not deployed on this system, we can only use objdump for static analysis to track down the faulting instruction address.
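A sketch of this step (the debuginfo path below is the standard CentOS location for an unstripped vmlinux, an assumption rather than something given in the article):

    # install the kernel-debuginfo package matching the running kernel, then:
    objdump -d -S /usr/lib/debug/lib/modules/2.6.32-431.20.3.el6.x86_64/vmlinux > vmlinux.asm
    grep -n 'ffffffff81058e5d' vmlinux.asm    # find the faulting instruction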

    ffffffff81058e10 <task_rq_lock>:
     * interrupts. Note the ordering: we can safely lookup the task_rq without
     * explicitly disabling preemption.
     */
    static struct rq *task_rq_lock(struct task_struct *p, unsigned long *flags)
            __acquires(rq->lock)
    {
    ffffffff81058e10:       55                      push   %rbp
    ffffffff81058e11:       48 89 e5                mov    %rsp,%rbp
    ffffffff81058e14:       48 83 ec 20             sub    $0x20,%rsp
    ffffffff81058e18:       48 89 1c 24             mov    %rbx,(%rsp)
    ffffffff81058e1c:       4c 89 64 24 08          mov    %r12,0x8(%rsp)
    ffffffff81058e21:       4c 89 6c 24 10          mov    %r13,0x10(%rsp)
    ffffffff81058e26:       4c 89 74 24 18          mov    %r14,0x18(%rsp)
    ffffffff81058e2b:       e8 10 1f fb ff          callq  ffffffff8100ad40 <mcount>
    ffffffff81058e30:       48 c7 c3 40 68 01 00    mov    $0x16840,%rbx
    ffffffff81058e37:       49 89 fc                mov    %rdi,%r12
    ffffffff81058e3a:       49 89 f5                mov    %rsi,%r13
    ffffffff81058e3d:       ff 14 25 80 8b a9 81    callq  *0xffffffff81a98b80
    ffffffff81058e44:       48 89 c2                mov    %rax,%rdx
            PVOP_VCALLEE1(pv_irq_ops.restore_fl, f);
    }

    static inline void raw_local_irq_disable(void)
    {
            PVOP_VCALLEE0(pv_irq_ops.irq_disable);
    ffffffff81058e47:       ff 14 25 90 8b a9 81    callq  *0xffffffff81a98b90
            struct rq *rq;

            for (;;) {
                    local_irq_save(*flags);
    ffffffff81058e4e:       49 89 55 00             mov    %rdx,0x0(%r13)
                    rq = task_rq(p);
    ffffffff81058e52:       49 8b 44 24 08          mov    0x8(%r12),%rax
    ffffffff81058e57:       49 89 de                mov    %rbx,%r14
    ffffffff81058e5a:       8b 40 18                mov    0x18(%rax),%eax
    ffffffff81058e5d:       4c 03 34 c5 60 cf bf    add    -0x7e4030a0(,%rax,8),%r14
    ffffffff81058e64:       81
                    spin_lock(&rq->lock);
    ffffffff81058e65:       4c 89 f7                mov    %r14,%rdi
    ffffffff81058e68:       e8 a3 23 4d 00          callq  ffffffff8152b210 <_spin_lock>

Disassembling vmlinux with objdump locates the faulting instruction: the system failed while executing the instruction at ffffffff81058e5d. From the interleaved source we can see that the fault occurs when task_rq_lock() calls task_rq(): the task's CPU number is read from thread_info and used to index the per-CPU runqueue offset table, and a corrupted CPU value produces the bad address dd7ed3a8.

kernel/sched.c

    #define task_rq(p)              cpu_rq(task_cpu(p))

    /*
     * task_rq_lock - lock the runqueue a given task resides on and disable
     * interrupts. Note the ordering: we can safely lookup the task_rq without
     * explicitly disabling preemption.
     */
    static struct rq *task_rq_lock(struct task_struct *p, unsigned long *flags)
            __acquires(rq->lock)
    {
            struct rq *rq;

            for (;;) {
                    local_irq_save(*flags);
                    rq = task_rq(p);
                    spin_lock(&rq->lock);
                    if (likely(rq == task_rq(p)))
                            return rq;
                    spin_unlock_irqrestore(&rq->lock, *flags);
            }
    }

include/linux/sched.h

    #define task_thread_info(task)  ((struct thread_info *)(task)->stack)

    static inline unsigned int task_cpu(const struct task_struct *p)
    {
            return task_thread_info(p)->cpu;
    }

    union thread_union {
            struct thread_info thread_info;
            unsigned long stack[THREAD_SIZE/sizeof(long)];
    };

Finally, we can see that a process's thread_info and kernel stack share a union, so a kernel stack overflow corrupts thread_info. Let's take a look at the kernel stack size:
arch/x86/include/asm/page_64_types.h

    #define THREAD_ORDER    1
    #define THREAD_SIZE  (PAGE_SIZE << THREAD_ORDER)
    #define CURRENT_MASK (~(THREAD_SIZE - 1))

With a 4 KB page size and THREAD_ORDER of 1, the kernel stack on a 64-bit system is 4 KB << 1 = 8 KB.

The thread_info structure and the process's kernel-mode stack coexist in one union, 8 KB in total by default. thread_info sits at the low end of this area while the stack grows down from the high end, so when the XFS code path consumed too much stack space, the stack overflowed its bottom and corrupted the thread_info structure.

"Scheduling while atomic" should be caused by a stack overflow that overwrites the preemptible count (preempt count) in the thread_info struct of the process. As a result, the preemption count is non-zero when it is awakened next time, and panic appears.

Cause Analysis

According to the objdump analysis, there are two possible ways XFS can overflow the kernel stack:

One possibility is that xfs_iomap_write_direct() does not set the XFS_BMAPI_STACK_SWITCH flag, so xfs_bmapi_allocate() is not handed off to a separate worker thread (which would guarantee a sufficient stack) but instead runs directly on the process's own kernel stack, overflowing it.

This bug was fixed in kernel 3.4 (commit c999a22, "xfs: introduce an allocation workqueue"); a sketch of the idea follows.
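The idea behind that fix can be sketched in user space with pthreads standing in for the kernel workqueue (deep_allocation() and allocate_via_worker() are illustrative names, not the real XFS functions): the stack-hungry work runs on a worker with a known-sufficient stack while the caller blocks until it completes.

    /* Sketch of the c999a22 approach: offload a deep call chain to a
     * worker thread created with a large stack instead of running it on
     * the caller's small (8 KB) stack. Build with: cc -pthread */
    #include <pthread.h>
    #include <stdio.h>

    struct alloc_args {
        int input;
        int result;
    };

    /* Stand-in for xfs_bmapi_allocate(): assume this call chain is deep. */
    static void *deep_allocation(void *arg)
    {
        struct alloc_args *a = arg;
        a->result = a->input * 2;       /* placeholder for the real work */
        return NULL;
    }

    /* Caller on a small stack: dispatch and wait, like the kernel caller
     * queuing work and sleeping on a completion. */
    static int allocate_via_worker(struct alloc_args *a)
    {
        pthread_attr_t attr;
        pthread_t worker;
        int err;

        pthread_attr_init(&attr);
        pthread_attr_setstacksize(&attr, 256 * 1024);   /* ample stack */
        err = pthread_create(&worker, &attr, deep_allocation, a);
        pthread_attr_destroy(&attr);
        if (err)
            return -1;
        pthread_join(worker, NULL);     /* block until the work finishes */
        return 0;
    }

    int main(void)
    {
        struct alloc_args a = { .input = 21, .result = 0 };

        if (allocate_via_worker(&a) == 0)
            printf("result = %d\n", a.result);
        return 0;
    }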

There was some controversy over this fix: dispatching allocations to a dedicated workqueue slows down I/O writeback because of the added overhead of handing work to another thread, and an 8 KB kernel stack still cannot accommodate call chains deeper than 8 KB. So kernel 3.16 expanded the stack instead (commit 6538b8e, "x86_64: expand kernel stack to 16K"); the change is shown below.
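The substance of that patch is a one-line change to the macros quoted earlier (a sketch: upstream 3.16 names the macro THREAD_SIZE_ORDER, shown here as THREAD_ORDER to match the 2.6.32 source above):

    #define THREAD_ORDER    2                   /* was 1: 4 KB << 2 = 16 KB */
    #define THREAD_SIZE  (PAGE_SIZE << THREAD_ORDER)
    #define CURRENT_MASK (~(THREAD_SIZE - 1))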

Both solutions — moving the allocation into a worker thread (commit c999a22, "xfs: introduce an allocation workqueue") and expanding the kernel stack to 16 KB (commit 6538b8e, "x86_64: expand kernel stack to 16K") — were debated on the kernel mailing list; the discussions are worth reading if you are interested.

Currently, the CentOS 2.6.32-520.el6 kernel has already pulled the kernel 3.16 patch (6538b8e, "x86_64: expand kernel stack to 16K") from mainline, and the two patches do not conflict. We recommend upgrading the kernel first to see whether the 16 KB kernel stack resolves the xfs_iomap_write_direct problem; if not, apply the allocation-workqueue fix (commit c999a22, "xfs: introduce an allocation workqueue") as well.

The other possible cause is that xfs_buf_lock() issues a log force just before blocking on a semaphore; the log-force call chain is deep and consumes a lot of stack, which can likewise overflow the stack and panic the system. This matches bug 1028831 in the CentOS kernel changelog, fixed in 2.6.32-495.el6.

Solution

Upgrade the kernel to 2.6.32-520.el6 to ensure both patches are included.

Changelog

[2.6.32-520.el6]

  • [kernel] x86_64: expand kernel stack to 16K (Johannes Weiner) [1045190 1060721]

[2.6.32-495.el6]

  • [fs] xfs: always do log forces via the workqueue (Eric Sandeen) [1028831]
  • [fs] xfs: Do background CIL flushes via a workqueue (Eric Sandeen) [1028831]
