A stack overflow BUG


My blog: http://blog.striveforfreedom.net

Table of Contents
  • 1. BUG Description
  • 2. Solution Process
  • 3. Summary

1. BUG Description

Recently I modified a C program, adding several new fields to a struct. After recompiling, the program crashed with a segmentation fault. Running gdb showed that the faulting instruction was of the form mov %reg, offset(%rsp), i.e. an access to stack memory at an offset from the stack pointer.

2. Solution Process

The faulting instruction tells us the crash happened while accessing memory on the stack. Generally speaking, stack accesses do not cause segmentation faults: stack memory is not managed manually by the programmer, so it is hard to get wrong. My guess was a stack overflow, and I needed to confirm it.

The crashing host was x86_64 + Linux. ulimit -s showed that the default soft limit on the process stack was 10 MB, and since the program never calls setrlimit to adjust it, proving the overflow meant showing that the crashing process's stack had grown beyond 10 MB. The faulting address can be read from gdb; if we also know the starting address of the stack (the stack bottom), the difference between the two is the stack size.

But how do we find the starting address of the stack? Linux provides the proc file system: every process has a directory named after its process ID under /proc, and /proc/PID contains information about the process with ID PID, such as its executable path, current directory, and open files. In particular, /proc/PID/maps lists the start and end addresses of all the process's virtual memory areas, including the stack (the line whose last field is [stack]). Taking the stack's starting address and subtracting the address accessed by the faulting instruction (%rsp plus the offset) gave a value greater than 10 MB, confirming the overflow.

As for the cause: the original code defined an array of structs on the stack, and the large fields I added made each struct much bigger, so the stack overflowed. With the cause found, the BUG is easy to fix:
either raise the stack's default soft limit in the shell (ulimit -s), call setrlimit in the code, or allocate the memory on the heap instead of the stack.

To illustrate this BUG, I wrote the following test code:

int main(int argc, char* argv[])
{
    const unsigned len = 10 * (1U << 20);
    char data[len];
    data[0] = 'a';
    return 0;
}

Unexpectedly, the compiled program did not crash! This was very strange: main defines a 10 MB array (and accesses its first element, the one with the smallest address, at the deepest point of the stack), and on top of that there is the space occupied by environment variables and by the C runtime's startup call chain, so total stack usage is more than 10 MB while the soft limit is 10 MB. It should inevitably crash, yet it didn't. At first I suspected the compiler had optimized the array away, but objdump showed it had not. After being stuck for a long time with no clue, I finally decided to read the kernel code to see how the kernel handles stack overflow. Two things needed checking: how the stack's soft limit is read, and how the kernel decides whether the stack has exceeded it. The crashing machine runs CentOS 5.7, and uname -r reports kernel 2.6.18-308.el5. That version string does not match any official kernel release, but since it should be very close to 2.6.18, I read the official 2.6.18 source (lxr.linux.no is recommended for browsing the kernel source of a specific version; there is no need to download a source package of tens of MB).

The soft limit (and hard limit) is read through the getrlimit system call, whose kernel entry point is sys_getrlimit:

asmlinkage long sys_getrlimit(unsigned int resource, struct rlimit __user *rlim)
{
    if (resource >= RLIM_NLIMITS)
        return -EINVAL;
    else {
        struct rlimit value;
        task_lock(current->group_leader);
        value = current->signal->rlim[resource];
        task_unlock(current->group_leader);
        return copy_to_user(rlim, &value, sizeof(*rlim)) ? -EFAULT : 0;
    }
}

This function is simple: it reads the resource limits of the current process and copies them to user space. Nothing wrong here.

The check on the stack size limit is done in the page fault handler. Starting from the page fault entry point page_fault, the call chain is page_fault -> do_page_fault -> expand_stack -> acct_stack_growth, and the check is found in acct_stack_growth (code irrelevant to our example is omitted):

static int acct_stack_growth(struct vm_area_struct *vma, unsigned long size, unsigned long grow)
{
    //...
    struct rlimit *rlim = current->signal->rlim;
    //...
    /* Stack limit test */
    if (size > rlim[RLIMIT_STACK].rlim_cur)
        return -ENOMEM;
    //...
}

Here the parameter size is the stack's starting address minus the address that caused the page fault, rounded up to the page size. Clearly, if the stack size exceeds the soft limit, an error is returned and the process is eventually sent a SIGSEGV signal, i.e. it crashes with a segmentation fault. So the kernel code confirmed my understanding, yet the process had not crashed as expected.

I thought the kernel version might be wrong, so I also checked the official 2.6.19 source, and found these two places unchanged. This was very strange. After a while it suddenly occurred to me that the machine runs CentOS, and CentOS may have modified the official kernel here. So I downloaded the source package matching my system, kernel-2.6.18-308.el5.src.rpm, and installed it. It turned out to be based on official kernel 2.6.18.4, with the CentOS modifications in a patch file, kernel-2.6.18-redhat.patch. After applying the patch, I found that CentOS had indeed modified acct_stack_growth (the function is modified in several places; only the change relevant to our BUG is shown):

static int acct_stack_growth(struct vm_area_struct *vma, unsigned long size, unsigned long grow)
{
    //...
    struct rlimit *rlim = current->signal->rlim;
    //...
    /* Stack limit test */
    if (over_stack_limit(size))
        return -ENOMEM;
    //...
}

Comparing the two, the official kernel compares size against the stack soft limit directly, while CentOS moves the comparison into the function over_stack_limit. Let's look at over_stack_limit:

static int over_stack_limit(unsigned long sz)
{
    if (sz < EXEC_STACK_BIAS)
        return 0;
    return (sz - EXEC_STACK_BIAS) >
        current->signal->rlim[RLIMIT_STACK].rlim_cur;
}

EXEC_STACK_BIAS is an integer constant and is defined as follows:

#define EXEC_STACK_BIAS       (2*1024*1024)

Clearly, CentOS relaxes the stack limit by 2 MB: a stack overflow is only reported when the stack size exceeds the soft limit plus EXEC_STACK_BIAS (12 MB in our example). And indeed, after changing the array size in the test code to 12 MB, the process crashes when run.

3. Summary

Sometimes you get close to the truth of a problem and still miss it. In this case, had I not written this article, I would never have written the test code above, and I would still believe that the default stack size limit on my machine is exactly 10 MB.
