Cause Analysis of Segmentation Faults on Linux and x86

Source: Internet
Author: User

My blog: http://blog.striveforfreedom.net

Table of Contents
  • 1. Overview
  • 2. Three common memory accesses that cause segmentation faults
    • 2.1 Accessing kernel space from user mode
    • 2.2 Accessing memory that has not been created
    • 2.3 Writing to read-only memory
  • 3. How the system handles segmentation faults
    • 3.1 How the CPU catches the fault
    • 3.2 How the kernel handles the fault
    • 3.3 How user programs handle the fault
  • 4. Summary

1. Overview

Segmentation faults are a common problem when writing C programs on Linux, and probably every programmer has run into one. A process dies with "Segmentation fault" when it receives the SIGSEGV signal and has not installed a handler for it. This article considers only the common case of segmentation faults caused by illegal memory accesses; it does not cover other causes, such as executing privileged instructions in user mode. It first describes three kinds of illegal memory access and then explains how the system handles the resulting fault, in the hope that readers can locate the cause of a segmentation fault more quickly, or avoid such faults in the first place. Only the 32-bit x86 platform is considered, with [0, 3G) assumed to be the user virtual address space and [3G, 4G) the kernel virtual address space (see footnote 1). Other platforms should behave similarly.

2. Three common memory accesses that cause segmentation faults

A process gets a segmentation fault because it accesses memory illegally. Some programmers, however, may not be sure exactly which kinds of access count as illegal. This section describes three of them, each with sample code.

2.1 Accessing kernel space from user mode

The virtual address space at 3G and above belongs to the kernel. Accessing it from user mode causes a segmentation fault. Sample code:

    int* p = (int*)(unsigned long)-1;
    *p = 0;

2.2 Accessing memory that has not been created

A process may access only the memory regions that it has requested itself or that the kernel has set up for it. All other regions have not been created, and an access to any virtual address inside them causes a segmentation fault (see footnote 2). Which regions count as not created is described later in this article. Sample code:

    int* p = NULL;
    *p = 0;

Dereferencing a NULL pointer is a special case of this kind of access. The following code belongs to the same category:

    int* p = (int*)100;
    *p = 0;

2.3 Writing to read-only memory

Normally the code and the read-only data of a process live in read-only memory, and writing to them causes a segmentation fault. Sample code that writes to code:

    char* p = (char*)&main;
    *p = '\0';

Sample code that writes to read-only data:

    const char* p = "abcd";
    *(char*)p = 'a';

Programmers obviously do not write code like the above on purpose. Usually a stray pointer ends up causing the same behavior. The main causes of segmentation faults are:

  • Using uninitialized variables
  • Using memory that has already been freed
  • Out-of-bounds access (such as an array index out of range or a buffer overflow)

3. How the system handles segmentation faults

The previous section described what causes a segmentation fault. This section describes how the system handles one. The handling has three parts: the CPU catches the fault, the kernel processes it, and the user program may process it as well.

3.1 How the CPU catches the fault

A segmentation fault caused by an illegal memory access is in fact a page fault, so catching it is part of catching page faults. All but the oldest x86 CPUs provide both segment-based and page-based memory management, and Linux uses the protection mechanisms of both to catch page faults.

The segment protection mechanism gives the CPU four privilege levels (0 to 3). The Linux kernel uses only two of them: kernel mode runs at level 0 and user mode at level 3. Intel calls the level at which the CPU is currently running the CPL (Current Privilege Level); it is held in the low two bits of the CS segment register.

The page protection mechanism is built on page tables. The CPU uses them to translate virtual addresses into physical addresses and to detect page faults. Consider two-level paging, with page-directory entries and page-table entries. If the page-directory entry or the page-table entry for a virtual address has not been created (its value is 0, so its present bit is clear), translating that address triggers a page fault. Both kinds of entry also contain an R/W bit and a U/S bit. If R/W is 0 in the page-directory entry or the page-table entry for a virtual address, the corresponding physical page is read-only and a write to it triggers a page fault. If U/S is 0 in either entry, an access to the address at CPL 3 triggers a page fault; in other words, only the kernel may access the page. In short, the CPU decides whether a page fault occurs from the CPL and from the page-directory and page-table entries of the address being translated.

When a page fault occurs, the CPU automatically saves the interrupted context and pushes an error code, which records some of the CPU state at the time of the fault, onto the kernel stack. The CPU then jumps to the page fault handler and starts executing it.

3.2 How the kernel handles the fault

When an illegal memory access occurs, the CPU automatically jumps to the page fault handler. Note that not every memory access that triggers a page fault is illegal; perfectly legal accesses sometimes trigger page faults too, so the kernel has to tell the two apart. Common legal accesses that trigger a page fault include:

  • Anonymous memory that has not yet been given a physical page, or whose physical page has been swapped out to a swap partition or swap file.
  • A file mapped with mmap whose contents have not yet been read into memory, or were read in but whose pages have since been reclaimed because memory ran short.
  • Copy-on-write, whose implementation also relies on page faults.

This article does not go into these legal cases. The kernel decides whether a page fault is illegal, that is, whether it is a segmentation fault, from the following three pieces of information:

  • The virtual address that triggered the page fault. The CPU stores this address in the CR2 register when the fault occurs.
  • Whether the CPU was in kernel mode or user mode when the fault occurred, and whether the access was a read or a write. This information is recorded in the error code.
  • The virtual memory areas that the process has created. They include areas the user process creates explicitly, for example through the brk and mmap system calls, and areas created for the process automatically by the kernel, such as the code segment, the data segment, dynamic libraries, and the stack. Each area has attributes such as writable or read-only; the code segment, for example, is read-only. When there are only a few areas, the kernel keeps them in a linked list. When there are many, it organizes them in a binary tree (an AVL tree in early kernels, later a red-black tree) to speed up lookups. The user space of a process is 3 GB, and the created areas cover only part of it. Only these areas can be accessed legally; the rest of the 3 GB has not been created, and accessing it causes a segmentation fault.

The kernel concludes that a segmentation fault has occurred in any of the following cases:

  • The CPU was in user mode when the page fault occurred, and the faulting virtual address is greater than or equal to 3G.
  • The faulting virtual address does not belong to any created virtual memory area.
  • The faulting virtual address belongs to a created virtual memory area, but the area is read-only and the fault was caused by a write.

These three cases correspond, in order, to the three kinds of sample code in section 2. When the kernel finds that one of them applies, it knows that the fault was caused by an illegal memory access and sends SIGSEGV to the current process.

3.3 How user programs handle the fault

As the previous section shows, the kernel sends SIGSEGV to the process that caused the segmentation fault. Programs generally do not catch this signal, so the default action is taken and the process is killed. That is why we usually see the hateful "Segmentation fault" on the screen. Perhaps surprisingly, SIGSEGV can be caught (see footnote 3); only SIGKILL and SIGSTOP cannot. Combine this with how the CPU handles this kind of exception (a segmentation fault is an exception): after the exception handling code has run, the CPU re-executes the instruction that caused the exception. This lets us write the following interesting code, which contains no jump statement and yet enters an endless loop:

    #include <stdlib.h>
    #include <signal.h>

    /* Handler that does nothing and simply returns. */
    static void foo(int sig)
    {
        (void)sig;
        return;
    }

    int main(int argc, char* argv[])
    {
        struct sigaction action;
        action.sa_handler = foo;
        sigemptyset(&action.sa_mask);
        action.sa_flags = 0;
        if(sigaction(SIGSEGV, &action, NULL) == -1){
            return -1;
        }
        /* The faulting store is re-executed every time foo returns. */
        int* p = NULL;
        *p = 0;
        return 0;
    }

4. Summary

To summarize, here is a diagram of the virtual address space of a process (a general picture; details differ between systems):

Note that the black areas marked "hole" are virtual address space that has not been created: they belong neither to memory the program has requested nor to memory the kernel has set up for the process automatically. The diagram shows no separate read-only data area because read-only data is usually placed in the code segment.

Footnotes:

1. Some versions of the Linux kernel provide a configuration option for how the virtual address space is split. One of several splits can be chosen before the kernel is compiled, for example 2 GB of kernel space and 2 GB of user space.

2. If the accessed virtual address does not fall inside the stack's virtual memory area but lies very close to the stack pointer (esp), with a difference of less than 32 bytes, and neither the stack size limit (shown by ulimit -s) nor the limit on the total virtual memory of the process (shown by ulimit -v) has been reached, the stack is extended automatically and the access does not cause a segmentation fault.

3. Many readers may wonder: since a process that hits a segmentation fault cannot continue anyway, why can SIGSEGV be caught at all? Would it not be better to make it uncatchable? My guess is that this gives programs such as gdb, which use the ptrace system call, a chance to stop the debugged process when it receives the signal, so that the programmer can inspect the state of the process before it is killed.
