Handling kernel-mode page faults in Linux using exception tables

Source: Internet
Author: User
   Preface
While a program runs, the CPU may be unable to reach the physical memory behind an address because the mapping from the virtual address to a physical address cannot be completed. In that case the CPU raises a page fault exception. Building on this CPU mechanism, Linux implements demand paging and copy-on-write.
  
1. Demand paging is a dynamic memory allocation technique that delays page frame allocation until it can no longer be postponed. The motivation is that a process does not access all of its address space when it starts running; in fact, some addresses may never be used. The principle of locality also ensures that at each stage of execution only a small part of the process's pages is actually in use. The page frames of pages that are temporarily unused can serve other processes. Demand paging therefore increases the average number of free page frames in the system and makes better use of memory. From another perspective, with the same amount of memory, demand paging can increase system throughput. When a page that a process accesses is not in memory, the page fault handler brings the required page in.
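The allocate-on-first-touch idea can be illustrated with a toy model. This is a minimal sketch, not kernel code: toy_space, toy_touch, and TOY_PAGES are hypothetical names chosen for illustration. A frame is assigned only when a page is first touched, so pages that are never touched cost nothing.

```c
#include <stddef.h>

#define TOY_PAGES 8            /* pages in the toy address space (assumed size) */

struct toy_space {
    int present[TOY_PAGES];    /* 1 if a frame is already assigned to the page */
    int frames_used;           /* frames allocated so far */
    int faults;                /* simulated page faults taken */
};

/* Simulated access: the first touch of a page "faults" and allocates a frame. */
static int toy_touch(struct toy_space *s, size_t page)
{
    if (page >= TOY_PAGES)
        return -1;             /* outside the toy address space */
    if (!s->present[page]) {   /* not mapped yet: take the "page fault" */
        s->faults++;
        s->present[page] = 1;
        s->frames_used++;
    }
    return 0;
}
```

Touching the same page twice takes only one simulated fault, and a space in which two of the eight pages are touched uses two frames rather than eight.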
  
2. Copy-on-write. When fork is called, the parent and child processes share page frames read-only. When either of them tries to modify a page, the page fault handler allocates a new page frame, copies the page into it, and marks it writable. This greatly improves performance, and the reason lies in how Linux creates processes. Typically a child process calls execve right after it is created, loading the image of an executable into memory, at which point the child's page frames are reassigned anyway. Copying the page frames at fork time would therefore be wasted work.
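The visible effect of this sharing can be observed from user space. The following is a minimal sketch, assuming a POSIX system; cow_demo is a hypothetical helper name. It shows only the result (the child's write does not reach the parent), not the kernel's internal copying.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that overwrites a variable, then report what the parent sees.
 * Returns the parent's copy after the child has exited, or -1 on error.
 * child_saw (if non-NULL) receives the value the child exited with.
 */
static int cow_demo(int *child_saw)
{
    volatile int value = 1;
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = 2;            /* the write lands in the child's private copy */
        _exit(value);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (child_saw)
        *child_saw = WEXITSTATUS(status);
    return value;             /* the parent's page was not modified */
}
```

The parent still reads 1 after the child has written 2, because the two processes no longer share the page once one of them writes to it.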
  
In both cases above, the page fault occurs while the process runs in user mode. The exception handler lets the process resume from the faulting instruction, so the user never notices. Some faults, of course, cannot be recovered from; the handler then cleans up and terminates the process. In other words, a page fault in a user-mode process does not threaten the stability of the kernel. But what happens when a process running in kernel mode takes a page fault that cannot be recovered from? Does the system crash? Whether such kernel-mode page faults can be handled has a great impact on kernel stability: handled badly, they cause a kernel Oops, which users obviously cannot tolerate. This article focuses on this problem and introduces the solution adopted in the Linux kernel.
  
Before going further, note that the sample code in this article is taken from Linux 2.4.0, the compiler is gcc 2.96, and objdump is version 2.11.93.0.2. You can query the version information with the following commands:
  
$ gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-110)
$ objdump -v
GNU objdump 2.11.93.0.2 20020207
Copyright 2002 Free Software Foundation, Inc.
  
   GCC extension
This article relies on a GCC toolchain extension, the .section pseudo-op provided by the assembler as, so a brief introduction is in order. The pseudo-op is interpreted differently for different object file formats; I will not list them all and will only describe its use with ELF, the format commonly used on Linux. The syntax is:
  
.section NAME[, "FLAGS"]
  
As is well known, a C program generally consists of the following parts: the text section, the initialized data section, the uninitialized data section, the heap, and the stack. For the address space layout, refer to the book "Advanced Programming in the UNIX Environment".
  
In the Linux kernel, the .section pseudo-op places the code that follows it into the section named NAME. The FLAGS field describes the section's attributes and is a single character or a combination of characters:
  
'a' allocatable section
'w' writable section
'x' executable section
'M' mergeable section
'S' section containing zero-terminated strings
For example, the reader will see: .section .fixup,"ax"
  
This directive defines a section named .fixup; the instructions that follow are placed in it, and the section is allocatable and executable.
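The same mechanism is reachable from C. The sketch below is an illustration under stated assumptions: it needs GCC or Clang targeting ELF and a GNU ld-compatible linker, which defines __start_NAME and __stop_NAME symbols for a section whose name is a valid C identifier. It uses the section attribute rather than the assembler directive, and demo_table, DEMO_ENTRY, demo_count, and demo_lookup are hypothetical names. Entries defined in separate places end up side by side in one section and can be walked as a table.

```c
#include <stddef.h>

struct demo_entry {
    unsigned long key;
    unsigned long value;
};

/* Place one entry in the custom section; "used" keeps it from being discarded. */
#define DEMO_ENTRY(name, k, v)                                   \
    static const struct demo_entry name                          \
    __attribute__((used, section("demo_table"))) = { (k), (v) }

DEMO_ENTRY(entry_a, 1, 10);
DEMO_ENTRY(entry_b, 2, 20);
DEMO_ENTRY(entry_c, 3, 30);

/* Defined by the linker because the section name is a valid C identifier. */
extern const struct demo_entry __start_demo_table[];
extern const struct demo_entry __stop_demo_table[];

/* Number of entries the linker collected into the section. */
static size_t demo_count(void)
{
    return (size_t)(__stop_demo_table - __start_demo_table);
}

/* Linear scan of the collected entries; returns 0 if the key is absent. */
static unsigned long demo_lookup(unsigned long key)
{
    const struct demo_entry *e;

    for (e = __start_demo_table; e < __stop_demo_table; e++)
        if (e->key == key)
            return e->value;
    return 0;
}
```

The order of entries inside the section is up to the compiler and linker, so the sketch looks entries up by key instead of relying on position.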
  
   Kernel page fault handling
Code running in kernel mode often needs to access the user address space, but nothing guarantees that an address passed in from user space is valid. To protect the kernel from bad or malicious input, addresses passed in from user space must be validated.
  
In earlier versions of Linux, this was done by the verify_area function:
  
extern inline int verify_area(int type, const void *addr, unsigned long size)
  
This function checks whether the memory area of size bytes starting at addr may be accessed in the way given by type (read or write). To do so, verify_area first has to find the virtual memory area (vma) containing addr. In the normal case (a correctly behaving program) the check succeeds; it fails only rarely. In other words, the kernel almost always spends time on a check whose result is already known, which is unacceptable in terms of operating system efficiency.
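To make the cost of such a software check concrete, here is a minimal sketch of the kind of work it involves. It is not the kernel's verify_area: toy_vma, toy_find_vma, and toy_verify_area are hypothetical names, the areas are kept in a sorted array rather than the kernel's data structures, and a range that spans several adjacent areas is simply rejected.

```c
#include <stddef.h>

#define TOY_READ  1
#define TOY_WRITE 2

struct toy_vma {
    unsigned long start;   /* first address in the area */
    unsigned long end;     /* first address past the area */
    int prot;              /* TOY_READ and/or TOY_WRITE */
};

/* Return the first area whose end lies above addr, or NULL (areas are sorted). */
static const struct toy_vma *toy_find_vma(const struct toy_vma *v, size_t n,
                                          unsigned long addr)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (addr < v[i].end)
            return &v[i];
    return NULL;
}

/* 0 if [addr, addr + size) lies inside one area that permits the access. */
static int toy_verify_area(const struct toy_vma *v, size_t n, int type,
                           unsigned long addr, unsigned long size)
{
    const struct toy_vma *vma = toy_find_vma(v, n, addr);

    if (!vma || vma->start > addr)
        return -1;                      /* address not mapped */
    if (size > vma->end - addr)
        return -1;                      /* range runs past the area */
    if ((vma->prot & type) != type)
        return -1;                      /* access type not permitted */
    return 0;
}
```

Every call pays for the lookup and the comparisons, even though a well-behaved caller passes valid addresses nearly every time.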
  
To solve this problem, the current Linux design hands the check over to the virtual memory hardware. With paging enabled, a page fault is raised whenever the page for the virtual address used by an instruction is not in memory or the access type is not permitted. The processor stores the faulting virtual address in the CR2 register and supplies an error code describing the type of memory access that caused the fault; the Linux page fault handler is then called.
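On i386 the error code is a small bit field: bit 0 distinguishes a missing page from a protection violation, bit 1 a read from a write, and bit 2 kernel mode from user mode. The following sketch decodes it; the PF_PROT, PF_WRITE, and PF_USER macro names and the describe_fault helper are hypothetical and chosen here for readability.

```c
#include <stdio.h>

#define PF_PROT  0x1   /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2   /* 0: read access, 1: write access */
#define PF_USER  0x4   /* 0: fault in kernel mode, 1: fault in user mode */

/* Write a human-readable description of an i386 page fault error code. */
static int describe_fault(unsigned long error_code, char *buf, size_t len)
{
    return snprintf(buf, len, "%s %s in %s mode",
                    (error_code & PF_PROT)
                        ? "protection violation on" : "page not present on",
                    (error_code & PF_WRITE) ? "write" : "read",
                    (error_code & PF_USER) ? "user" : "kernel");
}
```

A test such as error_code & 4 therefore asks whether the fault was taken in user mode.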
  
In Linux, the function that handles page faults is as follows:
  
  
asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code)
{
........................
    /* read the faulting address from CR2 */
    __asm__("movl %%cr2,%0":"=r" (address));
........................
    vma = find_vma(mm, address);
    if (!vma)
        goto bad_area;
    if (vma->vm_start <= address)
        goto good_area;
    if (!(vma->vm_flags & VM_GROWSDOWN))
        goto bad_area;
    if (error_code & 4) {
        if (address + 32 < regs->esp)
            goto bad_area;
    }
........................
bad_area:
........................
no_context:
    /* Are we prepared to handle this kernel fault? */
    if ((fixup = search_exception_table(regs->eip)) != 0) {
        regs->eip = fixup;
        return;
    }
...........................
}
  
First, look at the two parameters passed to this function. Both are set up on the stack by entry.S (arch/i386/kernel/entry.S): regs points to the registers saved on the stack, and error_code holds the error code of the exception. For how this stack frame is built, see the book "Linux Kernel Source Code Scenario Analysis".
  
The function first reads the faulting virtual address from the CPU control register CR2. Because the handler must deal with many kinds of page faults, it has many branches. For the purposes of this article we care only about the following kernel-mode cases:
  
1. The program accesses an address in the kernel address space whose content is not in memory. The handler first jumps to the label vmalloc_fault; if the page directory entry for the accessed address is not present either, it then jumps to the label no_context.
  
2. The page fault occurs in an interrupt handler or a kernel thread; the handler jumps to the label no_context.
  
3. The program is running in kernel mode and accesses user-space data that is not in memory:
  
A) The faulting virtual address lies within one of the process's vmas, but the system cannot allocate a free page frame. The handler jumps to the label out_of_memory and then to no_context.
  
B) The faulting virtual address belongs to none of the process's vmas and is not a legitimate stack extension. The handler first jumps to the label bad_area and eventually reaches no_context.
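The address test that separates a valid area from a bad one can be written as a small stand-alone function. This is a sketch that mirrors only the checks visible in the do_page_fault excerpt; toy_area, toy_classify, and the TOY_ constants are hypothetical names, and the elided parts of the real handler are not reproduced.

```c
#include <stddef.h>

#define TOY_GROWSDOWN 0x1      /* area may grow downward, like a stack */

enum toy_result { TOY_GOOD_AREA, TOY_BAD_AREA, TOY_EXPAND_STACK };

struct toy_area {
    unsigned long start, end;  /* the area covers [start, end) */
    unsigned flags;
};

/*
 * vma is the first area ending above address (NULL if there is none), as a
 * find_vma-style lookup would return it. user_mode and esp stand in for
 * (error_code & 4) and regs->esp.
 */
static enum toy_result toy_classify(const struct toy_area *vma,
                                    unsigned long address,
                                    int user_mode, unsigned long esp)
{
    if (!vma)
        return TOY_BAD_AREA;       /* nothing mapped at or above address */
    if (vma->start <= address)
        return TOY_GOOD_AREA;      /* address lies inside the area */
    if (!(vma->flags & TOY_GROWSDOWN))
        return TOY_BAD_AREA;       /* hole below an area that cannot grow */
    if (user_mode && address + 32 < esp)
        return TOY_BAD_AREA;       /* too far below the stack pointer */
    return TOY_EXPAND_STACK;
}
```

Case B above corresponds to the TOY_BAD_AREA results: no area contains the address, and the access does not qualify as stack growth.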
  
In all of these cases we end up at no_context, that is, at the call to search_exception_table. This function takes the address of the faulting instruction (regs->eip) and looks up in the exception table the address (fixup) at which execution can continue. The exception table consists of address pairs: the first address in each pair is that of an instruction that may fault, and the second is the fixup address at which the program continues if that instruction does fault.
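A lookup over such address pairs might look like the sketch below. The kernel's own search routine is not shown in this article, so the binary search over a table sorted by instruction address is an assumption, and toy_ex_entry and toy_search_table are hypothetical names.

```c
#include <stddef.h>

struct toy_ex_entry {
    unsigned long insn;    /* address of an instruction that may fault */
    unsigned long fixup;   /* address at which to resume if it does */
};

/* Binary search over a table sorted by insn; returns 0 if no entry matches. */
static unsigned long toy_search_table(const struct toy_ex_entry *table,
                                      size_t n, unsigned long addr)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (table[mid].insn == addr)
            return table[mid].fixup;
        if (table[mid].insn < addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```

A return value of 0 plays the role of "no fixup found", which matches the way the handler tests the result of search_exception_table against 0.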
  
If the search succeeds, the page fault handler changes the return address on the stack (regs->eip) to the fixup address and returns; the faulting process then continues with the instructions placed at the fixup address. If no matching fixup address is found, the kernel can only print an error message and stop.
  
How are these fixup addresses generated? Does the system generate them automatically? Of course not. The fixup code is written into the kernel source by programmers, using the .section extension provided by the assembler. Next we analyze how this is implemented.
  
   Exception table implementation mechanism
As an example, I wrote a small program around the __copy_user macro defined in include/asm-i386/uaccess.h.
/* hello.c */
#include
#include
  
#define __copy_user(to, from, size) \
do { \
    int __d0, __d1; \
    __asm__ __volatile__( \
        "0: rep; movsl\n" \
        "   movl %3,%0\n" \
        "1: rep; movsb\n" \
        "2:\n" \
        ".section .fixup,\"ax\"\n" \
        "3: lea 0(%3,%0,4),%0\n" \
        "   jmp 2b\n" \
        ".previous\n" \
        ".section __ex_table,\"a\"\n" \
        "   .align 4\n" \
        "   .long 0b,3b\n" \
        "   .long 1b,2b\n" \
        ".previous" \
        : "=&c" (size), "=&D" (__d0), "=&S" (__d1) \
        : "r" (s
