Linux page fault exception handling: kernel space


Page fault exceptions are usually triggered in two situations:

1. An illegal address is accessed due to a programming error.

2. The accessed address is valid, but no physical page has been assigned to it yet.

The second case, explained below, is a feature of virtual memory management. Although each process independently sees 3 GB of accessible address space, these resources are only an IOU issued by the kernel: the process holds virtual memory areas (vma) of its own, but a vma is not linked to physical page frames at the time it is created. Because of the principle of locality, the memory a program accesses within a given period is usually limited, so the kernel associates a virtual memory area with physical memory only when the process actually needs to access it (allocating page table entries for the address and mapping them to physical pages). In other words, this kind of page fault is normal, while a fault of the first kind is abnormal, and the kernel should use every feasible means to minimize the damage such faults cause.

The page fault handler is do_page_fault(), an architecture-specific function. Page faults fall into two classes: faults on kernel space (an access to the fourth GB of the linear address space) and faults on user space (an access to the 0 ~ 3 GB range). Taking the x86 architecture as an example, let's look at how kernel-space faults are handled.

dotraplinkage void __kprobes
do_page_fault(struct pt_regs *regs, unsigned long error_code)
{
	struct vm_area_struct *vma;
	struct task_struct *tsk;
	unsigned long address;
	struct mm_struct *mm;
	int write;
	int fault;

	tsk = current;		/* get the current process */
	mm = tsk->mm;		/* get the current process's address space */

	/* Get the faulting address: */
	address = read_cr2();	/* read CR2 to get the address that triggered the fault */

	if (unlikely(fault_in_kernel_space(address))) {	/* is the address in the kernel linear address space? */
		if (!(error_code & (PF_RSVD | PF_USER | PF_PROT))) {	/* did the fault occur in kernel mode? */
			if (vmalloc_fault(address) >= 0)	/* handle vmalloc faults */
				return;

			if (kmemcheck_fault(regs, address, error_code))
				return;
		}

		/* Can handle a stale RO->RW TLB: */
		/*
		 * The fault is in the kernel address space but does not fall into
		 * the cases above, or those could not fix it; check whether the
		 * corresponding page table entry exists and whether the
		 * permissions are sufficient.
		 */
		if (spurious_fault(error_code, address))
			return;

		/* kprobes don't want to hook the spurious faults: */
		if (notify_page_fault(regs))
			return;
		/*
		 * Don't take the mm semaphore here. If we fixup a prefetch
		 * fault we could otherwise deadlock:
		 */
		bad_area_nosemaphore(regs, error_code, address);

		return;
	}

	/* ... handling of user-space faults follows ... */
}

This function takes two parameters:

regs contains the values of the registers at the time of the fault.

error_code is the error type that triggered the exception; its bits have the following meaning:

/*
 * Page fault error code bits:
 *   bit 0 == 0: no page found	1: protection fault
 *   bit 1 == 0: read access	1: write access
 *   bit 2 == 0: kernel-mode access	1: user-mode access
 *   bit 3 == 1: use of reserved bit detected
 *   bit 4 == 1: fault was an instruction fetch
 */
enum x86_pf_error_code {
	PF_PROT		= 1 << 0,
	PF_WRITE	= 1 << 1,
	PF_USER		= 1 << 2,
	PF_RSVD		= 1 << 3,
	PF_INSTR	= 1 << 4,
};

First, the handler checks whether the faulting address lies in the kernel address space, i.e. address >= TASK_SIZE_MAX (generally 3 GB on 32-bit x86). It then checks whether the fault occurred in kernel mode. If both conditions hold, it tries to resolve the fault through vmalloc_fault(). When memory is requested with vmalloc(), the kernel updates only the master kernel page table, so the page table the process is currently using may trigger this fault simply because it has not been synchronized with the master kernel page table; this function therefore tries to synchronize the page table entries for the address with the master kernel page table.

static noinline int vmalloc_fault(unsigned long address)
{
	unsigned long pgd_paddr;
	pmd_t *pmd_k;
	pte_t *pte_k;

	/* Check whether the faulting address lies in the vmalloc region: */
	if (!(address >= VMALLOC_START && address < VMALLOC_END))
		return -1;

	/*
	 * Synchronize this task's top level page-table
	 * with the 'reference' page table.
	 *
	 * Do _not_ use "current" here. We might be inside
	 * an interrupt in the middle of a task switch..
	 */
	pgd_paddr = read_cr3();	/* get the physical address of the current PGD */
	pmd_k = vmalloc_sync_one(__va(pgd_paddr), address);	/* sync the current page table with the kernel page table */
	if (!pmd_k)
		return -1;

	/*
	 * At this point the current page table's pmd for the address has been
	 * set from the kernel page table's pmd; the last step is to check
	 * whether the pte entry under that pmd is present.
	 */
	pte_k = pte_offset_kernel(pmd_k, address);	/* get the pte entry for the address under the pmd */
	if (!pte_present(*pte_k))	/* if the pte entry is not present, fail */
		return -1;

	return 0;
}
The synchronization itself is performed by vmalloc_sync_one():

static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
{
	unsigned index = pgd_index(address);
	pgd_t *pgd_k;
	pud_t *pud, *pud_k;
	pmd_t *pmd, *pmd_k;

	pgd += index;			/* pgd entry for the address in the current page table */
	pgd_k = init_mm.pgd + index;	/* corresponding entry in the kernel page table */

	if (!pgd_present(*pgd_k))	/* if the kernel PGD entry is not present, nothing can be done; return NULL */
		return NULL;

	/*
	 * set_pgd(pgd, *pgd_k); here would be useless on PAE
	 * and redundant with the set_pmd() on non-PAE. As would
	 * set_pud.
	 */

	/* Get the pud of both page tables and check whether pud_k is present: */
	pud = pud_offset(pgd, address);
	pud_k = pud_offset(pgd_k, address);
	if (!pud_present(*pud_k))
		return NULL;

	/* Do the same for the pmd: */
	pmd = pmd_offset(pud, address);
	pmd_k = pmd_offset(pud_k, address);
	if (!pmd_present(*pmd_k))
		return NULL;

	if (!pmd_present(*pmd))		/* if the current page table's pmd entry is not present, set it from the kernel page table's pmd_k */
		set_pmd(pmd, *pmd_k);
	else
		BUG_ON(pmd_page(*pmd) != pmd_page(*pmd_k));

	return pmd_k;
}

If do_page_fault() ends up executing bad_area_nosemaphore(), the fault was caused by an illegal address access. In the kernel there are generally two such cases:

1. The kernel accesses an invalid address through a system call parameter passed in from user space.

2. A kernel programming bug.

In the first case, the kernel can still recover through the exception fixup mechanism; in the second, an oops occurs and the kernel forcibly terminates the current process with SIGKILL.

For a kernel-mode fault, the actual work of bad_area_nosemaphore() reduces to the call chain bad_area_nosemaphore() --> __bad_area_nosemaphore() --> no_context().

static noinline void
no_context(struct pt_regs *regs, unsigned long error_code,
	   unsigned long address)
{
	struct task_struct *tsk = current;
	unsigned long *stackend;
	unsigned long flags;
	int sig;

	/* Are we prepared to handle this kernel fault? */
	/*
	 * fixup_exception() searches the exception table for a routine that
	 * can correct this fault; if one is found, it is executed after
	 * fixup_exception() returns.
	 */
	if (fixup_exception(regs))
		return;

	/*
	 * 32-bit:
	 *
	 *   Valid to do another page fault here, because if this fault
	 *   had been triggered by is_prefetch fixup_exception would have
	 *   handled it.
	 *
	 * 64-bit:
	 *
	 *   Hall of shame of CPU/BIOS bugs.
	 */
	if (is_prefetch(regs, error_code, address))
		return;

	if (is_errata93(regs, address))
		return;

	/*
	 * Oops. The kernel tried to access some bad page. We'll have to
	 * terminate things with extreme prejudice:
	 */
	/*
	 * The fault really is caused by a kernel programming bug, so generate
	 * an oops: print the CPU registers and the kernel-mode stack to the
	 * console, then terminate the current process.
	 */
	flags = oops_begin();

	show_fault_oops(regs, error_code, address);

	stackend = end_of_stack(tsk);
	if (*stackend != STACK_END_MAGIC)
		printk(KERN_ALERT "Thread overran stack, or stack corrupted\n");

	tsk->thread.cr2		= address;
	tsk->thread.trap_no	= 14;
	tsk->thread.error_code	= error_code;

	sig = SIGKILL;
	if (__die("Oops", regs, error_code))
		sig = 0;

	/* Executive summary in case the body of the oops scrolled away */
	printk(KERN_EMERG "CR2: %016lx\n", address);

	oops_end(flags, regs, sig);
}

Author: vanbreaker
