If the address belongs to the address space of the process, do_page_fault() jumps to the code at the good_area label:
/*
 * OK, we have a good vm_area for this memory access, so
 * we can handle it..
 */
good_area:
	si_code = SEGV_ACCERR;
	write = 0;
	switch (error_code & 3) {
		default:	/* 3: write, present */
#ifdef TEST_VERIFY_AREA
			if (regs->cs == KERNEL_CS)
				printk("WP fault at %08lx\n", regs->eip);
#endif
			/* fall through */
		case 2:		/* write, not present */
			if (!(vma->vm_flags & VM_WRITE))
				goto bad_area;
			write++;
			break;
		case 1:		/* read, present */
			goto bad_area;
		case 0:		/* read, not present */
			if (!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)))
				goto bad_area;
	}
First, consider a fault caused by a write access (error_code & 3 == 2 or 3). The function checks whether the linear region is writable (!(vma->vm_flags & VM_WRITE)). If it is not writable, it jumps to the bad_area code; if it is writable, it sets the write local variable to 1.

If the exception was caused by a read or execute access, the function checks whether the page is already present in RAM. If it is present (case 1), the process tried to access a privileged page frame in user mode (one whose User/Supervisor flag is clear), so the function jumps to bad_area (in practice this never happens, because the kernel never assigns privileged page frames to processes). If the page is not present (error_code & 3 == 0), the function also checks whether the linear region is readable or executable.
If the access rights of the linear region match the access type that caused the exception, the handle_mm_fault() function is called to allocate a new page frame:
survive:
	/*
	 * If for any reason at all we couldn't handle the fault,
	 * make sure we exit gracefully rather than endlessly redo
	 * the fault.
	 */
	switch (handle_mm_fault(mm, vma, address, write)) {
		case VM_FAULT_MINOR:
			tsk->min_flt++;
			break;
		case VM_FAULT_MAJOR:
			tsk->maj_flt++;
			break;
		case VM_FAULT_SIGBUS:
			goto do_sigbus;
		case VM_FAULT_OOM:
			goto out_of_memory;
		default:
			BUG();
	}

	/*
	 * Did it hit the DOS screen memory VA from vm86 mode?
	 */
	if (regs->eflags & VM_MASK) {
		unsigned long bit = (address - 0xA0000) >> PAGE_SHIFT;
		if (bit < 32)
			tsk->thread.screen_bitmap |= 1 << bit;
	}
	up_read(&mm->mmap_sem);
	return;
If handle_mm_fault() successfully assigns a page frame to the process, it returns VM_FAULT_MINOR or VM_FAULT_MAJOR. VM_FAULT_MINOR means the fault was handled without blocking the current process; such a fault is called a minor fault. VM_FAULT_MAJOR means the fault forced the current process to sleep (most likely because time was spent filling the allocated page frame with data from disk); a fault that blocks the current process is called a major fault. The function can also return VM_FAULT_OOM (not enough memory) or VM_FAULT_SIGBUS (any other error).
If handle_mm_fault() returns VM_FAULT_SIGBUS, the SIGBUS signal is sent to the process:
do_sigbus:
	up_read(&mm->mmap_sem);

	/* Kernel mode? Handle exceptions or die */
	if (!(error_code & 4))
		goto no_context;

	/* User space => ok to do another page fault */
	if (is_prefetch(regs, address, error_code))
		return;

	tsk->thread.cr2 = address;
	tsk->thread.error_code = error_code;
	tsk->thread.trap_no = 14;
	force_sig_info_fault(SIGBUS, BUS_ADRERR, address, tsk);
If handle_mm_fault() cannot allocate a new page frame, it returns VM_FAULT_OOM. In that case the kernel usually kills the current process. However, if the current process is init (tsk->pid == 1), it is placed at the end of the run queue and the scheduler is invoked; once init resumes execution, handle_mm_fault() is tried again:
out_of_memory:
	up_read(&mm->mmap_sem);
	if (tsk->pid == 1) {
		yield();
		down_read(&mm->mmap_sem);
		goto survive;
	}
	printk("VM: killing process %s\n", tsk->comm);
	if (error_code & 4)
		do_exit(SIGKILL);
	goto no_context;
Next we will analyze the handle_mm_fault() function in detail. This function is the heart of the page fault handler and takes four parameters:

mm: the memory descriptor of the process that was running on the CPU when the exception occurred.
vma: a pointer to the descriptor of the linear region containing the linear address that caused the exception.
address: the linear address that caused the exception.
write_access: set to 1 if tsk attempted to write to address (as in our case), 0 if tsk attempted to read or execute it.
static inline int handle_mm_fault(struct mm_struct *mm,
		struct vm_area_struct *vma, unsigned long address,
		int write_access)
{
	return __handle_mm_fault(mm, vma, address, write_access) &
				(~VM_FAULT_WRITE);
}

/*
 * By the time we get here, we already hold the mm semaphore
 */
int __handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
		unsigned long address, int write_access)
{
	pgd_t *pgd;
	pud_t *pud;
	pmd_t *pmd;
	pte_t *pte;

	__set_current_state(TASK_RUNNING);
	count_vm_event(PGFAULT);

	if (unlikely(is_vm_hugetlb_page(vma)))
		return hugetlb_fault(mm, vma, address, write_access);

	pgd = pgd_offset(mm, address);
	pud = pud_alloc(mm, pgd, address);
	if (!pud)
		return VM_FAULT_OOM;
	pmd = pmd_alloc(mm, pud, address);
	if (!pmd)
		return VM_FAULT_OOM;
	pte = pte_alloc_map(mm, pmd, address);
	if (!pte)
		return VM_FAULT_OOM;

	return handle_pte_fault(mm, vma, address, pte, pmd, write_access);
}
This function checks whether the page upper directory, page middle directory, and page table that map address exist:

	if (!pud)
	if (!pmd)
	if (!pte)
However, the page global directory always exists:

	pgd = pgd_offset(mm, address);

	#define pgd_offset(mm, address) ((mm)->pgd + pgd_index(address))
	#define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))

These macros simply locate the page global directory entry corresponding to address and assign it to the pgd local variable.
Even when address belongs to the address space of the process, the corresponding page tables may not have been allocated yet. Therefore, before doing anything else, the function allocates the page directories and the page table as needed:
	pud = pud_alloc(mm, pgd, address);
	pmd = pmd_alloc(mm, pud, address);
	pte = pte_alloc_map(mm, pmd, address);
On i386 the pud level is folded into the pgd, so pud_alloc() does not really allocate anything; it just returns the pgd entry reinterpreted as a pud. Let's pick pte_alloc_map() and examine it; pmd_alloc() is similar:
#define pte_alloc_map(mm, pmd, address)					\
	((unlikely(!pmd_present(*(pmd))) && __pte_alloc(mm, pmd, address)) ? \
		NULL : pte_offset_map(pmd, address))

int __pte_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long address)
{
	struct page *new = pte_alloc_one(mm, address);
	if (!new)
		return -ENOMEM;

	pte_lock_init(new);
	spin_lock(&mm->page_table_lock);
	if (pmd_present(*pmd)) {	/* Another has populated it */
		pte_lock_deinit(new);
		pte_free(new);
	} else {
		mm->nr_ptes++;
		inc_zone_page_state(new, NR_PAGETABLE);
		pmd_populate(mm, pmd, new);
	}
	spin_unlock(&mm->page_table_lock);
	return 0;
}
struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
{
	struct page *pte;

#ifdef CONFIG_HIGHPTE
	pte = alloc_pages(GFP_KERNEL | __GFP_HIGHMEM | __GFP_REPEAT | __GFP_ZERO, 0);
#else
	pte = alloc_pages(GFP_KERNEL | __GFP_REPEAT | __GFP_ZERO, 0);
#endif
	return pte;
}
In short, pte_alloc_one() allocates a single zeroed page frame to hold the new page table, taking it from high memory when CONFIG_HIGHPTE is set.
Back in __handle_mm_fault(): the pgd local variable contains the page global directory entry that refers to address. If necessary, pud_alloc() and pmd_alloc() are then invoked to allocate a new page upper directory and page middle directory (on 80x86 these allocations never actually happen: the page upper directory is always folded into the page global directory, and the page middle directory is either folded into the page upper directory (PAE disabled) or allocated together with it (PAE enabled)). Nevertheless, pud_alloc() and pmd_alloc() always succeed here; the pud and pmd values they return are derived from the page global directory entry for address. Then, if necessary, pte_alloc_map() allocates a new page table. As we saw in __pte_alloc(), if pmd_present(*pmd) reveals that another thread has already attached a page table to the pmd entry, the freshly allocated one is released with pte_free(new) instead of being installed.
If both steps succeed, the pte local variable points to the page table entry that refers to address. The handle_pte_fault() function is then called to examine that entry and decide how to assign a new page frame to the process:
1. If the accessed page is not present, that is, it is not yet stored in any page frame, the kernel allocates a new page frame and initializes it appropriately. This technique is called demand paging.
2. If the accessed page is present but marked read-only, that is, it is already stored in a page frame, the kernel allocates a new page frame and initializes its contents by copying the old page frame's data. This technique is called Copy On Write (COW).
Both techniques are very important. The next two posts will focus on them in detail, so stay tuned!