Linux Kernel Source Code Scenario Analysis: The brk() System Call


First, look at the layout of the process address space:



Put simply, from low addresses to high addresses the layout is: the code segment and data segment, an unused gap, and then the stack.
In "Linux Kernel Source Code Scenario Analysis: Extending the User Stack", we allocated pages downward from the stack area toward the top of the data area.
In "Linux Kernel Source Code Scenario Analysis: Swapping User Pages Back In", we dealt with pages brought back in from swap.
In this article, we allocate pages upward from the top of the data area toward the bottom of the stack area.
We analyze this through an example, brk():



1. When the new boundary is higher than the old boundary, we allocate pages between the old boundary and the new one, that is, the corresponding virtual addresses are mapped to physical pages.
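As a quick user-space illustration (not part of the kernel source), this is roughly what the growing case looks like to a program; sbrk() is the libc wrapper that moves the break via brk(). A minimal sketch:

    /* Minimal user-space sketch: grow the heap by four pages via sbrk(). */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        void *old_brk = sbrk(0);                /* current program break */
        if (sbrk(4 * 4096) == (void *)-1)       /* push the break up by four pages */
            return 1;
        void *new_brk = sbrk(0);
        memset(old_brk, 0, 4 * 4096);           /* touch the freshly mapped pages */
        printf("break moved from %p to %p\n", old_brk, new_brk);
        return 0;
    }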
The system call behind brk() is sys_brk(); the code is as follows:

asmlinkage unsigned long sys_brk(unsigned long brk)
{
    unsigned long rlim, retval;
    unsigned long newbrk, oldbrk;
    struct mm_struct *mm = current->mm;

    down(&mm->mmap_sem);

    if (brk < mm->end_code)          /* the break may never fall below the end of the code segment */
        goto out;
    newbrk = PAGE_ALIGN(brk);
    oldbrk = PAGE_ALIGN(mm->brk);
    if (oldbrk == newbrk)
        goto set_brk;

    /* Always allow shrinking brk. */
    if (brk <= mm->brk) {            /* shrinking: not the case in this scenario */
        if (!do_munmap(mm, newbrk, oldbrk-newbrk))
            goto set_brk;
        goto out;
    }

    /* Check against rlimit. */
    rlim = current->rlim[RLIMIT_DATA].rlim_cur;
    if (rlim < RLIM_INFINITY && brk - mm->start_data > rlim)   /* must not exceed the data-size limit */
        goto out;

    /* Check against existing mmap mappings. */
    if (find_vma_intersection(mm, oldbrk, newbrk+PAGE_SIZE))   /* conflict with an existing virtual area? */
        goto out;

    /* Check if we have enough memory. */
    if (!vm_enough_memory((newbrk-oldbrk) >> PAGE_SHIFT))      /* are enough free pages available? */
        goto out;

    /* Ok, looks good - let it rip. */
    if (do_brk(oldbrk, newbrk-oldbrk) != oldbrk)   /* new boundary above the old one: establish the mapping */
        goto out;
set_brk:
    mm->brk = brk;                   /* record the new boundary */
out:
    retval = mm->brk;
    up(&mm->mmap_sem);
    return retval;
}
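For reference, PAGE_ALIGN() used above simply rounds an address up to the next page boundary. A hedged sketch of the effect, assuming 4 KB pages:

    /* Sketch of the page rounding used by sys_brk(); 4 KB pages assumed. */
    #define PAGE_SIZE  4096UL
    #define PAGE_MASK  (~(PAGE_SIZE - 1))
    #define PAGE_ALIGN(addr)  (((addr) + PAGE_SIZE - 1) & PAGE_MASK)
    /* PAGE_ALIGN(0x8049f01) == 0x804a000, PAGE_ALIGN(0x804a000) == 0x804a000 */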


In this case the new boundary is above the old one, so we establish the mapping. The do_brk() code is as follows:

unsigned long do_brk(unsigned long addr, unsigned long len)
{
    struct mm_struct * mm = current->mm;
    struct vm_area_struct * vma;
    unsigned long flags, retval;

    len = PAGE_ALIGN(len);
    if (!len)
        return addr;

    /*
     * mlock MCL_FUTURE?
     */
    if (mm->def_flags & VM_LOCKED) {
        unsigned long locked = mm->locked_vm << PAGE_SHIFT;
        locked += len;
        if (locked > current->rlim[RLIMIT_MEMLOCK].rlim_cur)
            return -EAGAIN;
    }

    /*
     * Clear old maps.  This also does some error checking for us.
     * find_vma_intersection() in sys_brk() only rejected a conflict at the
     * high end of the new range; a conflict at the low end is allowed, and
     * is resolved here by tearing down the old mapping with do_munmap()
     * before the new one is established.
     */
    retval = do_munmap(mm, addr, len);
    if (retval != 0)
        return retval;

    /* Check against address space limits *after* clearing old maps... */
    if ((mm->total_vm << PAGE_SHIFT) + len > current->rlim[RLIMIT_AS].rlim_cur)
        return -ENOMEM;

    if (mm->map_count > MAX_MAP_COUNT)
        return -ENOMEM;

    if (!vm_enough_memory(len >> PAGE_SHIFT))
        return -ENOMEM;

    flags = vm_flags(PROT_READ|PROT_WRITE|PROT_EXEC,
                     MAP_FIXED|MAP_PRIVATE) | mm->def_flags;
    flags |= VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;

    /* Can we just expand an old anonymous mapping? */
    if (addr) {                        /* first try to merge with the preceding area */
        struct vm_area_struct * vma = find_vma(mm, addr-1);
        if (vma && vma->vm_end == addr && !vma->vm_file &&
            vma->vm_flags == flags) {
            vma->vm_end = addr + len;
            goto out;
        }
    }

    /*
     * Create a vma struct for an anonymous mapping.
     */
    vma = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);   /* cannot merge: allocate a new vm_area_struct */
    if (!vma)
        return -ENOMEM;

    vma->vm_mm = mm;
    vma->vm_start = addr;              /* start address */
    vma->vm_end = addr + len;          /* end address */
    vma->vm_flags = flags;
    vma->vm_page_prot = protection_map[flags & 0x0f];
    vma->vm_ops = NULL;
    vma->vm_pgoff = 0;
    vma->vm_file = NULL;
    vma->vm_private_data = NULL;

    insert_vm_struct(mm, vma);

out:
    mm->total_vm += len >> PAGE_SHIFT;
    if (flags & VM_LOCKED) {           /* make_pages_present() only when the area is locked */
        mm->locked_vm += len >> PAGE_SHIFT;
        make_pages_present(addr, addr + len);   /* map physical pages for the new range right away */
    }
    return addr;
}
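The "can we just expand an old anonymous mapping" step above merges the new range into the preceding vma when the two are contiguous, anonymous, and carry identical flags; only when that fails is a new vm_area_struct allocated. A simplified model of that decision (the struct and helper below are illustrative, not the kernel's definitions):

    /* Simplified model of do_brk()'s merge-or-create decision. */
    struct vm_area {
        unsigned long vm_start, vm_end;
        unsigned long vm_flags;
        void *vm_file;                   /* NULL for anonymous mappings */
    };

    /* Extend prev if the new [addr, addr+len) range can simply be appended. */
    static int try_expand(struct vm_area *prev, unsigned long addr,
                          unsigned long len, unsigned long flags)
    {
        if (prev && prev->vm_end == addr &&
            prev->vm_file == NULL && prev->vm_flags == flags) {
            prev->vm_end = addr + len;   /* no new vm_area needed */
            return 1;
        }
        return 0;                        /* caller allocates a new vm_area */
    }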

make_pages_present(), the code is as follows:
int make_pages_present(unsigned long addr, unsigned long end)
{
    int write;
    struct mm_struct *mm = current->mm;
    struct vm_area_struct * vma;

    vma = find_vma(mm, addr);
    write = (vma->vm_flags & VM_WRITE) != 0;
    if (addr >= end)
        BUG();
    do {
        if (handle_mm_fault(mm, vma, addr, write) < 0)
            return -1;
        addr += PAGE_SIZE;
    } while (addr < end);
    return 0;
}

The technique used here is interesting: a page fault is simulated for each page in the new range, so handle_mm_fault() allocates and maps the pages one by one.
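The same effect can be observed from user space: writing one byte into every page of a freshly grown region triggers one page fault per page, which is essentially what make_pages_present() does in-kernel for a VM_LOCKED area. A hedged user-space sketch:

    /* User-space analogy of make_pages_present(): touch one byte per page so
     * every page in [start, end) is faulted in. */
    #include <unistd.h>

    static void touch_range(volatile char *start, volatile char *end)
    {
        long page = sysconf(_SC_PAGESIZE);
        for (volatile char *p = start; p < end; p += page)
            *p = 0;   /* the write fault populates the page */
    }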


Finally control returns to sys_brk(), where mm->brk is set to the new boundary.
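Note that sys_brk() returns mm->brk on success and on failure alike, so a caller can only detect failure by comparing the returned break with the value it asked for; glibc's brk() wrapper does exactly this. A sketch using the raw system call (the helper name is ours, and this check is only meaningful for the growing case):

    /* Sketch: detect a failed brk() by comparing the returned break with the
     * requested one; the raw syscall never returns an error code directly. */
    #include <sys/syscall.h>
    #include <unistd.h>

    static int raw_brk(unsigned long wanted)
    {
        unsigned long got = syscall(SYS_brk, wanted);
        return got >= wanted ? 0 : -1;   /* break not moved up => request failed */
    }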

2. When the new boundary is lower than the old boundary, we release the pages between the new boundary and the old one:
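Before diving into the kernel side, a user-space sketch of the shrinking case: moving the break back down makes the kernel unmap the pages between the new and the old boundary.

    /* Minimal user-space sketch: grow the heap, then shrink it again. */
    #include <unistd.h>

    int main(void)
    {
        if (sbrk(4 * 4096) == (void *)-1)    /* grow by four pages */
            return 1;
        if (sbrk(-4 * 4096) == (void *)-1)   /* shrink: the kernel unmaps them */
            return 1;
        return 0;
    }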



Again, the system call behind brk() is sys_brk(); this time it takes the shrinking path:

asmlinkage unsigned long sys_brk(unsigned long brk)
{
    unsigned long rlim, retval;
    unsigned long newbrk, oldbrk;
    struct mm_struct *mm = current->mm;

    down(&mm->mmap_sem);

    if (brk < mm->end_code)
        goto out;
    newbrk = PAGE_ALIGN(brk);
    oldbrk = PAGE_ALIGN(mm->brk);
    if (oldbrk == newbrk)
        goto set_brk;

    /* Always allow shrinking brk. */
    if (brk <= mm->brk) {                          /* this time the old boundary is above the new one */
        if (!do_munmap(mm, newbrk, oldbrk-newbrk)) /* remove the mappings between them */
            goto set_brk;
        goto out;
    }

    /* Check against rlimit. */
    rlim = current->rlim[RLIMIT_DATA].rlim_cur;
    if (rlim < RLIM_INFINITY && brk - mm->start_data > rlim)
        goto out;

    /* Check against existing mmap mappings. */
    if (find_vma_intersection(mm, oldbrk, newbrk+PAGE_SIZE))
        goto out;

    /* Check if we have enough memory. */
    if (!vm_enough_memory((newbrk-oldbrk) >> PAGE_SHIFT))
        goto out;

    /* Ok, looks good - let it rip. */
    if (do_brk(oldbrk, newbrk-oldbrk) != oldbrk)
        goto out;
set_brk:
    mm->brk = brk;                                 /* record the new boundary */
out:
    retval = mm->brk;
    up(&mm->mmap_sem);
    return retval;
}

do_munmap() removes the mapping, as follows:

int do_munmap(struct mm_struct *mm, unsigned long addr, size_t len)
{
    struct vm_area_struct *mpnt, *prev, **npp, *free, *extra;

    if ((addr & ~PAGE_MASK) || addr > TASK_SIZE || len > TASK_SIZE-addr)
        return -EINVAL;

    if ((len = PAGE_ALIGN(len)) == 0)
        return -EINVAL;

    /* Check if this memory area is ok - put it on the temporary
     * list if so.  The checks here are pretty simple --
     * every area affected in some way (by any overlap) is put
     * on the list.  If nothing is put on, nothing is affected.
     */
    mpnt = find_vma_prev(mm, addr, &prev);   /* find the first area whose end address is above addr */
    if (!mpnt)
        return 0;
    /* we have  addr < mpnt->vm_end  */

    if (mpnt->vm_start >= addr+len)          /* if its start is also above addr+len, the range falls into a hole */
        return 0;

    /* If we'll make a "hole", check the vm areas limit */
    if ((mpnt->vm_start < addr && mpnt->vm_end > addr+len)
        && mm->map_count >= MAX_MAP_COUNT)
        return -ENOMEM;

    /*
     * We may need one additional vma to fix up the mappings ...
     * and this is the last chance for an easy error exit.
     */
    extra = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
    if (!extra)
        return -ENOMEM;

    npp = (prev ? &prev->vm_next : &mm->mmap);
    free = NULL;
    spin_lock(&mm->page_table_lock);
    for ( ; mpnt && mpnt->vm_start < addr+len; mpnt = *npp) {
        *npp = mpnt->vm_next;
        mpnt->vm_next = free;
        free = mpnt;
        if (mm->mmap_avl)
            avl_remove(mpnt, &mm->mmap_avl);
    }
    mm->mmap_cache = NULL;   /* Kill the cache. */
    spin_unlock(&mm->page_table_lock);

    /* Ok - we have the memory areas we should free on the 'free' list,
     * so release them, and unmap the page range.
     * If one of the segments is only being partially unmapped,
     * it will put new vm_area_struct(s) into the address space.
     * In that case we have to be careful with VM_DENYWRITE.
     */
    while ((mpnt = free) != NULL) {
        unsigned long st, end, size;
        struct file *file = NULL;

        free = free->vm_next;

        st = addr < mpnt->vm_start ? mpnt->vm_start : addr;
        end = addr+len;
        end = end > mpnt->vm_end ? mpnt->vm_end : end;
        size = end - st;

        if (mpnt->vm_flags & VM_DENYWRITE &&
            (st != mpnt->vm_start || end != mpnt->vm_end) &&
            (file = mpnt->vm_file) != NULL) {
            atomic_dec(&file->f_dentry->d_inode->i_writecount);
        }
        remove_shared_vm_struct(mpnt);
        mm->map_count--;

        flush_cache_range(mm, st, end);
        zap_page_range(mm, st, size);      /* release the mappings of the consecutive pages */
        flush_tlb_range(mm, st, end);

        /*
         * Fix the mapping, and free the old area if it wasn't reused.
         */
        extra = unmap_fixup(mm, mpnt, st, size, extra);

        if (file)
            atomic_inc(&file->f_dentry->d_inode->i_writecount);
    }

    /* Release the extra vma struct if it wasn't used */
    if (extra)
        kmem_cache_free(vm_area_cachep, extra);

    free_pgtables(mm, prev, addr, addr+len);   /* free page tables that have become empty */

    return 0;
}


Here we mainly analyze zap_page_range(), which releases the mappings of a run of consecutive pages; the rest of the code, unmap_fixup() in particular, adjusts vma->vm_end so that it points to the new boundary instead of the old one.
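For the brk() case the unmapped range sits at the top of the heap area, so the fixup only has to pull vm_end back; in general unmap_fixup() must also handle removal at the start of an area and a hole in the middle, which splits the area in two and consumes the pre-allocated "extra" vma. A simplified classification of the cases (names are illustrative, not the kernel's):

    /* Simplified model of the fixup cases after unmapping [st, end) from a vma. */
    enum fixup { WHOLE_GONE, TAIL_CUT, HEAD_CUT, SPLIT };

    static enum fixup classify(unsigned long vm_start, unsigned long vm_end,
                               unsigned long st, unsigned long end)
    {
        if (st <= vm_start && end >= vm_end)
            return WHOLE_GONE;   /* the whole area disappears              */
        if (end >= vm_end)
            return TAIL_CUT;     /* brk shrink: vm_end is pulled back to st */
        if (st <= vm_start)
            return HEAD_CUT;     /* vm_start moves up to end                */
        return SPLIT;            /* hole in the middle: needs the extra vma */
    }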

The zap_page_range() code is as follows:

void zap_page_range(struct mm_struct *mm, unsigned long address, unsigned long size)
{
    pgd_t * dir;
    unsigned long end = address + size;
    int freed = 0;

    dir = pgd_offset(mm, address);

    /*
     * This is a long-lived spinlock. That's fine.
     * There's no contention, because the page table
     * lock only protects against kswapd anyway, and
     * even if kswapd happened to be looking at this
     * process we _want_ it to get stuck.
     */
    if (address >= end)
        BUG();
    spin_lock(&mm->page_table_lock);
    do {
        freed += zap_pmd_range(mm, dir, address, end - address);
        address = (address + PGDIR_SIZE) & PGDIR_MASK;
        dir++;
    } while (address && (address < end));
    spin_unlock(&mm->page_table_lock);
    /*
     * Update rss for the mm_struct (not necessarily current->mm)
     * Notice that rss is an unsigned long.
     */
    if (mm->rss > freed)
        mm->rss -= freed;
    else
        mm->rss = 0;
}


zap_pmd_range(), as follows:
static inline int zap_pmd_range(struct mm_struct *mm, pgd_t * dir, unsigned long address, unsigned long size)
{
    pmd_t * pmd;
    unsigned long end;
    int freed;

    if (pgd_none(*dir))
        return 0;
    if (pgd_bad(*dir)) {
        pgd_ERROR(*dir);
        pgd_clear(dir);
        return 0;
    }
    pmd = pmd_offset(dir, address);
    address &= ~PGDIR_MASK;
    end = address + size;
    if (end > PGDIR_SIZE)
        end = PGDIR_SIZE;
    freed = 0;
    do {
        freed += zap_pte_range(mm, pmd, address, end - address);
        address = (address + PMD_SIZE) & PMD_MASK;
        pmd++;
    } while (address < end);
    return freed;
}


zap_pte_range(), the code is as follows:
static inline int zap_pte_range(struct mm_struct *mm, pmd_t * pmd, unsigned long address, unsigned long size)
{
    pte_t * pte;
    int freed;

    if (pmd_none(*pmd))
        return 0;
    if (pmd_bad(*pmd)) {
        pmd_ERROR(*pmd);
        pmd_clear(pmd);
        return 0;
    }
    pte = pte_offset(pmd, address);
    address &= ~PMD_MASK;
    if (address + size > PMD_SIZE)
        size = PMD_SIZE - address;
    size >>= PAGE_SHIFT;
    freed = 0;
    for (;;) {
        pte_t page;
        if (!size)
            break;
        page = ptep_get_and_clear(pte);   /* clear the pte and return its old contents */
        pte++;
        size--;
        if (pte_none(page))               /* the entry was empty: nothing to free */
            continue;
        freed += free_pte(page);          /* release the memory page and/or the swap slot */
    }
    return freed;
}
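Taken together, zap_page_range(), zap_pmd_range() and zap_pte_range() are a standard three-level page-table walk: a pgd entry selects a pmd table, a pmd entry selects a pte table, and each pte maps one page. The index arithmetic can be illustrated with a toy decomposition (the shift values below are hypothetical, not tied to a particular architecture):

    /* Illustrative decomposition of a virtual address into table indices. */
    #define PTE_SHIFT 12                  /* 4 KB pages                 */
    #define PMD_SHIFT 21                  /* each pmd entry covers 2 MB */
    #define PGD_SHIFT 30                  /* each pgd entry covers 1 GB */

    static void decompose(unsigned long addr,
                          unsigned long *pgd_i, unsigned long *pmd_i,
                          unsigned long *pte_i, unsigned long *offset)
    {
        *pgd_i  = addr >> PGD_SHIFT;
        *pmd_i  = (addr >> PMD_SHIFT) & 0x1ff;          /* 512 pmd entries  */
        *pte_i  = (addr >> PTE_SHIFT) & 0x1ff;          /* 512 pte entries  */
        *offset = addr & ((1UL << PTE_SHIFT) - 1);      /* byte within page */
    }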

free_pte(), the code is as follows:

static inline int free_pte(pte_t pte)
{
    /* The pte is known to be non-empty.  If the page is not resident, it must
     * hold a swap entry and swap_free() is enough; if it is resident, it may
     * be an ordinary page or one that also lives in the swap cache. */
    if (pte_present(pte)) {
        struct page *page = pte_page(pte);   /* get the page structure */

        if ((!VALID_PAGE(page)) || PageReserved(page))
            return 0;
        /*
         * free_page() used to be able to clear swap cache
         * entries.  We may now have to do it manually.
         */
        if (pte_dirty(pte) && page->mapping)
            set_page_dirty(page);            /* set PG_dirty and move the page onto the
                                                dirty_pages list of its address_space */
        free_page_and_swap_cache(page);      /* drop the use of the swap page or ordinary page */
        return 1;
    }
    swap_free(pte_to_swp_entry(pte));        /* page not resident: just release the swap slot */
    return 0;
}


free_page_and_swap_cache() drops the use of a swap-cache page or an ordinary page; the code is as follows:
void free_page_and_swap_cache(struct page *page)
{
    /*
     * If we are the only user, then try to free up the swap cache.
     */
    if (PageSwapCache(page) && !TryLockPage(page)) {   /* the page is in the swap cache */
        if (!is_page_shared(page)) {                   /* the current process is the last user
                                                          (use count assumed to be 2 here) */
            delete_from_swap_cache_nolock(page);       /* take the page off its three queues */
        }
        UnlockPage(page);
    }
    page_cache_release(page);   /* an ordinary page has a use count of 1 by now, so this call alone
                                   releases it; a swap-cache page only reaches a count of 1 after
                                   delete_from_swap_cache_nolock(), and this call then releases it */
}


A page data structure can sit on three queues at the same time. First, through its queue head "list" it is linked into one of the clean_pages, dirty_pages, or locked_pages queues of the corresponding address_space structure (for a swap-cache page that address_space is the swap cache). Second, through its queue head "lru" it is linked into an LRU queue, namely active_list, inactive_dirty_list, or inactive_clean_list. Finally, through the pointer next_hash it is linked into a hash queue.
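For reference, an abbreviated sketch of the 2.4-era struct page, trimmed from memory to the linkage fields mentioned above (field order and omissions are approximate):

    /* Abbreviated sketch of the 2.4 struct page; only the relevant members shown. */
    typedef struct page {
        struct list_head list;          /* clean/dirty/locked_pages list of the
                                           owning address_space                  */
        struct address_space *mapping;
        unsigned long index;            /* file offset, or swap entry for a
                                           swap-cache page                       */
        struct page *next_hash;         /* page-cache hash chain                 */
        atomic_t count;                 /* usage count                           */
        unsigned long flags;
        struct list_head lru;           /* active / inactive_dirty /
                                           inactive_clean list                   */
    } mem_map_t;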

delete_from_swap_cache_nolock() takes the page off the queues above; the code is as follows:

void delete_from_swap_cache_nolock(struct page *page)
{
    if (!PageLocked(page))
        BUG();

    if (block_flushpage(page, 0))
        lru_cache_del(page);            /* unlink the page from its LRU list */

    spin_lock(&pagecache_lock);
    ClearPageDirty(page);
    __delete_from_swap_cache(page);     /* unlink from the list and next_hash chains */
    spin_unlock(&pagecache_lock);
    page_cache_release(page);           /* drop the use count by one, leaving it at 1 */
}

Control then returns to free_page_and_swap_cache(), which calls page_cache_release(). For an ordinary page the use count is already 1 at this point, so this single call releases the page; for a swap-cache page the count drops to 1 inside delete_from_swap_cache_nolock(), and this call then releases the page.
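In this kernel, page_cache_release() is simply another name for __free_page(): it drops the page's use count and hands the page frame back to the buddy allocator only when the count reaches zero. Roughly (a sketch, not the exact source):

    /* Rough sketch of what page_cache_release() amounts to here. */
    #define page_cache_release(page) __free_page(page)

    /* conceptually: */
    void __free_page_sketch(struct page *page)
    {
        if (put_page_testzero(page))       /* atomic decrement, test for zero */
            __free_pages_ok(page, 0);      /* return the frame to the buddy allocator */
    }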
