An Analysis of the Linux Reverse Mapping Mechanism


2017-05-20

Came back as usual after watching the badminton match, and remembered the reverse-mapping problem a friend and I discussed a few days ago. Better to write up a brief summary now, lest I forget it again later! But looking at the date... this is a little awkward... still writing a technical blog on 520 (May 20th, the Chinese "I love you" day)...

Enough small talk. The problem at hand was: given a physical page frame number, how do you find the virtual addresses it is mapped at? At first I had no idea where to start. While discussing it with a friend in a chat group, I remembered the swap cache in the swap mechanism, and recalled that when the system swaps out a page, it must be able to easily find every process using that page and then undo those mappings. That became my breakthrough. After studying the source code, combined with the related books, we have today's article. The focus is the reverse mapping mechanism.

As the name implies, going from a virtual address through the page tables to a physical address is forward mapping; deriving the virtual address from a physical address is, naturally, reverse mapping. As everyone knows, every physical page under Linux corresponds to a page structure, and a physical page frame number (PFN) can easily be converted to its page structure. It is worth a look at how the kernel does the conversion:

#define __pfn_to_page(pfn)	(mem_map + ((pfn) - ARCH_PFN_OFFSET))
#define __page_to_pfn(page)	((unsigned long)((page) - mem_map) + ARCH_PFN_OFFSET)

This is a bit like the Windows PFN database: mem_map is a page pointer serving as the PFN database (it is really just the beginning of one large array), and ARCH_PFN_OFFSET is the PFN of the physical start address, so the array index is the difference between the two, i.e. the effective PFN. Converting from page back to PFN is the same idea in reverse. So what does this have to do with reverse mapping? That is where the all-important page structure comes in. The structure is very large, so we will only discuss the parts related to reverse mapping.
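As a quick sketch of how the conversion is used (in the flat-memory model the generic pfn_to_page()/page_to_pfn() wrappers expand to exactly these macros; the frame number below is made up for illustration):

/* Round-trip between a frame number and its struct page. */
unsigned long pfn = 0x1234;               /* hypothetical frame number */
struct page *page = pfn_to_page(pfn);     /* &mem_map[pfn - ARCH_PFN_OFFSET] */
unsigned long back = page_to_pfn(page);   /* back == 0x1234 again */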

There are three relevant fields in the page structure:

struct page {
	struct address_space *mapping;
	union {
		pgoff_t index;		/* Our offset within mapping. */
		void *freelist;		/* slub/slob first free object */
		bool pfmemalloc;	/* If set by the page allocator,
					 * ALLOC_NO_WATERMARKS was set
					 * and the low watermark was not
					 * met implying that the system
					 * is under some pressure. The
					 * caller should try ensure
					 * this page is only used to
					 * free other pages.
					 */
	};
	struct {
		union {
			/*
			 * Count of ptes mapped in mms, to show when
			 * page is mapped & limit reverse map searches.
			 *
			 * Used also for tail pages refcounting instead
			 * of _count. Tail pages cannot be mapped and
			 * keeping the tail page _count zero at all
			 * times guarantees get_page_unless_zero() will
			 * never succeed on tail pages.
			 */
			atomic_t _mapcount;

			struct {	/* SLUB */
				unsigned inuse:16;
				unsigned objects:15;
				unsigned frozen:1;
			};
			int units;	/* SLOB */
		};
		atomic_t _count;	/* Usage count, see below. */
	};
};

This is really the point. Three fields: mapping, which points to an anon_vma structure when the page maps anonymous memory, and to the inode's address_space when the page maps a file; index, the linear index of the corresponding virtual page within its VMA; and _mapcount, the number of processes sharing the page. Note that _mapcount defaults to -1 and becomes 0 when one process uses the page, so its value says how many processes are using the page besides the current one, which makes it easy to decide when a mapping can be torn down.
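A natural question is how the kernel tells which of the two things mapping points to. In 3.10 it tags the pointer's low bits: PAGE_MAPPING_ANON (bit 0) marks an anon_vma, with PAGE_MAPPING_KSM adding bit 1 for KSM pages, and helpers such as PageAnon() and page_rmapping() test or mask those bits. A minimal sketch of the same idea (the helper names here are mine, not the kernel's):

#define PAGE_MAPPING_ANON	1
#define PAGE_MAPPING_KSM	2

/* Anonymous page? Just test bit 0 of the mapping pointer. */
static inline int page_is_anon(struct page *page)
{
	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
}

/* For an anonymous page, mask the tag bits off to recover the anon_vma. */
static inline struct anon_vma *page_anon_vma_sketch(struct page *page)
{
	unsigned long mapping = (unsigned long)page->mapping;

	if (!(mapping & PAGE_MAPPING_ANON))
		return NULL;	/* file-backed: mapping is an address_space */
	return (struct anon_vma *)(mapping &
				   ~(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM));
}

With these three fields understood, the rest is much easier. Let's explain through the function page_referenced: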

int page_referenced(struct page *page, int is_locked,
		    struct mem_cgroup *memcg, unsigned long *vm_flags)

The original comment reads: "quick test_and_clear_referenced for all mappings to a page, returns the number of ptes which referenced the page." That is, it quickly tests and clears the referenced (accessed) bit for every mapping of the page, across the page tables of the different processes, and returns the number of PTEs that referenced the page. It simply walks every process that maps the page:

int page_referenced(struct page *page,
		    int is_locked,
		    struct mem_cgroup *memcg,
		    unsigned long *vm_flags)
{
	int referenced = 0;
	int we_locked = 0;

	*vm_flags = 0;
	if (page_mapped(page) && page_rmapping(page)) {
		if (!is_locked && (!PageAnon(page) || PageKsm(page))) {
			we_locked = trylock_page(page);
			if (!we_locked) {
				referenced++;
				goto out;
			}
		}
		if (unlikely(PageKsm(page)))
			referenced += page_referenced_ksm(page, memcg,
							  vm_flags);
		else if (PageAnon(page))
			referenced += page_referenced_anon(page, memcg,
							   vm_flags);
		else if (page->mapping)
			referenced += page_referenced_file(page, memcg,
							   vm_flags);
		if (we_locked)
			unlock_page(page);

		if (page_test_and_clear_young(page_to_pfn(page)))
			referenced++;
	}
out:
	return referenced;
}

First it checks that both the forward and the reverse mapping exist. If the page is not already locked, and the page is a KSM page or a file-mapped page, it needs to trylock it; if the lock attempt fails, it bails out directly. Next the different cases are dispatched: a KSM page goes to page_referenced_ksm, an anonymous page goes to page_referenced_anon, and a file-mapped page goes to page_referenced_file. KSM is the kernel's same-page merging mechanism, used mainly with KVM, although it can be used elsewhere too; since it must compare pages for equality, most setups turn KSM off when the duplication rate is low. KSM has been covered in another article.

If it is an anonymous page, we enter page_referenced_anon:

static int page_referenced_anon(struct page *page,
				struct mem_cgroup *memcg,
				unsigned long *vm_flags)
{
	unsigned int mapcount;
	struct anon_vma *anon_vma;
	pgoff_t pgoff;
	struct anon_vma_chain *avc;
	int referenced = 0;

	anon_vma = page_lock_anon_vma_read(page);
	if (!anon_vma)
		return referenced;

	mapcount = page_mapcount(page);
	pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff, pgoff) {
		struct vm_area_struct *vma = avc->vma;
		unsigned long address = vma_address(page, vma);
		/*
		 * If we are reclaiming on behalf of a cgroup, skip
		 * counting on behalf of references from different
		 * cgroups
		 */
		if (memcg && !mm_match_cgroup(vma->vm_mm, memcg))
			continue;
		referenced += page_referenced_one(page, vma, address,
						  &mapcount, vm_flags);
		if (!mapcount)
			break;
	}

	page_unlock_anon_vma_read(anon_vma);
	return referenced;
}

To undo a mapping, the key is to locate the specific PTE, and a PTE can only be found by walking the page tables from a virtual address, so the imperative first step is to find the virtual address and the page tables. Here we first get the anon_vma for the page; as mentioned earlier, for an anonymous mapping page->mapping points to the anon_vma structure. Then we fetch the page's share count into mapcount, compute the linear index (pgoff) of the corresponding virtual page within the VMA, and begin traversing the interval tree. Each anon_vma_chain is associated with the VMA of one process, and vma_address(page, vma) yields the virtual address of the page within that process's VMA. Ignoring the cgroup-related check for the moment, page_referenced_one is then called to do the per-mapping work. As said before, once we have the virtual address and the VMA, the VMA gives us the mm_struct, and from that the page table base; the whole chain is complete. That function is not listed here; it distinguishes two cases: for a huge page (a 2 MB page) it must fetch the PMD, while for a normal page it fetches the PTE; it then checks the _PAGE_ACCESSED bit. If the bit is set, the reference counter is incremented and the bit is cleared; otherwise nothing changes. A frequently accessed page therefore keeps a high reference count, is more readily classified as active, stays on the active LRU list, and is not easily swapped out.
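As a sketch of that page-table descent (the kernel's actual rmap lookup is page_check_address() in mm/rmap.c, which additionally takes the PTE lock and handles huge pages; this minimal version only shows the four-level walk for a normal 4 KB page):

static pte_t *walk_to_pte(struct mm_struct *mm, unsigned long address)
{
	pgd_t *pgd = pgd_offset(mm, address);	/* page global directory entry */
	pud_t *pud;
	pmd_t *pmd;

	if (pgd_none(*pgd))
		return NULL;
	pud = pud_offset(pgd, address);		/* page upper directory entry */
	if (pud_none(*pud))
		return NULL;
	pmd = pmd_offset(pud, address);		/* page middle directory entry */
	if (pmd_none(*pmd))
		return NULL;
	/* the caller must pte_unmap() the returned pte when done */
	return pte_offset_map(pmd, address);
}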

Looking back at the initial problem, the virtual address is found from the physical address: after obtaining the VMA and the index, one function solves the problem:

static inline unsigned long
__vma_address(struct page *page, struct vm_area_struct *vma)
{
	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);

	if (unlikely(is_vm_hugetlb_page(vma)))
		pgoff = page->index << huge_page_order(page_hstate(page));

	return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
}

Beyond the quick worked example below, the code here needs no further explanation. As for how the anon_vma structures are organized, that analysis is left for a later article, for reasons of space.
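A worked example with made-up numbers makes the arithmetic concrete. Suppose a VMA with vm_start = 0x400000 and vm_pgoff = 0x10 maps the page, and the page's index is 0x13 (with 4 KB pages, PAGE_CACHE_SHIFT equals PAGE_SHIFT, so pgoff == index):

address = vm_start + ((pgoff - vm_pgoff) << PAGE_SHIFT)
        = 0x400000 + ((0x13 - 0x10) << 12)
        = 0x400000 + 0x3000
        = 0x403000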

Thank the Lord!

Reference:

Linux 3.10.1 source code

"Deep Linux kernel Architecture"
