Linux Memory Management - Basic Concepts

Document directory
  • 2.1 Page directory (PGD and PMD)
  • 2.2 Page table entry
  • 2.3 How to access physical memory through a 3-level page table
  • 2.4 Sample code for obtaining a physical page based on a virtual address
1. Linux physical memory three-level architecture

For memory management, Linux adopts an architecture-independent design model to achieve good scalability. It is organized in three levels: memory node, memory zone, and physical page.

• Memory Node

A memory node describes a bank of physical memory in the computer system. A bus master device accesses any memory unit within the same node at the same cost, while accesses to memory units on two different nodes may cost differently. A uniform memory architecture (UMA) system has exactly one node; a NUMA system has several. In the Linux kernel, a memory node is represented by the data structure pg_data_t. The commonly used ARM architecture, for example, is UMA.

• Memory Zone

Memory zones subdivide a memory node: for various reasons, different ranges of physical memory within a node cannot all be used in the same way. For example, on ia32-based personal computers, ISA devices can, for historical reasons, perform DMA only within the lowest 16 MB of physical memory (ZONE_DMA). Another example is that on ia32 the Linux kernel can directly map only roughly the first 896 MB of physical memory (ZONE_NORMAL); memory above that limit belongs to ZONE_HIGHMEM and must be mapped on demand.

 

• Physical Page

The physical page (page frame) is the basic unit in which the kernel manages physical memory; each page frame is described by a struct page descriptor. A minimal sketch that enumerates the nodes and zones introduced above follows.
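To make the three levels concrete, here is a minimal sketch assuming a kernel-module context. The module name and print format are illustrative, but for_each_zone(), populated_zone(), zone_to_nid() and the zone fields used are standard <linux/mmzone.h> interfaces; it walks every zone of every online node and prints its name and span:

#include <linux/module.h>
#include <linux/mmzone.h>

/* Sketch: enumerate every memory zone of every online node and report
 * its name and how many page frames it spans. */
static int __init zone_dump_init(void)
{
	struct zone *zone;

	for_each_zone(zone) {
		if (!populated_zone(zone))	/* skip zones with no page frames */
			continue;
		printk(KERN_INFO "node %d zone %-8s: %lu pages spanned\n",
		       zone_to_nid(zone), zone->name, zone->spanned_pages);
	}
	return 0;
}

static void __exit zone_dump_exit(void)
{
}

module_init(zone_dump_init);
module_exit(zone_dump_exit);
MODULE_LICENSE("GPL");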

2. Linux virtual memory three-level page table

Linux manages virtual memory through a three-level page table made up of the following levels:

• PGD: Page Global Directory

• PMD: Page Middle Directory

• PTE: Page Table Entry

Each level is described by three key macros:

• Shift

• Size

• Mask

For example, the corresponding macros for the page level are defined as follows:

/* PAGE_SHIFT determines the page size -- asm/page.h */
#define PAGE_SHIFT	12
#define PAGE_SIZE	(_AC(1,UL) << PAGE_SHIFT)
#define PAGE_MASK	(~(PAGE_SIZE-1))
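As an illustration (the address value is made up, and the macros are redefined locally so the fragment stands alone outside the kernel), this is how the three macros split an address into its page base, in-page offset, and page frame number:

/* Standalone illustration of the page macros; the address is hypothetical. */
#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PAGE_MASK	(~(PAGE_SIZE - 1))

unsigned long addr   = 0xC0345ABC;		/* example virtual address         */
unsigned long base   = addr & PAGE_MASK;	/* 0xC0345000: start of the page   */
unsigned long offset = addr & ~PAGE_MASK;	/* 0x00000ABC: offset within page  */
unsigned long pfn    = addr >> PAGE_SHIFT;	/* 0x000C0345: page frame number   */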

The data structure is defined as follows:

/* asm/page.h */
typedef unsigned long pteval_t;
typedef pteval_t pte_t;
typedef unsigned long pmd_t;
typedef unsigned long pgd_t[2];
typedef unsigned long pgprot_t;

#define pte_val(x)      (x)
#define pmd_val(x)      (x)
#define pgd_val(x)      ((x)[0])
#define pgprot_val(x)   (x)

#define __pte(x)        (x)
#define __pmd(x)        (x)
#define __pgprot(x)     (x)

 

2.1 Page directory (PGD and PMD)

Each process has its own PGD (Page Global Directory), which occupies one physical page and contains an array of pgd_t entries; see the definition in <asm/page.h>. A process reaches its PGD through task_struct->mm->pgd (the pgd field of its mm_struct).

For the ARM architecture, the PGD and PMD are defined in <arch/arm/include/asm/pgtable.h> as follows:

#define PTRS_PER_PTE	512	/* number of u32 pointers in a PTE table (21 - 12 = 9 bits) */
#define PTRS_PER_PMD	1
#define PTRS_PER_PGD	2048	/* number of u32 pointers in the PGD (32 - 21 = 11 bits) */

#define PTE_HWTABLE_PTRS	(PTRS_PER_PTE)
#define PTE_HWTABLE_OFF		(PTE_HWTABLE_PTRS * sizeof(pte_t))
#define PTE_HWTABLE_SIZE	(PTRS_PER_PTE * sizeof(u32))

/*
 * PMD_SHIFT determines the size of the area a second-level page table can map
 * PGDIR_SHIFT determines what a third-level page table entry can map
 */
#define PMD_SHIFT	21
#define PGDIR_SHIFT	21

Virtual address shift macros (the chart from the original article is not reproduced here):
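Based on the shift macros above, a 32-bit virtual address splits as sketched below; the diagram and the example values are illustrative, not kernel code:

/*
 *  31            21 20           12 11            0
 * +----------------+---------------+---------------+
 * |   PGD index    |   PTE index   |  page offset  |
 * |   (11 bits)    |   (9 bits)    |   (12 bits)   |
 * +----------------+---------------+---------------+
 */
unsigned long addr    = 0xC0345ABC;					/* example address */
unsigned long idx_pgd = addr >> PGDIR_SHIFT;				/* 0x601           */
unsigned long idx_pte = (addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);	/* 0x145           */
unsigned long off     = addr & ~PAGE_MASK;				/* 0xABC           */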

Virtual address mask and size macros (the chart from the original article is not reproduced here):
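The mask and size macros follow the usual kernel pattern shown below; this is a sketch, so verify the exact definitions in <arch/arm/include/asm/pgtable.h> for your kernel version:

#define PMD_SIZE	(1UL << PMD_SHIFT)
#define PMD_MASK	(~(PMD_SIZE - 1))
#define PGDIR_SIZE	(1UL << PGDIR_SHIFT)
#define PGDIR_MASK	(~(PGDIR_SIZE - 1))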

 

 

2.2 Page table entry

PTEs, PMDs, and PGDs are described by pte_t, pmd_t, and pgd_t respectively. For storage protection, pgprot_t is defined; it carries the protection flags, which are usually stored in the lower bits of a page table entry. The exact layout depends on the CPU architecture.

Each pte_t points to the physical address of a page frame, and all such addresses are page aligned. Therefore the low PAGE_SHIFT (12) bits of the 32-bit value are unused and can hold the PTE's status and protection bits.

The protection and status bits of a PTE (the figure from the original article is not reproduced here):
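As a rough guide, the Linux-maintained PTE bits on ARM look like the list below. The exact bit positions differ between kernel versions and architectures; the values here are taken from ARM definitions of roughly the same era as the other code in this article and should be verified against your tree:

/* Linux-visible PTE bits on ARM (illustrative; check your kernel version) */
#define L_PTE_PRESENT	(_AT(pteval_t, 1) << 0)		/* mapping is valid           */
#define L_PTE_YOUNG	(_AT(pteval_t, 1) << 1)		/* page recently accessed     */
#define L_PTE_FILE	(_AT(pteval_t, 1) << 2)		/* nonlinear file mapping     */
#define L_PTE_DIRTY	(_AT(pteval_t, 1) << 6)		/* page has been written to   */
#define L_PTE_RDONLY	(_AT(pteval_t, 1) << 7)		/* page is read-only          */
#define L_PTE_USER	(_AT(pteval_t, 1) << 8)		/* accessible from user space */
#define L_PTE_XN	(_AT(pteval_t, 1) << 9)		/* execute never              */
#define L_PTE_SHARED	(_AT(pteval_t, 1) << 10)	/* shared (SMP)               */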

2.3 How to access physical memory through a 3-level page table

Accessing physical memory through the PGD, PMD, and PTE relies on a set of macros defined in <asm/pgtable.h>.

• pgd_offset

The macro that obtains the PGD entry for a given virtual address from a process's mm_struct is defined as follows:

/* to find an entry in a page-table-directory */
#define pgd_index(addr)		((addr) >> PGDIR_SHIFT)		/* obtain the index in the PGD table */
#define pgd_offset(mm, addr)	((mm)->pgd + pgd_index(addr))	/* obtain the address of the PGD entry (start of the PMD table) */

/* to find an entry in a kernel page-table-directory */
#define pgd_offset_k(addr)	pgd_offset(&init_mm, addr)

• pmd_offset

Given the PGD entry obtained from pgd_offset and the virtual address, this returns the corresponding PMD entry (that is, the start address of the PTE table):

/* Find an entry in the second-level page table.. */
#define pmd_offset(dir, addr)	((pmd_t *)(dir))	/* the PMD is folded: the value is the PGD entry itself */

• pte_offset

Given the PMD entry obtained from pmd_offset and the virtual address, this returns the corresponding PTE entry, which records the address of the physical page:

#ifndef CONFIG_HIGHPTE
#define __pte_map(pmd)		pmd_page_vaddr(*(pmd))
#define __pte_unmap(pte)	do { } while (0)
#else
#define __pte_map(pmd)		(pte_t *)kmap_atomic(pmd_page(*(pmd)))
#define __pte_unmap(pte)	kunmap_atomic(pte)
#endif

#define pte_index(addr)			(((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
#define pte_offset_kernel(pmd,addr)	(pmd_page_vaddr(*(pmd)) + pte_index(addr))
#define pte_offset_map(pmd,addr)	(__pte_map(pmd) + pte_index(addr))
#define pte_unmap(pte)			__pte_unmap(pte)

#define pte_pfn(pte)		(pte_val(pte) >> PAGE_SHIFT)
#define pfn_pte(pfn,prot)	__pte(__pfn_to_phys(pfn) | pgprot_val(prot))
#define pte_page(pte)		pfn_to_page(pte_pfn(pte))
#define mk_pte(page,prot)	pfn_pte(page_to_pfn(page), prot)

#define set_pte_ext(ptep,pte,ext)	cpu_set_pte_ext(ptep,pte,ext)
#define pte_clear(mm,addr,ptep)		set_pte_ext(ptep, __pte(0), 0)

The complete lookup path was illustrated by a figure in the original article (not reproduced here); a minimal sketch that chains these macros together is given below, and the full kernel implementation follows in Section 2.4.
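The fragment below is a minimal sketch (error handling, huge pages, and locking omitted; the function name lookup_page is made up) of how the macros above chain together to find the struct page backing a user virtual address. Note that in mainline kernels pmd_offset takes a pud_t * obtained from pud_offset, whereas here the folded PGD/PMD view used throughout this article is assumed:

/* Sketch: resolve a user virtual address to its struct page (no locking,
 * no huge-page handling; assumes the folded PGD/PMD layout shown above). */
static struct page *lookup_page(struct mm_struct *mm, unsigned long addr)
{
	pgd_t *pgd;
	pmd_t *pmd;
	pte_t *ptep, pte;
	struct page *page = NULL;

	pgd = pgd_offset(mm, addr);		/* entry in the page global directory */
	if (pgd_none(*pgd) || pgd_bad(*pgd))
		return NULL;

	pmd = pmd_offset(pgd, addr);		/* folded on ARM: same entry as the PGD */
	if (pmd_none(*pmd) || pmd_bad(*pmd))
		return NULL;

	ptep = pte_offset_map(pmd, addr);	/* map the PTE table and index it */
	pte = *ptep;
	if (pte_present(pte))
		page = pte_page(pte);		/* struct page of the mapped frame */
	pte_unmap(ptep);

	return page;
}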

 
2.4 Sample code for obtaining a physical page based on a virtual address

The sample code for obtaining the physical page that backs a virtual address is the function follow_page() in mm/memory.c:

 

/**
 * follow_page - look up a page descriptor from a user-virtual address
 * @vma: vm_area_struct mapping @address
 * @address: virtual address to look up
 * @flags: flags modifying lookup behaviour
 *
 * @flags can have FOLL_ flags set, defined in <linux/mm.h>
 *
 * Returns the mapped (struct page *), %NULL if no mapping exists, or
 * an error pointer if there is a mapping to something not represented
 * by a page descriptor (see also vm_normal_page()).
 */
struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
			unsigned int flags)
{
	pgd_t *pgd;
	pud_t *pud;
	pmd_t *pmd;
	pte_t *ptep, pte;
	spinlock_t *ptl;
	struct page *page;
	struct mm_struct *mm = vma->vm_mm;

	page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
	if (!IS_ERR(page)) {
		BUG_ON(flags & FOLL_GET);
		goto out;
	}

	page = NULL;
	pgd = pgd_offset(mm, address);
	if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
		goto no_page_table;

	pud = pud_offset(pgd, address);
	if (pud_none(*pud))
		goto no_page_table;
	if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) {
		BUG_ON(flags & FOLL_GET);
		page = follow_huge_pud(mm, address, pud, flags & FOLL_WRITE);
		goto out;
	}
	if (unlikely(pud_bad(*pud)))
		goto no_page_table;

	pmd = pmd_offset(pud, address);
	if (pmd_none(*pmd))
		goto no_page_table;
	if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB) {
		BUG_ON(flags & FOLL_GET);
		page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
		goto out;
	}
	if (pmd_trans_huge(*pmd)) {
		if (flags & FOLL_SPLIT) {
			split_huge_page_pmd(mm, pmd);
			goto split_fallthrough;
		}
		spin_lock(&mm->page_table_lock);
		if (likely(pmd_trans_huge(*pmd))) {
			if (unlikely(pmd_trans_splitting(*pmd))) {
				spin_unlock(&mm->page_table_lock);
				wait_split_huge_page(vma->anon_vma, pmd);
			} else {
				page = follow_trans_huge_pmd(mm, address,
							     pmd, flags);
				spin_unlock(&mm->page_table_lock);
				goto out;
			}
		} else
			spin_unlock(&mm->page_table_lock);
		/* fall through */
	}
split_fallthrough:
	if (unlikely(pmd_bad(*pmd)))
		goto no_page_table;

	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);

	pte = *ptep;
	if (!pte_present(pte))
		goto no_page;
	if ((flags & FOLL_WRITE) && !pte_write(pte))
		goto unlock;

	page = vm_normal_page(vma, address, pte);
	if (unlikely(!page)) {
		if ((flags & FOLL_DUMP) ||
		    !is_zero_pfn(pte_pfn(pte)))
			goto bad_page;
		page = pte_page(pte);
	}

	if (flags & FOLL_GET)
		get_page(page);
	if (flags & FOLL_TOUCH) {
		if ((flags & FOLL_WRITE) &&
		    !pte_dirty(pte) && !PageDirty(page))
			set_page_dirty(page);
		/*
		 * pte_mkyoung() would be more correct here, but atomic care
		 * is needed to avoid losing the dirty bit: it is easier to use
		 * mark_page_accessed().
		 */
		mark_page_accessed(page);
	}
	if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
		/*
		 * The preliminary mapping check is mainly to avoid the
		 * pointless overhead of lock_page on the ZERO_PAGE
		 * which might bounce very badly if there is contention.
		 *
		 * If the page is already locked, we don't need to
		 * handle it now - vmscan will handle it later if and
		 * when it attempts to reclaim the page.
		 */
		if (page->mapping && trylock_page(page)) {
			lru_add_drain();  /* push cached pages to LRU */
			/*
			 * Because we lock page here and migration is
			 * blocked by the pte's page reference, we need
			 * only check for file-cache page truncation.
			 */
			if (page->mapping)
				mlock_vma_page(page);
			unlock_page(page);
		}
	}
unlock:
	pte_unmap_unlock(ptep, ptl);
out:
	return page;

bad_page:
	pte_unmap_unlock(ptep, ptl);
	return ERR_PTR(-EFAULT);

no_page:
	pte_unmap_unlock(ptep, ptl);
	if (!pte_none(pte))
		return page;

no_page_table:
	/*
	 * When core dumping an enormous anonymous area that nobody
	 * has touched so far, we don't want to allocate unnecessary pages or
	 * page tables.  Return error instead of NULL to skip handle_mm_fault,
	 * then get_dump_page() will return NULL to leave a hole in the dump.
	 * But we can only make this optimization where a hole would surely
	 * be zero-filled if handle_mm_fault() actually did handle it.
	 */
	if ((flags & FOLL_DUMP) &&
	    (!vma->vm_ops || !vma->vm_ops->fault))
		return ERR_PTR(-EFAULT);
	return page;
}
