Linux Memory Source Analysis: Initialization of the Page Table

This is an original article. If you reprint it, please cite the source: http://www.cnblogs.com/tolimit/p/4585803.html

This article assumes a 32-bit x86 system and does not analyze the page table structure of 64-bit systems.

Linux Paging

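Linux uses a four-level paging model, in which a linear address is divided into five fields for addressing: an index into the page global directory, an index into the page upper directory, an index into the page middle directory, an index into the page table, and an offset within the page.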

Although the model has four levels, not every level is always used. Depending on the hardware, Linux uses a two-level, three-level, or four-level page table:
  • 64-bit systems: three-level or four-level paging, depending on the hardware.
  • 32-bit systems without PAE (Physical Address Extension): two-level paging only. The page upper directory and page middle directory fields are 0 bits wide.
  • 32-bit systems with PAE enabled: three-level paging. Only the page upper directory is eliminated, that is, its field is 0 bits wide.
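As a minimal sketch of the two-level case (32-bit x86 without PAE, 4K pages), a 32-bit linear address splits into a 10-bit page global directory index, a 10-bit page table index, and a 12-bit offset. The struct and function names below are made up for illustration and are not kernel identifiers.

```c
#include <stdint.h>

/* Hypothetical helper type for this illustration; not a kernel identifier. */
struct linear_parts {
    uint32_t pgd_index;  /* bits 31..22: index into the page global directory */
    uint32_t pte_index;  /* bits 21..12: index into the page table */
    uint32_t offset;     /* bits 11..0:  offset inside the 4K page frame */
};

/* Split a 32-bit linear address for two-level paging (no PAE, 4K pages). */
static struct linear_parts split_linear(uint32_t addr)
{
    struct linear_parts p;

    p.pgd_index = addr >> 22;
    p.pte_index = (addr >> 12) & 0x3ffu;
    p.offset    = addr & 0xfffu;
    return p;
}
```

For the address 0xc0101234, for instance, this gives directory index 0x300, page table index 0x101, and offset 0x234.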

CR3 is the register dedicated to holding the base address of the page global directory. The kernel's master kernel page global directory is stored in the global variable swapper_pg_dir, and the system loads its physical address into CR3 when it needs to use the master kernel page tables. The address of each process's own page global directory is stored in the pgd field of its memory descriptor (mm_struct, reached through the process descriptor). On a process switch the page tables must be switched too, which is done by loading the pgd of the new process into CR3. Each directory and each page table occupies exactly one page frame. For example, if a process has one page global directory, 1024 page middle directories, and 1024 page tables, the system allocates 1 page frame for the page global directory, 1024 page frames for the page middle directories, and 1024 page frames for the page tables. In practice, a process rarely needs that many page middle directories and page tables.

Table Entry

The page global directory, page upper directory, page middle directory, and page table are each stored in one page frame. A page frame is normally 4K in size (2MB and 1GB are special cases), so page frames are aligned on 4K boundaries, and a 20-bit address is enough to identify one in a 32-bit address space. These directories and page tables hold entries: the page global directory holds page global directory entries, the page middle directory holds page middle directory entries, and so on. On a 32-bit system each entry is 32 bits wide (20 bits of page frame base address and 12 flag bits); with PAE enabled, entries become 64 bits wide. The entries carry many flags, and the important ones are:
  • Present flag: 1 means the page (or page table) is in memory, 0 means it is not.
  • Page frame base address: 20 bits.
  • Accessed flag: set each time the paging unit addresses the corresponding page frame.
  • Dirty flag: applies to page table entries only; set each time the page frame is written.
  • Read/Write flag: the read or read/write access right.
  • User/Supervisor flag: the privilege level required to access the page (whether user-mode processes may access it).
  • Page Size flag: applies to page directory entries only; 1 means the entry refers to a 2MB or 4MB page frame instead of a page table.

Of these, the most important field is the page frame base address. A page middle directory entry holds the base address of the corresponding page table, and a page table entry holds the base address of the page frame that stores the data. The Present flag determines whether an access raises a page fault exception: if it is 0, the paging unit raises the fault. Because the flags plus the base address take 32 bits in total, one 4K page frame holds 1024 entries.
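The entry layout can be sketched with a few bit masks. The bit positions are those of a 32-bit x86 paging entry without PAE; the macro and function names are chosen for this illustration only and are not the kernel's own (the kernel uses _PAGE_* constants).

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits of a 32-bit x86 paging entry (no PAE). Names are illustrative. */
#define ENTRY_PRESENT    (1u << 0)
#define ENTRY_RW         (1u << 1)
#define ENTRY_USER       (1u << 2)
#define ENTRY_ACCESSED   (1u << 5)
#define ENTRY_DIRTY      (1u << 6)   /* meaningful in page table entries */
#define ENTRY_PAGE_SIZE  (1u << 7)   /* meaningful in page directory entries */
#define ENTRY_FRAME_MASK 0xfffff000u /* upper 20 bits: page frame base address */

/* Number of 32-bit entries that fit in one 4K page frame. */
#define ENTRIES_PER_FRAME (4096u / sizeof(uint32_t))

/* Build an entry from a 4K-aligned physical address and a set of flags. */
static uint32_t make_entry(uint32_t frame_phys, uint32_t flags)
{
    return (frame_phys & ENTRY_FRAME_MASK) | (flags & ~ENTRY_FRAME_MASK);
}

/* Extract the page frame base address stored in an entry. */
static uint32_t entry_frame(uint32_t entry)
{
    return entry & ENTRY_FRAME_MASK;
}

/* An access through an entry whose Present flag is 0 raises a page fault. */
static bool entry_present(uint32_t entry)
{
    return (entry & ENTRY_PRESENT) != 0;
}
```

Because the frame address is 4K-aligned, its low 12 bits are always zero, which is what frees them up to hold the flags.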

Physical Address Extension (PAE)

This technique applies to 32-bit x86 systems. A 32-bit linear address can cover at most 4GB, while PAE widens the physical address bus to 36 lines so that the CPU can address 64GB of physical memory. The linear address is still 32 bits, however, so one set of mappings can never cover all 64GB at once. PAE introduces a new top-level table, the Page Directory Pointer Table (PDPT), whose base address is stored in CR3. The PDPT has four 64-bit entries, and each entry points to a page directory that covers 1GB of the linear address space. Physical memory above 4GB becomes reachable because the 64-bit entries can hold page frame addresses beyond 4GB; to reach a different region of physical memory, the kernel changes the mappings (or loads a different PDPT into CR3).

When PAE is enabled, addressing on a 32-bit system changes considerably:
  • Two-level paging becomes three-level paging.
  • Entries grow from 32 bits to 64 bits. The original layout of 12 flag bits plus a 20-bit page frame base address becomes 12 flag bits plus a 24-bit page frame base address; 24 bits are needed because 64GB contains 2^24 page frames of 4K each.
  • The page frame size can be 4K or 2MB, selected through the Page Size flag of the entry.
  • The linear address is split differently. With 4K pages, 2 bits index the PDPT, 9 bits index the page middle directory, 9 bits index the page table, and 12 bits are the offset. With 2MB pages, 2 bits index the PDPT, 9 bits index the page middle directory, and 21 bits are the offset.
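The PAE split for 4K pages, and the reason a 24-bit frame number is needed, can be checked with a short sketch. The type and function names are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Hypothetical helper type for this illustration; not a kernel identifier. */
struct pae_parts {
    uint32_t pdpt_index; /* bits 31..30: index into the 4-entry PDPT */
    uint32_t pmd_index;  /* bits 29..21: index into the page middle directory */
    uint32_t pte_index;  /* bits 20..12: index into the page table */
    uint32_t offset;     /* bits 11..0:  offset inside the 4K page frame */
};

/* Split a 32-bit linear address for PAE paging with 4K pages. */
static struct pae_parts split_linear_pae(uint32_t addr)
{
    struct pae_parts p;

    p.pdpt_index = addr >> 30;
    p.pmd_index  = (addr >> 21) & 0x1ffu;
    p.pte_index  = (addr >> 12) & 0x1ffu;
    p.offset     = addr & 0xfffu;
    return p;
}

/* Number of 4K page frames in a physical address space of phys_bits bits. */
static uint64_t frame_count(unsigned int phys_bits)
{
    return (uint64_t)1 << (phys_bits - 12);
}
```

With 36 physical address bits, frame_count(36) is 2^24, which matches the 24-bit page frame base address described above. Each table level indexes 512 entries because a 4K page frame holds 512 entries of 64 bits.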

Kernel zone memory layout after kernel boot

Typically the kernel is loaded at the 1MB mark of physical memory, and a normally configured kernel image is smaller than 3MB, so the image occupies roughly the 1MB~4MB range. The kernel does not use the first 1MB because this memory is typically used by the BIOS and for hardware mappings.

The symbol to note here is _end, which marks the end of the kernel image in memory. Page table initialization first maps the area not used by the kernel and maps the area used by the kernel last.

High memory layout

The previous article, "Linux memory management source analysis: the page frame allocator", briefly described the high memory area. In the kernel's virtual address space the high memory area is divided into three regions: the noncontiguous memory area, the permanent kernel mapping area, and the fixed mapping area. The noncontiguous memory area serves vmalloc() allocations, whose page frames need not be physically contiguous. The permanent kernel mapping area is used to set up long-lived mappings of high memory page frames into the kernel address space. The fixed mapping area provides linear addresses that are fixed at compile time and can be pointed at arbitrary physical addresses.

Here high_memory is the starting address of the high memory area (ZONE_HIGHMEM), and the vmalloc area is the noncontiguous memory area.

In the kernel, the permanent kernel mapping area and the fixed mapping area are generally 4MB each, so a single page table is enough to cover the address range of each; the rest of the space is used for the noncontiguous memory area. If physical memory is smaller than 896MB, the kernel creates no high memory zone, and only ZONE_DMA and ZONE_NORMAL exist.

The kernel has only 1GB of linear address space (0xc0000000 ~ 0xffffffff), and the direct mapping of ZONE_DMA and ZONE_NORMAL already consumes 896MB of it, leaving only 128MB for mapping high memory. If physical memory is larger than 1GB, for example 2GB, high memory is 1152MB, far more than 128MB of linear address space can map directly. Linux therefore does not map high memory permanently: a mapping is created when it is needed and released when it is no longer needed, so that the linear addresses can be reused.
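The arithmetic in this paragraph can be written out as a small sketch. The macro and function names are made up for the example, and the 1GB and 896MB figures are the ones quoted above.

```c
#include <stdint.h>

#define KERNEL_LINEAR_SPACE_MB 1024u /* 0xc0000000 ~ 0xffffffff */
#define DIRECT_MAP_LIMIT_MB     896u /* ZONE_DMA + ZONE_NORMAL */

/* Linear address space (in MB) left over for mapping high memory. */
static uint32_t highmem_window_mb(void)
{
    return KERNEL_LINEAR_SPACE_MB - DIRECT_MAP_LIMIT_MB;
}

/* Amount of physical memory (in MB) that ends up in high memory. */
static uint32_t highmem_mb(uint32_t phys_mb)
{
    return phys_mb > DIRECT_MAP_LIMIT_MB ? phys_mb - DIRECT_MAP_LIMIT_MB : 0;
}
```

For 2GB of physical memory, highmem_mb(2048) is 1152 while highmem_window_mb() is only 128, which is why high memory has to be mapped on demand.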

During page table initialization, the permanent kernel mapping area and the fixed mapping area are each initialized, but no page frames are mapped into them; mappings are set up only when they are needed.

Temporary kernel page table

The temporary kernel page global directory is statically initialized when the kernel is compiled, while the temporary page tables are initialized by the startup_32() assembly function. They are used only during system startup and are the first page tables the system uses. They let the system address only the 0~8MB range of physical memory and are later replaced by the fully initialized page tables. Their main job is to let the first 8MB of memory be addressed both in real mode (paging disabled) and in protected mode (paging enabled). To that end, the linear addresses 0x00000000 to 0x007fffff and the linear addresses 0xc0000000 to 0xc07fffff are both mapped to the physical addresses 0x00000000 to 0x007fffff. This is simple to achieve: only entries 0x0, 0x1, 0x300, and 0x301 of the temporary page global directory need to be initialized. Why these entries? With paging disabled, linear address 0x00000000 corresponds to physical address 0x00000000, and the linear range 0x00000000 to 0x007fffff is covered by entries 0x0 and 0x1 of the page global directory, since each entry covers 4MB. Likewise, taking the top 10 bits of the addresses 0xc0000000 to 0xc07fffff yields entries 0x300 and 0x301.
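The layout described above can be simulated in user space. This is only a sketch of the resulting mapping, not the kernel's startup_32() code: the arrays, the function names, and the pretend physical location of the two page tables (PT_PHYS_BASE) are all assumptions made for the example.

```c
#include <stdint.h>

#define ENTRIES      1024u
#define PAGE_SIZE_4K 4096u
#define PRESENT      0x1u
#define FRAME_MASK   0xfffff000u
#define NO_MAPPING   0xffffffffu  /* sentinel: translation would fault */
/* Pretend physical address of the two temporary page tables (arbitrary). */
#define PT_PHYS_BASE 0x00100000u

static uint32_t pgd[ENTRIES];            /* temporary page global directory */
static uint32_t page_tables[2][ENTRIES]; /* two page tables, 4MB each */

/* Fill both page tables to cover physical 0~8MB and hook each one into the
 * directory twice: at entries 0/1 and at entries 0x300/0x301. */
static void setup_temporary_tables(void)
{
    for (uint32_t t = 0; t < 2; t++) {
        for (uint32_t i = 0; i < ENTRIES; i++)
            page_tables[t][i] = ((t * ENTRIES + i) * PAGE_SIZE_4K) | PRESENT;

        uint32_t pde = (PT_PHYS_BASE + t * PAGE_SIZE_4K) | PRESENT;
        pgd[t] = pde;          /* linear 0x00000000 + t * 4MB */
        pgd[0x300u + t] = pde; /* linear 0xc0000000 + t * 4MB */
    }
}

/* Walk the simulated two-level tables the way the paging unit would. */
static uint32_t translate(uint32_t linear)
{
    uint32_t pde = pgd[linear >> 22];
    if (!(pde & PRESENT))
        return NO_MAPPING;

    uint32_t t = ((pde & FRAME_MASK) - PT_PHYS_BASE) / PAGE_SIZE_4K;
    uint32_t pte = page_tables[t][(linear >> 12) & 0x3ffu];
    if (!(pte & PRESENT))
        return NO_MAPPING;

    return (pte & FRAME_MASK) | (linear & 0xfffu);
}
```

After setup_temporary_tables(), translate(0x00001234) and translate(0xc0001234) both return 0x00001234, while any linear address outside the two 8MB windows returns NO_MAPPING.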

Source Code

Before reading the source code, a few global variables need explaining:
  • swapper_pg_dir: the master kernel page global directory; the kernel page global directory address loaded into the CR3 register comes from this variable.
  • max_pfn: the page frame number of the last page frame in physical memory.
  • max_low_pfn: the page frame number of the last page frame in low memory.
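A page frame number is simply a physical address shifted right by the page shift, so a value such as max_low_pfn converts back to an address with a left shift. The sketch below assumes 4K pages and the default 3G/1G split; PAGE_SHIFT and PAGE_OFFSET mirror the kernel macros of the same name, while the function names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT  12          /* 4K pages */
#define PAGE_OFFSET 0xc0000000u /* start of the kernel's linear address space */

/* Physical address of the first byte of page frame number pfn. */
static uint32_t pfn_to_phys(uint32_t pfn)
{
    return pfn << PAGE_SHIFT;
}

/* Page frame number that contains the physical address phys. */
static uint32_t phys_to_pfn(uint32_t phys)
{
    return phys >> PAGE_SHIFT;
}

/* Linear address of a low memory physical address in the direct mapping. */
static uint32_t lowmem_phys_to_linear(uint32_t phys)
{
    return phys + PAGE_OFFSET;
}
```

With 896MB of low memory, the page frame number 0x38000 corresponds to the physical address 0x38000000, and the physical address 0x00100000 (1MB) appears at the linear address 0xc0100000.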

The kernel initializes the page tables in a fixed order: low memory (the first 1MB of physical memory) -> low memory (the part not used by the kernel) -> low memory (the part used by the kernel) -> high memory (fixed mapping area) -> high memory (permanent kernel mapping area).

First, the page tables for the low memory area and the page tables for the fixed mapping area of high memory are both initialized in the init_mem_mapping(void) function, which is called along the path start_kernel() -> setup_arch():

void __init init_mem_mapping(void)
{
    unsigned long end;

    /*
     * Set the page_size_mask global variable, which determines which page
     * frame sizes the system uses (4K, 2M, 1G):
     *   1G page frames exist only on 64-bit systems
     *   4K page frames are the normal page frames
     *   2M page frames can be selected on a 32-bit kernel with PAE enabled
     */
    probe_page_size_mask();

    /* max_pfn and max_low_pfn are derived from information provided by the BIOS */
#ifdef CONFIG_X86_64
    /* 64-bit has no high memory area */
    end = max_pfn << PAGE_SHIFT;
#else
    /* end is the highest page frame of low memory (ZONE_DMA and ZONE_NORMAL) */
    end = max_low_pfn << PAGE_SHIFT;
#endif

    /*
     * The ISA range is always mapped regardless of memory holes.
     * 0 ~ 1MB: the kernel is generally loaded starting at 1MB, so the
     * physical addresses 0 ~ 1MB are initialized first.
     */
    init_memory_mapping(0, ISA_END_ADDRESS);

    if (memblock_bottom_up()) {
        /* End address of the memory used by the kernel during boot; the
         * kernel normally occupies the physical range 1MB ~ 4MB */
        unsigned long kernel_end = __pa_symbol(_end);

        /* First map kernel end ~ end of ZONE_NORMAL. On 64-bit this goes
         * up to the last page frame, because there is no high memory area */
        memory_map_bottom_up(kernel_end, end);
        /* Then map 1MB ~ kernel end */
        memory_map_bottom_up(ISA_END_ADDRESS, kernel_end);
    } else {
        memory_map_top_down(ISA_END_ADDRESS, end);
    }

#ifdef CONFIG_X86_64
    if (max_pfn > max_low_pfn) {
        /* can we preserve max_low_pfn? */
        max_low_pfn = max_pfn;
    }
#else
    /* Initialize the fixed mapping area of high memory. Only the page middle
     * directory entries and page tables are set up; the page table entries
     * are not initialized */
    early_ioremap_page_table_range_init();
#endif
    /* Write the address of the initialized kernel page global directory to CR3 */
    load_cr3(swapper_pg_dir);
    /* Flush the TLB; this is needed every time the page tables are modified */
    __flush_tlb_all();

    /* Check the memory for problems */
    early_memtest(0, max_pfn_mapped << PAGE_SHIFT);
}

The comments in this function make the flow clear. Let us first look at init_memory_mapping().

init_memory_mapping(0, ISA_END_ADDRESS)

/* The kernel maps the physical addresses start ~ end to linear addresses. Only
 * low memory (ZONE_DMA and ZONE_NORMAL) is mapped this way, and linear address
 * 0xc0000000 corresponds to physical address 0x00000000 */
unsigned long __init_refok init_memory_mapping(unsigned long start,
                                               unsigned long end)
{
    /*