Linux virtual memory management on X86

Source: Internet
Author: User
   Preface
Linux supports many hardware platforms, such as Intel X86, Alpha, and SPARC. Functions that cannot be implemented generically must be implemented according to the features of each hardware platform. This article briefly discusses how Linux implements virtual memory management in X86 protected mode. For simplicity, the article makes the following assumptions: the processor is an 80486 or later; it runs in protected mode without physical address extension (physical addresses are 32 bits wide) and without extended page sizes (the page size is 4 KB). Anything outside these assumptions is skipped, as is any part of Linux virtual memory management that does not depend on the hardware platform. The kernel source code cited in this article is Linux 2.2.5.
  
   Segmentation and paging mechanisms of X86
   I. X86 segmentation mechanism and the corresponding system structures
The X86 segmentation mechanism divides the linear address space into many smaller spaces called segments, which hold code and data; segment protection is used to protect that code and data. According to the role and contents of each segment, X86 distinguishes three kinds of process segments (code segment, data segment, and stack segment) and two kinds of system segments: the task-state segment (TSS) and the LDT segment. Because the GDT is not accessed through a segment descriptor and a segment selector, X86 does not consider there to be a GDT segment; likewise, there is no IDT segment.
In the segmentation mechanism, X86 uses the following data structures:
• Global Descriptor Table (GDT): stores the segment descriptors used by the system and those shared by all tasks. It can hold segment descriptors of any of the classes above; the maximum table length is 64 KB;
• Local Descriptor Table (LDT): stores the segment descriptors private to one task. It can hold only descriptors of the three process segment types and call gate descriptors. The usable table length is 64 KB, since a segment selector carries only a 13-bit index (8192 descriptors of 8 bytes each);
• Segment descriptor: 64 bits; describes the base address of a segment (a linear address), its limit, its type, and the restrictions on operations on the segment;
• Gate descriptor: 64 bits; a special descriptor that protects system calls, procedure calls, and accesses across privilege levels. There are four kinds: call gate, interrupt gate, trap gate, and task gate descriptors;
• Segment selector: 16 bits; used to index the corresponding segment descriptor in the GDT or LDT;
• Interrupt Descriptor Table (IDT): stores gate descriptors, which can only be interrupt gate, trap gate, and task gate descriptors. The maximum table length is 64 KB;
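As a rough illustration of the 64-bit segment descriptor listed above, the following Python sketch splits a descriptor into its base, limit, and access fields. The bit positions follow the descriptor layout documented in [1]; the function name and the sample value (a flat 4 GB code segment at privilege level 0) are assumptions chosen for illustration and are not taken from this article's appendices.

```python
def decode_segment_descriptor(desc):
    """Split a 64-bit X86 segment descriptor into its main fields."""
    # The limit is stored in bits 0-15 and bits 48-51.
    limit = (desc & 0xFFFF) | (((desc >> 48) & 0xF) << 16)
    # The base is stored in bits 16-39 and bits 56-63.
    base = ((desc >> 16) & 0xFFFFFF) | (((desc >> 56) & 0xFF) << 24)
    access = (desc >> 40) & 0xFF          # access byte, bits 40-47
    granular = bool((desc >> 55) & 1)     # G flag: limit counted in 4 KB units
    size = ((limit + 1) << 12) if granular else (limit + 1)
    return {
        "base": base,
        "limit": limit,
        "present": bool(access & 0x80),   # P flag
        "dpl": (access >> 5) & 0x3,       # descriptor privilege level
        "code_or_data": bool(access & 0x10),  # S flag: 0 means system segment
        "type": access & 0xF,
        "granularity_4k": granular,
        "size_bytes": size,
    }


# Hypothetical sample: base 0, limit 0xFFFFF with G=1, i.e. a flat 4 GB segment.
sample = decode_segment_descriptor(0x00CF9A000000FFFF)
print(hex(sample["base"]), hex(sample["limit"]), sample["dpl"])
```

The sketch only decodes fields; it does not model the protection checks the processor performs when a descriptor is loaded.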
X86 also provides the following registers to support the segmentation mechanism:
• Global Descriptor Table Register (GDTR): 48 bits; 32 bits hold the base address (linear address) of the GDT and 16 bits hold its table limit. The initial value of GDTR is base address 0 and table limit 0xFFFF;
• Local Descriptor Table Register (LDTR): 80 bits; 16 bits hold the segment selector of the LDT segment and 64 bits hold its segment descriptor;
• Interrupt Descriptor Table Register (IDTR): 48 bits; 32 bits hold the base address (linear address) of the IDT and 16 bits hold its table limit. The initial value of IDTR is base address 0 and table limit 0xFFFF;
• Task Register (TR): 80 bits; 16 bits hold the segment selector of the task-state segment and 64 bits hold its segment descriptor;
• Six segment registers: each has a visible part and a hidden part. The visible part holds the segment selector and the hidden part holds the segment descriptor. The six segment registers are CS, SS, DS, ES, FS, and GS; for their functions, see section 3.4.2, 'Segment Registers', in [1];
In protected mode, an X86 process uses a 48-bit logical address. The high 16 bits are the segment selector and the low 32 bits are the offset within the segment. The selector indexes the corresponding segment descriptor in the GDT or LDT, which yields the base address of the segment; adding the offset to this base gives the linear address that corresponds to the logical address. If paging is not used, the linear address maps directly onto the physical address and can be used to access memory as it is; otherwise, the X86 paging mechanism must translate the linear address into a physical address.
The above briefly describes the X86 segmentation mechanism. For details of the data structures, the registers, and the translation from logical to linear addresses, refer to [1].
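The translation from a logical address to a linear address can be sketched as follows. This is a simplified model under stated assumptions: the descriptor tables are represented as plain Python lists of segment base addresses (hypothetical stand-ins for the real GDT and LDT), and the limit and privilege checks that the processor performs are left out. The selector layout (13-bit index, one table indicator bit, two requested privilege level bits) is the one documented in [1].

```python
def split_selector(selector):
    """Split a 16-bit segment selector into index, table, and RPL."""
    return {
        "index": selector >> 3,                      # bits 15-3
        "table": "LDT" if selector & 0x4 else "GDT",  # bit 2 (TI)
        "rpl": selector & 0x3,                       # bits 1-0
    }


def logical_to_linear(selector, offset, gdt_bases, ldt_bases):
    """Return the 32-bit linear address for selector:offset.

    gdt_bases and ldt_bases are hypothetical lists that hold only the
    base address of each segment, indexed by descriptor number.
    """
    fields = split_selector(selector)
    table = ldt_bases if fields["table"] == "LDT" else gdt_bases
    base = table[fields["index"]]
    return (base + offset) & 0xFFFFFFFF


# With a segment base of 0, the linear address equals the offset.
flat_gdt = [0, 0, 0, 0]
print(hex(logical_to_linear(0x18, 0x1234, flat_gdt, [])))
```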
  
   II. X86 paging mechanism and the corresponding system structures
The 32-bit linear address space can be mapped directly onto the physical address space, or mapped indirectly onto many smaller pieces of physical memory (or disk storage). This indirect mapping is paging. X86 supports page sizes of 4 KB, 2 MB, and 4 MB (the 2 MB and 4 MB sizes are available only on Pentium and Pentium Pro processors; this article considers only 4 KB pages).
In the paging mechanism, X86 uses the following data structures:
• Page directory entry (PDE): a 32-bit structure. The high 20 bits give the base address (physical address) of a page table in units of 4 KB, and the low 12 bits hold the page table attributes; for details of the translation, refer to the initialization section below;
• Page directory: one page that stores page directory entries; it holds 1024 entries in total;
• Page table entry (PTE): a 32-bit structure. The high 20 bits give the base address (physical address) of a page, and the low 12 bits hold the page attributes;
• Page table: one page that stores page table entries; it holds 1024 entries in total;
• Page: 4 KB of contiguous address space;
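The split of a page directory entry or page table entry into a 20-bit base and 12 attribute bits can be illustrated with a short Python sketch. The flag positions used here (present, writable, user, accessed, dirty) are the ones documented in [1]; the function and dictionary names are assumptions made for this example, and the sample entry value is hypothetical.

```python
# Attribute bits in the low 12 bits of an entry (subset shown).
FLAG_BITS = {
    "present": 0,   # P
    "writable": 1,  # R/W
    "user": 2,      # U/S
    "accessed": 5,  # A
    "dirty": 6,     # D (meaningful for page table entries)
}


def decode_entry(entry):
    """Return (base physical address, attribute flags) of a PDE or PTE."""
    base = entry & 0xFFFFF000  # high 20 bits, 4 KB aligned
    flags = {name: bool((entry >> bit) & 1) for name, bit in FLAG_BITS.items()}
    return base, flags


base, flags = decode_entry(0x00102007)  # hypothetical entry value
print(hex(base), flags)
```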
To implement paging and speed up address translation, X86 provides the following hardware structures:
• Paging flag (PG): when this flag is 1, paging is enabled. It is bit 31 of control register CR0;
• Translation lookaside buffers (TLBs): cache recently used PDEs and PTEs to speed up address translation;
• Page directory base register (PDBR): stores the base address (physical address) of the page directory; it is control register CR3;
To map a linear address to a physical address, X86 interprets the 32-bit linear address as three parts. Bits 31 to 22 are the offset into the page directory and index a page directory entry, which gives the base address of the corresponding page table. Bits 21 to 12 are the offset into the page table and index a page table entry, which gives the base address of the corresponding page. Bits 11 to 0 are the offset within the page. The two levels of indexing plus the page offset yield the physical address that corresponds to the linear address.
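The two-level lookup described above can be sketched in Python. This is a minimal model, not processor or kernel code: the page directory and page tables are represented as dictionaries (an assumption made for readability), only the present bit is checked, and the TLB is ignored. All names and sample values are hypothetical.

```python
PRESENT = 0x1


def split_linear(linear):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def translate(linear, page_directory, page_tables):
    """Walk a two-level page table and return the physical address.

    page_directory maps a directory index to a PDE value.
    page_tables maps a page table's physical base to {table index: PTE value}.
    """
    dir_index, table_index, offset = split_linear(linear)
    pde = page_directory.get(dir_index, 0)
    if not pde & PRESENT:
        raise LookupError("page table not present")
    page_table = page_tables[pde & 0xFFFFF000]
    pte = page_table.get(table_index, 0)
    if not pte & PRESENT:
        raise LookupError("page not present")
    return (pte & 0xFFFFF000) | offset


# Hypothetical mapping: linear 0xC0101xxx -> physical 0x00101xxx.
page_directory = {768: 0x00102007}
page_tables = {0x00102000: {257: 0x00101007}}
print(hex(translate(0xC0101234, page_directory, page_tables)))
```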
For detailed descriptions and functions of the paging mechanism, refer to [1].
  
   Linux segmentation policy
Linux uses the X86 segmentation mechanism only minimally. This avoids the complexity of segmentation and improves the portability of Linux to hardware platforms that do not support it, while still using the X86 segmentation mechanism to isolate user code from kernel code. As a result, in Linux the offset part of a logical address and the corresponding linear address have the same value.
Because the maximum GDT length on X86 is 64 KB and each segment descriptor occupies 8 bytes, the GDT can hold at most 8192 segment descriptors. Each time a process is created, Linux creates two descriptors for it in the GDT: an LDT segment descriptor and a TSS descriptor. Since Linux reserves the first 12 GDT entries, the GDT can accommodate up to 4090 processes. The Linux kernel has its own code segment and data segment, whose descriptors are stored in entries 2 and 3 of the GDT. Each process also has its own code segment and data segment, whose descriptors are stored in the process's own LDT. For the layout of the Linux GDT and LDT entries, see Appendix 1 and Appendix 2.
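The arithmetic behind these limits is easy to check. The Python sketch below recomputes the figures from the numbers given in the paragraph above; the constant and function names are assumptions made for this example. It also shows how a GDT entry number turns into a selector value, which is why entries 2 and 3 correspond to the selectors 0x10 and 0x18.

```python
GDT_MAX_BYTES = 64 * 1024   # maximum GDT length
DESCRIPTOR_BYTES = 8        # size of one descriptor
RESERVED_ENTRIES = 12       # entries reserved by Linux
ENTRIES_PER_PROCESS = 2     # one LDT descriptor and one TSS descriptor


def gdt_capacity():
    """Return (total descriptors, maximum number of processes)."""
    total = GDT_MAX_BYTES // DESCRIPTOR_BYTES
    return total, (total - RESERVED_ENTRIES) // ENTRIES_PER_PROCESS


def gdt_selector(index, rpl=0):
    """Build a selector for a GDT entry: index in bits 15-3, TI = 0, RPL in bits 1-0."""
    return (index << 3) | rpl


print(gdt_capacity())
print(hex(gdt_selector(2)), hex(gdt_selector(3)))
```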
In Linux, each user process can access a 4 GB linear address space. The 3 GB from 0x0 to 0xBFFFFFFF is user space, which user-mode processes can access directly. The 1 GB from 0xC0000000 to 0xFFFFFFFF is kernel space, which holds the code and data accessed by the kernel; user-mode processes cannot access it directly. When a user process enters kernel space through an interrupt or a system call, an X86 privilege level switch occurs (from privilege level 3 to privilege level 0), that is, a switch from user mode to kernel mode.
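The 3 GB / 1 GB split can be expressed as a small Python sketch. It is a deliberately simplified model: the real enforcement is done by the processor's protection hardware, not by a comparison like this one, and the names below are assumptions made for the example.

```python
KERNEL_BASE = 0xC0000000  # start of the 1 GB kernel space


def address_region(linear):
    """Classify a 32-bit linear address as 'user' or 'kernel'."""
    if not 0 <= linear <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit address")
    return "kernel" if linear >= KERNEL_BASE else "user"


def may_access(linear, privilege_level):
    """Privilege level 0 may touch everything; level 3 only user space."""
    return privilege_level == 0 or address_region(linear) == "user"


print(address_region(0xBFFFFFFF), address_region(0xC0000000))
print(may_access(0xC0000000, 3), may_access(0xC0000000, 0))
```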
  
   Linux paging policy
Standard Linux paging uses a three-level page table structure: besides the page directory and page table supported by X86, there is one more level, called the page middle directory. When a linear address is translated to a physical address, it is therefore interpreted as four parts (not the three parts recognized by X86), the extra part being the index into the page middle directory. On X86, Linux defines the maximum number of entries in the page middle directory as 1 and provides a set of macros (which substitute the page directory for the page middle directory), so that the three-level page translation collapses cleanly into the two-level translation of X86. The main page-handling code in the kernel, which assumes that a linear address has four parts, therefore needs no change. For these macro definitions, see "/include/asm/pgtable.h" and "/include/asm/page.h" in the Linux source code.
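One way to picture the folding of the middle level is a generic address splitter in which the middle directory gets zero index bits on X86. The Python sketch below is only an illustration of the idea, not a transcription of the kernel macros; the function name and the bit widths passed in for the second, genuinely three-level layout are hypothetical.

```python
def make_splitter(pgd_bits, pmd_bits, pte_bits, page_shift=12):
    """Build a function that splits an address into four parts:
    (page directory index, middle directory index, page table index, offset)."""
    def split(addr):
        offset = addr & ((1 << page_shift) - 1)
        pte = (addr >> page_shift) & ((1 << pte_bits) - 1)
        pmd = (addr >> (page_shift + pte_bits)) & ((1 << pmd_bits) - 1)
        pgd = (addr >> (page_shift + pte_bits + pmd_bits)) & ((1 << pgd_bits) - 1)
        return pgd, pmd, pte, offset
    return split


# X86: the middle directory has a single entry, so its index takes 0 bits
# and is always 0; the remaining parts match the hardware's two-level split.
split_x86 = make_splitter(pgd_bits=10, pmd_bits=0, pte_bits=10)
print(split_x86(0xC0101234))

# Hypothetical platform with a real middle level, for comparison.
split_three_level = make_splitter(pgd_bits=10, pmd_bits=10, pte_bits=10, page_shift=13)
```

Generic code can always ask for four parts; on X86 the middle index simply contributes nothing to the walk.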
The part of the kernel virtual space from 3 GB to 3 GB + 4 MB (covered by the page table that entry 768 of a process page directory points to) is mapped to physical addresses 0x0 to 0x3FFFFF (4 MB). So when a process runs in kernel mode, accessing 3 GB to 3 GB + 4 MB gives access to the first 4 MB of physical memory. All processes share the same linear space from 3 GB to 4 GB, which is mapped through identical page directory entries and the same page tables onto the same physical memory. In this way, Linux lets processes running in kernel mode share kernel code and data.
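For the initial 4 MB mapping described above, the relation between a kernel virtual address and its physical address is a constant offset. A minimal Python sketch, assuming the 3 GB base and 4 MB size stated in the text (the names are chosen for this example):

```python
PAGE_OFFSET = 0xC0000000           # 3 GB: start of kernel virtual space
INITIAL_MAPPING = 4 * 1024 * 1024  # size of the initial mapping (4 MB)


def kernel_virt_to_phys(vaddr):
    """Translate an address inside the initial 4 MB kernel mapping."""
    if not PAGE_OFFSET <= vaddr < PAGE_OFFSET + INITIAL_MAPPING:
        raise ValueError("address outside the initial 4 MB kernel mapping")
    return vaddr - PAGE_OFFSET


def page_directory_index(vaddr):
    """Bits 31-22 of a linear address select the page directory entry."""
    return (vaddr >> 22) & 0x3FF


print(page_directory_index(PAGE_OFFSET))        # entry used for 3 GB
print(hex(kernel_virt_to_phys(0xC0100000)))     # kernel load address
```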
  
   Linux segmentation and paging initialization
Regardless of how Linux is booted, whether from the zImage boot sector (see arch/i386/boot/bootsect.S) or from LILO, the system then jumps to and executes arch/i386/boot/setup.S (loaded at SETUPSEG, physical address 0x90200). setup.S obtains the hardware parameters of the computer (such as hard disk parameters) from the BIOS, stores them temporarily in a parameter area in memory, and performs some preliminary checks in preparation for entering protected mode. For details of the boot process and of setup.S, refer to [2].
The protected-mode kernel initialization code starts at physical address 0x100000. The code and data structures starting at this address correspond to arch/i386/kernel/head.S; see Appendix 3. The main job of this code is to initialize the relevant registers and tables, such as the IDT, the GDT, the page directory, and the page tables. The following ignores the details of how head.S executes and briefly describes its main initialization steps.
1. Partial register initialization: the segment registers DS, ES, FS, and GS are initialized with __KERNEL_DS (0x18, see include/asm-i386/segment.h); as the descriptions of the segment registers and the segment selector show, this value selects entry 3 of the GDT (the kernel data segment) and sets the requested privilege level for the segment to 0. The PG bit of CR0 is set, and the AM, WP, NE, and MP bits are set selectively depending on the CPU model. CR3 is initialized with 0x101000 (the address of the page directory swapper_pg_dir). The stack pointer SS:ESP is loaded so that SS is __KERNEL_DS (0x18) and ESP is init_user_stack + 8192; LDTR
