Linux virtual memory management on x86

Author: Zhou Mengxing, 06:02:00, from: http://www.china-pub.com

Preface
Linux supports many hardware platforms, such as Intel x86, Alpha, and SPARC. Functions that cannot be implemented generically must be implemented according to the features of each hardware platform. The purpose of this article is to briefly discuss how Linux implements virtual memory management in x86 protected mode. For simplicity, this article assumes the following: the x86 processor is an 80486 or later; it runs in protected mode without physical address extension (that is, with 32-bit physical addresses); and it does not use extended pages (the page size is 4 KB). Content outside these limits is skipped, as is the platform-independent part of Linux virtual memory management. The Linux kernel source cited in this article is version 2.2.5.

Segmentation and paging mechanisms of x86

I. x86 segmentation mechanism and corresponding System Structure
The x86 segmentation mechanism divides the linear address space into many smaller spaces called segments, which hold code and data; segment protection is used to protect that data or code. According to the role of each segment and what it stores, x86 distinguishes three types of process segments (code, data, and stack segments) and two types of system segments: the task-state segment (TSS) and the LDT segment. (Because the GDT is not accessed through a segment descriptor and a segment selector, x86 does not regard it as a segment; likewise, there is no IDT segment.)
In the segmentation mechanism, x86 uses the following data structures:
· Global Descriptor Table (GDT): stores the segment descriptors used by the system and those shared by all tasks. It can hold descriptors of any of the types above; its maximum size is 64 KB;
· Local Descriptor Table (LDT): stores the descriptors of the segments private to one task. It holds descriptors of the three kinds of process segments and call-gate descriptors; its maximum size is 64 KB (8192 descriptors, the most a 13-bit selector index can address);
· Segment descriptor: 64 bits; describes the base address of a segment (a linear address), its limit, its type, and the restrictions on operations on it;
· Gate descriptor: 64 bits; a special descriptor that provides protection for system calls, procedure calls, or accesses across privilege levels. There are four kinds: call-gate, interrupt-gate, trap-gate, and task-gate descriptors;
· Segment selector: 16 bits; used to index the corresponding segment descriptor in the GDT or LDT;
· Interrupt Descriptor Table (IDT): stores gate descriptors, which may only be interrupt-gate, trap-gate, or task-gate descriptors. It holds at most 256 descriptors (2 KB), although the 16-bit limit in IDTR could describe a table of up to 64 KB;
At the same time, x86 provides the following registers to support segmentation:
· Global Descriptor Table Register (GDTR): 48 bits; 32 bits hold the GDT base address (a linear address) and 16 bits hold the GDT limit. Its initial value is base 0, limit 0xFFFF;
· Local Descriptor Table Register (LDTR): 80 bits; 16 bits hold the selector of the LDT segment and 64 bits hold the segment descriptor of the LDT segment;
· Interrupt Descriptor Table Register (IDTR): 48 bits; 32 bits hold the IDT base address (a linear address) and 16 bits hold the IDT limit. Its initial value is base 0, limit 0xFFFF;
· Task Register (TR): 80 bits; 16 bits hold the selector of the task-state segment and 64 bits hold the TSS segment descriptor;
· Six segment registers: each has a visible part, which holds a segment selector, and a hidden part, which holds a cached copy of the corresponding segment descriptor. The six segment registers are CS, SS, DS, ES, FS, and GS. For their functions, see Section 3.4.2, "Segment Registers", in [1];
In protected mode, an x86 process uses 48-bit logical addresses. The high 16 bits of a logical address are the segment selector and the low 32 bits are the offset within the segment. The selector indexes the corresponding segment descriptor in the GDT or LDT, which gives the segment base address; adding the offset to this base yields the linear address corresponding to the logical address. If paging is not used, the linear address maps directly to the physical address and can be used to access memory as is; otherwise x86 paging must convert the linear address to a physical address.
The above briefly describes x86 segmentation. For details of the data structures, the registers, and the conversion of logical addresses to linear addresses, refer to [1].
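As an illustration of the logical-to-linear step described above, here is a minimal C sketch. It assumes the selector and descriptor bit layouts given in [1] and handles expand-up segments only; the helper names (sel_index, desc_base, logical_to_linear, and so on) are hypothetical, and the descriptor value mentioned in the usage note, 0x00cf92000000ffff (a flat 4 GB data segment), is an assumed example rather than a value taken from this article.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers following the selector/descriptor layouts in [1]. */

/* Selector fields: bits 15..3 index, bit 2 TI (0 = GDT, 1 = LDT), bits 1..0 RPL. */
static unsigned sel_index(uint16_t sel) { return sel >> 3; }
static unsigned sel_ti(uint16_t sel)    { return (sel >> 2) & 1u; }
static unsigned sel_rpl(uint16_t sel)   { return sel & 3u; }

/* Base address: descriptor bits 16..39 hold base[23:0], bits 56..63 hold base[31:24]. */
static uint32_t desc_base(uint64_t d)
{
    return (uint32_t)(((d >> 16) & 0xFFFFFFu) | (((d >> 56) & 0xFFu) << 24));
}

/* Limit: bits 0..15 and 48..51; if G (bit 55) is set the limit counts 4 KB units. */
static uint32_t desc_limit(uint64_t d)
{
    uint32_t limit = (uint32_t)((d & 0xFFFFu) | (((d >> 48) & 0xFu) << 16));
    if ((d >> 55) & 1u)
        limit = (limit << 12) | 0xFFFu;
    return limit;
}

/* Translate selector:offset to a linear address (expand-up segments only).
 * Returns 0 on success, -1 if the offset exceeds the limit (the CPU would raise #GP). */
static int logical_to_linear(const uint64_t *gdt, unsigned gdt_len,
                             const uint64_t *ldt, unsigned ldt_len,
                             uint16_t sel, uint32_t offset, uint32_t *linear)
{
    const uint64_t *table = sel_ti(sel) ? ldt : gdt;
    unsigned len = sel_ti(sel) ? ldt_len : gdt_len;
    assert(sel_index(sel) < len);   /* caller must pass a selector inside the table */
    uint64_t d = table[sel_index(sel)];
    if (offset > desc_limit(d))
        return -1;
    *linear = desc_base(d) + offset;
    return 0;
}
```

For example, with a GDT whose entry 3 holds the flat descriptor above, selector 0x18 (index 3, TI 0, RPL 0) and offset 0xC0001234 give the linear address 0xC0001234, because the segment base is 0. Under such a flat layout the logical offset and the linear address coincide.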

II. x86 paging mechanism and corresponding system structures
The 32-bit linear address space can be mapped directly onto the physical address space, or indirectly onto many small physical spaces (or onto disk storage space). This indirect mapping is paging. x86 supports page sizes of 4 KB, 2 MB, and 4 MB (2 MB and 4 MB pages are available only on the Pentium, Pentium Pro, and later processors; this article is limited to 4 KB).
In the paging mechanism, x86 uses the following data structures:
· Page directory entry (PDE): a 32-bit structure. The high 20 bits give the base address (physical address) of a page table, in units of 4 KB, and the low 12 bits give the page table's attributes. For an example of this conversion, see the initialization section below;
· Page directory: one page holding page directory entries, 1024 in total;
· Page table entry (PTE): a 32-bit structure. The high 20 bits give the base address (physical address) of a page, and the low 12 bits give the page's attributes;
· Page table: one page holding page table entries, 1024 in total;
· Page: a 4 KB contiguous address space;
To implement paging and improve address-translation efficiency, x86 provides the following hardware structures:
· Paging flag (PG): when this flag is 1, paging is enabled. It is bit 31 of control register CR0;
· Translation lookaside buffers (TLBs): cache recently used PDEs and PTEs to speed up address translation;
· Page directory base register (PDBR): holds the base address (physical address) of the page directory. It is control register CR3;
To map a linear address to a physical address, x86 splits the 32-bit linear address into three parts: bits 31 to 22 are the index into the page directory, which selects a page directory entry (giving the base address of a page table); bits 21 to 12 are the index into that page table, which selects a page table entry (giving the base address of a page); and bits 11 to 0 are the offset within the page. With this two-level lookup plus the page offset, the physical address corresponding to the linear address is obtained.
For detailed descriptions and functions of the paging mechanism, refer to [1].
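The two-level lookup can be sketched in C as a software page walk over a small simulated physical memory. This is an illustration under stated assumptions: the memory size, the helper names (phys_read32, phys_write32, translate), and checking only the present bit (bit 0) are our simplifications; the real processor also checks the R/W and U/S bits and consults the TLBs first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   4096u
#define NPAGES      16u      /* tiny simulated physical memory (assumption) */
#define PTE_PRESENT 0x001u   /* bit 0 of a PDE/PTE */

static uint8_t phys_mem[NPAGES * PAGE_SIZE];

static uint32_t phys_read32(uint32_t paddr)
{
    uint32_t v;
    assert(paddr + 4 <= sizeof phys_mem);
    memcpy(&v, &phys_mem[paddr], sizeof v);
    return v;
}

static void phys_write32(uint32_t paddr, uint32_t v)
{
    assert(paddr + 4 <= sizeof phys_mem);
    memcpy(&phys_mem[paddr], &v, sizeof v);
}

/* CR3 -> page directory -> page table -> page. Returns 0 and stores the
 * physical address, or -1 where the CPU would raise a page fault. */
static int translate(uint32_t cr3, uint32_t linear, uint32_t *phys)
{
    uint32_t dir    = linear >> 22;            /* bits 31..22 */
    uint32_t table  = (linear >> 12) & 0x3FFu; /* bits 21..12 */
    uint32_t offset = linear & 0xFFFu;         /* bits 11..0  */

    uint32_t pde = phys_read32((cr3 & ~0xFFFu) + dir * 4);
    if (!(pde & PTE_PRESENT))
        return -1;
    uint32_t pte = phys_read32((pde & ~0xFFFu) + table * 4);
    if (!(pte & PTE_PRESENT))
        return -1;
    *phys = (pte & ~0xFFFu) | offset;          /* high 20 bits of PTE + page offset */
    return 0;
}
```

For instance, if directory entry 768 points to a page table whose entry 3 maps physical page 5, translate() turns 0xC0003ABC into 0x5ABC, while an address whose PDE or PTE is not present fails.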

Linux segmentation Policy

Linux uses only a minimal part of the x86 segmentation mechanism, both to avoid its complexity and to improve portability to hardware platforms that do not support segmentation; at the same time, it makes full use of x86 segment protection to isolate user code from kernel code. Consequently, in Linux a logical address and the corresponding linear address have the same value.
Because the maximum GDT size on x86 is 64 KB and each segment descriptor is 8 bytes, the GDT can hold at most 8192 segment descriptors. Each time a process is created, Linux creates two descriptors for it in the GDT: an LDT segment descriptor and a TSS descriptor. Apart from the first 12 entries, which Linux reserves, the GDT can therefore accommodate up to (8192 - 12) / 2 = 4090 processes. The Linux kernel has its own code and data segments, whose descriptors are stored in GDT entries 2 and 3 (counting from 0). Each process also has its own code and data segments, whose descriptors are stored in its own LDT. For the layout of the Linux GDT and LDT entries, see Table 1 and Table 2 in the Appendix.
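The capacity figure above follows from simple arithmetic, sketched below in C. The constants come from the text; the helper names are hypothetical, and placing a process's TSS descriptor immediately before its LDT descriptor (entries 12 + 2n and 13 + 2n) is an assumed ordering that this article does not state. The kernel's own macros in include/asm-i386/desc.h are authoritative.

```c
#include <assert.h>

#define GDT_MAX_BYTES    65536u  /* 64 KB maximum GDT size */
#define DESC_BYTES       8u      /* one segment descriptor */
#define RESERVED_ENTRIES 12u     /* entries Linux reserves at the start of the GDT */

static unsigned gdt_max_descriptors(void) { return GDT_MAX_BYTES / DESC_BYTES; }

/* Each process consumes two GDT entries: one TSS and one LDT descriptor. */
static unsigned gdt_max_processes(void)
{
    return (gdt_max_descriptors() - RESERVED_ENTRIES) / 2;
}

/* Assumed ordering (hypothetical): process n's TSS, then its LDT. */
static unsigned tss_entry(unsigned n) { return RESERVED_ENTRIES + 2 * n; }
static unsigned ldt_entry(unsigned n) { return tss_entry(n) + 1; }

/* A GDT selector with TI = 0 and RPL = 0 is simply entry << 3. */
static unsigned gdt_selector(unsigned entry)
{
    assert(entry < gdt_max_descriptors());
    return entry << 3;
}
```

With these definitions the last process (n = 4089) uses entries 8190 and 8191, the final two slots of an 8192-entry GDT, and entry 3 (the kernel data segment) has selector 0x18.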
In Linux, each user process can access a 4 GB linear address space. The 3 GB from 0x0 to 0xBFFFFFFF is user space, which user-mode processes can access directly. The 1 GB from 0xC0000000 to 0xFFFFFFFF is kernel space, which holds the code and data accessed by the kernel; user-mode processes cannot access it directly. When a user process enters kernel space through an interrupt or a system call, an x86 privilege-level transition occurs (from privilege level 3 to privilege level 0), that is, a switch from user mode to kernel mode.
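The 3 GB / 1 GB split amounts to a single comparison against 0xC0000000. The C sketch below is illustrative only: the names are hypothetical, and it models just the address-range rule stated above, not the hardware mechanism that enforces it.

```c
#include <assert.h>
#include <stdint.h>

#define USER_SPACE_END 0xC0000000u  /* 3 GB: first kernel-space address */

static int is_user_address(uint32_t linear) { return linear < USER_SPACE_END; }

/* Size of kernel space: from 3 GB up to the 4 GB top of the address space. */
static uint64_t kernel_space_size(void) { return (UINT64_C(1) << 32) - USER_SPACE_END; }

/* May code at privilege level cpl (0 = kernel mode, 3 = user mode) access it? */
static int may_access(uint32_t linear, int cpl)
{
    assert(cpl == 0 || cpl == 3);   /* Linux uses only these two levels */
    return cpl == 0 || is_user_address(linear);
}
```

A user-mode access to 0xC0000000 is refused, while the same address is fine once the CPU has switched to privilege level 0 through an interrupt or system call.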

Linux paging Policy

Standard Linux paging uses a three-level page table structure: in addition to the page directory and page table supported by x86, there is a middle level called the page middle directory. Consequently, when a linear address is converted to a physical address, it is interpreted as four parts (rather than the three recognized by x86), with an index into the page middle directory added. On x86, Linux defines the number of entries in the page middle directory to be 1 and provides a set of macros (which substitute the page directory for the page middle directory) that cleanly reduce the three-level decomposition to the two-level decomposition of x86. As a result, the main paging code in the kernel, which assumes that a linear address consists of four parts, does not need to change. For these macro definitions, see the Linux sources include/asm/pgtable.h and include/asm/page.h.
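The following C sketch shows why a one-entry page middle directory costs nothing. It assumes shift and size values for the two-level x86 case (PGDIR_SHIFT = PMD_SHIFT = 22, PTRS_PER_PMD = 1); the macro names imitate the Linux naming style but are defined locally here, and pmd_offset_folded is a hypothetical stand-in for the kernel's own macro, not a copy of it.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT   12
#define PMD_SHIFT    22
#define PGDIR_SHIFT  22
#define PTRS_PER_PGD 1024u
#define PTRS_PER_PMD 1u      /* folded: the middle directory has one entry */
#define PTRS_PER_PTE 1024u

struct va_parts { uint32_t pgd, pmd, pte, offset; };

/* Split a linear address into the four parts the generic code expects. */
static struct va_parts split_linear(uint32_t addr)
{
    struct va_parts p;
    p.pgd    = (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
    p.pmd    = (addr >> PMD_SHIFT) & (PTRS_PER_PMD - 1);  /* always 0 */
    p.pte    = (addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
    p.offset = addr & ((1u << PAGE_SHIFT) - 1);
    return p;
}

typedef struct { uint32_t pgd; } pgd_t;
typedef struct { pgd_t pgd; } pmd_t;   /* a PMD entry is just the PGD entry */

/* With one entry per middle directory, the PMD lookup reinterprets the
 * PGD entry itself, so no extra memory access or table is needed. */
static pmd_t *pmd_offset_folded(pgd_t *dir, uint32_t addr)
{
    (void)addr;                         /* the PMD index has zero bits */
    return (pmd_t *)dir;
}
```

For 0xC0123456 the split gives pgd = 768, pmd = 0, pte = 0x123, and offset = 0x456, which is exactly the two-level x86 decomposition with a constant zero inserted in the middle.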
The part of kernel space from 3 GB to 3 GB + 4 MB (covered by the page table referenced by entry 768 of the process's page directory) is mapped to physical addresses 0x0 to 0x3FFFFF (4 MB). Therefore, a process in kernel mode can access this 4 MB of physical memory simply by accessing 3 GB to 3 GB + 4 MB. All processes have the same linear space from 3 GB to 4 GB, which is mapped through the same page directory entries and the same page tables to the same physical memory. In this way, Linux lets processes in kernel mode share the kernel's code and data.
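Because the kernel window is a fixed offset, converting between a kernel linear address and a physical address in the first 4 MB is a subtraction or an addition of 0xC0000000. A minimal C sketch follows; the names are hypothetical, and we believe Linux's own __pa()/__va() macros in include/asm-i386/page.h perform equivalent offset arithmetic, but the source should be checked for the exact definitions.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_BASE  0xC0000000u  /* 3 GB */
#define LOW_MAP_SIZE 0x00400000u  /* 4 MB covered by one page table */

/* Kernel linear address -> physical address, inside the 4 MB window. */
static uint32_t kernel_virt_to_phys(uint32_t linear)
{
    assert(linear >= KERNEL_BASE && linear - KERNEL_BASE < LOW_MAP_SIZE);
    return linear - KERNEL_BASE;
}

/* Physical address in the low 4 MB -> kernel linear address. */
static uint32_t phys_to_kernel_virt(uint32_t phys)
{
    assert(phys < LOW_MAP_SIZE);
    return phys + KERNEL_BASE;
}

/* Index of the page directory entry that covers a linear address. */
static uint32_t pgd_index(uint32_t linear) { return linear >> 22; }
```

The whole window 0xC0000000 to 0xC03FFFFF falls under directory entry 768, so a single page table shared by every process is enough to map it.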

Linux Segmentation and Paging Initialization

Regardless of how Linux is booted, whether through the zImage boot sector (see arch/i386/boot/bootsect.S) or through LILO, the system eventually jumps to and executes arch/i386/boot/setup.S (loaded at SETUPSEG, physical address 0x90200). setup.S obtains hardware parameters of the computer (such as hard disk parameters) from the BIOS, stores them temporarily in the memory parameter area, and performs some preliminary checks in preparation for entering protected mode. For details of the boot process and of setup.S, refer to [2].
The protected-mode kernel initialization module starts at physical address 0x100000. The code and data starting at this address come from arch/i386/kernel/head.S; see Table 3 in the Appendix. The main job of this module is to initialize the relevant registers and structures, such as the IDT, GDT, page directory, and page tables. Below we skip the details of how head.S executes and briefly describe its main initialization functions.
1. Partial register initialization: the segment registers DS, ES, FS, and GS are initialized with __KERNEL_DS (0x18, include/asm-i386/segment.h). From the description of segment registers and segment selectors above, this selector refers to GDT entry 3 (the kernel data segment) with a requested privilege level of 0. The PG bit of CR0 is set, and the AM, WP, NE, and MP bits are set according to the CPU model. CR3 is initialized to 0x101000 (the address of the page directory swapper_pg_dir). SS:ESP is loaded from a 48-bit pointer whose selector part is __KERNEL_DS (0x18) and whose 32-bit offset is init_user_stack + 8192. LDTR is set to 0.
2. IDT initialization: this is only a temporary IDT setup; further work is done in start_kernel. The variable representing the IDT (idt_table[]) is defined in arch/i386/kernel/traps.c, and its type (desc_struct) is defined in include/asm-i386/desc.h. The IDT has IDT_ENTRIES (256) interrupt descriptors in total, each with attribute 0x8e00 and each pointing to the same interrupt handler, ignore_int, which only prints the message int_msg ("Unknown interrupt"). IDTR is loaded by executing lidt idt_descr. From the value of idt_descr in head.S, the IDT base address is the address of idt_table and its limit is IDT_ENTRIES * 8 - 1 (0x7ff).
3. GDT initialization: the GDT has GDT_ENTRIES segment descriptors in total, computed as 12 + 2 * NR_TASKS, where 12 is the number of entries Linux reserves in the GDT and NR_TASKS (512) is the maximum number of processes the system is configured for, defined in include/linux/tasks.h. The GDT storage is allocated directly in head.S (labeled gdt_table). The initialized GDT is shown in Table 1. GDTR is loaded by executing lgdt gdt_descr. From the value of gdt_descr in head.S, the GDT base address is the address of gdt_table and its limit is GDT_ENTRIES * 8 - 1 (0x205f).
4. Page directory initialization: the page directory is the variable swapper_pg_dir, with 1024 entries in total. Entries 0 and 768 both point to pg0 (page table 0) with the initial value 0x00102007. Its high 20 bits are 0x102, and 0x102 * 4 KB = 0x102000, so pg0 is the page immediately following the page directory (which is at 0x101000). Hence linear addresses 0x0 to 0x3FFFFF and 0xC0000000 (3 GB) to 0xC03FFFFF in the Linux 4 GB space are both mapped through pg0 to physical addresses 0x0 to 0x3FFFFF (4 MB). All other page directory entries are initialized to 0x0;
5. Initialization of pg0: entry n maps page n with attribute 0x007; that is, the high 20 bits of entry n are n and its low 12 bits are 0x007. Thus pg0 maps the lowest 4 MB of physical memory;
6. Initialization of empty_zero_page: the first 2 KB of this page store the BIOS hardware parameters that setup.S placed in the memory parameter area, and the last 2 KB serve as the command-line buffer.
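The constants quoted in steps 2 to 5 can be recomputed from their definitions. The C sketch below does so under the values given in the text (IDT_ENTRIES = 256, NR_TASKS = 512, attribute 0x007, meaning present, read/write, and user); the helper functions are hypothetical and only reproduce the arithmetic, not head.S itself.

```c
#include <assert.h>
#include <stdint.h>

#define IDT_ENTRIES 256u
#define NR_TASKS    512u
#define GDT_ENTRIES (12u + 2u * NR_TASKS)
#define PG_ATTR     0x007u   /* present | read/write | user */

/* Limit field loaded into GDTR/IDTR: table size in bytes minus one. */
static uint16_t table_limit(unsigned entries) { return (uint16_t)(entries * 8u - 1u); }

/* Build a PDE or PTE from a 4 KB-aligned physical address and low 12 attribute bits. */
static uint32_t make_entry(uint32_t phys, uint32_t attr)
{
    assert((phys & 0xFFFu) == 0 && attr <= 0xFFFu);
    return phys | attr;
}

/* Fill pg0 so that entry n maps physical page n (the lowest 4 MB). */
static void fill_pg0(uint32_t pg0[1024])
{
    for (uint32_t n = 0; n < 1024; n++)
        pg0[n] = make_entry(n << 12, PG_ATTR);
}
```

The results match the figures above: an IDT limit of 0x7ff, a GDT limit of 0x205f, a page directory entry of 0x00102007 for pg0 at 0x102000, and a last pg0 entry of 0x3FF007 covering the top page of the first 4 MB.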
After head.S finishes its initialization, it calls start_kernel() (init/main.c) to continue initialization, mainly by calling various functions that initialize the kernel's data structures. The functions related to x86 (and to this article) are briefly described below.
1. setup_arch() (arch/i386/kernel/setup.c): sets the range of physical memory available to the kernel (memory_start to memory_end), sets the range of init_task.mm, and calls request_region() (kernel/resource.c) to request I/O space (see Appendix 4).
2. paging_init() (arch/i386/mm/init.c): removes the mapping of linear address 0x0 to the low 4 MB of physical memory, and initializes all page tables according to the actual amount of physical memory.
3. trap_init() (arch/i386/kernel/traps.c): sets the various entry addresses in the IDT, such as exception handler entries, the system call entry, and call gates. trap 0 to trap 17 are the exception entries (overflow, divide by zero, page fault, and so on; the handlers are defined in arch/i386/kernel/entry.S); trap 18 to trap 47 are reserved; the system call (int 0x80) entry is set to system_call (arch/i386/kernel/entry.S). It also sets the TSS and LDT segment descriptors of process 0 in the GDT.
4. init_IRQ() (arch/i386/kernel/irq.c): initializes IDT vectors 0x20 to 0xff.
5. time_init() (arch/i386/kernel/time.c): reads the real-time clock and sets the interrupt handler entry for the timer interrupt IRQ0.
6. mem_init() (arch/i386/mm/init.c): initializes empty_zero_page and marks the pages already in use.
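trap_init() and the temporary setup in head.S both store gate descriptors in the IDT. The C sketch below encodes such a gate following the Intel layout in [1]; the function names are hypothetical. It shows that the attribute 0x8e00 mentioned earlier is a present, DPL 0, 32-bit interrupt gate, and that a gate reachable from user mode by int 0x80 needs DPL 3 (for a trap gate this gives 0xef00). Which gate type Linux 2.2 actually uses for system_call should be checked in arch/i386/kernel/traps.c.

```c
#include <assert.h>
#include <stdint.h>

#define GATE_INTERRUPT 14u   /* 32-bit interrupt gate */
#define GATE_TRAP      15u   /* 32-bit trap gate */

struct gate { uint32_t lo, hi; };   /* the two 32-bit halves of a gate descriptor */

/* Attribute word: P (bit 15), DPL (bits 14..13), type (bits 11..8). */
static uint32_t gate_attr(unsigned type, unsigned dpl)
{
    assert(type <= 15u && dpl <= 3u);
    return 0x8000u | (dpl << 13) | (type << 8);
}

static struct gate make_gate(uint16_t selector, uint32_t handler,
                             unsigned type, unsigned dpl)
{
    struct gate g;
    g.lo = ((uint32_t)selector << 16) | (handler & 0xFFFFu); /* selector : offset[15:0] */
    g.hi = (handler & 0xFFFF0000u) | gate_attr(type, dpl);    /* offset[31:16] : attributes */
    return g;
}
```

A handler at 0xC0105678 in the kernel code segment (selector 0x10, an assumed value) installed as an interrupt gate encodes as lo = 0x00105678 and hi = 0xC0108E00.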

Linux Processes and Segmented Paging

Every time a new process is started, Linux creates a process control block for it (task_struct, include/linux/sched.h). The most important memory-related members of task_struct are mm (struct mm_struct *mm, include/linux/sched.h) and tss (struct thread_struct tss, include/asm-i386/processor.h). Creating a process involves the following functions related to segmented paging:
1. Create a new page directory (mm member pgd_t *pgd) for each process (as needed) and load its address into register CR3. Related code:
new_page_tables (mm/memory.c); // create and initialize a new page directory
set_page_dir (include/asm-i386/pgtable.h); // set the page directory base register
2. Add the process's TSS and LDT entries to the GDT; the GDT entry numbers used are recorded in the tss members tr (unsigned long tr) and ldt (unsigned long ldt), respectively. Related code:
_LDT/_TSS (include/asm-i386/desc.h); // compute the GDT entry for a task's LDT/TSS
set_ldt_desc/set_tss_desc (arch/i386/kernel/traps.c); // add the LDT/TSS descriptor to the GDT
3. Create the process's LDT (mm member void *segments). Related code:
copy_segments (arch/i386/kernel/process.c); // create and initialize the process's LDT
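Step 1 and the earlier remark that all processes share the 3 GB to 4 GB mapping fit together as follows: a new page directory can leave its user entries empty and copy the kernel entries (768 to 1023) from a reference directory such as swapper_pg_dir. The C sketch below is a hypothetical illustration of that idea; new_pgd is not a kernel function, and the actual work done by new_page_tables may differ in detail.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PTRS_PER_PGD      1024u
#define USER_PTRS_PER_PGD 768u   /* entries below 3 GB (0xC0000000 >> 22) */

/* Hypothetical: allocate a page directory with an empty user half and a
 * copy of the kernel half of kernel_pgd (swapper_pg_dir in the text). */
static uint32_t *new_pgd(const uint32_t *kernel_pgd)
{
    assert(kernel_pgd != NULL);
    uint32_t *pgd = malloc(PTRS_PER_PGD * sizeof *pgd);
    if (!pgd)
        return NULL;
    memset(pgd, 0, USER_PTRS_PER_PGD * sizeof *pgd);            /* no user mappings yet */
    memcpy(pgd + USER_PTRS_PER_PGD, kernel_pgd + USER_PTRS_PER_PGD,
           (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof *pgd);   /* shared kernel entries */
    return pgd;
}
```

Because every directory created this way points at the same kernel page tables, a change to a kernel page table is visible to all processes; only the kernel-half directory entries themselves are duplicated.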
Linux allocates memory pages on demand, so that page tables do not occupy excessive memory. When a process is created, the pages allocated are roughly: the process control block (one page), the kernel-mode stack (one page), the page directory (one page), and page tables (as many pages as needed). As the process runs, more memory pages are allocated as needed.
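Demand paging can be illustrated with a toy model in C: a page table is allocated only when an address in its 4 MB region is first touched, and a page only when that page is first touched. The structure and function names are hypothetical, and the model ignores physical frames, swapping, and protection.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ENTRIES 1024u

/* Toy address space (not kernel code). */
struct space {
    uint32_t *tables[ENTRIES];   /* NULL = page directory entry not present */
    unsigned pages_allocated;    /* page tables + data pages */
};

/* Simulated page-fault path: make sure addr is mapped; returns 0 or -1. */
static int touch(struct space *s, uint32_t addr)
{
    assert(s != NULL);
    uint32_t dir = addr >> 22, idx = (addr >> 12) & (ENTRIES - 1);
    if (!s->tables[dir]) {                 /* no page table yet: allocate one */
        s->tables[dir] = calloc(ENTRIES, sizeof(uint32_t));
        if (!s->tables[dir])
            return -1;
        s->pages_allocated++;
    }
    if (!s->tables[dir][idx]) {            /* page not present: allocate it */
        s->tables[dir][idx] = 1u;          /* mark present; a kernel would store frame | attrs */
        s->pages_allocated++;
    }
    return 0;
}

static void release(struct space *s)
{
    for (unsigned i = 0; i < ENTRIES; i++)
        free(s->tables[i]);
}
```

Touching 0x08048000 allocates one page table and one page; touching 0x08049000 afterwards adds only a page, while 0x40000000 needs a new page table as well.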
Appendix
Table 1. Distribution of GDT entries in Linux
Table 2. Distribution of LDT entries in Linux
Table 3. Layout of head.S in physical memory
Appendix 4. Device requests for I/O space

References
1. "Intel Architecture Software Developer's Manual, Volume 3: System Programming", http://developer.intel.com/design/pentiumii/manuals/243192.htm
2. "Linux Operating System and Experiment Tutorial", edited by Li shanping Zheng, Mechanical Industry Press
3. "Linux Kernel Source Code Analysis" by Scott Maxwell, translated by Feng Rui Xing Fei Liu longguo Lu Lina, Mechanical Industry Press
4. "Linux System Analysis and Advanced Programming Technology", edited by Zhou Weisong, Mechanical Industry Press
