Understanding the Linux Kernel, day 01: memory addressing

Source: Internet
Author: User

Memory addressing
Memory addresses:
Logical address: Consists of a segment and an offset.
Linear address: A single 32-bit unsigned integer that can address up to 4GB (also called a virtual address).
Physical address: Used to address memory cells in memory chips. It corresponds to the electrical signals sent along the address pins of the microprocessor to the memory bus.
The memory management unit (MMU) converts a logical address into a linear address by means of a hardware circuit called the segmentation unit; a second hardware circuit, the paging unit, then translates the linear address into a physical address.
Logical address --(segmentation unit)--> Linear address --(paging unit)--> Physical address

Segmentation in hardware: Starting with the 80286 model, Intel microprocessors perform address translation in two different ways, called real mode and protected mode.

Segment selectors and segmentation registers: A logical address consists of two parts: a segment identifier and an offset that specifies the relative address within the segment.
The segment identifier is a 16-bit field called the segment selector.
The offset is a 32-bit field.
To make it easy to retrieve segment selectors quickly, the processor provides segmentation registers whose only purpose is to hold segment selectors. These registers are called cs, ss, ds, es, fs, and gs.
Three of them have a specific purpose:
cs: The code segment register, which points to the segment containing program instructions.
ss: The stack segment register, which points to the segment containing the current program stack.
ds: The data segment register, which points to the segment containing global and static data.
The cs register has another important function: it includes a 2-bit field that specifies the Current Privilege Level (CPL) of the CPU. The value 0 denotes the highest privilege level, and the value 3 denotes the lowest. Linux uses only levels 0 and 3, which are called kernel mode and user mode, respectively.

Segment descriptors: Each segment is represented by an 8-byte segment descriptor that describes the segment's characteristics. Segment descriptors are stored either in the Global Descriptor Table (GDT) or in a Local Descriptor Table (LDT).
The address and size of the GDT in main memory are contained in the gdtr control register, while the address and size of the currently used LDT are contained in the ldtr control register.

Fast access to segment descriptors: A logical address consists of a 16-bit segment selector and a 32-bit offset, and the segmentation registers store only the segment selector.
Every time a segment selector is loaded into a segmentation register, the corresponding segment descriptor is loaded from memory into the matching non-programmable CPU register.
Because a segment descriptor is 8 bytes long, its relative address inside the GDT or the LDT is obtained by multiplying the 13-bit index field of the segment selector (its most significant 13 bits) by 8.
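The selector layout can be illustrated with a small user-space sketch. It assumes the standard 80x86 selector format (13-bit index, 1-bit table indicator, 2-bit requested privilege level); the struct and function names are hypothetical and are not kernel identifiers.

```c
#include <stdint.h>

/* Fields of a 16-bit 80x86 segment selector. */
struct selector_fields {
    unsigned index; /* bits 15..3: entry number in the GDT or LDT */
    unsigned ti;    /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 1..0: requested privilege level */
};

/* Split a selector into its three fields. */
struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti = (sel >> 2) & 1u;
    f.rpl = sel & 3u;
    return f;
}

/* Byte offset of the descriptor inside its table: index * 8. */
uint32_t descriptor_offset(uint16_t sel)
{
    return (uint32_t)(sel >> 3) * 8u;
}
```

For an arbitrary example selector 0x73, decode_selector() yields index 14, TI 0 (GDT), and RPL 3, and descriptor_offset() yields 112.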

Segmentation unit: How is a logical address converted into the corresponding linear address?
1. The unit examines the TI field of the segment selector to determine which descriptor table stores the segment descriptor. The TI field indicates that the descriptor is either in the GDT (in which case the segmentation unit gets the base linear address of the GDT from the gdtr register) or in the active LDT (in which case the segmentation unit gets the base linear address of that LDT from the ldtr register).
2. It computes the address of the segment descriptor from the index field of the segment selector: the index field is multiplied by 8 (the size of a segment descriptor), and the result is added to the contents of the gdtr or ldtr register.
3. It adds the offset of the logical address to the Base field of the segment descriptor, thus obtaining the linear address.
Note: Thanks to the non-programmable registers associated with the segmentation registers, the first two operations need to be performed only when a segmentation register has been changed.
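The three steps above can be sketched in user space as follows. This is a simplified model, not how the hardware is programmed: the descriptor holds only a Base and a byte-granular Limit, a single table is assumed (the TI bit is ignored), and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified descriptor: only the Base and Limit fields are modelled. */
struct seg_desc {
    uint32_t base;
    uint32_t limit; /* highest valid offset, assumed byte-granular */
};

/*
 * Translate selector:offset into a linear address using a software copy
 * of a descriptor table. Returns 0 on success, or -1 if the index or
 * the offset is out of range.
 */
int logical_to_linear(const struct seg_desc *table, size_t entries,
                      uint16_t selector, uint32_t offset, uint32_t *linear)
{
    size_t index = selector >> 3;          /* step 2: index field */

    if (index >= entries)
        return -1;
    if (offset > table[index].limit)
        return -1;
    *linear = table[index].base + offset;  /* step 3: base + offset */
    return 0;
}
```

With a descriptor whose base is 0 and whose limit is 0xffffffff, the linear address returned is always equal to the offset.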

Segmentation in Linux: Segmentation can assign a different linear address space to each process, while paging can map the same linear address space into different physical address spaces.
Linux prefers paging to segmentation because:
1. Memory management is simpler when all processes use the same segment register values, that is, when they share the same set of linear addresses.
2. One of the design objectives of Linux is portability to a wide range of architectures; RISC architectures, in particular, have limited support for segmentation.
All Linux processes running in user mode use the same pair of segments to address instructions and data. These segments are called the user code segment and the user data segment. Similarly, all Linux processes running in kernel mode use the same pair of segments to address instructions and data, called the kernel code segment and the kernel data segment, respectively.
The corresponding segment selectors are defined by the macros __USER_CS, __USER_DS, __KERNEL_CS, and __KERNEL_DS, respectively. For example, to address the kernel code segment, the kernel just loads the value yielded by the __KERNEL_CS macro into the cs segmentation register.
Note that the linear addresses associated with these segments all start at 0 and reach the addressing limit of 2^32 - 1. This means that all processes, whether in user mode or kernel mode, may use the same logical addresses.
Because all segments start at 0x00000000, another important consequence follows: in Linux, logical addresses coincide with linear addresses; that is, the value of the offset field of a logical address always equals the value of the corresponding linear address.

The Linux GDT: In uniprocessor systems there is only one GDT, while in multiprocessor systems there is one GDT for every CPU. All GDTs are stored in the cpu_gdt_table array, while the addresses and sizes of the GDTs (used when initializing the gdtr registers) are stored in the cpu_gdt_descr array.
Each GDT includes 18 segment descriptors and 14 null, unused, or reserved entries. Unused entries are inserted on purpose so that segment descriptors usually accessed together are kept in the same 32-byte line of the hardware cache.
The 18 segment descriptors in each GDT point to the following segments:
Four user and kernel code and data segments.
A Task State Segment (TSS), different for each processor.
A segment that includes the default Local Descriptor Table.
Three Thread-Local Storage (TLS) segments: this mechanism allows multithreaded applications to use up to three segments containing data local to each thread. The set_thread_area() and get_thread_area() system calls, respectively, set up a TLS segment for the executing process and read one back.
Three segments related to Advanced Power Management (APM).
Five segments related to Plug and Play (PnP) BIOS services.
A special TSS segment used by the kernel to handle "Double fault" exceptions.

The Linux LDTs: Most Linux user-mode applications do not use a Local Descriptor Table, so the kernel defines a default LDT to be shared by most processes. The default LDT is stored in the default_ldt array. It includes five entries, but only two of them are effectively used by the kernel.
In some cases, however, a process may need to set up its own LDT. This is useful to applications such as Wine that execute segment-oriented Microsoft Windows applications. The modify_ldt() system call allows a process to create its own Local Descriptor Table.

Paging in hardware: The paging unit translates linear addresses into physical ones.
One key task of the unit is to check the requested access type against the access rights of the linear address. If the memory access is not valid, it generates a Page Fault exception.
For the sake of efficiency, linear addresses are grouped in fixed-length intervals called pages. Contiguous linear addresses within a page are mapped into contiguous physical addresses. In this way, the kernel can specify the physical address and the access rights of a page instead of those of every linear address included in it.
The paging unit treats all RAM as partitioned into fixed-length page frames (sometimes called physical pages). Each page frame contains a page; that is, the length of a page frame coincides with that of a page.
The data structures that map linear addresses to physical addresses are called page tables. They are stored in main memory and must be properly initialized by the kernel before the paging unit is enabled.

Regular paging: Starting with the 80386, the paging unit of Intel processors handles 4KB pages.
The 32 bits of a linear address are divided into three fields:
Directory: the most significant 10 bits
Table: the intermediate 10 bits
Offset: the least significant 12 bits
The translation of a linear address is accomplished in two steps, each based on a translation table. The first translation table is called the Page Directory, and the second is called the Page Table.
The aim of this two-level scheme is to reduce the amount of RAM required for per-process page tables. If a simple one-level page table were used, it would require up to 2^20 entries to represent the page table of each process, even if the process did not use all of the addresses in that range.
The two-level scheme reduces memory use by requiring page tables only for those virtual memory regions that a process actually uses.
Each active process must have a Page Directory assigned to it. However, there is no need to allocate RAM for all of a process's page tables at once; it is more efficient to allocate RAM for a page table only when the process actually needs it.
The Directory field within the linear address determines the entry in the Page Directory that points to the proper Page Table. The address's Table field, in turn, determines the entry in the Page Table that contains the physical address of the page frame holding the page. The Offset field determines the relative position within the page frame. Because it is 12 bits long, each page consists of 4096 bytes of data.
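The two-step lookup described above can be modelled in user space. The following is a minimal sketch, assuming a simulated Page Directory made of C pointers rather than real hardware entries, and modelling only the Present flag (bit 0) of a Page Table entry; all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PRESENT 0x1u /* Present flag, bit 0 of an entry */

/* Field extraction for regular (4KB) paging. */
uint32_t dir_index(uint32_t linear)   { return linear >> 22; }
uint32_t table_index(uint32_t linear) { return (linear >> 12) & 0x3ffu; }
uint32_t page_offset(uint32_t linear) { return linear & 0xfffu; }

/*
 * Walk a simulated two-level structure. page_dir has 1024 slots, each
 * either NULL (no Page Table allocated) or a pointer to a 1024-entry
 * Page Table whose entries hold a page frame address plus flags.
 * Returns 0 and stores the physical address, or -1 where real hardware
 * would raise a Page Fault.
 */
int translate(const uint32_t *const *page_dir, uint32_t linear,
              uint32_t *phys)
{
    const uint32_t *table = page_dir[dir_index(linear)];
    uint32_t pte;

    if (table == NULL)
        return -1;
    pte = table[table_index(linear)];
    if (!(pte & PRESENT))
        return -1;
    *phys = (pte & 0xfffff000u) | page_offset(linear);
    return 0;
}
```

For the linear address 0x20048123, the Directory field is 0x80, the Table field is 0x48, and the Offset field is 0x123; only the one Page Table referenced by directory slot 0x80 has to exist for the lookup to succeed.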

Extended paging: Extended paging allows page frames to be 4MB instead of 4KB in size. It is used to translate large contiguous linear address ranges into the corresponding physical ones; in these cases, the kernel can do without intermediate Page Tables and thus save memory and preserve TLB entries.
Extended paging is enabled by setting the Page Size flag of a Page Directory entry. In this case, the paging unit divides the 32 bits of a linear address into two fields:
Directory: the most significant 10 bits
Offset: the remaining 22 bits
Page Directory entries for extended paging are the same as for regular paging, except that:
The Page Size flag must be set.
Only the 10 most significant bits of the 20-bit physical address field are significant. This is because each physical address is aligned on a 4MB boundary, so the 22 least significant bits of the address are 0.
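A short sketch of the 4MB case, assuming the standard 80x86 entry layout in which the Present flag is bit 0 and the Page Size flag is bit 7 of a Page Directory entry. The function name is hypothetical.

```c
#include <stdint.h>

#define PDE_PRESENT   0x01u /* Present flag, bit 0 */
#define PDE_PAGE_SIZE 0x80u /* Page Size flag, bit 7 */

/*
 * Translate a linear address through a Page Directory entry that maps
 * a 4MB page. Returns 0 and stores the physical address, or -1 if the
 * entry is not present or does not describe a 4MB page.
 */
int translate_4mb(uint32_t pde, uint32_t linear, uint32_t *phys)
{
    if (!(pde & PDE_PRESENT) || !(pde & PDE_PAGE_SIZE))
        return -1;
    /* Top 10 bits come from the entry, low 22 bits from the address. */
    *phys = (pde & 0xffc00000u) | (linear & 0x003fffffu);
    return 0;
}
```

No Page Table is consulted: the 22-bit Offset selects a byte inside the 4MB frame directly.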

Hardware protection scheme: The paging unit uses a different protection scheme from the segmentation unit.

Paging for 64-bit architectures: All 64-bit processors have hardware paging systems that use additional paging levels. The number of levels used depends on the type of processor.

Hardware cache: Today's microprocessors have clock rates of several gigahertz, while dynamic RAM (DRAM) chips have access times in the range of hundreds of clock cycles. This means that the CPU may be held back considerably while executing instructions that fetch operands from RAM or store results into RAM.
Hardware cache memories were introduced to reduce the speed mismatch between CPU and RAM. They are based on the well-known locality principle, which holds for both program code and data structures. For this purpose, a new unit called the line was introduced into the 80x86 architecture. A line consists of a few dozen contiguous bytes that are transferred in burst mode between the slow DRAM and the fast on-chip static RAM (SRAM) used to implement caches.
The cache unit is inserted between the paging unit and main memory. It includes both a hardware cache memory and a cache controller.
The cache memory stores the actual lines of memory. The cache controller stores an array of entries, one entry for each line of the cache memory. Each entry includes a tag and a few flags that describe the status of the cache line.

Paging in Linux: Linux uses a common paging model for both 32-bit and 64-bit systems.
Up to version 2.6.10, Linux used a three-level paging model. Starting with version 2.6.11, a four-level paging model has been adopted.
The four types of page tables in the four-level paging model are called:
Page Global Directory
Page Upper Directory
Page Middle Directory
Page Table

Linear address fields: The following macros simplify Page Table handling:
PAGE_SHIFT: Specifies the length in bits of the Offset field.
PAGE_SIZE: Returns the size of a page.
PAGE_MASK: Yields the value 0xfffff000 and is used to mask all the bits of the Offset field.
PMD_SHIFT: Specifies the total length in bits of the Offset and Table fields of a linear address; in other words, the logarithm of the size of the area that a Page Middle Directory entry can map.
PMD_SIZE: Computes the size of the area mapped by a single entry of the Page Middle Directory, that is, by a Page Table.
PMD_MASK: Masks all the bits of the Offset and Table fields.
PUD_SHIFT: Determines the logarithm of the size of the area that a Page Upper Directory entry can map.
PUD_SIZE: Computes the size of the area mapped by a single entry of the Page Upper Directory.
PUD_MASK: Masks all the bits of the Offset, Table, and Middle Directory fields.
PGDIR_SHIFT: Determines the logarithm of the size of the area that a Page Global Directory entry can map.
PGDIR_SIZE: Computes the size of the area mapped by a single entry of the Page Global Directory.
PGDIR_MASK: Masks all the bits of the Offset, Table, Middle Directory, and Upper Directory fields.
PTRS_PER_PTE, PTRS_PER_PMD, PTRS_PER_PUD, PTRS_PER_PGD:
Compute the number of entries in the Page Table, Page Middle Directory, Page Upper Directory, and Page Global Directory.
When PAE is disabled, they yield the values 1024, 1, 1, and 1024, respectively. When PAE is enabled, they yield the values 512, 512, 1, and 4.
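The relationship between the shift, size, and mask macros can be seen in a small sketch. It assumes the 32-bit 80x86 case with PAE disabled, where the middle and upper levels are folded so that the PMD, PUD, and PGDIR shifts are all 22. The EX_ and ex_ prefixes mark these as illustrative names, not the kernel's own definitions.

```c
#include <stdint.h>

/* Illustrative values for 32-bit 80x86 with PAE disabled. */
#define EX_PAGE_SHIFT   12
#define EX_PAGE_SIZE    (UINT32_C(1) << EX_PAGE_SHIFT)
#define EX_PAGE_MASK    (~(EX_PAGE_SIZE - 1))

#define EX_PMD_SHIFT    22
#define EX_PMD_SIZE     (UINT32_C(1) << EX_PMD_SHIFT)
#define EX_PMD_MASK     (~(EX_PMD_SIZE - 1))

#define EX_PGDIR_SHIFT  22
#define EX_PGDIR_SIZE   (UINT32_C(1) << EX_PGDIR_SHIFT)
#define EX_PGDIR_MASK   (~(EX_PGDIR_SIZE - 1))

#define EX_PTRS_PER_PTE 1024u
#define EX_PTRS_PER_PGD 1024u

/* Address of the first byte of the page containing addr. */
uint32_t ex_page_base(uint32_t addr)
{
    return addr & EX_PAGE_MASK;
}

/* Index of the Page Global Directory entry that maps addr. */
uint32_t ex_pgd_index(uint32_t addr)
{
    return (addr >> EX_PGDIR_SHIFT) & (EX_PTRS_PER_PGD - 1);
}

/* Index of the Page Table entry that maps addr. */
uint32_t ex_pte_index(uint32_t addr)
{
    return (addr >> EX_PAGE_SHIFT) & (EX_PTRS_PER_PTE - 1);
}
```

Each size is 1 shifted left by the corresponding shift, and each mask is the complement of the size minus 1, so a mask clears exactly the bits below the level it describes.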

Page Table handling: pte_t, pmd_t, pud_t, and pgd_t describe the format of a Page Table entry, a Page Middle Directory entry, a Page Upper Directory entry, and a Page Global Directory entry, respectively. They are 64-bit data types when PAE is enabled and 32-bit data types otherwise.
pgprot_t is another 64-bit (PAE enabled) or 32-bit (PAE disabled) data type that represents the protection flags associated with a single entry.
Five type-conversion macros (__pte, __pmd, __pud, __pgd, and __pgprot) cast an unsigned integer into the required type.
Five other type-conversion macros (pte_val, pmd_val, pud_val, pgd_val, and pgprot_val) perform the reverse casting, from one of the types mentioned above into an unsigned integer.
The kernel also provides several macros and functions to read or modify page table entries:
1) The pte_none, pmd_none, pud_none, and pgd_none macros yield the value 1 if the corresponding entry has the value 0; otherwise, they yield the value 0.
2) The pte_clear, pmd_clear, pud_clear, and pgd_clear macros clear an entry of the corresponding page table, thus forbidding a process to use the linear addresses mapped by that entry.
The ptep_get_and_clear() function clears a Page Table entry and returns the previous value.
3) set_pte, set_pmd, set_pud, and set_pgd write a given value into a page table entry.
set_pte_atomic is identical to set_pte, but when PAE is enabled it also ensures that the 64-bit value is written atomically.
4) pte_same(a, b) returns 1 if the two Page Table entries a and b refer to the same page and specify the same access privileges, and 0 otherwise.
5) pmd_large(e) returns 1 if the Page Middle Directory entry e refers to a large page (2MB or 4MB), and 0 otherwise.
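The idea behind wrapping entries in dedicated types can be shown with a user-space imitation. This is only a sketch of the pattern for the 32-bit, PAE-disabled case: the ex_ names are hypothetical, the helpers are plain functions rather than the kernel's macros, and nothing here is atomic.

```c
#include <stdint.h>

/* A Page Table entry wrapped in a struct so it cannot be mixed up
 * with a plain integer by accident. */
typedef struct { uint32_t pte_low; } ex_pte_t;

/* Conversions between the wrapped type and an unsigned integer. */
ex_pte_t ex_mk_pte(uint32_t x)
{
    ex_pte_t p = { x };
    return p;
}

uint32_t ex_pte_val(ex_pte_t p)
{
    return p.pte_low;
}

/* 1 if the entry is 0 (nothing mapped), 0 otherwise. */
int ex_pte_none(ex_pte_t p)
{
    return ex_pte_val(p) == 0;
}

/* 1 if both entries name the same frame with the same flags. */
int ex_pte_same(ex_pte_t a, ex_pte_t b)
{
    return ex_pte_val(a) == ex_pte_val(b);
}

/* Write a value into an entry. */
void ex_set_pte(ex_pte_t *ptep, ex_pte_t value)
{
    *ptep = value;
}

/* Clear an entry and return its previous value (not atomic here). */
ex_pte_t ex_ptep_get_and_clear(ex_pte_t *ptep)
{
    ex_pte_t old = *ptep;
    ptep->pte_low = 0;
    return old;
}
```

Because ex_pte_t is a struct, passing a bare integer where an entry is expected is a compile-time error, which is the main benefit of the dedicated types.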

Physical memory layout:
During the initialization phase, the kernel must build a physical address map that specifies which physical address ranges are usable by the kernel and which are unavailable (either because they map hardware devices' I/O shared memory or because the corresponding page frames contain BIOS data).
The kernel considers the following page frames as reserved:
Those falling in the unavailable physical address ranges.
Those containing the kernel's code and initialized data structures.

Process page tables: The linear address space of a process is divided into two parts:
Linear addresses from 0x00000000 to 0xbfffffff can be addressed when the process runs in either user mode or kernel mode.
Linear addresses from 0xc0000000 to 0xffffffff can be addressed only when the process runs in kernel mode.
When a process runs in user mode, it issues linear addresses smaller than 0xc0000000; when it runs in kernel mode, it executes kernel code and issues linear addresses greater than or equal to 0xc0000000. In some cases, however, the kernel must access the user mode linear address space to retrieve or store data.

Kernel page tables: The kernel maintains a set of page tables for its own use, rooted at the so-called master kernel Page Global Directory. After system initialization, this set of page tables is never directly used by any process or kernel thread; rather, the highest entries of the master kernel Page Global Directory serve as the reference model for the corresponding entries of the Page Global Directory of every regular process in the system.
How does the kernel initialize its own page tables? The process has two phases. Right after the kernel image is loaded into memory, the CPU is still running in real mode, so paging is not enabled.
In the first phase, the kernel creates a limited address space that includes the kernel's code and data segments, the initial page tables, and 128KB for some dynamic data structures. This minimal address space is just large enough to install the kernel in RAM and to initialize its core data structures.
In the second phase, the kernel takes advantage of all of the existing RAM and sets up the page tables properly. The following explains how this scheme is implemented.

Provisional kernel page tables: A provisional Page Global Directory is initialized statically during kernel compilation, while the provisional Page Tables are initialized by the startup_32() assembly language function. At this stage, PAE support is not enabled.
The provisional Page Global Directory is contained in the swapper_pg_dir variable. The provisional Page Tables are stored starting from pg0, right after the end of the kernel's uninitialized data segments. For the sake of simplicity, assume that the kernel's segments, the provisional Page Tables, and the 128KB memory area fit in the first 8MB of RAM. Two Page Tables are required to map the first 8MB of RAM.
The objective of this first phase of paging is to allow these 8MB of RAM to be addressed easily both in real mode and in protected mode. Therefore, the kernel must create a mapping from both the linear addresses 0x00000000 through 0x007fffff and the linear addresses 0xc0000000 through 0xc07fffff into the physical addresses 0x00000000 through 0x007fffff.
In other words, during its first phase of initialization the kernel can address the first 8MB of RAM either by linear addresses identical to the physical ones or by 8MB worth of linear addresses starting from 0xc0000000.
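The double mapping can be worked out numerically. The sketch below assumes regular paging without PAE, so each Page Global Directory slot covers 4MB, and it uses a simulated directory whose entries simply record which of the provisional Page Tables they refer to (1 or 2, with 0 meaning unmapped) rather than real hardware entries. All names are hypothetical.

```c
#include <stdint.h>

#define EX_PGDIR_SHIFT  22                      /* 4MB per directory slot */
#define EX_MAPPED_BYTES (8u * 1024u * 1024u)    /* first 8MB of RAM */

/* Number of Page Tables needed to cover `bytes` of RAM. */
unsigned ex_tables_needed(uint32_t bytes)
{
    return (bytes + (1u << EX_PGDIR_SHIFT) - 1) >> EX_PGDIR_SHIFT;
}

/* Directory slot that covers a given linear address. */
unsigned ex_pgd_slot(uint32_t linear)
{
    return linear >> EX_PGDIR_SHIFT;
}

/*
 * Fill a simulated 1024-slot directory so that the identity range and
 * the range starting at 0xc0000000 refer to the same Page Tables.
 */
void ex_fill_provisional(unsigned pgd[1024])
{
    unsigned n = ex_tables_needed(EX_MAPPED_BYTES);
    unsigned high = ex_pgd_slot(0xc0000000u);
    unsigned i;

    for (i = 0; i < n; i++) {
        pgd[i] = i + 1;        /* linear 0x00000000 + i * 4MB */
        pgd[high + i] = i + 1; /* linear 0xc0000000 + i * 4MB */
    }
}
```

Only four slots end up in use: 0 and 1 for the identity mapping, and 0x300 and 0x301 for the mapping at 0xc0000000.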

Final kernel page table when RAM size is less than 896MB: The final mapping provided by the kernel page tables must transform linear addresses starting from 0xc0000000 into physical addresses starting from 0.
The __pa macro converts a linear address starting from PAGE_OFFSET into the corresponding physical address, while the __va macro does the reverse.
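Because the mapping is a constant shift, both conversions reduce to one addition or subtraction. A minimal sketch, assuming PAGE_OFFSET is 0xc0000000 as in the text and using plain 32-bit integers instead of kernel pointer types; the ex_ names are hypothetical.

```c
#include <stdint.h>

#define EX_PAGE_OFFSET UINT32_C(0xc0000000)

/* Kernel linear address -> physical address. */
uint32_t ex_pa(uint32_t vaddr)
{
    return vaddr - EX_PAGE_OFFSET;
}

/* Physical address -> kernel linear address. */
uint32_t ex_va(uint32_t paddr)
{
    return paddr + EX_PAGE_OFFSET;
}
```

For example, the linear address 0xc0100000 corresponds to the physical address 0x00100000, and converting back gives the original linear address.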
The master kernel Page Global Directory is still stored in the swapper_pg_dir variable. It is initialized by the paging_init() function, which does the following:
1. Invokes pagetable_init() to set up the Page Table entries properly.
2. Writes the physical address of swapper_pg_dir into the cr3 control register.
3. If the CPU supports PAE and the kernel is compiled with PAE support, sets the PAE flag in the cr4 control register.
4. Invokes __flush_tlb_all() to invalidate all TLB entries.

Final kernel page table when RAM size is between 896MB and 4096MB: In this case, the RAM cannot be mapped entirely into the kernel linear address space. The best Linux can do during the initialization phase is to map a RAM window of 896MB into the kernel linear address space. If the rest of the existing RAM needs to be addressed, some other linear address interval must be mapped to the required RAM. This implies changing the value of some page table entries.
To initialize the Page Global Directory, the kernel uses the same code as in the previous case.

Final kernel page table when RAM size is more than 4096MB: Consider now kernel page table initialization for computers with more than 4GB of RAM; more precisely, the cases in which all of the following hold:
The CPU model supports Physical Address Extension (PAE).
The amount of RAM is larger than 4GB.
The kernel is compiled with PAE support.
Although PAE handles 36-bit physical addresses, linear addresses are still 32-bit addresses. As in the previous case, Linux maps an 896MB RAM window into the kernel linear address space; the remaining RAM is left unmapped and handled by dynamic remapping.
The main difference from the previous case is that a three-level paging model is used.

Fix-mapped linear addresses: The initial part of the fourth gigabyte of kernel linear addresses maps the physical memory of the system. However, at least 128MB of linear addresses are always left available, because the kernel uses them to implement noncontiguous memory allocation and fix-mapped linear addresses.
A fix-mapped linear address is basically a constant linear address such as 0xffffc000 whose corresponding physical address does not have to be the linear address minus 0xc0000000, but can be set up in an arbitrary way.
The kernel uses fix-mapped linear addresses in place of pointer variables whose values never change.
Fix-mapped linear addresses are placed at the end of the fourth gigabyte of linear addresses.
To associate a physical address with a fix-mapped linear address, the kernel uses the set_fixmap(idx, phys) and set_fixmap_nocache(idx, phys) macros.
Both macros initialize the Page Table entry corresponding to the fix_to_virt(idx) linear address with the physical address phys.
Conversely, clear_fixmap(idx) removes the link between the fix-mapped linear address idx and the physical address.
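The index-to-address computation can be sketched as follows. It assumes that the highest fix-mapped slot sits at 0xfffff000 and that slots grow downward one 4KB page at a time, which is consistent with the example address 0xffffc000 given above; the constant and the ex_ names are assumptions for illustration, not the kernel's definitions.

```c
#include <stdint.h>

#define EX_PAGE_SHIFT  12
#define EX_FIXADDR_TOP UINT32_C(0xfffff000) /* assumed address of slot 0 */

/* Linear address of the fix-mapped slot with the given index. */
uint32_t ex_fix_to_virt(unsigned idx)
{
    return EX_FIXADDR_TOP - ((uint32_t)idx << EX_PAGE_SHIFT);
}

/* Index of the fix-mapped slot that contains the given linear address. */
unsigned ex_virt_to_fix(uint32_t vaddr)
{
    return (EX_FIXADDR_TOP - (vaddr & 0xfffff000u)) >> EX_PAGE_SHIFT;
}
```

Under these assumptions, index 3 corresponds to the linear address 0xffffc000. Since the index is a compile-time constant in practice, the address can be computed by the compiler, which is cheaper than dereferencing a pointer variable.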

Handling the hardware cache and the TLB: Hardware caches and Translation Lookaside Buffers (TLBs) play a crucial role in boosting the performance of modern computer architectures.

Handling the hardware cache: To optimize the cache hit rate, the kernel takes the architecture into account in the following decisions:
The most frequently used fields of a data structure are placed at low offsets within the structure, so that they can be cached in the same line.
When allocating a large set of data structures, the kernel tries to store each of them in memory in such a way that all cache lines are used uniformly.

Handling the TLB: Processors cannot synchronize their own TLB caches automatically, because it is the kernel, not the hardware, that decides when a mapping between a linear and a physical address is no longer valid.
In general, any process switch implies changing the set of active page tables. Local TLB entries relative to the old page tables must be flushed; this is done automatically when the kernel writes the address of the new Page Global Directory into the cr3 control register.
However, the kernel avoids flushing the TLB in the following cases:
1. When performing a process switch between two regular processes that use the same set of page tables.
2. When performing a process switch between a regular process and a kernel thread.
In fact, kernel threads do not have their own set of page tables; rather, they use the set of page tables of a regular process.
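The two exceptions can be expressed as a small decision function. This is a hypothetical user-space model, not kernel code: a task whose mm pointer is NULL stands for a kernel thread, and the active argument stands for the page table set currently loaded in cr3.

```c
#include <stddef.h>

/* Hypothetical stand-in for a process's set of page tables. */
struct ex_mm {
    int id;
};

/* A task; mm == NULL models a kernel thread with no page tables. */
struct ex_task {
    struct ex_mm *mm;
};

/*
 * Returns 1 if switching to `next` requires loading a new Page Global
 * Directory (and therefore flushing the local TLB), 0 otherwise.
 */
int ex_switch_needs_flush(const struct ex_mm *active,
                          const struct ex_task *next)
{
    if (next->mm == NULL)
        return 0;              /* kernel thread borrows the active set */
    return next->mm != active; /* same set of page tables: no flush */
}
```

A switch to a kernel thread or to a task sharing the active page tables returns 0; a switch to a task with a different set returns 1.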
