Explore the Linux memory model


Understanding the memory model Linux uses is the first step toward a deeper grasp of Linux design and implementation. This article therefore outlines the Linux memory model and memory management.

Linux uses a monolithic architecture that defines a set of primitives or system calls to implement operating system services, such as process management, concurrency control, and memory management, in several modules that run in supervisor mode. For compatibility reasons, Linux still maintains a symbolic representation of the segment control unit model, but this model is used only in a very limited way.

The main problems related to memory management include:

  • Virtual memory management, the logical layer between application memory requests and physical memory.
  • Physical memory management.
  • Kernel virtual memory management/the kernel memory allocator, a component that satisfies memory requests coming from either the kernel or user programs.
  • Virtual address space management.
  • Swapping and caching.

This article discusses the following topics to help you understand the Linux kernel from the perspective of operating system memory management:

  • The segment control unit model, in general and as used in Linux
  • The paging model, in general and as used in Linux
  • Knowledge of physical memory

Although this article does not describe the Linux kernel's memory management methods in detail, it introduces the overall memory model and how the system addresses memory, which provides a framework for further study. This article focuses on the x86 architecture, but most of the material applies to other hardware implementations as well.

X86 memory architecture

In the x86 architecture, memory is divided into three kinds of addresses:

  • A logical address is the address of a storage location, which may or may not correspond directly to a physical location. A logical address is usually used when requesting information from a controller.
  • A linear address (also called a flat address space) is memory addressed starting from 0: each subsequent byte is referenced by the next sequential number (0, 1, 2, 3, and so on) up to the end of memory. This is how most non-Intel CPUs address memory. Intel architectures use a segmented address space in which memory is divided into 64 KB segments and a segment register always points to the base of the currently addressed segment. The 32-bit mode of this architecture is considered a flat address space, but it, too, uses segments.
  • A physical address is the address represented by the bits on the physical address bus. The physical address may differ from the logical address; the memory management unit translates logical addresses into physical addresses.

The CPU uses two units to convert a logical address into a physical address. The first is called the segmentation unit, and the second is called the paging unit.

Figure 1. Two units used to convert an address space

Next, let's introduce the segment control unit model.


Segment control unit model overview

The basic idea behind the segmentation model is to manage memory in segments. In essence, each segment is its own address space. A segment consists of two elements:

  • A base address, which contains the address of some physical memory location
  • A length value, which specifies the length of the segment

A segmented address also comprises two components: a segment selector and an offset into the segment. The segment selector specifies the segment to be used (that is, its base address and length value), while the offset component specifies the offset of the actual memory location from the base address. The physical address of the actual memory location is the sum of the base address and the offset. If the offset exceeds the segment's length, the system generates a protection violation.

The preceding content can be summarized as follows:

Segmented address -> segment : offset, also written as -> segment identifier : offset
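To make the base-plus-offset arithmetic concrete, here is a minimal user-space C sketch of the translation, including the protection-violation check described above. The struct and the example values are illustrative, not kernel definitions:

    #include <stdio.h>
    #include <stdint.h>

    /* A segment as described above: a base address plus a length. */
    struct segment {
        uint32_t base;   /* physical location where the segment starts */
        uint32_t limit;  /* length of the segment in bytes */
    };

    /* Translate segment:offset to a physical address. Returns -1 and
     * reports a protection violation if the offset exceeds the segment
     * length, as the hardware would. */
    int64_t segment_translate(const struct segment *seg, uint32_t offset)
    {
        if (offset > seg->limit) {
            fprintf(stderr, "protection violation: offset 0x%x > limit 0x%x\n",
                    offset, seg->limit);
            return -1;
        }
        return (int64_t)seg->base + offset;
    }

    int main(void)
    {
        struct segment code = { .base = 0x00100000, .limit = 0xFFFF };
        printf("physical = 0x%llx\n",
               (unsigned long long)segment_translate(&code, 0x42));
        return 0;
    }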

Each segment is identified by a 16-bit field called a segment identifier or segment selector. The x86 hardware includes several programmable registers called segment registers, which hold segment selectors. These registers are cs (code segment), ds (data segment), and ss (stack segment), among others. Each segment identifier refers to a 64-bit (8-byte) segment descriptor. Segment descriptors are stored in either the GDT (Global Descriptor Table) or an LDT (Local Descriptor Table).

Figure 2. Relationship between segment descriptors and segment registers

Each time a segment selector is loaded into a segment register, the corresponding segment descriptor is loaded from memory into a matching nonprogrammable CPU register. Each segment descriptor is 8 bytes long and represents one segment in memory; descriptors are stored in the LDT or GDT. A segment descriptor entry contains a pointer to the first byte of the associated segment (the Base field) and a 20-bit value (the Limit field) that indicates the size of the segment in memory.

Other fields contain special attributes, such as the privilege level and the segment type (cs or ds). The segment type is represented by a 4-bit Type field.

Because these nonprogrammable registers are used, the GDT or LDT does not have to be consulted when converting a logical address to a linear address, which speeds up memory address translation.
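The 8-byte descriptor layout just described, with the base split across three fields and the 20-bit limit split across two, can be sketched as a C struct. This is purely an illustration that assumes GCC bitfield ordering on little-endian x86; real kernel code builds descriptors with shifts and masks rather than bitfields:

    #include <stdint.h>

    /* Layout of one 8-byte x86 segment descriptor, written as bitfields
     * for illustration only (bitfield order is compiler-dependent). */
    struct segment_descriptor {
        uint64_t limit_low  : 16; /* limit bits 0..15  */
        uint64_t base_low   : 16; /* base  bits 0..15  */
        uint64_t base_mid   : 8;  /* base  bits 16..23 */
        uint64_t type       : 4;  /* segment type (e.g. 0xa = read/execute code) */
        uint64_t s          : 1;  /* 1 = ordinary code or data segment */
        uint64_t dpl        : 2;  /* descriptor privilege level (0 = kernel) */
        uint64_t present    : 1;  /* segment present in memory */
        uint64_t limit_high : 4;  /* limit bits 16..19 (completing the 20-bit limit) */
        uint64_t avl        : 1;  /* available for system software */
        uint64_t reserved   : 1;
        uint64_t db         : 1;  /* 1 = 32-bit segment */
        uint64_t g          : 1;  /* granularity: 1 = limit counted in 4-KB pages */
        uint64_t base_high  : 8;  /* base  bits 24..31 */
    };

    /* Reassemble the split Base and 20-bit Limit fields. */
    static inline uint32_t descriptor_base(const struct segment_descriptor *d)
    {
        return d->base_low | (d->base_mid << 16) | (d->base_high << 24);
    }

    static inline uint32_t descriptor_limit(const struct segment_descriptor *d)
    {
        return d->limit_low | (d->limit_high << 16);
    }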

The segment selector contains the following content:

  • A 13-bit index that identifies the corresponding segment descriptor entry in the GDT or LDT
  • TI (table indicator), which specifies whether the segment descriptor is in the GDT (TI = 0) or the LDT (TI = 1)
  • RPL (requested privilege level), which defines the CPU's current privilege level when the corresponding segment selector is loaded into a segment register

Because a segment descriptor is 8 bytes long, its relative address within the GDT or LDT is the 13 most significant bits of the segment selector multiplied by 8. For example, if the GDT is stored at address 0x00020000 and the segment selector's Index field is 2, the address of the corresponding segment descriptor is (2 * 8) + 0x00020000. The total number of segment descriptors that can be stored in the GDT is 2^13 = 8192 (the first entry is always null, so 8191 are usable).
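The selector decoding and the descriptor-address calculation from this example can be written out in a few lines of C. This is a hypothetical user-space sketch; the GDT base address is the example value from the text:

    #include <stdint.h>
    #include <stdio.h>

    /* Decode a 16-bit segment selector and locate its descriptor,
     * following the calculation in the text. */
    int main(void)
    {
        uint16_t selector = (2 << 3) | (0 << 2) | 0; /* index=2, TI=0 (GDT), RPL=0 */

        uint16_t index = selector >> 3;        /* 13-bit index */
        uint16_t ti    = (selector >> 2) & 1;  /* 0 = GDT, 1 = LDT */
        uint16_t rpl   = selector & 3;         /* requested privilege level */

        uint32_t gdt_base  = 0x00020000;           /* example GDT address */
        uint32_t desc_addr = gdt_base + index * 8; /* each descriptor is 8 bytes */

        printf("index=%u ti=%u rpl=%u descriptor at 0x%08x\n",
               index, ti, rpl, desc_addr);     /* prints ... 0x00020010 */
        return 0;
    }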

Figure 3 shows how to obtain a linear address from a logical address.

Figure 3. Obtain a linear address from a logical address

So what is the difference in Linux?


Segment Control Unit in Linux

Linux modifies this model slightly. As noted earlier, Linux uses the segmentation model only in a limited way (mainly for compatibility).

In Linux, all segment registers point to the same segment address range; in other words, each uses the same set of linear addresses. This lets Linux limit the number of segment descriptors it needs, so that all descriptors can be kept in the GDT. This model has two advantages:

  • Memory management is simpler when all processes use the same segment register values (that is, when they share the same linear address space).
  • Portability to most architectures is possible; some architectures support segmentation only in this restricted way.

Figure 4 shows the modifications to the model.

Figure 4. In Linux, segment registers point to the same address set

Segment descriptor

Linux uses the following segment descriptors:

  • Kernel code segment
  • Kernel data segment
  • User code segment
  • User Data Segment
  • TSS segment
  • Default LDT segment

Each of these segment descriptors is described in detail below.

The kernel code segment descriptor in the GDT has the following values:

  • Base = 0x00000000
  • Limit = 0xffffffff (2^32 - 1) = 4 GB
  • G (granularity flag) = 1, meaning the segment size is expressed in pages
  • S = 1, indicating an ordinary code or data segment
  • Type = 0xa, indicating a code segment that can be read and executed
  • DPL = 0, indicating kernel mode

The linear address range associated with this segment is 4 GB; S = 1 and Type = 0xa denote a code segment. The selector is held in the cs register, and the macro used to access this selector in Linux is __KERNEL_CS.

The kernel data segment descriptor has values similar to those of the kernel code segment; the only difference is that the Type field is 2, indicating a data segment. The selector is held in the ds register, and the macro used to access this selector in Linux is __KERNEL_DS.

The user code segment is shared by all processes in user mode. The corresponding segment descriptor stored in the GDT has the following values:

  • Base = 0x00000000
  • Limit = 0xffffffff
  • G = 1
  • S = 1
  • Type = 0xa, indicating the code segment that can be read and executed
  • DPL = 3, indicating the user mode

In Linux, this selector is accessed through the __USER_CS macro.

In the user data segment descriptor, the only field that differs is Type, which is set to 2, defining the data segment as readable and writable. The macro used to access this selector in Linux is __USER_DS.
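Putting the four descriptors together: the sketch below packs the Base, Limit, Type, S, DPL, and G values listed above into the 8-byte descriptor format. The make_descriptor() helper is hypothetical, written for illustration; the real kernel defines its GDT statically in its x86 startup code.

    #include <stdint.h>

    /* Pack base/limit/type/S/DPL/G into the 8-byte descriptor format. */
    static uint64_t make_descriptor(uint32_t base, uint32_t limit,
                                    uint8_t type, uint8_t s, uint8_t dpl,
                                    uint8_t g, uint8_t db, uint8_t present)
    {
        uint64_t d = 0;
        d |= (uint64_t)(limit & 0xFFFF);               /* limit bits 0..15  */
        d |= (uint64_t)(base  & 0xFFFF)        << 16;  /* base  bits 0..15  */
        d |= (uint64_t)((base >> 16) & 0xFF)   << 32;  /* base  bits 16..23 */
        d |= (uint64_t)(type & 0xF)            << 40;
        d |= (uint64_t)(s & 1)                 << 44;
        d |= (uint64_t)(dpl & 3)               << 45;
        d |= (uint64_t)(present & 1)           << 47;
        d |= (uint64_t)((limit >> 16) & 0xF)   << 48;  /* limit bits 16..19 */
        d |= (uint64_t)(db & 1)                << 54;  /* 32-bit segment */
        d |= (uint64_t)(g & 1)                 << 55;  /* 4-KB granularity */
        d |= (uint64_t)((base >> 24) & 0xFF)   << 56;  /* base  bits 24..31 */
        return d;
    }

    /* The four flat segments described above. With G = 1, the 20-bit
     * limit 0xFFFFF is counted in 4-KB pages, covering the full 4 GB. */
    uint64_t kernel_cs, kernel_ds, user_cs, user_ds;

    void build_flat_segments(void)
    {
        kernel_cs = make_descriptor(0x0, 0xFFFFF, 0xa, 1, 0, 1, 1, 1);
        kernel_ds = make_descriptor(0x0, 0xFFFFF, 0x2, 1, 0, 1, 1, 1);
        user_cs   = make_descriptor(0x0, 0xFFFFF, 0xa, 1, 3, 1, 1, 1);
        user_ds   = make_descriptor(0x0, 0xFFFFF, 0x2, 1, 3, 1, 1, 1);
    }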

In addition to these segment descriptors, the GDT contains two more segment descriptors for each created process: the TSS and LDT segments.

Each TSS segment descriptor represents a different process. The TSS holds the hardware context information of each CPU, which enables efficient context switching. For example, on a U->K (user-to-kernel) mode switch, the x86 CPU fetches the address of the kernel-mode stack from the TSS.

Each process has its own TSS descriptor, stored in the GDT. The values of these descriptors are as follows:

  • Base = &tss (the address of the TSS field of the corresponding process descriptor, i.e., &tss_struct, defined in the Linux kernel's sched.h file)
  • Limit = 0xeb (the TSS segment is 236 bytes long)
  • Type = 9 or 11
  • DPL = 0 (the TSS cannot be accessed in user mode); the G flag is cleared

All processes share the default LDT segment. By default it contains a null segment descriptor, and this default LDT segment descriptor is stored in the GDT. The LDT generated by Linux is 24 bytes and holds three entries by default:

LDT[0] = null
LDT[1] = user code segment descriptor
LDT[2] = user data/stack segment descriptor

Computing the maximum number of tasks

To compute the maximum number of entries that can be stored in the GDT, you first need to understand NR_TASKS, the variable that determines the number of concurrent processes Linux supports (the default value in the kernel source is 512, allowing a maximum of 256 simultaneous connections to a single instance).

The total number of entries that can be stored in the GDT is determined by the following formula:

Number of entries in GDT = 12 + 2 * NR_TASKS

As mentioned above, the GDT can hold 2^13 = 8192 entries.

Of these 8192 segment descriptors, Linux uses 6 of its own, another 4 are used for APM features (advanced power management features), and 2 entries in the GDT remain unused. Therefore, the number of entries available in the GDT is 8192 - 12, or 8180.

In any case, the number of available entries in the GDT is 8180, so:

2 * NR_TASKS = 8180
NR_TASKS = 8180 / 2 = 4090

(Why 2 * NR_TASKS? For each created process, both a TSS descriptor, needed to maintain the context switch, and an LDT descriptor must be loaded.)

This limit on the number of processes on the x86 architecture was a constraint in Linux 2.2, but the problem no longer exists as of the 2.4 kernel, partly because hardware context switching (which necessarily requires a TSS) was abandoned in favor of software process switching.

Next, let's take a look at the paging model.


Paging model Overview

The paging unit is responsible for converting linear addresses into physical addresses (see Figure 1). Linear addresses are grouped into pages. These linear addresses are contiguous, and the paging unit maps these contiguous ranges of memory into corresponding contiguous ranges of physical addresses, called page frames. Note that the paging unit views RAM as partitioned into fixed-length page frames.

Paging therefore has the following advantages:

  • The access rights defined for a page apply to the whole set of linear addresses that make up the page.
  • The page size is equal to the page frame size.

The data structure that maps pages to page frames is called a page table. Page tables are stored in main memory and must be properly initialized by the kernel before the paging unit is enabled. Figure 5 illustrates the page table.

Figure 5. The page table maps pages to page frames

Note that the address set contained in page1 exactly matches the address set contained in page frame1.

Linux uses the paging unit more heavily than the segmentation unit. As mentioned earlier, each segment descriptor uses the same set of linear addresses, which minimizes the need for the segmentation unit to convert logical addresses into linear addresses. By relying on the paging unit rather than the segmentation unit, Linux greatly simplifies both memory management and portability across hardware platforms.

Fields used during Paging

The following describes the fields used to implement paging in the x86 architecture; these fields help implement paging in Linux. The paging unit takes as input the linear address produced by the segmentation unit, which is then divided into the following three fields:

  • Directory, represented by the 10 MSBs (most significant bits, that is, the leftmost bits of the binary value).
  • Table, represented by the middle 10 bits.
  • Offset, represented by the 12 LSBs (least significant bits, that is, the rightmost bits, which determine, for example, whether a binary integer is odd or even).

Converting a linear address into a physical location is a two-step process. The first step uses the page directory (going from the page directory to the page table); the second uses the page table plus the offset (going from the page table to the page frame). Figure 6 shows the process.

Figure 6. Paging Fields

First, the physical address of the page directory is loaded into the cr3 register. The Directory field of the linear address selects the page directory entry that points to the appropriate page table. The Table field selects the entry in that page table containing the physical address of the page frame. The Offset field determines the position within the page frame. Because the Offset field is 12 bits, each page contains 4 KB of data.

The following is a summary of the calculation of physical addresses:

  1. cr3 + page directory (10 MSBs) = points to table_base
  2. table_base + page table (10 middle bits) = points to page_base
  3. page_base + offset = physical address (the location within the page frame)

Because the Directory and Table fields are both 10 bits, they can each address 1024 entries, and the Offset can address 2^12 bytes (4096 bytes). Thus the page directory can address 1024 * 1024 * 4096 bytes, which equals 2^32 memory cells, or 4 GB. So on the x86 architecture, the total addressable upper limit is 4 GB.
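The following user-space C sketch simulates this two-step translation. The arrays stand in for physical memory and the page-frame number is made up; real hardware walks the tables pointed to by cr3.

    #include <stdint.h>
    #include <stdio.h>

    #define ENTRIES 1024  /* entries per page directory and per page table */

    int main(void)
    {
        uint32_t linear = 0x08049f04;             /* example linear address */

        uint32_t dir    = (linear >> 22) & 0x3FF; /* 10 MSBs: directory index */
        uint32_t table  = (linear >> 12) & 0x3FF; /* middle 10 bits: table index */
        uint32_t offset =  linear        & 0xFFF; /* 12 LSBs: offset in page */

        static uint32_t page_directory[ENTRIES];  /* stand-in for table at cr3 */
        static uint32_t page_table[ENTRIES];

        /* Pretend the kernel mapped this page to page frame 0x1234.
         * Real entries also carry flag bits (present, writable, ...). */
        page_directory[dir] = 0x1;
        page_table[table]   = 0x1234;

        uint32_t frame    = page_table[table];
        uint32_t physical = (frame << 12) | offset;

        printf("dir=%u table=%u offset=0x%03x -> physical 0x%08x\n",
               dir, table, offset, physical);
        return 0;
    }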

Extended Paging

Extended paging is achieved by removing the page-table level of translation: the linear address is then divided between the page directory (the 10 MSBs) and the offset (the 22 LSBs).

The 22 LSBs define a 4 MB (2^22 bytes) page frame. Extended paging can coexist with the normal paging model and is used to map large contiguous ranges of linear addresses to corresponding physical addresses. The operating system removes the page-table level to provide extended paging; this is enabled by setting the PSE (page size extension) flag.

The 36-bit PSE extends the physical address to 36 bits and supports 4 MB pages while keeping 4-byte page directory entries, providing a way to address physical memory larger than 4 GB without extensive changes to the operating system. This approach has some practical limitations with respect to demand paging.


Paging model in Linux

Paging in Linux is similar to normal paging, but on top of the x86 architecture Linux introduces a three-level page table mechanism consisting of:

  • Page Global Directory (PGD), the top level of abstraction in the multi-level page table. Each level of the page table handles a different size of memory: the global directory can handle areas of 4 MB. Each entry points to a smaller, lower-level directory table, so the PGD is a directory of page tables. When code traverses this structure (as some drivers must), it is said to "walk" the page tables.
  • Page Middle Directory (PMD), the intermediate level of the page table. On the x86 architecture, the PMD does not exist in hardware; it is folded into the PGD in kernel code.
  • Page Table Entry (PTE), the lowest level of the page table, which deals directly with pages (see PAGE_SIZE). A PTE holds the physical address of a page, along with bits indicating whether the entry is valid and whether the page is present in physical memory.

To support large memory areas, Linux uses this three-level paging mechanism. When large memory areas are not required, the middle directory can be defined with a size of one, which collapses the scheme back to two-level paging.

The paging levels are optimized at compile time: two- or three-level paging is enabled (using the same code) by enabling or disabling the middle directory. 32-bit processors use PMD paging, and 64-bit processors use PGD paging.

Figure 7. Three-Level Paging

In a 64-bit processor:

  • The 21 MSBs are reserved (unused)
  • The 13 LSBs are represented by the page offset
  • The remaining 30 bits are divided into:
    • 10 bits for the page table
    • 10 bits for the page global directory
    • 10 bits for the page middle directory

As this layout shows, 43 bits are actually used for addressing, so in a 64-bit processor the effectively usable memory is 2 to the power of 43.

Each process has its own page directory and page tables. To reference a page frame containing actual user data, the operating system (on x86) begins by loading the PGD into the cr3 register. Linux stores the contents of the cr3 register in the TSS segment. Whenever a new process is scheduled to execute on the CPU, cr3 is reloaded from that process's TSS segment, so that the paging unit references the correct set of page tables.

Each entry in the PGD table points to a page frame containing a group of PMD entries; each entry in the PMD table points to a page frame containing a group of PTE entries; and each entry in the PTE table points to a page frame containing the actual user data. If a page being looked up has been swapped out, a swap entry is stored in the PTE (consulted on a page fault) to locate which page frame to reload into memory.

Figure 8 shows how the offsets for the successive levels of page tables are added to map to the corresponding page frame entry. These offsets are obtained by splitting the linear address that arrives as the output of the segmentation unit. The kernel uses a set of macros to split a linear address into its per-level components; this article does not cover those macros in detail. Figure 8 shows how a linear address is divided.

Figure 8. Linear addresses with different address lengths
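As a rough illustration of what such splitting macros do, here is a user-space sketch of per-level index extraction for the 32-bit case with the PMD folded into the PGD. The SIM_* names and constants are illustrative stand-ins for the kernel's PAGE_SHIFT, PGDIR_SHIFT, PTRS_PER_PTE, and related definitions, which vary by architecture and kernel version:

    #include <stdint.h>

    #define SIM_PAGE_SHIFT   12
    #define SIM_PMD_SHIFT    22   /* PMD folded into PGD: same shift as PGDIR */
    #define SIM_PGDIR_SHIFT  22

    #define SIM_PTRS_PER_PTE 1024
    #define SIM_PTRS_PER_PMD 1    /* middle directory defined with size one */
    #define SIM_PTRS_PER_PGD 1024

    static inline unsigned long sim_pgd_index(unsigned long addr)
    {
        return (addr >> SIM_PGDIR_SHIFT) & (SIM_PTRS_PER_PGD - 1);
    }

    /* With PTRS_PER_PMD == 1, this always yields 0: the middle level
     * disappears, and the walk degenerates to two-level paging. */
    static inline unsigned long sim_pmd_index(unsigned long addr)
    {
        return (addr >> SIM_PMD_SHIFT) & (SIM_PTRS_PER_PMD - 1);
    }

    static inline unsigned long sim_pte_index(unsigned long addr)
    {
        return (addr >> SIM_PAGE_SHIFT) & (SIM_PTRS_PER_PTE - 1);
    }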

Reserved page

Linux reserves several page frames for kernel code and data structures. These pages are never swapped out to disk. Linear addresses from 0x0 to 0xc0000000 (PAGE_OFFSET) can be referenced by both user code and kernel code. Linear addresses from PAGE_OFFSET to 0xffffffff can be accessed only by kernel code.

This means that only 3 GB of the address space is available to user applications.

How to enable Paging

The paging mechanism used by Linux processes involves two phases:

  • At startup, the system sets up page tables for 8 MB of physical memory.
  • In the second phase, the remaining physical addresses are mapped.

In the startup phase, the startup_32() call is responsible for initializing the paging mechanism; it is implemented in the arch/i386/kernel/head.S file. The mapping of these 8 MB occurs above PAGE_OFFSET. The initialization begins with a statically defined, compile-time array (swapper_pg_dir), which is placed at a specific address (0x00101000) at compile time.

This operation creates page tables for two pages statically defined in the code, pg0 and pg1. These page frames default to 4 KB in size, unless the page size extension bit is set (for more about PSE, see the section on extended paging). The address of the data pointed to by this global array is stored in the cr3 register, which I consider the first stage of setting up the paging unit for Linux processes. The remaining page entries are completed in the second phase.

The second phase is completed by the call to paging_init().

On 32-bit x86 architectures, RAM is mapped between PAGE_OFFSET and 0xffffffff (4 GB). This means about 1 GB of RAM can be mapped when Linux boots, which is what happens by default. However, if the HIGHMEM configuration option (CONFIG_HIGHMEM) is set, more than 1 GB of memory can be mapped into the kernel; remember, though, that such a mapping is a temporary arrangement, made by calling kmap().
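Here is a schematic kernel-style sketch of that temporary arrangement: allocate a page that may come from high memory, map it into the kernel's address space with kmap() just long enough to touch it, then unmap it. This is illustrative code under the pre-kmap_local() API, not tied to any particular kernel version:

    #include <linux/mm.h>
    #include <linux/highmem.h>
    #include <linux/gfp.h>
    #include <linux/string.h>
    #include <linux/errno.h>

    static int touch_highmem_page(void)
    {
        struct page *page;
        char *vaddr;

        page = alloc_page(GFP_HIGHUSER);   /* may come from ZONE_HIGHMEM */
        if (!page)
            return -ENOMEM;

        vaddr = kmap(page);                /* temporary kernel mapping */
        memset(vaddr, 0, PAGE_SIZE);       /* the page is now addressable */
        kunmap(page);                      /* release the mapping promptly */

        __free_page(page);
        return 0;
    }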


Physical memory area

I have shown that the (32-bit) Linux kernel divides virtual memory in a 3:1 ratio: 3 GB of virtual memory for user space and 1 GB for kernel space. The kernel code and its data structures must all fit within this 1 GB address space, but the largest consumer of this address space is the virtual mapping of physical memory.

This is a problem because the kernel cannot manipulate memory that is not mapped into its own address space. The maximum amount of memory the kernel can handle is therefore the virtual address space that can be mapped into the kernel, minus the space needed for the kernel code itself. As a result, an x86-based Linux system can use slightly less than 1 GB of physical memory.

To serve large numbers of users, support more memory, improve performance, and establish an architecture-independent way of describing memory, the Linux memory model had to evolve. To achieve these goals, the newer model assigns banks of memory to per-CPU spaces. Each space is called a node, and each node is divided into zones. A zone (representing a range of memory) is further classified into one of the following types:

  • ZONE_DMA (0-16 MB): the range of low physical memory required by certain ISA/PCI devices.
  • ZONE_NORMAL (16-896 MB): the range of physical memory that is directly mapped by the kernel. All kernel operations can run only in this zone, so it is the most performance-critical one.
  • ZONE_HIGHMEM (896 MB and above): the remaining available memory in the system, which is not mapped into the kernel.

In the kernel, the concept of a node is implemented by the struct pglist_data structure, and a zone by the struct zone_struct structure. Physical page frames are represented by struct page, and all of these structs are kept in the global mem_map array, which is stored at the beginning of ZONE_NORMAL. Figure 9 shows the basic relationship between nodes, zones, and page frames.

Figure 9. Relationship between nodes, zones, and page frames

The high memory zone appeared in kernel memory management when support for the Pentium II's virtual memory extension (PAE, physical address extension, which enables access to 64 GB of memory on 32-bit systems) and support for 4 GB of physical memory (also on 32-bit systems) were added. It is a concept used on the x86 platform, among others. Typically, this 4 GB of memory is made accessible by mapping ZONE_HIGHMEM into ZONE_NORMAL through kmap(). Note that it is unwise to use more than 16 GB of memory on a 32-bit architecture, even with PAE enabled.

(PAE is Intel's memory address extension mechanism. It supports applications through the host operating system's use of the Address Windowing Extensions API, allowing the processor to expand the number of bits used to address physical memory from 32 to 36.)

This physical memory is managed by the zone allocator, which is responsible for dividing memory into zones and treating each zone as a unit for allocation. Any specific allocation request uses a list of zones from which the kernel can try to allocate, in preferred-to-fallback order.

For example:

  • A request for a user page can first be satisfied from the "normal" zone (ZONE_NORMAL);
  • if that fails, the allocator tries ZONE_HIGHMEM;
  • if that also fails, it tries ZONE_DMA.

The zone list for such an allocation consists of the ZONE_NORMAL, ZONE_HIGHMEM, and ZONE_DMA zones, in that order. On the other hand, a request for a DMA page may be satisfied only from the DMA zone, so the zone list for such a request contains only the DMA zone.
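In kernel code, the choice of zone list is driven by allocation flags. The sketch below is illustrative (flag semantics have shifted across kernel versions), but it uses real allocator entry points:

    #include <linux/gfp.h>
    #include <linux/mm.h>

    static void zone_allocation_examples(void)
    {
        /* Ordinary kernel allocation: prefers ZONE_NORMAL, may fall
         * back to ZONE_DMA. */
        struct page *p1 = alloc_page(GFP_KERNEL);

        /* User page: ZONE_HIGHMEM is also an acceptable source. */
        struct page *p2 = alloc_page(GFP_HIGHUSER);

        /* DMA-capable allocation: only ZONE_DMA may satisfy this. */
        struct page *p3 = alloc_page(GFP_KERNEL | GFP_DMA);

        if (p1) __free_page(p1);
        if (p2) __free_page(p2);
        if (p3) __free_page(p3);
    }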


Conclusion

Memory management is a large, complex, and time-consuming set of tasks, and a difficult one to implement, because crafting a model of how a system behaves in a real multiprogrammed environment is hard work. Components such as scheduling, paging behavior, and the interactions of many processes all pose serious challenges. I hope this article helps you understand the basics needed to take on the challenge of Linux memory management and gives you a starting point.

