Linux memory management

Source: Internet
Author: User

Linux memory management divides into two parts: the mapping between virtual addresses and physical addresses, and kernel memory allocation management (mainly slab-based).

Mapping between physical and virtual addresses

1. Concepts

Physical address

It is used for chip-level addressing of memory and corresponds to the address bus connecting the CPU to memory. -- Of these concepts this one should be the easiest to grasp, but it deserves a caveat. You can picture the physical address as the memory itself plugged into the machine: view memory as a big array numbered from byte 0 up to the maximum, and call the array index the physical address. In fact, though, this picture is just an abstraction the hardware presents to software; real memory addressing does not work that way. So it is more accurate to say the physical address "corresponds to the address bus". Still, if you set aside how physical memory is really addressed and simply map physical addresses one-to-one onto physical memory, that is acceptable too; sometimes a slightly wrong mental model makes the abstraction easier to hold.

Virtual memory

This is an abstraction over memory as a whole, not the chips actually plugged into the machine. It stands in contrast to physical memory and can be understood directly as "indirect" or "fake" memory. For example, the virtual address 0x08000000 is not element 0x08000000 of the big array sitting at that physical address.

It exists because modern operating systems provide a memory-management abstraction called virtual memory. A process uses addresses in virtual memory, and the operating system, assisted by the hardware, "converts" them into real physical addresses. This conversion is the key to everything discussed here. With this abstraction, a program can use an address space much larger than physical memory (robbing Peter to pay Paul, with the bank's help), and multiple processes can even use the same address -- not surprising, because the converted physical addresses differ. You can disassemble a linked program and see that the linker has already assigned addresses: a call to function A is compiled not as "call
A" but as "call 0x08111111". In other words, the address of function A is fixed at link time. Without this conversion there would be no concept of virtual addresses, and such a scheme simply would not work.

But the story does not end there.

Logical address

Intel retains the segmented memory management of earlier days for compatibility. A logical address is the address that appears in a machine-language instruction to specify an operand or another instruction. In the example above, the address 0x08111111 assigned to A by the linker is a logical address. -- Sorry, strictly that violates Intel's requirements for logical addresses under segmented management: "a logical address is a segment identifier plus an offset relative to the start of the specified segment, written [segment identifier : intra-segment offset]. So 0x08111111 in the example above should really be written [code-segment identifier : 0x08111111] to be complete."

Linear address (virtual address)

Like the logical address, it is an intermediate, not a real physical address. If the logical address is the address before the hardware's segmentation unit has translated it, the linear address is the address before the hardware's paging unit has translated it.

The CPU takes two steps to convert an address in virtual memory into a physical address. First we are given a logical address (really a segment offset -- this must be understood!!). The CPU's segmentation unit converts the logical address into a linear address, and then its paging unit converts that into the final physical address. The double conversion is indeed troublesome and arguably unnecessary, since linear addresses could be exposed to processes directly. The redundancy exists because Intel insists on full backward compatibility.

  2. Segmented memory management in the CPU: converting logical addresses to linear addresses

A logical address is composed of two parts: a segment identifier and an intra-segment offset. The segment identifier is a 16-bit field called the segment selector. Its upper 13 bits are an index number; the low 3 bits hold some hardware details.

The last two fields involve privilege checks, which are outside the scope of this post.

The index number can be understood directly as an array subscript -- so what array does it index? An array of "segment descriptors". Haha: a segment descriptor concretely describes a segment (I picture the word "segment" as taking a knife and cutting virtual memory into several chunks). Many segment descriptors together form an array called the "segment descriptor table", so the upper 13 bits of the segment identifier can be used to find a specific segment descriptor directly in that table. The descriptor describes a segment; since the cutting image just now was imprecise, let's look at what the descriptor actually contains -- that is, how it describes a segment -- and then we will know exactly what a segment is and what each one looks like.
Each descriptor consists of 8 bytes, for example:
 
These fields are complicated. Although we could define them all with a data structure, here I care about only one: the Base field, which gives the linear address at which the segment begins.

Intel intended some global segment descriptors to be placed in the "global descriptor table" (GDT), and some local ones, per process, in a "local descriptor table" (LDT). When should the GDT be used and when the LDT? That is indicated by the TI field of the segment selector: TI = 0 means GDT, TI = 1 means LDT.

The address and size of the GDT in memory are kept in the CPU's GDTR register, and those of the LDT in the LDTR register. So many concepts -- it reads like a tongue twister. The figure makes it more intuitive:
 
First, take a complete logical address [segment selector : intra-segment offset].

1. Look at the selector's TI field (0 or 1) to decide whether the segment to be translated is described in the GDT or the LDT, then get that table's address and size from the corresponding register. Now we have the array.

2. Take the upper 13 bits of the selector and use them to find the corresponding segment descriptor in that array. Now the Base address is known.

3. Base + offset is the linear address we wanted.

Quite simple. On the software side, in principle, you only have to prepare the information the hardware needs so that the hardware can complete the conversion. OK, let's see how Linux does it.

  3. Linux segment management

Intel requires the two-step conversion. It preserves compatibility but it is redundant. Oh well -- if the hardware demands it, the software can only play along, going through the motions.

On the other hand, some other hardware platforms have no notion of this second conversion, and Linux needs to provide a higher-level abstraction with a unified interface. So Linux's segment management is really just a way of "fooling" the hardware. By Intel's design the GDT would be used globally and each process would have its own LDT -- but Linux makes all processes use the same segments to address instructions and data: a user data segment and user code segment, with a corresponding kernel data segment and kernel code segment. The segments exist only for form's sake, like writing a year-end report.
include/asm-i386/segment.h:

 

#define GDT_ENTRY_DEFAULT_USER_CS 14
#define __USER_CS (GDT_ENTRY_DEFAULT_USER_CS * 8 + 3)
#define GDT_ENTRY_DEFAULT_USER_DS 15
#define __USER_DS (GDT_ENTRY_DEFAULT_USER_DS * 8 + 3)
#define GDT_ENTRY_KERNEL_BASE 12
#define GDT_ENTRY_KERNEL_CS (GDT_ENTRY_KERNEL_BASE + 0)
#define __KERNEL_CS (GDT_ENTRY_KERNEL_CS * 8)
#define GDT_ENTRY_KERNEL_DS (GDT_ENTRY_KERNEL_BASE + 1)
#define __KERNEL_DS (GDT_ENTRY_KERNEL_DS * 8)

 

Substituting numeric values for the macros:

 

 

#define __USER_CS 115     [0000000001110 0 11]
#define __USER_DS 123     [0000000001111 0 11]
#define __KERNEL_CS 96    [0000000001100 0 00]
#define __KERNEL_DS 104   [0000000001101 0 00]

 

The bracketed values are the 16-bit binary representations of the four segment selectors, from which their index numbers and TI field values can be read off.

 

 

__USER_CS   index = 14  TI = 0
__USER_DS   index = 15  TI = 0
__KERNEL_CS index = 12  TI = 0
__KERNEL_DS index = 13  TI = 0

 

Since TI is 0, the GDT is used. Look at the corresponding entries 12-15 of the GDT as initialized in arch/i386/head.S:

 

.quad 0x00cf9a000000ffff /* 0x60 kernel 4GB code at 0x00000000 */
.quad 0x00cf92000000ffff /* 0x68 kernel 4GB data at 0x00000000 */
.quad 0x00cffa000000ffff /* 0x73 user 4GB code at 0x00000000 */
.quad 0x00cff2000000ffff /* 0x7b user 4GB data at 0x00000000 */

Expanding these according to the segment descriptor layout described earlier, you will find that the bits holding the Base field are all 0 -- that is, the base address of all four segments is 0.

So, given an intra-segment offset, the conversion formula above yields 0 + offset = linear address, and an important conclusion follows: "In Linux, logical addresses and linear addresses always coincide (they are equal, not merely 'similar' as some people say): the offset field of a logical address is always equal to the linear address value!!!"

Many details are skipped here, such as segment privilege checks. Haha. Also, under Linux most processes never use the LDT, unless they run Windows programs under Wine.

  4. Page memory management in the CPU

The CPU's paging unit is responsible for translating a linear address into a physical address. For management and efficiency, linear addresses are divided into fixed-length groups called pages. For example, a 32-bit machine has at most 4 GB of linear addresses; taking 4 KB as the page size, the whole linear address space divides into a big array total_page[2^20] -- 2^20 pages in all. This big array is called the page directory, and each directory entry is an address: the address of the corresponding page.

The other kind of "page" is the physical page, also called a page frame. It is the fixed-length unit into which the paging mechanism divides all of physical memory, and its length generally corresponds one-to-one with the memory page. Note that the total_page array has 2^20 entries, each of which is an address (on a 32-bit machine an address is 4 bytes), so representing the array alone would consume 4 MB of memory. To save space, a two-level scheme is introduced to organize the paging units. Words are tiring -- the picture is more intuitive:
 
Specifically:

1. Within the paging unit the page directory is unique; its address is held in the CPU's CR3 register and is the starting point of address conversion. The long march begins.

2. Every active process has its own virtual memory and hence its own unique page directory, at an independent address. -- To run a process, its page directory address must be loaded into that register, and the previous one saved.

3. Each 32-bit linear address is divided into three parts: page directory index (10 bits) : page table index (10 bits) : offset (12 bits).

Conversion then follows these steps:

1. Fetch the process's page directory address (the operating system loads it into the register when it schedules the process);

2. Use the top 10 bits of the linear address to index into that array. Because the two-level scheme was introduced, a page directory entry is no longer the address of a page but the address of a page table (one more array is introduced), and the page addresses are kept in the page tables.

3. Use the middle 10 bits of the linear address to index into the page table (also an array) and find the page's starting address;

4. Add the page's starting address and the low 12 bits of the linear address to get the physical address we finally wanted.

The conversion process is quite simple and is done entirely by hardware. Although there is one extra step, the memory it saves is well worth it. Let's quickly verify:

1. Can the two-level scheme still cover 4 GB of addresses? There are 2^10 page directory entries, hence up to that many page tables, each covering 2^10 pages,
and each page addresses 2^12 bytes: 2^10 * 2^10 * 2^12 = 2^32 = 4 GB. Yes.

2. Does the two-level scheme really save space? Adding up the page directory entries and the page table entries, (2^10 * 4 + 2^10 * 4) bytes = 8 KB. Eh...... how does that save anything?!!
A red-ink mistake -- mark it, to be revisited later...... According to the explanation in <Computer Systems: A Programmer's Perspective>, the space savings of the two-level scheme come from two things:

A. If an entry in the first-level table is empty, the corresponding second-level page table need not exist at all. That is a huge potential saving, because for a typical program most of the 4 GB virtual address space is unallocated;

B. Only the first-level table must stay resident in main memory. The virtual memory system can create second-level page tables on demand and page them in or out, which relieves pressure on main memory; only the most heavily used second-level tables need stay cached there. -- Linux, however, does not fully enjoy this benefit: its page directory and the page tables for allocated pages are all resident in memory. It is worth mentioning that although page directory entries and page table entries are both 4 bytes (32 bits), only the high 20 bits are used; the low 12 bits are masked to 0. Masking the low 12 bits of a page table entry is easy to understand: that way each entry corresponds to a page boundary, an integral multiple, which makes the arithmetic much easier. But why does a page directory entry also mask 12 low bits, when by the same reasoning it would only need to mask 10? I
think it is because 12 > 10 lets the page directory and the page tables share the same data structure, for convenience.

This post only introduces the general translation principle; covering extended paging, the page protection mechanism, PAE-mode paging and so on would make it far too long...... consult more specialized books for those.

  5. Linux page-based memory management

In principle, Linux only needs to allocate the required data structures for each process, put them in memory, and then, when the process is scheduled, switch the CR3 register; the hardware does the rest (haha -- in reality it is much more complicated, but I am analyzing only the most basic flow). I described the i386 two-level page management architecture above, but some CPUs use three- or even four-level architectures. Linux provides a unified interface for all of them, an abstraction at a higher level: a four-level page management architecture compatible with two-, three- and four-level hardware. The four levels are:

Page Global Directory, PGD (corresponding to the page directory above)

Page Upper Directory, PUD (new)

Page Middle Directory, PMD (new)

Page Table, PT (corresponding to the page table above).

Following the hardware translation principle, the whole conversion still requires only the two-level array indexing, like this:
 
So how do we reconcile 32-bit hardware that uses the two-level scheme with a four-level conversion? Well, look at how the linear address is divided in that case! From the hardware's point of view, the 32-bit address is split into three parts -- that is, whatever the software does, what finally reaches the hardware is those three parts.

From the software's point of view, two more parts are introduced, making five. -- Reconciling this with two-level hardware is easy to understand: when dividing the address, give the page upper directory and page middle directory fields a length of 0 bits. Then the operating system sees five parts, the hardware still divides by its rigid three, and nothing goes wrong -- we have built a harmonious computer system.

Likewise, with 64-bit addresses on a CPU with a four-level hardware translation architecture, we simply do not zero out the two middle fields, and software and hardware are in harmony again -- abstraction is powerful!!!

For example, suppose a logical address has already been converted into the linear address 0x08147258. In binary that is:

0000100000 0101000111 001001011000

The kernel divides this address as:

pgd    = 0000100000
pud    = 0
pmd    = 0
pt     = 0101000111
offset = 001001011000

Now we can see how Linux's trick plays to the hardware: the hardware never sees the so-called PUD and PMD, so in essence the PGD index must lead directly to the PT's address, rather than going through PUD and PMD lookups (although those two occupy the linear address with length 0, and 2^0 = 1, i.e. each is an array with a single element). So how does the kernel arrange this sensibly?

From the software's point of view, since each of those tables has only one 32-bit entry, it can hold an address pointer just as a PGD entry can. So the supposed mapping from PGD to PUD to PMD degenerates into keeping the original value unchanged and passing it along one-to-one. "Logically it points to a PUD and then to a PMD, but physically it points straight at the corresponding PT, because the hardware knows nothing of PUD or PMD." The address is then handed to the hardware, which divides it and sees the following:

Page Directory = 0000100000

PT = 0101000111

Offset = 001001011000

Well then: index into the page directory array with 0000100000 (= 32), read the entry found there, take its high 20 bits, and that is the page table's address (page tables are allocated dynamically by the kernel). Index into the page table the same way, add the offset, and the final physical address is obtained.

Kernel Memory Allocation Management

A memory management scheme should achieve the following two goals:

  • Minimize the time spent on memory management
  • Maximize the memory available for general use (minimize management overhead)

1. Direct heap allocation

One memory-management method uses a heap-based allocation policy. In this approach, a large block of memory (called the heap) is used to supply memory for user-defined purposes. When users need a chunk of memory, they request an allocation of a certain size. The heap manager searches the available memory (using a particular algorithm) and returns a chunk. Algorithms used in the search include first-fit (the first block found in the heap that satisfies the request) and best-fit (the block in the heap that satisfies the request most closely). When users are done with the memory, it is returned to the heap.

The fundamental problem with this heap-based allocation policy is fragmentation. Because memory blocks are allocated and returned at different times and in different orders, holes are left in the heap, and it takes time to manage the free memory effectively. The scheme is usually quite memory-efficient (it allocates just what is needed), but spends more time managing the heap.

2. Buddy allocation algorithm

Another method, called buddy memory allocation, is a faster allocation technique. It divides memory into power-of-two sized partitions and satisfies requests on a best-fit basis. When a user frees memory, the allocator checks the buddy block to see whether the adjacent block has also been freed; if so, the two are merged, minimizing memory fragmentation. This algorithm is more time-efficient, but rounding requests up to a power of two wastes memory.

3. Slab

Much has been written about the slab allocator. Briefly: the kernel frequently requests fixed-size pieces of memory, generally for structs. These structs usually share common initialization work, such as setting up semaphores, linked-list pointers, and member fields. Jeff Bonwick, a heavyweight at Sun, found that the kernel spends longer initializing these structs than allocating them. So he designed an algorithm: when such a struct's space is released, it is merely returned to an allocated-and-constructed state rather than actually freed, so the next request saves the initialization time. The whole process can be understood as borrowing whiteboards. Requesting space is borrowing several whiteboards from someone. Because each whiteboard serves a different purpose, you must first draw a different form on each board and then fill in the contents. The ordinary algorithm returns a board to its owner as soon as you finish with it; next time you borrow one, you must draw the form again. The optimized, slab-style algorithm keeps the used board around temporarily: it erases the filled-in contents but leaves the form, and when a board is next needed it hands over the right one based on the intended use, ready to fill in directly. That saves both the borrowing step and the form-drawing step.

I. Basic view of the slab allocator

* The slab allocator treats memory regions as objects, and divides the main memory area containing each cache into multiple slabs;

* The slab allocator groups objects into caches; each cache is a "store" of objects of the same type;

* Each slab consists of one or more contiguous page frames, containing both allocated and free objects;

* The slab allocator obtains its pages from the buddy system's page allocator.

II. Advantages of the slab cache allocator

1) The kernel relies heavily on allocating small objects, which happens countless times over the system's lifetime. The slab cache allocator serves this by caching objects of similar sizes, avoiding the usual fragmentation problems. 2) The slab allocator also supports the initialization of common objects, avoiding repeatedly initializing an object for the same purpose. 3) Slab alignment and coloring work with the hardware cache, spreading objects from different caches across different cache lines, which improves cache utilization and performance.

malloc in uClibc-0.9.28 can call mmap to obtain memory, or use sbrk, to interact with the kernel's memory management; speculatively, the kernel side may go through slab as well.
