About Linux Memory Management

Source: Internet
Author: User

Linux memory management is divided into two main parts: the mapping between virtual addresses and physical addresses, and kernel memory allocation management (mainly slab-based).

Mapping between virtual addresses and physical addresses

1. Concept

Physical address

A physical address is the unit of addressing at the memory-chip level, corresponding to the address bus that connects the CPU to memory. This should be the easiest of these concepts to grasp. It is tempting to identify physical addresses directly with the memory installed in the machine: treat memory as a large array numbered byte by byte from 0 up to its maximum, and call the array index the physical address. That, however, is only a hardware-to-software abstraction; memory is not really addressed that way. So saying that a physical address "corresponds to the address bus" is more accurate, although if you set aside how physical memory is actually addressed, a one-to-one correspondence between physical addresses and physical memory is also acceptable. Perhaps the imprecise understanding is even better for forming the abstract picture.

Virtual memory

This describes the entire address space a program sees (not the memory modules actually installed in the machine). It stands in contrast to physical memory and can be understood as "indirect" or "fake" memory. For example, the memory address 0x08000000 does not correspond to element 0x08000000 of the large physical-address array.

This is possible because modern operating systems provide an abstraction over memory: virtual memory. A process uses addresses in virtual memory, and the operating system, with hardware assistance, "translates" them into real physical addresses. This translation is the key to everything discussed here. With this abstraction, a program can use an address space much larger than real physical memory (robbing Peter to pay Paul, as banks do), and multiple processes can even use the same addresses. No conflict arises, because they translate to different physical addresses. If you disassemble a linked program, you can see that the linker has already assigned fixed addresses: a call to function A appears not as call A but as call 0x08111111, i.e., the address of A is fixed at link time. Without this translation, having each program believe it owns such fixed addresses would simply not be feasible.

Hold on, though; the questions do not stop there.

Logical address

For compatibility, Intel retains the ancient segment-based memory management. A logical address is the address used in a machine instruction to specify an operand or an instruction. In the example above, the 0x08111111 assigned by the linker is a logical address. Awkwardly, this seems to violate Intel's segment-management rules: strictly, a logical address is a segment identifier plus an offset within that segment, written as [segment identifier : intra-segment offset]. So the 0x08111111 above should really be written as [code-segment identifier : 0x08111111] to be complete.

Linear address (also called virtual address)

Like the logical address, it is an unreal address. If the logical address is the address before the hardware's segment-management unit translates it, then the linear address is the address before the hardware's paging unit translates it.

The CPU converts an address in the virtual address space into a physical address in two steps. First, given a logical address (strictly, the offset within a segment; this must be understood!), the CPU's segment memory-management unit converts it into a linear address. Then the page-based memory-management unit converts the linear address into the final physical address. Doing two conversions is cumbersome and, in principle, unnecessary; it would be simpler to hand the process a linear address directly. The redundancy exists because Intel must remain fully backward compatible.

  2. CPU segment-based memory management: converting a logical address to a linear address

A logical address consists of two parts: the segment identifier and the offset within the segment. The segment identifier is a 16-bit field called the segment selector. Its top 13 bits are an index; of the remaining 3 bits, one (TI) selects a descriptor table, and the last two (RPL) are involved in permission checks, which this post does not cover.

An index is just an array subscript, so there must be a corresponding array; what does it index? It indexes "segment descriptors", descriptions of segments. (For the word "segment", picture taking a knife to virtual memory and slicing it into a number of pieces.) Many segment descriptors together form an array, the "segment descriptor table", so the top 13 bits of the segment selector lead directly to a descriptor in that table. The descriptor describes a segment; my knife picture is not very accurate, but once we look at what a descriptor contains, i.e., how a segment is described, we understand what a segment is. Each segment descriptor is 8 bytes long, for example:

These fields are complex enough that one could define a data structure for them, but the only field that matters here is Base, which holds the linear address at which the segment begins.

Intel's design is that some descriptors are global and live in the "Global Descriptor Table" (GDT), while others are local, e.g., belonging to an individual process, and live in the "Local Descriptor Table" (LDT). When should the GDT be used, and when the LDT? The TI field of the segment selector decides: TI = 0 means the GDT, TI = 1 means the LDT.

The address and size of the GDT are stored in the CPU's GDTR control register, and those of the LDT in the LDTR register. That is a lot of concepts, like a tongue twister; a diagram would make it more intuitive.

So, given a complete logical address [segment selector : intra-segment offset]:

1. Look at the selector's TI bit (0 or 1) to know whether the segment to convert is in the GDT or the LDT, then read the corresponding register to obtain the table's address and size. Now we have the array.

2. Take the top 13 bits of the selector and index into this array to find the corresponding segment descriptor. Its Base field gives us the base address.

3. Base + offset = the converted linear address.

Quite simple: in principle the software only has to prepare the information the hardware needs and let the hardware do the conversion. OK, now let's see what Linux does.
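The three steps above can be sketched in a few lines of C. This is a toy model, not real hardware or kernel code: seg_descriptor, sel_index, and seg_to_linear are made-up names, and the descriptor is reduced to its Base field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor: only the Base field matters for translation. */
struct seg_descriptor { uint32_t base; /* limit, flags... omitted */ };

/* Decode the three fields of a 16-bit segment selector. */
static unsigned sel_index(uint16_t sel) { return sel >> 3; }        /* top 13 bits */
static unsigned sel_ti(uint16_t sel)    { return (sel >> 2) & 1; }  /* 0 = GDT, 1 = LDT */
static unsigned sel_rpl(uint16_t sel)   { return sel & 3; }         /* privilege bits */

/* Steps 1-3: pick a table by TI, index it, add Base to the offset. */
static uint32_t seg_to_linear(uint16_t sel, uint32_t offset,
                              const struct seg_descriptor *gdt,
                              const struct seg_descriptor *ldt)
{
    const struct seg_descriptor *table = sel_ti(sel) ? ldt : gdt;
    return table[sel_index(sel)].base + offset;
}
```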

  3. Linux Segment Management

Intel requires the two conversions for compatibility, but they are redundant. Well, there is no way around it: the hardware demands it, so the software can only comply; call it an exercise in formalism.

On the other hand, some hardware platforms have no notion of this second conversion, and Linux needs a higher-level abstraction that offers a unified interface across them. So Linux's segment management really just "placates" the hardware a little. By Intel's original intent, the GDT would hold global segments and each process would have its own LDT; Linux instead makes all processes use the same segments for instruction and data addressing: the user data segment and user code segment in user mode, and correspondingly the kernel data segment and kernel code segment in the kernel. There is nothing strange here; it is a formality, like the year-end summaries we all write.
include/asm-i386/segment.h

#define GDT_ENTRY_DEFAULT_USER_CS 14
#define __USER_CS (GDT_ENTRY_DEFAULT_USER_CS * 8 + 3)
#define GDT_ENTRY_DEFAULT_USER_DS 15
#define __USER_DS (GDT_ENTRY_DEFAULT_USER_DS * 8 + 3)
#define GDT_ENTRY_KERNEL_BASE 12
#define GDT_ENTRY_KERNEL_CS (GDT_ENTRY_KERNEL_BASE + 0)
#define __KERNEL_CS (GDT_ENTRY_KERNEL_CS * 8)
#define GDT_ENTRY_KERNEL_DS (GDT_ENTRY_KERNEL_BASE + 1)
#define __KERNEL_DS (GDT_ENTRY_KERNEL_DS * 8)

Expanding the macros into values:

#define __USER_CS 115   [0000000001110 0 11]
#define __USER_DS 123   [0000000001111 0 11]
#define __KERNEL_CS 96  [0000000001100 0 00]
#define __KERNEL_DS 104 [0000000001101 0 00]

The square brackets show the 16-bit binary representations of the four segment selectors, split as [13-bit index | TI | RPL], from which the index and TI values can be read off:

__USER_CS   index = 14, TI = 0
__USER_DS   index = 15, TI = 0
__KERNEL_CS index = 12, TI = 0
__KERNEL_DS index = 13, TI = 0

TI is 0 in every case, meaning the GDT is used. Now look at entries 12-15 of the initialized GDT (arch/i386/kernel/head.S):

.quad 0x00cf9a000000ffff	/* 0x60 kernel 4GB code at 0x00000000 */
.quad 0x00cf92000000ffff	/* 0x68 kernel 4GB data at 0x00000000 */
.quad 0x00cffa000000ffff	/* 0x73 user 4GB code at 0x00000000 */
.quad 0x00cff2000000ffff	/* 0x7b user 4GB data at 0x00000000 */

Unfolding these according to the segment-descriptor layout described earlier, you find that the bits making up the Base field are all 0; that is, the base address of all four segments is 0.

Thus, given an intra-segment offset, the earlier conversion formula yields 0 + offset = linear address, and we reach an important conclusion: "Under Linux, logical addresses and linear addresses always coincide (coincide, not merely 'are similar' as some say): the offset field of the logical address always has the same value as the linear address."

Far too many details, such as segment permission checks, are ignored here. Oh well. Under Linux, most processes do not use the LDT at all, unless you are running Windows programs under Wine.

  4. CPU page-based memory management

The CPU's paging unit is responsible for finally translating a linear address into a physical address. For management and efficiency, the linear address space is divided into fixed-length units called pages. On a 32-bit machine, for example, the linear address space is 4GB; divided into 4KB pages, it yields 2^20 pages, conceptually one large array, total_page[2^20]. This large array is what we call the page directory. Each entry in the directory is an address: the address of the corresponding page.

There is another kind of "page", the physical page, or page frame. The paging unit divides all of physical memory into fixed-length management units whose length normally matches the linear page size one to one. Note that the total_page array has 2^20 members, each an address (4 bytes on a 32-bit machine), so representing the array flat would take 4MB of memory. To save space, a two-level scheme was introduced to organize the paging units. A textual description is tiring; a picture would be more intuitive.


1. In the paging unit, the page directory is unique; its address is held in the CPU's CR3 register and is the starting point of address translation. The Long March begins here.

2. Every active process has its own virtual address space and hence its own unique page directory, so it also has its own page-directory address. To run a process, its page-directory address is loaded into CR3, and the previous one is saved.

3. Each 32-bit linear address is divided into three parts: page-directory index (10 bits) : page-table index (10 bits) : offset (12 bits).

Conversion then follows these steps:

1. Take the process's page-directory address from CR3 (the operating system loads it into the register when it schedules the process).

2. Using the top 10 bits of the linear address, index into the page directory to find the corresponding entry. Because of the two-level scheme, a directory entry is no longer the address of a page but the address of a page table. (An extra level of array has been introduced; the page addresses now live in the page tables.)

3. Using the middle 10 bits of the linear address, index into the page table (also an array) to find the starting address of the page.

4. Add the page's starting address to the low 12 bits of the linear address, and we finally have the prize we were after: the physical address.

This conversion process is quite easy to follow, and the hardware does all of it; despite the extra formalities, the memory saved makes it worthwhile. So let's quickly verify two things.
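The four steps can be simulated in userspace C. Everything here is illustrative (walk, dir_index, and the simulated tables are invented for this sketch), and real entries also carry flag bits in their low 12 bits, which this model ignores.

```c
#include <assert.h>
#include <stdint.h>

#define PT_ENTRIES 1024  /* 2^10 entries per directory or table */

/* Split a 32-bit linear address into its three parts. */
static uint32_t dir_index(uint32_t la)   { return la >> 22; }           /* top 10 bits */
static uint32_t table_index(uint32_t la) { return (la >> 12) & 0x3ff; } /* middle 10 bits */
static uint32_t page_offset(uint32_t la) { return la & 0xfff; }         /* low 12 bits */

/* Simulated walk: page_dir plays the role of the directory CR3 points at;
 * its entries point at page tables, whose entries hold page-frame bases. */
static uint32_t walk(uint32_t *const *page_dir, uint32_t la)
{
    uint32_t *page_table = page_dir[dir_index(la)];  /* step 2 */
    uint32_t frame = page_table[table_index(la)];    /* step 3 */
    return frame + page_offset(la);                  /* step 4 */
}
```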

1. Can the two-level scheme still cover the full 4GB? The page directory has 2^10 entries, i.e., up to that many page tables; each page table covers 2^10 pages; each page holds 2^12 addressable bytes. So 2^10 × 2^10 × 2^12 = 2^32 = 4GB.

2. Does the two-level scheme really save space? Naively, the page directory plus a page table occupy (2^10 + 2^10) entries × 4 bytes = 8KB. Hmm... how to put it!
That reasoning was flagged in red as an error and discussed later in the original thread. According to *Computer Systems: A Programmer's Perspective*, the two-level scheme saves space in two ways:

A. If a page-directory entry is empty, the second-level page table it would point to need not exist at all. This is a huge potential saving, because for a typical program the vast majority of the 4GB virtual address space is unallocated.

B. Only the first-level page directory must stay in main memory at all times. The virtual memory system can create the second-level page tables on demand and page them in or out, which reduces pressure on main memory; only the most heavily used second-level tables need to be cached there. (Linux does not fully enjoy this benefit: its page directory, and the page tables for allocated pages, are resident in memory.) It is worth mentioning that although directory and table entries are 4 bytes (32 bits), only the high 20 bits are used as an address; the low 12 bits are masked to 0. For page-table entries this is easy to understand: pages are 4KB, so page addresses are 4KB-aligned and their low 12 bits are naturally 0, which also makes the final addition a clean integer add. But why mask the low 12 bits of page-directory entries too, when masking the low 10 would seem to suffice? I think it is because 12 > 10: this lets directory entries and table entries share the same data structure and alignment, which is convenient.
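The space argument is easy to check with a little arithmetic. flat_table_bytes and two_level_bytes are hypothetical helper names; the numbers assume 4-byte entries and 2^10 entries per directory or table, as above.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRY_SIZE 4u    /* each entry: 4 bytes (high 20 bits = address) */
#define ENTRIES    1024u /* 2^10 entries per directory or table */

/* Flat one-level scheme: one entry per 4KB page of the 4GB space. */
static uint32_t flat_table_bytes(void)
{
    return (1u << 20) * ENTRY_SIZE;              /* 2^20 entries -> 4MB */
}

/* Two-level scheme: the directory is always present; a page table
 * exists only for directory slots that are actually in use. */
static uint32_t two_level_bytes(uint32_t tables_in_use)
{
    return ENTRIES * ENTRY_SIZE                  /* directory: 4KB */
         + tables_in_use * ENTRIES * ENTRY_SIZE; /* 4KB per live table */
}
```

For a sparse address space the saving is dramatic; only in the fully populated worst case does the extra directory cost anything.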

This post covers only the principle of the basic translation; extended paging, the page-protection mechanism, and PAE-mode paging are troublesome topics I will not go on about. Consult the professional books for those.

  5. Linux page-based memory management

In principle, Linux only needs to allocate the required data structures for each process, place them in memory, and switch the CR3 register when scheduling; the hardware completes the rest (in reality it is much more complex, but I am only analyzing the main flow). The i386 architecture described above uses two-level page management, but some CPUs use three- or even four-level architectures. To provide a higher-level abstraction with a unified interface for every CPU, Linux adopts a four-level page-management architecture compatible with two-, three-, and four-level hardware. The four levels are:

Page Global Directory, PGD (corresponding to the page directory above)

Page Upper Directory, PUD (newly introduced)

Page Middle Directory, PMD (also newly introduced)

Page Table, PT (corresponding to the page table above).

The overall conversion still follows the hardware's translation principle, just with two more array lookups along the way.

So how can software and hardware cooperate when the hardware uses a 32-bit two-level architecture but the conversion is now nominally four-level? Look at how the linear address is divided. From the hardware's point of view, the 32-bit address is split into three parts; no matter what the software does, what finally reaches the hardware is only those three.

From the software's point of view, two extra parts have been introduced, making five in all. To make five parts intelligible to two-level hardware is easy: when dividing the address, the Page Upper Directory and Page Middle Directory index lengths are set to 0 bits. The operating system then sees five parts while the hardware keeps its rigid three-part division, and nothing goes wrong; you could say we have built a harmonious computer system.

Superfluous as this looks, on a 64-bit CPU with a four-level translation architecture the middle two lengths simply stop being 0, and software and hardware are in harmony again. Quite a clever trick!

For example, suppose a logical address has already been converted into the linear address 0x08147258. In binary this is:

0000100000 0101000111 001001011000

The kernel divides this address as:

PGD    = 0000100000
PUD    = 0
PMD    = 0
PT     = 0101000111
offset = 001001011000
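This division can be sketched as a set of index-width constants; setting the PUD and PMD widths to 0 bits reproduces the split above. All names here are invented for illustration (the real kernel uses macros such as pgd_index() with a different mechanism).

```c
#include <assert.h>
#include <stdint.h>

/* Index-field widths. On 32-bit x86 without PAE, Linux gives PUD and
 * PMD zero bits, so each has exactly one entry (2^0 = 1) and the split
 * degenerates into the hardware's 10/10/12. */
#define PGD_BITS 10
#define PUD_BITS 0
#define PMD_BITS 0
#define PT_BITS  10
#define OFFSET_BITS 12

/* Extract `bits` bits of `la` starting at `shift`; a 0-bit field is 0. */
static uint32_t field(uint32_t la, unsigned shift, unsigned bits)
{
    return bits ? (la >> shift) & ((1u << bits) - 1) : 0;
}

static uint32_t pgd_idx(uint32_t la) { return field(la, OFFSET_BITS + PT_BITS + PMD_BITS + PUD_BITS, PGD_BITS); }
static uint32_t pud_idx(uint32_t la) { return field(la, OFFSET_BITS + PT_BITS + PMD_BITS, PUD_BITS); }
static uint32_t pmd_idx(uint32_t la) { return field(la, OFFSET_BITS + PT_BITS, PMD_BITS); }
static uint32_t pt_idx(uint32_t la)  { return field(la, OFFSET_BITS, PT_BITS); }
static uint32_t off(uint32_t la)     { return field(la, 0, OFFSET_BITS); }
```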

Now we can understand Linux's trick on the hardware. Since the hardware cannot see the so-called PUD and PMD, the PGD index must, in effect, lead directly to a PT address, rather than going through real lookups in the PUD and PMD (each takes 0 bits of the linear address, so each has 2^0 = 1 entry). How does the kernel arrange the addresses so that this works?

From the software's point of view, since the PUD (and likewise the PMD) has exactly one 32-bit entry, it can hold exactly the same address pointer that the PGD entry holds. The mappings "from PGD to PUD" and "from PUD to PMD" thus become identity steps: the value passes through unchanged, merely changing hands. This realizes the picture of "logically pointing to a PUD, then to a PMD, while physically pointing directly at the corresponding PT", because the hardware has no idea that PUD or PMD exist. To the hardware, the address division looks like:

Page directory = 0000100000
PT             = 0101000111
offset         = 001001011000

So: first use 0000100000 (decimal 32) to index the page-directory array and find the entry; take its high 20 bits to get the page table's address (the page table itself was dynamically allocated by the kernel); index the page table the same way; then add the offset, and we have the final physical address.



Kernel Memory Allocation Management

A memory-management approach should have the following two properties:

    • Minimize the time required to manage memory
    • Maximize the memory available for general application use (minimize administrative overhead)

1. Direct heap allocation

One class of memory managers uses a heap-based allocation policy: a large chunk of memory (the heap) provides memory for user-defined purposes. When users need memory, they request that a certain amount be allocated to them. The heap manager searches the available memory (using a specific algorithm) and returns a block. Search algorithms include first-fit (the first block found in the heap that satisfies the request) and best-fit (the block in the heap that most closely fits the request). When the user is done with the memory, it is returned to the heap.

The fundamental problem with heap-based allocation is fragmentation. Because blocks are allocated and returned in different orders at different times, holes open up in the heap, and managing the free memory efficiently takes time. Such algorithms are usually memory-efficient (allocating just what is needed), but they spend a lot of time managing the heap.
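A minimal first-fit heap might look like the following sketch. It is a teaching toy, not any real allocator: one static arena, a singly linked block list, and no coalescing of adjacent free blocks (which is exactly where the fragmentation described above comes from).

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_SIZE 4096

struct block {
    size_t size;          /* payload bytes in this block */
    int free;
    struct block *next;
};

static _Alignas(8) unsigned char arena[ARENA_SIZE];
static struct block *head;

static void heap_init(void)
{
    head = (struct block *)arena;
    head->size = ARENA_SIZE - sizeof(struct block);
    head->free = 1;
    head->next = NULL;
}

/* First fit: return the first free block large enough, splitting it
 * when the remainder can still hold a header plus some payload. */
static void *heap_alloc(size_t n)
{
    n = (n + 7) & ~(size_t)7;                     /* keep blocks aligned */
    for (struct block *b = head; b; b = b->next) {
        if (!b->free || b->size < n)
            continue;
        if (b->size >= n + sizeof(struct block) + 8) {
            struct block *rest =
                (struct block *)((unsigned char *)(b + 1) + n);
            rest->size = b->size - n - sizeof(struct block);
            rest->free = 1;
            rest->next = b->next;
            b->size = n;
            b->next = rest;
        }
        b->free = 0;
        return b + 1;                             /* payload after header */
    }
    return NULL;                                  /* no block fits */
}

static void heap_free(void *p)
{
    if (p)
        ((struct block *)p - 1)->free = 1;        /* leaves holes behind */
}
```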

2. Buddy allocation algorithm

The second method, buddy memory allocation, is a faster technique that divides memory into power-of-two partitions and satisfies requests with a best fit among those size classes. When the user frees memory, the allocator checks the buddy block to see whether the adjacent block of the pair has also been freed; if so, the two are merged, minimizing memory fragmentation. The algorithm is time-efficient, but rounding up to power-of-two sizes wastes memory.
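The power-of-two bookkeeping is the heart of the scheme: a block of order k spans 2^k pages, and its buddy is found by flipping a single address bit. A small sketch (order_for and buddy_of are invented names; real implementations also keep per-order free lists and merge buddies recursively):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4KB pages, as in the paging discussion above */

/* Smallest order whose block holds `bytes` (best fit by size class). */
static unsigned order_for(uint32_t bytes)
{
    unsigned order = 0;
    while (((uint32_t)1 << (order + PAGE_SHIFT)) < bytes)
        order++;
    return order;
}

/* Buddy of the order-`order` block starting at `off` (relative to the
 * pool base): flip the one bit that distinguishes the pair. */
static uint32_t buddy_of(uint32_t off, unsigned order)
{
    return off ^ ((uint32_t)1 << (order + PAGE_SHIFT));
}
```

Merging is possible exactly when the block at buddy_of(off, order) is also free, producing one block of order + 1.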

3. Slab

A great deal has been written about the slab allocator. Briefly: the kernel frequently requests memory of certain fixed sizes, usually for structures, and these structures tend to share common initialization work, for example initializing embedded semaphores, linked-list pointers, and other members. Jeff Bonwick of Sun observed that the kernel spends longer initializing these structures than allocating them. So he designed an algorithm in which releasing such a structure simply returns it to its just-allocated (still initialized) state rather than truly freeing it, saving the initialization time on the next request. The whole process can be understood through a whiteboard-borrowing analogy. Requesting memory is borrowing whiteboards; each use means drawing a particular table on the board and then filling it in. The naive algorithm returns the board as soon as you finish, so next time you must borrow it again and redraw the table. The slab-style optimization is, when you finish, to erase only the filled-in contents, keep the table drawn, and hold on to the board instead of returning it. The next time that kind of board is needed, you take one with the table already drawn and just fill in the contents, eliminating both the borrowing and the table-drawing.


First, the basic ideas of the slab allocator

* The slab allocator treats memory areas as objects, and divides the main memory region making up each cache into multiple slabs;

* The slab allocator groups objects into caches by type; each cache is a "reserve" of objects of the same type;

* Each slab consists of one or more contiguous page frames, which hold both allocated and free objects;

* The slab allocator obtains its page frames from the buddy system.

Second, the advantages of the slab allocator

1) The kernel routinely allocates small objects, countless times over the system's lifetime. By caching objects of similar size, the slab allocator provides this cheaply and avoids the usual fragmentation problems.
2) The slab allocator also supports initialization of common objects, avoiding repeated re-initialization of objects that serve the same purpose.
3) The slab allocator supports hardware cache alignment and coloring, which spreads objects in different slabs across different cache lines, improving cache utilization and performance.
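The "keep it constructed" idea can be demonstrated with a userspace toy cache. This is not the kernel API (that is kmem_cache_create()/kmem_cache_alloc()); obj_cache and friends are invented for this sketch, which simply keeps freed objects on a free list so that the constructor runs once per object rather than once per allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct obj_cache {
    size_t size;
    void (*ctor)(void *);   /* run once per object, not per allocation */
    void **free_list;
    size_t free_count, free_cap;
    size_t ctor_calls;      /* instrumentation for the example */
};

static void cache_init(struct obj_cache *c, size_t size, void (*ctor)(void *))
{
    c->size = size;
    c->ctor = ctor;
    c->free_cap = 64;
    c->free_count = 0;
    c->ctor_calls = 0;
    c->free_list = malloc(c->free_cap * sizeof(void *));
}

static void *cache_alloc(struct obj_cache *c)
{
    if (c->free_count)                      /* reuse: still constructed */
        return c->free_list[--c->free_count];
    void *p = malloc(c->size);
    c->ctor(p);                             /* construct only on first use */
    c->ctor_calls++;
    return p;
}

static void cache_free(struct obj_cache *c, void *p)
{
    if (c->free_count < c->free_cap)
        c->free_list[c->free_count++] = p;  /* keep the "whiteboard" */
    else
        free(p);
}

/* Example constructor: zeroing stands in for real initialization work. */
static void example_ctor(void *p) { memset(p, 0, 64); }
```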



In uClibc-0.9.28, malloc may obtain memory via mmap or via sbrk, which connects it to the kernel memory management described above, possibly involving the slab layer along the way.

