Introduction to Linux Kernel Engineering -- Memory Management (I)

Linux Memory Management Overview

Physical Address Management

Many small operating systems, such as eCos and VxWorks, are embedded systems in which the addresses used by a program are actual physical addresses. The physical address here means the address the CPU can see; how that address maps onto the CPU's physical space, and where it lands, depends on the CPU type (MIPS, ARM, and so on) and is usually handled by hardware. From the software's point of view, the CPU already sees physical addresses at startup. On systems larger than a typical embedded device, however, the addresses already mapped into the CPU's space at boot are not all of the usable addresses, so software must map the available physical storage resources into the CPU's address space.

Typically, the address space visible to the CPU is limited: a 32-bit CPU can see at most 4 GB of physical space, while a 64-bit CPU can see far more. So a 64-bit application today may not need to worry about physical memory exceeding the CPU-visible physical space, but on 32-bit systems this must be considered. This is where the requirement for dynamic mapping is born.

In a Linux system on x86, for example, 3 GB of the CPU-visible space is given to user programs and only 1 GB is left to the kernel, and static mappings can only cover that 1 GB. Physical memory beyond what fits into this window is therefore inaccessible without dynamic mapping.

Simply put, dynamic mapping means that when a fresh memory page is needed, a physical page is mapped to some kernel address on demand; when that address must be reused, the existing mapping is replaced with a new one.

Application Address Space Isolation

Another requirement: modern systems usually run more than one or two programs, and if every program could see and manipulate the full address space, trampling on another program's data would be a constant risk. An embedded system can tolerate this, but a PC cannot. Each program's visible address space therefore has to be isolated from the others, which gives rise to the virtual process address space: every process sees the same range of addresses, but accessing the same address returns different data in different processes.

Requesting and Releasing Memory

Both the user program and the kernel program need to use memory, so how to efficiently allocate and reclaim memory is a very important topic.

The practical reality is that a user program may request memory but not necessarily use all of it, so the kernel need not actually reserve memory at request time; it only allocates pages when they are really used. This kernel mechanism is called overcommit: the kernel may grant applications more memory than is physically available.
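
As a rough userspace illustration of this behaviour (a minimal sketch, not part of the kernel source): a large allocation can succeed without consuming physical memory until the pages are actually written.

#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t sz = 1UL << 30;            /* ask for 1 GB of virtual memory */
    char *buf = malloc(sz);           /* usually succeeds even with little free RAM */

    if (!buf)
        return 1;
    /* physical pages are only allocated now, as each page is first written */
    memset(buf, 0, sz / 16);          /* touch 64 MB; only that much RAM is actually consumed */
    free(buf);
    return 0;
}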

The Linux kernel also uses a large amount of memory to cache files from disk, and this cache will consume almost all otherwise free memory. When a user program asks for memory, Linux reclaims part of that cache to satisfy the request. So, as a Linux program sees it, free memory in the system is almost always close to zero, yet requests for memory usually still succeed.

These mechanisms, each designed for one of the functional requirements above, together form the memory management machinery of Linux. Memory management is meaningless when separated from the specific functions it serves.

Thus memory management has three main requirements: dynamic physical memory management, isolated per-process address spaces, and memory allocation and reclaim.

Source Code File Structure

Both the memory infrastructure and the code serving these specific requirements live in the mm directory; the related headers are under include/ (mainly include/linux/). As the Linux kernel has developed, the basic functionality has been consolidated while auxiliary features keep growing; for example, KASAN is used to detect memory errors such as out-of-bounds accesses.

Memory Organization

x86 Organization

Linux kernel memory is managed in pages, but as a whole it is organized into zones. There are three zones: DMA, NORMAL and HIGHMEM. ZONE_DMA exists because on some hardware the DMA engine can only reach part of the address space (classic ISA DMA on Intel machines can only reach the low 16 MB). ZONE_HIGHMEM exists because some systems have more physical memory than the CPU-visible kernel space can cover; a 32-bit CPU, for example, cannot statically map memory approaching or exceeding 4 GB. And because of the Linux virtual memory split, the space the kernel can use is only 1 GB (configurable on some architectures).

In general, the Linux kernel divides virtual memory on x86 and similar architectures by a 3:1 ratio: 3 GB of virtual address space for user space and 1 GB for kernel space. Some architectures, such as MIPS, use a 2:2 split (2 GB for user space, 2 GB for kernel space), and on ARM the split is configurable (1:3, 2:2 or 3:1).

Take x86 as an example: the kernel occupies the 3 GB-4 GB linear address range, which means only 1 GB of kernel address space is available for mapping physical memory. If there is more than roughly 1 GB of RAM, the kernel's linear addresses are not enough, so the kernel introduces the concept of high memory and divides that 1 GB of linear space into two parts. Physical memory below 896 MB is called low memory; it is mapped one-to-one onto the linear addresses starting at 3 GB, i.e. the kernel's virtual addresses (VA) from 3 GB to 3 GB + 896 MB correspond directly to physical addresses (PA) 0 to 896 MB, with PAGE_OFFSET = 0xC0000000. The remaining 128 MB of linear space is used to map the physical memory above 896 MB; this is what we usually call the high memory area, and mappings into it are established dynamically by the MMU through the page tables (and cached in the TLB).
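
A minimal sketch of that direct (low-memory) translation, assuming PAGE_OFFSET = 0xC0000000 as above; the real kernel expresses this with the __pa()/__va() macros, so a local name is used here to avoid clashing with them.

#define PAGE_OFFSET_X86 0xC0000000UL   /* hypothetical local name for the kernel's PAGE_OFFSET */

static inline unsigned long lowmem_virt_to_phys(unsigned long va)
{
    return va - PAGE_OFFSET_X86;       /* only valid for the statically mapped first 896 MB */
}

static inline unsigned long lowmem_phys_to_virt(unsigned long pa)
{
    return pa + PAGE_OFFSET_X86;
}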

In other words, on 32-bit x86 Linux only 896 MB of memory can really be statically mapped. High memory is needed once RAM exceeds roughly 1 GB; otherwise the memory above that would simply be unusable. Hence the three zones: ZONE_DMA covers physical memory from 0 to 16 MB, ZONE_NORMAL covers 16 to 896 MB, and ZONE_HIGHMEM covers memory above 896 MB, reached through the dynamic mapping window between 896 MB and 1 GB of kernel virtual space, whose usable size is actually variable. From this we can see that if the DMA zone is not needed (no DMA restrictions), it can be removed, and if memory does not exceed 896 MB, the HIGHMEM zone can be removed as well.

Because the kernel satisfies allocation requests by falling back across the three zones (NORMAL is preferred), and reclaim has to scan all three of them, removing a zone that is not needed saves a noticeable amount of work in many memory operations.

MIPS Organization

The HIGHMEM handling of MIPS is described in its official wiki: http://www.linux-mips.org/wiki/Highmem

On a MIPS32 CPU, the memory windows that are not translated by the MMU are only kseg0 and kseg1, each 512 MB in size, and both of these windows map onto the same physical address range 0-512 MB. The rest of the 3 GB virtual address space must be translated into physical addresses by the MMU, whose behaviour is defined by the CPU vendor. In other words, under a MIPS32 CPU, physical addresses above 512 MB can only be reached through MMU translation. So the space covered by the formula va = pa + PAGE_OFFSET is only 512 MB, with PAGE_OFFSET = 0x80000000, and MIPS32 Linux actually uses only 256 MB of it.
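
A minimal sketch of those fixed windows, using the standard MIPS32 segment bases; this is only an illustration, not kernel code.

#define KSEG0_BASE 0x80000000UL   /* cached, unmapped: va = pa + 0x80000000, covers physical 0-512 MB */
#define KSEG1_BASE 0xA0000000UL   /* uncached, unmapped: the same 512 MB of physical memory */

static inline unsigned long kseg0_phys_to_virt(unsigned long pa)
{
    return pa + KSEG0_BASE;       /* only meaningful for pa below 512 MB */
}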

Two issues deserve attention when MIPS uses HIGHMEM: one is the balance between the performance gain and the stability cost that HIGHMEM brings; the other is that HIGHMEM does not handle cache aliases.

High Memory

There are three ways to map high memory:

1. Temporary mapping space

The fixed mapping space is a set of reserved virtual pages at the very top of the kernel's linear address space, i.e. its highest addresses. Each address in it is assigned at compile time for a specific purpose (such as the vsyscall page or MIPS cache colouring), determined by the enum fixed_addresses; the region lies between FIXADDR_START and FIXADDR_TOP.

Within this space there is a subset used for temporary mappings of high memory. It has the following characteristics: each CPU owns its own piece of it; it can be used inside interrupt handlers and deferred functions; it never blocks; kernel preemption is disabled while a mapping is held; and each CPU's piece is divided into a number of small slots, each one page in size and dedicated to one purpose, the purposes being defined by enum km_type in kmap_types.h.

When you want to create a temporary mapping you specify the purpose of the mapping; the purpose selects the corresponding slot, and that slot's address becomes the mapping address. This also means that a new temporary mapping for the same purpose overwrites the previous one.

Interface functions: kmap_atomic()/kunmap_atomic(). They use the virtual pages between FIX_KMAP_BEGIN and FIX_KMAP_END.
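
A minimal sketch of using a temporary (atomic) mapping to zero a high-memory page; the function name is made up for illustration, and note that older kernels also took a km_type argument.

#include <linux/highmem.h>
#include <linux/string.h>

static void zero_page_atomic(struct page *page)
{
    void *vaddr = kmap_atomic(page);   /* map the page into this CPU's fixmap slot, preemption disabled */

    memset(vaddr, 0, PAGE_SIZE);       /* must not sleep while the mapping is held */
    kunmap_atomic(vaddr);              /* tear the mapping down again */
}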

2. Long-term mapping space

The long-term mapping space is another reserved linear address range and another means of accessing high memory. A page of high memory is first obtained with alloc_page(), and the kernel then assigns it a virtual address from a linear region reserved specifically for this, located between PKMAP_BASE and FIXADDR_START.

Interface functions: void *kmap(struct page *page), void kunmap(struct page *page).

These functions can be used on both high and low memory pages, may sleep, and the number of simultaneous mappings is limited. Pages that are no longer used should be released from this space (that is, unmapped with kunmap()).

#define PKMAP_BASE ((FIXADDR_BOOT_START - PAGE_SIZE * (LAST_PKMAP + 1)) & PMD_MASK)

#define LAST_PKMAP 1024
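
A minimal sketch of the long-term mapping interface; the helper name is hypothetical, and because kmap() may sleep it can only be called from process context.

#include <linux/highmem.h>
#include <linux/string.h>

static void copy_into_page(struct page *page, const void *src, size_t len)
{
    void *vaddr = kmap(page);          /* may sleep waiting for a free pkmap slot */

    if (len > PAGE_SIZE)
        len = PAGE_SIZE;
    memcpy(vaddr, src, len);
    kunmap(page);                      /* release the slot for other users */
}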

3. Non-contiguous mapped address space

The non-contiguous mapping address space is suited to allocations that are not requested frequently, so that the kernel page tables are not modified too often. In general the kernel uses this space mainly in the following cases: mapping device I/O regions, allocating space for kernel modules, and allocating space used by swap.

Each non-contiguous memory area is described by a descriptor of type vm_struct; the descriptors are linked into the vmlist list through their next field.

High memory is easy to use through this mechanism: when vmalloc() requests memory from the kernel dynamic mapping space, it may obtain its pages from high memory (see the vmalloc implementation), so high memory can end up mapped into the dynamic mapping space.

Interface functions: vmalloc()/vfree(): allocates the physical pages (via alloc_page()) and the linear addresses at the same time; the pages are requested with __GFP_HIGHMEM, allocation falling back in the order HIGHMEM, NORMAL, DMA. (So vmalloc() does not only map HIGHMEM page frames; its main purpose is to stitch scattered, non-contiguous page frames into a contiguous range of kernel virtual addresses.)

vmap()/vunmap(): a simplified form of vmalloc() that maps pages the caller has already allocated.

ioremap()/iounmap(): maps device I/O (physical) addresses into kernel virtual address space.
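
A minimal sketch of vmalloc()/vfree() for a large, virtually contiguous buffer; the size and function name are arbitrary.

#include <linux/vmalloc.h>
#include <linux/string.h>
#include <linux/types.h>
#include <linux/errno.h>

static int build_big_table(void)
{
    size_t sz = 4UL << 20;             /* 4 MB: awkward for kmalloc, routine for vmalloc */
    u32 *table = vmalloc(sz);          /* virtually contiguous, physically scattered pages */

    if (!table)
        return -ENOMEM;
    memset(table, 0, sz);
    /* ... use the table ... */
    vfree(table);
    return 0;
}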

(Figure: a simple view of how the Linux kernel maps virtual addresses; the HIGHMEM window is used to map high memory.)

Requesting and Releasing Memory

We know that memory in modern operating systems is organized in pages, and different systems may add concepts such as page groups on top of that, but the basic unit of kernel memory management is the page. So the fundamental question is how the kernel manages pages.


Allocating and releasing memory at startup: bootmem

Modules that run while Linux is booting also need to request and release memory, but at that point the kernel's memory model is not yet established. Linux therefore provides a dedicated early allocator interface, bootmem: it is simple, page-based, satisfies requests for contiguous page ranges by straightforward search, and can cope with physically discontiguous memory.

The most common use of this mechanism is allocating very large contiguous regions. Such a requirement is easy to satisfy before the system has fully started; afterwards, with many modules loaded and memory changing hands constantly, physically contiguous memory becomes hard to obtain, so reserving contiguous physical memory through the bootmem interface at boot, for later use, is often the only practical choice.

After the kernel has fully booted, the bootmem mechanism is no longer available.
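
A minimal sketch of reserving a large contiguous buffer early in boot through the classic bootmem interface; the function and variable names are illustrative, and newer kernels use memblock for the same job.

#include <linux/bootmem.h>
#include <linux/init.h>

static void *big_contig_buffer;

void __init reserve_big_contig_buffer(void)
{
    /* 16 MB physically contiguous: easy this early, nearly impossible after boot */
    big_contig_buffer = alloc_bootmem(16UL << 20);
}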

ioremap at startup: early_ioremap

Because the normal ioremap() machinery is not ready early in boot, an early variant is provided, called early_ioremap().

Requesting DMA-able memory

Dmapool

Request a Memory pool

Mempool

CMA

CMA is the Contiguous Memory Allocator. Before it existed, the only realistic way to reserve a large contiguous block was bootmem at boot time, at the cost that the reserved memory is unavailable to the rest of the kernel from then on; and since the reserver may not be using it all the time, memory utilization is low. CMA addresses this by allowing the reserved range to hold movable pages in the meantime and migrating them away when a large contiguous allocation is actually needed.

The buddy algorithm

At the lowest level memory is handed out in pages; the allocators above it, such as the kernel's slab or userspace malloc, request enough pages from this back end before handing memory out to their own callers. There are many possible designs for how the back end hands out pages, and the two most important criteria for judging them are: how fast, and how little fragmentation.

The most widely used design is the buddy algorithm. Its core idea is to divide memory into blocks of a series of different sizes and, on a request, return the block whose size most closely fits; when no suitable size is free, a larger block may be split. Through this up-front arrangement it keeps fragmentation as low as possible, at the expense of some memory utilization. That alone was not always enough, so later a reclaim-type property was added to memory pages: reclaimable, movable, or non-reclaimable. This works rather like periodic disk defragmentation, allowing scattered free blocks to become contiguous again. Because the pages used by user programs are dynamically mapped, the kernel only needs to swap the mapping underneath to replace a page transparently to the program, so this approach performs well too.
Besides being fragmentation-aware when allocating, the kernel also periodically reclaims pages that have already been handed out. Sensible allocation plus effective reclaim forms the core of Linux kernel memory management.
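
A minimal sketch of asking the page allocator (the buddy system) directly for a block of 2^order contiguous pages; the function name is made up.

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/string.h>

static int buddy_demo(void)
{
    struct page *pages = alloc_pages(GFP_KERNEL, 2);  /* order 2 = 4 physically contiguous pages */
    void *buf;

    if (!pages)
        return -ENOMEM;
    buf = page_address(pages);         /* valid because GFP_KERNEL pages come from the direct map */
    memset(buf, 0, 4 * PAGE_SIZE);
    __free_pages(pages, 2);            /* freed blocks are merged with their "buddy" when possible */
    return 0;
}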

Slab

There are many frequently used structures in the kernel. If they were allocated with a traditional size-based dynamic allocator, free lists would be searched constantly, so a pool-based approach is clearly more appropriate. But precisely because there are so many such structures, it is impossible to define a pool type for each one by hand; a sensible design has to be as generic as possible. This generic structure-pool design is the slab memory management mechanism.

The slab mechanism gets its name from calling the memory pool for one structure type a slab; the kernel contains many slabs, each a pool of a different commonly used structure. To suit SMP, each CPU manages its own series of slabs.
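
A minimal sketch of a dedicated slab cache for a frequently allocated structure; struct foo and the cache name are hypothetical.

#include <linux/slab.h>
#include <linux/list.h>

struct foo {
    int id;
    struct list_head link;
};

static struct kmem_cache *foo_cache;

static int foo_cache_init(void)
{
    foo_cache = kmem_cache_create("foo_cache", sizeof(struct foo),
                                  0, SLAB_HWCACHE_ALIGN, NULL);
    return foo_cache ? 0 : -ENOMEM;
}

static struct foo *foo_alloc(void)
{
    return kmem_cache_alloc(foo_cache, GFP_KERNEL);   /* fast path: object from a per-CPU slab */
}

static void foo_free(struct foo *f)
{
    kmem_cache_free(foo_cache, f);
}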

Slab, however, does not adapt well to NUMA. SLUB builds on slab, adds NUMA awareness, simplifies slab's data structures and improves efficiency, while keeping the same interface that slab provides.

SLOB is a lightweight version of slab: it increases the probability of fragmentation and is in essence less efficient, but it needs less resource overhead (memory and CPU), so it was mostly used on embedded systems. Since today's embedded systems generally have plenty of computing power, SLOB has largely left the stage.

Memory policies: mempolicy

With the advent of NUMA it became necessary to let user programs control which node their memory is allocated from. The available policies are MPOL_DEFAULT, MPOL_PREFERRED, MPOL_INTERLEAVE and MPOL_BIND, and memory policies are inherited from the parent process.
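
A minimal userspace sketch of setting an interleave policy through the set_mempolicy(2) system call; the node mask assumes nodes 0 and 1 exist, and <numaif.h> comes from libnuma.

#include <numaif.h>
#include <stdlib.h>

int main(void)
{
    unsigned long nodemask = (1UL << 0) | (1UL << 1);  /* nodes 0 and 1 (assumed present) */

    /* interleave this process's future allocations across the two nodes;
       children created afterwards inherit the policy */
    if (set_mempolicy(MPOL_INTERLEAVE, &nodemask, 8 * sizeof(nodemask)) != 0)
        return 1;

    char *buf = malloc(64UL << 20);    /* pages land alternately on node 0 and node 1 when touched */
    if (buf)
        buf[0] = 1;
    free(buf);
    return 0;
}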

Memory that is already in use can also be moved between nodes; this is the page migration feature.


