Common memory allocation functions in the Linux kernel

Source: Internet
Author: User
1. Principle Explanation
The Linux kernel uses a paging model that applies to both 32-bit and 64-bit systems. Two-level page tables are sufficient for 32-bit systems, while x86_64 uses four-level page tables, as shown in Figure 2-1. The four levels are:

- Page Global Directory (PGD)
- Page Upper Directory (PUD)
- Page Middle Directory (PMD)
- Page Table (PT)
The page global directory holds the addresses of several page upper directories, which in turn hold the addresses of several page middle directories. Each page middle directory holds the addresses of several page tables, and each page table entry points to a page frame. Linux uses a 4KB page frame as its standard unit of memory allocation.
Figure 2-1. Multilevel paging directory structure

1.1. Buddy System Algorithm
In practice the kernel often needs to allocate a group of contiguous page frames. Frequently requesting and releasing contiguous runs of different sizes inevitably leaves many small blocks of free page frames scattered among the allocated ones. As a result, even though free page frames exist, a later request for a contiguous run may be hard to satisfy.
To avoid this, the Linux kernel uses the buddy system algorithm. All free page frames are grouped into 11 block lists, which hold blocks of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024 contiguous page frames respectively. The largest request is therefore 1024 contiguous page frames, or 4MB of contiguous memory. The physical address of the first page frame in each block is an integer multiple of the block size.
Suppose a block of 256 page frames is requested. The allocator first looks for a free block in the 256-frame list. If there is none, it looks in the 512-frame list; if a block is found there, it is split into two 256-frame blocks, one of which is handed to the requester while the other is moved to the 256-frame list. If the 512-frame list is also empty, the allocator continues to the 1024-frame list (splitting twice), and if that is empty too, it returns an error.
When a block is released, the kernel proactively merges it with its free buddy, the adjacent block of the same size, into a larger block.

1.2. Slab Allocator
The slab allocator originates from the allocation algorithm of Solaris 2.4. It works on top of the physical page frame allocator, manages caches of objects of a particular size, and makes memory allocation fast and efficient.
The slab allocator creates a separate cache for each type of kernel object in use. Because the Linux kernel already uses the buddy system to manage physical page frames, the slab allocator works directly on top of the buddy system. Each cache consists of multiple slabs, and each slab is a set of contiguous physical page frames divided into a fixed number of objects. Depending on the object size, a slab can consist of up to 1024 page frames by default. For reasons such as alignment, the memory allocated to an object in a slab may be larger than the size the user actually requested, which wastes some memory.

2. Common memory allocation functions

2.1. __get_free_pages

unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order)
__get_free_pages is the most basic way of allocating memory: it takes raw page frames directly from the buddy system and returns the starting address of the first page frame. In its implementation, __get_free_pages is only a wrapper around alloc_pages; reading the code shows that alloc_pages allocates a contiguous block of 1 << order page frames. The upper bound on order is determined by the MAX_ORDER macro in include/linux/mmzone.h. In the default 2.6.18 kernel MAX_ORDER is 11, so the largest valid order is MAX_ORDER - 1 = 10. In theory, then, __get_free_pages can obtain at most (1 << 10) * 4KB = 4MB of contiguous physical memory in one call. In practice, however, such an allocation may well fail because that many contiguous free page frames are not available. In testing, an allocation with order 10 succeeded and one with order 11 returned an error.


2.2. kmem_cache_alloc
struct kmem_cache *kmem_cache_create(const char *name, size_t size, size_t align, unsigned long flags,
        void (*ctor)(void *, struct kmem_cache *, unsigned long),
        void (*dtor)(void *, struct kmem_cache *, unsigned long))
void *kmem_cache_alloc(struct kmem_cache *c, gfp_t flags)
kmem_cache_create/kmem_cache_alloc is an allocation method built on the slab allocator. It suits repeated allocation of memory blocks of the same size (which may be smaller than a page). First create a cache with kmem_cache_create, then obtain new blocks from that cache with kmem_cache_alloc. The largest block kmem_cache_alloc can allocate at one time is set by the MAX_OBJ_ORDER macro in mm/slab.c, which is 5 in the default 2.6.18 kernel, so a single allocation is limited to (1 << 5) * 4KB = 128KB of contiguous physical memory. Analysis of the kernel source shows that kmem_cache_create calls BUG() when its size parameter is greater than 128KB. Testing confirmed this: the kernel crashed when kmem_cache_create was called with a size above 128KB.
2.3. mempool_alloc
void *mempool_alloc(mempool_t *pool, int gfp_mask)
For cases where a memory allocation must not fail, the kernel provides an abstraction called a memory pool (mempool). A mempool is essentially a reserve cache held back for emergencies, and it is typically built on top of slab. Keep one thing in mind when using it: a mempool preallocates memory blocks that sit idle and are not otherwise used, so it can easily consume a lot of memory. In most cases it is better to handle allocation failure directly than to rely on a mempool, so avoid mempools in driver code where possible.
2.4. kmalloc
void *kmalloc(size_t size, gfp_t flags)
kmalloc is one of the most commonly used allocation methods in the kernel and is ultimately implemented by calling kmem_cache_alloc. The largest size kmalloc can request at one time is determined by the contents of include/linux/kmalloc_sizes.h; in the default 2.6.18 kernel, kmalloc can request at most 131072 bytes (128KB) of contiguous physical memory in one call. Testing showed that code attempting to allocate more than 128KB with kmalloc fails to build.
2.5. vmalloc
void *vmalloc(unsigned long size)
The allocation methods above all return physically contiguous memory, which keeps the average access time low. In some cases, however, memory is requested infrequently and a higher access time is acceptable. The kernel can then allocate a region that is contiguous in linear address space but physically discontiguous, with the benefit that a large amount of memory can be allocated in one call. Figure 3-1 shows the address range that vmalloc uses for its allocations. vmalloc places no explicit limit on the amount of memory allocated at one time. For performance reasons, use vmalloc with care. In testing, the largest allocation that succeeded was 1GB.
2.6. dma_alloc_coherent
void *dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp)
DMA is a hardware mechanism that transfers I/O data directly between peripherals and main memory without involving the CPU, and using it can greatly increase throughput to and from a device. A DMA operation must keep the CPU cache and the corresponding memory consistent. On x86_64 the hardware handles this well, so the implementations of dma_alloc_coherent and __get_free_pages differ little: both end up calling __alloc_pages to allocate memory, and the size limit for a single dma_alloc_coherent allocation is therefore the same as for __get_free_pages. Memory allocated with __get_free_pages can also be used for DMA. Testing showed that the largest block dma_alloc_coherent can allocate at one time is 4MB.
2.7. ioremap
void *ioremap(unsigned long offset, unsigned long size)
ioremap is a more direct way of obtaining memory: the caller specifies a physical start address and a size, and ioremap maps that physical address range into the kernel address space. The physical range is fixed in advance; unlike the methods above, ioremap does not allocate any new physical memory. It is mostly used in device drivers to let the CPU access the I/O space of an external device directly. The amount of memory ioremap can map is determined by the existing physical address space, so it was not tested.
2.8. Boot Memory
If a large amount of contiguous physical memory is needed and none of the allocation functions above can provide it, the only option is a more special-purpose approach: reserving memory during the Linux kernel boot phase.
2.8.1. Allocating memory at kernel boot time
void *alloc_bootmem(unsigned long size)
Large blocks of memory can be allocated during kernel boot, bypassing the buddy system. To do so, request memory of the required size with alloc_bootmem before mem_init is called during boot. If the memory is needed elsewhere, export the start address returned by alloc_bootmem with EXPORT_SYMBOL so that other code can use the block. The drawbacks of this method are that the requesting code must be linked into the kernel image, so the kernel has to be recompiled, and that the memory management system does not see this memory, so the user must manage it. Testing showed that, after recompiling the kernel and rebooting, the block allocated at boot time could be accessed.
2.8.2. Reserving top memory through kernel boot parameters

Passing the parameter "mem=size" when the Linux kernel boots limits the kernel to that much memory and leaves the memory above it reserved. For example, on a system with 256MB of memory, "mem=248M" reserves the top 8MB; once the system is up, ioremap(0xf800000, 0x800000) can be called to map this memory.

3. Comparison of the allocation functions


