1. Explanation of the principle
The Linux kernel uses a common paging model for both 32-bit and 64-bit systems. A two-level page table is sufficient for 32-bit systems, while x86_64 systems use a four-level page table, as shown in Figure 2-1. The four levels are:
- Page Global Directory
- Page Upper Directory
- Page Middle Directory
- Page Table
The Page Global Directory contains the addresses of several Page Upper Directories, each of which contains the addresses of several Page Middle Directories. Each Page Middle Directory in turn contains the addresses of several page tables, and each page table entry points to a single page frame. Linux uses 4KB page frames as its standard unit of memory allocation.
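The four-level lookup can be illustrated by splitting a virtual address into its per-level indices. The following is a minimal user-space sketch, assuming the usual x86_64 layout with 4KB pages (9 index bits per level and a 12-bit page offset). The names `split_va` and `struct va_parts` are hypothetical and are not kernel APIs.

```c
#include <stdint.h>

/* Assumed x86_64 layout with 4KB pages: 9 index bits per level, 12 offset bits. */
#define PAGE_SHIFT      12
#define PMD_SHIFT       21
#define PUD_SHIFT       30
#define PGDIR_SHIFT     39
#define PTRS_PER_TABLE  512

/* Hypothetical container for the decomposed address. */
struct va_parts {
    unsigned pgd;     /* index into the Page Global Directory */
    unsigned pud;     /* index into the Page Upper Directory  */
    unsigned pmd;     /* index into the Page Middle Directory */
    unsigned pte;     /* index into the page table            */
    unsigned offset;  /* byte offset inside the 4KB page frame */
};

/* Split a virtual address into the four table indices and the page offset. */
struct va_parts split_va(uint64_t va)
{
    struct va_parts p;

    p.pgd    = (unsigned)((va >> PGDIR_SHIFT) & (PTRS_PER_TABLE - 1));
    p.pud    = (unsigned)((va >> PUD_SHIFT)   & (PTRS_PER_TABLE - 1));
    p.pmd    = (unsigned)((va >> PMD_SHIFT)   & (PTRS_PER_TABLE - 1));
    p.pte    = (unsigned)((va >> PAGE_SHIFT)  & (PTRS_PER_TABLE - 1));
    p.offset = (unsigned)(va & ((1u << PAGE_SHIFT) - 1));
    return p;
}
```

Each index selects one of 512 entries in the table at its level, and the final 12 bits select a byte within the 4KB page frame.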
1.1. Buddy system algorithm
In practice, the kernel often needs to allocate groups of contiguous page frames. Frequently allocating and freeing contiguous blocks of different sizes inevitably leaves many small free blocks scattered among the allocated ones. Even though these page frames are free, a later request for a larger contiguous block may be impossible to satisfy.
To avoid this fragmentation, the Linux kernel uses the buddy system algorithm. All free page frames are grouped into 11 block lists, holding blocks of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 and 1024 contiguous page frames respectively. A single request can therefore cover at most 1024 contiguous page frames, which corresponds to 4MB of contiguous memory. The physical address of the first page frame in each block is an integer multiple of the block size.
Suppose a block of 256 page frames is requested. The allocator first looks for a free block in the 256-page-frame list. If there is none, it looks in the 512-page-frame list; if a block is found there, it is split into two blocks of 256 page frames, one of which is handed to the requester while the other is moved to the 256-page-frame list. If the 512-page-frame list is also empty, the allocator goes on to the 1024-page-frame list, and returns an error if that list is empty as well.
When a block is freed, the allocator actively merges pairs of contiguous free blocks of the same size into a larger block.
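The split-on-allocate behaviour described above can be sketched with a toy user-space model. It tracks only how many free blocks sit on each list, not their addresses, so it is an illustration rather than the kernel's implementation. The names `struct buddy_lists`, `buddy_alloc` and `buddy_pfn` are hypothetical.

```c
#define MAX_ORDER 11   /* orders 0..10, i.e. blocks of 1..1024 page frames */

/* Toy model: number of free blocks currently on each order's list. */
struct buddy_lists {
    unsigned nr_free[MAX_ORDER];
};

/*
 * Page-frame number of the buddy of the block starting at pfn.
 * Because blocks are aligned to their size, the buddy differs
 * from pfn in exactly one bit.
 */
unsigned long buddy_pfn(unsigned long pfn, unsigned order)
{
    return pfn ^ (1UL << order);
}

/*
 * Take one block of the requested order. If that list is empty, split the
 * smallest larger free block, leaving one half on each lower list.
 * Returns 0 on success, -1 if no sufficiently large block is free.
 */
int buddy_alloc(struct buddy_lists *b, unsigned order)
{
    unsigned cur;

    if (order >= MAX_ORDER)
        return -1;

    /* Find the smallest non-empty list at or above the requested order. */
    for (cur = order; cur < MAX_ORDER; cur++)
        if (b->nr_free[cur] > 0)
            break;
    if (cur == MAX_ORDER)
        return -1;

    b->nr_free[cur]--;

    /* Split downwards: each split returns one half to the lower list. */
    while (cur > order) {
        cur--;
        b->nr_free[cur]++;
    }
    return 0;
}
```

Starting from a single free block of 1024 page frames, a request for 256 page frames (order 8) leaves one free block on the 512-page-frame list and one on the 256-page-frame list. When a block is freed, `buddy_pfn` identifies the neighbour it could merge with.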
1.2. Slab allocator
The slab allocator derives from the allocation algorithm of Solaris 2.4. It works on top of the physical page frame allocator and manages caches of objects of specific sizes, so that such objects can be allocated quickly and efficiently.
The slab allocator creates a separate cache for each type of kernel object in use. Because the Linux kernel already manages physical page frames with the buddy system, the slab allocator works directly on top of it. Each cache consists of multiple slabs; each slab is a contiguous group of physical page frames divided into a fixed number of objects. Depending on the object size, a slab can consist of up to 1024 page frames by default. Because of alignment and other requirements, the memory assigned to an object in a slab may be larger than the size the user actually requested, which wastes a certain amount of memory.
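The memory wasted through alignment can be estimated with some simple arithmetic. This is a simplified user-space sketch, assuming 4KB pages and ignoring the slab's own management data, which a real slab also has to store. The helper names `align_up`, `objs_per_slab` and `slab_waste` are hypothetical.

```c
#include <stddef.h>

#define PAGE_SIZE 4096UL   /* assumed page frame size */

/* Round size up to the next multiple of align (align must be a power of two). */
size_t align_up(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}

/* Objects that fit in a slab of (1 << order) page frames, ignoring bookkeeping. */
size_t objs_per_slab(size_t obj_size, size_t align, unsigned order)
{
    return (PAGE_SIZE << order) / align_up(obj_size, align);
}

/* Bytes in the slab not occupied by requested object data (padding plus tail). */
size_t slab_waste(size_t obj_size, size_t align, unsigned order)
{
    return (PAGE_SIZE << order) - objs_per_slab(obj_size, align, order) * obj_size;
}
```

For example, a 100-byte object aligned to 64 bytes occupies 128 bytes, so a one-page slab holds 32 objects and 896 of its 4096 bytes carry no requested data.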
2. Common memory allocation functions
2.1. __get_free_pages
unsigned long __get_free_pages (gfp_t gfp_mask, unsigned int order)
__get_free_pages is the most basic way to allocate memory: it takes raw page frames directly from the buddy system and returns the starting address of the first page frame. In its implementation, __get_free_pages is only a wrapper around alloc_pages, and the code shows that alloc_pages allocates a block of 1<<order contiguous page frames. The maximum value of order is determined by the MAX_ORDER macro in include/linux/mmzone.h; in the default 2.6.18 kernel the largest order that can succeed is 10. In theory, then, __get_free_pages can request up to (1<<10) * 4KB = 4MB of contiguous physical memory at a time. In practice, an allocation this large may well fail because no such run of contiguous free page frames exists. In testing, an allocation with order 10 succeeded, and one with order 11 returned an error.
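Since __get_free_pages takes an order rather than a byte count, callers need to convert between the two. The following user-space sketch shows the arithmetic, assuming 4KB pages. `order_for_size` and `bytes_for_order` are hypothetical names; inside the kernel, the existing get_order() helper serves a similar purpose.

```c
#include <stddef.h>

#define PAGE_SHIFT 12                    /* assumed: 4KB page frames */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Smallest order whose block of (1 << order) page frames holds size bytes. */
unsigned order_for_size(size_t size)
{
    size_t pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;  /* round up to pages */
    unsigned order = 0;

    while ((1UL << order) < pages)
        order++;
    return order;
}

/* Bytes covered by a block of the given order. */
size_t bytes_for_order(unsigned order)
{
    return PAGE_SIZE << order;
}
```

A 4MB request maps to order 10, the largest order that succeeded in the test above. A request of one byte more than 4KB already needs order 1, that is, two page frames.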
2.2. kmem_cache_alloc
struct kmem_cache *kmem_cache_create (const char *name, size_t size,
size_t align, unsigned long flags,
void (*ctor) (void*, struct kmem_cache *, unsigned long),
void (*dtor) (void*, struct kmem_cache *, unsigned long))
void *kmem_cache_alloc (struct kmem_cache *c, gfp_t flags)
kmem_cache_create/kmem_cache_alloc is an allocation method based on the slab allocator, suited to repeatedly allocating and freeing memory blocks of the same size. First create a cache with kmem_cache_create, then call kmem_cache_alloc to obtain a new block from that cache. The largest block kmem_cache_alloc can return is determined by the MAX_OBJ_ORDER macro in mm/slab.c, which is defined as 5 in the default 2.6.18 kernel, so a single request is limited to (1<<5) * 4KB = 128KB of contiguous physical memory. The kernel source shows that kmem_cache_create calls BUG() when its size parameter exceeds 128KB. Testing confirmed this: creating a cache for objects larger than 128KB with kmem_cache_create crashed the kernel.
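The create-once, allocate-many pattern can be illustrated with a user-space analogy. This is not the kernel slab code: it is a toy fixed-size object pool built on malloc, with a free list threaded through the unused objects. All names (`struct obj_cache`, `cache_init`, `cache_alloc`, `cache_free`, `cache_destroy`) are hypothetical.

```c
#include <stdlib.h>
#include <stddef.h>

/* Toy fixed-size object cache backed by a single malloc'd chunk. */
struct obj_cache {
    size_t obj_size;   /* size of each object after rounding */
    void  *chunk;      /* backing memory for all objects */
    void  *free_list;  /* singly linked list threaded through free objects */
};

/* Create a cache of count objects; returns 0 on success, -1 on failure. */
int cache_init(struct obj_cache *c, size_t obj_size, size_t count)
{
    size_t i;

    /* Each free object must be able to hold the next pointer, aligned. */
    if (obj_size < sizeof(void *))
        obj_size = sizeof(void *);
    obj_size = (obj_size + sizeof(void *) - 1) & ~(sizeof(void *) - 1);

    c->chunk = malloc(obj_size * count);
    if (!c->chunk)
        return -1;
    c->obj_size = obj_size;
    c->free_list = NULL;

    /* Push every object onto the free list. */
    for (i = 0; i < count; i++) {
        void *obj = (char *)c->chunk + i * obj_size;
        *(void **)obj = c->free_list;
        c->free_list = obj;
    }
    return 0;
}

/* Pop one object from the free list, or return NULL if the cache is empty. */
void *cache_alloc(struct obj_cache *c)
{
    void *obj = c->free_list;

    if (obj)
        c->free_list = *(void **)obj;
    return obj;
}

/* Return an object to the cache so the next allocation can reuse it. */
void cache_free(struct obj_cache *c, void *obj)
{
    *(void **)obj = c->free_list;
    c->free_list = obj;
}

/* Release the backing memory. */
void cache_destroy(struct obj_cache *c)
{
    free(c->chunk);
    c->chunk = NULL;
    c->free_list = NULL;
}
```

Allocation and release are constant-time pointer operations because every object has the same size, which is why this style suits workloads that repeatedly allocate identical blocks.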
2.3. kmalloc
void *kmalloc (size_t size, gfp_t flags)
kmalloc is the most commonly used allocation method in the kernel, and is implemented by calling kmem_cache_alloc. The maximum size of a single kmalloc request is determined by the contents of include/linux/kmalloc_sizes.h; in the default 2.6.18 kernel, kmalloc can request at most 131072 bytes (128KB) of contiguous physical memory at a time. In testing, code that tried to allocate more than 128KB with kmalloc failed to build.
2.4. vmalloc
void *vmalloc (unsigned long size)
All of the methods above allocate physically contiguous memory, which keeps the average access time low. In some cases, however, the memory is requested infrequently and a higher access time is acceptable. vmalloc can then be used to allocate a region that is contiguous in linear (virtual) address space but not physically contiguous; the benefit is that a large block of memory can be allocated at once. Figure 3-1 shows the address range used for vmalloc allocations. vmalloc places no explicit limit on the size of a single allocation, but for performance reasons it should be used with caution. In testing, up to 1GB could be allocated at a time.
2.5. dma_alloc_coherent
void *dma_alloc_coherent (struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp)
DMA is a hardware mechanism that lets peripherals transfer I/O data directly to and from main memory without involving the CPU, which can greatly increase throughput when communicating with a device. DMA raises a consistency problem between the CPU cache and the corresponding memory, and the data must be kept coherent. On x86_64 the hardware already handles this well, so dma_alloc_coherent differs little from __get_free_pages: it actually calls __alloc_pages to allocate memory, and the size limit for a single allocation is therefore the same. Memory allocated with __get_free_pages can also be used for DMA. In testing, dma_alloc_coherent could allocate at most 4MB.
2.6. ioremap
void *ioremap (unsigned long offset, unsigned long size)
ioremap is a more direct form of memory "allocation": the caller specifies the physical start address and the size of the region, and ioremap maps that physical address range into the kernel address space. Unlike the methods above, the physical address range is fixed in advance and no new physical memory is allocated. ioremap is mainly used by device drivers to let the CPU access the I/O space of external devices directly. How much memory ioremap can map depends on the underlying physical address space, so it was not tested.
2.7. Boot memory
When a very large amount of contiguous physical memory is needed and none of the allocation functions above can provide it, the only option is a more specialized one: reserving memory during the Linux kernel boot phase.
2.7.1. Allocating memory at kernel boot time
void *alloc_bootmem (unsigned long size)
Large blocks of memory can be allocated during kernel boot, bypassing the buddy system. To do so, call alloc_bootmem with the required size during boot, before mem_init is called. If the memory is needed elsewhere, export the start address returned by alloc_bootmem with EXPORT_SYMBOL, after which other code can use it. The drawbacks are that the code requesting the memory must be linked into the kernel, so the kernel has to be recompiled, and that the memory management system does not see this memory, so the user must manage it. In testing, after the kernel was recompiled and the system rebooted, the memory block allocated at boot time could be accessed.
2.7.2. Reserving top memory with kernel boot parameters
Passing the parameter "mem=size" when the Linux kernel boots limits the kernel to that much memory and leaves the range above it reserved. For example, on a system with 256MB of memory, "mem=248M" reserves the top 8MB; once the system is up, ioremap(0xF800000, 0x800000) can be called to map this memory.
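The address and size passed to ioremap in this example follow directly from the two memory figures. The sketch below reproduces that arithmetic in user space, assuming physical RAM is one contiguous range starting at address 0 with no holes, which real machines do not always guarantee. `struct reserved_region` and `reserved_top` are hypothetical names.

```c
#define MB (1024UL * 1024UL)

/* Hypothetical description of the range left out by "mem=". */
struct reserved_region {
    unsigned long base;  /* first physical address above the mem= limit */
    unsigned long size;  /* bytes between the mem= limit and top of RAM */
};

/*
 * Compute the reserved top-of-memory range.
 * Assumes RAM is contiguous from physical address 0 and mem_mb <= total_mb.
 */
struct reserved_region reserved_top(unsigned long total_mb, unsigned long mem_mb)
{
    struct reserved_region r;

    r.base = mem_mb * MB;
    r.size = (total_mb - mem_mb) * MB;
    return r;
}
```

With 256MB installed and "mem=248M", the base is 248 * 1MB = 0xF800000 and the size is 8 * 1MB = 0x800000, matching the ioremap call in the example.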
3. Comparison of several allocation functions
| Function | Allocation principle | Maximum memory | Notes |
| --- | --- | --- | --- |
| __get_free_pages | Works directly on page frames | 4MB | Suitable for allocating larger amounts of contiguous physical memory |
| kmem_cache_alloc | Based on the slab mechanism | 128KB | Suitable when memory blocks of the same size are frequently allocated and freed |
| kmalloc | Based on kmem_cache_alloc | 128KB | The most common allocation method; usable when less than a page frame of memory is needed |
| vmalloc | Maps non-contiguous physical memory to contiguous virtual addresses | | Physically non-contiguous; suitable when a large amount of memory is needed but physical contiguity is not |
| dma_alloc_coherent | Based on __alloc_pages | 4MB | Suitable for DMA operations |
| ioremap | Maps a known physical address range to virtual addresses | | For cases where the physical address is known, such as device drivers |
| alloc_bootmem | Reserves memory at kernel boot that the kernel's memory management does not see | | Limited by physical memory size; places high demands on memory management |
Note: The maximum sizes in the table were measured on a CentOS 5.3 x86_64 system; other systems and architectures will differ.
Common memory allocation functions in the Linux kernel ZZ