Http://blog.sina.com.cn/s/blog_65373f1401019dtz.html
Linux Kernel Learning Notes 5: Memory Management
1. Related Data structures
Allocating memory in the kernel is more constrained than in user space: kernel code often cannot sleep, and it cannot recover from a failed allocation as easily as user-space code can. The kernel uses two data structures, page and zone, to manage memory:
1.1 Page
The kernel treats the physical page as the basic unit of memory management. Although the smallest addressable unit of the CPU is usually a word (or even a byte), the MMU (memory management unit, the hardware that manages memory and translates virtual addresses into physical addresses) typically works in pages and maintains the system's page tables at page granularity. From the point of view of virtual memory, the page is the smallest unit.
Page size varies with the architecture. Many 32-bit architectures use 4KB pages; some 64-bit architectures (Alpha, for example) use 8KB pages, although x86-64 also uses 4KB. The kernel uses a struct page to represent each physical page in the system:
struct page {
    unsigned long flags;            /* Page status: whether the page is dirty, whether it is
                                       locked in memory, and so on. Each bit is an independent
                                       state, so at least 32 states can be represented at once. */
    atomic_t _count;                /* Reference count of the page. When _count is -1, the page
                                       is not referenced by the kernel and a new allocation can
                                       use it. Kernel code normally does not check this field
                                       directly; it calls page_count(), whose only parameter is
                                       a struct page. page_count() returns 0 if the page is
                                       free and a positive integer if the page is in use. */
    atomic_t _mapcount;
    unsigned long private;
    struct address_space *mapping;  /* When the page is used by the page cache, mapping points
                                       to the address_space object associated with the page. */
    pgoff_t index;
    struct list_head lru;
    void *virtual;                  /* Address of the page in virtual memory. High memory is not
                                       permanently mapped into the kernel address space; in
                                       that case virtual is NULL. */
};
It must be emphasized that struct page is associated with physical pages, not virtual pages. The description it gives of a page is therefore transient: even if the data contained in a page continues to exist, that data may no longer be associated with the same struct page, for example because of swapping. The kernel uses this structure only to describe what is stored in the physical page at the current moment; the purpose of struct page is to describe physical memory itself, not the data it contains.
The kernel uses one struct page for every physical page in the system. If the system has 4GB of memory and 4KB pages, there are 1M pages; at 40 bytes per struct page, about 40MB is needed to hold the struct page structures for all physical pages. The kernel needs to know whether a page is free, and if a page is already allocated, the kernel also needs to know who owns it: a user-space process, dynamically allocated kernel data, static kernel code, the page cache, and so on.
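The arithmetic above can be checked with a small user-space sketch. The helper names num_phys_pages() and page_struct_overhead() are hypothetical, chosen only for this illustration, and the 40-byte descriptor size is the figure quoted in the text, not a value read from any particular kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Number of physical pages in mem_bytes of memory. */
uint64_t num_phys_pages(uint64_t mem_bytes, uint64_t page_size)
{
    assert(page_size != 0);
    return mem_bytes / page_size;
}

/* Bytes needed to keep one descriptor of desc_bytes per physical page. */
uint64_t page_struct_overhead(uint64_t mem_bytes, uint64_t page_size,
                              uint64_t desc_bytes)
{
    return num_phys_pages(mem_bytes, page_size) * desc_bytes;
}
```

With 4GB of memory, 4KB pages and 40-byte descriptors, the overhead is 40MB, which is under 1% of the memory being described.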
1.2 Zone
Because of hardware limitations, the kernel cannot treat all pages equally. Some pages sit at particular physical addresses and therefore cannot be used for certain tasks. The kernel therefore divides all physical pages into zones, and the pages within one zone have similar properties. The kernel has to deal with the following two memory addressing problems caused by hardware limitations:
i> Some hardware can perform DMA (direct memory access) only to certain memory addresses;
ii> On some architectures the physical address range is much larger than the virtual address range, so some memory cannot be permanently mapped into the kernel address space.
For these reasons, the kernel divides physical pages into the following zones (taking 32-bit x86 as an example):
Zone            Description                                                        Physical memory
ZONE_DMA        Pages that can be used for DMA                                     < 16MB
ZONE_NORMAL     Normally addressable pages                                         16MB to 896MB
ZONE_HIGHMEM    High memory: pages that cannot be permanently mapped into          > 896MB
                the kernel address space
The layout of the zones depends on the architecture. Some architectures can perform DMA to any address in memory, so on them ZONE_DMA is empty and ZONE_NORMAL can be used directly for such allocations; on x86, by contrast, ISA devices can perform DMA only to the first 16MB of physical memory. ZONE_HIGHMEM is also architecture-dependent: on some architectures all memory can be mapped directly, so ZONE_HIGHMEM is empty, whereas on 32-bit x86 systems ZONE_HIGHMEM is all physical memory above 896MB.
Zones have no physical meaning; they are only a logical grouping that the kernel uses to manage pages. Some allocations must take pages from a specific zone: memory for DMA, for example, can come only from ZONE_DMA. General-purpose memory can come from either ZONE_DMA or ZONE_NORMAL, but the kernel naturally prefers to take it from ZONE_NORMAL in order to save the scarce ZONE_DMA.
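As a rough illustration of the 32-bit x86 boundaries described above, the following user-space sketch classifies a physical address into a zone. It is only a model: zone_for_phys_addr() and the two limit macros are hypothetical names, and the real kernel determines zones during boot from architecture-specific information rather than with a function like this.

```c
#include <stdint.h>

enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM };

/* Boundaries taken from the 32-bit x86 example in the text. */
#define SKETCH_DMA_LIMIT    (16ULL << 20)   /* 16MB  */
#define SKETCH_NORMAL_LIMIT (896ULL << 20)  /* 896MB */

/* Return the zone that would contain the given physical address. */
enum zone_type zone_for_phys_addr(uint64_t phys)
{
    if (phys < SKETCH_DMA_LIMIT)
        return ZONE_DMA;
    if (phys < SKETCH_NORMAL_LIMIT)
        return ZONE_NORMAL;
    return ZONE_HIGHMEM;
}
```

An address just below 16MB falls in ZONE_DMA, 16MB itself is the first address of ZONE_NORMAL, and everything from 896MB upward is ZONE_HIGHMEM.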
2. Memory Management Interface
2.1 Memory management interface for pages
Function prototypes and descriptions:
struct page *alloc_pages(gfp_t gfp_mask, unsigned int order);
    Allocates 2^order (that is, 1 << order) contiguous physical pages and returns a pointer to the struct page of the first page. Returns NULL on failure.
struct page *alloc_page(gfp_t gfp_mask);
    Allocates a single page and returns a pointer to its struct page.
unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
    Allocates 2^order contiguous physical pages and returns the logical address of the first page.
unsigned long __get_free_page(gfp_t gfp_mask);
    Allocates a single page and returns its logical address.
unsigned long get_zeroed_page(gfp_t gfp_mask);
    Same as __get_free_page(), except that the page is filled with zeros. This matters when the page is handed to user space, so that a user process cannot accidentally see sensitive data left in the page.
void __free_pages(struct page *page, unsigned int order);
    Frees 2^order contiguous physical pages starting at page.
void free_pages(unsigned long addr, unsigned int order);
    Frees 2^order contiguous physical pages starting at the logical address addr.
void free_page(unsigned long addr);
    Frees the single page at the logical address addr.
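Because these interfaces take an order rather than a size, callers have to convert between the two. The sketch below shows the conversion in user space, assuming a 4KB page size; order_to_bytes() and size_to_order() are hypothetical helper names used only here (the kernel itself offers get_order() for a similar purpose).

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumed page size for this sketch */

/* Bytes covered by an allocation of the given order: (1 << order) pages. */
unsigned long order_to_bytes(unsigned int order)
{
    assert(order < 20);  /* keeps the result within a 32-bit unsigned long */
    return (1UL << order) * SKETCH_PAGE_SIZE;
}

/* Smallest order whose 2^order pages hold at least size bytes. */
unsigned int size_to_order(unsigned long size)
{
    unsigned int order = 0;

    while (order_to_bytes(order) < size)
        order++;
    return order;
}
```

A request for 20000 bytes, for instance, needs order 3 (eight pages, 32KB), because order 2 covers only 16KB. Rounding up to a power of two is the price of the simple order-based interface.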
2.2 kmalloc
The low-level page allocation functions are convenient when you need whole pages; when you need memory sized in bytes, use kmalloc(). kmalloc() returns a physically contiguous block of memory whose size is given in bytes.
Function prototypes and descriptions:
void *kmalloc(size_t size, gfp_t gfp_mask);
    Returns a pointer to a block of memory that is at least size bytes long and physically contiguous. Returns NULL if the allocation fails.
void kfree(const void *ptr);
    Frees a block of memory allocated by kmalloc().
2.3 vmalloc
vmalloc() is similar to kmalloc(), but the memory it allocates is guaranteed to be contiguous only in the virtual address space; the physical addresses are not necessarily contiguous. kmalloc() guarantees that both the physical and the virtual addresses are contiguous.
Function prototypes and descriptions:
void *vmalloc(unsigned long size);
    Returns a pointer to a block of memory that is at least size bytes long and virtually contiguous. Returns NULL if the allocation fails. This function may sleep.
void vfree(const void *ptr);
    Frees a block of memory allocated by vmalloc(). This function may sleep.
In most cases only hardware devices need physically contiguous memory, since they often know nothing about virtual addresses. Memory used only by software can be contiguous in the virtual address space alone, because software deals only with logical addresses.
Even so, many parts of the kernel that could use vmalloc() use kmalloc() instead, mainly because kmalloc() is faster. To make physically discontiguous pages appear contiguous in the virtual address space, vmalloc() has to set up page table entries specifically for them. vmalloc() is therefore a last resort, typically used to obtain very large blocks of memory; for example, when a module is dynamically inserted into the kernel, it is loaded into memory allocated by vmalloc().
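The idea of "contiguous in virtual addresses, scattered in physical memory" can be modeled in user space with a tiny table of frame numbers. This is only a toy: struct toy_mapping and toy_virt_to_phys() are hypothetical names, the 4KB page size is an assumption, and real page tables are multi-level hardware structures, not a flat array.

```c
#include <assert.h>

#define TOY_PAGE_SIZE 4096UL  /* assumed page size */
#define TOY_NPAGES    4       /* pages in the toy virtually contiguous area */

/* One entry per virtual page: the physical frame number backing it. */
struct toy_mapping {
    unsigned long frame[TOY_NPAGES];
};

/* Translate an offset inside the virtual area into a physical address. */
unsigned long toy_virt_to_phys(const struct toy_mapping *m, unsigned long offset)
{
    unsigned long vpage = offset / TOY_PAGE_SIZE;

    assert(vpage < TOY_NPAGES);
    return m->frame[vpage] * TOY_PAGE_SIZE + offset % TOY_PAGE_SIZE;
}
```

With frames { 7, 2, 9, 4 }, offsets 4095 and 4096 are adjacent virtually but land in frames 7 and 2 physically. Setting up and walking such per-page entries is the extra work that makes this kind of mapping slower than a physically contiguous allocation.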
2.4 gfp_mask flags
Both the low-level page allocation functions and kmalloc() take a gfp_mask argument. gfp_mask is a set of flags that fall into two main categories: action modifiers and zone modifiers. Action modifiers specify how the kernel should allocate the requested memory; zone modifiers specify where the memory should come from. The tables below list the common ones:
Common action modifiers
Flag            Description
__GFP_WAIT      The allocator can sleep
__GFP_IO        The allocator can start disk I/O
__GFP_FS        The allocator can start file system I/O
__GFP_HIGH      The allocator can access emergency pools
Common zone modifiers
Flag            Description
__GFP_DMA       Allocate only from ZONE_DMA
__GFP_HIGHMEM   Allocate from ZONE_HIGHMEM or ZONE_NORMAL
For convenience, action modifiers and zone modifiers are combined into type flags, and usually you only need to use the type flags.
Common type flags
Flag            Modifier combination                    Description
GFP_KERNEL      (__GFP_WAIT | __GFP_IO | __GFP_FS)      For process-context code that is allowed to sleep. The allocation may block or start disk I/O. The flag places no restrictions on how the kernel obtains the memory: it can sleep, swap, flush pages to disk, and so on, so the allocation has a higher chance of success.
GFP_ATOMIC      __GFP_HIGH                              For interrupt handlers, bottom halves, code holding a spinlock, and other places that cannot sleep. The allocator is more restricted, so the allocation is less likely to succeed than with GFP_KERNEL, especially when memory is scarce.
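The way type flags are built from modifiers is ordinary bit-mask composition. The sketch below models it in user space; every SK_ name is hypothetical and the bit values are illustrative only, since the real values are defined by the kernel and vary between versions.

```c
#include <stdbool.h>

/* Illustrative bit values only; not the kernel's actual definitions. */
#define SK_GFP_WAIT 0x01u  /* allocator may sleep */
#define SK_GFP_HIGH 0x02u  /* allocator may use emergency pools */
#define SK_GFP_IO   0x04u  /* allocator may start disk I/O */
#define SK_GFP_FS   0x08u  /* allocator may start file system I/O */

/* Type flags are combinations of the modifiers, as in the table above. */
#define SK_GFP_KERNEL (SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)
#define SK_GFP_ATOMIC (SK_GFP_HIGH)

/* True if an allocation made with this mask is allowed to sleep. */
bool sk_may_sleep(unsigned int gfp_mask)
{
    return (gfp_mask & SK_GFP_WAIT) != 0;
}
```

Under this model sk_may_sleep(SK_GFP_KERNEL) is true and sk_may_sleep(SK_GFP_ATOMIC) is false, which is the distinction that decides which type flag is safe in an interrupt handler.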
3. Slab allocator
See Linux Kernel Learning Notes 2: slab allocator
4. Allocations on Stacks
4.1 kernel stacks, user stacks, and interrupt stacks
When the kernel creates a process, it creates the process's stacks along with its task_struct. Each process has two stacks: a user stack in user space and a kernel stack in kernel space. While the process runs in user space, the CPU stack pointer register holds an address in the user stack and the user stack is in use; while the process runs in kernel space, the stack pointer register holds an address in the kernel stack and the kernel stack is in use. Depending on the architecture, the kernel stack is one page or two contiguous pages, from 4KB to 16KB in size. Besides the user stack and the kernel stack, the 2.6 kernel also introduced interrupt stacks, which are used when the kernel stack is a single page. An interrupt stack gives each processor a stack for interrupt handlers, so interrupt handlers no longer have to share the kernel stack of the interrupted process.
4.2 Process user stack and kernel stack switchover
When a process enters kernel mode because of an interrupt or a system call, the stack it uses switches from the user stack to the kernel stack. On entry to kernel mode, the address of the user stack is saved on the kernel stack, and the CPU stack pointer register is then set to the address of the kernel stack; this completes the switch from the user stack to the kernel stack. When the process returns from kernel mode to user mode, at the very end of its kernel-mode execution the user stack address saved on the kernel stack is restored into the stack pointer register. This is how the switch between the kernel stack and the user stack is carried out in both directions.
So the user stack address needed for the return to user mode is saved on the kernel stack at the time of entry. But how does the CPU know the address of the kernel stack when it enters the kernel? The key is that the kernel stack of a process is always empty when the process moves from user mode to kernel mode. While the process runs in user mode it uses the user stack; after it enters kernel mode, the kernel stack holds information about its kernel-mode execution; but once the process returns to user mode, everything stored on the kernel stack is no longer needed and is discarded. The kernel stack is therefore empty every time the process enters the kernel from user mode, and on entry the stack pointer register can simply be loaded with the top address of the kernel stack.
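The save-and-restore sequence described above can be sketched as a toy user-space model. All names here (struct toy_cpu, struct toy_task, toy_enter_kernel(), toy_return_to_user()) are hypothetical, and a single field stands in for the value that real hardware and entry code push onto the kernel stack.

```c
#include <stdint.h>

/* Toy CPU: only the stack pointer register is modeled. */
struct toy_cpu {
    uintptr_t sp;
};

struct toy_task {
    uintptr_t kstack_top;     /* fixed top of the task's (empty) kernel stack */
    uintptr_t saved_user_sp;  /* stands in for the slot on the kernel stack */
};

/* Entering the kernel: save the user stack pointer, then switch stacks. */
void toy_enter_kernel(struct toy_cpu *cpu, struct toy_task *t)
{
    t->saved_user_sp = cpu->sp;
    cpu->sp = t->kstack_top;  /* kernel stack is empty, so start at its top */
}

/* Returning to user mode: restore the saved user stack pointer. */
void toy_return_to_user(struct toy_cpu *cpu, const struct toy_task *t)
{
    cpu->sp = t->saved_user_sp;
}
```

Because the kernel stack is empty on every entry, toy_enter_kernel() needs no bookkeeping beyond the fixed kstack_top value, which mirrors the argument made in the text.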
5. High memory mapping
High memory consists of the physical pages in ZONE_HIGHMEM. Pages in high memory cannot be obtained through __get_free_pages() or kmalloc(), because both functions return logical addresses, and physical pages in high memory cannot be permanently mapped into the kernel address space and so may have no logical address at all. You can obtain a page in high memory by calling alloc_pages() or alloc_page() with the __GFP_HIGHMEM flag; both functions return a struct page pointer.
To map a given page to the kernel address space, you can use the following function:
void *kmap(struct page *page);
This function works on both high memory and low memory. If the page structure belongs to a page in low memory, the function simply returns the page's logical address; if it belongs to a page in high memory, a permanent mapping is created and its address is returned. This function may sleep.
Because the number of permanent mappings is limited (otherwise there would be no need for all this trouble, and all memory could simply be mapped directly), a high memory page should be unmapped when it is no longer needed, with:
void kunmap(struct page *page);
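Why the limited number of mappings forces callers to unmap promptly can be seen in a toy user-space model with a fixed number of slots. The names toy_kmap() and toy_kunmap() and the slot count are hypothetical. Unlike the real kmap(), which the text says may sleep, this model simply reports failure with -1 when no slot is free.

```c
#include <stddef.h>

#define TOY_KMAP_SLOTS 2  /* deliberately tiny pool of mapping slots */

static const void *toy_slot[TOY_KMAP_SLOTS];  /* NULL means the slot is free */

/* Map a page into the first free slot; return its index, or -1 if full. */
int toy_kmap(const void *page)
{
    for (int i = 0; i < TOY_KMAP_SLOTS; i++) {
        if (toy_slot[i] == NULL) {
            toy_slot[i] = page;
            return i;
        }
    }
    return -1;
}

/* Release the slot that currently maps the page, if any. */
void toy_kunmap(const void *page)
{
    for (int i = 0; i < TOY_KMAP_SLOTS; i++) {
        if (toy_slot[i] == page) {
            toy_slot[i] = NULL;
            return;
        }
    }
}
```

Once both slots are taken, a third toy_kmap() call fails until some earlier mapping is released with toy_kunmap(); a caller that never unmaps starves everyone else.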
For more on high memory, see the article "Linux user space and kernel space data transfer", which covers the topic well.
6. Per-CPU memory management
See Linux Kernel Learning Notes 3: kernel synchronization mechanisms, section 4.2.
Some additional notes on the benefits and caveats of per-CPU data. The first benefit is that less locking is needed, because each CPU accesses only its own per-CPU data; the second is that per-CPU data improves the cache hit rate. Note, however, that kernel preemption must be disabled while per-CPU data is being accessed (the interface does this automatically), and that code must not sleep while accessing per-CPU data, because it might be running on a different processor when it wakes up.
7. Comparison and summary of different allocation functions
If you need physically contiguous pages, use one of the low-level page allocation functions or kmalloc(); this is the usual way to allocate memory in the kernel. The flags most commonly passed to these functions are GFP_ATOMIC and GFP_KERNEL. GFP_ATOMIC requests an allocation that must not sleep and is used in interrupt handlers and other places where sleeping is not possible; GFP_KERNEL allows the allocation to sleep and is used in process context where sleeping is safe.
If you want to allocate from high memory, use alloc_pages() or alloc_page(). Instead of a logical address, they return a pointer to a struct page. Because high memory may not be mapped, it can be accessed only through its struct page. To get an actual logical address, call kmap() to map the high memory page into the kernel's logical address space.
If you do not need physically contiguous pages and only need memory that is contiguous in the virtual address space, use vmalloc(), at some cost in performance.
If you create and destroy many data structures of the same type, set up a slab cache. The slab allocator maintains a per-CPU object cache (a free list), which can greatly improve the performance of object allocation and release. Rather than allocating and freeing memory frequently, the slab layer keeps preallocated objects in the cache; when you need a new piece of memory to hold a data structure, the slab layer usually does not have to allocate memory again and only needs to take an object out of the cache.
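The free-list idea behind an object cache can be sketched in user space as follows. This is a deliberately small model with hypothetical names (struct toy_cache, toy_cache_alloc(), and so on); it shows only the reuse of preallocated objects and leaves out what the real slab allocator adds, such as per-CPU lists, slabs built from pages, and constructors.

```c
#include <stddef.h>

#define TOY_CACHE_OBJS 4

struct toy_obj {
    struct toy_obj *next;  /* links free objects together */
    int payload;
};

struct toy_cache {
    struct toy_obj pool[TOY_CACHE_OBJS];  /* objects allocated up front */
    struct toy_obj *free_list;            /* objects currently available */
};

/* Put every preallocated object on the free list. */
void toy_cache_init(struct toy_cache *c)
{
    c->free_list = NULL;
    for (int i = 0; i < TOY_CACHE_OBJS; i++) {
        c->pool[i].next = c->free_list;
        c->free_list = &c->pool[i];
    }
}

/* Take an object from the free list; NULL if the cache is exhausted. */
struct toy_obj *toy_cache_alloc(struct toy_cache *c)
{
    struct toy_obj *o = c->free_list;

    if (o)
        c->free_list = o->next;
    return o;
}

/* Return an object to the free list so the next allocation can reuse it. */
void toy_cache_free(struct toy_cache *c, struct toy_obj *o)
{
    o->next = c->free_list;
    c->free_list = o;
}
```

Allocation and release are each a couple of pointer updates, and an object that was just freed is the first one handed out again, which is also friendly to the CPU cache.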
8. References
Robert Love, Linux Kernel Development, 3rd edition
Process kernel stack, user stack
Linux user space and kernel space data transfer
FAQ-linux Kernel Space