Linux kernel--memory management


Pages

The kernel treats physical pages as the basic unit of memory management. The memory management unit (MMU, the hardware that translates virtual addresses into physical addresses) typically deals in pages: it manages the system's page tables with page-sized granularity.

From the point of view of virtual memory, the page is the smallest unit.

Most 32-bit architectures use a 4KB page size; many 64-bit architectures use an 8KB page size.

On a machine with a 4KB page size and 1GB of physical memory, physical memory is divided into 262,144 pages.

The kernel uses a struct page structure to represent every physical page in the system.

struct page {
        page_flags_t flags;             /* status of the page; each bit represents one state */
        atomic_t _count;                /* reference count of the page; -1 means unreferenced */
        atomic_t _mapcount;
        unsigned long private;
        struct address_space *mapping;
        pgoff_t index;
        struct list_head lru;
        void *virtual;                  /* kernel virtual address of the page, if mapped */
};

Let's look at the important fields.

flags: This field stores the status of the page, including states such as whether the page is dirty or whether it is locked in memory. Each bit of the field represents one state individually, so it can represent at least 32 different states simultaneously.

_count: This field holds the page's usage count, that is, how many references there are to the page. When the count reaches -1, the kernel is not referencing the page at all, so it can be reused in a new allocation. Note that the unused value is -1, not 0.

virtual: This field is the page's virtual address.

mapping: This field points to the address_space object associated with this page.

private: As the name suggests, this points to private data.

The kernel manages all pages in the system through this data structure, because it needs to know whether a page is free and, if not, who owns it. The owner may be a user-space process, dynamically allocated kernel data, static kernel code, the page cache, and so on. Every physical page in the system is given such a structure for memory management.

Zones

Because of hardware limitations, the kernel cannot treat all pages equally. Linux must deal with memory addressing problems such as the following two, both caused by hardware deficiencies:

1. Some hardware devices can perform DMA (Direct Memory Access) only to certain memory addresses.

2. Some architectures can physically address far more memory than they can virtually address.

Consequently, some memory cannot be permanently mapped into the kernel's address space.

Because of these constraints, the kernel groups pages with similar properties into different zones:

1) ZONE_DMA -- this zone contains pages that can be used for DMA operations.

2) ZONE_NORMAL -- this zone contains normally mapped pages.

3) ZONE_DMA32 -- this zone contains pages usable for DMA that are accessible only by 32-bit devices.

4) ZONE_HIGHMEM -- this zone contains "high memory": pages that cannot be permanently mapped into the kernel address space.

Linux divides the system's pages into zones, forming different pools of memory so that allocations can be made according to their purpose.

Note that the zone division has no physical significance; it is only a logical grouping the kernel uses to manage pages. Memory used for DMA must be allocated from ZONE_DMA, but general-purpose memory can come from either ZONE_DMA or ZONE_NORMAL.

Getting Pages

The kernel provides one low-level mechanism for requesting memory, along with several interfaces to access it. All of these interfaces allocate memory in units of pages. They are defined in <linux/gfp.h>.

The core function is:

struct page *alloc_pages(unsigned int gfp_mask, unsigned int order);

This function allocates 2^order contiguous physical pages and returns a pointer to the first page's struct page; on error it returns NULL.

void *page_address(struct page *page);

This converts a given page to its logical address. If you have no need for the struct page, you can call:

unsigned long __get_free_pages(unsigned int gfp_mask, unsigned int order);

This function works like alloc_pages(), except that it directly returns the logical address of the first requested page. Because the pages are contiguous, the other pages follow it.

If you need only a single page, you can use these two functions:

struct page *alloc_page(unsigned int gfp_mask);

unsigned long __get_free_page(unsigned int gfp_mask);

If you want the returned page's contents zeroed, use the following function:

unsigned long get_zeroed_page(unsigned int gfp_mask);

Method -- Description

alloc_page(gfp_mask) -- allocates a single page and returns a pointer to its struct page

alloc_pages(gfp_mask, order) -- allocates 2^order pages and returns a pointer to the first page's struct page

__get_free_page(gfp_mask) -- allocates a single page and returns its logical address

__get_free_pages(gfp_mask, order) -- allocates 2^order pages and returns the logical address of the first page

get_zeroed_page(gfp_mask) -- allocates a single page, fills it with zeros, and returns its logical address
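A usage sketch of these interfaces in kernel code (the order-3 request is an arbitrary example; this fragment would live inside a kernel function, not userspace):

```c
unsigned long page;

/* request 2^3 = 8 physically contiguous pages */
page = __get_free_pages(GFP_KERNEL, 3);
if (!page)
        /* allocation failed: handle the error */
        return -ENOMEM;

/* ... use the 8 pages starting at logical address `page` ... */

free_pages(page, 3);   /* the order must match the allocation */
```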

When you no longer need pages, you can release them with the following functions:

void __free_pages(struct page *page, unsigned int order);

void free_pages(unsigned long addr, unsigned int order);

void free_page(unsigned long addr);

Be careful when releasing pages: release only pages that belong to you. Passing the wrong struct page or address, or the wrong order value, can crash the system. Remember, the kernel trusts itself completely.

kmalloc()

kmalloc() is similar to the malloc() family of functions, except that it takes an extra flags parameter. kmalloc() is declared in <linux/slab.h>:

void *kmalloc(size_t size, int flags);

This function returns a pointer to a block of memory that is at least size bytes long. The allocated memory is physically contiguous.

On error, it returns NULL. Unless there is not enough memory available, the kernel always succeeds.

After calling kmalloc(), you must check whether the return value is NULL and, if so, handle the error appropriately.
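A minimal kernel-side usage sketch (struct dog is a hypothetical example type; in real kernel code this fragment would sit inside a function in a module):

```c
struct dog {                    /* hypothetical example structure */
        int tail_length;
};

struct dog *p;

p = kmalloc(sizeof(struct dog), GFP_KERNEL);
if (!p)
        /* handle the error: no memory was available */
        return -ENOMEM;

/* ... use p ... */

kfree(p);                       /* release the block */
```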

Both the low-level page allocation functions and kmalloc() take an allocator flags argument (gfp_mask). These flags fall into three categories: action modifiers, zone modifiers, and types.

1) Action modifiers specify how the kernel should allocate the requested memory. In certain situations, only certain methods can be used. For example, an interrupt handler must instruct the kernel not to sleep while allocating memory, because interrupt handlers cannot be rescheduled.

2) Zone modifiers specify from which zone the memory should be allocated.

3) Type flags combine action and zone modifiers, grouping the combinations needed for common situations into distinct types, which simplifies the use of modifiers.

The counterpart of kmalloc() is kfree(), declared in <linux/slab.h>:

void kfree(const void *ptr);

The kfree() function frees a block of memory allocated with kmalloc().

It is safe to call kfree(NULL).

vmalloc()

vmalloc() works similarly to kmalloc(), except that the memory it allocates is only virtually contiguous; the physical addresses need not be contiguous. This is also how the user-space allocation functions work: pages returned by malloc() are contiguous within the process's virtual address space, but that does not guarantee they are contiguous in physical RAM. kmalloc() guarantees that the pages are physically contiguous; vmalloc() guarantees only that they are contiguous within the virtual address space. It does this by allocating potentially non-contiguous blocks of physical memory and fixing up the page tables to map that memory into a contiguous region of the logical address space.

In most cases, only hardware devices require physically contiguous memory, because hardware devices live on the other side of the memory management unit and know nothing about virtual addresses. Yet although physically contiguous memory is strictly required in only a few cases, most kernel code uses kmalloc() rather than vmalloc() to obtain memory. This is mainly for performance: to turn physically discontiguous pages into a contiguous range in the virtual address space, vmalloc() must specially create page table entries, and pages obtained via vmalloc() must be mapped page by page. For these reasons, vmalloc() is used only when necessary, typically to obtain large regions of memory; for example, when a module is dynamically loaded into the kernel, the module is loaded into memory allocated with vmalloc().

void *vmalloc(unsigned long size);

The function returns a pointer to a virtually contiguous block of memory that is at least size bytes long. On error, it returns NULL.

The function may sleep, so it cannot be called from interrupt context, nor from other situations in which blocking is not permissible.

Memory obtained via vmalloc() is released with vfree():

void vfree(const void *addr);
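A minimal kernel-side sketch of the pairing (the buffer size is arbitrary; this fragment belongs inside a kernel function that may sleep):

```c
char *buf;

/* virtually contiguous, possibly physically scattered; may sleep */
buf = vmalloc(16 * PAGE_SIZE);
if (!buf)
        /* handle the error */
        return -ENOMEM;

/* ... use buf ... */

vfree(buf);
```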

Slab Layer

To facilitate frequent allocation and deallocation of data structures, the Linux kernel provides the slab layer (the so-called slab allocator). The slab allocator acts as a caching layer for commonly used data structures.

The slab layer divides objects into groups called caches, each of which stores a different type of data-structure object. For example, one cache holds process descriptors, another holds inodes.

Each cache is in turn divided into slabs. A slab is composed of one or more physically contiguous pages; typically, a slab consists of just a single page. Each cache may contain multiple slabs.

Each slab contains some number of objects, which are the data structures being cached. A slab is in one of three states: full, partial, or empty. When some part of the kernel needs a new object, the allocation is satisfied first from a partial slab; if there is no partial slab, an empty slab is used; and if there is no empty slab either, the kernel must allocate new pages to grow the cache. A figure at http://www.cnblogs.com/wang_yb/archive/2013/05/23/3095907.html illustrates the relationship between caches, slabs, and objects.



The principles of the whole slab layer are as follows:

1. Caches can be created for various objects in memory (for example, a cache for task_struct, the structure describing a process).

2. Besides caches for specific objects, there are caches for general-purpose objects.

3. Each cache contains multiple slabs, which manage the cached objects.

4. Each slab contains multiple cached objects and physically consists of one page or several contiguous pages.

Each cache is represented by a kmem_cache structure, which holds three lists of slabs, slabs_full, slabs_partial, and slabs_empty, stored inside a kmem_list3 structure. These lists contain all the slabs in the cache. Each slab is described by a struct slab descriptor:

struct slab {
        struct list_head list;    /* full, partial, or empty list */
        unsigned long colouroff;  /* offset for the slab coloring */
        void *s_mem;              /* first object in the slab */
        unsigned int inuse;       /* number of allocated objects */
        kmem_bufctl_t free;       /* first free object, if any */
};

Interface of the slab allocator

There are four main functions:

1. Create a cache:

struct kmem_cache *kmem_cache_create(const char *name, size_t size, size_t align, unsigned long flags, void (*ctor)(void *));

2. Allocate an object from the cache:

void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags);

3. Free an object, returning it to its slab:

void kmem_cache_free(struct kmem_cache *cachep, void *objp);

4. Destroy the cache:

void kmem_cache_destroy(struct kmem_cache *cachep);
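Putting the four calls together, a hedged kernel-side sketch (struct dog and the cache name are made up for illustration; error handling is elided):

```c
struct dog {                    /* hypothetical cached object */
        int tail_length;
};

static struct kmem_cache *dog_cachep;

/* 1. create the cache, typically at init time */
dog_cachep = kmem_cache_create("dog_cache", sizeof(struct dog),
                               0, SLAB_HWCACHE_ALIGN, NULL);

/* 2. allocate an object from the cache */
struct dog *d = kmem_cache_alloc(dog_cachep, GFP_KERNEL);

/* 3. return the object to its slab */
kmem_cache_free(dog_cachep, d);

/* 4. destroy the cache, typically at exit time */
kmem_cache_destroy(dog_cachep);
```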
How Slab Resolves Memory Fragmentation

Memory fragmentation comes in two forms: a) internal fragmentation and b) external fragmentation.

How internal fragmentation arises: because all memory allocations must start at an address divisible by 4, 8, and so on (depending on the processor architecture), or because the MMU's paging mechanism dictates that the allocator can only hand out blocks of predetermined sizes, a request cannot always be satisfied exactly. If a client requests a 43-byte block of memory and no block of that exact size exists, it may receive a slightly larger one of 44 or 48 bytes; the extra space produced by rounding up the requested size is called internal fragmentation.


How external fragmentation arises: frequent allocation and freeing of physical pages leaves many small, non-contiguous blocks of free pages interspersed among the allocated pages. These stranded free blocks are external fragments.

Suppose there is a contiguous free memory region of 100 units, spanning 0~99. You request a block of, say, 10 units, and receive the range 0~9.

You then request another block, say 5 units; this second block occupies 10~14. Now you free the first block and request a block larger than 10 units, say 20 units. Since the just-freed block cannot satisfy the new request, the 20-unit block must be allocated starting at 15. The state of the whole region is now: 0~9 free, 10~14 occupied, 15~34 occupied, 35~99 free. The range 0~9 has become a memory fragment.

As long as 10~14 stays occupied and every later request is larger than 10 units, 0~9 can never be used; it has become an external fragment.

Workaround:

The slab mechanism pre-allocates memory in blocks sized for a specific data structure, so it largely avoids both internal and external fragmentation.

Slab compared with traditional memory management:

Compared with the traditional memory management model, the slab allocator offers many advantages.

First, the kernel frequently allocates small objects, countless times over the system's lifetime. The slab allocator serves this pattern by caching objects of similar size, which avoids the usual fragmentation problems. The slab allocator also supports initialization of common objects, avoiding repeatedly re-initializing an object for the same purpose.

Finally, the slab allocator supports hardware cache alignment and coloring, which prevents false sharing (two objects at different memory addresses mapping to the same cache line) and thus improves performance, at the cost of some wasted memory.

Static allocation on the stack

The kernel stack is fixed in size. We should conserve stack space in kernel code and keep functions' local variables small: avoid putting large arrays or large structures on the stack.

This matters especially for the kernel stack, because an overflow clobbers adjacent kernel data (such as thread_info). Large allocations should therefore be made dynamically. Note also that each process's kernel stack and the interrupt stack are separate, which reduces pressure on the kernel stack (a kernel stack occupies only one or two pages).

High Memory Mapping

A 32-bit processor can address at most 4GB. Once high pages are allocated, they must still be mapped into the kernel's virtual address space before the kernel can access them.

All physical memory above roughly 896MB is high memory, which is not permanently or automatically mapped into the kernel's virtual address space.

The kernel's virtual address space is 1GB. Memory from 0 to 896MB is mapped one-to-one onto physical memory; this is the linear mapping. If the 896MB~1024MB region of kernel virtual memory were also linearly mapped, the kernel could only ever use 1GB of physical memory: even with more physical memory (say 4GB), it could not be fully utilized. Therefore the 896MB~1024MB region of kernel virtual memory is not linearly mapped to high memory; instead, the mapping works as follows:

When the kernel needs to access high physical memory, it finds a free logical address range of suitable size within the 896MB~1024MB region of kernel virtual memory, borrows it temporarily, creates a mapping from it to the physical memory to be accessed, uses it for a while, and returns it when done.

Later, when the kernel needs to access other high physical memory, it can reuse the same logical address range.

The basic idea of high memory: borrow an address range within the kernel's 896MB~1024MB virtual space, establish a temporary mapping to high physical memory, and free the virtual range after use. This way, the same virtual address range can be reused to access all of physical memory.

There are three ways to map high memory:

1. Map to "kernel Dynamic mapping Space"

This way is straightforward. When vmalloc() requests memory in the "kernel dynamic mapping space", it may obtain pages from high memory (see the implementation of vmalloc()); thus high memory can end up mapped into the kernel dynamic mapping space.
2. Permanent kernel mapping

Suppose you obtain a high-memory page via alloc_page(); how do you find a linear address for it?

The kernel reserves a linear address range for this purpose, from PKMAP_BASE to FIXADDR_START, used for mapping high memory. In the 2.4 kernel, this range is 4GB-8MB to 4GB-4MB.

This space is called the "kernel permanent mapping Space" or "permanent kernel mapping Space".

This space shares the same page directory as the rest of kernel space: for the kernel itself it is swapper_pg_dir; an ordinary process reaches it through the CR3 register.

Typically this space is 4MB, so it needs only a single page table; the kernel finds this page table through pkmap_page_table.
3. Temporary mapping

When a mapping must be created but the current context cannot sleep, the kernel provides temporary mappings (also called atomic mappings). A set of reserved mapping slots can hold newly created temporary mappings, and the kernel can atomically map a high-memory page into one of these reserved slots. As a result, a temporary mapping can be used where sleeping is impossible, such as in an interrupt handler, because obtaining the mapping never blocks.
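A hedged kernel-side sketch of the two mapping styles (2.6-era API; the KM_USER0 slot argument belongs to older kernels and was later dropped from kmap_atomic()):

```c
struct page *page = alloc_page(GFP_HIGHUSER);

/* permanent kernel mapping: may sleep, so process context only */
void *vaddr = kmap(page);
/* ... use vaddr ... */
kunmap(page);

/* temporary (atomic) mapping: never blocks, usable in interrupt context */
void *avaddr = kmap_atomic(page, KM_USER0);  /* slot arg: 2.6-era kernels */
/* ... use avaddr ... */
kunmap_atomic(avaddr, KM_USER0);
```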

Per-CPU Data

In an SMP environment, too many locks seriously hurt parallel efficiency; if they are spin locks, other CPUs additionally waste running time busy-waiting.

So the kernel provides interfaces for allocating data per CPU. Once data is allocated per CPU, each CPU's copy is never accessed by other CPUs. Although a little memory is wasted, the system becomes simpler and more efficient.

Allocating data per CPU has two main advantages:

1. The most direct effect is that it eliminates locking of the data, improving system performance.

2. Because each CPU has its own data, cache invalidations are greatly reduced. If one processor operates on data that resides in another processor's cache, the processor holding the data must flush or invalidate its cache; constant cache invalidation is called cache thrashing, and it severely hurts system performance.
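A kernel-side sketch of the compile-time per-CPU interface (the counter name is made up for illustration):

```c
DEFINE_PER_CPU(long, my_counter);   /* hypothetical per-CPU variable */

void touch_counter(void)
{
        /* get_cpu_var disables preemption and returns this CPU's copy */
        get_cpu_var(my_counter)++;
        put_cpu_var(my_counter);    /* re-enable preemption */
}
```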
