Linux kernel memory allocation types and methods

Source: Internet
Author: User

Function description:
struct page *alloc_pages(unsigned int gfp_mask, unsigned int order); allocates 2^order (that is, 1 << order) physically contiguous pages using the allocation mode given by gfp_mask.

Given an allocated page, void *page_address(struct page *page) returns the logical address that corresponds to it. If you do not need the struct page, you can call __get_free_pages(unsigned int gfp_mask, unsigned int order) instead; it allocates the pages and returns the address of the first byte of a memory area that is several physically contiguous pages long and is not zeroed. You can also call get_zeroed_page(unsigned int gfp_mask), which returns the address of a single new page filled with zeros.

Input parameter description:
gfp_mask: the allocation flags. The kernel can allocate memory in several ways, and this parameter tells it how to allocate the requested memory and where to take it from. The allocation is ultimately carried out by __get_free_pages(), which is the origin of the GFP_ prefix. The allocation flags (gfp_mask) can take the following values:

GFP_KERNEL: the most common mode, the normal allocation of kernel memory. It may sleep.
GFP_ATOMIC: used to allocate memory from interrupt handlers and other code outside of process context. It never sleeps.
GFP_USER: used to allocate memory pages for user space. It may sleep.
GFP_HIGHUSER: like GFP_USER, but allocates from high memory if there is any.
GFP_NOIO
GFP_NOFS: these two flags behave like GFP_KERNEL but restrict what the kernel may do to satisfy the request. A GFP_NOFS allocation is not allowed to perform any file system calls, while GFP_NOIO disallows the initiation of any I/O at all. They are used mainly in file system and virtual memory code, where an allocation may be allowed to sleep but recursive file system calls must not occur.

Some flags use a double-underscore prefix. They can be ORed with the flags above to control how the allocation is carried out:
__GFP_DMA: requests memory that can be used for DMA.
__GFP_HIGHMEM: indicates that the allocated memory may be located in high memory.
__GFP_NOWARN: prevents the kernel from issuing a warning (with printk) when an allocation fails.
__GFP_HIGH: marks a high-priority request that is allowed to consume even the last pages of memory the kernel sets aside for emergencies.
__GFP_REPEAT
__GFP_NOFAIL
__GFP_NORETRY: these flags tell the allocator how to behave when it has difficulty satisfying an allocation. __GFP_REPEAT means "try again", although the allocation may still fail. __GFP_NOFAIL tells the allocator never to fail and to do its best to satisfy the request; its use is discouraged. __GFP_NORETRY tells the allocator to give up immediately if the request cannot be satisfied.

order: the base-2 logarithm of the number of pages to allocate; the function allocates 2^order physical pages.
Return value:
alloc_pages() returns a pointer to the struct page of the first page in the allocated block. If the allocation fails, NULL is returned.
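
As a rough illustration of how order relates to a byte count, the sketch below computes the smallest order whose 2^order pages cover a request. It is ordinary user-space C, not kernel code: the 4096-byte PAGE_SIZE_BYTES constant and the name order_for_size are assumptions made for this example (the kernel has its own get_order() helper for this job).

```c
#include <stddef.h>

/* Assumed page size for illustration; real systems may differ. */
#define PAGE_SIZE_BYTES 4096UL

/* Smallest order such that (1 << order) pages hold at least `size` bytes. */
unsigned int order_for_size(size_t size)
{
    size_t pages = (size + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
    unsigned int order = 0;

    while (((size_t)1 << order) < pages)
        order++;
    return order;
}
```

Because only powers of two can be requested, a buffer that needs five pages under these assumptions ends up with order 3, which is eight pages.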

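void *kmalloc(size_t size, int flags); allocates memory that is contiguous both virtually and physically. It is implemented on top of the slab allocator, which ultimately obtains its pages through __get_free_pages. Because kmalloc returns a kernel logical address, it cannot hand out high memory; it normally allocates from the normal zone, or from the DMA zone when the GFP_DMA flag is given.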
Input parameter description:
The first parameter of kmalloc is the size of the block to be allocated. The kernel manages the system's physical memory, which is available only in page-sized chunks. As a result, kmalloc looks rather different from a typical user-space malloc implementation. A simple, heap-oriented allocation technique would quickly run into trouble; it would have a hard time working around the page boundaries. Thus, the kernel uses a special page-oriented allocation technique to get the best use from the system's RAM. Linux handles memory allocation by creating a set of pools of memory objects of fixed sizes. An allocation request is handled by going to a pool that holds sufficiently large objects and handing an entire memory chunk back to the requester. The memory management scheme is quite complex, and the details are not normally all that interesting to device driver writers.
The one thing driver developers should keep in mind, though, is that the kernel can allocate only certain predefined, fixed-size byte arrays. If you ask for an arbitrary amount of memory, you are likely to get slightly more than you asked for, up to twice as much. Also, programmers should remember that the smallest allocation kmalloc can handle is 32 or 64 bytes, depending on the page size used by the system. There is an upper limit to the size of memory chunks that kmalloc can allocate. That limit varies with the architecture and the kernel configuration options. If your code is to be completely portable, it cannot count on being able to allocate anything larger than 128 KB. If you need more than a few kilobytes, however, there are better ways than kmalloc to obtain memory, namely the page allocation functions discussed earlier.
The second parameter is the set of allocation flags, the same flags accepted by alloc_pages. It is the more interesting of the two, because it controls the behavior of kmalloc in several ways. The most commonly used flag, GFP_KERNEL, means that the allocation (performed internally by eventually calling __get_free_pages, which is the source of the GFP_ prefix) is made on behalf of a process running in kernel space. In other words, the calling function is executing a system call on behalf of a process. Using GFP_KERNEL means that kmalloc can put the current process to sleep while waiting for a page when memory is low. A function that allocates memory with GFP_KERNEL must therefore be reentrant and cannot run in atomic context. While the current process sleeps, the kernel takes appropriate action to find some free memory, either by flushing buffers to disk or by swapping out memory from a user process.
GFP_KERNEL is not always the right allocation flag to use; sometimes kmalloc is called from outside a process's context. Such calls can happen, for instance, in interrupt handlers, tasklets, and kernel timers. In these cases the current process must not be put to sleep, and the driver should use GFP_ATOMIC instead. The kernel normally tries to keep some free pages around in order to fulfill atomic allocations. When GFP_ATOMIC is used, kmalloc can use even the last free page. If that last page does not exist, however, the allocation fails.
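
The "up to twice as much" rounding described for the size parameter can be illustrated with a small user-space model. It assumes pure power-of-two size classes and a 32-byte minimum, which is a simplification; the name model_alloc_size and the constant are hypothetical, and the size classes of a real kernel depend on its version and configuration.

```c
#include <stddef.h>

/* Assumed smallest size class; the text gives 32 or 64 bytes. */
#define MIN_ALLOC_BYTES 32UL

/* Model: round a (small) request up to the next power-of-two size class. */
size_t model_alloc_size(size_t request)
{
    size_t size = MIN_ALLOC_BYTES;

    while (size < request)
        size <<= 1;
    return size;
}
```

In this model a request of 1025 bytes lands in the 2048-byte class, so almost half of the block is unused; requests just above a power of two are the worst case.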

Correction: the size that kmalloc can allocate is limited, and when a large block of physically contiguous memory is needed, the page allocation function discussed earlier (alloc_pages) may fail as well. In that case the memory has to be requested at system startup.

Obtaining large buffers

As noted earlier, allocations of large, contiguous memory buffers are prone to failure. System memory fragments over time, and chances are that a truly large region of memory will simply not be available. Since there are usually ways of getting the job done without huge buffers, the kernel developers have not made large allocations a high priority. Before you try to obtain a large memory area, you should really consider the alternatives. By far the best way of performing large I/O operations is through scatter/gather operations, which are covered in the discussion of scatter/gather mappings.

Acquiring a dedicated buffer at boot time

If you really need a huge buffer of physically contiguous memory, the best approach is often to allocate it by requesting memory at boot time. Allocation at boot time is the only way to retrieve consecutive memory pages while bypassing the limits imposed by __get_free_pages on the buffer size, both in terms of maximum allowed size and limited choice of sizes. Allocating memory at boot is a "dirty" technique, because it bypasses all memory management policies by reserving a private memory pool. This technique is inelegant and inflexible, but it is also the least prone to failure. Needless to say, a module cannot allocate memory at boot time; only drivers directly linked to the kernel can do that.

One noticeable problem with boot-time allocation is that it is not a feasible option for the average user, since this mechanism is available only for code linked into the kernel image. A device driver using this kind of allocation can be installed or replaced only by rebuilding the kernel and rebooting the computer.

When the kernel is booted, it gains access to all the physical memory available in the system. It then initializes each of its subsystems by calling that subsystem's initialization function, allowing initialization code to allocate a memory buffer for private use by reducing the amount of RAM left for normal system operation.

Boot-time memory allocation is performed by calling one of these functions:

#include <linux/bootmem.h>
void *alloc_bootmem(unsigned long size);
void *alloc_bootmem_low(unsigned long size);
void *alloc_bootmem_pages(unsigned long size);
void *alloc_bootmem_low_pages(unsigned long size);

These functions allocate either whole pages (if their name ends with _pages) or non-page-aligned memory areas. The allocated memory may be high memory unless one of the _low versions is used. If you are allocating this buffer for a device driver, you probably want to use it for DMA operations, and that is not always possible with high memory; thus, you probably want to use one of the _low variants.
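
To make the difference between the page-aligned and unaligned variants concrete, here is a minimal user-space model of a boot-time allocator that simply advances an offset through a reserved region. It is a sketch and not the kernel's bootmem implementation: the pool size, the 4096-byte page size, and the names model_bootmem and model_bootmem_pages are all assumptions, and the code relies on C11 for _Alignas.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096UL               /* assumed page size */
#define MODEL_POOL_SIZE (16 * MODEL_PAGE_SIZE)

/* Reserved region standing in for memory set aside at boot. */
static _Alignas(4096) unsigned char model_pool[MODEL_POOL_SIZE];
static size_t model_next;                    /* offset of the first free byte */

/* Hand out `size` bytes starting at a multiple of `align` (a power of two). */
static void *model_alloc_aligned(size_t size, size_t align)
{
    size_t start = (model_next + align - 1) & ~(align - 1);

    if (size == 0 || start > MODEL_POOL_SIZE || size > MODEL_POOL_SIZE - start)
        return NULL;                         /* pool exhausted */
    model_next = start + size;
    return model_pool + start;
}

/* Non-page-aligned variant: only pointer-size alignment is applied. */
void *model_bootmem(size_t size)
{
    return model_alloc_aligned(size, sizeof(void *));
}

/* Page-aligned variant: the result starts on a page boundary. */
void *model_bootmem_pages(size_t size)
{
    return model_alloc_aligned(size, MODEL_PAGE_SIZE);
}
```

The model never frees anything, which matches the way boot-time buffers are normally kept for the lifetime of the system.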

It is rare to free memory allocated at boot time; you will almost certainly be unable to get it back later if you want it. There is an interface to free this memory, however:

void free_bootmem(unsigned long addr, unsigned long size);

Note that partial pages freed in this manner are not returned to the system, but if you are using this technique you have probably allocated a fair number of whole pages to begin with.

If you must use boot-time allocation, you need to link your driver directly into the kernel. See the files under Documentation/kbuild in the kernel source for more information on how this should be done.

void *vmalloc(unsigned long size); allocates memory that is contiguous in the virtual address space but may be physically non-contiguous. It does not go through the slab allocator; it obtains its pages individually from the page allocator, and those pages may come from high memory.
Input parameter description: size is the size of the memory to allocate.

Addresses allocated by vmalloc cannot be used outside of the microprocessor, because they make sense only on top of the processor's MMU. When a driver needs a real physical address (such as a DMA address, used by peripheral hardware to drive the system's bus), you cannot easily use vmalloc. The right time to call vmalloc is when you are allocating memory for a large sequential buffer that exists only in software. Note that vmalloc has more overhead than __get_free_pages, because it must both retrieve the memory and build the page tables. Therefore, it does not make sense to call vmalloc to allocate just one page.
An example of a function in the kernel that uses vmalloc is the create_module system call, which uses vmalloc to get space for the module being created. The code and data of the module are later copied to the allocated space using copy_from_user. In this way, the module appears to be loaded into contiguous memory. You can verify that the kernel symbols exported by modules lie in a different memory range from the symbols exported by the kernel proper. One minor drawback of vmalloc is that it cannot be used in atomic context.
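
Why a virtually contiguous buffer is of little use to hardware can be seen in a toy model of address translation. The sketch below is user-space C with hypothetical names; it assumes a 4096-byte page and represents the page table as a plain array that gives the physical frame number behind each virtual page of the buffer.

```c
#include <stddef.h>

#define VM_PAGE_SIZE 4096UL   /* assumed page size */

/* Translate an offset in a virtually contiguous buffer to a physical
 * address, given the frame number backing each virtual page. */
unsigned long model_virt_to_phys(const unsigned long *frames, size_t npages,
                                 size_t offset)
{
    size_t page = offset / VM_PAGE_SIZE;

    if (page >= npages)
        return 0;             /* outside the buffer (0 used as "invalid") */
    return frames[page] * VM_PAGE_SIZE + offset % VM_PAGE_SIZE;
}
```

With frames such as {7, 2, 9}, offsets 4095 and 4096 are neighbours in the virtual buffer but fall in frames 7 and 2, so a device working with physical addresses would not see one continuous region.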

Slab lookaside caches. When you need many data structures or variables of the same type, you can use a slab cache to minimize the cost of allocating and freeing memory.

A device driver often ends up allocating many objects of the same size, over and over. Given that the kernel already maintains a set of memory pools of objects that are all the same size, why not add some special pools for these high-volume objects? In fact, the kernel does implement a facility to create this sort of pool, which is often called a lookaside cache. Device drivers normally do not exhibit the sort of memory behavior that justifies using a lookaside cache, but there can be exceptions; the USB and SCSI drivers in Linux 2.6 use caches.

The cache manager in the Linux kernel is sometimes called the "slab allocator." For that reason, its functions and types are declared in <linux/slab.h>. The slab allocator implements caches that have a type of kmem_cache_t; they are created with a call to kmem_cache_create:

kmem_cache_t *kmem_cache_create(const char *name, size_t size,
                                size_t offset,
                                unsigned long flags,
                                void (*constructor)(void *, kmem_cache_t *,
                                                    unsigned long flags),
                                void (*destructor)(void *, kmem_cache_t *,
                                                   unsigned long flags));

This function creates a new cache object that can host any number of memory areas all of the same size, specified by the size argument. The name argument is associated with the cache and serves as housekeeping information useful for tracking down problems; usually it is set to the name of the type of structure that is cached. The cache keeps a pointer to the name rather than copying it, so the driver should pass in a pointer to a name in static storage (usually the name is just a literal string). The name cannot contain blanks.

offset is the offset of the first object in the page. It can be used to ensure a particular alignment for the allocated objects, but you most likely will use 0 to request the default value. flags controls how allocation is done and is a bit mask of the following flags:

SLAB_NO_REAP: setting this flag protects the cache from being reduced when the system is looking for memory. Setting it is normally a bad idea; it is important to avoid restricting the memory allocator's freedom of action unnecessarily.
SLAB_HWCACHE_ALIGN: requires each data object to be aligned to a cache line; the actual alignment depends on the cache layout of the host platform. This option can be a good choice if your cache contains items that are frequently accessed on SMP machines. The padding required to achieve cache line alignment can end up wasting significant amounts of memory, however.
SLAB_CACHE_DMA: requires each data object to be allocated in the DMA memory zone.

There is also a set of flags that can be used when debugging cache allocations; see mm/slab.c for the details. Usually, however, these flags are set globally through a kernel configuration option on systems used for development.

The constructor and destructor arguments to the function are optional functions (but there can be no destructor without a constructor). The former can be used to initialize newly allocated objects, and the latter can be used to "clean up" objects before their memory is released back to the system as a whole. Constructors and destructors can be useful, but there are a few constraints to keep in mind. A constructor is called when the memory for a set of objects is allocated; because that memory may hold several objects, the constructor may be called multiple times. You cannot assume that the constructor is called as an immediate effect of allocating an object. Similarly, destructors can be called at some unknown future time, not immediately after an object has been freed. Constructors and destructors may or may not be allowed to sleep, according to whether they are passed the SLAB_CTOR_ATOMIC flag (where CTOR is short for constructor). For convenience, a programmer can use the same function for both the constructor and the destructor; the slab allocator passes the SLAB_CTOR_CONSTRUCTOR flag when the callee is a constructor.

Once a cache of objects is created, you can allocate objects from it by calling kmem_cache_alloc:
void *kmem_cache_alloc(kmem_cache_t *cache, int flags);
Here, the cache argument is the cache you created previously; the flags are the same as you would pass to kmalloc and are consulted if kmem_cache_alloc needs to go out and allocate more memory itself.

To free an object, use kmem_cache_free:
void kmem_cache_free(kmem_cache_t *cache, const void *obj);

When driver code is finished with the cache, typically when the module is unloaded, it should free its cache as follows:
int kmem_cache_destroy(kmem_cache_t *cache);
The destroy operation succeeds only if all objects allocated from the cache have been returned to it. Therefore, a module should check the return status from kmem_cache_destroy; a failure indicates some sort of memory leak within the module (since some of the objects have been dropped).

One side benefit of using lookaside caches is that the kernel maintains statistics on cache usage. These statistics can be obtained from /proc/slabinfo.
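
The create, allocate, free, and destroy life cycle described above can be mimicked in user space with a simple fixed-size object pool. This is only a sketch of the idea, not the slab allocator: all model_ names are hypothetical, malloc stands in for the page allocator, and there are no constructors, flags, or per-CPU optimizations.

```c
#include <stddef.h>
#include <stdlib.h>

/* A released object is reused to hold the link to the next free object. */
struct model_free_node {
    struct model_free_node *next;
};

struct model_cache {
    size_t obj_size;                  /* size of every object in the cache */
    struct model_free_node *free;     /* list of released objects */
    unsigned long in_use;             /* objects currently handed out */
};

void model_cache_init(struct model_cache *c, size_t size)
{
    /* Each object must be able to hold a free-list link. */
    c->obj_size = size < sizeof(struct model_free_node)
                      ? sizeof(struct model_free_node) : size;
    c->free = NULL;
    c->in_use = 0;
}

void *model_cache_alloc(struct model_cache *c)
{
    void *obj;

    if (c->free) {                    /* reuse a released object */
        obj = c->free;
        c->free = c->free->next;
    } else {                          /* otherwise get fresh memory */
        obj = malloc(c->obj_size);
    }
    if (obj)
        c->in_use++;
    return obj;
}

void model_cache_free(struct model_cache *c, void *obj)
{
    struct model_free_node *node = obj;

    node->next = c->free;
    c->free = node;
    c->in_use--;
}

/* Returns 0 on success, -1 if objects are still outstanding (a leak). */
int model_cache_destroy(struct model_cache *c)
{
    if (c->in_use != 0)
        return -1;
    while (c->free) {
        struct model_free_node *node = c->free;

        c->free = node->next;
        free(node);
    }
    return 0;
}
```

As with kmem_cache_destroy, the destroy step in this model refuses to proceed while objects are outstanding, and the in_use counter plays the role of the usage statistics.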

Qingsheng Shen wrote:

Memory allocated through alloc_pages can be turned into a logical address pointer with page_address only when the page is in low memory. For a page in high memory, you need to use kmap or kmap_atomic to establish a mapping first.

Permanent mapping: kmap
This function works for both high memory and low memory. If the page is in low memory, it simply calls page_address and returns the logical address. If the page is in high memory, it first establishes a permanent mapping and then returns the logical address. This function may sleep, so it can be used only in process context.

Temporary mapping (atomic mapping): kmap_atomic
When a mapping must be created and the current context cannot sleep, the kernel provides temporary mappings. A set of reserved mapping slots is available to hold newly created temporary mappings, and the kernel can atomically map a high-memory page into one of them. Temporary mappings can therefore be used where sleeping is not allowed, such as in interrupt handlers, because obtaining the mapping never blocks.
kmap_atomic also disables kernel preemption, which is necessary because the mappings are unique to each processor and rescheduling could change which processor the task runs on.
Temporary mappings work for both low memory and high memory. If the page is in low memory, page_address is called to return the logical address; if it is in high memory, the temporary mapping mechanism is actually used.

Per-CPU variables are an interesting 2.6 kernel feature. When you create a per-CPU variable, each processor on the system gets its own copy of that variable. This may seem like a strange thing to want to do, but it has its advantages. Access to per-CPU variables requires (almost) no locking, because each processor works with its own copy. Per-CPU variables can also remain in their respective processors' caches, which leads to significantly better performance for frequently updated quantities.

A good example of per-CPU variable use can be found in the networking subsystem. The kernel maintains no end of counters tracking how many of each type of packet was received; these counters can be updated thousands of times per second. Rather than deal with the caching and locking issues, the networking developers put the statistics counters into per-CPU variables. Updates are now lockless and fast. On the rare occasion that user space requests to see the values of the counters, it is a simple matter to add up each processor's version and return the total.
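
The counter pattern just described, cheap private updates and an occasional sum, can be modelled in plain user-space C. This is a single-threaded sketch with hypothetical names and an assumed CPU count; the cpu argument stands in for "the processor this code is running on", and the array layout does not reproduce the cache behavior of real per-CPU storage.

```c
#include <stddef.h>

#define MODEL_NR_CPUS 4   /* assumed CPU count for the model */

/* One counter slot per CPU; each CPU only ever writes its own slot. */
struct model_percpu_counter {
    unsigned long count[MODEL_NR_CPUS];
};

/* Called "on" CPU `cpu`: no lock is needed because the slot is private. */
void model_counter_inc(struct model_percpu_counter *c, int cpu)
{
    c->count[cpu]++;
}

/* Rare read path: add up every CPU's copy to get the total. */
unsigned long model_counter_sum(const struct model_percpu_counter *c)
{
    unsigned long total = 0;
    int cpu;

    for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        total += c->count[cpu];
    return total;
}
```

The trade-off is that reading the total costs a pass over all copies, which is acceptable when reads are rare compared with updates.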

The declarations for per-CPU variables can be found in <linux/percpu.h>. To create a per-CPU variable at compile time, use this macro:

DEFINE_PER_CPU(type, name);

If the variable (to be called name) is an array, include the dimension information with the type. Thus, a per-CPU array of three integers would be created with:

DEFINE_PER_CPU(int[3], my_percpu_array);

Per-CPU variables can be manipulated almost without explicit locking. Remember that the 2.6 kernel is preemptible; it would not do for a processor to be preempted in the middle of a critical section that modifies a per-CPU variable. It also would not be good if your process were to be moved to another processor in the middle of a per-CPU variable access. For this reason, you must explicitly use the get_cpu_var macro to access the current processor's copy of a given variable, and call put_cpu_var when you are done. The call to get_cpu_var returns an lvalue for the current processor's version of the variable and disables preemption. Since an lvalue is returned, it can be assigned to or operated on directly.
For example, one counter in the networking code is incremented with these two statements:

get_cpu_var(sockets_in_use)++;
put_cpu_var(sockets_in_use);

You can access another processor's copy of the variable with:

per_cpu(variable, int cpu_id);

If you write code that involves processors reaching into each other's per-CPU variables, you have to implement a locking scheme that makes that access safe.

Dynamically allocated per-CPU variables are also possible. These variables can be allocated with:

void *alloc_percpu(type);
void *__alloc_percpu(size_t size, size_t align);

In most cases, alloc_percpu does the job; you can call __alloc_percpu in cases where a particular alignment is required. In either case, a per-CPU variable can be returned to the system with free_percpu. Access to a dynamically allocated per-CPU variable is done via per_cpu_ptr:

per_cpu_ptr(void *per_cpu_var, int cpu_id);

This macro returns a pointer to the version of per_cpu_var corresponding to the given cpu_id. If you are simply reading another CPU's version of the variable, you can dereference that pointer and be done with it. If, however, you are manipulating the current processor's version, you probably need to ensure first that you cannot be moved off that processor. If the entirety of your access to the per-CPU variable happens with a spinlock held, all is well. Usually, however, you need to use get_cpu to block preemption while working with the variable. Thus, code using dynamic per-CPU variables tends to look like this:

int cpu;
cpu = get_cpu();
ptr = per_cpu_ptr(per_cpu_var, cpu);
/* work with ptr */
put_cpu();

When using compile-time per-CPU variables, the get_cpu_var and put_cpu_var macros take care of these details. Dynamic per-CPU variables require more explicit protection.

Per-CPU variables can be exported to modules, but you must use a special version of the macros:

EXPORT_PER_CPU_SYMBOL(per_cpu_var);
EXPORT_PER_CPU_SYMBOL_GPL(per_cpu_var);

To access such a variable within a module, declare it with:

DECLARE_PER_CPU(type, name);

The use of DECLARE_PER_CPU (instead of DEFINE_PER_CPU) tells the compiler that an external reference is being made.

If you want to use per-CPU variables to create a simple integer counter, take a look at the existing implementation in <linux/percpu_counter.h>. Finally, note that some architectures have a limited amount of address space available for per-CPU variables. If you create per-CPU variables in your own code, you should try to keep them small.
