Linux SLUB Allocator Details


Kernel Object Cache Management

While running, the Linux kernel frequently uses certain small data structures (objects). For example, when a thread of a process opens a file for the first time, the kernel must allocate a file structure for it; when the file is finally closed, the kernel must release that structure. These small blocks of storage are not used only within a single kernel function (otherwise the current thread's kernel stack could hold them), and their number changes dynamically, so they cannot be organized into a statically sized array the way the page structures used for physical memory page management are. Because the kernel cannot predict at run time how many objects of each type it will need, it is also unsuitable to reserve a fixed memory pool for every possible object type: some pools would be exhausted while others sat idle. The kernel therefore has to take a more global approach.

The management of kernel objects is similar to heap management in user processes. The core problem is the same: how to manage memory space efficiently so that objects can be allocated and reclaimed quickly while memory fragmentation is kept low. However, the kernel cannot simply reuse the heap-based allocation algorithms of user processes, because its object usage has the following special characteristics:

The kernel uses a wide variety of objects, so a unified and efficient management scheme is needed.

The kernel allocates certain objects (such as task_struct) very frequently. Search-based allocation algorithms commonly used for user-process heaps, such as First-Fit (use the first block in the heap that satisfies the request) and Best-Fit (use the best-fitting block in the heap), are therefore not directly applicable; a caching mechanism should be used instead.

A considerable number of kernel objects require special initialization (for example, list heads) rather than simply being zeroed. If released objects can be reused intact, so that no re-initialization is needed on the next allocation, kernel efficiency improves.

The allocator's impact on the hardware cache must be fully considered when organizing and managing the kernel object caches.

With the spread of shared-memory multiprocessor systems, several processors often allocate objects of the same type at the same time. The allocator should therefore avoid synchronization overhead between processors as far as possible, for example by using lock-free algorithms.

How to manage cache space effectively has long been an active research topic. In the early 1990s, the Solaris 2.4 operating system adopted a cache allocation and management scheme known as "slab" (the word originally refers to a large flat block, as of concrete), which satisfies the kernel's special requirements to a considerable extent.

--------------------------------------------------------------------------------

Introduction to the SLAB Allocator

For many years, the Linux kernel used a kernel object cache allocator called SLAB. The SLAB allocator is derived from the Solaris 2.4 allocation algorithm and works on top of the physical page-frame allocator, managing caches of objects of specific sizes so that memory can be allocated quickly and efficiently.

The SLAB allocator creates a separate cache for each type of kernel object in use. The Linux kernel manages physical memory pages with the buddy system, so the SLAB allocator works directly on top of the buddy system. Each cache consists of multiple slabs, and each slab is a set of contiguous physical page frames that is divided into a fixed number of objects. Depending on the object size, a slab can by default contain up to 1024 physical page frames. Because of alignment and other requirements, the memory assigned to an object within a slab may be larger than the size the user actually asked for, which wastes a certain amount of memory.

Each cache has its own constructor and destructor, because kernel objects may need special processing before use and after release, similar to the corresponding concepts in object-oriented languages such as C++ (recent versions of the SLAB allocator have dropped the destructor). When a new slab is created, the kernel calls the constructor to initialize every object in it; when a slab is released, the kernel calls the destructor. This is also why these kernel data structures are called objects.
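As a concrete illustration of this interface, the following minimal sketch creates a dedicated cache for a hypothetical my_object structure whose constructor initializes a list head. The object type, cache name, and function names are invented for illustration; note also that the constructor prototype has changed across kernel versions (older kernels pass additional arguments), while the sketch uses the single-argument form of later kernels.

#include <linux/init.h>
#include <linux/errno.h>
#include <linux/list.h>
#include <linux/slab.h>

/* Hypothetical kernel object whose list head needs real initialization
 * rather than a plain memset to zero. */
struct my_object {
    struct list_head link;
    int value;
};

static struct kmem_cache *my_cache;

/* Constructor: called when a slab is populated with objects, not on every
 * allocation, so a freed and then reused object keeps its initialized state. */
static void my_object_ctor(void *p)
{
    struct my_object *obj = p;

    INIT_LIST_HEAD(&obj->link);
    obj->value = 0;
}

static int __init my_cache_init(void)
{
    my_cache = kmem_cache_create("my_object_cache",
                                 sizeof(struct my_object), 0,
                                 SLAB_HWCACHE_ALIGN, my_object_ctor);
    if (!my_cache)
        return -ENOMEM;
    return 0;
}

/* Allocation and release then go through the dedicated cache:
 *   struct my_object *obj = kmem_cache_alloc(my_cache, GFP_KERNEL);
 *   kmem_cache_free(my_cache, obj);
 * and kmem_cache_destroy(my_cache) removes the cache when it is no longer needed. */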

The kernel manages each cache with a kmem_cache data structure. Since kmem_cache is itself a kernel object, it needs a dedicated cache of its own. The kmem_cache control structures of all caches are linked into a doubly linked circular list headed by cache_chain, and the global variable cache_cache refers to the kmem_cache object describing the cache of kmem_cache objects. Each slab needs a descriptor of type struct slab to track its state, plus an array of kmem_bufctl_t entries (defined as unsigned integers) to manage its free objects. If the object size does not exceed 1/8 of a physical page frame, these slab management structures are stored inside the slab itself, at the beginning of the slab's first physical page frame; otherwise they are stored outside the slab, in a general-purpose object cache allocated through kmalloc.
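The 1/8-page threshold mentioned above can be expressed in a few lines. The following user-space sketch (assuming a 4 KB page size; the macro and function names are invented for illustration) mirrors that rule:

#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE 4096u

/* Management data lives inside the slab only for sufficiently small objects. */
static bool slab_mgmt_on_slab(unsigned int obj_size)
{
    return obj_size <= PAGE_SIZE / 8;   /* 512 bytes with 4 KB page frames */
}

int main(void)
{
    printf("%u-byte objects: %s management\n", 128u,
           slab_mgmt_on_slab(128) ? "on-slab" : "off-slab");
    printf("%u-byte objects: %s management\n", 1024u,
           slab_mgmt_on_slab(1024) ? "on-slab" : "off-slab");
    return 0;
}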

Objects in a slab are in one of two states: allocated or free. To manage slabs effectively, each slab is dynamically placed on one of three queues of its cache, depending on how many of its objects are allocated:

Full queue: the slab contains no free objects.

Partial queue: the slab contains both allocated and free objects.

Empty queue: all objects in the slab are free.

On NUMA (Non-Uniform Memory Access) systems, each node has its own set of these three slab queues, maintained by a struct kmem_list3 structure. The SLAB allocator allocates objects from slabs on the Partial queue first. When the last allocated object of a slab is released, the slab moves from the Partial queue to the Empty queue; when the last free object of a slab is allocated, the slab moves from the Partial queue to the Full queue. When the cache runs short of free objects, more slabs are allocated; conversely, when free objects become too numerous, some slabs on the Empty queue are reclaimed periodically. The sketch after this paragraph illustrates the classification rule behind these transitions.
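The queue transitions described above depend only on how many of a slab's objects are currently allocated. The following stand-alone sketch (invented names, not kernel code) shows the rule that decides which queue a slab belongs on after each allocation or release:

#include <stdio.h>

enum slab_state { SLAB_FULL, SLAB_PARTIAL, SLAB_EMPTY };

struct fake_slab {
    unsigned int inuse;    /* allocated objects */
    unsigned int total;    /* objects per slab  */
    enum slab_state state;
};

/* Recompute which queue the slab belongs on after an alloc or free. */
static void reclassify(struct fake_slab *s)
{
    if (s->inuse == 0)
        s->state = SLAB_EMPTY;       /* candidate for periodic reclaim   */
    else if (s->inuse == s->total)
        s->state = SLAB_FULL;        /* no free objects left             */
    else
        s->state = SLAB_PARTIAL;     /* preferred source for allocations */
}

int main(void)
{
    struct fake_slab s = { .inuse = 0, .total = 4, .state = SLAB_EMPTY };

    for (int i = 0; i < 4; i++) { s.inuse++; reclassify(&s); }
    printf("after 4 allocations: state=%d (0 = full, 1 = partial, 2 = empty)\n", s.state);

    s.inuse--; reclassify(&s);
    printf("after 1 release: state=%d\n", s.state);
    return 0;
}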

To make good use of the hardware cache, the SLAB allocator allows objects to be aligned to the first-level hardware cache (by setting the SLAB_HWCACHE_ALIGN flag when the cache is created) and uses a coloring policy: the addresses of objects with the same index in different slabs of the same cache are staggered, so that they do not map to the same cache line and suffer performance loss through constant eviction and reload.
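The effect of coloring can be seen with a small calculation. The following stand-alone sketch (assumed figures: one 4 KB page per slab, 480-byte aligned objects, 64-byte cache lines; the real kernel derives the color from the leftover space in a similar but not identical way) staggers the starting offset of consecutive slabs so that objects with the same index land on different cache lines:

#include <stdio.h>

int main(void)
{
    unsigned int slab_size = 4096;  /* one page frame per slab (assumed)  */
    unsigned int obj_size  = 480;   /* aligned object size (assumed)      */
    unsigned int line_size = 64;    /* L1 cache line size (assumed)       */

    unsigned int objs_per_slab = slab_size / obj_size;               /* 8 objects  */
    unsigned int leftover = slab_size - objs_per_slab * obj_size;    /* 256 bytes  */
    unsigned int colours  = leftover / line_size + 1;                /* 5 colors   */

    /* Each new slab uses the next color, so objects with the same index in
     * different slabs start at different cache-line offsets. */
    for (unsigned int slab = 0; slab < 5; slab++) {
        unsigned int colour_off = (slab % colours) * line_size;
        printf("slab %u: first object at offset %u\n", slab, colour_off);
    }
    return 0;
}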

To support simultaneous allocation by multiple processors, each cache maintains a local object cache for every processor. A processor allocates objects directly from its local cache, avoiding locks; when the local cache is empty, a batch of objects is moved from a slab into it.

--------------------------------------------------------------------------------

Design Principles of the SLUB Allocator

The SLAB allocator was at the core of Linux kernel memory management for many years. Kernel hackers were generally reluctant to change its code, because it is very complex and, in most situations, works quite well. With the wide deployment of large multiprocessor systems and NUMA systems, however, the SLAB allocator gradually revealed serious shortcomings:

Complex queue management. The SLAB allocator has many queues: the per-processor local object caches, the free-object lists inside each slab, the per-state queue that every slab sits on, and even a queue of the cache control structures themselves. Managing all these queues effectively is a laborious and complex task.

Large storage overhead for slab management data and queues. Every slab needs a struct slab descriptor and a kmem_bufctl_t array (4-byte unsigned integers) covering all of its objects. For small objects the kmem_bufctl_t array is expensive: with 32-byte objects, 1/8 of the space is lost to it (4 bytes of bookkeeping for every 32-byte object). The coloring policy wastes additional memory to stagger objects in the hardware cache, and the per-node and per-processor queues waste still more. Tests showed that on a large NUMA system with 1000 nodes/processors, several gigabytes of memory were spent just maintaining queues and object references.

Reclaiming memory from a cache is complicated.

NUMA support is very complex. SLAB's NUMA support relies on the physical page-frame allocator and cannot operate at object granularity, so it cannot guarantee that the objects in a processor's local cache all come from the same node.

Redundant Partial queues. The SLAB allocator keeps a Partial queue per node, so over time a large number of Partial slabs accumulate, which hinders sensible use of memory.

Performance tuning is difficult. Each cache has a complicated set of tunable parameters, and a spin lock must be taken when refilling a processor's local object cache.

The debugging facilities are hard to use.

To address these shortcomings of the SLAB allocator, kernel developer Christoph Lameter introduced a new allocator, SLUB, in Linux kernel 2.6.22. The SLUB allocator keeps the basic idea of SLAB: each cache consists of multiple slabs, and each slab holds a fixed number of objects; but it simplifies the design considerably. SLUB simplifies management data structures such as kmem_cache and slab, abandons SLAB's many queues, and is optimized for multiprocessor and NUMA systems, which improves performance and scalability and reduces memory waste. To let other kernel modules migrate seamlessly, SLUB retains all of the SLAB allocator's interface functions.

The data structures and source code discussed in this article are taken from Linux kernel 2.6.25.

Each kernel object cache is described by a kmem_cache data structure. Table 1 lists its fields (fields related to statistics and debugging are omitted).

Table 1. kmem_cache Data Structure

unsigned long flags: Set of flags describing the cache's attributes.
int size: Memory allocated to each object (may be larger than the actual object size).
int objsize: Actual size of the object.
int offset: Offset at which the free-object pointer is stored within an object.
int order: A slab consists of 2^order physical page frames.
struct kmem_cache_node local_node: Slab information for the node on which the cache was created.
int objects: Total number of objects in one slab.
gfp_t allocflags: Additional allocation flags used when creating a slab.
int refcount: Cache reference counter; when a new cache is requested, the SLUB allocator may reuse an existing cache of similar size in order to reduce the number of caches.
void (*)(...) ctor: Constructor used to initialize each object when a slab is created.
int inuse: Offset of the object metadata.
int align: Alignment of objects.
const char *name: Cache name.
struct list_head list: Links all cache descriptors into a doubly linked circular list whose head is slab_caches.
int remote_node_defrag_ratio: The smaller the value, the more the allocator prefers to allocate objects from the current node.
struct kmem_cache_node *[] node: Per-node slab information structures (the node on which the cache was created uses the local_node field instead).
struct kmem_cache_cpu *[] cpu_slab: Per-processor slab information structures describing each processor's active slab.
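Put together, the fields of Table 1 correspond roughly to the following condensed declaration. This is a simplified sketch that assumes kernel types and constants such as gfp_t, struct list_head, MAX_NUMNODES and NR_CPUS, and omits statistics, debugging and configuration-dependent details:

struct kmem_cache {
    unsigned long flags;               /* cache attribute flags                        */
    int size;                          /* memory given to each object                  */
    int objsize;                       /* actual object size                           */
    int offset;                        /* where the free-object pointer is stored      */
    int order;                         /* a slab spans 2^order page frames             */
    struct kmem_cache_node local_node; /* slab info for the creating node              */
    int objects;                       /* objects per slab                             */
    gfp_t allocflags;                  /* extra flags used when allocating a slab      */
    int refcount;                      /* users sharing this (possibly merged) cache   */
    void (*ctor)(void *);              /* object constructor (prototype varies)        */
    int inuse;                         /* offset of object metadata                    */
    int align;                         /* required alignment                           */
    const char *name;                  /* cache name                                   */
    struct list_head list;             /* links into the global slab_caches list       */
    int remote_node_defrag_ratio;      /* bias toward allocating from the local node   */
    struct kmem_cache_node *node[MAX_NUMNODES]; /* per-node slab information           */
    struct kmem_cache_cpu *cpu_slab[NR_CPUS];   /* per-CPU active-slab state           */
};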

The kmem_cache structure of the SLUB allocator is clearly much simpler than SLAB's, with no queue-related fields at all. One notable feature of SLUB is cache merging: when a kernel thread asks for a new cache C2, the SLUB allocator first searches the caches that already exist; if some cache C1 has an object size slightly larger than C2's, C1 is reused instead of creating C2. Tests show that this feature reduces the number of caches by roughly 50%, which in turn reduces slab fragmentation and improves memory utilization.

In the SLUB allocator, a slab is still a set of contiguous physical page frames divided into a fixed number of objects. A slab has no additional free-object queue (unlike SLAB); instead, the space inside the free objects themselves is reused to link them together. Nor does a slab have any additional descriptor structure: SLUB adds union fields freelist, inuse and slab to the page structure that represents a physical page frame, holding, respectively, a pointer to the first free object, the number of allocated objects, and a pointer to the cache's kmem_cache structure, so the page structure of the slab's first physical page frame describes the slab itself.
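The overlay on struct page can be pictured as follows. This is a deliberately simplified, illustrative sketch: the real struct page contains many more fields and its unions are laid out differently from version to version; only the three fields named above and the lru list used to link Partial slabs are shown here:

struct page {
    unsigned long flags;                /* page state flags                          */
    union {
        atomic_t _mapcount;             /* ordinary page: reverse-map count          */
        unsigned int inuse;             /* SLUB slab: number of allocated objects    */
    };
    union {
        struct address_space *mapping;  /* ordinary page: owning address space       */
        struct kmem_cache *slab;        /* SLUB slab: cache this slab belongs to     */
    };
    union {
        pgoff_t index;                  /* ordinary page: offset within the mapping  */
        void *freelist;                 /* SLUB slab: first free object              */
    };
    struct list_head lru;               /* also used to link Partial slabs           */
    /* ... remaining fields omitted ... */
};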

Each processor has a local active slab, described by a kmem_cache_cpu structure. Table 2 lists its fields (statistics-related fields are omitted).

Table 2. kmem_cache_cpu Data Structure

void **freelist: Pointer to the free-object list, i.e. to the first free object.
struct page *page: Descriptor of the first physical page frame of the slab.
int node: NUMA node number of the processor; the value -1 is used for debugging.
unsigned int offset: Offset, in words, at which the pointer to the next free object is stored.
unsigned int objsize: Actual size of the object, identical to the objsize field of the kmem_cache structure.
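In code form, the per-processor descriptor of Table 2 reduces to roughly the following sketch (statistics fields omitted, kernel types assumed):

struct kmem_cache_cpu {
    void **freelist;      /* first free object of the active slab         */
    struct page *page;    /* first page frame of the active slab          */
    int node;             /* NUMA node of the slab; -1 while debugging    */
    unsigned int offset;  /* word offset of the next-free-object pointer  */
    unsigned int objsize; /* object size, mirrors kmem_cache.objsize      */
};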

SLUB has no separate queue of Empty slabs. Each NUMA node maintains a queue of Partial slabs, described by a kmem_cache_node structure. Table 3 lists its fields (debugging-related fields are omitted).

Table 3. kmem_cache_node Data Structure

spinlock_t list_lock: Spin lock protecting the nr_partial and partial fields.
unsigned long nr_partial: Number of Partial slabs on the current node.
atomic_long_t nr_slabs: Total number of slabs on the current node.
struct list_head partial: Doubly linked circular list of Partial slabs.
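Likewise, the per-node bookkeeping of Table 3 corresponds roughly to this sketch (debug-only fields omitted, kernel types assumed):

struct kmem_cache_node {
    spinlock_t list_lock;     /* protects nr_partial and partial         */
    unsigned long nr_partial; /* number of Partial slabs on this node    */
    atomic_long_t nr_slabs;   /* total number of slabs on this node      */
    struct list_head partial; /* doubly linked list of Partial slabs     */
};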

When a processor's active slab is created, the pointer to its first free object is copied into the freelist field of the kmem_cache_cpu structure. Although allocation and release normally operate only on the processor's active slab, in some situations a new active slab is created for the current processor and the no-longer-active slab is added to its node's Partial queue. For example, a kernel thread running on processor A requests an object, but A's active slab has no free objects, so a new slab must be created. The slab creation may sleep, so by the time it completes the thread may have been scheduled onto processor B; B then stops using its original active slab and adds it to the Partial queue of B's node. Compared with SLAB, far fewer slabs end up in the Partial state, so memory is used more reasonably and effectively. When the local active slab has no free objects, the SLUB allocator first tries to take a slab from the Partial queue of the processor's own node as the new active slab, and only then takes one from another node.

When a kernel thread requests an object, the allocator takes the address of the first free object from the freelist field of the processor's kmem_cache_cpu structure and updates freelist to point to the next free object. When an object is released, if it belongs to the processor's active slab it is simply pushed onto the front of that slab's free-object list and freelist is updated; otherwise the object must belong to some Partial slab, and if releasing it turns that Partial slab into an Empty one, the slab is freed. The SLUB allocator therefore needs no complex cache memory reclaim mechanism.
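The fast paths just described amount to simple linked-list operations on pointers stored inside the free objects themselves. The following stand-alone sketch (invented names, user-space code rather than the kernel's) reproduces that mechanism with a fake four-object slab whose next-free pointer sits one word into each object:

#include <stddef.h>
#include <stdio.h>

struct fake_cpu_slab {
    void **freelist;      /* first free object of the active slab                 */
    unsigned int offset;  /* word offset of the next-free pointer inside an object */
};

/* Fast-path allocation: pop the head of the per-CPU freelist. */
static void *fastpath_alloc(struct fake_cpu_slab *c)
{
    void **object = c->freelist;

    if (!object)
        return NULL;                    /* the slow path would refill here */
    c->freelist = object[c->offset];    /* advance to the next free object */
    return object;
}

/* Fast-path release (object belongs to the active slab): push it back. */
static void fastpath_free(struct fake_cpu_slab *c, void *x)
{
    void **object = x;

    object[c->offset] = c->freelist;    /* link the object in front of the list */
    c->freelist = object;
}

int main(void)
{
    /* A fake "slab" of four two-word objects; word 1 of each object holds
     * the pointer to the next free object (offset = 1). */
    void *slab[8] = { 0 };
    struct fake_cpu_slab c = { .freelist = &slab[0], .offset = 1 };

    for (int i = 0; i < 3; i++)
        slab[2 * i + 1] = &slab[2 * (i + 1)];   /* chain object i to object i+1 */
    slab[7] = NULL;                             /* last object ends the freelist */

    void *a = fastpath_alloc(&c);
    void *b = fastpath_alloc(&c);
    printf("allocated %p and %p\n", a, b);

    fastpath_free(&c, a);                       /* a becomes the freelist head again */
    printf("freelist head is now %p\n", (void *)c.freelist);
    return 0;
}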

SLUB's debugging code is always compiled in. Once the "slub_debug" option is activated, a single cache or a specified group of caches can easily be selected for dynamic debugging.

Kernel code often needs to allocate, temporarily, a physically contiguous block of memory of no predetermined size. If such requests are infrequent, it is not worth creating a dedicated cache for them. For these requests the Linux kernel provides a set of general-purpose object caches of specific sizes: calling kmalloc returns memory that satisfies the requested size, and calling kfree releases it. kmalloc is built on top of the SLUB allocator. During kernel initialization, a group of 13 general-purpose object caches is created, and the kmalloc_caches array holds their kmem_cache data structures. Because kmem_cache structures are normally allocated through kmalloc itself, the general-purpose caches can only be described by this statically allocated array of kmem_cache structures. kmalloc_caches[0] is reserved for allocating the kmem_cache_node structure; the object size of kmalloc_caches[1] is 96 bytes, that of kmalloc_caches[2] is 192 bytes, and the object size of kmalloc_caches[i] for i from 3 to 12 is 2^i bytes. If the requested size exceeds the size of a physical page, the page-frame allocator is called instead. To serve older ISA devices, the kernel also creates 13 general-purpose object caches backed by DMA-capable memory and stores their kmem_cache structures in the kmalloc_caches_dma array.
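A minimal usage sketch of this interface, assuming a kernel context in which sleeping is allowed (GFP_KERNEL), looks like this; the 100-byte request below would be served from the 128-byte general-purpose cache according to the size classes listed above:

#include <linux/errno.h>
#include <linux/slab.h>

static int kmalloc_example(void)
{
    char *buf;

    /* 100 bytes rounds up to the 128-byte size class. */
    buf = kmalloc(100, GFP_KERNEL);
    if (!buf)
        return -ENOMEM;

    /* ... use buf as ordinary, physically contiguous memory ... */

    kfree(buf);
    return 0;
}

/* In atomic context GFP_ATOMIC must be used instead of GFP_KERNEL, kzalloc()
 * additionally zeroes the memory, and adding GFP_DMA requests memory from the
 * DMA-capable caches mentioned above. */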
