Kernel Internals. Memory Management (7): Slab (Part 1)

Source: Internet
Author: User

In the buddy system algorithm discussed previously, the smallest unit of memory allocation is a page (for example, 4 KB). That suits large allocations. In practice, though, the kernel often needs to allocate small amounts of memory, such as a few dozen bytes. What should it do then?


Different people might think of different solutions.

    1. The rich take the simple route: request a page, use only those few dozen bytes, and throw the rest away.

    2. The poor, who count every penny, dare not be so wasteful: request a page, record which areas of the page are in use and which are not, and use the unused areas to satisfy other allocation requests. Everything must be put to good use.


Unfortunately, the kernel is a poor man. To manage small memory areas, it introduced the slab mechanism, following the poor man's approach.


The idea of the slab mechanism is simple: divide memory pages into objects, and hand out memory one object at a time.

This brings an additional benefit: the slab effectively acts as a cache. When an object is released, the slab does not return it to the buddy system immediately but keeps it in its own internal structures. A later allocation request can then be served from these retained objects without going to the buddy system.


With that said, it is time to see what the slab allocator looks like.

There are three main components in the slab mechanism:

1) Cache descriptor: stores management information;

2) Slab: consists of one or more contiguous pages, which hold the objects;

3) Object: the managed memory unit.


[Figure: slab components]

Note that a cache stores objects of a single type.



1. Cache Descriptor

Each cache is represented by a struct kmem_cache.

    struct kmem_cache {
    /* 1) per-cpu data, touched during every alloc/free */
        struct array_cache *array[NR_CPUS];
    /* 2) Cache tunables. Protected by cache_chain_mutex */
        unsigned int batchcount;
        unsigned int limit;
        unsigned int shared;

        unsigned int buffer_size;
        u32 reciprocal_buffer_size;
    /* 3) touched by every alloc & free from the backend */

        unsigned int flags;         /* constant flags */
        unsigned int num;           /* # of objs per slab */

    /* 4) cache_grow/shrink */
        /* order of pgs per slab (2^n) */
        unsigned int gfporder;

        /* force GFP flags, e.g. GFP_DMA */
        gfp_t gfpflags;

        size_t colour;              /* cache colouring range */
        unsigned int colour_off;    /* colour offset */
        struct kmem_cache *slabp_cache;
        unsigned int slab_size;
        unsigned int dflags;        /* dynamic flags */

        /* constructor func */
        void (*ctor)(struct kmem_cache *, void *);

    /* 5) cache creation/removal */
        const char *name;
        struct list_head next;

    /* 6) statistics */
        ...
        struct kmem_list3 *nodelists[MAX_NUMNODES];
    };

As the comments in the source code show, the members of this struct fall into six parts. The sixth part is statistics used for debugging and is skipped here.


The first two parts hold per-CPU related information.

    • array: The per-CPU object cache. It works on the same principle as the per-CPU page frame cache discussed earlier: prepare resources ahead of time.

    • batchcount: The number of objects requested or released at a time when the object cache is refilled or shrunk.

    • limit: The size of each per-CPU object cache, that is, the maximum number of objects it can hold.

    • shared: In each cache, besides the object cache prepared for each CPU, there is an object cache shared by all CPUs. Its size (that is, its limit value) is (cachep->shared * cachep->batchcount).

    • buffer_size: As mentioned earlier, a cache stores objects of a single type, so they all have the same size. buffer_size specifies the size of each object in the cache.


Parts three and four hold information related to slab management.

    • flags: Defines cache-wide properties, such as CFLGS_OFF_SLAB, which determines where the slab descriptor is stored.

    • num: The number of objects contained in a slab.

    • gfporder: Specifies the number of contiguous pages in a slab, that is, a slab contains 2^gfporder contiguous pages.

    • gfpflags: The pages in a slab are also allocated from the buddy system. gfpflags specifies the GFP flags used when requesting pages for a slab.

    • colour, colour_off, and colour_next in kmem_list3: These three members are used by the slab coloring mechanism.

    • slabp_cache: The slab descriptor has two possible storage locations: internal or external. Internal means the slab descriptor is stored in the slab's pages together with the objects; external means it is stored outside the slab's pages. slabp_cache points to the cache from which external slab descriptors are allocated.

    • slab_size: The space occupied by the slab descriptor plus the object descriptors.

    • ctor: The object constructor. Creating a new slab means creating num new objects, and the constructor is run on each of them.


Part five holds the cache's global information.

    • name: The cache's name, which appears in /proc/slabinfo.

    • next: Links all caches in the system into the list cache_chain.



At the end of the struct is the member nodelists, an array of pointers to struct kmem_list3. All the slabs belonging to a cache are organized through it.

    /*
     * The slab lists for all objects.
     */
    struct kmem_list3 {
        struct list_head slabs_partial; /* partial list first, better asm code */
        struct list_head slabs_full;
        struct list_head slabs_free;
        unsigned long free_objects;
        unsigned int free_limit;
        unsigned int colour_next;       /* Per-node cache coloring */
        spinlock_t list_lock;
        struct array_cache *shared;     /* shared per node */
        struct array_cache **alien;     /* on other nodes */
        unsigned long next_reap;        /* updated without locking */
        int free_touched;               /* updated without locking */
    };

    • A slab can be in one of three states: full, free, or partial. Slabs in each state are kept on their own linked list: slabs_full, slabs_free, and slabs_partial.

    • free_objects: The total number of idle objects across all slabs.

    • free_limit: The upper bound on the total number of idle objects across all slabs; free_objects > free_limit is not allowed.

    • shared: Each CPU has its own object cache. This member is one more object cache, shared by all CPUs.

    • next_reap, free_touched: These two are used by the memory reclaim mechanism and are not discussed for now.



2. Slab Descriptor

Each slab in the cache is represented by the struct slab.

    struct slab {
        struct list_head list;
        unsigned long colouroff;
        void *s_mem;            /* including colour offset */
        unsigned int inuse;     /* num of objs active in slab */
        kmem_bufctl_t free;
        unsigned short nodeid;
    };

    • list: Depending on its state, the slab is linked through list into slabs_full, slabs_free, or slabs_partial in kmem_list3.

    • colouroff: The offset of the first object within the slab.

    • s_mem: The address of the first object in the slab.

    • inuse: The number of objects in this slab that have been allocated.

    • free: The index of the next idle object in the slab. If there is no idle object, the value is BUFCTL_END.



3. Object Descriptor

Each object also has a descriptor, of type kmem_bufctl_t. This descriptor is much simpler: it is just an unsigned integer.

    typedef unsigned int kmem_bufctl_t;


All object descriptors are stored right after the slab descriptor; the two always stay together. So, like the slab descriptor, the object descriptors have two possible storage locations:

    • External object descriptors: Stored outside the slab's pages, at a location determined by slabp_cache in the cache descriptor.

    • Internal object descriptors: Stored within the slab's pages.


[Figure: slab descriptor]


The k-th object descriptor describes the k-th object, and it is meaningful only while that object is idle, in which case it holds the index of the next idle object in the slab. In this way, all the idle objects in a slab form a simple linked list. The last object descriptor in the list has the value BUFCTL_END, which marks the end of the chain.



4. Object Cache

When discussing the buddy system, we talked about a "rainy day" mechanism. We said then that this is a habitual kernel trick, and here we see it again.

The trick is simple: allocate some resources ahead of time and put them in a per-CPU cache. This reduces lock contention between CPUs and also reduces operations on the various slab lists.


The object cache is represented by the struct array_cache.

    struct array_cache {
        unsigned int avail;
        unsigned int limit;
        unsigned int batchcount;
        unsigned int touched;
        spinlock_t lock;
        void *entry[];
    };

    • avail: The number of objects available in the object cache. It also serves as the index of the first empty slot in the cache, much like the top pointer of a stack. So the object cache is a LIFO data structure.

    • limit: The size of the object cache.

    • batchcount: The chunk size used when refilling or shrinking the cache.

    • touched: Set to 1 if the object cache has been used recently.

    • entry: A placeholder array. All the objects in the object cache are stored immediately after the descriptor.


Just as with the slab descriptor and the object descriptors, the object cache descriptor and the objects it holds stay together: the objects immediately follow the descriptor, at the address given by entry.

Note that an "object" here is actually the address of an object.


As mentioned earlier, each CPU has an object cache, stored in the array member of the cache descriptor. In addition, there is an object cache shared by all CPUs, stored in the shared member of struct kmem_list3.

This shared cache makes it much easier to move objects from one CPU's object cache to another's.


So far, we have covered all the structures used by the slab mechanism. Their interrelationships look roughly like this.


[Figure: slab structure]



5. Slab Coloring

Without slab coloring, since all objects in a cache have the same size, objects with the same index in different slabs have the same offset. That is not a problem in itself, but once the hardware cache comes into play, objects with the same offset are likely to map to the same cache line, which hurts performance.

Slab coloring was introduced to solve this problem. The idea is to insert a small amount of space of varying size at the start of each slab, so that objects with the same index in different slabs get different offsets. The inserted space is called the color of the slab.


Where does that space come from? After a slab is divided into objects, a little space is likely to be left over. It is too small to hold another object, so it would simply be wasted. It is like cutting cloth to make clothes: however skilled the tailor, some scraps are always left.

But the kernel does not let even these scraps go: it splits them into different colors for the slabs to use. When it comes to making the most of everything, the kernel truly goes all the way.


Let's look at how objects are laid out in a slab. Assume that all objects in a cache are aligned, meaning that every object's address is an integer multiple of some number (call it aln).


We define a few variables:

    • num: The number of objects in a slab.

    • osize: The size of each object.

    • dsize: The total space occupied by the slab descriptor and the object descriptors. For a slab with external descriptors, the value is 0.

    • free: The unused space in the slab, that is, the size of the scraps. free must be smaller than osize, otherwise the slab could hold one more object. However, free can be larger than aln.


Then the size of a slab can be expressed as:

slab length = (num * osize) + dsize + free


Slab coloring, then, simply moves part of the free space from the tail of the slab to its head, while still meeting the alignment requirement. So the number of available colors is (free / aln). The colour member of the cache descriptor stores exactly this value.


[Figure: slab coloring]


These colors are distributed evenly across slabs. The next color to use is saved in the colour_next member of struct kmem_list3. When a new slab is created, it takes colour_next as its color, and colour_next is then incremented by 1. When colour_next reaches the maximum, cachep->colour, it wraps back to 0 and starts over. As long as more than one color is available, this ensures that each slab uses a different color from the slab created before it.


In addition, the value of aln is stored in the colour_off member of the cache descriptor, while colouroff in the slab descriptor holds the value (color * aln) + dsize, which is the offset of the first object.


This article is from the "Kernel blogs" blog; please retain this source: http://richardguo.blog.51cto.com/9343720/1673269
