Understanding Memory Allocation Algorithms (Part 1)

Source: Internet
Author: User
Tags: cas, compact, memcached

Memory allocation is essentially a space-management problem: given a contiguous region of memory, provide storage services out of it. So what algorithms make managing and allocating that space efficient?

Bump-the-pointer

Bump-the-pointer is the simplest algorithm. HotSpot's memory management white paper describes BTP like this:

That is, the end of the previously allocated object is always kept track of. When a new allocation request needs to be satisfied, all that needs to be done is to check whether the object will fit in the remaining part of the generation and, if so, to update the pointer and initialize the object.

We only need one pointer recording the starting address of the free space. On each allocation request, check whether the free space is large enough: if so, carve out the space and advance the pointer; if not, garbage collection is needed. Let's look at how HotSpot's serial collector uses BTP.
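The core of the algorithm fits in a few lines. Below is a minimal sketch (illustrative names and a fixed heap size of my own choosing, not HotSpot's actual code): a single pointer marks the start of the free space, and allocation just checks for room and advances it.

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_SIZE 1024

static uint8_t heap[HEAP_SIZE];
static uint8_t *top = heap;   /* start of the free space */

/* Bump-the-pointer allocation: returns NULL when a GC would be needed. */
static void *btp_alloc(size_t n) {
    n = (n + 7) & ~(size_t)7;                    /* keep 8-byte alignment */
    if ((size_t)(heap + HEAP_SIZE - top) < n)
        return NULL;                             /* free space insufficient */
    void *p = top;
    top += n;                                    /* bump the pointer */
    return p;
}
```

Allocation is one comparison and one addition, which is why BTP is hard to beat on speed.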

The serial collector performs GC on the old generation with a mark-sweep-compact algorithm, while allocation in the old generation is a simple bump of the pointer. When the remaining free space runs out, the serial collector must stop-the-world and run mark-sweep-compact: mark the live objects, sweep the dead ones, and finally perform a sliding compaction that pushes all live objects to one side. The remaining free space is then contiguous again, so BTP allocation can simply continue.

A few more points about BTP:

    1. In multi-threaded environments, thread-local allocation buffers (TLABs) let each thread bump its own pointer without contention.
    2. The point at which GC is triggered can be optimized; it need not wait until an allocation request finds the free space insufficient.
    3. Stop-the-world pauses are the drawback.
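The TLAB idea in point 1 can be sketched as follows (hypothetical names and sizes; HotSpot's real TLAB machinery is more involved): each thread carves a private buffer out of the shared heap under a lock, then bumps a thread-local pointer inside it with no synchronization at all.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_SIZE (1 << 20)
#define TLAB_SIZE 4096

static uint8_t heap[HEAP_SIZE];
static uint8_t *heap_top = heap;
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

typedef struct { uint8_t *cur, *end; } tlab_t;

/* Refill a thread's TLAB from the shared heap -- the only locked path.
   (The remainder of the old TLAB is simply abandoned in this sketch.) */
static int tlab_refill(tlab_t *t) {
    pthread_mutex_lock(&heap_lock);
    if ((size_t)(heap + HEAP_SIZE - heap_top) < TLAB_SIZE) {
        pthread_mutex_unlock(&heap_lock);
        return 0;                      /* heap exhausted: a GC would run here */
    }
    t->cur = heap_top;
    t->end = heap_top + TLAB_SIZE;
    heap_top += TLAB_SIZE;
    pthread_mutex_unlock(&heap_lock);
    return 1;
}

/* Lock-free fast path: bump within the thread's private buffer. */
static void *tlab_alloc(tlab_t *t, size_t n) {
    n = (n + 7) & ~(size_t)7;
    if (t->cur == NULL || (size_t)(t->end - t->cur) < n)
        if (!tlab_refill(t) || (size_t)(t->end - t->cur) < n)
            return NULL;
    void *p = t->cur;
    t->cur += n;
    return p;
}
```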
Slab Allocator

Now let's look at another management approach, one that uses a divide-and-conquer strategy. We can split the contiguous space into groups, each group containing a number of slots. Within one group every slot has the same size, and the slot sizes themselves can be tuned according to actual usage.

You only need to maintain the metadata of each group; on each allocation request, consult the metadata first and do a best fit.

A slab allocator works exactly this way. Compared with bump-the-pointer, a slab allocator needs no stop-the-world GC, but it wastes a certain amount of space, because best fit rounds each request up to a slot size.
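The best-fit lookup over per-group metadata can be sketched in a few lines (the size classes here are hypothetical, not memcached's): route each request to the smallest class whose slot size fits it.

```c
#include <stddef.h>

/* Hypothetical size classes, smallest to largest. */
static const size_t class_size[] = { 64, 128, 256, 512, 1024 };
#define NCLASSES (sizeof(class_size) / sizeof(class_size[0]))

/* Return the index of the smallest class that fits, or -1 if too large. */
static int pick_class(size_t n) {
    for (size_t i = 0; i < NCLASSES; i++)
        if (n <= class_size[i])
            return (int)i;
    return -1;
}
```

A 100-byte request lands in the 128-byte class, wasting 28 bytes; that rounding is exactly the space cost the text describes.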

Memcached Slab

Talk is cheap, so let's look at the slab implementation in memcached. First, the definition of slabclass_t and the initialization routine:

typedef struct {
    unsigned int size;      /* sizes of items */
    unsigned int perslab;   /* how many items per slab */
    void *slots;            /* list of item ptrs */
    unsigned int sl_curr;   /* total free items in list */
    unsigned int slabs;     /* how many slabs were allocated for this class */
    void **slab_list;       /* array of slab pointers */
    unsigned int list_size; /* size of prev array */
    unsigned int killing;   /* index+1 of dying slab, or zero if none */
    size_t requested;       /* the number of requested bytes */
} slabclass_t;
void slabs_init(const size_t limit, const double factor, const bool prealloc) {
    int i = POWER_SMALLEST - 1;
    unsigned int size = sizeof(item) + settings.chunk_size;

    mem_limit = limit;

    if (prealloc) {
        /* Allocate everything in a big chunk with malloc */
        mem_base = malloc(mem_limit);
        if (mem_base != NULL) {
            mem_current = mem_base;
            mem_avail = mem_limit;
        } else {
            fprintf(stderr, "Warning: Failed to allocate requested memory in"
                    " one large chunk.\nWill allocate in smaller chunks\n");
        }
    }

    memset(slabclass, 0, sizeof(slabclass));

    while (++i < POWER_LARGEST && size <= settings.item_size_max / factor) {
        /* Make sure items are always n-byte aligned */
        if (size % CHUNK_ALIGN_BYTES)
            size += CHUNK_ALIGN_BYTES - (size % CHUNK_ALIGN_BYTES);

        slabclass[i].size = size;
        slabclass[i].perslab = settings.item_size_max / slabclass[i].size;
        size *= factor;
        if (settings.verbose > 1) {
            fprintf(stderr, "slab class %3d: chunk size %9u perslab %7u\n",
                    i, slabclass[i].size, slabclass[i].perslab);
        }
    }

    power_largest = i;
    slabclass[power_largest].size = settings.item_size_max;
    slabclass[power_largest].perslab = 1;
    if (settings.verbose > 1) {
        fprintf(stderr, "slab class %3d: chunk size %9u perslab %7u\n",
                i, slabclass[i].size, slabclass[i].perslab);
    }

    /* for the test suite: faking of how much we've already malloc'd */
    {
        char *t_initial_malloc = getenv("T_MEMD_INITIAL_MALLOC");
        if (t_initial_malloc) {
            mem_malloced = (size_t)atol(t_initial_malloc);
        }
    }

    if (prealloc) {
        slabs_preallocate(power_largest);
    }
}

As slabs_init shows, memcached tunes the slot sizes through a single configurable factor: each class's chunk size is the previous one multiplied by factor.
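The effect of the factor can be seen in isolation. The sketch below (the 48-byte base size and 8-byte alignment are assumptions of mine, not memcached's exact settings) builds a size table the same way the loop in slabs_init does: multiply by factor, round up to the alignment.

```c
#include <stddef.h>

#define ALIGN 8

/* Fill `out` with up to `max` chunk sizes, each `factor` times the last,
   rounded up to the alignment; returns how many classes were produced. */
static int build_classes(size_t base, double factor, size_t cap,
                         size_t *out, int max) {
    int n = 0;
    double size = (double)base;
    while (n < max && (size_t)size <= cap) {
        size_t s = (size_t)size;
        if (s % ALIGN)
            s += ALIGN - (s % ALIGN);   /* keep chunks n-byte aligned */
        out[n++] = s;
        size *= factor;                 /* geometric growth */
    }
    return n;
}
```

With base 48, factor 1.25, and cap 1024 this yields 48, 64, 80, 96, ... : a small factor gives many closely spaced classes (less waste, more metadata), a large one the opposite.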

Note that the size here is sizeof(item) + settings.chunk_size, where item is the record memcached actually stores, defined in memcached.h. The data-structure design of a storage system is another topic worth exploring, to be discussed later.

typedef struct _stritem {
    struct _stritem *next;
    struct _stritem *prev;
    struct _stritem *h_next;    /* hash chain next */
    rel_time_t      time;       /* least recent access */
    rel_time_t      exptime;    /* expire time */
    int             nbytes;     /* size of data */
    unsigned short  refcount;
    uint8_t         nsuffix;    /* length of flags-and-length string */
    uint8_t         it_flags;   /* ITEM_* above */
    uint8_t         slabs_clsid;/* which slab class we're in */
    uint8_t         nkey;       /* key length, w/terminating null and padding */
    /* this odd type prevents type-punning issues when we do
     * the little shuffle to save space when not using CAS. */
    union {
        uint64_t cas;
        char end;
    } data[];
    /* if it_flags & ITEM_CAS we have 8 bytes CAS */
    /* then null-terminated key */
    /* then " flags length\r\n" (no terminating null) */
    /* then data with terminating \r\n (no terminating null; it's binary!) */
} item;

Each slabclass owns a set of slabs, and each slab is split into slabclass.perslab chunks of slabclass.size bytes each.

static int grow_slab_list(const unsigned int id) {
    slabclass_t *p = &slabclass[id];
    if (p->slabs == p->list_size) {
        size_t new_size = (p->list_size != 0) ? p->list_size * 2 : 16;
        void *new_list = realloc(p->slab_list, new_size * sizeof(void *));
        if (new_list == 0) return 0;
        p->list_size = new_size;
        p->slab_list = new_list;
    }
    return 1;
}

static void split_slab_page_into_freelist(char *ptr, const unsigned int id) {
    slabclass_t *p = &slabclass[id];
    int x;
    for (x = 0; x < p->perslab; x++) {
        do_slabs_free(ptr, 0, id);
        ptr += p->size;
    }
}

static int do_slabs_newslab(const unsigned int id) {
    slabclass_t *p = &slabclass[id];
    int len = settings.slab_reassign ? settings.item_size_max
                                     : p->size * p->perslab;
    char *ptr;

    if ((mem_limit && mem_malloced + len > mem_limit && p->slabs > 0) ||
        (grow_slab_list(id) == 0) ||
        ((ptr = memory_allocate((size_t)len)) == 0)) {

        MEMCACHED_SLABS_SLABCLASS_ALLOCATE_FAILED(id);
        return 0;
    }

    memset(ptr, 0, (size_t)len);
    split_slab_page_into_freelist(ptr, id);

    p->slab_list[p->slabs++] = ptr;
    mem_malloced += len;
    MEMCACHED_SLABS_SLABCLASS_ALLOCATE(id);

    return 1;
}

A freelist, namely slabclass.slots, manages the available chunks:

static void do_slabs_free(void *ptr, const size_t size, unsigned int id) {
    slabclass_t *p;
    item *it;

    assert(((item *)ptr)->slabs_clsid == 0);
    assert(id >= POWER_SMALLEST && id <= POWER_LARGEST);
    if (id < POWER_SMALLEST || id > POWER_LARGEST)
        return;

    MEMCACHED_SLABS_FREE(size, id, ptr);
    p = &slabclass[id];

    it = (item *)ptr;
    it->it_flags |= ITEM_SLABBED;
    it->prev = 0;
    it->next = p->slots;
    if (it->next) it->next->prev = it;
    p->slots = it;

    p->sl_curr++;
    p->requested -= size;
    return;
}

A simple mutex handles concurrency. (Why lock the entire slab allocator directly, rather than each slabclass separately?)

void *slabs_alloc(size_t size, unsigned int id) {
    void *ret;

    pthread_mutex_lock(&slabs_lock);
    ret = do_slabs_alloc(size, id);
    pthread_mutex_unlock(&slabs_lock);
    return ret;
}

Finally, the above describes the logical structure. Below is a sketch of the physical layout, with slabclass[0].perslab = 3, to aid understanding.

[Diagram: the contiguous region starting at mem_base is divided into slabs of three chunks each; slabclass[0].slab_list[0], slabclass[1].slab_list[0], and slabclass[0].slab_list[1] each point to one slab, and slabclass[0].slots chains that class's free chunks.]
Buddy Allocator

The buddy allocator adopts a strategy similar to slab's; as I understand it, it is just a special kind of slab allocator whose slot sizes are powers of two. That makes buddy's space waste much worse than slab's: a 33 KB allocation request, for example, is given 64 KB of space. In the end, it always comes back to tuning the slot sizes to actual usage in order to reduce wasted space.
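The size policy behind that waste is easy to demonstrate (this sketch shows only the rounding rule, not the buddy allocator's split/merge machinery): every request is rounded up to the next power of two.

```c
#include <stddef.h>

/* Round n up to the next power of two (n > 0). */
static size_t buddy_round(size_t n) {
    size_t p = 1;
    while (p < n)
        p <<= 1;    /* double until the block is large enough */
    return p;
}
```

For a 33 KB request, buddy_round returns 64 KB, so nearly half the block is wasted; a slab class tuned near 33 KB would waste far less.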

Resources
    • Understanding the memory storage of memcached
    • http://en.wikipedia.org/wiki/Slab_allocation
    • http://en.wikipedia.org/wiki/Buddy_memory_allocation
    • http://www.ibm.com/developerworks/cn/linux/l-linux-slab-allocator/
