A few days ago I wrote three articles about the Go language memory allocator: the design of the Go memory allocator, the Go memory allocator - fixalloc, and the Go memory allocator - mspan. Those three were mainly groundwork for this article; in fact all of the content could have gone into a single piece, but there was too much of it to get through in one breath. This article describes the overall architecture of the memory allocator and its core components in detail; of course, reading the code yourself is still the best way to learn it.
Memory layout structure diagram
I have mapped the core code logic onto the memory layout abstraction in this diagram, which shows the overall structure of the Go memory allocator along with some of its details (the structure should also apply to tcmalloc). From the diagram the allocator looks a bit complex, but by logic it splits into three large modules: cache, central, and heap. Analyzed module by module, the logic becomes very clear. The Cache at the bottom of the diagram is the cache module; the central module corresponds to the dark blue MCentral part, whose logical structure is simple enough that the diagram does not draw it in detail; and the Heap structure is the core of the diagram, corresponding to the heap module. You can also see that central is managed directly by the heap and is effectively a submodule of the heap.
When analyzing this part of the allocator's source code, the first thing to pin down is the entry point for all memory allocation; with the entry point in hand you can follow a single line from there without too many obstacles. That entry is the function runtime·mallocgc in the malloc.goc source file. Its main job is to allocate memory and to trigger GC (this article only covers allocation). Before doing any real allocation, the entry function decides whether the request is a small allocation or a large one, with 32KB as the dividing line: small allocations call runtime·MCache_Alloc to take memory from the cache, while large allocations call runtime·MHeap_Alloc to take memory directly from the heap. After the entry function, execution moves into the concrete allocation process.
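To make the dividing line concrete, here is a minimal sketch of that dispatch, written as a standalone illustration rather than the runtime's actual code; the helper names cache_alloc and heap_alloc_large are made up, and malloc stands in for the real allocation paths:

#include <stdlib.h>

enum { MaxSmallSize = 32*1024 };             /* the 32KB dividing line */

static void *cache_alloc(size_t size)        /* stand-in for the per-thread cache path */
{
	return malloc(size);
}

static void *heap_alloc_large(size_t size)   /* stand-in for the direct-from-heap path */
{
	return malloc(size);
}

void *alloc(size_t size)
{
	if(size <= MaxSmallSize)
		return cache_alloc(size);        /* runtime·MCache_Alloc in the real code */
	return heap_alloc_large(size);           /* runtime·MHeap_Alloc in the real code */
}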
Before going into the allocation process itself, you need to know how the whole allocator is created and what it looks like after initialization. The function that creates and initializes the memory allocator is runtime·mallocinit; here is the simplified source:
void
runtime·mallocinit(void)
{
	// Create the MHeap object; its memory is allocated directly from the
	// operating system. The heap is global and shared by all threads; a Go
	// process has exactly one heap.
	if((runtime·mheap = runtime·SysAlloc(sizeof(*runtime·mheap))) == nil)
		runtime·throw("runtime: cannot allocate heap metadata");

	// On a 64-bit platform, reserve one large chunk of address space up
	// front; all subsequent page requests are carved out of this region.
	// This region is the arena in the structure diagram.
	if(sizeof(void*) == 8 && (limit == 0 || limit > (1<<30))) {
		arena_size = MaxMem;
		bitmap_size = arena_size / (sizeof(void*)*8/4);
		p = runtime·SysReserve((void*)(0x00c0ULL<<32), bitmap_size + arena_size);
	}

	// Initialize the heap's arena and bitmap pointers.
	runtime·mheap->bitmap = p;
	runtime·mheap->arena_start = p + bitmap_size;
	runtime·mheap->arena_used = runtime·mheap->arena_start;
	runtime·mheap->arena_end = runtime·mheap->arena_start + arena_size;

	// Initialize the heap's other internal structures: spanalloc and
	// cachealloc are fixalloc allocators, and the free and large fields
	// become doubly linked circular lists of spans.
	runtime·MHeap_Init(runtime·mheap, runtime·SysAlloc);

	// Allocate an MCache from the heap's cachealloc and hang it on the
	// current thread.
	m->mcache = runtime·allocmcache();
}
The initialization work is mostly about setting up the two parts mcache and mheap; mcentral is logically a submodule of mheap, so it does not show up separately in the initialization. This matches the two large structures in the diagram. The heap is shared by all underlying threads, while each cache is owned exclusively by one thread. On a 64-bit platform the heap reserves only 136GB of address space from the operating system, of which the bitmap needs 8GB, so the memory that can actually be requested is 128GB. In most cases 128GB is more than enough, though I know of a few special applications whose single-machine memory exceeds 128GB.
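As a quick sanity check of those numbers, the following standalone snippet (an illustration, not runtime code) reproduces the arithmetic from the initialization above: the arena is 128GB, the bitmap reserves 4 bits per pointer-sized word of arena, and together they give the 136GB reservation:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t arena_size  = 128ULL << 30;                       /* 128GB arena (MaxMem on 64-bit) */
	uint64_t bitmap_size = arena_size / (sizeof(void*)*8/4);   /* 8GB when sizeof(void*) == 8 */

	printf("arena  = %llu GB\n", (unsigned long long)(arena_size >> 30));
	printf("bitmap = %llu GB\n", (unsigned long long)(bitmap_size >> 30));
	printf("total reservation = %llu GB\n",
	       (unsigned long long)((arena_size + bitmap_size) >> 30));
	return 0;
}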
The following walks through the allocation path for small memory in detail, from the cache to central to the heap.
Cache
The cache is implemented mainly in the mcache.c source file, with the structure MCache defined in malloc.h. The prototype of the function that requests memory from the cache is:
void *runtime·MCache_Alloc(MCache *c, int32 sizeclass, uintptr size, int32 zeroed)
The size parameter is the amount of memory requested; note that this is not necessarily the amount we asked for when allocating, since in general it is rounded up slightly. The structure makes it clear that the cache holds a list array indexed from 0 to n. Each element of the array is a linked list, each node of a list is a usable block of memory, and all blocks on the same list are the same size, while blocks on different lists have different sizes. In other words, one element of the list array stores blocks of one fixed size, and different elements store blocks of different sizes: the cache caches memory blocks of various sizes, and a request is served from whichever class of cached blocks is closest to (and at least as large as) the requested size. The indexes 0 to n of the list array are the different sizeclasses; n is a fixed value of 60, so the cache can provide 60 classes of small memory blocks.
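To illustrate the rounding-up behaviour, here is a toy size-to-class lookup; the class sizes in the table are made up for the example and are not the runtime's real size table:

#include <stddef.h>

static const size_t class_to_size[] = { 0, 8, 16, 32, 48, 64, 80, 96, 128 };
enum { NumClasses = sizeof(class_to_size)/sizeof(class_to_size[0]) };

/* Return the sizeclass whose block size is the smallest one >= size. */
int size_to_class(size_t size)
{
	for(int c = 1; c < NumClasses; c++)
		if(class_to_size[c] >= size)
			return c;
	return -1;   /* too big for the small-object classes */
}
/* e.g. size_to_class(33) == 4, so a 33-byte request is served with a 48-byte block. */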
The process by which runtime·MCache_Alloc allocates memory is to pick the list of blocks from the list array according to the sizeclass parameter. If the list is not empty, its first node is returned directly; if it is empty, the cache has no memory of this size, so it calls runtime·MCentral_AllocList to fetch a batch of blocks of this size from central, returns the first one for use, and hangs the rest on the list for the next allocation.
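Putting the two paragraphs above together, the following self-contained sketch shows the shape of that path; MLink, Cache, and central_fetch are illustrative stand-ins (with malloc pretending to be central), not the runtime's types:

#include <stddef.h>
#include <stdlib.h>

enum { NumSizeClasses = 61 };

typedef struct MLink MLink;
struct MLink { MLink *next; };

typedef struct Cache Cache;
struct Cache {
	MLink *list[NumSizeClasses];   /* one free list of equal-sized blocks per sizeclass */
};

/* Stand-in for runtime·MCentral_AllocList: hand back a chain of n blocks.
 * Assumes size >= sizeof(MLink). */
static int central_fetch(int sizeclass, size_t size, int n, MLink **pfirst)
{
	MLink *first = NULL;
	for(int i = 0; i < n; i++) {
		MLink *blk = malloc(size);   /* pretend this came from a span in central */
		blk->next = first;
		first = blk;
	}
	*pfirst = first;
	return n;
}

void *cache_alloc(Cache *c, int sizeclass, size_t size)
{
	MLink *v = c->list[sizeclass];
	if(v == NULL) {
		/* Cache miss: refill this sizeclass from central, use the first
		 * block and park the rest on the cache's list. */
		central_fetch(sizeclass, size, 32, &v);
	}
	c->list[sizeclass] = v->next;
	return v;
}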
The allocation logic on the cache is simple: whatever the cache does not have, it fetches from central. Besides allocation, the cache also carries many state counters that track things like how much memory has been allocated and how much is currently cached. These counters are important: they let us monitor and profile our program's memory management, and the MemStats data in the runtime package comes from these underlying counters.
The cache releases memory under two main conditions: when a single block list grows too long (>= 256 nodes), it cuts a portion of nodes off that list and returns them to central; and when the cache as a whole holds too much memory (>= 1MB), it returns part of every list to central.
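The "cut a portion of nodes off the list" step can be sketched as a small list-splitting helper; the Block type and the halving policy here are illustrative assumptions, not the runtime's code:

#include <stddef.h>

typedef struct Block Block;
struct Block { Block *next; };

/* Detach roughly the second half of a list of n cached blocks; the caller
 * would hand the returned chain back to central. Assumes n >= 2. */
Block *split_off_tail(Block **list, int n)
{
	int keep = n/2;
	Block *p = *list;
	for(int i = 1; i < keep; i++)
		p = p->next;
	Block *tail = p->next;   /* the blocks being given back to central */
	p->next = NULL;
	return tail;
}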
The cache layer takes no locks for either allocation or release, because each cache is exclusive to one thread. The main purpose of the cache layer is therefore to speed up the frequent allocation and release of small memory.
Central
The MHeap structure in the malloc.h source file defines central as follows:
struct MHeap
{
	// ...
	struct {
		MCentral;
		byte pad[CacheLineSize];
	} central[NumSizeClasses];
	// ...
};
As you can see from the heap structure, an array indexed 0 to n is used to hold a batch of centrals, not just a single central object. From the definition above the array has 61 elements, which means the heap actually maintains 61 centrals. These correspond to the cache's list array: each sizeclass has its own central. So when the cache requests memory and finds no free block on a sizeclass's list, it goes to the central for that sizeclass to fetch a batch of blocks. Note that the central array is defined with padding bytes: different centrals are accessed concurrently by multiple threads, and the padding avoids false sharing.
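A minimal sketch of that padding trick, assuming a 64-byte cache line (the runtime takes CacheLineSize from its own headers): each central gets a full cache line of padding, so threads hammering on different sizeclasses never write to the same line. The Central type below is only a stand-in for MCentral:

enum { CacheLine  = 64 };     /* assumed cache line size for this sketch */
enum { NumClasses = 61 };     /* one central per sizeclass, as in the text */

typedef struct Central Central;
struct Central {              /* stand-in for MCentral: lock, sizeclass, span lists... */
	int  sizeclass;
	long nfree;
};

typedef struct PaddedCentral PaddedCentral;
struct PaddedCentral {
	Central c;
	char    pad[CacheLine];   /* a full cache line of padding, like byte pad[CacheLineSize] */
};

PaddedCentral centrals[NumClasses];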
Central is implemented mainly in the mcentral.c source file, and its core structure MCentral is defined in malloc.h as follows:
struct MCentral
{
	Lock;
	int32 sizeclass;
	MSpan nonempty;
	MSpan empty;
	int32 nfree;
};
The nonempty and empty fields of the MCentral structure are the important ones, so they get the emphasis here. Both fields are of type MSpan, and as you might guess, each anchors a doubly linked list of span nodes (the MSpan from the previous article), with the head node itself unused. nonempty literally means not empty: the spans on that list still have free memory. empty means the spans on that list have no free memory left. The empty list actually starts out empty; when a span on nonempty has its memory used up, it is moved onto the empty list.
We know that when the cache has nothing left of a given size, it calls runtime·MCentral_AllocList to get a batch of memory from central, and the cache is central's only upstream consumer. Look at the simplified logic of this function:
int32
runtime·MCentral_AllocList(MCentral *c, int32 n, MLink **pfirst)
{
	runtime·lock(c);
	// Part 1: if nonempty is empty, fetch a span from the heap to populate
	// the nonempty list. Replenish central list if empty.
	if(runtime·MSpanList_IsEmpty(&c->nonempty)) {
		if(!MCentral_Grow(c)) {
			// ...
		}
	}

	// Part 2: take a span from the nonempty list and pull up to n blocks
	// off its freelist; if the span is nearly out of memory, take what it has.
	s = c->nonempty.next;
	cap = (s->npages << PageShift) / s->elemsize;
	avail = cap - s->ref;
	if(avail < n)
		n = avail;

	// First one is guaranteed to work, because we just grew the list.
	first = s->freelist;
	last = first;
	for(i=1; i<n; i++)
		last = last->next;
	s->freelist = last->next;
	last->next = nil;
	s->ref += n;
	c->nfree -= n;

	// Part 3: if this span's memory is now exhausted, move it to the empty list.
	if(n == avail) {
		runtime·MSpanList_Remove(s);
		runtime·MSpanList_Insert(&c->empty, s);
	}
	runtime·unlock(c);

	// Part 4: the chain of memory blocks obtained is returned through the parameter pfirst.
	*pfirst = first;
	return n;
}
Getting a batch of memory from central into the cache does not look very complicated, except that the process needs a lock. The part to focus on is the first one, refilling nonempty: when central has no free memory it must ask the heap, and the function called here is MCentral_Grow. MCentral_Grow's main work is to call runtime·MHeap_Alloc to request a span from the heap, split the span's contiguous pages into small blocks of this central's sizeclass, string those blocks onto the span's freelist, and finally put the span onto the nonempty list. When central runs out of free memory, it asks the heap for exactly one span, not several; how many pages that span contains is determined by the sizeclass this central corresponds to.
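The carving step that MCentral_Grow performs can be sketched like this; the span's pages are simulated with malloc and the names are illustrative, but the freelist-chaining idea is the one described above:

#include <stdlib.h>
#include <stdint.h>

typedef struct MLink MLink;
struct MLink { MLink *next; };

enum { PageSize = 4096 };

/* Carve npages worth of memory into elemsize-sized blocks and return the
 * freelist head. Assumes elemsize >= sizeof(MLink); error handling omitted. */
MLink *span_carve(int npages, size_t elemsize)
{
	size_t total = (size_t)npages * PageSize;
	uint8_t *base = malloc(total);          /* stand-in for a span's pages from the heap */
	MLink *head = NULL;

	/* Walk backwards so the list ends up in address order. */
	for(size_t off = (total/elemsize) * elemsize; off > 0; off -= elemsize) {
		MLink *blk = (MLink *)(base + off - elemsize);
		blk->next = head;
		head = blk;
	}
	return head;
}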
That is the allocation process inside central; now look at the rough release process. When the cache layer frees memory, it returns a batch of small blocks to central. On receiving them, central returns each block to the span it belongs to. If that span had previously run out of memory and was sitting on the empty list, it is moved back to nonempty to indicate it has memory available again. After the small blocks are returned to a span, if all of the span's page memory has been reclaimed, that is, none of it is in use, the whole span is returned to the heap.
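A sketch of that release path, with illustrative types and stubbed-out list helpers; the point is the three cases: a block goes back onto its span's freelist, an exhausted span becomes nonempty again, and a span whose reference count drops to zero goes back to the heap whole:

#include <stddef.h>
#include <stdbool.h>

typedef struct Block Block;
struct Block { Block *next; };

typedef struct Span Span;
struct Span {
	Block *freelist;   /* free blocks belonging to this span */
	int    ref;        /* blocks currently handed out to the cache layer */
	bool   on_empty;   /* currently parked on central's empty list */
};

/* List surgery and the hand-back to the heap are out of scope here, so they
 * are left as empty stubs. */
static void span_move_to_nonempty(Span *s) { (void)s; }
static void span_return_to_heap(Span *s)   { (void)s; }

void central_free_block(Span *s, Block *b)
{
	if(s->on_empty) {
		/* the span has free memory again: move it back to nonempty */
		span_move_to_nonempty(s);
		s->on_empty = false;
	}
	b->next = s->freelist;
	s->freelist = b;
	if(--s->ref == 0)
		span_return_to_heap(s);   /* every block came back: the whole span goes to the heap */
}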
At the central layer, the granularity of memory management is basically the span, so the span is a very important supporting component.
Heap
Finally we come to the biggest part, the heap, which is the layer farthest from the Go program and closest to the operating system. Its main job is to request memory from the operating system for central and the rest of the allocator. The heap's core structure MHeap is defined in malloc.h; be sure to read it closely. Whether memory is fetched from the heap through central, or a large allocation skips the cache and central and goes to the heap directly, the following function is called to request memory:
MSpan* runtime·MHeap_Alloc(MHeap *h, uintptr npage, int32 sizeclass, int32 acct, int32 zeroed)
The prototype shows that memory is requested from the heap not in bytes but in pages. The npage parameter is the number of pages needed. A sizeclass of 0 means a large allocation that bypasses the cache and central and goes straight to the heap; when central calls this function, sizeclass is always a value from 1 to 60. The pages requested from the heap must be contiguous and are managed through a span, so the return value is a span rather than, say, an array of pages.
The actual allocation logic in the heap lives in the function MHeap_AllocLocked in mheap.c. Before going through it, look at the heap's free and large fields in the structure diagram. free is an array of 256 elements; each element holds a linked list of spans, and all spans on one list contain the same number of pages, equal to that element's index. For example, every span on the free[5] list contains 5 pages. If a span's page count exceeds 255, the span is placed on the large list instead.
To get memory from the heap, the first step is to find the most suitable span in free or large based on the number of pages requested. If no suitable span can be found even in the large list, the only option is to call MHeap_Grow to request memory from the operating system, replenish the heap, and then retry the allocation. When the span found contains more pages than were requested, the whole span is not handed out; instead it is split into two spans, and the remainder is put back on the free or large list. Because the heap always deals in whole pages, handing out everything it found could be quite wasteful, so only the requested number of pages is returned.
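The lookup-and-split behaviour can be sketched as follows; the Span and Heap types, the pop/push helpers, and split_span are simplified stand-ins (the real code also fixes up page addresses and the page-to-span map, and picks a best fit from large):

#include <stddef.h>
#include <stdlib.h>

enum { MaxPage = 256 };

typedef struct Span Span;
struct Span {
	size_t npages;
	Span  *next;
};

typedef struct Heap Heap;
struct Heap {
	Span *free[MaxPage];   /* free[n]: list of spans containing exactly n pages */
	Span *large;           /* spans containing 256 or more pages */
};

static Span *pop(Span **list)
{
	Span *s = *list;
	if(s != NULL)
		*list = s->next;
	return s;
}

static void push(Span **list, Span *s)
{
	s->next = *list;
	*list = s;
}

/* Carve npage pages off span s; only the page counts are adjusted here. */
static Span *split_span(Span *s, size_t npage)
{
	Span *tail = malloc(sizeof(Span));   /* span descriptor only; the runtime uses fixalloc */
	tail->npages = s->npages - npage;
	tail->next = NULL;
	s->npages = npage;
	return tail;
}

/* Find a span of at least npage pages, splitting an oversized one. */
Span *heap_alloc_span(Heap *h, size_t npage)
{
	Span *s = NULL;
	for(size_t n = npage; n < MaxPage && s == NULL; n++)
		s = pop(&h->free[n]);
	if(s == NULL)
		s = pop(&h->large);   /* the real code best-fits over the large list */
	if(s == NULL)
		return NULL;          /* here the real code would call MHeap_Grow and retry */

	if(s->npages > npage) {
		Span *tail = split_span(s, npage);
		if(tail->npages < MaxPage)
			push(&h->free[tail->npages], tail);
		else
			push(&h->large, tail);
	}
	return s;
}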
When memory is released, spans that were allocated from the heap are mostly returned to the free or large lists.
What makes the heap complicated is not allocating and releasing memory but the large amount of metadata it has to maintain, for example the map field that has not been described yet. This map maintains the mapping from page to span: given any page, you can find out which span that memory belongs to, which is what makes correct memory reclamation possible. Besides map there are structures such as bitmap, which marks memory on behalf of the GC.
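A sketch of that page-to-span idea, using an illustrative flat array keyed by page number; this only shows the shape of the lookup, not the runtime's actual mapping structure:

#include <stdint.h>
#include <stddef.h>

enum { PageShift = 12 };                     /* 4KB pages */

typedef struct Span Span;
struct Span { uintptr_t start_page; size_t npages; };

typedef struct PageMap PageMap;
struct PageMap {
	uintptr_t arena_start;   /* first byte of the arena */
	Span    **spans;         /* one entry per page in the arena */
};

/* Record that span s owns each of its pages. */
void pagemap_insert(PageMap *m, Span *s)
{
	for(size_t i = 0; i < s->npages; i++)
		m->spans[s->start_page + i - (m->arena_start >> PageShift)] = s;
}

/* Look up the span owning an arbitrary pointer into the arena. */
Span *pagemap_lookup(PageMap *m, void *p)
{
	uintptr_t page = (uintptr_t)p >> PageShift;
	return m->spans[page - (m->arena_start >> PageShift)];
}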
Later, when analyzing the garbage collector (GC), we will come back to the heap. That is enough for this article, though many details have not been covered, such as how the heap obtains memory from the operating system, the goroutine that wakes up in the heap every two minutes, and so on.
I strongly recommend getting familiar with the C language and reading the source code yourself; there are far too many interesting details.
Note: this article is based on the Go 1.1.2 source code.
In the C language world, memory management is both the biggest headache and the coolest thing.