A detailed description of the Linux heap management strategy

Source: Internet
Author: User

I recently read an article on Linux heap management; this blog post refines and summarizes it.

Getting started with binary work is hard!

Linux Heap management strategy

1. General statement

When malloc is called in the main thread, the system allocates a heap for the program directly above the data segment, which shows that the heap is created through the brk system call. The allocated address space (132 KB) is much larger than the requested size; this region is called the main arena (each arena contains multiple chunks, organized as linked lists). Because the region is much larger than what was requested, later heap requests from the main thread are served from the remainder of the 132 KB until it is used up or too small for a request. The main arena is then grown by raising the program break. Similarly, when the main arena holds too much free memory, it is shrunk by lowering the program break.

After free is called in the main thread, the program's heap memory is not returned to the operating system; it stays under the management of glibc's malloc. The freed chunk is added to the bins of the main arena (the structures that record the free lists are called bins). On each later malloc call, glibc malloc first tries to find a chunk that satisfies the request in the bins, and asks the operating system for new heap space only if none is found.

When a child thread requests heap space, the memory is obtained through the mmap system call instead of brk; otherwise allocation and release work the same way as in the main thread.

2. Arena Introduction

2.1. Quantity limit

Although the main thread and the child threads can each have their own arena, the number of arenas is limited, and the limit depends on the number of processor cores in the system.

32-bit: y = 2 * x + 1

64-bit: y = 8 * x + 1

where y is the number of arenas and x is the number of processor cores.

2.2. Management of arenas

Take a single-core 32-bit processor as an example of how arenas are shared. When the main thread calls malloc for the first time, glibc malloc gives it the main arena directly. When child thread 1 and child thread 2 call malloc for the first time, glibc malloc creates a new arena for each of them. When a third child thread calls malloc, however, glibc malloc loops over all existing arenas and tries to lock each one (an arena can be locked when the thread that owns it is not currently using heap memory). The first arena that is locked successfully is returned to the requesting thread as a shared arena. If no arena is available, the requesting thread blocks until one becomes available. More complex cases follow the same pattern.

3. Heap Management

3.1. General statement

(1) heap_info:

This is the heap header. Because a thread arena (but not the main arena) can contain multiple heaps, each heap gets its own heap header so that it can be managed efficiently. (As described earlier, when a heap runs out of space, a new heap is obtained through the mmap system call and added to the current thread arena, which is how a thread arena ends up with multiple heaps.)

typedef struct _heap_info {
  mstate ar_ptr;             // arena this heap belongs to
  struct _heap_info *prev;   // previous heap_info
  size_t size;               // current size in bytes
  size_t mprotect_size;
  char pad[-6 * SIZE_SZ & MALLOC_ALIGN_MASK];   // padding for byte alignment
} heap_info;

(2) malloc_state:

This is the arena header. Each thread arena has only one arena header, which holds the bins, the top chunk, and the last remainder chunk.

struct malloc_state {
  mutex_t mutex;
  int flags;
  mfastbinptr fastbinsY[NFASTBINS];   // fast bins
  mchunkptr top;
  mchunkptr last_remainder;
  mchunkptr bins[NBINS * 2 - 2];
  unsigned int binmap[BINMAPSIZE];
  struct malloc_state *next;
  struct malloc_state *next_free;
  INTERNAL_SIZE_T system_mem;
  INTERNAL_SIZE_T max_system_mem;
};

(3) malloc_chunk:

This is the chunk header. A heap is divided into multiple chunks, and the size of each chunk is determined by the size argument the user passes to malloc.

struct malloc_chunk {
  INTERNAL_SIZE_T prev_size;   // size of the previous chunk, if it is free
  INTERNAL_SIZE_T size;        // size of this chunk
  struct malloc_chunk *fd;     // doubly linked list pointers, present only in free chunks
  struct malloc_chunk *bk;
};

3.2. The relationship between heap segment and arena

A thread arena contains only one malloc_state (the arena header) but can have multiple heap_info structures (the heap headers). Heaps allocated through mmap are not contiguous in memory and lie in different address ranges. For ease of management, glibc malloc points the prev member of the later heap_info structure at the start of the previous heap_info structure (its ar_ptr member), and the ar_ptr member of the first heap_info points to malloc_state. Together these form a singly linked list.

4. Further discussion on chunk

The chunk is the smallest unit of operation in heap memory management. Chunks fall into 4 categories: 1) allocated chunk, 2) free chunk, 3) top chunk, 4) last remainder chunk. For ease of discussion, this section considers only allocated chunks and free chunks.

4.1. Implicit linked list technique

Because heap memory is managed in chunks, the boundaries of each chunk must be clearly defined, and allocated chunks must be distinguishable from free ones. An allocated chunk consists of a header, the payload (the pointer returned by malloc points to the beginning of the payload), and padding. The header holds the chunk size. Because chunks are 8-byte aligned, the lowest three bits of the chunk size are always zero, so they are used as flag bits. Bit 0 marks whether the chunk is allocated (1 means allocated, 0 means free). The padding is used for address alignment, so that the size of the whole chunk is an integer multiple of 8.

Disadvantage: this scheme is inefficient. When memory is freed, it is difficult to merge adjacent free chunks, which leaves many small free chunks that cannot satisfy larger requests, and eventually the usable memory is exhausted.

4.2. Coalescing with boundary tags

With the implicit list alone, checking whether the chunk before the one being freed is also free, and merging the two, requires traversing the whole chunk list, so the time needed to free a chunk grows linearly with the size of the heap. Boundary tags were proposed to solve this.

Add a copy of the header after each chunk's padding; this copy is called the footer. Since the footer sits in the 4 bytes immediately before the header of the next chunk, the next chunk can easily locate the previous chunk and merge with it.

Disadvantage: the footer makes every chunk larger, which is a significant overhead for programs that allocate many small chunks. Moreover, merging only applies to free chunks, so allocated chunks do not need a footer.

Optimization: store the allocated flag of the previous chunk in bit 1 or bit 2 of the current chunk's header. The flag bits of the current chunk then tell whether the previous chunk is allocated, and therefore whether the four bytes before the current header are the footer of the previous chunk.

4.3. Chunks with multithreading support

To support multiple threads, a chunk must also record whether it belongs to a thread arena and whether it was obtained through mmap or brk. However, the chunk header has only one spare bit left, so the chunk layout needs a larger change.

Because the header already stores whether the previous chunk is allocated, storing the current chunk's own allocated bit is unnecessary: whether the current chunk is allocated can be read from the flag bit of the next chunk. The bit freed this way is used as a flag for multithreading.

The three flag bits are: N, whether the current chunk belongs to a thread arena; M, whether the current chunk was obtained through the mmap system call; and P, whether the previous chunk is allocated.

Optimization: storing a second copy of the chunk size is unnecessary, but merging free chunks still requires the size of the previous chunk. So the footer is moved in front of the header and stores the size of the previous chunk instead of the size of the current chunk. This field is meaningful only when the previous chunk is free; when the previous chunk is allocated, the field is treated as part of that chunk's payload or padding.

In short, prev_size and size in malloc_chunk form the implicit list. The fd and bk pointers that follow are not part of the implicit list; they form the explicit lists (the bins) that speed up memory allocation and release, and they exist only in free chunks.

5. Top Chunk

The chunk at the top of an arena (at the highest memory address) is called the top chunk. It does not belong to any bin. It is used when no free chunk in any bin can satisfy the size of the user's request. If the top chunk is larger than the requested size, it is split into two parts: 1) the chunk returned to the user; 2) the remainder, which becomes the new top chunk. Otherwise, the heap must be extended or a new heap obtained (the main arena is extended with sbrk, a thread arena through mmap).

6. Last Remainder Chunk

When the user requests a small chunk that cannot be served by the small bin or the unsorted bin, glibc uses the binmap to scan the following bins for a chunk. If the chunk found leaves a remainder after the request is carved out, the remainder becomes a new chunk, is added to the unsorted bin, and becomes the last remainder chunk.

Its main purpose is to make consecutive malloc calls for small chunks more efficient.

7. Bin Introduction

A bin is a linked list structure that records free chunks. Bins are divided into 4 categories according to chunk size: fast bin, unsorted bin, small bin, and large bin. Two data structures record the bins: fastbinsY, an array that records all fast bins, and bins, an array that records all bins other than fast bins. There are 126 bins in total: bin 1 is the unsorted bin, bins 2-63 are small bins, and bins 64-126 are large bins.

The specific data structure is defined as follows:

struct malloc_state {
  /* fastbins */
  mfastbinptr fastbinsY[NFASTBINS];
  ......
  /* normal bins packed as described above */
  mchunkptr bins[NBINS * 2 - 2];   // #define NBINS 128
  ......
};

Here mfastbinptr is defined as typedef struct malloc_chunk *mfastbinptr; and mchunkptr as typedef struct malloc_chunk *mchunkptr;.

8. Fast Bin

A chunk of 16 to 80 bytes is called a fast chunk. Here chunk size means the actual overall size of the malloc_chunk. The term chunk unused size used below means the size actually available in the malloc_chunk after subtracting the bookkeeping members prev_size, size, fd, and bk. For a free chunk, the usable size is therefore always 16 bytes less than the overall size. Fast bins are the fastest of all bins to operate on.

1) The number of fast bins is fixed at 10.

2) Each fast bin is a singly linked list (only the fd pointer is used). Both adding and removing a fast chunk operate on the tail of the list; chunks in the middle are never touched. That is, free adds the chunk at the tail of the list, and malloc removes the chunk at the tail. Each element of the fastbinsY array therefore points to the tail node of its list, and the tail node's fd points to the previous node.

3) The fast chunk sizes of the 10 fast bins increase in steps of 8 bytes: all fast chunks in the first fast bin are 16 bytes, those in the second are 24 bytes, and so on. When malloc is initialized, the maximum fast chunk size is set to 80 bytes (chunk unused size of 64 bytes), so by default chunks of 16 to 80 bytes are classified as fast chunks.

4) Chunks in a fast bin are not merged; the allocation flag bit of a fast chunk is always left set to 1 (allocated state).

5) A request is handled by the fast bins when the size requested through malloc falls in the fast chunk range (the size of the user request + 16 bytes is the actual chunk size). At initialization, the maximum size supported by the fast bins is not yet set and all fast bin lists are empty, so the very first request is not served from a fast bin.

On the first call to malloc for a fast chunk, the system executes the _int_malloc function. The function first finds that the fast bin is empty and hands the request to the small bin code; finding the small bin empty as well, it calls the malloc_consolidate function to initialize the malloc_state structure. malloc_consolidate works as follows:

First it checks whether the fast bins in the current malloc_state structure are empty. If they are, the whole malloc_state is uninitialized and must be initialized. It calls the malloc_init_state(av) function, which first initializes all bins other than the fast bins and then initializes the fast bins. Fast bins can then be used the next time malloc is called for a fast chunk. Conversely, when free is called on a fast chunk, the chunksize function obtains the size of the chunk from the pointer passed in, the chunk size determines the fast bin the chunk belongs to, and the chunk is added at the tail of that fast bin's list. This whole operation is done in the _int_free function.

9. Unsorted Bin

When a small or large chunk is freed, it is not added to its corresponding bin right away; it is added to the unsorted bin instead. This gives the allocator a second chance to reuse recently freed chunks (the first being the fast bin mechanism). The unsorted bin is a circular doubly linked list of free chunks. It has no size restriction: a chunk of any size can be placed in the unsorted bin.

10. Small Bin

A chunk of less than 512 bytes is called a small chunk, and small bins are used to manage small chunks. In terms of memory allocation and release speed, small bins are faster than large bins but slower than fast bins. Small bins have the following features:

1) There are 62 small bins. Each is a circular doubly linked list of free chunks of the corresponding size. When memory is freed, the newly freed chunk is added to the front of the list; when memory is allocated, the chunk is taken from the end of the list.

2) Chunks in the first small bin are 16 bytes, and the chunk size of each subsequent small bin increases by 8 bytes, so the chunks in the last small bin are 16 + 61 * 8 = 504 bytes, just under the 512-byte limit.

3) Adjacent free chunks are merged.

4) The malloc operation is similar to that of fast bins: the small bins are initially empty, so the request is handed to the unsorted bin. If the unsorted bin cannot serve it either, the following bins are traversed in turn, and finally the top chunk is used, extended if necessary, to serve the request.

5) When a small chunk is freed, glibc first checks whether the chunks adjacent to it are free. If so, they are removed from their small bins and merged with it, and the merged chunk is added to the unsorted bin.

11. Large Bin

A chunk of 512 bytes or more is called a large chunk. Large bins are used to manage large chunks.

1) There are 63 large bins. Chunks in one large bin can differ in size but must lie within a given range. A large chunk can be added or removed at any position in a large bin. Of the 63 large bins, the first 32 are spaced 64 bytes apart: the first large bin holds chunks of 512 to 575 bytes, and the second holds chunks of 576 to 639 bytes. The next 16 bins are spaced 512 bytes apart, followed by 8 bins spaced 4096 bytes apart, 4 bins spaced 32768 bytes apart, and 2 bins spaced 262144 bytes apart. All remaining chunks go into the last large bin. Because chunks in the same large bin do not necessarily have the same size, they are kept sorted by size in descending order to speed up allocation and release: the largest chunk is at the front of the list and the smallest at the rear.

2) The merge operation is similar to that of small bins.

3) Initialization is similar to that of small bins. After initialization, glibc first determines which large bin the requested size belongs to, then checks whether the largest chunk in that bin (the one at the front of the list) is larger than the requested size. If it is, the list is traversed from the rear end to find the first chunk whose size is equal or closest to the request, and that chunk is given to the user (if the chunk is too large, it is split and the remainder goes back to the unsorted bin). Because there are many bins, and chunks in different bins are likely to lie in different memory pages, walking the chunks of every bin could trigger many page faults. glibc therefore uses the binmap structure to speed up bin-by-bin retrieval: the binmap records whether each bin is empty, so empty bins can be skipped.

4) The release operation is similar to that of small chunks.
