In-depth Analysis of Linux Heap Memory Management (Part I)


Author: 走位 @ Alibaba Jusecurity (JAQ)

  

0 Preface

In recent years, vulnerability research has become increasingly popular, and analysis articles on vulnerability discovery and exploitation keep appearing. Broadly speaking, there are two kinds of overflow vulnerabilities: stack overflows and heap overflows. Material on stack overflows is plentiful, so I will not cover it here; material on exploiting heap overflow vulnerabilities, however, is scarce. In my view, the barrier to entry for heap overflows is higher: you need a thorough understanding of the heap memory management mechanism of the target operating system, and that has always been the hard part. This series of articles therefore focuses on the heap memory management mechanism of Linux and, building on it, common heap exploitation techniques such as basic heap overflow, unlink-based heap overflow, double free, and use-after-free.

 

I came across the following article some time ago:

https://sploitfun.wordpress.com/2015/02/10/understanding-glibc-malloc/comment-page-1/

 

It is one of the best articles I have read recently: easy to follow and clearly organized. As a beginner, I learned a great deal about Linux heap memory management from it. However, perhaps due to space constraints, some points in it are still hard to grasp. I therefore decided to write a detailed and complete introduction to Linux heap management based on other references and my own understanding, in the hope of giving other beginners a boost. In terms of sources, this article consists of two parts: a translation of the article above, and supplementary explanations I added from other reference material and my own understanding. Given my limited knowledge and ability, corrections and questions are very welcome!

 

Likewise, because of its length, I have split the article into two parts. The first part mainly introduces the basic concepts and relationships in heap memory management, with emphasis on the implicit linked list technique used when allocating and freeing heap chunks. The second part mainly introduces the explicit linked list technique that glibc malloc adds to speed up allocation and deallocation, i.e., the concept and core principles of bins. The source code referenced is available here:

https://github.com/sploitfun/lsploits/tree/master/glibc

 

1 Heap Memory Management Overview

 

Currently, the major heap memory management implementations across platforms are:

dlmalloc - general-purpose allocator

ptmalloc2 - glibc

jemalloc - FreeBSD and Firefox

tcmalloc - Google

libumem - Solaris

 

This article focuses on ptmalloc2 as implemented in glibc on Linux.

Linux originally used dlmalloc as its default allocator, but because dlmalloc did not support multi-threaded heap management well, it was later replaced by ptmalloc2, which does.

Of course, on Linux, malloc is ultimately implemented through the brk or mmap system calls. For that part of the story, be sure to read the following article first:

https://sploitfun.wordpress.com/2015/02/11/syscalls-used-by-malloc/

 

Given the length of this article, I will not go into those details here; to make the later discussion of heap memory management easier to follow, I only reproduce the function call relationship diagram:

Figure 1-1 function call Relationship Diagram

 

System memory distribution chart:

Figure 1-2 system memory distribution

 

2 Experiment Demonstration

Consider the following code:

/* Per thread arena example. */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>

void* threadFunc(void* arg) {
        printf("Before malloc in thread 1\n");
        getchar();
        char* addr = (char*) malloc(1000);
        printf("After malloc and before free in thread 1\n");
        getchar();
        free(addr);
        printf("After free in thread 1\n");
        getchar();
        return NULL;
}

int main() {
        pthread_t t1;
        void* s;
        int ret;
        char* addr;

        printf("Welcome to per thread arena example::%d\n", getpid());
        printf("Before malloc in main thread\n");
        getchar();
        addr = (char*) malloc(1000);
        printf("After malloc and before free in main thread\n");
        getchar();
        free(addr);
        printf("After free in main thread\n");
        getchar();
        ret = pthread_create(&t1, NULL, threadFunc, NULL);
        if(ret)
        {
                printf("Thread creation error\n");
                return -1;
        }
        ret = pthread_join(t1, &s);
        if(ret)
        {
                printf("Thread join error\n");
                return -1;
        }
        return 0;
}
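(A note on setup, added by way of illustration and not part of the original article: one way to reproduce the layouts discussed below is to compile the program with "gcc per_thread_arena.c -o per_thread_arena -lpthread", run it, and at each getchar() pause inspect the process memory map with "cat /proc/<pid>/maps", where <pid> is the process ID the program prints at startup.)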

Next we will analyze the heap memory distribution in each stage in sequence.

 

1. Before malloc in main thread:

Before the program calls malloc, the process has no heap segment, and before the thread is created there is no thread stack either.

 

2. After malloc in main thread:

After the main thread calls malloc, you can see that the system has allocated a heap segment to the program, and that this heap sits just above the data segment.

 

This shows that the heap was created with the brk system call. Moreover, although we requested only 1000 bytes, the system allocated a 132 KB heap. Why? This 132 KB of heap space is called an arena; because it was allocated for the main thread, it is called the main arena (each arena contains multiple chunks, which are organized in linked lists). Since 132 KB is much larger than 1000 bytes, later requests from the main thread are served out of the remaining part of those 132 KB; only when that space is used up or is insufficient does the main arena grow, by raising the program break. Similarly, when the main arena has too much free memory, it shrinks by lowering the program break.
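As a small illustration (my own sketch, not part of the original article), the following program reads the program break with sbrk(0) before and after the first malloc. On a typical glibc/Linux system the break jumps by far more than the 1000 bytes requested, reflecting the arena allocation described above; the exact amount may vary.

/* Sketch: observing the program break around the first malloc.
   Assumes Linux + glibc; the exact numbers differ between systems. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    void *brk_before = sbrk(0);     /* current program break */
    char *p = malloc(1000);         /* first allocation in the main thread */
    void *brk_after = sbrk(0);      /* break after glibc grew the main arena */

    printf("break before malloc: %p\n", brk_before);
    printf("break after  malloc: %p\n", brk_after);
    printf("arena grown by     : %ld bytes\n",
           (long)((char *)brk_after - (char *)brk_before));

    free(p);
    return 0;
}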

 

3. After free in main thread:

After the main thread calls free, the memory layout shows that the program's heap space has not been released. The space freed by calling free is not directly "returned" to the system; instead it is managed by glibc's malloc library, which adds the freed chunk to a bin of the main arena (a bin is a doubly linked list data structure used to hold free chunks of the same type, described in detail later; the freelist data structures that track free space are collectively called bins). When malloc is called again to request heap space, glibc malloc first tries to find a chunk in the bins that satisfies the request; only if none is found does it ask the operating system for new heap space.

 

4. Before malloc in thread1:

Before thread1 calls malloc, the output shows that thread1 has no heap segment yet, but its thread stack has already been allocated:

 

5. After malloc in thread1:

After thread1 calls malloc, the output shows that thread1's heap segment has been allocated. The start address of this region shows that it was not obtained with brk but with mmap, because the region b7500000-b7600000, 1 MB in total, is not adjacent to the program's data segment. This 1 MB is further divided into two parts with different memory protections: the first part, starting at 0xb7500000 and 132 KB in size, is readable and writable, while the remainder is not writable. Only this 132 KB readable/writable portion is thread1's heap space, that is, thread1's arena.

 

6. After free in thread1: same behavior as in the main thread.

 

3 Arenas

3.1 Limit on the Number of Arenas

In Chapter 2 we saw that the main thread and thread1 each have their own arena. Does that mean every thread gets its own arena, no matter how many threads there are? No. In fact, the number of arenas is bounded by the number of processor cores in the system, as follows:

For 32-bit systems: number of arenas = 2 * number of cores + 1.
For 64-bit systems: number of arenas = 8 * number of cores + 1.
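For illustration only (my own sketch; the formula is the one in the table above, and counting cores with sysconf(_SC_NPROCESSORS_ONLN) is just one possible choice), the limit could be computed like this:

/* Sketch: computing the arena limit from the table above. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long cores = sysconf(_SC_NPROCESSORS_ONLN);   /* online processor cores */

    /* sizeof(long) is 4 on the 32-bit systems discussed here, 8 on 64-bit ones. */
    long max_arenas = (sizeof(long) == 4) ? 2 * cores + 1
                                          : 8 * cores + 1;

    printf("cores = %ld, arena limit = %ld\n", cores, max_arenas);
    return 0;
}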

 

3.2 Managing Multiple Arenas

Suppose a PC with a single processor core runs a 32-bit operating system and a multi-threaded application with four threads in total: the main thread plus three user threads. The number of threads is then greater than the maximum number of arenas the system will maintain (2 * number of cores + 1 = 3), so glibc malloc must let the four threads share the three arenas correctly. How does it do that?

When the main thread calls malloc for the first time, glibc malloc directly creates the main arena for it, with no extra conditions.

When user threads 1 and 2 call malloc for the first time, glibc malloc creates a new thread arena for each of them, so at this point threads and arenas correspond one to one. A problem arises when thread 3 calls malloc: the number of arenas maintained by glibc malloc has reached its upper limit, so thread 3 cannot be given a new arena and must reuse one of the three existing ones (the main arena, arena 1, or arena 2). Which one gets reused?

 

1) First, glibc malloc loops over all available arenas and tries to lock each one. If it succeeds in locking one, say the main arena (success means the thread that owns that arena is not currently using its heap), that arena is returned to the user; the main arena is now shared with thread 3.

2) If no arena can be locked, thread 3's malloc blocks until an arena becomes available.

3) If thread 3 later calls malloc again, glibc malloc first tries the arena it accessed most recently (here, the main arena). If the main arena is available it is used directly; otherwise thread 3 blocks until the main arena is free again.

In this way, thread 3 ends up sharing the main arena with the main thread. More complicated cases follow the same pattern. A simplified sketch of the try-lock loop is shown below.
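The following is a highly simplified sketch of that reuse loop (my own illustration with made-up names such as arena_t and pick_arena; it is not glibc's actual code): try-lock each arena in turn, and fall back to blocking on one of them if none is free.

/* Simplified sketch of arena reuse (illustrative only, not glibc source). */
#include <pthread.h>
#include <stddef.h>

#define NARENAS 3

typedef struct arena {
    pthread_mutex_t lock;
    /* ... bins, top chunk, etc. would live here ... */
} arena_t;

static arena_t arenas[NARENAS] = {
    { PTHREAD_MUTEX_INITIALIZER },
    { PTHREAD_MUTEX_INITIALIZER },
    { PTHREAD_MUTEX_INITIALIZER },
};

/* Return a locked arena for the calling thread. */
static arena_t *pick_arena(void)
{
    /* 1) Try every arena without blocking; take the first one we can lock. */
    for (int i = 0; i < NARENAS; i++)
        if (pthread_mutex_trylock(&arenas[i].lock) == 0)
            return &arenas[i];

    /* 2) All arenas are busy: block on one of them until it becomes free. */
    pthread_mutex_lock(&arenas[0].lock);
    return &arenas[0];
}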

 

4 Heap Management

4.1 Overview

Heap management in glibc malloc mainly involves the following three data structures:

1. heap_info, the heap header. A thread arena (note: this does not apply to the main thread) can contain multiple heaps, so each heap gets its own heap header to simplify management. Under what circumstances does a thread arena contain multiple heaps? When the current heap runs out of space, malloc calls mmap to obtain a new heap segment, and that new heap is added to the current thread arena.

typedef struct _heap_info
{
  mstate ar_ptr;            /* Arena for this heap. */
  struct _heap_info *prev;  /* Previous heap. */
  size_t size;              /* Current size in bytes. */
  size_t mprotect_size;     /* Size in bytes that has been mprotected
                               PROT_READ|PROT_WRITE.  */
  /* Make sure the following data is properly aligned, particularly
     that sizeof (heap_info) + 2 * SIZE_SZ is a multiple of
     MALLOC_ALIGNMENT. */
  char pad[-6 * SIZE_SZ & MALLOC_ALIGN_MASK];
} heap_info;

 

2. malloc_state, the arena header. Each thread has only one arena header. The arena header holds information about the bins, the top chunk, and the last remainder chunk (these concepts are described in detail later):

struct malloc_state
{
  /* Serialize access.  */
  mutex_t mutex;

  /* Flags (formerly in max_fast).  */
  int flags;

  /* Fastbins */
  mfastbinptr fastbinsY[NFASTBINS];

  /* Base of the topmost chunk -- not otherwise kept in a bin */
  mchunkptr top;

  /* The remainder from the most recent split of a small request */
  mchunkptr last_remainder;

  /* Normal bins packed as described above */
  mchunkptr bins[NBINS * 2 - 2];

  /* Bitmap of bins */
  unsigned int binmap[BINMAPSIZE];

  /* Linked list */
  struct malloc_state *next;

  /* Linked list for free arenas.  */
  struct malloc_state *next_free;

  /* Memory allocated from the system in this arena.  */
  INTERNAL_SIZE_T system_mem;
  INTERNAL_SIZE_T max_system_mem;
};

 

3. malloc_chunk, the chunk header. A heap is divided into multiple chunks, whose sizes are determined by the user's requests; that is, the size argument the user passes to malloc(size) "is" the chunk size (the quotation marks indicate that this statement is not strictly accurate and is only used for convenience; the precise relationship is explained below). Each chunk is represented by a struct malloc_chunk:

struct malloc_chunk {
  /* #define INTERNAL_SIZE_T size_t */
  INTERNAL_SIZE_T      prev_size;    /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      size;         /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;           /* double links -- used only if free.
                                        These two pointers exist only in free chunks. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize;  /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};

 

Many readers may wonder: there is no data-like field in this struct to hold the heap memory the user asked for. The struct contains just two size_t members and four pointers, so isn't the size of malloc_chunk fixed? How, then, can it represent allocations of different sizes? Answering this clearly requires a full picture of glibc malloc's heap memory management, which is one of the main goals of this article; it is covered in detail in Chapter 5.

 

NOTE:

1. The main thread does not have multiple heaps, so it has no heap_info struct. When it needs more heap space, it simply grows its heap segment with sbrk until the segment runs into the memory mapping region.

2. Unlike a thread arena, the main arena's arena header is not part of an sbrk'ed heap segment; it is a global variable, and therefore lives in the data segment of libc.so.

 

4.2 Linking Heap Segments and Arenas

First, sort out the organizational relationship between malloc_state and heap_info through the memory distribution chart.

The memory distribution of the main arena and thread arena with only one heap segment:

Figure 4-1 Main arena, and a thread arena containing only one heap segment

 

When a thread arena contains multiple heap segments:

Figure 4-2 Memory layout of a thread arena containing multiple heap segments

 

We can see that the thread arena has only one malloc_state (arena header) but two heap_info structs (heap headers). Because the two heap segments were allocated with mmap, they are not adjacent in memory but sit in different memory regions. For ease of management, libc malloc therefore points the prev member of the second heap_info to the start of the first heap_info (i.e., to its ar_ptr member), and the ar_ptr member of the first heap_info points to the malloc_state. This creates a singly linked list that simplifies subsequent management.
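As a tiny sketch of that linkage (my own simplification, keeping only the fields needed here rather than glibc's real definitions), walking from the newest heap back to the arena header might look like this:

/* Sketch: following the heap_info chain back to the arena (simplified types). */
#include <stddef.h>

struct malloc_state;                     /* arena header, treated as opaque here */

typedef struct _heap_info {
    struct malloc_state *ar_ptr;         /* arena this heap belongs to            */
    struct _heap_info   *prev;           /* previous heap, NULL for the first one */
    size_t               size;           /* current size of this heap segment     */
} heap_info;

/* Walk from the most recently added heap back to the first heap,
   then return the arena header it points to. */
static struct malloc_state *arena_for_heap(heap_info *h)
{
    while (h->prev != NULL)
        h = h->prev;                     /* follow the singly linked list of heaps */
    return h->ar_ptr;
}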

 

5 Understanding Chunks

In glibc malloc, the entire heap is divided into contiguous chunks of various sizes; for heap memory management, the chunk is the minimum unit of operation. Chunks come in four kinds: 1) allocated chunk; 2) free chunk; 3) top chunk; 4) last remainder chunk. In essence, every kind of chunk is just a contiguous region of memory, distinguished only by identifier bits at specific locations. For convenience, we first reduce these four kinds to two: allocated chunks, which have been handed out to the user, and free chunks, which are not in use.

 

As we all know, the core goal of any heap memory manager is to allocate and reclaim memory blocks (chunks) efficiently. That requires suitable algorithms and matching data structures, and the data structures tend to change as the algorithms demand; algorithms, in turn, go through optimization and refinement. This section therefore follows the evolution of heap memory managers to explain how the chunk data structure in glibc malloc came to be designed, along with its advantages and disadvantages.

 

PS: given limited time and energy, the evolution story told below has not been rigorously verified. I pieced it together from reference material, my own understanding, and the needs of this article's structure, so treat it as a well-intentioned reconstruction. If there are mistakes, please do point them out!

 

5.1 The Implicit Linked List Technique

As mentioned above, every heap memory manager manages heap memory in units of chunks, which requires data structures that mark each block's boundary and distinguish allocated blocks from free ones. Most heap memory managers embed this boundary information inside the chunk itself, as part of the chunk. A typical design looks like this:

 

Figure 5-1 simple allocated chunk format

Figure 5-2 simple free chunk format

 

In the heap, every chunk size must be a multiple of 8, so the lowest three bits of the chunk size carry no size information. To make full use of memory, the heap manager uses these three bits as chunk flags; typically, bit 0 marks whether the chunk is allocated. This design is clever: with just a pointer to a chunk's size field we can determine the chunk's size, i.e., its boundary, and bit 0 of that same field tells us whether the chunk is allocated, so chunks can be told apart. Note that in an allocated chunk the padding is mainly used for address alignment (it can also absorb external fragmentation), i.e., it pads the whole chunk to a multiple of 8 bytes. A sketch of how such a size field is interpreted follows.
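In code, reading the boundary and the allocation flag out of such a size field might look like the following sketch (my own illustration; the macro and function names are made up, only the bit layout follows the description above):

/* Sketch: interpreting a chunk size field whose low bits are flags. */
#include <stddef.h>

#define CHUNK_ALLOCATED 0x1u                 /* bit 0: this chunk is in use */
#define SIZE_MASK       (~(size_t)0x7)       /* clear the three flag bits   */

static size_t chunk_size(size_t size_field)
{
    return size_field & SIZE_MASK;           /* real size is a multiple of 8 */
}

static int chunk_is_allocated(size_t size_field)
{
    return (size_field & CHUNK_ALLOCATED) != 0;
}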

 

With this design, the whole heap can be organized as a contiguous sequence of allocated and unallocated chunks:

Figure 5-3 simple chunk Sequence

 

The structure above is called an implicit linked list: the chunks are linked implicitly through each chunk's size field. During an allocation, the heap memory manager traverses the chunks of the whole heap, examining each chunk's size field, until it finds a suitable chunk. A first-fit traversal of this kind is sketched below.
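The traversal could be sketched as follows (illustrative names and a first-fit policy chosen only for the example; this is not glibc's code):

/* Sketch: first-fit search over an implicit chunk list. */
#include <stddef.h>

#define CHUNK_ALLOCATED 0x1u
#define SIZE_MASK       (~(size_t)0x7)

/* heap_start/heap_end delimit the heap; each chunk starts with its size field. */
static void *find_free_chunk(char *heap_start, char *heap_end, size_t need)
{
    char *p = heap_start;
    while (p < heap_end) {
        size_t size_field = *(size_t *)p;
        size_t size = size_field & SIZE_MASK;          /* strip flag bits          */
        if (size == 0)
            break;                                     /* malformed heap, stop     */
        if (!(size_field & CHUNK_ALLOCATED) && size >= need)
            return p;                                  /* first fitting free chunk */
        p += size;                                     /* size links to next chunk */
    }
    return NULL;                                       /* nothing suitable found   */
}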

 

Careful readers may notice that this implicit linked list is quite inefficient, especially for memory reclamation: it is hard to merge adjacent free chunks. We know that if free chunks are only split and never merged, a large number of small, unusable fragments will accumulate until memory is exhausted. The heap memory manager therefore introduced chunk coalescing based on boundary tags.

 

1) Coalescing with boundary tags

Consider the following scenario: suppose the chunk we want to free is P, the chunk immediately before it is FD, the chunk immediately after it is BK, and both FD and BK are free. Merging P with BK is easy, because P's size field lets us locate the start of BK and read BK's size. Merging P with FD, however, is hard: we would have to traverse the whole heap from the beginning to find FD before merging, which means each free operation takes time linear in the heap size. To solve this problem, Knuth proposed a clever and general technique: the boundary tag.

 

Knuth adds a footer at the end of each chunk; it is a copy of the chunk's header and is called the boundary tag:

Figure 5-4 Chunk format with Knuth boundary tag

 

Obviously, each chunk's footer occupies the four bytes just before the header of the next adjacent chunk. With this, the heap memory manager can easily obtain the previous chunk's start position and allocation status, and then merge with it. A small sketch of this lookup is given below.
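With the footer in place, finding the previous chunk during a merge becomes a constant-time operation. A sketch of how the previous chunk's start could be located (my own illustration; field widths and names are simplified, using size_t rather than the four-byte word mentioned above):

/* Sketch: locating the previous chunk via Knuth's boundary tag. */
#include <stddef.h>

#define SIZE_MASK (~(size_t)0x7)

/* 'chunk' points at the header (size field) of the current chunk.
   The footer of the previous chunk sits in the word just before it. */
static char *previous_chunk(char *chunk)
{
    size_t prev_footer = *((size_t *)chunk - 1);   /* copy of previous header */
    size_t prev_size   = prev_footer & SIZE_MASK;
    return chunk - prev_size;                      /* start of previous chunk */
}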

 

However, the boundary tag also introduces a problem: every chunk must carry both a header and a footer, and if an application frequently allocates and frees small blocks (say 1 or 2 bytes), this overhead becomes significant. Considering that the footer is only needed when merging free chunks, and is useless for an allocated chunk, we can optimize it away: store the previous chunk's allocated/free flag in another of the unused low-order bits of the current chunk's size field. Then, if the current chunk's size field tells us the previous chunk is free, we know the four bytes just before the current chunk are the previous free chunk's footer, and we can use that footer to find the previous chunk's start. If the flag says the previous chunk is allocated, we can conclude instead that it has no footer, and that the four bytes before the current chunk are simply the tail of the previous allocated chunk's payload or padding. The new chunk format looks like this:

Figure 5-5 Knuth boundary tag, allocated chunk format (optimized)

 

Figure 5-6 Knuth boundary tag, free chunk format (optimized)

 

2) Further evolution: multithreading support

As technology developed, especially with multithreading support in heap memory managers, the chunk format described above could no longer meet all needs. For example, we need a flag marking whether the current chunk belongs to a non-main-thread arena (a thread arena), and another marking whether the chunk was obtained through mmap rather than brk. But only one bit of the chunk size field is still unused. What to do? Time for another overhaul of the chunk format!

 

First, is it really necessary to store both the current chunk's and the previous chunk's allocated/free flags? The answer is no: we only need to store the previous chunk's allocation flag, because the current chunk's status can always be queried through the next chunk's header. That frees two bits of the size field, enough for the flags that multithreading requires:

Figure 5-7 Multi-threaded Knuth boundary tag, allocated chunk format

Figure 5-8 Multi-threaded Knuth boundary tag, free chunk format

 

The meanings of the P, M, and N bits are as follows:

PREV_INUSE (P): whether the previous chunk is allocated.

IS_MMAPPED (M): whether the current chunk was obtained via the mmap system call.

NON_MAIN_ARENA (N): whether the current chunk belongs to a thread arena (rather than the main arena). A small sketch of these three flag bits is shown below.
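In glibc's malloc these three flags occupy the lowest three bits of the size field. A sketch of reading them (the bit values 0x1/0x2/0x4 follow glibc's definitions; the helper functions are simplified variants of mine that take the raw size field rather than a chunk pointer):

/* Sketch: the three flag bits stored in a chunk's size field. */
#include <stddef.h>

#define PREV_INUSE     0x1  /* previous chunk is allocated     */
#define IS_MMAPPED     0x2  /* chunk was obtained through mmap */
#define NON_MAIN_ARENA 0x4  /* chunk belongs to a thread arena */

#define SIZE_BITS (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)

static size_t chunksize(size_t size_field)         { return size_field & ~(size_t)SIZE_BITS; }
static int prev_inuse(size_t size_field)           { return (size_field & PREV_INUSE) != 0; }
static int chunk_is_mmapped(size_t size_field)     { return (size_field & IS_MMAPPED) != 0; }
static int chunk_non_main_arena(size_t size_field) { return (size_field & NON_MAIN_ARENA) != 0; }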

 

Going further, it turns out that keeping a copy of the chunk's own size (the footer) is not that useful; but if the previous chunk is free, we still need its size when merging. What to do? Move the footer from the end of a chunk to the header of the next one, and have it store not the current chunk's size but the previous free chunk's size. Likewise, to improve memory utilization, when the previous chunk is an allocated chunk this footer space serves as part of the previous allocated chunk's payload or padding. The resulting format is as follows:

 

Figure 5-9 current glibc malloc allocated chunk format

 

Figure 5-10 current glibc malloc free chunk format

 

This concludes the introduction to the implicit linked list technique used by the glibc malloc heap memory manager. Looking back at the malloc_chunk struct, it is now easy to understand: the prev_size and size fields of each chunk form the implicit linked list, while the remaining pointers (fd, bk, and so on) have nothing to do with it. They are used by the explicit linked lists, the bins, which speed up allocation and deallocation (remember bins? The linked lists that record free chunks of the same type), and like prev_size these pointers are meaningful only in free chunks. The workings of the explicit bin lists are more involved, so we skip them for now and cover them in detail after all chunk types have been introduced.

 

5.2 Top Chunk

The chunk at the top of an arena (i.e., at the highest memory address) is called the top chunk. It does not belong to any bin; when none of the existing free chunks in the system (regardless of bin) can satisfy the size of a user request, the top chunk steps in as an emergency supply and is handed to the user. If the top chunk is larger than the requested size, it is split in two: 1) a chunk of the requested size, given to the user; 2) the remainder, which becomes the new top chunk. Otherwise the arena must grow: in the main arena the heap is extended with sbrk, while in a thread arena a new heap is allocated with mmap. A simplified sketch of the split is shown below.
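The split itself can be sketched like this (illustrative structure and names of my own; real glibc operates on malloc_chunk pointers and also maintains the flag bits):

/* Sketch: carving a user chunk off the top chunk (not glibc code). */
#include <stddef.h>

typedef struct {
    char  *base;   /* start address of the top chunk   */
    size_t size;   /* bytes remaining in the top chunk */
} top_chunk_t;

/* Return 'need' bytes from the top chunk, or NULL if it is too small;
   the remainder becomes the new (smaller) top chunk. */
static void *split_top_chunk(top_chunk_t *top, size_t need)
{
    if (top->size < need)
        return NULL;               /* caller must extend the heap (sbrk/mmap) */

    void *user = top->base;        /* lower part goes to the user             */
    top->base += need;             /* upper remainder is the new top chunk    */
    top->size -= need;
    return user;
}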

 

5.3 Last Remainder Chunk

To understand this chunk, you first need to understand the bin mechanism in glibc malloc. If you have already read Part II of this series, what follows will be easy to digest; otherwise I suggest reading Part II first. For the last remainder chunk there are two main questions: 1) how is it produced? 2) what is it for?

 

To the first question: remember the description of small-bin malloc in Part II? When a user requests a small chunk that can be satisfied by neither the small bins nor the unsorted bin, the allocator scans the bins via the binmap to find the most suitable chunk. If that chunk has space left over after the split, the leftover is turned into a new chunk and added to the unsorted bin, and it also becomes the new last remainder chunk.

 

To the second question: this chunk exists to make consecutive malloc calls for small chunks more efficient, mainly by improving the locality of allocation. How? By example: when a user requests a small chunk that the small bins cannot satisfy, the request is handled from the unsorted bin. If the unsorted bin contains exactly one chunk and that chunk is the last remainder chunk, the chunk is split in two: the first part is given to the user, and the rest goes back into the unsorted bin and becomes the new last remainder chunk. Consecutive malloc(small chunk) calls therefore return blocks that are adjacent in memory, which improves allocation locality.

 

Author: 走位 @ Alibaba Jusecurity (JAQ). For more technical articles, see the Alibaba Jusecurity blog.
