Linux C Development: The ptmalloc Memory Manager
Memory Layout
To understand the ptmalloc memory manager, you must first understand how the operating system lays out a process's memory. The following figure shows how the heap and the stack are distributed in memory.
X86 Linux process memory layout:
The figure shows the memory layout of a process on Linux, with each region listed from low addresses to high addresses.
Text segment: stores the program code. It is read-only and fixed at compile time.
Data segment: stores data whose values are known at compile time (initialized global and static variables). It is readable and writable.
BSS segment: global and static variables that are defined but not initialized.
Heap: the heap. It grows from low addresses to high addresses.
Mmap: the memory-mapping area.
Stack: the stack. It is allocated and released automatically by the compiler, and it grows from high addresses to low addresses.
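The regions above can be observed from a running program. The following is a minimal sketch, not a definitive tool: the names print_memory_layout, initialized_global and uninitialized_global are made up for illustration, and the "mmap" line assumes glibc's default behavior of serving a 1 MiB request with mmap.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;  /* lives in the data segment */
int uninitialized_global;     /* lives in the BSS segment */

/* Prints one sample address per region and returns the number of
 * regions printed (0 if an allocation failed). */
int print_memory_layout(FILE *out)
{
    int on_stack = 0;                  /* stack */
    void *on_heap = malloc(16);        /* small request: heap */
    void *on_mmap = malloc(1 << 20);   /* large request: assumed to use mmap */

    if (on_heap == NULL || on_mmap == NULL) {
        free(on_heap);
        free(on_mmap);
        return 0;
    }

    fprintf(out, "text  %p\n", (void *)(uintptr_t)&print_memory_layout);
    fprintf(out, "data  %p\n", (void *)&initialized_global);
    fprintf(out, "bss   %p\n", (void *)&uninitialized_global);
    fprintf(out, "heap  %p\n", on_heap);
    fprintf(out, "mmap  %p\n", on_mmap);
    fprintf(out, "stack %p\n", (void *)&on_stack);

    free(on_heap);
    free(on_mmap);
    return 6;
}
```

Calling print_memory_layout(stdout) on a typical Linux system should show the addresses increasing roughly in the order listed above, although address space randomization changes the exact values on every run.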
Ptmalloc Memory Manager
Ptmalloc is the default memory manager of glibc. The familiar malloc and free are the basic allocation functions it provides. Ptmalloc works somewhat like a memory pool you might write yourself: when memory is requested with malloc or released with free, ptmalloc manages it and applies a set of policies to decide whether freed memory is returned to the operating system. The main benefit is that requesting and releasing memory becomes more efficient for the user.
To make malloc efficient, ptmalloc requests a block of memory from the operating system in advance and then manages both the used and the free parts of it. When the user frees memory, ptmalloc takes the released block back under its own management and decides, depending on the situation, whether to return it to the operating system.
Design assumptions
Ptmalloc trades off several design goals, such as efficiency, space utilization, and availability. This leads to the following design assumptions:
1. Large allocations with a long lifetime use mmap.
2. Very large allocations always use mmap.
3. Allocations with a short lifetime use brk.
4. As far as possible, only small, temporarily used free blocks are cached. Large blocks, and large blocks with a long lifetime, are returned to the operating system directly when they are freed.
5. Small free blocks are coalesced only during calls to malloc and free. On free, a free block may be kept in the pool rather than returned to the operating system immediately.
6. The heap shrinks when the size of the block being freed, plus the sizes of the adjacent chunks it can be merged with, exceeds 64 KB, and the size of the heap top reaches the threshold. The free memory at the top of the heap is then returned to the operating system.
7. Programs that need to hold memory for a long time are not well suited to ptmalloc.
8. Under continuous allocation, ptmalloc splits and coalesces blocks, which produces some memory fragmentation.
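The split between brk and mmap in the assumptions above can be checked from user code. This is a rough sketch that assumes Linux with glibc defaults and a single-threaded caller; the helper names below_program_break and allocation_uses_brk are hypothetical. It relies on sbrk(0), which returns the current program break, that is, the upper end of the region grown with brk.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns 1 if p lies below the current program break (the region that
 * grows with brk), 0 otherwise. */
int below_program_break(const void *p)
{
    return (uintptr_t)p < (uintptr_t)sbrk(0);
}

/* Allocates size bytes, checks where the block landed, then frees it.
 * Returns 1 (below the break), 0 (above it, e.g. an mmap mapping),
 * or -1 if the allocation failed. */
int allocation_uses_brk(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        return -1;

    int result = below_program_break(p);
    free(p);
    return result;
}
```

Under these assumptions, allocation_uses_brk(100) is expected to report the brk heap, and allocation_uses_brk(1024 * 1024) is expected to report a mapping above the break, because 1 MiB exceeds the default 128 KB mmap threshold.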
Main and non-main arenas
To reduce lock contention between threads, the ptmalloc allocator divides its memory into a main arena (main_arena) and non-main arenas. An arena is what other parts of this article call an allocation area.
1. Each process has exactly one main arena and may have multiple non-main arenas.
2. The main arena can allocate with both brk and mmap, while a non-main arena can only use mmap to map memory blocks.
3. The number of non-main arenas only grows. Once created, an arena is never removed.
4. The main arena and the non-main arenas are linked into a circular linked list for management.
Chunk: the basic unit of memory
Ptmalloc uses the chunk data structure to organize every unit of memory. When we allocate a piece of memory with malloc, glibc records it and manages it as a chunk. You can think of a chunk as the bookkeeping structure you would define when writing your own memory pool.
Chunks come in two kinds: chunks in use and free chunks.
The two share essentially the same data structure, with a few design tricks that save memory.
Chunk in use:
1. The chunk pointer points to the start of the chunk; the mem pointer points to the start of the user memory block.
2. P = 0 means that the previous chunk is free and prev_size is valid.
3. P = 1 means that the previous chunk is in use and prev_size is invalid. P is used mainly when coalescing memory blocks.
4. For the first block it allocates, ptmalloc always sets P to 1, so that the program never references a nonexistent region.
5. M = 1 means the chunk was allocated from the mmap mapping region; M = 0 means it was allocated from the heap region.
6. A = 1 means the chunk was allocated from a non-main arena; A = 0 means it was allocated from the main arena.
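The P, M and A flags fit into the chunk's size field because chunk sizes are always a multiple of 8 bytes, which leaves the three lowest bits unused. The sketch below decodes such a size field. The macro names and bit values follow glibc's malloc.c; the helper functions are illustrative and are not part of any public API.

```c
#include <stddef.h>

/* Flag bits stored in the low bits of a chunk's size field. */
#define PREV_INUSE     0x1  /* P: previous chunk is in use */
#define IS_MMAPPED     0x2  /* M: chunk was obtained with mmap */
#define NON_MAIN_ARENA 0x4  /* A: chunk belongs to a non-main arena */
#define FLAG_MASK      ((size_t)(PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA))

/* Real chunk size with the three flag bits cleared. */
size_t chunk_size(size_t size_field)
{
    return size_field & ~FLAG_MASK;
}

int prev_in_use(size_t size_field)
{
    return (size_field & PREV_INUSE) != 0;
}

int is_mmapped(size_t size_field)
{
    return (size_field & IS_MMAPPED) != 0;
}

int in_non_main_arena(size_t size_field)
{
    return (size_field & NON_MAIN_ARENA) != 0;
}
```

For example, a size field of 0x91 describes a 0x90-byte heap chunk in the main arena whose previous chunk is in use.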
Free chunk
1. Free chunks are placed on the free-list bins. When the user requests memory with malloc, ptmalloc first checks whether the bins hold a suitable block.
2. fd and bk (the forward and backward pointers, sometimes written fp and bp) point to the next and previous chunks on the free list.
3. fd_nextsize and bk_nextsize point to the nearest free chunks of a different size in each direction. They are used mainly to find a chunk of suitable size quickly on the free list.
4. The values of fd, bk, fd_nextsize, and bk_nextsize are all stored in what used to be the user region, so no separate memory has to be set aside for each chunk's pointers.
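The memory-saving trick described above, where the list pointers of a free chunk reuse the user region, can be pictured with a simplified mirror of the chunk header. The field names follow glibc's struct malloc_chunk (fd, bk, fd_nextsize, bk_nextsize are the forward and backward pointers described above). The struct name chunk and the two helper functions are hypothetical, and real glibc code handles more details.

```c
#include <stddef.h>

/* Simplified mirror of a chunk header, for illustration only. */
struct chunk {
    size_t prev_size;           /* size of the previous chunk, valid only if it is free */
    size_t size;                /* size of this chunk; low 3 bits hold A, M, P */
    /* The fields below overlap the user data while the chunk is in use. */
    struct chunk *fd;           /* next chunk on the free list */
    struct chunk *bk;           /* previous chunk on the free list */
    struct chunk *fd_nextsize;  /* next chunk of a different size */
    struct chunk *bk_nextsize;  /* previous chunk of a different size */
};

/* The mem pointer handed to the user starts right after the size field. */
void *chunk_to_mem(struct chunk *c)
{
    return (char *)c + offsetof(struct chunk, fd);
}

/* Recover the chunk pointer from a mem pointer, as free must do. */
struct chunk *mem_to_chunk(void *mem)
{
    return (struct chunk *)((char *)mem - offsetof(struct chunk, fd));
}
```

Because the user data begins exactly where fd is stored, a chunk in use pays only for the two size fields, and the four pointers cost nothing extra once the chunk is free.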
Free-list bins
When the user releases memory with free, ptmalloc does not hand it back to the operating system immediately (which is why a process's memory often does not shrink after free has been called). Instead, the memory is managed by ptmalloc's own free-list bins, so that the next time the process calls malloc, ptmalloc can pick a suitable block from the bins and give it to the user. This avoids frequent system calls and reduces the cost of memory allocation.
Ptmalloc maintains 128 bins in total. Each bin is a doubly linked list of chunks of similar size.
With this array of bins, a call to malloc can quickly determine whether a maintained bin covers the requested size, and then walk that bin's doubly linked list to find a suitable chunk.
1. Fast bins. Fast bins act as a cache in front of the bins. There are about 10 of them, each a queue of fixed-size chunks. When a chunk no larger than max_fast (64 bytes by default) is freed, which usually means a small block, it is placed in a fast bin by default. On the next request, ptmalloc first looks in the fast bins for a suitable chunk and only then searches the free chunks in the bins. Ptmalloc also traverses the fast bins to check whether any chunks should be coalesced and moved into the bins.
2. Unsorted bin. This is a buffer in front of the bins. A freed chunk larger than max_fast, or a chunk produced by coalescing fast-bin chunks, goes into the unsorted bin. During malloc, ptmalloc first checks the unsorted bin for a suitable chunk. If there is none, it moves the chunks from the unsorted bin into the bins and then searches the bins for a suitable free chunk.
3. Small bins and large bins. These are the bins that actually hold the doubly linked lists of chunks. Neighboring small bins differ by 8 bytes in chunk size, so a free chunk of the right size can be located quickly through the bin array. The first 64 bins are small bins, each holding chunks of one fixed size. The last 64 bins are large bins, whose chunks are not of a fixed size.
4. Top chunk. Not every chunk is placed in a bin. The top chunk is the free memory at the top of the arena. When the bins cannot satisfy a request, memory is carved from the top chunk.
5. Mmapped chunk. When a request is very large (greater than the allocation threshold, 128 KB by default), it is served with an mmap mapping and the memory is tracked as an mmapped chunk. When an mmapped chunk is freed, it is returned directly to the operating system.
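The size ranges above can be summarized in a small classifier. This is only a sketch under the defaults quoted in this article (max_fast of 64 bytes, small bins below 512 bytes, an mmap threshold of 128 KB). The enum, macros and function names are made up for illustration, and the sketch ignores the unsorted bin, which chunks pass through on their way to the small and large bins.

```c
#include <stddef.h>

enum bin_kind { FAST_BIN, SMALL_BIN, LARGE_BIN, MMAPPED_CHUNK };

#define MAX_FAST        64u             /* default max_fast quoted above */
#define SMALL_BIN_LIMIT 512u            /* small bins hold sizes below this */
#define MMAP_THRESHOLD  (128u * 1024u)  /* default mmap allocation threshold */

/* Which structure a chunk of this size belongs to, by size alone. */
enum bin_kind classify_chunk(size_t chunk_size)
{
    if (chunk_size > MMAP_THRESHOLD)
        return MMAPPED_CHUNK;
    if (chunk_size <= MAX_FAST)
        return FAST_BIN;
    if (chunk_size < SMALL_BIN_LIMIT)
        return SMALL_BIN;
    return LARGE_BIN;
}

/* Small bins are spaced 8 bytes apart, so the index is the size divided by 8. */
size_t small_bin_index(size_t chunk_size)
{
    return chunk_size >> 3;
}
```

With these assumptions, a 72-byte chunk lands in small bin 9, while a 512-byte chunk is already handled by the large bins.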
The malloc allocation process
1. Acquire the arena lock to prevent conflicts between threads.
2. Calculate the actual chunk size needed for the requested memory.
3. Check the chunk size. If it is smaller than max_fast (64 B), look for a suitable chunk in the fast bins. If one is found, the allocation ends.
4. Check whether the chunk size is smaller than 512 B. If so, look for a chunk in the small bins. If one is found, the allocation ends.
5. Continue with the unsorted bin. If it holds only one chunk and that chunk is larger than the chunk to be allocated, the chunk is split: the requested part is removed from the unsorted bin and returned to the user, and the remainder is put back in the unsorted bin. Otherwise the chunks in the unsorted bin are sorted into the bins: a chunk whose size falls in the small-bin range is placed at the head of the matching small bin, and a chunk whose size falls in the large-bin range is inserted at the appropriate position in the matching large bin.
6. Search the large bins. Find the head of the matching list and traverse the list in reverse until you reach the first chunk whose size is greater than the chunk to be allocated, then split it. If there is a remainder, it is placed in the unsorted bin, and the allocation ends.
7. If neither the fast bins nor the bins contain a suitable chunk, the request is served from the top chunk (the top chunk is the remaining free space in the arena). Check whether the top chunk is large enough. If it is, split one chunk off the top chunk.
8. If the top chunk is too small, it must be extended. In the main arena, if the requested size is smaller than the allocation threshold (128 KB by default), brk() is used directly to obtain more memory; if it is greater than the threshold, mmap is used. In a non-main arena, mmap is used directly to obtain a block of memory. Memory obtained with mmap is tracked as an mmapped chunk, and an mmapped chunk is returned directly to the operating system when it is freed.
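Step 2 above, turning the requested size into a chunk size, can be sketched as follows. This is a simplified illustration modeled on glibc's request2size logic, assuming an alignment of twice the size of size_t and ignoring overflow for huge requests. The macro and function names are hypothetical.

```c
#include <stddef.h>

#define SIZE_SZ    (sizeof(size_t))   /* size of one header field */
#define ALIGNMENT  (2 * SIZE_SZ)      /* assumed chunk alignment */
#define ALIGN_MASK (ALIGNMENT - 1)
#define MIN_CHUNK  (4 * SIZE_SZ)      /* room for prev_size, size, fd, bk */

/* Chunk size needed to satisfy a user request of `request` bytes.
 * Only one header field is added, because a chunk in use may also
 * use the prev_size field of the chunk that follows it. */
size_t request_to_chunk_size(size_t request)
{
    size_t padded = (request + SIZE_SZ + ALIGN_MASK) & ~ALIGN_MASK;
    return padded < MIN_CHUNK ? MIN_CHUNK : padded;
}
```

On a 64-bit system under these assumptions, a request of 24 bytes fits in a 32-byte chunk, while a request of 25 bytes needs a 48-byte chunk. The resulting chunk size, not the requested size, is what steps 3 and 4 compare against max_fast and 512 B.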
The free release process
1. Acquire the arena lock to ensure thread safety.
2. If the pointer passed to free is a null pointer, return and do nothing.
3. Check whether the chunk is memory mapped from the mmap region. If so, release it directly with munmap(). As shown earlier in the structure of a chunk in use, the M flag identifies whether the memory was mapped with mmap.
4. Check whether the chunk is adjacent to the top chunk. If it is, merge it directly into the top chunk (being adjacent to the top chunk amounts to being adjacent to the free memory of the arena). Free ends.
5. If the chunk size is smaller than max_fast (64 B), the chunk is placed directly in a fast bin. The fast bin does not change the chunk's state. Free ends.
6. If the chunk that follows the current chunk is free, the two are coalesced and the result is placed in the unsorted bin.
7. If the coalesced chunk is larger than 64 KB, consolidation of the fast bins is triggered: the chunks in the fast bins are traversed and merged with their adjacent free chunks, the merged chunks are placed in the unsorted bin, and the fast bins become empty. During this process, ptmalloc checks whether the top chunk is larger than the mmap shrinkage threshold (128 KB by default). If it is, then for the main arena it tries to return part of the top chunk to the operating system. Free ends.
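The order of checks in the free process above can be condensed into a small decision function. This is a sketch of the steps as this article describes them, not glibc's actual code: the struct freed_chunk, the enum free_action and the function decide_free are all hypothetical, and coalescing and fast-bin consolidation are left out.

```c
#include <stddef.h>

enum free_action {
    DO_NOTHING,       /* null pointer */
    UNMAP,            /* mmapped chunk: munmap() it */
    MERGE_WITH_TOP,   /* chunk borders the top chunk */
    TO_FAST_BIN,      /* small chunk: cache it in a fast bin */
    TO_UNSORTED_BIN   /* everything else, after coalescing */
};

/* Hypothetical summary of the state free() inspects. */
struct freed_chunk {
    size_t size;       /* chunk size in bytes */
    int is_mmapped;    /* the M flag */
    int borders_top;   /* 1 if the chunk is adjacent to the top chunk */
};

#define MAX_FAST 64u   /* default max_fast quoted in this article */

/* Mirrors steps 2 to 6 of the free process described above. */
enum free_action decide_free(const struct freed_chunk *c)
{
    if (c == NULL)
        return DO_NOTHING;
    if (c->is_mmapped)
        return UNMAP;
    if (c->borders_top)
        return MERGE_WITH_TOP;
    if (c->size <= MAX_FAST)
        return TO_FAST_BIN;
    return TO_UNSORTED_BIN;
}
```

For instance, a 48-byte heap chunk that does not border the top chunk is cached in a fast bin, which is why freeing it neither coalesces it nor shrinks the heap.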
Tuning with mallopt parameters
1. M_MXFAST: sets the maximum chunk size kept in the fast bins. The default value is 64 B and the maximum is 80 B.
2. M_TRIM_THRESHOLD: sets the mmap shrinkage threshold, that is, the top chunk size above which free tries to return memory to the operating system. The default value is 128 KB.
3. M_MMAP_THRESHOLD: sets the mmap allocation threshold. The default value is 128 KB. When a request exceeds this threshold, ptmalloc's malloc() is effectively a thin wrapper around mmap(), and free() is effectively a thin wrapper around munmap().
4. M_MMAP_MAX: sets the maximum number of memory blocks in the process that may be allocated with mmap. The default value is 65536.
5. M_TOP_PAD: determines how much free memory is kept at the top of the heap when the libc memory manager calls brk to release memory. The default value is given here as 0; the mallopt man page for current glibc lists 128 KB.
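These parameters are set with glibc's mallopt function, declared in malloc.h, which returns 1 on success and 0 on error. The sketch below applies a few settings; the function name tune_allocator is hypothetical and the values are arbitrary examples, not recommendations.

```c
#include <malloc.h>

/* Applies example settings. Returns 1 if every mallopt call succeeded,
 * 0 if at least one failed. */
int tune_allocator(void)
{
    int ok = 1;

    /* Keep chunks of up to 64 bytes in the fast bins. */
    ok &= (mallopt(M_MXFAST, 64) != 0);
    /* Trim the heap top only once it exceeds 256 KB (example value). */
    ok &= (mallopt(M_TRIM_THRESHOLD, 256 * 1024) != 0);
    /* Serve requests of 256 KB and above with mmap (example value). */
    ok &= (mallopt(M_MMAP_THRESHOLD, 256 * 1024) != 0);
    /* Keep 64 KB of slack at the heap top (example value). */
    ok &= (mallopt(M_TOP_PAD, 64 * 1024) != 0);

    return ok;
}
```

Such a function would typically be called once, early in main, before the program starts allocating in earnest.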
Precautions
To avoid sudden growth in glibc's memory usage, note the following:
1. Free the most recently allocated memory first. Ptmalloc shrinks the heap from the top chunk, so if the chunk adjacent to the top chunk is not freed, the free chunks below it cannot be returned either.
2. Ptmalloc is not suited to managing long-lived memory. In particular, allocating and freeing long-lived memory at irregular intervals can cause ptmalloc's memory usage to grow sharply.
3. Multi-threaded programs that run in phases are not a good fit for ptmalloc. Their memory is better managed with a memory pool.
4. Keep the number of threads in the program as low as possible and avoid frequent allocation and release. Frequent allocation causes lock contention, which in turn increases the number of non-main arenas, increases fragmentation, and reduces performance.
5. Prevent memory leaks. Ptmalloc is very sensitive to them: because of its shrinking mechanism, a single unreleased chunk adjacent to the top chunk can prevent a large amount of free memory from being returned to the operating system.
6. Prevent the program from allocating too much memory, or from exhausting system memory through a sharp increase in glibc's memory usage, which would cause the system to kill it for OOM. Estimate the maximum amount of physical memory the program may use, configure the system's /proc/sys/vm/overcommit_memory and /proc/sys/vm/overcommit_ratio, and use ulimit -v to limit the program's virtual memory so that it is not killed for OOM.
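The ulimit -v limit mentioned above can also be applied from inside the program with setrlimit and RLIMIT_AS, the resource that limits a process's virtual address space. The following is a minimal sketch; the function name limit_address_space is hypothetical, and the requested value is clamped to the hard limit so that an unprivileged process does not fail with a permission error.

```c
#include <sys/resource.h>

/* Sets the soft limit on the virtual address space to max_bytes,
 * clamped to the current hard limit. Returns 0 on success and -1 on
 * failure, like setrlimit itself. */
int limit_address_space(rlim_t max_bytes)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_AS, &lim) != 0)
        return -1;

    /* The soft limit may not exceed the hard limit. */
    if (lim.rlim_max != RLIM_INFINITY && max_bytes > lim.rlim_max)
        max_bytes = lim.rlim_max;

    lim.rlim_cur = max_bytes;
    return setrlimit(RLIMIT_AS, &lim);
}
```

For example, calling limit_address_space((rlim_t)2 << 30) early in main caps the address space at 2 GiB. Once the cap is reached, malloc returns NULL, which gives the program a chance to handle the failure itself instead of being killed by the system.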