Glibc malloc: more improvements

Source: Internet
Author: User

As mentioned above, glibc implements malloc, which handles heap management on Linux. Linux has no proprietary heap API of its own; almost all allocation goes through the C library, which makes glibc particularly important. Its implementation resembles the Windows one in that it maintains global linked lists whose elements are fixed-size or variable-size memory blocks. Unlike Windows, however, glibc maintains not one variable-length free list but several, each responsible for a size range. This greatly reduces traversal overhead when allocating: much like hashing, it maps a large range of sizes onto a limited number of small ranges, so although a search is still needed within each small range, most of the work is saved. With a single variable-length list, every allocation would have to traverse everything; with three lists, roughly two thirds of that overhead disappears. In short, the policy is very similar to hashing. Another glibc policy is to maintain not just one kind of free list, but also a buffer list and a high-speed cache list. On allocation, the high-speed cache is searched first; if the block found there is larger than needed, the remainder left after cutting is inserted into the buffer list, and if the ordinary free lists fail, the buffer list is searched. What determines which list a freed block lands on? That is where glibc applies its own strategy.

That basis is embodied in free. If the released block can be merged at the top of the heap, that is, coalesced with the free space at the heap top, it is merged there, because the heap is shrunk only when the free space at its top reaches a certain amount; to return memory to the operating system as soon as possible, releasing toward the heap top takes priority. If it cannot be merged there, for example because it is not adjacent to the heap top, and the released block is smaller than 80 bytes, it is attached directly to the cache; to prevent neighboring blocks from merging with it, its in-use bit is left set. In effect, glibc maintains a high-speed memory pool for blocks smaller than 80 bytes: a small allocation simply removes one from the pool, with no need to modify the in-use bit, since cached chunks keep that bit at 1. This cache is recycled only when a large block is needed and several allocation policies have already failed; recycling the cached chunks back into the ordinary free lists involves merging adjacent blocks, and after merging they may satisfy a slightly larger request. Why is the limit set to 80 bytes? It is an empirical value. When a block between 80 bytes and 128 KB is released, its in-use bit is cleared, a merge with adjacent blocks is attempted, and the result is mounted on the buffer list.

The following section is taken from an article that describes in detail glibc's memory allocation and release process, for allocations smaller than 128 KB:

1. The heap is grown or shrunk by means of brk. If a suitable chunk cannot be found in the existing heap, the heap is extended; if the free block at the top of the heap exceeds a certain threshold, the heap is shrunk. Only space at the heap top can ever be released back this way.

2. Heap allocation information is recorded in two ways. First, each chunk has a header: the first word records the size of the previous chunk, the second word records the size of the current chunk plus some flag bits, and the memory handed to the user starts at the third word. The chunk can therefore be found from the user address and vice versa, and the next and previous adjacent chunks can be located as well; a heap is simply a sequence of N such chunks. Second, there are three kinds of queue records. Only free chunks appear on a queue; chunks in use never do. When a block becomes free it is mounted on one of the queues, and its own memory is reused for the links: the 3rd and 4th words of a free chunk serve as its forward and backward pointers (variable-length bins additionally use the 5th and 6th words), so the bookkeeping costs no extra space. The first kind of queue is the bins, of which there are 128. The first 64 are fixed-length bins, one per 8-byte size step, so every chunk shorter than about 512 bytes goes into a bin holding exactly one length. The last 64 are variable-length bins: each holds all chunks within a size range, and the chunks in each bin are sorted from small to large. The second kind is the unsorted queue (there is only one), which acts as a buffer: every freed chunk headed for the bins must pass through it first. The third kind is the fastbins, roughly 10 fixed-length queues forming a high-speed cache; every freed chunk shorter than 80 bytes enters one of them. A chunk entering a fastbin does not clear its in-use bit when freed, so that adjacent blocks cannot merge with it.

3. malloc steps

--> Search the fastbins first. If a chunk is found, remove it from the queue (its in-use bit is already 1, so nothing needs to be set) and return immediately.

--> Determine whether the requested size falls in the small bins (the first 64 bins). If so and a chunk is available there, its memory address is returned directly; if the size falls in the large bins (the last 64), malloc_consolidate is triggered first. (Serving a large request generally means cutting a chunk, so merging is given priority to avoid excessive fragmentation.)

--> Then chunks are taken from the unsorted queue one at a time. If a chunk of exactly the desired size is found, it is returned immediately; otherwise the chunk is inserted into the corresponding bin. This step repeats until the unsorted queue is empty or 10,000 iterations have been performed.

--> Next, the smallest sufficient chunk in the bins is found and removed from its queue. If the remainder after cutting is large enough to form a chunk of its own, the chunk is split in two: the remainder is inserted into the unsorted queue and the memory address of the cut portion is returned.

--> Then the top chunk is tried (the chunk at the top of the heap; it is never placed on any queue). If the desired chunk can be cut from it, the remainder becomes the new top chunk and the memory address is returned.

--> If the fastbins are not empty, malloc_consolidate is triggered to clear them all (each fastbin chunk's in-use bit is cleared, adjacent blocks are merged, and the results are mounted on the unsorted queue), and then the search restarts from step 1.

--> If nothing has been found, sysmalloc is called, which in effect grows the heap, and the memory address is returned.

4. Free steps

--> If the freed chunk is adjacent to the top chunk, it is merged directly into the top chunk and is not placed on any free queue.

--> If the released size is smaller than 80 bytes, the chunk is mounted on the fastbins with its in-use bit still 1; naturally, it is not merged with adjacent blocks.

--> If the released chunk is between 80 bytes and 128 KB, its in-use bit is cleared, a merge with adjacent blocks is attempted, and the result is mounted on the unsorted queue. If the size after merging exceeds 64 KB, malloc_consolidate is also triggered (a large merge suggests there may be many small pieces nearby) before an attempt is made to shrink the heap. (The heap is shrunk only when the current free block, plus whatever can be merged before and after it, exceeds 64 KB and the free space at the heap top has reached its threshold.)

The explanation above is quite clear. The key to glibc allocation is a set of strategies: multiple hashed lists, a high-speed cache policy, and a general cache policy. The reasoning is that small memory blocks are requested far more often than large ones, so it pays to maintain a high-speed pool for them; small blocks are also freed frequently, since they typically hold temporary data, so reserving a pool for them is not unfair to other requests. The advantage of a cache is that its capacity is small, traversal is fast, and the data in it is almost always hot. "Hot" here refers not to any particular memory block but to a block size: a popular size means blocks of that size are requested frequently. Caching only frequently used sizes keeps the volume small; if a cache grows too large it loses its search advantage, and if its data is not hot, lookups fail repeatedly and fall back to the normal allocation path. Designing a cache system therefore takes real consideration; it is not as simple as the theory suggests.

The other policy is the special handling of the heap top. The top chunk is not placed on any list; it receives this care because memory can only be returned to the operating system from there. Heap shrinking can start only at the top: the operating system sees a process heap merely as one large contiguous block of virtual memory and knows nothing more about it, so when an application returns memory it must do so from the top of the heap rather than by handing back holes in the middle. In the final analysis, the heap top has to be treated specially.
