Http://www.cnblogs.com/bangerlee/archive/2011/08/31/2161421.html
Introduction
Memory management in C/C++ is a headache for almost every programmer. Allocating enough memory, tracking each allocation, and releasing memory once it is no longer needed is a complicated task. Calling malloc/free and new/delete directly to allocate and release memory has the following drawbacks:
- On malloc/new, the system must search its table of free blocks for a suitable block using "first fit", "best fit", or another algorithm; on free/delete, it may need to merge adjacent free blocks. Both add overhead.
- Frequent use produces a large number of memory fragments, which reduces the program's running efficiency.
- Memory leaks are easy to introduce.
A memory pool is a common alternative to calling malloc/free and new/delete directly. When we request memory, we first look for a suitable block in our own memory pool instead of asking the operating system. The advantages are:
- Faster allocation and release than malloc/free
- Few or no heap fragments
- Helps avoid memory leaks
Memory Pool Design
With so many benefits, should we abandon malloc/free immediately and rush to the memory pool? Not so fast. Before implementing a memory pool ourselves, we need to settle the following questions:
- How does the pool obtain its memory? Is one large region allocated when the program starts, or is memory allocated on demand while the program runs?
- Is there a limit on the size of a single request? If so, what is the largest block that can be requested?
- How should the memory block structure be designed so that allocation, tracking, and release are convenient?
- The more memory the pool holds, the less is left for other programs. Should the pool's total size be capped, and if so, at what value?
With these questions in mind, let's look at the following memory pool design.
Memory Pool Implementation: Solution 1
First, the overall architecture of the solution is provided as follows:
Figure 1. Memory Pool Architecture
The design consists of three structures: block, list, and pool. A block holds a pointer to the actual memory space, plus forward and backward pointers that link blocks into a doubly linked list. In a list, the free pointer points to a linked list of idle memory blocks, the used pointer points to a linked list of blocks currently in use by the program, and size is the size of the memory blocks it manages; the lists themselves form a singly linked list. The pool records the head and tail of that list of lists.
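As a rough illustration of the three structures described above, the following C++ sketch declares them and a helper that searches the size classes. All names (Block, List, Pool, free_head, used_head, find_list) are assumptions made for this sketch; the article's actual definitions are not shown in the text.

```cpp
#include <cstddef>

// Hypothetical declarations that mirror the description in the text.
struct Block {
    void*  memory;  // start of the actual memory space
    Block* prev;    // backward pointer
    Block* next;    // forward pointer (blocks form a doubly linked list)
};

struct List {
    std::size_t size;       // size of every block managed by this list
    Block*      free_head;  // linked list of idle blocks
    Block*      used_head;  // linked list of blocks in use by the program
    List*       next;       // lists form a singly linked list
};

struct Pool {
    List* head;  // first size class
    List* tail;  // last size class
};

// Walk the singly linked list of size classes looking for an exact match.
inline List* find_list(const Pool& pool, std::size_t size) {
    for (List* l = pool.head; l != nullptr; l = l->next) {
        if (l->size == size) return l;
    }
    return nullptr;  // no list manages blocks of this size yet
}
```

A lookup such as find_list(pool, 1024) returns the list that manages 1024-byte blocks, or a null pointer if no such list has been created.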
Memory Tracking Policy
In this solution, every allocation requests 12 extra bytes, so the size actually requested is the required size + 12. The extra 12 bytes hold the corresponding list pointer (4 bytes), the used pointer (4 bytes), and a check code (4 bytes). With this header we can easily find the list and block that a piece of memory belongs to, and the check code serves as a rough test for errors. The layout is shown in the following figure:
Figure 2. Memory block allocation
The arrows in the figure indicate the starting position of the memory block.
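The tracking header can be sketched in C++ as follows. This is a minimal illustration, not the article's code: the names Header, kCheck, alloc_tracked, header_of, and free_tracked are invented here, and the check value is arbitrary. The 12-byte figure in the text assumes 4-byte pointers; in this sketch the header is padded with alignas so the user memory stays suitably aligned, which makes it larger than 12 bytes on typical 64-bit builds.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

struct List;   // owning list (opaque in this sketch)
struct Block;  // node in the used list (opaque in this sketch)

// Header stored immediately in front of the memory handed to the caller.
struct alignas(std::max_align_t) Header {
    List*         list;   // list that owns this block
    Block*        block;  // the block's node in the used list
    std::uint32_t check;  // verification code
};

constexpr std::uint32_t kCheck = 0x5A5A5A5Au;  // arbitrary value for this sketch

// Request size + sizeof(Header) bytes and return the address just past the header.
inline void* alloc_tracked(std::size_t size, List* list, Block* block) {
    void* raw = std::malloc(sizeof(Header) + size);
    if (raw == nullptr) return nullptr;
    Header* h = static_cast<Header*>(raw);
    h->list = list;
    h->block = block;
    h->check = kCheck;
    return h + 1;
}

// Step back from the user pointer to its header; nullptr if the check code is wrong.
inline Header* header_of(void* user) {
    Header* h = static_cast<Header*>(user) - 1;
    return h->check == kCheck ? h : nullptr;
}

// Release memory obtained from alloc_tracked.
inline void free_tracked(void* user) {
    if (Header* h = header_of(user)) {
        h->check = 0;  // invalidate so a stale pointer is more likely to be caught
        std::free(h);
    }
}
```

Given only the pointer the caller holds, header_of recovers the owning list and block in constant time, which is what makes the release step cheap.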
Memory Allocation and Release Policies
Allocation: traverse the lists to check whether one with a size matching the request exists.
- A matching size exists: check whether free is NULL.
  - free is NULL: allocate memory with malloc/new and place it at the end of the linked list pointed to by used.
  - free is not NULL: remove the head node of the free list and place it at the end of the linked list pointed to by used.
- No matching size exists: create a new list, allocate memory with malloc/new, and place it at the end of that list's used linked list.
Finally, return the pointer to the memory space.
Release: obtain the list pointer and the used pointer via the memory tracking policy, remove the block from the linked list pointed to by used, and place it in the linked list pointed to by free.
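The allocation and release flow above can be sketched in C++ as follows. This is a simplified illustration under stated assumptions, not the article's implementation: the block node and the tracking header are merged into one hypothetical Node struct, nodes and new lists are pushed at the front rather than the end of their lists, and the names SizeList, pool_alloc, and pool_free are invented here. It is not thread-safe and does not detect double release.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

struct SizeList;

// Simplification: block node and tracking header in one struct,
// placed directly in front of the user memory.
struct alignas(std::max_align_t) Node {
    SizeList*     owner;
    Node*         prev;
    Node*         next;
    std::uint32_t check;
};

struct SizeList {
    std::size_t size;       // block size managed by this list
    Node*       free_head;  // idle blocks
    Node*       used_head;  // blocks in use
    SizeList*   next;       // next size class
};

struct Pool { SizeList* head = nullptr; };

constexpr std::uint32_t kCheck = 0xC0FFEE01u;  // arbitrary value for this sketch

inline void push_front(Node*& head, Node* n) {
    n->prev = nullptr;
    n->next = head;
    if (head) head->prev = n;
    head = n;
}

inline void detach(Node*& head, Node* n) {
    if (n->prev) n->prev->next = n->next; else head = n->next;
    if (n->next) n->next->prev = n->prev;
}

inline void* pool_alloc(Pool& pool, std::size_t size) {
    SizeList* l = pool.head;
    while (l && l->size != size) l = l->next;  // look for a matching size
    if (!l) {                                  // none: create a new list
        l = new SizeList{size, nullptr, nullptr, pool.head};
        pool.head = l;
    }
    Node* n = l->free_head;
    if (n) {
        detach(l->free_head, n);               // reuse an idle block
    } else {                                   // free is NULL: go to malloc
        n = static_cast<Node*>(std::malloc(sizeof(Node) + size));
        if (!n) return nullptr;
        n->owner = l;
        n->check = kCheck;
    }
    push_front(l->used_head, n);
    return n + 1;                              // user memory follows the header
}

inline bool pool_free(void* p) {
    Node* n = static_cast<Node*>(p) - 1;       // step back to the header
    if (n->check != kCheck) return false;      // rough error check
    detach(n->owner->used_head, n);
    push_front(n->owner->free_head, n);        // keep the block for reuse
    return true;
}
```

In this sketch, releasing a 1024-byte block and then requesting 1024 bytes again returns the same address, because the block is taken from the free list instead of from malloc.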
Analysis of solution 1
Measured against the questions raised in the "Memory Pool Design" section, solution 1 has the following characteristics:
- When the program starts, the memory pool contains no memory blocks. It takes over management of a block only once the program actually allocates and releases it.
- The pool does not limit the size of a request. It creates one linked list for each distinct size value.
- The solution provides no way to limit the total size of the memory pool.
Taken together, this solution suits programs that request a fairly fixed set of block sizes (for example, only 1024-byte or 2048-byte blocks) and whose allocation and release frequencies are roughly equal (many allocations with few releases would occupy too much memory and could eventually crash the system).
This article covered the basics of memory management and used a simple memory pool implementation as a stepping stone toward understanding memory pools. The next article is more advanced and describes how the memory pool in the Apache server is implemented.
Http://hi.baidu.com/haven2002/item/bb523eca223b3c09ac092f52
I. Linux memory management and causes of memory fragmentation
At the lowest level, Linux manages memory pages with the buddy algorithm. The system divides all free memory pages into 10 groups, whose blocks are 1, 2, 4, ..., 512 pages in size; the blocks within a group are all the same size and are kept in a linked list. Two blocks of the same size at contiguous addresses are called buddies. The core idea of the buddy algorithm is to merge free buddies into a larger block.
The OS obtains free pages with get_free_page. If no free block of the right size is found, a free block is taken from a larger group, the requested part is allocated, and the remainder is split and inserted into the appropriate groups. When memory is returned, the buddy algorithm merges free memory. If a program keeps requesting memory and returns only part of it, and the returned blocks cannot pair up as buddies, then after long-running operation all memory ends up split into small non-adjacent blocks. A later request for a large block may then fail because no sufficiently large contiguous block can be found. These scattered, non-adjacent small blocks are called memory fragmentation. Of course, this is only the theory; the buddy algorithm itself was designed to counter memory fragmentation.
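In the textbook form of the buddy scheme, a block of 2^order pages starts at a page index that is a multiple of 2^order, and its buddy is found by flipping one bit of that index. The C++ sketch below shows this arithmetic; the function names are assumptions for illustration and this is not code from the Linux kernel.

```cpp
#include <cstddef>

// Page index of the buddy of the block that starts at `page`
// and spans 2^order pages (page must be a multiple of 2^order).
constexpr std::size_t buddy_of(std::size_t page, unsigned order) {
    return page ^ (std::size_t{1} << order);
}

// First page of the block of 2^(order+1) pages formed by merging
// the block at `page` with its buddy.
constexpr std::size_t merged_start(std::size_t page, unsigned order) {
    return page & ~(std::size_t{1} << order);
}
```

For example, with 4-page blocks (order 2), the block at page 8 and the block at page 12 are buddies, and merging them yields an 8-page block starting at page 8. The blocks at pages 4 and 8 are adjacent but are not buddies, which is why adjacent free blocks cannot always be merged.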
II. Memory management in the malloc subsystem (dlmalloc)
Application-layer code does not call functions such as sbrk/mmap directly; it calls malloc/free and the other functions provided by the malloc subsystem. Most Linux installations use Doug Lea's dlmalloc or its derivative ptmalloc. The following uses dlmalloc as an example to describe how malloc works.
1. Glossary of dlmalloc:
Boundary tag: each free memory block has a tag at its head and a tag at its tail; the tail tag makes merging free blocks faster. This space is overhead that the application layer cannot use.
Smallbins: bins for small memory. dlmalloc divides small memory into bins of 8, 16, 24, ..., 512 bytes, in steps of 8 bytes. All blocks in a bin have the same size and are linked in a doubly linked list.
Treebins: tree-structured bins. Memory larger than 512 bytes is no longer binned in 8-byte steps but by size range, for example 512 ~ 640, 640 ~ 896, and so on. Each such bin is organized as a tree rather than a doubly linked list.
DV chunk: when no block of a suitable size is found in the corresponding bin, a block is taken from a larger bin and the required memory is split off. The remainder is called the DV chunk.
Top chunk: when no suitable memory is found in the memory that dlmalloc manages, sbrk is called to request memory from the system. The chunk at the end of the heap, which can grow in this way, is called the top chunk.
2. Memory Allocation Algorithm
Look for a block in the bin of the matching size --> look for a block in a neighboring bin --> allocate from the DV chunk --> allocate from other feasible bins --> allocate from the top chunk --> call sbrk/mmap to request memory from the system
3. Memory Release Algorithm
Merge with adjacent free memory --> if the chunk belongs to the top chunk and the top chunk is larger than 128 KB, return it to the system --> if it does not belong to the top chunk, put it back into the corresponding bin
dlmalloc also has other mechanisms, such as a cache for small memory. With dlmalloc in place, frequent calls to malloc/free do not in themselves produce memory fragmentation: as long as later requests are for the same sizes, previously freed blocks of a suitable size are reused. The exception is a pattern of many malloc calls and few free calls in which each new malloc is larger than the blocks freed earlier. dlmalloc then keeps requesting memory from the system, and the freed memory is cut off by memory still in use, so the top chunk stays below 128 KB and cannot be returned to the system. Even then, total memory usage is less than twice the memory actually in use (used and idle memory alternate, and the idle memory is always smaller than the used memory). Therefore, in the absence of memory leaks, regular and frequent calls to malloc/free do not produce memory fragmentation.
III. Application-layer memory pools
Even if there is no memory fragmentation problem, the application layer still needs a memory pool for the following reasons:
1. Stability: memory usage is fixed and controllable.
2. Performance: fewer potential interactions with kernel mode.
3. Performance: fewer mutex operations. When every thread calls malloc directly, threads are likely to contend for the lock and drop into kernel mode.
Of these, the stability argument is little more than self-reassurance: if the OS itself cannot be trusted, there is no stability to speak of. The real motivation is to control memory at the application layer in order to improve application-layer performance. So how should a memory pool be built to get the most out of it? Let's start with some well-known memory pools.
IV. Common memory pools
Variable-length memory pool:
1. APR pool: oriented toward request processing. The whole processing flow is divided into stages, and different pools are used in different stages. Memory handed back to a pool cannot be reused individually, but the pool itself can be reused, so some memory is wasted.
2. obstack: the variable-length memory pool from the GNU toolchain, used inside GCC.
Fixed-length memory pool:
1. SGI STL: pools small memory only. Block sizes are 8, 16, ..., 128 bytes, for a total of 16 pools; the blocks in each pool are the same size and are connected in a linked list. Small memory is never returned to the malloc subsystem, and requests larger than 128 bytes call malloc directly. SGI STL is the STL implementation shipped with GCC. The STL implementations shipped with VC and BC also have allocator objects, but they contain no real pool and simply call malloc.
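The size-class scheme described for SGI STL (16 free lists in 8-byte steps, nothing returned to malloc) can be sketched in C++ as follows. This is a minimal illustration with invented names (round_up, list_index, small_alloc, small_free); it is not thread-safe, it requires the caller to pass the size on release, and it omits the batch refill that a real pooled allocator would perform when a list is empty.

```cpp
#include <cstddef>
#include <cstdlib>

constexpr std::size_t kAlign = 8;
constexpr std::size_t kMaxBytes = 128;
constexpr std::size_t kLists = kMaxBytes / kAlign;  // 16 size classes

// Round a request up to the next multiple of 8.
constexpr std::size_t round_up(std::size_t n) {
    return (n + kAlign - 1) & ~(kAlign - 1);
}

// Index of the free list serving a request of n bytes (1 <= n <= 128).
constexpr std::size_t list_index(std::size_t n) {
    return (n + kAlign - 1) / kAlign - 1;
}

struct FreeNode { FreeNode* next; };

inline FreeNode*& free_list(std::size_t i) {
    static FreeNode* lists[kLists] = {};
    return lists[i];
}

inline void* small_alloc(std::size_t n) {
    if (n == 0 || n > kMaxBytes) return std::malloc(n);  // large: plain malloc
    FreeNode*& head = free_list(list_index(n));
    if (head == nullptr) return std::malloc(round_up(n));  // list empty
    FreeNode* node = head;                                 // pop from the list
    head = node->next;
    return node;
}

inline void small_free(void* p, std::size_t n) {
    if (p == nullptr) return;
    if (n == 0 || n > kMaxBytes) { std::free(p); return; }
    FreeNode* node = static_cast<FreeNode*>(p);  // small blocks stay pooled
    node->next = free_list(list_index(n));
    free_list(list_index(n)) = node;
}
```

Requests of 17 to 24 bytes all map to the same list, so a block released with size 20 can satisfy a later request for 24 bytes.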
2. boost/Loki
The two pools use similar underlying algorithms. Taking Loki as an example: the first time memory of a given fixed length is requested, Loki allocates 255 blocks at once, and later requests are served directly from the pool. The following example shows how memory is allocated from and released to the pool:
(1) After the first allocation, each free block stores the number of the next free block at its start. The variable nextblock holds the number of the block that will be handed out next; initially nextblock = 0.
(2) After three blocks have been allocated, nextblock = 3.
(3) When the second block is returned, locate its chunk from the memory address, compute the block number from the chunk's base address and the block length, store the current nextblock in the returned block, and set nextblock = 1.
(4) When the third block is returned, store the previous nextblock in the third block and set nextblock = 2.
(5) On the next allocation, hand out the block indicated by nextblock; nextblock then takes the value stored in that block, which is 1.
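Steps (1) to (5) can be reproduced with a small C++ sketch of a fixed-size chunk whose free blocks store the number of the next free block in their first byte. The struct below is written for this article in the spirit of that scheme; the names Chunk, next_block, and free_count are assumptions, and it is not Loki's or boost's actual source.

```cpp
#include <cstddef>
#include <cstdlib>

struct Chunk {
    unsigned char* data;        // base address of the chunk
    std::size_t    block_size;  // length of each block in bytes
    unsigned char  next_block;  // number of the block handed out next
    unsigned char  free_count;  // number of idle blocks

    // Allocate `blocks` blocks at once and chain them: block i points at i + 1.
    bool init(std::size_t size, unsigned char blocks = 255) {
        data = static_cast<unsigned char*>(std::malloc(size * blocks));
        if (data == nullptr) return false;
        block_size = size;
        next_block = 0;
        free_count = blocks;
        for (unsigned char i = 0; i < blocks; ++i) {
            data[i * size] = static_cast<unsigned char>(i + 1);
        }
        return true;
    }

    void* allocate() {
        if (free_count == 0) return nullptr;
        unsigned char* p = data + next_block * block_size;
        next_block = *p;  // follow the number stored in the block
        --free_count;
        return p;
    }

    void deallocate(void* ptr) {
        unsigned char* p = static_cast<unsigned char*>(ptr);
        // Block number from the chunk base address and the block length.
        std::size_t number = static_cast<std::size_t>(p - data) / block_size;
        *p = next_block;  // remember the previous head of the free chain
        next_block = static_cast<unsigned char>(number);
        ++free_count;
    }

    void destroy() { std::free(data); data = nullptr; }
};
```

Allocating three blocks leaves next_block at 3; returning the second block sets it to 1, returning the third sets it to 2, and the next allocation hands out the third block again and moves next_block back to 1. No per-block header is needed, because the chain lives inside the idle blocks.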
Loki and boost differ slightly in how they handle memory, including how memory is organized into levels. I personally consider these differences to be disadvantages of Loki relative to boost.
Loki/boost represent the state of the art in memory pools. The pool carries no redundant headers (the bookkeeping is stored inside the free blocks themselves), which saves memory. Allocation and release are also fast: each takes only a small, constant number of steps.
The algorithms above only lay the foundation for using a memory pool; they do not say how the pool should be used.
V. Categories of memory pool usage
Loki provides usage policies for its memory pool, which fall into the following three types:
1. Global memory pool: all requests of the same length share one pool, and requests of different lengths use different pools. Allocating from or releasing to a pool requires locking the pool.
2. Object memory pool: each object type has its own pool. Allocation and release require locking.
3. Thread memory pool: requests of the same length made within the same thread share one pool. Allocation and release require no locking.
The third section above gave the reasons why the application layer uses a memory pool. Clearly, the global memory pool does not solve the performance problem: concurrent requests from multiple threads still contend for a mutex, much as they do when calling malloc directly.
The object memory pool reduces the contention further. The mutex is contended only when objects of the same type are allocated and released across threads.
The thread memory pool is the most efficient, because it has no lock overhead at all.
The best approach is therefore to use the object memory pool for objects that are handled across threads, and the thread memory pool for objects that are only handled within a single thread. Per-object pooling can be implemented by overloading the class's operator new and operator delete.
The boost library is very well suited to being wrapped further into an object memory pool and a thread memory pool (combined with thread-specific storage).
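A thread memory pool attached to a class through operator new and operator delete might look like the following C++ sketch. It is an illustration only: the Message class and its members are hypothetical, the pool is a bare per-thread free list built on thread_local rather than on boost or Loki, and pooled blocks are never handed back to the system.

```cpp
#include <cstddef>
#include <new>

// Hypothetical message type used only for illustration.
class Message {
public:
    int  id = 0;
    char payload[56] = {};

    static void* operator new(std::size_t size) {
        if (size != sizeof(Message)) return ::operator new(size);  // derived types
        FreeNode*& head = free_list();
        if (head == nullptr) return ::operator new(size);  // pool empty
        FreeNode* node = head;  // pop a block from this thread's list
        head = node->next;
        return node;
    }

    static void operator delete(void* p, std::size_t size) noexcept {
        if (p == nullptr) return;
        if (size != sizeof(Message)) { ::operator delete(p); return; }
        FreeNode* node = static_cast<FreeNode*>(p);  // push onto this thread's list
        node->next = free_list();
        free_list() = node;
    }

private:
    struct FreeNode { FreeNode* next; };

    // One free list per thread, so no lock is needed.
    static FreeNode*& free_list() {
        thread_local FreeNode* head = nullptr;
        return head;
    }
};
```

Code that uses the class keeps writing new Message and delete as usual. A Message deleted on one thread is reused by the next new Message on that same thread without touching a lock. An object deleted on a different thread simply joins that thread's list, which is why this policy fits objects that stay within one thread.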
VI. The ultimate memory pool on Linux
tcmalloc uses caching and other mechanisms to decide intelligently whether a request should be handled in the manner of an object memory pool or a thread memory pool. The code needs no extra policy and keeps using new/delete directly; you only need to link against a library such as libtcmalloc. Unfortunately, only Linux is supported.
Test data clearly supports this. For a server program whose CPU usage had stayed high and fluctuated sharply, linking with tcmalloc greatly reduced the mutex contention caused by calling malloc directly, and CPU usage became stable. A typical example is MySQL recompiled and linked with tcmalloc on Linux.