Introduction:
The purpose of this memory pool design is to keep the server running efficiently over long periods of time. By managing the small objects that are requested most frequently, it reduces memory fragmentation — the situation in which the system still has plenty of free space in total yet cannot allocate a large contiguous block — and allocates and manages user memory in a rational way.
Objectives:
The basic goals of the memory pool design are: thread safety (it can be used from multiple threads); a reasonable amount of out-of-bounds and memory-leak checking; running efficiency no lower than the plain malloc/free approach; and management of memory requests over a range of small sizes, rather than a pool limited to a single fixed object size.
Design and Implementation of memory pool technology
The design of this memory pool mainly follows the SGI alloc design scheme, with some simple modifications on top of alloc to make it suitable for general applications.
The memory pool design scheme of mempool is as follows (you can also refer to Hou Jie's in-depth analysis of the STL).
Request a large block of heap memory from the system, divide it into blocks of various sizes, and link the blocks of each size together to form linked lists. For example, all chunks of size a form linked list L. When a request of size a arrives, a chunk is taken directly from the head of L (if the list is not empty) and handed to the caller; when such a block is released, it is simply attached back to the head of L. The principle of the memory pool is quite simple, but many details need attention in the actual implementation. A minimal sketch of this principle is shown below.
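The sketch illustrates only the pop-from-head / push-to-head idea just described; the node type and the function names are illustrative and are not the project's actual code.
#include <stdlib.h>

/* Illustrative only: one free list for one block size. A free block is
 * reused as the list node itself, so no extra bookkeeping memory is needed. */
typedef struct node { struct node* next; } node;

static node* head = NULL; /* head of the free list for this block size */

static void* take_block(size_t block_size)
{
    if (head != NULL) {            /* list not empty: pop the head and hand it out */
        node* p = head;
        head = head->next;
        return p;
    }
    return malloc(block_size);     /* otherwise fall back to the system */
}

static void give_back(void* block)
{
    node* n = (node*)block;        /* the block itself becomes a list node again */
    n->next = head;                /* push it onto the head of the free list */
    head = n;
}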
1: Byte alignment
To simplify the management of objects in the memory pool, the size of each request is first adjusted. In mempool, sizes are aligned up to the nearest multiple of 8 bytes. For example, a request for 5 bytes is adjusted to 8 bytes, and a request for 22 bytes is adjusted to 24. The full mapping is as follows:
Serial number | Alignment bytes | Range
0 | 8 | 1-8
1 | 16 | 9-16
2 | 24 | 17-24
3 | 32 | 25-32
4 | 40 | 33-40
5 | 48 | 41-48
6 | 56 | 49-56
7 | 64 | 57-64
8 | 72 | 65-72
9 | 80 | 73-80
10 | 88 | 81-88
11 | 96 | 89-96
12 | 104 | 97-104
13 | 112 | 105-112
14 | 120 | 113-120
15 | 128 | 121-128
(Figure 1)
For requests of more than 128 bytes, the malloc function is called directly. The memory pool designed here does not manage memory for every kind of object; it manages only the small objects that are requested frequently, and requests larger than 128 bytes are not handled by the pool. This threshold is not fixed and should be chosen according to the actual project. The alignment function is as follows:
static size_t round_up(size_t size)
{
    return ((size + 7) & ~(size_t)7); // align up to a multiple of 8 bytes
}
2: Build an index table
The objects managed in the memory pool all have fixed sizes, yet user requests can fall anywhere in the 1-128 byte range. Besides the byte alignment described above, we therefore also build an index table, declared as follows:
static _OBJ* free_list[16];
This creates an array of 16 _OBJ* pointers (the _OBJ structure is described below). free_list[0] records the head of the 8-byte free list, free_list[1] the 16-byte list, free_list[2] the 24-byte list, and so on. The mapping between a subscript in free_list and the byte size of its list is exactly the relationship between "Serial number" and "Alignment bytes" in Figure 1, and it can be computed with a simple function:
static size_t freelist_index(size_t size)
{
    return ((size + 7) / 8 - 1); // subscript of the matching 8-byte-aligned free list
}
Therefore, when the user requests a bytes of space, a simple conversion takes us straight to the free list that holds blocks of the matching size:
_OBJ** p = free_list + freelist_index(a);
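As a quick illustration (assuming the two functions above; check_size_mapping is just an ad-hoc name), the alignment and the subscript mapping agree with Figure 1:
#include <assert.h>

/* Illustrative check of the mapping in Figure 1. */
static void check_size_mapping(void)
{
    assert(round_up(5)   == 8   && freelist_index(5)   == 0);  /* 1-8     -> free_list[0]  */
    assert(round_up(22)  == 24  && freelist_index(22)  == 2);  /* 17-24   -> free_list[2]  */
    assert(round_up(128) == 128 && freelist_index(128) == 15); /* 121-128 -> free_list[15] */
}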
3: Create the free lists
From the index table we know that mempool has 16 free lists, managing free objects of size 8, 16, 24, 32, 40, ..., 128 bytes. All of these free lists are linked in the same way. Normally, to build a singly linked list we would define a struct such as the following:
struct OBJ
{
    OBJ* next;
    char* p;
    int isize;
};
The next pointer points to the next node, p points to the actual usable space, and isize records the size of that usable space. Some other memory pool implementations use richer structures, for example also keeping a pointer back to the parent block that the node was carved from and a record of how much of that block is currently in use. When a user requests space, a node is taken off the list and handed over. For example, a request for 12 bytes would look like this:
OBJ* p = (OBJ*)malloc(12 + sizeof(OBJ));
p->next = NULL;
p->p = (char*)p + sizeof(OBJ);
p->isize = 12;
However, we did not adopt this method. Its drawback is that for small requests the bookkeeping overhead of the memory pool is far too large. For example, when a user requests 12 bytes, the pool actually has to obtain 12 + sizeof(OBJ) = 12 + 12 = 24 bytes, wasting a lot of memory on bookkeeping and failing to exploit the index table. Mempool uses a union instead:
union OBJ
{
    OBJ* next;
    char client_data[1];
};
Apart from changing struct to union, removing int isize, and replacing char* p with char client_data[1], little is modified, but the advantage shows here. With the struct approach we would have to maintain two lists: one for allocated blocks and one for unallocated (free) blocks. With the index table plus the union we only need to maintain one list, the free list. The details are as follows:
The index table serves two purposes. First, as mentioned above, it anchors the 16 free lists. Second, it implicitly records the block size of each list; for example, the entry with subscript 3 maintains the 24-byte free list. The index table therefore replaces the isize field that would otherwise record the size of the space p points to, saving 4 bytes per node.
The defining feature of a union is that its members are mutually exclusive: at any moment only one of them is in use, so sizeof(OBJ) is just 4 (the size of a pointer on a 32-bit platform). Do we also need to add these 4 bytes on top of the user's request? No. If we did, we would throw away exactly the property that makes the union useful.
While a block sits on the free list, next points to the next union node, so the p pointer is not needed. When the block is handed out, the address of client_data is returned directly; client_data is simply the first byte of the requested space. Therefore nothing extra needs to be added to the user's requested space.
Figure 2
Figure 2 above shows how the OBJ nodes are linked; as stated, nothing extra is added to the user's requested space.
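The standalone sketch below illustrates the union trick described above. It is illustrative only (the 24-byte block size is arbitrary); on the 32-bit platform assumed by the article, the printed sizeof is 4.
#include <stdio.h>
#include <stdlib.h>

/* Illustrative only: the same memory serves as a list node while it is free
 * and as raw user space once it has been handed out. */
union OBJ
{
    union OBJ* next;     /* used while the block sits on a free list      */
    char client_data[1]; /* first byte of the block once it is handed out */
};

int main(void)
{
    /* carve two 24-byte blocks out of one allocation and link them */
    char* raw = (char*)malloc(2 * 24);
    union OBJ* head = (union OBJ*)raw;
    head->next = (union OBJ*)(raw + 24);
    head->next->next = NULL;

    /* "allocate": pop the head and hand its client_data to the caller */
    union OBJ* taken = head;
    head = head->next;
    void* user_ptr = taken->client_data; /* same address as the node itself */

    printf("sizeof(union OBJ) = %u\n", (unsigned)sizeof(union OBJ));
    printf("node at %p, user pointer %p\n", (void*)taken, user_ptr);

    free(raw);
    return 0;
}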
4: Record the size of the requested space
If we used an object-oriented interface, or if the caller could always tell us the exact size of the block being released, this step would not be needed.
Figure 3
The C function free is not told the size of the block being released, yet it releases it correctly. The same idea is imitated here: the requested size is recorded so that the block can be released later. The user's requested size has 1 added to it before byte alignment; once a suitable block is found, its first byte is overwritten with the requested size. One byte can of course record at most 255, so if larger requests are needed the header can be made a short or an int, at the cost of more space. When the block is released, this byte is read first to obtain the size, and the block is then returned accordingly. So that blocks larger than 128 bytes (which are obtained from malloc) can also be released correctly, the same 1-byte header is added to them. As a result, a single user request cannot exceed 255 bytes, which meets the project's requirements; again, the header could be changed to a short to record larger sizes.
// On allocation: record the size in the first byte, return the byte after it
*((unsigned char*)result) = (unsigned char)n;
unsigned char* ptemp = (unsigned char*)result;
++ptemp;
result = (_OBJ*)ptemp;
return result;
// On release: step back one byte and read the recorded size
unsigned char* ptemp = (unsigned char*)ptr;
--ptemp;
ptr = (void*)ptemp;
n = (size_t)(*(unsigned char*)ptr);
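Put together, the header handling can be wrapped in a pair of small helpers, sketched below under the same assumptions; wrap_size and read_size are invented names, not the project's actual API.
/* Illustrative helpers for the 1-byte size header described above. */

/* Record the requested size n (n <= 255) in the first byte of the block
 * 'raw' and return the address that the user actually sees. */
static void* wrap_size(void* raw, size_t n)
{
    unsigned char* p = (unsigned char*)raw;
    *p = (unsigned char)n;  /* first byte stores the requested size */
    return p + 1;           /* user space begins one byte later     */
}

/* Given the pointer the user holds, step back one byte, recover the
 * recorded size, and return the real start of the block. */
static void* read_size(void* user_ptr, size_t* n_out)
{
    unsigned char* p = (unsigned char*)user_ptr - 1;
    *n_out = (size_t)(*p);  /* size that was originally requested   */
    return p;               /* real block start, for the free list  */
}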
5: Memory Pool allocation principle
There are two important operations in the memory pool design: chunk_alloc, which requests a large block of memory, and refill, which backfills a free list. When the memory pool is initialized it does not build a free list for every entry in the index table; that work is postponed until a user actually makes a request. The details are in the code below (these are essentially the two functions found in SGI's stl_alloc.h); the main steps are described in the comments.
/**
 * @brief  Request a large block of memory from the pool; the block holds
 *         size * (*nobjs) bytes.
 * @param  size   object size after round_up alignment
 * @param  nobjs  in: number of objects wanted; out: number actually obtained
 * @return pointer to the first object of the block
 */
static char* chunk_alloc(size_t size, int* nobjs)
{
    char*  __result;                                    /**< return pointer               */
    size_t __total_bytes = size * (*nobjs);             /**< bytes requested              */
    size_t __bytes_left  = _end_free - _start_free;     /**< bytes still free in the pool */

    if (__bytes_left >= __total_bytes)                  /**< pool can satisfy all nobjs   */
    {
        __result = _start_free;
        _start_free += __total_bytes;
        return __result;
    }
    else if (__bytes_left >= size)                      /**< pool holds at least one object */
    {
        *nobjs = (int)(__bytes_left / size);
        __total_bytes = size * (*nobjs);
        __result = _start_free;
        _start_free += __total_bytes;
        return __result;
    }
    else                                                /**< pool is (nearly) empty       */
    {
        /** <size of the new block to request from the system */
        size_t __bytes_to_get = 2 * __total_bytes + round_up(_heap_size >> 4);
        /** <hand any leftover fragment to the matching free list */
        if (__bytes_left > 0)
        {
            _OBJ* volatile* __my_free_list =
                free_list + freelist_index(__bytes_left);
            ((_OBJ*)_start_free)->free_list_link = *__my_free_list;
            *__my_free_list = (_OBJ*)_start_free;
        }
        /** <request a new large block */
        _start_free = (char*)malloc(__bytes_to_get);
        if (_start_free != 0)
            memset(_start_free, 0, __bytes_to_get);
        /** <system memory exhausted: scavenge the pool's own free lists */
        if (0 == _start_free)
        {
            size_t          __i;
            _OBJ* volatile* __my_free_list;
            _OBJ*           __p;
            /** <scan the free lists; only blocks no smaller than size are usable */
            for (__i = size; __i <= (size_t)_max_bytes; __i += _align)
            {
                __my_free_list = free_list + freelist_index(__i);
                __p = *__my_free_list;
                if (__p != 0)                           /**< found a free block */
                {
                    *__my_free_list = __p->free_list_link;
                    _start_free = (char*)__p;
                    _end_free   = _start_free + __i;
                    return chunk_alloc(size, nobjs);
                }
            }
            _end_free = 0;
            /** <try the system again; this may fail or throw */
            _start_free = (char*)malloc(__bytes_to_get);
        }
        _heap_size += __bytes_to_get;                   /**< record the pool capacity */
        _end_free = _start_free + __bytes_to_get;
        return chunk_alloc(size, nobjs);
    }
}
/**
 * @brief  Refill a free list; by default 20 objects are obtained and linked.
 * @param  n  object size, already rounded up to a multiple of 8
 * @return pointer to one free object, handed straight to the caller
 */
static void* refill(size_t n)
{
    int   __nobjs = 20;
    char* __chunk = (char*)chunk_alloc(n, &__nobjs);
    _OBJ* volatile* __my_free_list;
    _OBJ* __result;
    _OBJ* __current_obj;
    _OBJ* __next_obj;
    int   __i;

    /** <only one object obtained: give it straight to the caller */
    if (1 == __nobjs)
        return __chunk;

    __my_free_list = free_list + freelist_index(n);

    /* build the free list inside the chunk */
    __result = (_OBJ*)__chunk;                  /**< the first object goes to the caller */
    *__my_free_list = __next_obj = (_OBJ*)(__chunk + n);
    for (__i = 1; ; ++__i)
    {
        __current_obj = __next_obj;
        __next_obj = (_OBJ*)((char*)__next_obj + n);
        if (__nobjs - 1 == __i)
        {
            __current_obj->free_list_link = 0;  /**< last object terminates the list */
            break;
        }
        else
        {
            __current_obj->free_list_link = __next_obj;
        }
    }
    return __result;
}
After these operations, the memory pool may end up in the state shown below: free lists for 8, 24, 88 and 128 bytes have been built, while all the entries that have no free list yet still point to null. Whether an index-table entry is null tells us whether its free list has not been created yet, or has been used up; if the pointer is null, the refill function is called to obtain 20 more blocks of that size and link them together. Inside refill, chunk_alloc checks whether the large memory block still has usable space; if it does, and the size fits, that space is handed back to refill.
Figure 4
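Tying the pieces together, the allocation and release paths just described might be sketched as follows. This is an illustrative reconstruction under the assumptions above, not the project's code: mp_allocate and mp_deallocate are invented names, _max_bytes is the 128-byte limit from Figure 1, free_list_link is the union's link pointer (called next in the simplified union of section 3), and the 1-byte size header from section 4 and the lock from section 6 are omitted for brevity.
/* Illustrative sketch of the allocate/release paths described above. */
static void* mp_allocate(size_t n)
{
    _OBJ* volatile* my_free_list;
    _OBJ* result;

    if (n > (size_t)_max_bytes)              /* large requests go straight to malloc */
        return malloc(n);

    my_free_list = free_list + freelist_index(n);
    result = *my_free_list;
    if (result == 0)                         /* list empty or not built yet: refill it */
        return refill(round_up(n));

    *my_free_list = result->free_list_link;  /* pop the head of the free list */
    return result;
}

static void mp_deallocate(void* p, size_t n)
{
    _OBJ* q = (_OBJ*)p;
    _OBJ* volatile* my_free_list;

    if (n > (size_t)_max_bytes) {            /* large blocks came from malloc */
        free(p);
        return;
    }
    my_free_list = free_list + freelist_index(n);
    q->free_list_link = *my_free_list;       /* push the block back onto the head */
    *my_free_list = q;
}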
6: Thread safety
A mutex is used around the free-list operations to keep the memory pool thread safe.
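The locking code is not shown in the article. On the Windows platform used for the tests below, one straightforward way to do it is a critical section around every free-list manipulation, roughly as sketched here (illustrative only; lock_pool and unlock_pool are invented helper names):
#include <windows.h>

/* Illustrative sketch: serialize access to the free lists and pool state. */
static CRITICAL_SECTION pool_lock;

static void pool_lock_init(void) { InitializeCriticalSection(&pool_lock); }
static void lock_pool(void)      { EnterCriticalSection(&pool_lock); }
static void unlock_pool(void)    { LeaveCriticalSection(&pool_lock); }

/* Every allocate/release then brackets its free-list work:
 *     lock_pool();
 *     ... pop from or push onto free_list[...] ...
 *     unlock_pool();
 */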
Memory Pool Test
The memory pool test has two parts: 1. comparing the allocation speed of malloc and mempool in a single thread; 2. comparing the allocation speed of malloc and mempool with multiple threads, tested with 4, 10 and 16 threads.
Test environment: operating system Windows 2003 + SP1, VC 7.1 + SP1; hardware: Intel(R) Celeron(R) CPU 2.53 GHz, M physical memory.
The requested memory sizes are configured as follows:
#define allocnumber0 4
#define allocnumber1 7
#define allocnumber2 23
#define allocnumber3 56
#define allocnumber4 10
#define allocnumber5 60
#define allocnumber6 5
#define allocnumber7 80
#define allocnumber8 9
#define allocnumber9 100
Both the malloc version and the mempool version request and release blocks of the sizes above; in each pass, each of these sizes is requested and released 20 times.
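The benchmark code itself is not included in the article. A minimal single-threaded harness along the lines described might look like the sketch below, timing with GetTickCount on the Windows platform mentioned above; run_pass, run_benchmark and the exact request/release pattern are assumptions, and mp_allocate/mp_deallocate are the invented names from the earlier sketch (the malloc variant simply swaps in malloc/free).
#include <stdio.h>
#include <windows.h>

/* Illustrative timing loop: request and release each configured size
 * 20 times per pass, for a given number of passes. */
static void run_pass(void)
{
    static const size_t sizes[] = {
        allocnumber0, allocnumber1, allocnumber2, allocnumber3, allocnumber4,
        allocnumber5, allocnumber6, allocnumber7, allocnumber8, allocnumber9
    };
    int i, j;
    for (i = 0; i < 20; ++i) {
        for (j = 0; j < 10; ++j) {
            void* p = mp_allocate(sizes[j]);  /* or malloc(sizes[j]) */
            mp_deallocate(p, sizes[j]);       /* or free(p)          */
        }
    }
}

static void run_benchmark(int passes)
{
    DWORD start = GetTickCount();
    int k;
    for (k = 0; k < passes; ++k)
        run_pass();
    printf("%d passes: %lu ms\n", passes, (unsigned long)(GetTickCount() - start));
}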
We tested the following numbers of requests for malloc and mempool respectively (unit):
2 | 10 | 20 | 30 | 40 | 50 | 80 | 100 | 150 | 200
The test data of malloc and mempool under single thread and multiple threads, in release and debug builds, are collected in the following charts.
Figure 5
We can see that mempool allocates faster than plain malloc whether it runs single-threaded or multi-threaded.
The next chart shows the running time of the malloc version in debug mode under different thread counts. In debug mode the request speed of malloc is not strongly related to the number of threads; the multi-threaded runs are only slightly faster than the single-threaded run.
Figure 6
The following chart shows the test results of the malloc version in release mode.
Figure 7
Here the advantage of multithreading starts to show. For the same number of requests and releases, the multi-threaded runs finish noticeably faster than the single-threaded run, and the difference between 4, 10 and 16 threads is not large. However, the 4-thread runs appear to take slightly longer than the 10- and 16-thread runs, which suggests that the share of time the process spends on thread switching varies with the thread count.
The following are the debug-build test results for mempool.
Figure 8
The test results of mempool in release mode are as follows:
Figure 9
The data used in all the preceding statistical charts is the average value after three tests.
From these tests we can see that mempool's performance essentially exceeds that of plain malloc. For requests plus releases, the single-threaded release build of mempool is about 110 times faster than plain malloc; with four threads, mempool is about seven times faster. These tests measure only request speed; the results may differ under other kinds of load, and they say nothing about whether mempool is more stable than malloc.
Summary: the memory pool basically meets its initial design goals, but it is not perfect and has shortcomings. For example, it cannot serve requests larger than 255 bytes, it performs no out-of-bounds memory checking, and it has no automatic shrinking of the pool. For our purposes, however, these limitations are not important.
Because this is a company project and the code is covered by copyright, it cannot be published. If you want to build your own memory pool, contact me at ugg_xchj#hotmail.com.