A Fast and Efficient Memory Allocator for Small Memory Blocks
Translated by znrobinson (Guo Shilong)
Introduction
Dynamic memory allocation is an interesting topic. When calling malloc/free, most people never think about the associated costs. I am talking about heap-based memory allocation here: to hand out memory and later reuse it, the heap manager has to do a lot of bookkeeping on its memory blocks, and that bookkeeping costs CPU cycles. When we try to write efficient code, the first rule is to avoid hitting the allocator whenever possible. Still, malloc/free exists for a reason: it is a tool like any other and simply needs to be used appropriately. But can it be made cheap?
Several of us, myself included, took up this little challenge, the kind of thing grown men with too much testosterone do to show off: write the fastest, most memory-efficient small-block allocator and earn bragging rights over everyone else. Well... it was not really about machismo. The root cause is that the UI toolkit we work on allocates an astonishing amount of memory at an astonishing rate, including a huge number of small memory requests. We had a very gentlemanly difference of opinion about who was better. Well... I mean, about who could write the better allocator. As a contest, the basic rules were as follows:
The allocator exposes two functions: void* alloc(long size); and void free(void* p);
Allocations larger than 1024 bytes must be supported (the speed and efficiency tests only use allocations <= 1024 bytes);
Allocations must be "naturally" aligned, rounded up to the next power of 2, up to a maximum alignment of 8 bytes;
Freeing a null pointer must not crash;
Zero-byte allocations must be supported (a valid pointer must be returned);
BlockAlloc()/BlockFree() (essentially malloc and free) must be used to obtain and release the underlying pool memory.
The main scoring criteria are speed and efficiency. Efficiency is evaluated by measuring how much memory is wasted during the test:
Efficiency = (memory requested from your allocator) / (memory your allocator requested from BlockAlloc)
Given the efficiency, the score is calculated as follows:
Score = time in milliseconds / (Efficiency × Efficiency)
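As a quick illustration of how punishing the squared efficiency term is (the numbers here are made up for the example): a run that finishes in 100 ms at 90% efficiency scores 100 / (0.9 × 0.9) ≈ 123, while the same 100 ms run at perfect efficiency scores exactly 100.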
The lowest score wins. As you can see, anything short of near-perfect efficiency is punished heavily. In our performance and efficiency tests, my allocator beat Visual C++'s malloc/free, running 25 times faster while using memory 13% more efficiently.
Although this implementation is quite fast, I am not claiming that this small-block allocator is the fastest possible. I have plenty of ideas about how to make it faster, and I am sure there are other implementations that can beat it. What is interesting is how easy it is to beat Microsoft's out-of-the-box implementation. If you are interested in developing it further, this is a good starting point.
A good creed is to keep things as simple as possible and introduce complexity only when necessary. My first two allocators were complex monsters, largely because I focused on the wrong things (for example, minimizing block size). In the end, my allocator became quite simple. Essentially, it is a fixed-size block allocator that manages 129 separate heaps, where each heap manages a specific fixed allocation size: starting at 4 bytes, then 8 bytes, and then increasing in 8-byte steps up to 1024 bytes. Technically, this is a sub-allocator: it uses malloc/free to obtain and release larger memory blocks, and then carves those blocks up to satisfy the smaller allocations. In our test, the allocator won by managing these small allocations more efficiently than the general-purpose malloc/free.
A fixed-size block allocator is, as it sounds, an allocator that only hands out blocks of a single, given size. Because it only has to deal with blocks of that one size, both the amount of code and the data structures needed to manage the memory are minimal, and that translates directly into performance.
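To make the layout concrete, here is a minimal sketch, not the author's code, of how the 129 fixed sizes (4, 8, 16, 24, ..., 1024 bytes) might be described. The bd array and its fixedAllocSize and chunks fields mirror names that appear in the allocation code later in the article; the descriptor type name and the chunks-per-block value of 256 are assumptions (the real allocator may well pick a different chunk count for each size).

struct BlockDescriptor            // hypothetical name for one fixed-size heap's descriptor
{
    long fixedAllocSize;          // size of every chunk this heap hands out
    long chunks;                  // how many chunks are carved out of each block
};

const int NUM_HEAPS = 129;        // 4 bytes, then 8, 16, 24, ..., 1024 bytes
BlockDescriptor bd[NUM_HEAPS];

void initDescriptors()
{
    bd[0].fixedAllocSize = 4;                 // the lone 4-byte heap
    for (int i = 1; i < NUM_HEAPS; ++i)
        bd[i].fixedAllocSize = 8 * i;         // 8, 16, ..., 1024 in 8-byte steps
    for (int i = 0; i < NUM_HEAPS; ++i)
        bd[i].chunks = 256;                   // assumed chunk count per block
}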
Allocate memory
Let's take a look at the RTAllocator::alloc method:
void* RTAllocator::alloc(long ls);
The first thing to do is figure out which of the 129 independent heaps should satisfy the request. First, the requested size (ls) is checked to see whether it is greater than 1024. If it is, the request is simply thrown over to the general-purpose allocator (malloc), because I do not care about allocations of that size. If the size is <= 1024, we need to determine which of the 129 fixed-size heaps will satisfy the request. To do this quickly and efficiently, a 1024-entry lookup table is used; it is initialized with the index of the fixed-size heap that serves each request size. Take a look at this code:
void* RTAllocator::alloc(long ls)
{
    if (ls == 0) ls = 1;
    int bdIndex = -1;
    if (ls <= 1024) bdIndex = mBDIndexLookup[ls];
The first line handles the special case of a zero-byte allocation; it is treated as a one-byte allocation by bumping the size to 1. The next line initializes the index bdIndex to -1, and if the request falls within the target range (<= 1024 bytes), the requested size is used as an offset into the lookup table to determine which of the 129 heaps to use.
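The article does not show how mBDIndexLookup is filled in. A minimal sketch, assuming the table is indexed directly by the request size as in the code above (which means it actually needs 1025 entries so that a request of exactly 1024 bytes has a slot), could look like this; the mapping matches the size classes described earlier, with index 0 for the 4-byte heap and index k for the 8k-byte heap.

int mBDIndexLookup[1024 + 1];                 // indexed by request size; entry 0 is unused

void initLookupTable()
{
    for (int size = 1; size <= 1024; ++size)
    {
        if (size <= 4)
            mBDIndexLookup[size] = 0;                // the 4-byte heap
        else
            mBDIndexLookup[size] = (size + 7) / 8;   // heaps of 8, 16, ..., 1024 bytes
    }
}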
If the allocation request is larger than 1024 bytes, the index bdIndex stays at -1 and the request is simply passed to the general-purpose allocator. The code looks like this:
if (bdIndex < 0)
{
    // Not handling blocks of this size, throw to BlockAlloc
    INCALLOCCOUNTER(BDCOUNT);
    return ALLOCJR_ALLOC(ls);
}
Note: the macro ALLOCJR_ALLOC wraps malloc so that allocations can be recorded for statistics. ALLOCJR_FREE calls free and serves the same purpose.
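The macros themselves are not listed in the article. A plausible sketch, assuming simple global counters for the recorded statistics (the counter names here are hypothetical):

#include <cstdlib>

static long gBytesFromSystem = 0;   // assumed statistic: bytes requested from malloc
static long gBlockCount      = 0;   // assumed statistic: live blocks

// Wrap malloc/free so every call can be counted.
#define ALLOCJR_ALLOC(size)  (gBytesFromSystem += (size), malloc(size))
#define ALLOCJR_FREE(p)      free(p)    // the real version presumably updates statistics too
#define INCBLOCKCOUNTER()    (++gBlockCount)
#define DECBLOCKCOUNTER()    (--gBlockCount)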
At this point in the code you know the allocation size you are dealing with (<= 1024 bytes) and which of the 129 fixed-size heaps will satisfy the request. The next thing to do is check whether a block with free space is available. Each heap maintains a doubly linked list of free blocks (blocks that still have at least one free chunk). If there is no free block, one is allocated (using malloc) and linked into the heap's free-block list, using the following code:
if (!mFreeBlocks[bdIndex])
{
    INCBLOCKCOUNTER();
    block* b = (block*)ALLOCJR_ALLOC(
        block::getAllocSize(bd[bdIndex].fixedAllocSize, bd[bdIndex].chunks));
    if (b)
    {
        b->init(bd[bdIndex].fixedAllocSize, bdIndex, bd[bdIndex].chunks);
        addBlockToArray(b);
        mFreeBlocks[bdIndex] = b;
    }
}
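The block class itself is not listed in the article; the sketch below is pieced together from the members its code uses (mNextFreeBlock, mPrevFreeBlock, mFreeChunk, mInitCursor, mFixedAllocSize, mAllocCount, mBDIndex). The mChunks member, the assumption that the chunk memory sits directly after the block header, and the bodies of getAllocSize and init are my guesses, not the author's code.

class block
{
public:
    block*  mNextFreeBlock;    // doubly linked free-block list (ALLOCJR_FULLBLOCK when full)
    block*  mPrevFreeBlock;
    void**  mFreeChunk;        // singly linked list of chunks that have been freed
    char*   mInitCursor;       // fence pointer: first chunk that has never been handed out
    long    mFixedAllocSize;   // size of every chunk in this block
    long    mAllocCount;       // chunks currently handed out
    long    mChunks;           // total chunks in this block (assumed member)
    int     mBDIndex;          // which of the 129 heaps owns this block

    // Total bytes to request from malloc for one block,
    // assuming the chunk area follows the header directly.
    static long getAllocSize(long fixedAllocSize, long chunks)
    {
        return (long)sizeof(block) + fixedAllocSize * chunks;
    }

    void init(long fixedAllocSize, int bdIndex, long chunks)
    {
        mNextFreeBlock  = 0;
        mPrevFreeBlock  = 0;
        mFreeChunk      = 0;
        mInitCursor     = (char*)this + sizeof(block);   // chunk area starts after the header
        mFixedAllocSize = fixedAllocSize;
        mAllocCount     = 0;
        mChunks         = chunks;
        mBDIndex        = bdIndex;
    }

    // alloc() and free() are shown later in the article;
    // isFull() and isEmpty() are called but not listed.
};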
At this point there should be at least one free block available to satisfy the allocation request. The request is pushed down to the block::alloc function, which allocates the memory from the available free block. Each block holds many chunks, each large enough to satisfy one allocation request. Inside the block data structure, the free chunks are kept on a singly linked list.
To avoid the cost of building that linked list when a freshly allocated block is initialized, the block maintains a fence pointer (mInitCursor) that points at the first chunk that has never been handed out. When allocating, the free chunk list is checked first; if it is empty, the fence pointer is checked to see whether it still points inside the block's chunk area. If there is space, mInitCursor is advanced to the next chunk and the chunk it previously pointed to is returned.
inline void* alloc()
{
    void* result;
    if (mFreeChunk)
    {
        result = mFreeChunk;
        mFreeChunk = (void**)*mFreeChunk;
    }
    else
    {
        result = mInitCursor;
        mInitCursor += mFixedAllocSize;
    }
    mAllocCount++;
    return result;
}
After returning from block::alloc, you need to check whether the block has become completely full. This is done by calling block::isFull. If it is full, the block is removed from the doubly linked free-block list so that it is no longer considered when looking for free space. When the block is removed from the free list, a sentinel value is written into its mNextFreeBlock pointer so that it is easy to tell later that the block is full. Look at the following code:
block* b = mFreeBlocks[bdIndex];
if (b->mNextFreeBlock != ALLOCJR_FULLBLOCK && b->isFull())
{
    // Unlink from freelist
    if (b->mNextFreeBlock)
    {
        b->mNextFreeBlock->mPrevFreeBlock = b->mPrevFreeBlock;
    }
    if (b->mPrevFreeBlock)
    {
        b->mPrevFreeBlock->mNextFreeBlock = b->mNextFreeBlock;
    }
    mFreeBlocks[bdIndex] = b->mNextFreeBlock;

    // special value means removed from free list
    b->mNextFreeBlock = ALLOCJR_FULLBLOCK;
    b->mPrevFreeBlock = ALLOCJR_FULLBLOCK;
}
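The article calls block::isFull here and, later in the release path, block::isEmpty, but never lists them. Plausible one-line implementations, assuming the block tracks its outstanding-chunk count (mAllocCount) and its total chunk count (mChunks, an assumed member):

inline bool isFull() const
{
    return mAllocCount == mChunks;   // every chunk is currently handed out
}

inline bool isEmpty() const
{
    return mAllocCount == 0;         // no chunks outstanding; block can go back to the system
}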
At this point you have successfully allocated a chunk of the requested size. Now that the allocation path has been covered, let's look at how memory is released.
Release memory
The release process starts when RTAllocator::free is called: void RTAllocator::free(void* p). The first thing to do is check whether a null pointer was passed in; if so, simply return: if (!p) return; If the pointer is not null, the next question is whether the memory is managed by this allocator or was passed through to malloc. To answer it, a binary search is performed over the array of block pointers the allocator maintains, to see whether the pointer belongs to one of our blocks. This is done by calling block* b = findBlockInArray(p); If the block is ours, b will be non-null; otherwise the pointer being released is not ours and can be handed straight to the free function.
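The article does not list findBlockInArray; here is a minimal sketch of the binary search it describes, assuming the allocator keeps its blocks in an address-sorted array (the names mBlockArray and mBlockCount are hypothetical; addBlockToArray and removeBlockFromArray would keep that array sorted).

// Hypothetical sketch, not the author's code.
block* RTAllocator::findBlockInArray(void* p)
{
    int lo = 0;
    int hi = mBlockCount - 1;
    while (lo <= hi)
    {
        int mid = (lo + hi) / 2;
        block* b = mBlockArray[mid];
        char* start = (char*)b;
        char* end = start + block::getAllocSize(b->mFixedAllocSize, b->mChunks);
        if ((char*)p < start)
            hi = mid - 1;            // p lies below this block
        else if ((char*)p >= end)
            lo = mid + 1;            // p lies above this block
        else
            return b;                // p falls inside this block's memory
    }
    return 0;                        // not one of our blocks
}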
If the block is ours, block::free is called to release the chunk. Inside block::free, the chunk is simply pushed back onto the block's free chunk list so it can be handed out again:

inline void free(void* p)
{
    void** pp = (void**)p;
    *pp = mFreeChunk;
    mFreeChunk = (void**)p;
    mAllocCount--;
}
In the first line of this function, the pointer is cast to void**; this makes it easy to write the current free-list head pointer into the first bytes of the chunk, which is exactly what the second line does. The third line then inserts the chunk into the free list by pointing the list head at the newly returned chunk. The last thing the function does is decrement the counter of chunks currently allocated from this block. After block::free returns, at least two more checks are needed. First, check whether the call to block::free left the block completely empty, via b->isEmpty(). If the block is now completely empty, it is removed from the doubly linked free-block list and returned to the system by calling the free function, as follows:
if (b->isEmpty())
{
    // Unlink from freelist and return to the system
    if (b->mNextFreeBlock)
    {
        b->mNextFreeBlock->mPrevFreeBlock = b->mPrevFreeBlock;
    }
    if (b->mPrevFreeBlock)
    {
        b->mPrevFreeBlock->mNextFreeBlock = b->mNextFreeBlock;
    }
    if (mFreeBlocks[b->mBDIndex] == b)
        mFreeBlocks[b->mBDIndex] = b->mNextFreeBlock;

    removeBlockFromArray(b);
    DECBLOCKCOUNTER();
    ALLOCJR_FREE(b);
}
If the block is not empty, you need to make sure it is on the doubly linked free-block list, because you now know it has at least one free chunk. This check is done by comparing the mNextFreeBlock member against the invalid-pointer constant ALLOCJR_FULLBLOCK, which is written into mNextFreeBlock whenever the block becomes full. If the check shows the block is not on the list, it is linked back in as follows:
// Need to see if block is not in free list; if not, add it back
if (b->mNextFreeBlock == ALLOCJR_FULLBLOCK)
{
    b->mPrevFreeBlock = NULL;
    b->mNextFreeBlock = mFreeBlocks[b->mBDIndex];
    if (mFreeBlocks[b->mBDIndex])
        mFreeBlocks[b->mBDIndex]->mPrevFreeBlock = b;
    mFreeBlocks[b->mBDIndex] = b;
}
At this point, the memory has been recycled and can be reused.
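As a usage illustration, here is a minimal sketch assuming a single shared RTAllocator instance whose constructor sets up the lookup table and heap descriptors; the article does not show how the allocator is instantiated, so the gAllocator name is hypothetical.

RTAllocator gAllocator;               // assumed global instance

void example()
{
    void* p = gAllocator.alloc(24);   // served by the 24-byte fixed-size heap
    void* q = gAllocator.alloc(2000); // larger than 1024 bytes: passed straight to malloc
    gAllocator.free(p);               // chunk goes back to its block's free chunk list
    gAllocator.free(q);               // not one of our blocks, handed to the system free
    gAllocator.free(0);               // freeing a null pointer is required to be safe
}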
Conclusion
To sum up, I will just say that I had a lot of fun creating this allocator, and I hope some of that comes through in this write-up. Writing code is fun! The author's blog has more programming tips, code snippets, and software.