dlmalloc is a very popular memory allocator written by Doug Lea (home page: http://gee.cs.oswego.edu/), who has maintained it since 1987. The latest version at the time of writing is 2.8.3 (available from ftp://g.oswego.edu/pub/misc/malloc.c). Because of its efficiency and other qualities it is widely used in practice (for example, some Linux systems use dlmalloc or a variant of it such as ptmalloc, home page http://www.malloc.de/en/index.html) and in research (search for the keyword "gcspy").
The dlmalloc implementation consists of a single source file (plus one header file) of about 5000 lines, a large share of which is comments. With so many comments the code looks easy to understand, and indeed, if you ignore the details, the general idea is easy to grasp (that is, you know roughly what it does). But dlmalloc is a high-quality piece of work whose implementation uses many tricks. Unless you put real effort into the details, you will not understand why things are done the way they are or what the benefit is; only after reading it closely and thinking it over do you see how clever it is.
This blog (lenky0401) will present an analysis of dlmalloc, based on the latest version of Doug Lea's malloc, 2.8.3; unless stated otherwise, a 32-bit platform with 8-byte alignment is assumed. My own ability is limited, so I cannot guarantee that every part of my understanding of dlmalloc is correct, but everything here is what I have actually worked out rather than guesswork. If you find a mistake, please bear with me and point it out if you can; discussion by e-mail is welcome (lenky0401@163.com).
These notes do not cover all of the dlmalloc code, but they try to explain some of the techniques involved. Even readers who are not interested in the dlmalloc source itself can learn these self-contained techniques and apply them in their own programming.
===========================================================
dlmalloc divides memory into many blocks (chunks) and manages them with the so-called boundary tag method. The source code defines two structures, malloc_chunk and malloc_tree_chunk. Their definitions show that the first four fields of malloc_tree_chunk are exactly the same as those of malloc_chunk, and that malloc_tree_chunk adds three more fields. The two structures are used to manage chunks of different sizes.
struct malloc_chunk {
  size_t               prev_foot;  /* Size of previous chunk (if free). */
  size_t               head;       /* Size and inuse bits. */
  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;
};
typedef struct malloc_chunk  mchunk;
typedef struct malloc_chunk* mchunkptr;
typedef struct malloc_chunk* sbinptr;  /* The type of bins of chunks */
struct malloc_tree_chunk {
  /* The first four fields must be compatible with malloc_chunk */
  size_t                    prev_foot;
  size_t                    head;
  struct malloc_tree_chunk* fd;
  struct malloc_tree_chunk* bk;
  struct malloc_tree_chunk* child[2];
  struct malloc_tree_chunk* parent;
  bindex_t                  index;
};
Let's first consider only struct malloc_chunk. It manages small chunks, which on a 32-bit machine are those below 256 bytes; struct malloc_tree_chunk manages large chunks, which on a 32-bit machine are those of 256 bytes and above.
Following the boundary tag method, malloc_chunk partitions memory into chunks through the fields head and prev_foot. As ① in the figure shows, the prev_foot field of a chunk records information about the previous chunk (in fact, the size of the previous chunk), so the following macro can be used:
#define prev_chunk(p) ((mchunkptr)(((char*)(p)) - ((p)->prev_foot)))
to obtain a malloc_chunk pointer to the previous chunk.
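To make the boundary tag walk concrete, here is a minimal sketch; it is not dlmalloc's own code. It lays two adjacent chunks out in a static buffer and steps backward with a prev_chunk-style macro. The names demo_chunk, demo_prev_chunk, demo_next_chunk and demo_roundtrip are hypothetical, and the flag bits in head are left out, so head holds the plain size.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified chunk header modeled on malloc_chunk (illustrative only). */
struct demo_chunk {
    size_t prev_foot;            /* size of the previous chunk (if free) */
    size_t head;                 /* size of this chunk; flag bits omitted */
    struct demo_chunk *fd;
    struct demo_chunk *bk;
};

/* Step back by the size recorded in prev_foot. */
#define demo_prev_chunk(p) \
    ((struct demo_chunk *)((char *)(p) - (p)->prev_foot))

/* Step forward by this chunk's own size. */
#define demo_next_chunk(p) \
    ((struct demo_chunk *)((char *)(p) + (p)->head))

/* Hypothetical helper: carve two adjacent chunks out of a buffer and
   check that walking forward and then backward returns to the start. */
static int demo_roundtrip(void)
{
    static size_t arena[32];     /* size_t elements keep the buffer aligned */
    struct demo_chunk *a = (struct demo_chunk *)arena;
    struct demo_chunk *b;

    a->prev_foot = 0;
    a->head = 8 * sizeof(size_t);        /* first chunk spans 8 words */
    b = demo_next_chunk(a);
    b->prev_foot = a->head;              /* boundary tag: size of a */
    b->head = 8 * sizeof(size_t);

    return demo_prev_chunk(b) == a && (char *)b == (char *)arena + a->head;
}
```

Calling demo_roundtrip() returns 1 when the backward step lands exactly on the first chunk, which is all the boundary tag needs to guarantee.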
The pointers fd and bk exist only while the chunk is free; they link the chunk into a free list for unified management. Once the chunk is allocated to the application (and removed from the free list) the two pointers are no longer needed, so their space is handed out as part of the application's memory and is not wasted, as ② in the figure shows.
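The reuse of fd and bk can be illustrated with a small sketch. It assumes that the pointer handed to the application starts right after the two size_t fields, which is what the description above implies; the helper names demo_chunk2mem, demo_mem2chunk and demo_payload_overlaps_fd are hypothetical and are not taken from the dlmalloc source.

```c
#include <assert.h>
#include <stddef.h>

struct demo_chunk {
    size_t prev_foot;
    size_t head;
    struct demo_chunk *fd;   /* meaningful only while the chunk is free */
    struct demo_chunk *bk;   /* meaningful only while the chunk is free */
};

/* The two size_t fields are the part of the header that stays in place
   while the chunk is in use (an assumption of this sketch). */
#define DEMO_HEADER_SIZE (2 * sizeof(size_t))

/* Convert a chunk header to the pointer given to the application. */
static void *demo_chunk2mem(struct demo_chunk *p)
{
    return (char *)p + DEMO_HEADER_SIZE;
}

/* Convert an application pointer back to its chunk header. */
static struct demo_chunk *demo_mem2chunk(void *mem)
{
    return (struct demo_chunk *)((char *)mem - DEMO_HEADER_SIZE);
}

/* Returns 1 when user memory begins exactly where fd is stored. */
static int demo_payload_overlaps_fd(void)
{
    static struct demo_chunk c;
    void *mem = demo_chunk2mem(&c);
    return mem == (void *)&c.fd && demo_mem2chunk(mem) == &c;
}
```

Because the application's memory starts at the address of fd, the two list pointers cost nothing once the chunk is in use.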
The head field records information about the current chunk: its size, whether it is in use, and whether the previous chunk is in use.
A single head field can hold all of this because dlmalloc always aligns chunks when dividing memory. The alignment defaults to 8 bytes; it can be configured, but 8 bytes is the minimum and the value must be a power of 2, that is, alignment = 2^n with integer n >= 3. The chunk size in bytes stored in head therefore always has its low 3 bits equal to 0, so those bits can carry other information:
Bit 0 is the P flag (PINUSE_BIT): it marks whether the previous chunk is in use, 1 for in use and 0 for free.
Bit 1 is the C flag (CINUSE_BIT): it marks whether the current chunk is in use, 1 for in use and 0 for free.
Here are the macros that test them:
#define SIZE_T_ONE ((size_t)1)
#define SIZE_T_TWO ((size_t)2)
#define PINUSE_BIT (SIZE_T_ONE)
#define CINUSE_BIT (SIZE_T_TWO)
#define cinuse(p) ((p)->head & CINUSE_BIT)
#define pinuse(p) ((p)->head & PINUSE_BIT)
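As a rough illustration of how a size and the two flags share one word, the sketch below packs and unpacks a head value. The DEMO_ names and helper functions are hypothetical, and masking all three low bits to recover the size is a choice made for this sketch, following the 8-byte alignment argument above.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PINUSE_BIT ((size_t)1)   /* bit 0: previous chunk in use */
#define DEMO_CINUSE_BIT ((size_t)2)   /* bit 1: this chunk in use */
#define DEMO_FLAG_MASK  ((size_t)7)   /* low three bits, spare under 8-byte alignment */

/* Pack an 8-byte-aligned size and the two flags into one word. */
static size_t demo_make_head(size_t size, int prev_inuse, int cur_inuse)
{
    size_t head = size & ~DEMO_FLAG_MASK;
    if (prev_inuse) head |= DEMO_PINUSE_BIT;
    if (cur_inuse)  head |= DEMO_CINUSE_BIT;
    return head;
}

/* Recover the size by masking the flag bits off again. */
static size_t demo_chunksize(size_t head)
{
    return head & ~DEMO_FLAG_MASK;
}

static int demo_pinuse(size_t head) { return (head & DEMO_PINUSE_BIT) != 0; }
static int demo_cinuse(size_t head) { return (head & DEMO_CINUSE_BIT) != 0; }
```

For example, a free 48-byte chunk whose predecessor is in use is stored as 48 | 1 = 49, and demo_chunksize(49) gives back 48.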
The prev_foot field mentioned above likewise uses one of its spare bits to mark whether the chunk was allocated directly by mmap. The idea is the same, so I won't go into it here; interested readers can consult the source code.
As for malloc_tree_chunk, it partitions memory in exactly the same way as malloc_chunk, because the first four fields are identical (in fact only prev_foot and head act as boundary tags); the remaining fields are used to manage the free lists.
This part has described how dlmalloc divides memory using the boundary tag method; the next part continues with the management of the two kinds of free lists.
The previous part mentioned that chunks smaller than 256 bytes are organized and managed through malloc_chunk. These chunks fall into 256/8 = 32 size classes spaced 8 bytes apart (8 bytes, 16 bytes, 24 bytes, 32 bytes, and so on), so dlmalloc maintains 32 doubly linked circular lists. Each list has a head node, whose main purpose is to let every node in the list be handled uniformly, which simplifies the code. All free chunks in a given list have the same size.
When an application requests some number of bytes, dlmalloc takes a chunk directly from the list for that size. The size meant here includes the space taken by the chunk header and by alignment: if the application calls malloc(8), the size dlmalloc works with is larger than 8. The same applies to the sizes mentioned below. I am not being vague on purpose; spelling this out every time would bury the point, so please keep it in mind while reading.
There is one refinement: if the matching list has no free chunk, dlmalloc also checks the next list. For example, when an application requests 32 bytes and the 32-byte list is empty, dlmalloc also looks for a free chunk in the 40-byte list. This satisfies the request without creating much memory fragmentation. The following figure shows how dlmalloc manages free chunks below 256 bytes.
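The bin lookup just described can also be sketched in code. This is a simplified model under stated assumptions, not the allocator's real logic: the per-chunk overhead is taken to be one size_t, the bin index is the chunk size shifted right by 3, and bins are represented only by an "is non-empty" flag. All DEMO_ and demo_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NBINS      32u
#define DEMO_BIN_SHIFT  3u                 /* bins are spaced 8 bytes apart */
#define DEMO_ALIGN_MASK ((size_t)7)
#define DEMO_OVERHEAD   sizeof(size_t)     /* assumed per-chunk bookkeeping */

/* Round a request up to a chunk size: add overhead, then align to 8. */
static size_t demo_pad_request(size_t req)
{
    return (req + DEMO_OVERHEAD + DEMO_ALIGN_MASK) & ~DEMO_ALIGN_MASK;
}

/* Map a chunk size to its bin number. */
static unsigned demo_small_index(size_t chunk_size)
{
    return (unsigned)(chunk_size >> DEMO_BIN_SHIFT);
}

/* bin_nonempty[i] != 0 means bin i holds at least one free chunk.
   Try the exact bin, then the next one; return -1 if neither can serve. */
static int demo_pick_bin(const unsigned char *bin_nonempty, size_t chunk_size)
{
    unsigned idx = demo_small_index(chunk_size);
    if (idx >= DEMO_NBINS) return -1;              /* not a small request */
    if (bin_nonempty[idx]) return (int)idx;
    if (idx + 1 < DEMO_NBINS && bin_nonempty[idx + 1]) return (int)(idx + 1);
    return -1;
}
```

With these assumptions demo_pad_request(8) is 16, which is the sense in which malloc(8) needs a chunk larger than 8 bytes. A 32-byte chunk size maps to bin 4, and if that bin is empty while bin 5 (40 bytes) is not, demo_pick_bin returns 5.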
dlmalloc uses an array named smallbins to record the heads of these 32 doubly linked circular lists. The array is defined inside the structure malloc_state; for now we ignore the rest of malloc_state and look only at the smallbins field, which is defined as follows:
struct malloc_chunk {
  size_t               prev_foot;  /* Size of previous chunk (if free). */
  size_t               head;       /* Size and inuse bits. */
  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;
};
mchunkptr smallbins[(NSMALLBINS+1)*2];
mchunkptr was introduced in the previous part as "typedef struct malloc_chunk* mchunkptr;", and the macro NSMALLBINS is 32, that is, "#define NSMALLBINS (32U)". So smallbins is an array of 66 malloc_chunk pointers. Why 66 and not 32?
Doug Lea uses a trick here. Following the usual approach, you would allocate an array of 32 malloc_chunk pointers, allocate 32 head nodes for the lists, and make each pointer refer to its list's head node. But the prev_foot and head fields of a list head node serve no purpose, so those two fields are wasted unless something else reuses them.
How much memory does an array of 66 malloc_chunk pointers take? 66*4 = 264 bytes. And how much do 32 head nodes of type malloc_chunk need? With four 4-byte fields each, 32*16 = 512 bytes. Is all of it really needed? No: as just noted, prev_foot and head have no practical use in a head node, so they can be reused (overwritten), and the memory actually needed is 32*8 = 256 bytes. Since 264 is greater than 256, the memory of the 66-pointer array can hold all 32 head nodes, and that is exactly what Doug Lea does. Here is the relevant code:
#define smallbin_at(M, i) ((sbinptr)((char*)&((M)->smallbins[(i)<<1])))
sbinptr is also a pointer to struct malloc_chunk ("typedef struct malloc_chunk* sbinptr;"), and M stands for the malloc_state structure mentioned earlier. Study the casts in this code carefully and the trick becomes clear. The following diagram gives a more intuitive picture:
Finally, as to why the array has 66 elements rather than 64 or 65: this is easy to work out, so I leave it to the reader. That's all for now.
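For readers who want to check their answer to that question, here is a small sketch of the overlapping layout. It mirrors the smallbin_at cast with hypothetical demo_ names and computes which array slot each bin's fd and bk fields land on. It assumes size_t and pointers have the same size, as on the 32-bit platform used throughout, and it relies on the same pointer punning as the original macro, which goes beyond what ISO C strictly guarantees.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NSMALLBINS 32u

struct demo_chunk {
    size_t prev_foot;
    size_t head;
    struct demo_chunk *fd;
    struct demo_chunk *bk;
};

struct demo_state {
    struct demo_chunk *smallbins[(DEMO_NSMALLBINS + 1) * 2];   /* 66 pointers */
};

/* Assumption of this sketch: size_t and pointers are the same size. */
typedef char demo_same_size[sizeof(size_t) == sizeof(void *) ? 1 : -1];

/* Treat the address of slot 2*i as if a chunk header started there. */
#define demo_smallbin_at(M, i) \
    ((struct demo_chunk *)((char *)&((M)->smallbins[(i) << 1])))

/* Index of the array slot that a given field address falls on. */
static size_t demo_slot_of(struct demo_state *m, void *field)
{
    return (size_t)((char *)field - (char *)m->smallbins) / sizeof m->smallbins[0];
}

static size_t demo_fd_slot(struct demo_state *m, unsigned i)
{
    return demo_slot_of(m, &demo_smallbin_at(m, i)->fd);
}

static size_t demo_bk_slot(struct demo_state *m, unsigned i)
{
    return demo_slot_of(m, &demo_smallbin_at(m, i)->bk);
}

/* Make every bin an empty circular list: fd and bk point back at the bin. */
static void demo_init_bins(struct demo_state *m)
{
    unsigned i;
    for (i = 0; i < DEMO_NSMALLBINS; ++i) {
        struct demo_chunk *bin = demo_smallbin_at(m, i);
        bin->fd = bin->bk = bin;
    }
}
```

Bin i uses slots 2i+2 and 2i+3 for its fd and bk, and those are the same slots where bin i+1's unused prev_foot and head would sit. Bin 31 therefore reaches slot 65, so the array needs (32+1)*2 = 66 elements, while slots 0 and 1 are only the never-used prev_foot and head of bin 0.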
http://lenky0401.cublog.cn