Writing a Simple Garbage Collector in C

Source: Internet
Author: User

People seem to think that writing a garbage collector is really hard, a deep magic understood only by a few great sages and Hans Boehm (et al.). I think the hardest part of writing a garbage collector is the memory allocator, and that is about as hard as reading the malloc example in K&R.

Before we start, a few important things. First, the code we write depends on the Linux kernel. Note that this means the Linux kernel, not GNU/Linux. Second, our code is 32-bit. Third, do not use this code directly. I don't guarantee that it is completely correct; it may contain bugs I haven't found yet, but the overall approach is sound. Okay, let's get started.

If you see any error, please email me [email protected]

Writing malloc

To begin, we need to write a memory allocator, or as we will call it, a malloc function. The simplest way to implement one is to maintain a linked list of free memory blocks that are split up and handed out as needed. When a user requests a chunk of memory, a block of suitable size is removed from the list and given to the user. If no block of suitable size exists, either a larger free block is split into smaller blocks or more memory is requested from the kernel. Freeing a chunk of memory simply adds it back to the free list.

Each free block in the list begins with a header that describes the block. Our header has two fields: the first holds the size of the block, and the second points to the next free block.

typedef struct header {
    unsigned int    size;
    struct header   *next;
} header_t;



Embedding the header in the memory block itself is the only sensible approach, and it has the added benefit that blocks are automatically word-aligned, which will matter later.
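As a minimal sketch of what embedding the header means in practice, the two helpers below convert between a block's header and the memory handed to the user. The names blk_header_t, payload_of and header_of are hypothetical and are not part of the article's code; only the two-field layout is taken from the header above.

```c
#include <assert.h>
#include <stddef.h>

/* Same two-field layout as the article's header; the type name is illustrative. */
typedef struct blk_header {
    unsigned int size;          /* block size, in header-sized units */
    struct blk_header *next;    /* next block in the list */
} blk_header_t;

/* The user's memory starts immediately after the header. */
static void *payload_of(blk_header_t *h)
{
    assert(h != NULL);
    return (void *) (h + 1);
}

/* Step back one header from the user's pointer to recover the header. */
static blk_header_t *header_of(void *payload)
{
    assert(payload != NULL);
    return ((blk_header_t *) payload) - 1;
}
```

Because the payload starts exactly one header past the start of the block, no separate lookup table is needed to get from a user pointer back to its bookkeeping data.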

Because we need to keep track of the blocks currently in use as well as the blocks that are not, we maintain a second linked list for blocks in use, alongside the list of free blocks. We will call these the free list and the used list. Blocks removed from the free list are added to the used list, and vice versa.

Now we are almost ready to complete the first step of our malloc implementation. But first, we need to know how to request memory from the kernel.

Dynamically allocated memory lives in the so-called heap, a region of memory between the stack and the BSS (the uninitialized data segment, where global variables that default to 0 are stored). The heap starts at a low address bordering the BSS and ends at the program break, the boundary between mapped and unmapped memory. To get more memory from the kernel, we only need to raise the break, which we do with the Unix system call sbrk. sbrk increases the break by the amount given in its argument. On success it returns the previous break; on failure it returns (void *) -1.

With this knowledge we can write two functions: morecore() and add_to_free_list(). When the free list runs out of suitable blocks, we call morecore() to request more memory. Because asking the kernel for memory is expensive, we request it in page-sized chunks. The details of pages don't matter much here, but briefly: a page is the smallest unit of memory that can be mapped from virtual memory to physical memory. We then use add_to_free_list() to add the new block to the free list.
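To make the sbrk behavior concrete, here is a small sketch that raises the break and returns the newly mapped region. The helper name grow_heap is an assumption for illustration; sbrk itself is the real call declared in <unistd.h>.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* expose sbrk() in <unistd.h> on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Grow the heap by nbytes. Returns the start of the new region (which is
 * the old break), or NULL if the kernel refused the request.
 */
static void *grow_heap(size_t nbytes)
{
    void *old_break;

    assert(nbytes <= (size_t) INTPTR_MAX);   /* sbrk takes a signed increment */
    old_break = sbrk((intptr_t) nbytes);
    if (old_break == (void *) -1)
        return NULL;
    return old_break;
}
```

Calling sbrk(0) returns the current break without moving it, which is a convenient way to check how far the heap has grown.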

#include <unistd.h> /* sbrk() */

/*
 * Scan the free list and look for a place to put the block. Basically, we're
 * looking for any block the to-be-freed block might have been partitioned from.
 */
static void
add_to_free_list(header_t *bp)
{
    header_t *p;

    for (p = freep; !(bp > p && bp < p->next); p = p->next)
        if (p >= p->next && (bp > p || bp < p->next))
            break;

    if (bp + bp->size == p->next) {
        bp->size += p->next->size;
        bp->next = p->next->next;
    } else
        bp->next = p->next;

    if (p + p->size == bp) {
        p->size += bp->size;
        p->next = bp->next;
    } else
        p->next = bp;

    freep = p;
}

#define MIN_ALLOC_SIZE 4096 /* We allocate blocks in page sized chunks. */

/*
 * Request more memory from the kernel.
 */
static header_t *
morecore(size_t num_units)
{
    void *vp;
    header_t *up;

    /* num_units counts header-sized units, so compare against a page in units. */
    if (num_units < MIN_ALLOC_SIZE / sizeof(header_t))
        num_units = MIN_ALLOC_SIZE / sizeof(header_t);

    if ((vp = sbrk(num_units * sizeof(header_t))) == (void *) -1)
        return NULL;

    up = (header_t *) vp;
    up->size = num_units;
    add_to_free_list(up);
    return freep;
}
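The minimum-size rule in morecore() is easy to get wrong because the request is counted in header-sized units while the page size is in bytes. The sketch below isolates that step under the assumption of a 4096-byte page; PAGE_BYTES, unit_t and units_to_request are illustrative names, not the article's.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096   /* assumed page size, matching MIN_ALLOC_SIZE */

/* Header-sized unit; stands in for the article's header_t. */
typedef struct unit {
    unsigned int size;
    struct unit *next;
} unit_t;

/* Return how many units to request from the kernel: at least one page's worth. */
static size_t units_to_request(size_t num_units)
{
    size_t min_units = PAGE_BYTES / sizeof(unit_t);

    assert(PAGE_BYTES % sizeof(unit_t) == 0);   /* a page holds whole units */
    return num_units < min_units ? min_units : num_units;
}
```

Small requests are rounded up to a full page, and requests larger than a page pass through unchanged, so the kernel is never asked for less than the caller needs.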




Now that we have these two helper functions, we can write malloc itself. We scan the free list and stop at the first block that meets the requirement (that is, a block at least as large as the request) rather than scanning the whole list for the closest match. This strategy is known as first fit, as opposed to best fit.

Note: the size field in the header is measured in header-sized units, not bytes.
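The difference between the two strategies is easiest to see on a plain array of block sizes. This is only a sketch: first_fit, best_fit and NO_FIT are hypothetical names, and the array stands in for the linked list of free blocks.

```c
#include <assert.h>
#include <stddef.h>

#define NO_FIT ((size_t) -1)   /* returned when no block is large enough */

/* First fit: stop at the first block that is large enough. */
static size_t first_fit(const size_t *sizes, size_t n, size_t want)
{
    assert(sizes != NULL);
    for (size_t i = 0; i < n; i++)
        if (sizes[i] >= want)
            return i;
    return NO_FIT;
}

/* Best fit: scan everything and keep the smallest block that is large enough. */
static size_t best_fit(const size_t *sizes, size_t n, size_t want)
{
    size_t best = NO_FIT;

    assert(sizes != NULL);
    for (size_t i = 0; i < n; i++)
        if (sizes[i] >= want && (best == NO_FIT || sizes[i] < sizes[best]))
            best = i;
    return best;
}
```

With free blocks of sizes 8, 3, 5 and 4 and a request for 4, first fit picks the block of size 8 at index 0, while best fit scans the whole array and picks the exact match at index 3.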
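The conversion from a byte count to header-sized units is the rounding formula GC_malloc uses to compute num_units. Here it is in isolation as a sketch; hdr_t and bytes_to_units are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the article's header_t. */
typedef struct hdr {
    unsigned int size;
    struct hdr *next;
} hdr_t;

/*
 * Round the request up to a whole number of header-sized units, then add
 * one more unit for the header itself.
 */
static size_t bytes_to_units(size_t alloc_size)
{
    assert(alloc_size <= SIZE_MAX - sizeof(hdr_t));   /* avoid overflow */
    return (alloc_size + sizeof(hdr_t) - 1) / sizeof(hdr_t) + 1;
}
```

A request for a single byte therefore costs two units: one for the payload, rounded up, and one for the header.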

static header_t base;           /* Zero sized block to get us started. */
static header_t *usedp, *freep;

/*
 * Find a chunk from the free list and put it in the used list.
 */
void *
GC_malloc(size_t alloc_size)
{
    size_t num_units;
    header_t *p, *prevp;

    num_units = (alloc_size + sizeof(header_t) - 1) / sizeof(header_t) + 1;
    prevp = freep;

    for (p = prevp->next;; prevp = p, p = p->next) {
        if (p->size >= num_units) { /* Big enough. */
            if (p->size == num_units) /* Exact size. */
                prevp->next = p->next;
            else {
                p->size -= num_units;
                p += p->size;
                p->size = num_units;
            }

            freep = prevp;

            /* Add p to the used list. */
            if (usedp == NULL)
                usedp = p->next = p;
            else {
                p->next = usedp->next;
                usedp->next = p;
            }

            return (void *) (p + 1);
        }
        if (p == freep) { /* Not enough memory. */
            p = morecore(num_units);
            if (p == NULL) /* Request for more memory failed. */
                return NULL;
        }
    }
}



This function only works if freep = &base has been set before its first use. We will set this in our initialization function.

Although our code does nothing about memory fragmentation, it works. And since it works, we can move on to the interesting part: garbage collection!

Mark and sweep

We said the garbage collector would be simple, so we will use the simplest method there is: mark and sweep. The algorithm has two parts:

First, we scan every region of memory that might contain a variable pointing to heap data. For each word-sized chunk of data in those regions, we walk the used list and check whether the word points into one of its blocks. If it does, we mark that block.

Second, after scanning all of those regions, we walk the used list and move every unmarked block to the free list.

Many people assume that garbage collection cannot be implemented in C with a simple malloc-like function, because so much information is unavailable from outside a function. For example, C has no function that returns a hash map of all the variables allocated on the stack. But we can get around this once we realize two important facts:

First, in C you can attempt to access any memory address you like. There is no chunk of data that the compiler can access but whose address cannot be expressed as an integer and then cast to a pointer. If memory is used by a C program, the program can access it. This can be confusing for programmers unfamiliar with C, because many languages restrict access to virtual memory, but C does not.

Second, all variables are stored somewhere in memory. This means that if we know roughly where variables are stored, we can walk through that memory and find every possible value of every variable. And because memory accesses are usually word-aligned, we only need to look at each word in those regions.

Local variables can also be stored in registers, but we will not worry about this: registers are usually dedicated to local variables, and by the time our function is called they will probably have been saved on the stack anyway.

Now we have a strategy for the mark phase: walk through a series of memory regions and check each word to see whether it might point into the used list. Writing such a function is quite straightforward:
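The two phases can be tried out on a toy model before dealing with real memory. In the sketch below, objects live in a fixed array and refer to each other by index; toy_obj_t, toy_mark and toy_sweep are hypothetical names, and this is a simplified illustration rather than the collector built in this article.

```c
#include <assert.h>
#include <stddef.h>

#define NOBJ 4

/* A toy "heap object": it may reference one other object by index. */
typedef struct {
    int in_use;   /* 1 while allocated */
    int marked;   /* set during the mark phase */
    int ref;      /* index of a referenced object, or -1 for none */
} toy_obj_t;

/* Mark phase: follow the chain of references starting from one root index. */
static void toy_mark(toy_obj_t *objs, int root)
{
    assert(objs != NULL);
    while (root >= 0 && root < NOBJ && objs[root].in_use && !objs[root].marked) {
        objs[root].marked = 1;
        root = objs[root].ref;
    }
}

/* Sweep phase: free every unmarked object and clear all marks. Returns the number freed. */
static int toy_sweep(toy_obj_t *objs)
{
    int freed = 0;

    assert(objs != NULL);
    for (int i = 0; i < NOBJ; i++) {
        if (objs[i].in_use && !objs[i].marked) {
            objs[i].in_use = 0;
            freed++;
        }
        objs[i].marked = 0;
    }
    return freed;
}
```

An object that is referenced only by another unreachable object is still freed, because marking starts from the roots and never visits it.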
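Treating every word as a possible pointer can be sketched without any allocator at all. The function below, whose name count_possible_pointers is an assumption for illustration, walks a region word by word and counts the values that fall inside a given address range. It uses uintptr_t so that it also compiles on 64-bit systems, unlike the article's 32-bit code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the words in [start, end) whose value, read as an address, falls
 * inside the byte range [lo, hi).
 */
static size_t count_possible_pointers(const uintptr_t *start, const uintptr_t *end,
                                      const void *lo, const void *hi)
{
    size_t hits = 0;

    assert(start <= end);
    for (const uintptr_t *wp = start; wp < end; wp++)
        if (*wp >= (uintptr_t) lo && *wp < (uintptr_t) hi)
            hits++;
    return hits;
}
```

A scan like this cannot tell a real pointer from an integer that happens to have the same value, so it may keep some garbage alive, but it will not miss a word-aligned pointer stored in the scanned region.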

#define UNTAG(p) (((unsigned int) (p)) & 0xfffffffc)

/*
 * Scan a region of memory and mark any items in the used list appropriately.
 * Both arguments should be word aligned.
 */
static void
mark_from_region(unsigned int *sp, unsigned int *end)
{
    header_t *bp;

    for (; sp < end; sp++) {
        unsigned int v = *sp;
        bp = usedp;
        do {
            /* Compare as integers: v is a candidate address, not a typed pointer. */
            if ((unsigned int) (bp + 1) <= v &&
                (unsigned int) (bp + 1 + bp->size) > v) {
                bp->next = (header_t *) (((unsigned int) bp->next) | 1);
                break;
            }
        } while ((bp = (header_t *) UNTAG(bp->next)) != usedp);
    }
}



To keep the header at just two words, we use a technique called tagged pointers. Because the address stored in the header's next pointer is always word-aligned, its lowest few bits are always zero. We can therefore use the lowest bit of the next pointer to record whether the current block has been marked.

Now we can scan memory regions, but which regions should we scan? We want the following:

  1. The BSS (uninitialized data segment) and the initialized data segment. These contain all of the program's global and static variables, any of which may reference something in the heap, so both segments must be scanned.
  2. The used blocks. If the user's allocated memory points to another allocated block, we certainly don't want to free the block it points to.
  3. The stack. Because the stack holds all local variables, it is arguably the most important region to scan.

We already know everything there is to know about the heap, so writing a mark_from_heap function is simple:
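Here is a small sketch of the tagging trick on an ordinary pointer. The helper names tag_ptr, is_tagged and untag_ptr are hypothetical, and uintptr_t is used instead of the article's unsigned int so that the example is not tied to 32-bit targets. Like the article's UNTAG macro, untag_ptr clears the two lowest bits.

```c
#include <assert.h>
#include <stdint.h>

/* Set the lowest bit. The pointer must be at least 4-byte aligned. */
static void *tag_ptr(void *p)
{
    assert(((uintptr_t) p & 3) == 0);
    return (void *) ((uintptr_t) p | 1);
}

/* Report whether the lowest bit is set. */
static int is_tagged(const void *p)
{
    return (int) ((uintptr_t) p & 1);
}

/* Clear the two lowest bits to recover the original address. */
static void *untag_ptr(void *p)
{
    return (void *) ((uintptr_t) p & ~(uintptr_t) 3);
}
```

A tagged pointer must never be dereferenced directly; it has to pass through untag_ptr first, which is why the article's code wraps every traversal of next in UNTAG.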

/*
 * Scan the marked blocks for references to other unmarked blocks.
 */
static void
mark_from_heap(void)
{
    unsigned int *vp;
    header_t *bp, *up;

    for (bp = (header_t *) UNTAG(usedp->next); bp != usedp;
         bp = (header_t *) UNTAG(bp->next)) {
        if (!((unsigned int) bp->next & 1))
            continue;   /* Only scan blocks that are already marked. */
        for (vp = (unsigned int *) (bp + 1);
             vp < (unsigned int *) (bp + bp->size + 1);
             vp++) {
            unsigned int v = *vp;
            up = (header_t *) UNTAG(bp->next);
            do {
                if (up != bp &&
                    (unsigned int) (up + 1) <= v &&
                    (unsigned int) (up + 1 + up->size) > v) {
                    up->next = (header_t *) (((unsigned int) up->next) | 1);
                    break;
                }
            } while ((up = (header_t *) UNTAG(up->next)) != bp);
        }
    }
}



Fortunately, most modern Unix linkers export the symbols etext and end. The address of etext is the start of the initialized data segment (the first address past the text segment, which contains the program's machine code), and the address of end is the start of the heap. The BSS and initialized data segments therefore lie between &etext and &end. This is simple enough, though not platform-independent.

The stack is a little trickier. The top of the stack is easy to find with a bit of inline assembly, since it is stored in the SP register. However, we will actually use the BP register, since doing so skips a few local variables.

Finding the bottom of the stack (where the stack begins) takes some trickery. For security reasons, the kernel tends to randomize the start of the stack, so we cannot hard-code an address. To be honest, I am no expert on finding the bottom of the stack, but I do have a few ideas for locating it precisely. One is to scan up the call stack for the env pointer, which is passed to main as an argument. Another is to read each successively higher address starting from the top of the stack and handle the inevitable SIGSEGV. We will do neither. Instead, we rely on the fact that Linux exposes the bottom of the stack as a string in a file in the process's entry under the proc directory. This sounds silly and terribly indirect. Fortunately, I don't feel ridiculous doing it, because it is exactly what Boehm GC does to find the bottom of the stack.

Now we can write a simple initialization function. It opens the proc file and finds the bottom of the stack, which is the 28th value in the file, so we skip the first 27 values. Boehm GC differs from us in that it reads the file using only raw system calls, so that the standard library does not touch the heap, but we don't care about that.
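To check this layout on a Linux system, the sketch below declares the two linker symbols and tests whether an address falls between them. The helpers data_region_size and in_data_region are illustrative names; the symbols etext and end are the ones described above, and the example assumes a linker that provides them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

extern char etext, end;   /* provided by the linker on Linux */

/* Size in bytes of the region between &etext and &end. */
static size_t data_region_size(void)
{
    assert((uintptr_t) &etext < (uintptr_t) &end);
    return (size_t) ((uintptr_t) &end - (uintptr_t) &etext);
}

/* Return 1 if p lies in [&etext, &end), which is where globals and statics live. */
static int in_data_region(const void *p)
{
    uintptr_t a = (uintptr_t) p;

    return a >= (uintptr_t) &etext && a < (uintptr_t) &end;
}
```

An initialized static variable should fall inside this region, while a local variable on the stack should not.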
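The inline assembly used in this article reads the x86 base pointer and so only works on that architecture. As an alternative sketch, not the article's method, GCC and Clang provide the builtin __builtin_frame_address, which returns the frame address of the current function without any assembly. The wrapper name current_frame_address is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the frame address of this function (or of its caller, if the
 * compiler inlines it). On a downward-growing stack such as x86's, the
 * frames of all callers lie at higher addresses.
 */
static void *current_frame_address(void)
{
    void *fp = __builtin_frame_address(0);   /* GCC/Clang builtin */

    assert(fp != NULL);
    return fp;
}
```

A collector could scan from this address up to the bottom of the stack, in the same way the article scans from the value of the BP register.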
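One caveat with skipping fields by format string is that the second field of /proc/self/stat, the command name in parentheses, can itself contain spaces. The sketch below is an alternative under that assumption: it reads the whole line, jumps past the last closing parenthesis, and then counts space-separated fields up to the 28th. The function name read_stack_start is hypothetical, and unlike Boehm GC this version freely uses the standard library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return field 28 of /proc/self/stat (the start of the stack), or 0 on failure. */
static unsigned long read_stack_start(void)
{
    char buf[1024];
    char *p, *tok;
    size_t n;
    int field = 2;   /* fields 1 and 2 end at the last ')' */
    FILE *fp = fopen("/proc/self/stat", "r");

    if (fp == NULL)
        return 0;
    n = fread(buf, 1, sizeof buf - 1, fp);
    fclose(fp);
    assert(n < sizeof buf);
    buf[n] = '\0';

    p = strrchr(buf, ')');   /* the command name may contain spaces */
    if (p == NULL)
        return 0;

    /* The first token after ')' is field 3. */
    for (tok = strtok(p + 1, " "); tok != NULL; tok = strtok(NULL, " ")) {
        field++;
        if (field == 28)
            return strtoul(tok, NULL, 10);
    }
    return 0;
}
```

On a typical Linux system the value returned is higher than the address of any local variable, since the stack grows downward from its start.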

#include <assert.h>
#include <stdio.h>

/* The article never declares stack_bottom; a file-scope variable is assumed here. */
static unsigned long stack_bottom;

/*
 * Find the absolute bottom of the stack and set stuff up.
 */
void
GC_init(void)
{
    static int initted;
    FILE *statfp;

    if (initted)
        return;

    initted = 1;

    statfp = fopen("/proc/self/stat", "r");
    assert(statfp != NULL);
    fscanf(statfp,
           "%*d %*s %*c %*d %*d %*d %*d %*d %*u "
           "%*lu %*lu %*lu %*lu %*lu %*lu %*ld %*ld "
           "%*ld %*ld %*ld %*ld %*llu %*lu %*ld "
           "%*lu %*lu %*lu %lu", &stack_bottom);
    fclose(statfp);

    usedp = NULL;
    base.next = freep = &base;
    base.size = 0;
}



Now we know the location of every memory region we need to scan, so we can finally write the collection function that the user calls explicitly:

/*
 * Mark blocks of memory in use and free the ones not in use.
 */
void
GC_collect(void)
{
    header_t *p, *prevp, *tp;
    unsigned long stack_top;
    extern char end, etext; /* Provided by the linker. */

    if (usedp == NULL)
        return;

    /* Scan the BSS and initialized data segments. */
    mark_from_region((unsigned int *) &etext, (unsigned int *) &end);

    /* Scan the stack. */
    asm volatile ("movl %%ebp, %0" : "=r" (stack_top));
    mark_from_region((unsigned int *) stack_top, (unsigned int *) stack_bottom);

    /* Mark from the heap. */
    mark_from_heap();

    /* And now we collect! */
    for (prevp = usedp, p = (header_t *) UNTAG(usedp->next);;
         prevp = p, p = (header_t *) UNTAG(p->next)) {
    next_chunk:
        if (!((unsigned int) p->next & 1)) {
            /*
             * The chunk hasn't been marked. Thus, it must be set free.
             */
            tp = p;
            p = (header_t *) UNTAG(p->next);
            add_to_free_list(tp);

            if (usedp == tp) {
                usedp = NULL;
                break;
            }

            prevp->next = (header_t *) ((unsigned int) p |
                                        ((unsigned int) prevp->next & 1));
            goto next_chunk;
        }
        p->next = (header_t *) (((unsigned int) p->next) & ~1);
        if (p == usedp)
            break;
    }
}




And there you have it, friends: a garbage collector written in C, for C. The code is not complete as it stands and needs some fine-tuning to actually work, but most of it stands on its own.

Summary

From elementary school through high school, I took drum lessons. Every Wednesday I had a lesson with a great teacher.

Whenever I was learning a new groove or beat, my teacher would give me the same warning: I was trying to do everything at once. I would look at the sheet music and simply try to play it with both hands, and I couldn't. The reason was that I didn't yet know how to play the groove, and I was trying to learn everything else at the same time instead of just practicing it.

So my teacher taught me how to learn: don't try to do everything at once. Learn to play the part for your right hand. Once you have that down, learn the part for your left hand. Learn the bass, the tambourine, and the other parts the same way. Once you can play each part on its own, start practicing them together: first two at a time, then three, until finally you can play all the parts at once.

I never got very good at the drums, but I have always carried that lesson into my programming. It is very hard to write out a complete program from start to finish in one go. The only algorithm for programming is divide and conquer. First write the function that allocates memory, then the function that scans memory, then the function that cleans up memory. Finally, put them together.

Once you get over this hurdle as a programmer, nothing in practice is hard anymore. There may be an algorithm you don't understand, but anyone with enough time can work through it with papers or books. If a project looks daunting, break it into completely separate parts. You may not know how to write an interpreter, but you can certainly write a parser. Then see what else you need to add.
