"Reprint" GC basic algorithm and C++GC mechanism

Source: Internet
Author: User

Contents

    • Preface
    • Basic concepts
      • Directed reachability graph and root set
    • Three basic garbage collection algorithms and their improvements
      • 1. Reference counting algorithm
      • 2. Mark & Sweep algorithm
      • 3. Node copying algorithm
      • Generational recycling
    • C++ garbage collection mechanism

Preface

A garbage collector is a dynamic storage allocator that automatically frees allocated blocks the program no longer needs; such blocks are known as garbage. From the programmer's point of view, garbage is an object that is no longer referenced. The process of automatically reclaiming garbage is called garbage collection (GC). In a language that supports garbage collection, programs explicitly request memory but never need to free it explicitly. The garbage collector periodically identifies garbage blocks and returns them to the free list. The C malloc package, by contrast, is not a GC-capable allocator: the programmer calls malloc to allocate memory and must call free to release it. Languages such as Java and C# provide a garbage collector. This article introduces some common GC algorithms and briefly touches on the GC mechanism of C++.

Basic concepts

Directed reachability graph and root set

The garbage collector treats memory as a directed reachability graph. The nodes of the graph fall into two groups: root nodes, which correspond to locations outside the heap (registers, variables on the stack, or global variables in the read-write data region of virtual memory), and heap nodes, each of which corresponds to an allocated block in the heap. An edge from one node to another means that some location in the first points into the second.

When there is a path from some root node to a heap node, that heap node is reachable; otherwise it is unreachable. Unreachable heap nodes are garbage. The goal of garbage collection is therefore to find the heap nodes that cannot be reached from the root set and free them.
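The reachability test can be sketched in a few lines. This is a minimal illustration, not how any particular collector stores its graph: the HeapNode type, the use of vector indices in place of pointers, and the find_reachable function are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a heap node lists the indices of the heap nodes it references.
struct HeapNode {
    std::vector<std::size_t> refs;
};

// Returns a vector where entry i is true iff heap node i is reachable from a root.
std::vector<bool> find_reachable(const std::vector<HeapNode>& heap,
                                 const std::vector<std::size_t>& roots) {
    std::vector<bool> reachable(heap.size(), false);
    std::vector<std::size_t> worklist(roots.begin(), roots.end());
    while (!worklist.empty()) {
        std::size_t i = worklist.back();
        worklist.pop_back();
        assert(i < heap.size());
        if (reachable[i]) continue;          // already visited
        reachable[i] = true;
        for (std::size_t next : heap[i].refs) worklist.push_back(next);
    }
    return reachable;
}
```

Every node whose entry stays false after the traversal is garbage in the sense defined above, including nodes that only reference each other.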

Three basic garbage collection algorithms and their improvements

Garbage collection is an important and active research area: work on it began in the 1960s and has never stopped. The common garbage collection algorithms fall into the following types:

1. Reference counting algorithm

The reference counting algorithm is the only GC algorithm that does not use the concept of a root set. The basic idea is to add a counter to each object that records the number of references to that object. Each time a new reference is made to point to the object, the counter is incremented by one; conversely, when a reference to the object is cleared or redirected to another object, the counter is decremented by one. When the counter reaches 0, the object is deleted automatically. For this idea in practice, see C++ reference counting techniques and simple smart pointer implementations.
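A minimal hand-written sketch of the counter described above. The Counted type, the live_objects bookkeeping variable, and the helper names make_counted, retain, release, and assign are all invented for this illustration; real smart pointers wrap the same steps in constructors, assignment operators, and destructors.

```cpp
#include <cassert>

// Hypothetical object that carries its own reference counter.
struct Counted {
    explicit Counted(int v) : ref_count(0), value(v) {}
    int ref_count;
    int value;
};

int live_objects = 0;  // bookkeeping for the example only

Counted* make_counted(int v) {
    ++live_objects;
    return new Counted(v);
}

// A new reference points at the object: counter goes up by one.
void retain(Counted* obj) {
    if (obj) ++obj->ref_count;
}

// A reference goes away: counter goes down by one, object dies at 0.
void release(Counted* obj) {
    if (!obj) return;
    assert(obj->ref_count > 0);
    if (--obj->ref_count == 0) {
        delete obj;
        --live_objects;
    }
}

// Redirect a reference. Retaining first keeps self-assignment safe.
void assign(Counted*& slot, Counted* target) {
    retain(target);
    release(slot);
    slot = target;
}
```

Calling assign(slot, nullptr) corresponds to clearing a reference; the object is deleted at the moment its last reference is cleared.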

The advantage of the reference counting algorithm is that it is simple to implement and can easily be added to native languages that do not support GC. Another advantage of this mechanism is immediate reclamation: an object is released the instant it is no longer referenced. The first disadvantage is that objects with circular references cannot be freed: if two objects reference each other, neither counter ever drops to 0, even when nothing else refers to them.
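The circular reference problem can be reproduced with std::shared_ptr, which is reference counted. This is only a sketch: the Node type, the live_nodes counter, and the demo_cycle function are assumptions for the example, and the std::weak_ptr is used here purely as an observer so that the demo can break the cycle afterwards instead of leaking.

```cpp
#include <cassert>
#include <memory>

int live_nodes = 0;  // bookkeeping for the example only

struct Node {
    Node() { ++live_nodes; }
    ~Node() { --live_nodes; }
    std::shared_ptr<Node> next;
};

struct CycleReport {
    int alive_after_drop;   // nodes alive once all outside references are gone
    int alive_after_break;  // nodes alive once the cycle is broken by hand
};

CycleReport demo_cycle() {
    std::weak_ptr<Node> observer;  // does not contribute to the count
    {
        std::shared_ptr<Node> a = std::make_shared<Node>();
        std::shared_ptr<Node> b = std::make_shared<Node>();
        a->next = b;
        b->next = a;               // a and b now keep each other alive
        observer = a;
    }                              // outside references dropped here
    CycleReport report;
    report.alive_after_drop = live_nodes;
    if (std::shared_ptr<Node> a = observer.lock()) {
        a->next.reset();           // break the cycle manually
    }
    report.alive_after_break = live_nodes;
    return report;
}
```

Both nodes survive the end of the inner scope because each still holds one reference to the other; they are destroyed only after one link of the cycle is reset.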

The second disadvantage is that when multiple threads increment or decrement a reference count at the same time, the count can become inconsistent. A concurrency control mechanism is needed to prevent this, which adds its own overhead.
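One common way to keep a shared counter consistent is to make it atomic. The sketch below assumes a hypothetical SharedBlock type with helper functions atomic_retain, atomic_release, and stress_counter; the memory orderings follow a conventional pattern for reference counts and are a choice made for this example, not something the article prescribes.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical block whose counter may be touched by several threads.
struct SharedBlock {
    std::atomic<int> ref_count;
    SharedBlock() : ref_count(1) {}  // the creator holds the first reference
};

void atomic_retain(SharedBlock& b) {
    b.ref_count.fetch_add(1, std::memory_order_relaxed);
}

// Returns true when the caller has just dropped the last reference.
bool atomic_release(SharedBlock& b) {
    int previous = b.ref_count.fetch_sub(1, std::memory_order_acq_rel);
    assert(previous > 0);
    return previous == 1;
}

// Several threads retain and release the same block; returns the final count.
int stress_counter(int threads, int iterations) {
    SharedBlock block;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&block, iterations] {
            for (int i = 0; i < iterations; ++i) {
                atomic_retain(block);
                atomic_release(block);
            }
        });
    }
    for (std::thread& th : pool) th.join();
    return block.ref_count.load();
}
```

With a plain int counter the same loop would be a data race and could lose updates; the atomic operations avoid that at the cost of more expensive increments and decrements.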

2. Mark & Sweep algorithm

This algorithm, also known as mark-and-clear, was originally proposed by McCarthy. It is also widely regarded as one of the most effective GC schemes. A Mark & Sweep garbage collector consists of a mark phase and a sweep phase: the mark phase marks every node reachable from the root nodes, and the sweep phase frees every allocated block that is not marked. Typically, one of the spare low-order bits in the block header is used to indicate whether the block is marked. With Mark & Sweep, memory is allocated on demand; when there is not enough memory left to satisfy a request, the collector starts from the references held in registers and on the program stack, traverses the reachability graph described above and marks what it finds (mark phase), then walks the whole memory space once more and frees every unmarked object (sweep phase). Garbage collection therefore has to interrupt the normal program, and when the program uses a lot of memory and has many objects the pause can be fairly long. The collector can also run as a separate thread that keeps updating the graph and reclaiming garbage. Unlike reference counting, this algorithm does not reclaim memory immediately, but it does solve reference counting's circular reference problem, so some languages combine reference counting with Mark & Sweep to form their GC mechanism.
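The two phases can be sketched on a toy heap. Everything here is an assumption made for illustration: the Block type, the use of indices instead of real pointers, a bool field standing in for the mark bit in the block header, and the function names mark, sweep, and mark_sweep_collect.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical heap block; `marked` plays the role of the header mark bit.
struct Block {
    bool allocated = false;
    bool marked = false;
    std::vector<std::size_t> refs;  // indices of blocks this block points to
};

// Mark phase: flag block i and everything reachable from it.
void mark(std::vector<Block>& heap, std::size_t i) {
    assert(i < heap.size());
    if (!heap[i].allocated || heap[i].marked) return;
    heap[i].marked = true;
    for (std::size_t next : heap[i].refs) mark(heap, next);
}

// Sweep phase: free every allocated block that was not marked and
// clear the marks for the next collection. Returns the number freed.
std::size_t sweep(std::vector<Block>& heap) {
    std::size_t freed = 0;
    for (Block& b : heap) {
        if (b.allocated && !b.marked) {
            b.allocated = false;
            b.refs.clear();
            ++freed;
        }
        b.marked = false;
    }
    return freed;
}

std::size_t mark_sweep_collect(std::vector<Block>& heap,
                               const std::vector<std::size_t>& roots) {
    for (std::size_t r : roots) mark(heap, r);
    return sweep(heap);
}
```

Because marking starts from the roots rather than from counters, two unreachable blocks that reference each other are never marked and are freed in the sweep.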

3. Node copying algorithm

The disadvantage of the Mark & Sweep algorithm is that when a large number of objects have been allocated and most of them need to be reclaimed, the collection pause can be very expensive. The node copying algorithm is just the opposite: the more objects need to be reclaimed, the cheaper it is, and it becomes expensive when most objects do not need to be reclaimed.
The basic idea of the algorithm is this: starting from the root nodes, every referenced object is copied to a new storage area; the remaining objects are no longer referenced, that is, they are garbage, and stay behind in the original storage area. To release memory, the collector simply frees the original storage area and carries on with the new one.

It follows that when many objects are still referenced (that is, are not garbage), many objects have to be copied to the new storage area.
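A minimal sketch of the copying idea, under several assumptions: the two storage areas are modeled as vectors, references are indices, and the names Obj, kNotCopied, copy_object, and copy_collect are invented for the example. A table of forwarding indices ensures each object is copied once, even when objects reference each other.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical object; refs are indices into the same storage area.
struct Obj {
    int value = 0;
    std::vector<std::size_t> refs;
};

const std::size_t kNotCopied = static_cast<std::size_t>(-1);

// Copies object i, and everything it references, from `from` into `to`.
// forward[i] remembers where object i now lives in the new area.
std::size_t copy_object(const std::vector<Obj>& from, std::vector<Obj>& to,
                        std::vector<std::size_t>& forward, std::size_t i) {
    assert(i < from.size());
    if (forward[i] != kNotCopied) return forward[i];  // already copied
    std::size_t new_index = to.size();
    forward[i] = new_index;
    to.push_back(Obj());
    to[new_index].value = from[i].value;
    for (std::size_t ref : from[i].refs) {
        std::size_t moved = copy_object(from, to, forward, ref);
        to[new_index].refs.push_back(moved);  // re-index: `to` may have grown
    }
    return new_index;
}

// Copies everything reachable from the roots, rewrites the roots, and
// replaces the old area, so the garbage left in it is released as a whole.
void copy_collect(std::vector<Obj>& space, std::vector<std::size_t>& roots) {
    std::vector<Obj> to;
    std::vector<std::size_t> forward(space.size(), kNotCopied);
    for (std::size_t& r : roots) r = copy_object(space, to, forward, r);
    space.swap(to);
}
```

The work done is proportional to the number of surviving objects, not to the amount of garbage, which matches the cost trade-off described above.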

Generational recycling

Each of the three basic algorithms above has its own advantages and disadvantages, and each has many improved variants. Combining the three yields more advanced schemes, the most important of which is generational collection. Its basic idea is this: programs contain many objects that are released soon after they are allocated, whereas an object that has survived for a long time is likely to have a long lifetime, so trying to collect it is mostly wasted effort. To make the GC more efficient, we should concentrate on scanning newly created objects, which lets us reclaim most of the garbage. To do so, objects are divided into generations by "age": objects that have survived for a long time are placed in the old generation, and an implementation may use more than two generations.

One possible collection strategy: first run a regular scan from the roots, but do not recurse into old-generation objects encountered along the way, which greatly reduces the amount of scanning. This pass can use either the mark-and-sweep algorithm or the copying algorithm. Then promote the objects that survive the scan to the old generation: with mark-and-sweep, set a flag on each object to record its age; with copying, it is enough to treat the objects in the new storage area as the old generation. Actual implementations of generational collection differ widely and often combine several of the basic algorithms.
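The strategy just described, in its mark-and-sweep form, might look like the following sketch. The GenObj type, its old flag, and the functions mark_young and minor_collect are assumptions for the example. The sketch also deliberately ignores one thing the article does not cover: a young object referenced only from an old object would be missed here, and real collectors track such references separately.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical object with a generation flag.
struct GenObj {
    bool allocated = false;
    bool old = false;      // set once the object has survived a collection
    bool marked = false;
    std::vector<std::size_t> refs;
};

// Marks young objects reachable from i; old objects are not traversed.
void mark_young(std::vector<GenObj>& heap, std::size_t i) {
    assert(i < heap.size());
    GenObj& o = heap[i];
    if (!o.allocated || o.old || o.marked) return;
    o.marked = true;
    for (std::size_t next : o.refs) mark_young(heap, next);
}

// Collects only the young generation: unmarked young objects are freed,
// marked ones are promoted to the old generation. Returns the number freed.
std::size_t minor_collect(std::vector<GenObj>& heap,
                          const std::vector<std::size_t>& roots) {
    for (std::size_t r : roots) mark_young(heap, r);
    std::size_t freed = 0;
    for (GenObj& o : heap) {
        if (!o.allocated || o.old) continue;  // old generation is left alone
        if (o.marked) {
            o.old = true;                     // survivor: promote
            o.marked = false;
        } else {
            o.allocated = false;
            o.refs.clear();
            ++freed;
        }
    }
    return freed;
}
```

After one collection the survivors are flagged as old, so a second call skips them entirely and only has to look at objects allocated since.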

There are many other improved algorithms, but most of them are based on the three basic algorithms above.

C++ garbage collection mechanism

The C++ language itself does not provide a GC mechanism, while C++0x provides smart pointers based on the reference counting algorithm for memory management. There are also garbage collection libraries that are not part of the C++ standard, such as the well-known Boehm library. Other algorithms can also be used to implement GC for C/C++, such as the Mark & Sweep algorithm described earlier.

When an application wants a block of memory from the heap, it calls malloc in the usual way. When malloc cannot find a suitable free block, it calls the garbage collector in the hope of reclaiming some garbage to the free list. The garbage collector identifies the garbage blocks and returns them to the heap by calling free. In this way the garbage collector calls free on our behalf, so we allocate explicitly but never have to free explicitly.
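The control flow of this arrangement (try to allocate, collect on failure, try again) can be sketched with a toy allocator. Nothing here is a real malloc or a real collector: ToyAllocator, kNoBlock, and the is_garbage callback are hypothetical, and the callback simply stands in for whatever reachability analysis the collector performs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

const std::size_t kNoBlock = static_cast<std::size_t>(-1);

// Hypothetical allocator over a fixed number of equally sized blocks.
class ToyAllocator {
public:
    // is_garbage(i) stands in for the collector's decision about block i.
    ToyAllocator(std::size_t block_count,
                 std::function<bool(std::size_t)> is_garbage)
        : in_use_(block_count, false), is_garbage_(is_garbage) {
        assert(static_cast<bool>(is_garbage_));
    }

    // Returns a block index, or kNoBlock if the heap is full even
    // after a collection.
    std::size_t allocate() {
        std::size_t block = find_free();
        if (block == kNoBlock) {
            collect();              // no free block: run the collector
            block = find_free();    // and retry once
        }
        if (block != kNoBlock) in_use_[block] = true;
        return block;
    }

private:
    std::size_t find_free() const {
        for (std::size_t i = 0; i < in_use_.size(); ++i)
            if (!in_use_[i]) return i;
        return kNoBlock;
    }

    // The collector, not the caller, returns garbage blocks to the free pool.
    void collect() {
        for (std::size_t i = 0; i < in_use_.size(); ++i)
            if (in_use_[i] && is_garbage_(i)) in_use_[i] = false;
    }

    std::vector<bool> in_use_;
    std::function<bool(std::size_t)> is_garbage_;
};
```

The caller only ever calls allocate; blocks come back into circulation when a later allocation fails and the collector decides they are garbage.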

A garbage collector for C/C++ is a conservative garbage collector. Conservative means that every reachable block is correctly marked as reachable, while some unreachable blocks may be incorrectly marked as reachable. The root cause is that C/C++ does not tag memory locations with any type information: given a word that holds an integer value, the language has no explicit way of determining whether it is an integer or a pointer. Therefore, if an integer value happens to equal the address of a word in an unreachable block, that unreachable block is marked as reachable. As a result, a garbage collector implemented for C++ is not precise, and some garbage is never collected. A collector such as Java's, in contrast, is precise. According to the article "Lectures on garbage collectors in C++0x", the C++ standard proposal uses keywords such as gc_strict and gc_relax to describe whether a memory region contains pointers, rather than tagging each piece of data. In fact, as early as 2007 a C++ standard proposal (N2670) suggested adding garbage collection to C++, but the proposal was ultimately not adopted, probably because of the complexity of the language itself and limitations such as the one above. So in C++0x, apart from smart pointers such as shared_ptr and weak_ptr, there is no sign of a GC mechanism. How C++ solves the circular reference and concurrency control problems of reference counting will be covered in another article.
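The conservative rule can be illustrated with a small sketch. The BlockRange type and the conservative_mark function are assumptions for the example, and the sketch only scans a list of root words; a real conservative collector would also scan the contents of every block it marks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one allocated block by its address range.
struct BlockRange {
    std::uintptr_t begin;
    std::uintptr_t end;   // one past the last byte of the block
    bool marked;
};

// Treats every word whose value falls inside a block as a pointer into
// that block, because integers and pointers cannot be told apart.
// Returns the number of blocks that end up marked.
std::size_t conservative_mark(const std::vector<std::uintptr_t>& words,
                              std::vector<BlockRange>& blocks) {
    std::size_t marked = 0;
    for (BlockRange& b : blocks) {
        assert(b.begin < b.end);
        for (std::uintptr_t w : words) {
            if (w >= b.begin && w < b.end) {
                b.marked = true;
                break;
            }
        }
        if (b.marked) ++marked;
    }
    return marked;
}
```

If one of the scanned words is really an integer that merely looks like an address inside a block, that block is still kept alive, which is exactly the imprecision described above.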
