Three basic ways of garbage collection (GC)

Source: Internet
Author: User

Garbage is an object that the program no longer needs and that should be reclaimed: once an object can no longer be reached through any direct or indirect reference, it becomes "garbage". The memory it occupies must be released in time, otherwise a "memory leak" results. Some languages require programmers to free memory manually, while others provide a garbage collection (GC) mechanism. This article discusses three basic ways to implement GC.

In fact, these three ways fall into two broad categories: tracing collection and reference counting. The paper "A Unified Theory of Garbage Collection", by David F. Bacon and colleagues at IBM's Watson Research Center, argues that every garbage collection scheme is a combination of these two, and that an improvement to one is inevitably accompanied by a corresponding improvement to the other.

One. Tracing collection

A tracing collector runs independently of the program's own logic and checks for garbage periodically. Its drawback is that each run can pause the program for a relatively long time.

Two. Mark-and-sweep

Mark-and-sweep scans the program's objects twice. The first scan starts from the root: every object referenced by the root is marked as not garbage, every object referenced by those objects is marked as well, and so on recursively until all objects reachable from the root have been marked. The second scan walks over all objects; any object left unmarked by the first scan is garbage and is reclaimed.
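The two scans can be sketched in a few lines of Python. This is a minimal illustration, not a real collector: the `Obj` class, the explicit `heap` list and the `collect` helper are assumptions made for the example.

```python
class Obj:
    """A heap object that may reference other objects."""
    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing references
        self.marked = False  # set during the mark phase


def mark(roots):
    """First scan: mark everything reachable from the roots."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj.marked:
            continue
        obj.marked = True
        stack.extend(obj.refs)


def sweep(heap):
    """Second scan: keep marked objects, drop the rest, reset the marks."""
    survivors = [obj for obj in heap if obj.marked]
    for obj in survivors:
        obj.marked = False
    return survivors


def collect(heap, roots):
    mark(roots)
    return sweep(heap)


a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.refs.append(b)   # root -> a -> b
c.refs.append(d)   # c and d are not reachable from the root
heap = collect([a, b, c, d], roots=[a])
live_names = [obj.name for obj in heap]
```

Note that the sweep has to visit every object in the heap, live or not, which is why the cost of this approach grows with the total heap size.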

Three. Copying collection

Copying collection needs only one scan of the objects. Prepare a "new space", then scan the objects starting from the root; every object that is reached through a reference is copied into the new space. When the scan is complete, the objects in the new space are exactly the non-garbage objects.
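A rough sketch of the single scan, under the assumption that the "new space" is a Python list and that a forwarding table maps each old object to its copy. The scan-pointer loop follows the general shape of Cheney's algorithm; the names `copy_collect`, `forwarding` and `to_space` are made up for this example.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []


def copy_collect(roots):
    """Copy every object reachable from the roots into a new space.

    Returns (new_roots, to_space). Objects are appended to to_space in the
    order they are first reached, and each object is copied exactly once.
    """
    forwarding = {}   # id(old object) -> its copy in the new space
    to_space = []     # the "new space"
    originals = []    # originals[i] is the old object copied to to_space[i]

    def forward(old):
        key = id(old)
        if key not in forwarding:
            new = Obj(old.name)
            forwarding[key] = new
            to_space.append(new)
            originals.append(old)
        return forwarding[key]

    new_roots = [forward(r) for r in roots]
    scan = 0
    while scan < len(to_space):   # one pass over the growing new space
        new = to_space[scan]
        new.refs = [forward(child) for child in originals[scan].refs]
        scan += 1
    return new_roots, to_space


a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.refs = [b, c]
c.refs = [a]       # a cycle is handled by the forwarding table
# d is never referenced, so it is never copied
new_roots, to_space = copy_collect([a])
copied_names = [o.name for o in to_space]
```

Garbage objects such as `d` are never touched at all, so the work done is proportional to the number of live objects rather than to the size of the heap.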

The two approaches have different trade-offs. Mark-and-sweep saves memory but needs two scans and therefore more time; it has the advantage when the proportion of garbage is small. Copying collection is faster but needs an extra region of memory to copy into; it has the advantage when the proportion of garbage is large. Copying collection also has the benefit of "locality".

During copying collection, objects are copied into the new space in the order in which they are referenced. As a result, objects that are closely related are more likely to be placed close together in memory, which is called locality. With high locality, the memory cache works more efficiently and program performance improves.

Mark-and-sweep has a derived algorithm, mark-compact:

In the compaction phase, all reachable objects are moved to one region of the heap and packed together, so that the memory freed from unreachable objects forms one contiguous block. This reduces memory fragmentation.
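The compaction phase could look roughly like the sketch below. It assumes a toy heap modelled as a list of cells, where each cell is a list of the heap indices it refers to, and it assumes the mark phase has already produced the set of reachable indices. The function name `compact` is hypothetical.

```python
def compact(heap, marked):
    """Slide marked cells to the front of the heap and fix up references.

    heap   : list of cells; each cell is a list of heap indices it refers to
    marked : set of indices that the mark phase found reachable
    Returns (new_heap, new_address), where new_address maps old -> new index.
    """
    # Pass 1: compute the new address of every live cell.
    new_address = {}
    for old in range(len(heap)):
        if old in marked:
            new_address[old] = len(new_address)
    # Pass 2: move the cells and rewrite their references.
    new_heap = [None] * len(new_address)
    for old, new in new_address.items():
        new_heap[new] = [new_address[ref] for ref in heap[old]]
    return new_heap, new_address


# Cells 0, 2 and 4 are live; cells 1 and 3 are garbage.
heap = [[2], [3], [4], [1], []]
new_heap, new_address = compact(heap, marked={0, 2, 4})
```

After compaction the live cells occupy indices 0 to 2 with no gaps, and every reference has been rewritten to the target's new position.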

Four. Reference counting

Reference counting keeps, for each object, a count of the references to that object. The count is incremented when a reference to the object is added and decremented when a reference is removed. When an object's reference count drops to zero, the object is reclaimed.
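As a minimal sketch of this bookkeeping, the example below maintains the counts by hand. The `Counted` class and the `incref`, `decref` and `add_ref` helpers are illustrative names, and the `freed` list merely logs which objects have been reclaimed.

```python
class Counted:
    """An object with a manually maintained reference count."""
    def __init__(self, name, freed):
        self.name = name
        self.count = 0
        self.refs = []      # objects this one holds a reference to
        self.freed = freed  # list used as a log of reclaimed object names


def incref(obj):
    obj.count += 1


def decref(obj):
    obj.count -= 1
    if obj.count == 0:
        # Reclaim immediately, and release everything this object referenced.
        obj.freed.append(obj.name)
        for child in obj.refs:
            decref(child)
        obj.refs.clear()


def add_ref(owner, target):
    owner.refs.append(target)
    incref(target)


freed = []
a, b = Counted("a", freed), Counted("b", freed)
incref(a)       # a root variable refers to a
add_ref(a, b)   # a refers to b
decref(a)       # the root variable goes away
```

Dropping the last reference to `a` reclaims `a` at once and, in the same call, `b` as well, with no separate collection pass.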

Pros: The greatest advantage of reference counting is that it is easy to implement; many C++ programmers have implemented a similar mechanism. Second, little memory is wasted: an object is reclaimed as soon as its reference count reaches 0, whereas with other methods the lifetime of an object is hard to predict and garbage lingers longer. In addition, this method produces the shortest pause times.

Cons: The best-known drawback is that objects involved in a circular reference cannot be reclaimed. For example, if three objects refer to each other in a cycle and none of them is referenced from the root, they are already garbage, yet none of their reference counts is 0.
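The three-object cycle can be observed directly in CPython, which combines reference counting with a separate cycle detector. The sketch below assumes CPython specifically; it switches the automatic cycle detector off with `gc.disable()` so that only reference counting is at work, and uses a weak reference as a probe.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.next = None


gc.disable()                 # leave only reference counting active
try:
    a, b, c = Node(), Node(), Node()
    a.next, b.next, c.next = b, c, a   # a -> b -> c -> a
    probe = weakref.ref(a)             # observes a without keeping it alive
    del a, b, c                        # no reference from the root remains
    alive_after_del = probe() is not None
    gc.collect()                       # run the cycle detector explicitly
    alive_after_collect = probe() is not None
finally:
    gc.enable()
```

With reference counting alone the cycle stays alive after `del`; it only disappears once the tracing-style cycle detector runs.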

Another drawback is that reference counting is poorly suited to parallel execution. When multiple threads update a reference count at the same time, the count can end up with an inconsistent value and cause memory errors. Updates to the count must therefore be exclusive, and if references change frequently, the overhead of concurrency control such as locking becomes considerable.
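One way to picture the exclusive update is a count guarded by a lock, as in the sketch below. The `SharedCount` class is hypothetical, and the thread and round counts are arbitrary. Every increment and decrement has to take the lock, which is exactly the overhead described above.

```python
import threading


class SharedCount:
    """A reference count guarded by a lock so that updates are exclusive."""
    def __init__(self):
        self.value = 1
        self._lock = threading.Lock()

    def incref(self):
        with self._lock:
            self.value += 1

    def decref(self):
        with self._lock:
            self.value -= 1
            return self.value


def worker(count, rounds):
    # Each thread repeatedly takes and releases a reference.
    for _ in range(rounds):
        count.incref()
        count.decref()


count = SharedCount()
threads = [threading.Thread(target=worker, args=(count, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_value = count.value
```

Because every update is serialized by the lock, the count returns to its starting value once all threads finish.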

Perl and Python use this GC mechanism.

Derived algorithms

Generational collection

This method exploits a characteristic of programs: most objects become garbage shortly after they are created, while objects that have already survived for a long time tend to live even longer. Newly created objects are collected at high frequency, in what is called a "minor collection", and all objects are collected at low frequency, in a "major collection". After each minor collection, the surviving objects are promoted to the "old generation", and later minor collections skip old-generation objects. Because the proportion of garbage in a minor collection is large, most generational algorithms use copying collection for it.

This approach has a problem: if a young-generation object is referenced by an old-generation object, it is not garbage, so how is a minor collection, which skips the old generation, prevented from reclaiming it? This is where the write barrier comes in.

The program guards every place where an object's contents are modified; this guard is called a write barrier. Write barriers are used not only in generational collection but also in other GC algorithms.

In generational collection, the write barrier maintains a record set that records references from the old generation to the young generation. Suppose there are two objects A and B, and the contents of A are modified to add a reference to B. If (1) A is in the old generation and (2) B is in the young generation, the reference is added to the record set. During a minor collection, because the record set contains a reference to B, B is not treated as garbage.
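The interplay between the write barrier, the record set and a minor collection can be sketched as follows. The `Heap` class and its method names are assumptions for illustration. For brevity the sketch promotes survivors by moving them between lists rather than copying them, and it stores the referencing old object in the record set instead of the individual reference.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.old = False   # True once promoted to the old generation


class Heap:
    def __init__(self):
        self.young = []          # newly created objects
        self.old = []            # objects that survived a minor collection
        self.remembered = set()  # old objects that reference young objects

    def new(self, name):
        obj = Obj(name)
        self.young.append(obj)
        return obj

    def write_ref(self, a, b):
        """Write barrier: every reference store goes through here."""
        a.refs.append(b)
        if a.old and not b.old:
            self.remembered.add(a)

    def minor_collect(self, roots):
        """Trace only young objects; old objects are skipped, not scanned."""
        live = set()
        stack = list(roots)
        for holder in self.remembered:
            stack.extend(holder.refs)   # recorded references act as extra roots
        while stack:
            obj = stack.pop()
            if obj.old or obj in live:
                continue
            live.add(obj)
            stack.extend(obj.refs)
        for obj in self.young:
            if obj in live:
                obj.old = True
                self.old.append(obj)
        self.young = []
        self.remembered.clear()         # every survivor is old now


heap = Heap()
a = heap.new("a")
heap.minor_collect(roots=[a])   # a survives and becomes old
b = heap.new("b")
c = heap.new("c")               # never referenced: garbage
heap.write_ref(a, b)            # old -> young, recorded by the barrier
heap.minor_collect(roots=[a])   # a is old, so it is skipped, not scanned
old_names = [o.name for o in heap.old]
```

In the second minor collection `a` is never scanned, yet `b` survives because the barrier recorded the reference from `a`; `c` is dropped.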

Incremental collection

The algorithm above shortens the average GC pause time, but in programs with strict real-time requirements it is the maximum GC pause time that matters more. In autonomous driving software, for example, a GC pause of 0.1 s could be fatal.

Incremental collection splits a single GC run into several parts. A limit such as "a GC pause of at most 10 ms" is imposed so that GC pause times are as predictable as possible.

However, the reference relationships may change between two parts of a GC run. This algorithm therefore also uses a write barrier to record changes to references. Although this method bounds the maximum pause time, the total GC time increases because the number of pauses increases.
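A minimal sketch of incremental marking with a write barrier follows. As a simplifying assumption, the per-step limit is expressed as a number of objects to scan rather than as a time limit in milliseconds; the `IncrementalMarker` class and its `gray` worklist are illustrative names.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


class IncrementalMarker:
    def __init__(self, roots):
        self.gray = list(roots)   # reached, but references not yet scanned
        for obj in self.gray:
            obj.marked = True

    def step(self, budget):
        """Scan at most `budget` objects, then return control to the program.

        Returns True when marking is finished.
        """
        while self.gray and budget > 0:
            obj = self.gray.pop()
            for child in obj.refs:
                self._shade(child)
            budget -= 1
        return not self.gray

    def _shade(self, obj):
        if not obj.marked:
            obj.marked = True
            self.gray.append(obj)

    def write_ref(self, a, b):
        """Write barrier: a reference stored into an already-marked object
        must not hide its target from the collector."""
        a.refs.append(b)
        if a.marked:
            self._shade(b)


a, b, c = Obj("a"), Obj("b"), Obj("c")
a.refs.append(b)
marker = IncrementalMarker(roots=[a])
done_first = marker.step(budget=1)   # scans a and marks b; b is still pending
marker.write_ref(a, c)               # the program runs between two steps
done_second = marker.step(budget=10)
```

Without the barrier, `c` would be attached to an object the collector has already scanned and would wrongly be treated as garbage.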

Parallel (concurrent) collection

The basic idea is to perform GC work while the program is running, to make full use of the CPU. This approach faces the same problem as incremental collection, so it also needs a write barrier.

However, this approach cannot avoid stopping the original program entirely; in certain GC phases the program still has to be suspended. With the rapid growth of multicore processors, this algorithm is being optimized continually, and collection that never pauses the original program is well worth looking forward to.
