Why CMS uses the mark-sweep algorithm


In a generational GC, the old generation commonly uses mark-sweep, or a mix of mark-sweep and mark-compact: mark-sweep most of the time, switching to mark-compact when fragmentation is estimated to have reached a certain level. The traditional assumption is that old-generation objects may live for a long time, have a high survival rate, or be relatively large, so copying them is not cost-effective and an in-place collection is preferable. Of the three basic algorithms, mark-sweep, mark-compact, and copying, only mark-sweep does not move objects (that is, does not copy them), hence the choice of mark-sweep.
A brief comparison of three basic algorithms:

                 Mark-sweep                      Mark-compact            Copying
Speed            Medium                          Slowest                 Fastest
Space overhead   Low, but fragments accumulate   Low, no fragmentation   Usually twice the live-object size, no fragmentation
Moves objects?   No                              Yes                     Yes


About Time Overhead:
Mark-sweep: the mark phase is proportional to the number of live objects; the sweep phase is proportional to the size of the whole heap.
Mark-compact: the mark phase is proportional to the number of live objects; the compact phase is proportional to the total size of the live objects.
Copying: proportional to the total size of the live objects.
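The asymmetry is easiest to see by sketching the two loops side by side. This is a minimal sketch under assumed object layout and helper functions (none of this is HotSpot code): sweeping must visit every object in the heap, live or dead, while copying only ever touches objects it can reach.

#include <stddef.h>
#include <string.h>

/* Hypothetical object layout and helpers, just for illustration. */
typedef struct Object Object;
struct Object {
    size_t  size;          /* total size of this object in bytes        */
    int     marked;        /* set by the mark phase                     */
    Object *forwarding;    /* non-NULL once the object has been copied  */
    size_t  num_fields;
    Object *fields[];      /* outgoing references                       */
};

void    add_to_free_list(Object *obj);         /* assumed helper             */
Object *allocate_in_to_space(size_t size);     /* assumed bump allocator     */

/* Sweep walks EVERY object between heap_start and heap_end, live or dead,
 * so its cost is proportional to the size of the whole heap. */
void sweep(char *heap_start, char *heap_end) {
    for (char *p = heap_start; p < heap_end; ) {
        Object *obj = (Object *)p;
        p += obj->size;               /* advance past this object            */
        if (obj->marked)
            obj->marked = 0;          /* live: just clear the mark           */
        else
            add_to_free_list(obj);    /* dead: reclaim its space             */
    }
}

/* Copying only ever touches reachable objects, so its cost is proportional
 * to the total size of the live objects, not to the heap size. */
Object *copy(Object *obj) {
    if (obj == NULL) return NULL;
    if (obj->forwarding) return obj->forwarding;    /* already evacuated     */
    Object *dup = allocate_in_to_space(obj->size);
    memcpy(dup, obj, obj->size);
    obj->forwarding = dup;
    for (size_t i = 0; i < dup->num_fields; i++)
        dup->fields[i] = copy(obj->fields[i]);      /* recurse into children */
    return dup;
}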

Comparing the time spent on the individual actions of marking, sweeping, compacting, and copying, the general relationship is:
compaction >= copying > marking > sweeping
and also: marking + sweeping > copying.
(Although both compaction and copying move objects, the details depend on the algorithm: a compactor may first compute each object's target address, then fix up pointers, and only then move the objects, whereas a copying collector can do all of this in one pass, so it can be faster.
Also note that the cost of GC cannot be judged by collector time alone; the allocator side matters too. If memory is guaranteed not to be fragmented, allocation can use pointer bumping and complete with nothing more than advancing a pointer, which is very fast. If there is fragmentation and memory has to be managed with free lists or similar structures, allocation is usually slower.)
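To make the allocator-side point concrete, here is a minimal sketch of the two allocation paths (the globals `top`, `end`, and the free-list layout are assumptions for illustration, not HotSpot's actual allocator): bump-pointer allocation is a bounds check plus an addition, while free-list allocation has to search for a block that fits.

#include <stddef.h>

/* Bump-pointer allocation: possible only when the free space is contiguous,
 * i.e. when the collector never lets fragmentation build up. */
static char *top;   /* start of the remaining free space (assumed globals) */
static char *end;   /* end of the allocatable region                       */

void *bump_alloc(size_t size) {
    if (top + size > end)
        return NULL;          /* would trigger a GC in a real VM            */
    void *result = top;
    top += size;              /* one pointer bump and the allocation is done */
    return result;
}

/* Free-list allocation: the price of leaving fragments in place is that
 * every allocation has to search the list for a block that is big enough. */
typedef struct FreeBlock {
    size_t size;
    struct FreeBlock *next;
} FreeBlock;

static FreeBlock *free_list;

void *freelist_alloc(size_t size) {
    FreeBlock **prev = &free_list;
    for (FreeBlock *b = free_list; b != NULL; prev = &b->next, b = b->next) {
        if (b->size >= size) {        /* first-fit search                   */
            *prev = b->next;          /* unlink the block (no splitting     */
            return b;                 /*  here, to keep the sketch short)   */
        }
    }
    return NULL;                      /* no block fits: GC or grow the heap */
}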

Under the generational hypothesis, the survival rate of young-generation objects in a minor GC should be very low, so the copying algorithm is the most advantageous: its cost is proportional to the size of the live objects, which makes it very fast when few objects survive. The young generation itself should also be relatively small, so even reserving twice the space only wastes a little memory.
For old-generation GCs, on the other hand, the survival rate may be high, and presumably there is not much free space to spare, so copying is not appropriate; one of the other two algorithms is the more likely choice, especially mark-sweep, which does not move objects.

Yet every collector in the HotSpot VM other than CMS moves objects; they are all either copying collectors or mark-compact variants.

================================================================

In HotSpot's serial GC (UseSerialGC) and parallel GC (UseParallelGC), only the full GC collects the old generation (in fact it collects the whole GC heap, including the old generation). The algorithm it uses is mark-compact (see the "Mark-Compact Old Object Collector" section), specifically a typical single-threaded (serial) LISP 2 algorithm. Although in the HotSpot VM source code this full GC implementation class is called MarkSweep, and much of the documentation calls it mark-sweep-compact, it is in fact a typical mark-compact rather than mark-sweep; please be careful not to confuse the two. The naming is historical: a decade or two ago GC terminology had not yet settled on a few accepted usages, and "mark-sweep-compact" and "mark-compact" meant the same thing.
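For reference, a LISP 2 style compactor makes three sliding passes over the heap after marking. The sketch below uses an assumed object layout and heap-iteration helpers (it is not HotSpot's MarkSweep code); it shows why the compact phase has to walk the heap several times and fix pointers before anything moves.

#include <stddef.h>
#include <string.h>

/* Hypothetical object layout and heap-iteration helpers for illustration. */
typedef struct Object Object;
struct Object {
    size_t  size;
    int     marked;
    Object *forwarding;     /* new address, computed in pass 1 */
    size_t  num_fields;
    Object *fields[];       /* outgoing references             */
};

Object *first_object(void);            /* assumed: first object in the heap  */
Object *next_object(Object *obj);      /* assumed: next object, live or dead */
char   *heap_start(void);              /* assumed: bottom of the old gen     */

void lisp2_compact(void) {
    /* Pass 1: assign each live object its forwarding (target) address. */
    char *free_ptr = heap_start();
    for (Object *o = first_object(); o != NULL; o = next_object(o)) {
        if (o->marked) {
            o->forwarding = (Object *)free_ptr;
            free_ptr += o->size;
        }
    }
    /* Pass 2: rewrite every reference field (and, in a real collector,
     * every root) to point at the forwarding addresses. */
    for (Object *o = first_object(); o != NULL; o = next_object(o)) {
        if (o->marked)
            for (size_t i = 0; i < o->num_fields; i++)
                if (o->fields[i] != NULL)
                    o->fields[i] = o->fields[i]->forwarding;
    }
    /* Pass 3: slide the live objects down to their new addresses. */
    for (Object *o = first_object(); o != NULL; ) {
        Object *next = next_object(o);      /* read before o may be clobbered */
        if (o->marked) {
            Object *dst = o->forwarding;
            memmove(dst, o, o->size);       /* destinations never overlap the
                                               objects still to be visited    */
            dst->marked = 0;                /* reset for the next GC cycle    */
            dst->forwarding = NULL;
        }
        o = next;
    }
}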

I am not entirely sure why the HotSpot VM chose to implement its full GC with the mark-compact algorithm in the first place, rather than using mark-sweep as the basic algorithm, the way the Microsoft CLR later did for its Gen 2 GC. But the reasoning behind it is probably not complicated:
The HotSpot VM's predecessor, the Strongtalk VM, also implemented its full GC with mark-compact, though with a different specific algorithm from HotSpot's: a threaded-compaction algorithm. That algorithm saves space but comes with many restrictions and a rather roundabout implementation, so when HotSpot came along it switched to the simpler and more intuitive LISP 2 algorithm, a decision that is also reflected later in V8.
The Strongtalk VM's own predecessor, the Self VM, likewise used mark-compact for its full GC. You can see that mark-compact runs through this whole lineage of VMs, continuing all the way to Google's newer V8. Perhaps the HotSpot VM simply inherited this trait from its predecessors without thinking too much about it at the outset.

If I had to guess why, a reasonable inference would be: without defragmentation, a long-running program will eventually run into memory fragmentation, which wastes space and slows down allocation; to solve this, you need to be able to defragment memory. Once you decide to solve fragmentation, you can either choose mark-compact directly, or use mark-sweep as the main algorithm with mark-compact as a backup. Choosing mark-compact directly is obviously simpler to implement, so that is what was chosen.
(The CLR chose not to eradicate fragmentation. Anything that can go wrong eventually will, so quite a few .NET programs today are plagued by fragmentation.)

Later, the HotSpot VM gained the parallel old GC (UseParallelOldGC), which is a multi-threaded, parallel version of the mark-compact algorithm. What exactly this algorithm should be called I cannot say, because there is no specific paper describing it, and there are many ways to parallelize LISP 2 and other classic mark-compact algorithms, each differing in its details. In any case, all that matters here is that it uses mark-compact rather than mark-sweep.

================================================================

So why did CMS choose mark-sweep as the basic algorithm to make concurrent, rather than an object-moving algorithm like the other HotSpot VM collectors?

One reason is that CMS was designed and implemented on another Sun JVM, the Exact VM (EVM). The EVM project later lost out in its competition with the HotSpot VM, and CMS was ported from the EVM to the HotSpot VM; it therefore does not share HotSpot's original lineage. (But this really isn't the reason.)

For the real reason, please refer to the original CMS paper: A Generational Mostly-Concurrent Garbage Collector. (The Oracle Labs link is dead; use the CiteSeerX link instead.)

The code outside the GC (mainly the application logic) is called the mutator, and the GC code is called the collector. The two must stay synchronized so that the object graph they observe is consistent.

With a serial, non-concurrent, non-generational, non-incremental collector, the collector can always observe the entire object graph while it works, so synchronization with the mutator is trivial: the mutator does not have to do anything special; it simply calls the collector synchronously when a GC is required, just like an ordinary function call.

With a generational or incremental collector, the collector only observes part of the object graph while it works, and the parts it cannot see may become inconsistent with the mutator's view, so it needs the mutator's cooperation: extra synchronization between the two. The mutator must execute some extra code whenever it changes a reference relationship in the object graph, so that the collector can record those changes. There are two approaches: write barriers and read barriers.

// A write barrier fires when a reference is overwritten:
// Java code
a.x = b
// into which an extra piece of code is inserted:
// C code
write_barrier(a, &(a->x), b);
a->x = b;

// A read barrier fires when a reference is read:
// Java code
b = a.x
// into which an extra piece of code is inserted:
// C code
read_barrier(&(a->x));
b = a->x;
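As a concrete example of what a write barrier can record, HotSpot's generational collectors (CMS included) use card marking: the heap is divided into small "cards" and the barrier dirties the card containing the updated field, so the collector later knows which areas to rescan. A minimal sketch follows; the 512-byte card size matches HotSpot's default, but the variable names and layout here are assumptions for illustration.

#include <stdint.h>

#define CARD_SHIFT 9                  /* 2^9 = 512-byte cards                */
#define DIRTY      0

static uint8_t  *card_table;          /* one byte per 512 bytes of heap      */
static uintptr_t heap_base;           /* assumed to be set at VM startup     */

/* Card-marking write barrier: called after every reference store
 * a->x = b, with 'field' being &(a->x). */
static inline void write_barrier(void **field) {
    uintptr_t offset = (uintptr_t)field - heap_base;
    card_table[offset >> CARD_SHIFT] = DIRTY;   /* mark the card as dirty    */
}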

Typically a program reads references far more often than it writes them, so the aggregate cost of a read barrier is considered much higher than that of a write barrier, and few GCs use read barriers.
If only write barriers are used, then "moving objects" requires pausing the mutator completely: the collector moves the objects, fixes up the pointers, and only then can the mutator resume. In other words, the collector cannot move objects concurrently with the mutator.

If read barriers are used (rare but not unheard of, e.g. the Azul C4 collector), objects can be moved one at a time without immediately fixing up every pointer to them, so the entire collection can be regarded as running concurrently with the mutator.
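To illustrate the idea (this is a classic Brooks-style forwarding-pointer barrier, not Azul C4's actual mechanism, which is a "loaded value barrier"): if every access goes through an indirection pointer, the collector can move one object, redirect that pointer, and let the mutator keep running without fixing every reference to it first.

/* Brooks-style read barrier sketch: each object carries an indirection
 * pointer that normally points to the object itself and is redirected to
 * the new copy when the collector moves the object. */
typedef struct Object Object;
struct Object {
    Object *indirection;   /* points to self, or to the relocated copy */
    /* ... fields ... */
};

static inline Object *read_barrier(Object *obj) {
    return obj == NULL ? NULL : obj->indirection;  /* always follow it */
}

/* so that  b = a.x  becomes something like:  b = read_barrier(a)->x;  */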

CMS uses only write barriers, no read barriers. So if it had chosen mark-compact as its basic algorithm, only the marking phase could run concurrently (the root-scanning part still has to pause the mutator; that is the initial marking, after which marking can proceed concurrently with the mutator), while the entire compact phase would have to pause the mutator. Recall that the time spent in the compact phase is proportional to the size of the live objects, which is a bad deal for the old generation.
Choosing mark-sweep as the basic algorithm is therefore very reasonable: both the mark and the sweep phase can run concurrently with the mutator. Since the sweep phase does not move objects, there are no pointers to fix up and no need to pause the mutator.
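Putting it together, a CMS old-generation cycle interleaves with the mutator roughly as outlined below. The phase names are the real CMS phases; the code itself is a simplified sketch with placeholder functions, not HotSpot's implementation.

/* Placeholders for the phases; only the structure matters here. */
void stop_the_world(void);
void resume_the_world(void);
void initial_mark(void);
void concurrent_mark(void);
void concurrent_preclean(void);
void remark(void);
void concurrent_sweep(void);
void concurrent_reset(void);

void cms_old_gen_cycle(void) {
    stop_the_world();
    initial_mark();         /* short pause: mark objects directly reachable
                               from the roots                                */
    resume_the_world();

    concurrent_mark();      /* trace the graph while the mutator runs; the
                               write barrier records references the mutator
                               changes in the meantime                       */
    concurrent_preclean();  /* process some of those recorded changes early  */

    stop_the_world();
    remark();               /* short pause: finish marking by catching up
                               with the remaining recorded changes           */
    resume_the_world();

    concurrent_sweep();     /* nothing moves, so free space goes back onto
                               free lists while the mutator keeps running,
                               with no pointer fixing and no pause           */
    concurrent_reset();     /* reset data structures for the next cycle      */
}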

(Digression: in practice we do still see incremental/concurrent old-generation GCs based on mark-compact. For example, the old-generation GC in Google V8 can split the marking phase into a non-concurrent initial marking and incremental marking, but the truly time-consuming compact phase still requires a full mutator pause. To reduce that pause it has to go further and select only part of the old generation for compaction each time, rather than compacting the whole old generation in one go; V8 implements this too, under the name incremental compaction. Pushed far enough, this direction eventually leads to a region-based collector, which is similar to G1.)

What about fragmentation building up? In the HotSpot VM, CMS is only responsible for concurrent collection of the old generation (not the whole GC heap). If the space reclaimed by concurrent collection cannot keep up with allocation demand, it falls back to the serial GC's mark-compact algorithm for a full GC. That is the classic configuration: mark-sweep backed up by mark-compact. But this configuration also hides a trap: CMS must be tuned very carefully to postpone, as far as possible, the full GC triggered by fragmentation. Once a full GC does happen, the pause can be very long, and the low-latency advantage for which CMS was chosen in the first place is gone.
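In practice, that tuning largely means starting the concurrent cycle early enough that the old generation neither fills up nor fragments badly before CMS finishes. A typical starting point uses real HotSpot flags such as the following; the 70% threshold and app.jar are only illustrative values:

java -XX:+UseConcMarkSweepGC \
     -XX:CMSInitiatingOccupancyFraction=70 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -jar app.jar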

That is why the newer Garbage-First (G1) GC goes back to a copying-based algorithm: it divides the whole GC heap into many small regions and keeps the pause for moving objects under control by selecting only a small number of regions per GC. This achieves low latency without suffering from fragmentation.
(Note: although G1 has a concurrent global marking phase, it is optional; the collections that actually determine the pause time are based on the copying algorithm, not mark-compact.)
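For comparison, enabling G1 and giving it a pause-time goal uses real HotSpot flags such as the following; the 200 ms target and app.jar are again only illustrative:

java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -jar app.jar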

