CMS garbage collection mechanism

Source: Internet
Author: User

Writing original content takes effort; please do not reproduce it without permission.

    1. What is CMS?

      Concurrent Mark Sweep.

      As the name suggests, CMS is a concurrent GC that uses the mark-sweep algorithm.

      CMS collects the old generation.

    2. What is CMS used for?

      CMS is designed to achieve the shortest possible pause time.

      In applications or websites with strict response-time requirements, user threads cannot tolerate long pauses; CMS is suited to this scenario.

    3. How does CMS work?

In general, a CMS cycle can be divided into the following phases:

3.1 Initial mark (STW)

3.2 Concurrent mark

3.3 Concurrent preclean

3.4 Remark (STW)

3.5 Concurrent sweep

3.6 Reset
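The phase list above can be written down as a small data model. This is only an illustrative sketch: the `CmsPhase` enum and its `stopTheWorld` field are names invented here and are not part of HotSpot.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the CMS phases listed above; not HotSpot code.
enum CmsPhase {
    INITIAL_MARK(true),
    CONCURRENT_MARK(false),
    CONCURRENT_PRECLEAN(false),
    REMARK(true),
    CONCURRENT_SWEEP(false),
    RESET(false);

    // True when the phase pauses all application threads (STW).
    final boolean stopTheWorld;

    CmsPhase(boolean stopTheWorld) {
        this.stopTheWorld = stopTheWorld;
    }

    // Returns the phases that pause application threads, in cycle order.
    static List<CmsPhase> pausingPhases() {
        List<CmsPhase> result = new ArrayList<>();
        for (CmsPhase phase : values()) {
            if (phase.stopTheWorld) {
                result.add(phase);
            }
        }
        return result;
    }
}
```

Only two of the six phases pause the application, which is the whole point of the design.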

3.1 The initial mark phase requires STW (stop the world).

This phase is the start of the reachability analysis: it marks only the objects that GC roots reference directly.

Note that only directly referenced objects are marked here; objects that are reachable indirectly are marked in the next phase.

3.2 The concurrent mark phase runs concurrently with user threads.

This phase performs GC roots tracing, and the user threads that were suspended in the first phase are resumed.

Starting from the objects marked in the previous phase, it marks all reachable objects.
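The split between the two marking phases can be sketched on a toy object graph. This is a simplified, single-threaded illustration and not how HotSpot implements marking; `ObjNode`, `MarkSketch`, `initialMark`, and `concurrentMark` are hypothetical names, and the GC roots are modelled simply as a list of directly referenced objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A toy heap object holding outgoing references (hypothetical type).
final class ObjNode {
    final String name;
    final List<ObjNode> refs = new ArrayList<>();

    ObjNode(String name) {
        this.name = name;
    }
}

final class MarkSketch {
    // Initial mark: mark only the objects that the roots reference directly.
    static Set<ObjNode> initialMark(List<ObjNode> directlyReferencedByRoots) {
        return new HashSet<>(directlyReferencedByRoots);
    }

    // Concurrent mark: trace from the initially marked objects and mark
    // everything transitively reachable from them.
    static Set<ObjNode> concurrentMark(Set<ObjNode> initiallyMarked) {
        Set<ObjNode> marked = new HashSet<>(initiallyMarked);
        Deque<ObjNode> worklist = new ArrayDeque<>(initiallyMarked);
        while (!worklist.isEmpty()) {
            ObjNode current = worklist.pop();
            for (ObjNode child : current.refs) {
                if (marked.add(child)) { // true only the first time we see child
                    worklist.push(child);
                }
            }
        }
        return marked;
    }
}
```

The short STW step touches only the root set, while the expensive graph traversal happens in the second step, which in CMS runs alongside the application.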

3.3 The concurrent preclean phase also does marking work, similar to the remark in 3.4.

If the two are similar, why is this extra step needed?

As mentioned earlier, CMS is a GC whose goal is the shortest pause time.

Remark requires STW (stop the world), so as much of the remark work as possible is moved into a concurrent phase to shorten the STW time.

This phase marks objects promoted from the young generation, objects newly allocated in the old generation, and objects modified during the concurrent mark phase.

This phase is fairly complex. Starting from a point that beginners tend to overlook or misunderstand, here is a question to think about:

    • How do we determine whether an object in the old generation is alive?

The answer is simple: an object that can be reached by GC roots tracing is alive.

Taking this further, what happens in the following scenario?

[Figure: current obj, an object in the old generation, is referenced only by an object in the young generation]

How do we ensure that current obj is marked as alive when the old generation is collected?

(Determining whether a young-generation object is alive has the same problem. Think about it; the article answers this later.)

The answer is to scan the young generation. This is why CMS, although it is an old-generation GC, still has to scan the young generation. (Note that the initial mark also scans the young generation.)

In the CMS log we can clearly see the scan:

[GC[YG occupancy: 820 K (6528 K)]

[Rescan (parallel) , 0.0024157 secs]

[weak refs processing, 0.0000143 secs]

[scrub string table, 0.0000258 secs]

[1 CMS-remark: 479379K(515960K)] 480200K(522488K), 0.0025249 secs]

[Times: user=0.01 sys=0.00, real=0.00 secs]

The rescan phase (a sub-phase of remark) scans objects in both the young and old generations. In the log it appears as Rescan (parallel), which indicates that it runs in parallel.

(If you still have questions at this point, that means you are getting the hang of it.)

The point is: won't a full scan of the young and old generations be slow? It certainly will.

CMS claims to be the GC with the shortest pause time, so a long pause here is unacceptable.

How can this be solved?

Think about it first.

There must be a mechanism to quickly identify the live objects in the young generation and in the old generation.

First, the young generation.

As you probably know, the objects left in the young generation after a garbage collection are all alive, and there are few of them.

So wouldn't it be much better if a minor GC ran before the young generation is scanned?

CMS has two parameters, CMSScheduleRemarkEdenSizeThreshold and CMSScheduleRemarkEdenPenetration, whose defaults are 2M and 50% respectively. Taken together they mean: after preclean, if Eden usage exceeds 2M, abortable concurrent preclean (CMS-concurrent-abortable-preclean) starts, and it continues until Eden usage reaches 50%, at which point the remark phase begins.

If a minor GC happens during the abortable preclean phase, all is well.

A small question: how long must abortable preclean run to guarantee that a minor GC occurs?

The answer is that there is no guarantee. The reason is simple: garbage collection is scheduled automatically by the JVM, and we cannot control when a GC happens.

But this phase must have an upper bound on its running time, right? Yes.

CMS provides the parameter CMSMaxAbortablePrecleanTime, which defaults to 5 s.

Once 5 s have elapsed, this phase is aborted and remark begins, regardless of whether a minor GC has occurred or whether CMSScheduleRemarkEdenPenetration has been reached.
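The start and stop conditions of abortable preclean described above can be summarized as two predicates. This is a minimal sketch of the decision rules as the article states them, using the default values it quotes; the class and method names are hypothetical, and the real HotSpot logic contains more conditions than these.

```java
// Sketch of the abortable-preclean rules described in the article.
// Constants mirror the quoted defaults; all names here are hypothetical.
final class AbortablePrecleanSketch {
    static final long EDEN_SIZE_THRESHOLD_BYTES = 2L * 1024 * 1024; // CMSScheduleRemarkEdenSizeThreshold
    static final int EDEN_PENETRATION_PERCENT = 50;                 // CMSScheduleRemarkEdenPenetration
    static final long MAX_ABORTABLE_PRECLEAN_MILLIS = 5_000;        // CMSMaxAbortablePrecleanTime

    // Abortable preclean starts only if Eden already holds more than the threshold.
    static boolean shouldStart(long edenUsedBytes) {
        return edenUsedBytes > EDEN_SIZE_THRESHOLD_BYTES;
    }

    // The phase ends (and remark begins) when Eden occupancy reaches the
    // penetration percentage or when the time limit expires, whichever is first.
    static boolean shouldAbort(long edenUsedBytes, long edenCapacityBytes, long elapsedMillis) {
        boolean edenFullEnough =
                edenUsedBytes * 100 >= edenCapacityBytes * EDEN_PENETRATION_PERCENT;
        boolean timedOut = elapsedMillis >= MAX_ABORTABLE_PRECLEAN_MILLIS;
        return edenFullEnough || timedOut;
    }
}
```

Neither predicate mentions a minor GC, which matches the article's point: the phase merely waits in the hope that one occurs before either limit is hit.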

What if no minor GC has run within the 5 s?

CMS provides the CMSScavengeBeforeRemark parameter to force a minor GC before remark.

This has pros and cons. The upside is a shorter remark pause; the downside is that the minor GC is immediately followed by the remark pause, so the combined pause is still relatively long.

The CMS log looks like this:

7688.150: [CMS-concurrent-preclean-start]

7688.186: [CMS-concurrent-preclean: 0.034/0.035 secs]

7688.186: [CMS-concurrent-abortable-preclean-start]

7688.465: [GC 7688.465: [ParNew: 1040940K->1464K(1044544K), 0.0165840 secs] 1343593K->304365K(2093120K),

0.0167509 secs] 7690.093: [CMS-concurrent-abortable-preclean: 1.012/1.907 secs] 7690.095: [GC[YG occupancy: 522484 K (1044544 K)]

7690.095: [Rescan (parallel) , 0.3665541 secs] 7690.462: [weak refs processing, 0.0003850 secs] [1 CMS-remark: 302901K(1048576K)] 825385K(2093120K), 0.3670690 secs]

Abortable preclean starts at 7688.186, a minor GC runs at 7688.465, and the remark phase begins at 7690.095.

In fact, to shorten the STW time of remark, the preclean phase does as much work as it can in advance.

The rescan step of remark is multi-threaded. To make multi-threaded scanning of the young generation easier, the preclean phase divides the young generation into chunks.

Each chunk holds multiple objects, so the remark phase does not need to scan from the beginning to find where each object starts.

The chunks are assigned to multiple threads, each with a clear responsibility, and the scan finishes quickly.

Unfortunately, this approach still depends on a minor GC having occurred.

If no minor GC has occurred, everything below top (the next allocatable address) is treated as a single chunk, which contains most of the young generation's content.

A single chunk like this does little for the remark phase, so parallel efficiency drops.

That covers the young-generation mechanism; now for the old generation.

The old-generation mechanism relies on something called the card table, which is simply an array in which each element is one byte.

CMS divides the old-generation space into cards of 512 bytes each, and each element of the card table corresponds to one card.

During concurrent marking, if a reference held by an object changes, the card containing that object is marked as a dirty card.

The concurrent preclean phase rescans those cards and marks the objects referenced from them as reachable.
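The card table idea can be sketched as a byte array indexed by address. This is an illustration under stated assumptions, not HotSpot's implementation: the class `CardTableSketch`, its methods, and the `CLEAN`/`DIRTY` byte values are invented here (HotSpot uses its own encoding), and addresses are modelled as plain `long` values.

```java
// Toy card table: one byte per 512-byte card of the old generation.
// Names and byte values are hypothetical.
final class CardTableSketch {
    static final int CARD_SHIFT = 9; // 2^9 = 512 bytes per card
    static final byte CLEAN = 0;
    static final byte DIRTY = 1;

    private final long heapBase;
    private final byte[] cards;

    CardTableSketch(long heapBase, long heapSizeBytes) {
        this.heapBase = heapBase;
        // Round up so that a partial last card still gets an entry.
        this.cards = new byte[(int) ((heapSizeBytes + (1 << CARD_SHIFT) - 1) >>> CARD_SHIFT)];
    }

    // Maps an address inside the covered range to its card index.
    int cardIndex(long address) {
        return (int) ((address - heapBase) >>> CARD_SHIFT);
    }

    // Write barrier: called whenever a reference field at this address is updated.
    void onReferenceWrite(long fieldAddress) {
        cards[cardIndex(fieldAddress)] = DIRTY;
    }

    boolean isDirty(int index) {
        return cards[index] == DIRTY;
    }

    // Preclean: visit every dirty card, then clear its mark.
    // A real collector would rescan the objects on the card at this point.
    int precleanDirtyCards() {
        int processed = 0;
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] == DIRTY) {
                cards[i] = CLEAN;
                processed++;
            }
        }
        return processed;
    }
}
```

The benefit is that preclean and remark only need to revisit the cards that were actually written to, instead of rescanning the whole old generation.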

As an example:

During concurrent marking, the objects are in the following state:

[Figure: object state during concurrent marking]

But then a reference held by current obj changes:

The card containing current obj is marked as a dirty card.

Then comes the preclean phase. Remember that one of its tasks is to mark the objects that were modified during concurrent marking? The objects that become reachable through current obj are marked as well, giving the following:

[Figure: object state after preclean]

Meanwhile, the dirty card mark is cleared.

This is the old-generation mechanism.

But the card table has other uses.

Remember the question raised earlier? During a minor GC, how are young-generation objects that are referenced from the old generation identified?

(Studies have shown that references from the old generation to the young generation account for less than 1% of all references. You can work out the reason for yourself.)

When an old-generation object references a young-generation object, the corresponding card table entry is set to a corresponding value. (A card table entry is one byte, that is, eight bits; by convention, different bits distinguish a reference to the young generation from a modification made during concurrent marking.)

So a minor GC can quickly find the old-generation references into the young generation by scanning the card table.

Without going into detail: the HotSpot virtual machine maintains the card table through write barriers in the bytecode interpreter and the JIT compiler.

Whenever the bytecode interpreter or JIT-compiled code updates a reference, the write barrier fires and updates the card table.

One more thought: given the card table, what happens when the old-generation space is very large? (Feel free to use your imagination here.)

That completes the work of the preclean phase.

3.4 Remark (STW). All user threads are paused, the objects in the heap are rescanned, reachability analysis is performed, and live objects are marked.

With the groundwork laid earlier, the workload of this phase is greatly reduced, and so is the pause time.

Note that this phase is multi-threaded.

3.5 Concurrent sweep. User threads are resumed, and dead objects are reclaimed concurrently.

3.6 Reset. CMS clears its internal state and prepares for the next collection.

That completes the CMS execution process. The focus was on concurrent preclean and several key CMS parameters. Take some time to digest this and have a rest, because we are not done yet.

4. What are the problems with CMS?

"Every coin has two sides": a sentence I often used in high-school English compositions.

In my view, the three letters of CMS already hint at the problems: concurrency plus the mark-sweep algorithm is where they come from.

First, concurrency.

4.1 Concurrency means that multiple threads compete for CPU resources; that is, GC threads and user threads compete for the CPU. This can reduce the throughput of user threads.

The default number of CMS collection threads is (number of CPUs + 3) / 4. With four or more CPUs, the collection threads take at least 25% of the CPU resources during concurrent collection, which leaves up to 75% for user threads and is acceptable.

But what if there are few CPUs, say only two? By the formula above, CMS starts 1 GC thread. That one GC thread takes 50% of the CPU resources, so the user program can suddenly slow down by 50%, which is a very noticeable drop.
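The arithmetic behind these figures is easy to check. The sketch below simply evaluates the formula quoted in the article with integer division; `CmsThreadCountSketch` and its methods are hypothetical names, and the formula is taken from the article rather than read from a running JVM.

```java
// Evaluates the article's formula: GC threads = (CPUs + 3) / 4 (integer division).
final class CmsThreadCountSketch {
    static int gcThreads(int cpus) {
        return (cpus + 3) / 4;
    }

    // Share of the CPUs occupied by GC threads while a concurrent phase runs.
    static double cpuSharePercent(int cpus) {
        return 100.0 * gcThreads(cpus) / cpus;
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {2, 4, 8}) {
            System.out.println(cpus + " CPUs: " + gcThreads(cpus)
                    + " GC thread(s), " + cpuSharePercent(cpus) + "% of CPU");
        }
    }
}
```

With 2 CPUs the formula gives 1 thread and a 50% share; with 4 or 8 CPUs the share is 25%.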

How should this scenario be handled?

My answer is that this scenario can be ignored. Today even a PC has at least a dual-core processor, let alone a large server.

CMS's own solution is an incremental mode (i-CMS).

In this mode, during concurrent marking and sweeping, the GC thread and user threads run alternately, to minimize the time the GC thread monopolizes CPU resources.

This makes the GC take longer overall but has less impact on user threads.

In practice, however, CMS performs only modestly in this mode and gains little.

i-CMS has been declared deprecated and is no longer recommended.

(https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/cms.html#concurrent_mark_sweep_cms_collector)

4.2 User threads are still running during the concurrent sweep phase, so new garbage may be produced in this period. That new garbage cannot be reclaimed in the current GC and has to wait for the next one. There is a technical term for it: floating garbage.

Because user threads keep running during garbage collection, memory must be reserved for them. As a result, CMS cannot wait until the old generation is almost full before collecting, as other collectors do.

CMS provides the CMSInitiatingOccupancyFraction parameter, which sets the old-generation occupancy percentage at which garbage collection starts.

This parameter defaults to 92%; the right value depends on the specific application.

Setting it too low causes frequent CMS GCs and therefore many pauses. Conversely, what happens when it is set too high?

Suppose it is set to 99%, leaving 1% of the space available.

If user threads need more than that 1% during the concurrent sweep phase, a "concurrent mode failure" occurs.

At that point the virtual machine falls back to its backup plan: it uses the Serial Old collector to collect the old generation again, and the pause becomes much longer.
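The trade-off around the initiating occupancy can be illustrated with a deliberately simplified model. It assumes that no memory is handed back to the application until the concurrent cycle finishes, which is harsher than reality; `OccupancyTriggerSketch`, `Outcome`, and `simulate` are hypothetical names.

```java
// Simplified model of the occupancy trigger and of concurrent mode failure.
// Assumption: space freed by the sweep is not reusable until the cycle ends.
final class OccupancyTriggerSketch {
    enum Outcome { NO_CYCLE, CYCLE_COMPLETED, CONCURRENT_MODE_FAILURE }

    static Outcome simulate(long capacity, long used, int initiatingPercent,
                            long allocatedDuringCycle) {
        // The cycle starts only once occupancy reaches the configured percentage.
        if (used * 100 < capacity * initiatingPercent) {
            return Outcome.NO_CYCLE;
        }
        // Whatever is still free when the cycle starts must absorb every
        // allocation and promotion that happens while the cycle is running.
        long free = capacity - used;
        return allocatedDuringCycle > free
                ? Outcome.CONCURRENT_MODE_FAILURE
                : Outcome.CYCLE_COMPLETED;
    }
}
```

With a 99% threshold on a 1000-unit old generation, only 10 units remain as headroom, so a modest burst of promotion during the cycle is enough to cause the failure.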

Therefore, the value of CMSInitiatingOccupancyFraction has to be chosen case by case.

Some formulas for setting this parameter circulate online. I do not find them very rigorous (because of another CMS issue), so I am leaving them out to avoid confusion.

In fact, CMS also has a dynamic check mechanism.

Based on historical statistics, CMS predicts how long it will take for the old generation to fill up and how long a collection cycle will take.

Using these predictions, CMS can start a collection automatically before the old-generation space is exhausted.

This feature can be turned off with the UseCMSInitiatingOccupancyOnly parameter.

Here is a question for the reader: if you were designing it, how would you predict when to start a collection automatically?

4.3 The first two problems are caused by concurrency; the next one is caused by the mark-sweep algorithm.

The mark-sweep algorithm produces a lot of memory fragmentation, and too much fragmentation causes trouble when allocating large objects.

Often plenty of space remains in the old generation, yet no contiguous region is large enough for the current object, and a full GC has to be triggered.
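The fragmentation problem can be shown with a toy free list. This sketch models free memory as an array of chunk sizes and is not how CMS tracks free space internally; `FragmentationSketch` and its methods are hypothetical names.

```java
// Toy free list: each entry is the size of one contiguous free chunk.
final class FragmentationSketch {
    static long totalFree(long[] freeChunks) {
        long total = 0;
        for (long chunk : freeChunks) {
            total += chunk;
        }
        return total;
    }

    // Without compaction, an allocation needs a single chunk that is large enough.
    static boolean canAllocate(long[] freeChunks, long requestBytes) {
        for (long chunk : freeChunks) {
            if (chunk >= requestBytes) {
                return true;
            }
        }
        return false;
    }
}
```

For free chunks of 64, 128, 32, and 96 bytes, the total free space is 320 bytes, yet a 200-byte request cannot be satisfied because the largest single chunk is only 128 bytes.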

CMS's solution is the UseCMSCompactAtFullCollection parameter (enabled by default), which turns on memory compaction when CMS can no longer cope and has to perform a full GC.

This process requires STW: the fragmentation problem is solved, but the pause gets longer.

The virtual machine also provides another parameter, CMSFullGCsBeforeCompaction, which sets how many full GCs without compaction run before one with compaction (the default is 0, meaning every full GC compacts).
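The counting rule described for CMSFullGCsBeforeCompaction can be sketched as a small counter. This follows the semantics as the article states them and is an assumption about behavior rather than HotSpot's code; `CompactionPolicySketch` and `onFullGc` are hypothetical names.

```java
// Counter sketch: run N non-compacting full GCs, then one compacting full GC.
final class CompactionPolicySketch {
    private final int fullGcsBeforeCompaction; // models CMSFullGCsBeforeCompaction
    private int sinceLastCompaction = 0;

    CompactionPolicySketch(int fullGcsBeforeCompaction) {
        this.fullGcsBeforeCompaction = fullGcsBeforeCompaction;
    }

    // Called once per full GC; returns true if this full GC should compact.
    boolean onFullGc() {
        if (sinceLastCompaction >= fullGcsBeforeCompaction) {
            sinceLastCompaction = 0;
            return true;
        }
        sinceLastCompaction++;
        return false;
    }
}
```

With the default of 0, `onFullGc()` returns true every time; with a value of 2, every third full GC compacts.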

As an aside, there is also something called the "foreground collector", which was likewise declared deprecated in Java 8. (https://bugs.openjdk.java.net/browse/JDK-8027132)

That covers the problems with CMS; take some time to digest them.

This concludes the material on CMS.

To summarize:

CMS uses a variety of techniques to minimize GC pause time and reduce the pauses seen by the user program.

Pause time is reduced at the cost of CPU throughput.

This is a trade-off between pause time and performance, which can be loosely understood as "trading space (performance) for time".
