In the CMS GC, using the parameters -xx:+printgcdetails and -xx:+printgctimestamps will output a lot of log information, to understand this information can help us better adjust parameters for higher performance.
Let's take a look at the CMS GC log example in JDK1.4.2_10:
39.910: [GC 39.910: [parnew:261760k->0k (261952K), 0.2314667 secs] 262017k->26386k (1048384K), 0.2318679 secs]
Cenozoic uses (parnew parallel) collector. The Cenozoic capacity of 261952K,GC recovered from 261760K to 0, taking 0.2314667 seconds. (Note: 262017k->26386k (1048384K), 0.2318679 secs means the entire heap occupies from 262017K to 26386K, time consuming 0.2318679)
40.146: [GC [1 cms-initial-mark:26386k (786432K)] 26404K (1048384K), 0.0074495 secs]
Start using the CMS collector to recycle the old age. The initial tag (Cms-initial-mark) stage, which marks the object that the root can reach directly, marks the time the entire application thread pauses.
The old age capacity is triggered when the 786432k,cms collector reaches 26386K of space
40.154: [Cms-concurrent-mark-start]
Begins the concurrency tagging (concurrent-mark-start) phase, in which the first phase of the suspended thread restarts, starting with the objects marked by the previous stage, and all reachable objects are marked in this stage.
40.683: [cms-concurrent-mark:0.521/0.529 secs]
End of concurrent tagging phase, 0.521 seconds CPU time, 0.529 second wall clock time (also contains the time that the thread has given the CPU to execute on other threads)
40.683: [Cms-concurrent-preclean-start]
Start the pre-cleanup phase
Pre-cleanup is also a phase of concurrent execution. In this phase, you will find the objects that were promoted or newly assigned or updated from the new generation during the previous stage of execution. By re-scanning these objects concurrently, the pre-cleanup phase reduces the amount of work for the next Stop-the-world re-tagging phase.
40.701: [cms-concurrent-preclean:0.017/0.018 secs]
The pre-cleanup phase takes 0.017 seconds of CPU time and 0.018 seconds for wall clocks.
40.704: [GC40.704: [Rescan (parallel), 0.1790103 secs]40.883: [Weak refs processing, 0.0100966 secs] [1 cms-remark:2 6386K (786432K)] 52644K (1048384K), 0.1897792 secs]
The Stop-the-world phase, starting with the root and its referenced objects, re-scans the remaining updated objects in the CMS heap. It takes 0.1790103 seconds for the rescan to take 0.0100966 seconds to process the weak reference object and 0.1897792 seconds for this phase.
40.894: [Cms-concurrent-sweep-start]
The Concurrency cleanup phase begins, and the application thread is still running during the cleanup phase.
41.020: [cms-concurrent-sweep:0.126/0.126 secs]
Concurrent cleanup phase takes 0.126 seconds
41.020: [Cms-concurrent-reset-start]
Start concurrent Reset
41.147: [cms-concurrent-reset:0.127/0.127 secs]
At this stage, the CMS internal data structure is reinitialized for the next round of GC use. This phase takes 0.127 seconds.
This is the log of the CMS normal run cycle printing, now let's take a look at the other CMS log records:
197.976: [GC 197.976: [parnew:260872k->260872k (261952K), 0.0000688 secs]197.976: [CMS197.981: [ cms-concurrent-sweep:0.516/0.531 secs]
(Concurrent mode failure): 402978k->248977k (786432K), 2.3728734 secs] 663850k->248977k (1048384K), 2.3733725 secs ]
This message shows that the Parnew collector was requested for a new generation of recycling, but the collector did not attempt to recycle because it expected that in the worst case, there was not enough space in the old CMS to accommodate the new generation of surviving objects. We call this failure "a complete promotion guarantee failure".
Because of this, the CMS in the concurrency mode is interrupted and at 197.981 seconds, the full GC is started. This full GC, using the mark-erase-cleanup algorithm, will occur Stop-the-world, time consuming 2.3733725 seconds. The old age of CMS occupies from 402978K to 248977K.
Avoid concurrent mode failure by increasing the size of the older generation space or setting the parameter cmsinitiatingoccupancyfraction and setting Usecmsinitiatingoccupancyonly to True. The value of the parameter cmsinitiatingoccupancyfraction must be chosen carefully and setting too low can cause frequent CMS recoveries.
Sometimes we find that when there is a promotion failure in the log, there is plenty of room in the old age. This is because "fragments", the old age of free space is not continuous, and from the new generation of the promotion of the object, need a contiguous free space. A CMS collector is a non-compression collector that can fragment in some type of application. In the following blog, Jon discusses in detail how to deal with fragmentation issues: https://blogs.oracle.com/jonthecollector/entry/when_the_sum_of_the
Starting with JDK 1.5, there have been some changes in the promotion assurance check policy in the CMS collector. The original strategy was to consider the worst case scenario, in which all the new generation objects were promoted to the old age, and the newly promoted security check strategy was based on the recent promotion history, which was expected to be much less than the worst-case promotion, and therefore required less space. If the promotion fails, the Cenozoic is in a consistent state. Triggers a Stop-the-world tag-compression collection. If you want to get this functionality in USESERIALGC, you need to set the parameter-xx:+handlepromotionfailure.
283.736: [Full GC 283.736: [parnew:261599k->261599k (261952K), 0.0000615 secs] 826554k->826554k (1048384K), 0.0003259 secs]
GC locker:trying a full collection because scavenge failed
283.736: [Full GC 283.736: [parnew:261599k->261599k (261952K), 0.0000288 secs]
Stop-the-world GC occurs when a JNI key zone is released. The new generation failed to recover due to a promotion guarantee, triggering a full GC.
CMS can run in incremental mode (I-CMS), using parameter-xx:+cmsincrementalmode. In incremental mode, the CMS collector does not monopolize the entire cycle during the concurrency phase, but periodically pauses and wakes up the application thread. The collector divides the concurrency phase into fragments that are scheduled to run between secondary (minor) recoveries. This is useful for applications that require low latency and run on a small number of CPU servers.
The following is a log of the incremental mode CMS.
2803.125: [GC 2803.125: [parnew:408832k->0k (409216K), 0.5371950 secs] 611130k->206985k (1048192K) icms_dc=4, 0.5373720 secs]
2824.209: [GC 2824.209: [parnew:408832k->0k (409216K), 0.6755540 secs] 615806k->211897k (1048192K) icms_dc=4, 0.6757740 secs]
The Cenozoic takes 537 milliseconds and 675 milliseconds. The ICMS is represented by ICMS_DC during a short run between 2 collections, and ICMS_DC represents the duty duty of the run. The duty-free ratio here is 4%.
With simple calculations, the ICMS incremental phase takes 4/100 * (2824.209–2803.125–0.537) = 821 milliseconds, or 4% of 2 GC intervals.
In JDK 1.5, the CMS adds a concurrent can abort pre-cleanup (concurrent abortable preclean) phase. You can abort the pre-cleanup phase, running between parallel pre-cleanup and re-tagging until you get the desired Eden space occupancy. This phase is added to avoid a garbage removal immediately following the re-tagging phase. To differentiate between garbage removal and re-tagging as much as possible, we try to schedule a re-tagging phase between two garbage removals.
There is a second reason what we do this. Immediately following a scavenge there is likely a large number of grey objects that need rescanning. The abortable Preclean phase tries to deal with such newly grey objects thus reducing a subsequent CMS remark pause.
The re-tagging phase can be controlled through the JVM parameters cmsscheduleremarkedensizethreshold and cmsscheduleremarkedenpenetration . The default values are 2m and 50%. Cmsscheduleremarkedensizethreshold Setting the Eden area size below this value does not start the re-tagging phase because the return is expected to be negligible cmsscheduleremarkedenpenetration Sets the space occupancy of the Eden area when the re-tagging phase is started. (according to the following description, Eden should refer to the entire Cenozoic)
After the pre-cleanup phase, if Eden space occupies a value larger than the Cmsscheduleremarkedensizethreshold setting, it will start to abort the pre-cleanup until the occupancy rate reaches cmsscheduleremarkedenpenetration Set the value, otherwise, we immediately schedule the re-tagging phase. The opposite of what I said above, I don't know if I translated it correctly.
7688.150: [Cms-concurrent-preclean-start]
7688.186: [cms-concurrent-preclean:0.034/0.035 secs]
7688.186: [Cms-concurrent-abortable-preclean-start]
7688.465: [GC 7688.465: [parnew:1040940k->1464k (1044544K), 0.0165840 secs] 1343593k->304365k (2093120K), 0.0167509 secs]
7690.093: [cms-concurrent-abortable-preclean:1.012/1.907 secs]
7690.095: [Gc[yg occupancy:522484 K (1044544 k)]7690.095: [Rescan (parallel), 0.3665541 secs]7690.462: [Weak refs proces Sing, 0.0003850 secs] [1 cms-remark:302901k (1048576K)] 825385K (2093120K), 0.3670690 secs]
In the above log, after pre-cleaning, startup can abort pre-cleanup, and then a young generation of garbage collection, the younger generation occupies from 1040940K to 1464K. Re-tagging occurs when the young generation occupancy rate reaches 522484K that is 50% of the heap
Note In JDK1.5, the garbage collection log output of the young generation is in the later re-tagging phase
Understanding the CMS garbage collection log