CMS GC practice summary

Source: Internet
Author: User

First of all, I would like to thank Alibaba Cloud for helping me actually tune this GC algorithm rather than only knowing about it in theory. After reading Sun's documentation and discussing it with Apsara, here is a short summary. If there are any mistakes, please correct me.
CMS (Concurrent Mark Sweep), which Sun's documentation calls the concurrent low pause collector, is a GC algorithm introduced in the later versions of JDK 1.4 and further improved in JDK 5 and JDK 6. It is mainly suited to applications where response time matters more than throughput, which can afford to share processor resources between garbage collection threads and application threads, and which hold many long-lived objects. CMS collects the tenured generation, that is, the old generation. Its goal is to minimize application pause time and reduce the probability of a full GC by marking the old generation with garbage collection threads that run concurrently with the application threads. Because our applications keep caches and have strict response-time requirements, we want to try CMS in place of the parallel collector that the server-class JVM uses by default, to get shorter garbage collection pauses and improve the responsiveness of the program.
CMS replaces the single long pause of the serial mark-compact algorithm with two short pauses. The collection cycle is as follows:
Initial mark (CMS-initial-mark) -> concurrent mark (CMS-concurrent-mark) -> remark (CMS-remark) -> concurrent sweep (CMS-concurrent-sweep) -> concurrent reset, which resets state and waits for the next CMS cycle to be triggered (CMS-concurrent-reset).
Steps 1 and 3 suspend all application threads. The first pause, the initial mark, marks the live objects directly reachable from the root objects. The second pause comes after the concurrent mark: all application threads are suspended again and the collector re-marks the objects that the concurrent mark phase missed (because the application kept updating object state while the concurrent mark was running). The first pause is short and the second is usually longer; the remark can be performed in parallel by several threads.

The "concurrent" in the concurrent mark, concurrent sweep and concurrent reset phases means that one or more garbage collection threads run concurrently with the application threads; the garbage collection threads do not pause the application. If you have more than one processor, the concurrent collection threads run on different processors from the application threads. Obviously, this overhead reduces application throughput. The remark phase, by contrast, is parallel: after all application threads are suspended, a number of garbage collection threads are started to mark in parallel. The application threads stay paused during this time.

Young generation collection under CMS still uses a parallel copying collector, which works the same way as in the parallel GC algorithm.

The following is a summary of the parameters and the problems encountered:

1. Enable CMS: -XX:+UseConcMarkSweepGC. Embarrassingly, I made a careless mistake at first and wrote the "+" as "-".

 

2. By default, the number of collection threads CMS starts is (ParallelGCThreads + 3) / 4. To set it explicitly, use for example -XX:ParallelCMSThreads=20. ParallelGCThreads is the number of parallel collection threads of the young generation.


3. CMS does not compact the heap, so it leaves fragmentation behind. To prevent heap fragmentation from causing a full GC, you can enable compaction of the fragments: -XX:+UseCMSCompactAtFullCollection. Enabling this option affects performance to some extent. The blog of Apsara Stack says that you may be able to tune this by configuring an appropriate CMSFullGCsBeforeCompaction value.

4. To reduce the second pause, enable parallel remark: -XX:+CMSParallelRemarkEnabled. If the remark is still too long, you can enable -XX:+CMSScavengeBeforeRemark to shorten the remark pause, but note that a minor GC will start again right after the remark.

5. To avoid a full GC caused by a full perm generation, we recommend enabling CMS collection of the perm generation:

-XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled

6. By default, CMS starts collecting when the tenured generation is 68% full. If your old generation does not grow that fast and you want CMS to run less often, you can raise the value as appropriate:
-XX:CMSInitiatingOccupancyFraction=80

With this setting, a CMS collection starts only when the old generation is 80% full.

7. The default number of parallel collection threads of the young generation is (CPU <= 8) ? CPU : 3 + (CPU * 5) / 8, where CPU is the number of processors. If you want fewer threads, adjust the value with -XX:ParallelGCThreads=N.

8. Now for the main point. Suppose some parameters have been set, for example:
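The two default-thread formulas quoted in items 2 and 7 can be checked with a few lines of Java. This is only a sketch of the arithmetic as stated in this article; the class and method names are made up for illustration, and a real JVM may compute its defaults differently depending on version and platform.

```java
public class GcThreadDefaults {
    // Young-generation parallel GC threads, using the formula from item 7.
    static int defaultParallelGcThreads(int cpus) {
        return cpus <= 8 ? cpus : 3 + (cpus * 5) / 8;
    }

    // CMS collection threads, using the formula from item 2.
    static int defaultCmsThreads(int parallelGcThreads) {
        return (parallelGcThreads + 3) / 4;
    }

    public static void main(String[] args) {
        // The CPU counts are arbitrary sample values.
        for (int cpus : new int[] {4, 8, 16, 32}) {
            int parallel = defaultParallelGcThreads(cpus);
            System.out.printf("cpus=%d ParallelGCThreads=%d CMS threads=%d%n",
                    cpus, parallel, defaultCmsThreads(parallel));
        }
    }
}
```

For a 16-processor machine, for instance, the formulas give 13 parallel threads and 4 CMS threads. Both divisions are integer divisions, so the results round down.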

 

  -server -Xms1536m -Xmx1536m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:PermSize=64m
  -XX:MaxPermSize=64m -XX:+UseConcMarkSweepGC -XX:+UseCMSCompactAtFullCollection
  -XX:CMSInitiatingOccupancyFraction=80 -XX:+CMSParallelRemarkEnabled
  -XX:SoftRefLRUPolicyMSPerMB=0
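
Since a single wrong sign ("-" instead of "+") silently leaves CMS switched off, it can help to ask the running JVM which collectors it actually uses. The sketch below relies on the standard java.lang.management API; the class and method names are made up for illustration. On the HotSpot builds I know of, a JVM running CMS reports collectors named "ParNew" and "ConcurrentMarkSweep", but treat those names as an assumption and check them against your own JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ShowCollectors {
    // Names of the garbage collectors the running JVM has registered.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Command-line options the JVM was started with (-X and -XX flags included).
    static List<String> jvmFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("JVM flags:  " + jvmFlags());
        System.out.println("Collectors: " + collectorNames());
    }
}
```

Running this class with the same options as the application shows at a glance whether the flags were parsed as intended.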

 

You need to measure how the system performs with these parameters in the production environment or in a stress-testing environment. To do that, enable the GC log so that you can see the details by adding the following parameters:

-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:/home/test/logs/gc.log

The CMS log output is similar to the following:

 

  4391.322: [GC [1 CMS-initial-mark: 655374K(1310720K)] 662197K(1546688K), 0.0303050 secs] [Times: user=0.02 sys=0.02, real=0.03 secs]
  4391.352: [CMS-concurrent-mark-start]
  4391.779: [CMS-concurrent-mark: 0.427/0.427 secs] [Times: user=1.24 sys=0.31, real=0.42 secs]
  4391.779: [CMS-concurrent-preclean-start]
  4391.821: [CMS-concurrent-preclean: 0.040/0.042 secs] [Times: user=0.13 sys=0.03, real=0.05 secs]
  4391.821: [CMS-concurrent-abortable-preclean-start]
  4392.511: [CMS-concurrent-abortable-preclean: 0.349/0.690 secs] [Times: user=2.02 sys=0.51, real=0.69 secs]
  4392.516: [GC [YG occupancy: 111001 K (235968 K)] 4392.516: [Rescan (parallel) , 0.0309960 secs] 4392.547: [weak refs processing, 0.0417710 secs] [1 CMS-remark: 655734K(1310720K)] 766736K(1546688K), 0.0932010 secs] [Times: user=0.17 sys=0.00, real=0.09 secs]
  4392.609: [CMS-concurrent-sweep-start]
  4394.310: [CMS-concurrent-sweep: 1.595/1.701 secs] [Times: user=4.78 sys=1.05, real=1.70 secs]
  4394.310: [CMS-concurrent-reset-start]
  4394.364: [CMS-concurrent-reset: 0.054/0.054 secs] [Times: user=0.14 sys=0.06, real=0.06 secs]

 


We can see that the CMS-initial-mark phase paused the application for 0.0303050 seconds and the CMS-remark phase paused it for 0.0932010 seconds. The two pauses therefore total 0.123506 seconds, about 123 milliseconds, so the sum of the two short pauses is under 200 milliseconds.

However, you may run into two failures that cause a full GC: promotion failed and concurrent mode failure.

The log output for promotion failed looks roughly like this:
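The same sum can be pulled out of the log mechanically. The following is a minimal sketch, not a general GC log parser: it assumes the log uses HotSpot's usual lower-case phase names (CMS-initial-mark, CMS-remark) and that the pause of each stop-the-world phase is the last ", N secs]" group on its line. The class and method names are made up for illustration, and the sample holds three lines taken from the log above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CmsPauseSum {
    // Matches every ", <seconds> secs]" group on a log line.
    private static final Pattern SECS = Pattern.compile(", (\\d+\\.\\d+) secs\\]");

    // Two stop-the-world lines and one concurrent line from the log above.
    static final String[] SAMPLE = {
        "4391.322: [GC [1 CMS-initial-mark: 655374K(1310720K)] 662197K(1546688K), 0.0303050 secs] [Times: user=0.02 sys=0.02, real=0.03 secs]",
        "4391.779: [CMS-concurrent-mark: 0.427/0.427 secs] [Times: user=1.24 sys=0.31, real=0.42 secs]",
        "4392.516: [GC [YG occupancy: 111001 K (235968 K)] 4392.516: [Rescan (parallel) , 0.0309960 secs] 4392.547: [weak refs processing, 0.0417710 secs] [1 CMS-remark: 655734K(1310720K)] 766736K(1546688K), 0.0932010 secs] [Times: user=0.17 sys=0.00, real=0.09 secs]"
    };

    // Pause of one line in seconds, or 0 if the line is not a stop-the-world CMS phase.
    static double pauseSeconds(String line) {
        if (!line.contains("CMS-initial-mark") && !line.contains("CMS-remark")) {
            return 0.0;
        }
        Matcher m = SECS.matcher(line);
        double last = 0.0;
        while (m.find()) {
            // Keep the last match: earlier ones are sub-steps such as Rescan.
            last = Double.parseDouble(m.group(1));
        }
        return last;
    }

    static double totalPauseSeconds(String[] lines) {
        double total = 0.0;
        for (String line : lines) {
            total += pauseSeconds(line);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.printf("total CMS pause: %.6f secs%n", totalPauseSeconds(SAMPLE));
    }
}
```

The concurrent phases are skipped on purpose, because the application keeps running while they execute.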

 

  [ParNew (promotion failed): 320138K->320138K(353920K), 0.2365970 secs] 42576.951: [CMS: 1139969K->1120688K(2166784K), 9.2214860 secs] 1458785K->1120688K(2520704K), 9.4584090 secs]


This problem occurs when the survivor space is too small to hold the surviving objects, so they are promoted to the old generation, and the old generation does not have enough room for them either, which results in a full GC. There are two completely opposite ways to solve it: enlarge the survivor space (or the old generation), or remove the survivor space altogether. To enlarge the survivor space, adjust the -XX:SurvivorRatio parameter, which is the ratio of the Eden space to one survivor space. The default value is 32, meaning that Eden is 32 times the size of a survivor space; since there are two survivor spaces, each one takes up 1/34 of the whole young generation. Lowering the value enlarges the survivor spaces, so that objects stay in the survivor area as long as possible and fewer objects enter the old generation.

The idea behind removing the survivor space is to move data that cannot be collected immediately into the old generation as soon as possible, so that the old generation is collected more often and is less likely to balloon. This is done by setting -XX:SurvivorRatio to a very large value (such as 65536). In our application the young generation is set to 256 MB, which is fairly large, and the survivor space is left at the default size (1/34). No promotion failed occurred during stress testing. Even with this fairly large young generation, the GC log shows minor GC times of 5 to 20 milliseconds, which is acceptable, so we have not adjusted it for now.
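The survivor sizing arithmetic described above can be worked through in code. This is a sketch of the ratio only: it divides the young generation into SurvivorRatio + 2 equal shares (Eden gets SurvivorRatio of them, each survivor space gets one) and ignores the alignment and rounding that a real JVM applies, so actual sizes will differ somewhat. The class and method names are made up for illustration.

```java
public class SurvivorSizing {
    // Size of ONE survivor space: young generation / (SurvivorRatio + 2).
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2L);
    }

    // Eden is whatever remains after the two survivor spaces.
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long young = 256L << 20; // 256 MB young generation, as in this article
        for (int ratio : new int[] {32, 65536}) {
            System.out.printf("SurvivorRatio=%d survivor=%d bytes eden=%d bytes%n",
                    ratio, survivorBytes(young, ratio), edenBytes(young, ratio));
        }
    }
}
```

With a ratio of 32 each survivor space comes to roughly 7.5 MB, while a ratio of 65536 shrinks it to about 4 KB, which is effectively no survivor space at all.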

Concurrent mode failure occurs because CMS collects the old generation too slowly, so that the old generation fills up before CMS completes, resulting in a full GC. To avoid this, lower the -XX:CMSInitiatingOccupancyFraction value so that CMS is triggered earlier and more often, which reduces the chance of the old generation filling up. Our application's load is fairly low for now and the old generation grows very slowly in production, so we have set this parameter to 80 for the time being. In the stress-testing environment this setting performs acceptably, and no concurrent mode failure has been observed.
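To keep an eye on both failure modes, the GC log can be scanned for their markers. The sketch below simply counts lines that contain a marker string. It assumes HotSpot writes "promotion failed" and "concurrent mode failure" into the log, which matches the logs I have seen but should be checked against your own JVM version. The class and method names are made up for illustration, and the sample lines are shortened excerpts of log lines shown in this article.

```java
import java.util.Arrays;
import java.util.List;

public class FullGcCauseCounter {
    // Marker strings assumed to appear in HotSpot GC logs.
    static final String PROMOTION_FAILED = "promotion failed";
    static final String CONCURRENT_MODE_FAILURE = "concurrent mode failure";

    // Number of log lines that contain the given marker.
    static int count(List<String> lines, String marker) {
        int hits = 0;
        for (String line : lines) {
            if (line.contains(marker)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Shortened excerpts from the logs in this article.
        List<String> sample = Arrays.asList(
                "[ParNew (promotion failed): 320138K->320138K(353920K), 0.2365970 secs]",
                "4391.352: [CMS-concurrent-mark-start]");
        System.out.println("promotion failed:        " + count(sample, PROMOTION_FAILED));
        System.out.println("concurrent mode failure: " + count(sample, CONCURRENT_MODE_FAILURE));
    }
}
```

A non-zero count after a stress test is a signal to revisit the survivor space size or the CMSInitiatingOccupancyFraction value discussed above.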

References:
Optimization of JDK 5.0 garbage collection: Don't Pause
Record of a Java GC tuning experience 1, 2, by arbow
Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning
Tuning Garbage Collection with the 5.0 Java[tm] Virtual Machine
