This is the third article in the Java GC expert series. The first article, "Understanding Java Garbage Collection", covered how the different GC algorithms work, how GC operates in general, and how the young generation differs from the old generation. By now you should know the 5 GC types in JDK 7 and the performance impact of each.
The second article, "Monitoring Java Garbage Collection", described how the JVM runs GC in real-world scenarios, how to monitor GC data, and which tools make GC monitoring easier.
In this article, I cover some of the best options for GC tuning, based on real-world cases. I assume you have understood the first two articles. If you have not read them yet, start there to get the most out of this one.
Is GC tuning necessary?
More precisely, does every Java-based service require GC tuning? No: GC tuning is not something every Java service has to do. That holds when the service already runs with the following options or conditions:
- The memory size is specified with the -Xms and -Xmx options
- The -server option is used
- The system does not log too many timeouts
In other words, if you have not set the memory size and your system logs too many timeouts, you need to perform GC tuning on it.
However, keep in mind that GC tuning is a last resort.
Consider the underlying reason GC tuning becomes necessary. The garbage collector cleans up objects created in Java. The amount of object data the GC has to clean up, and the number of times it runs, both depend on how many objects the application creates. So to control GC, you first need to reduce object creation.
As the saying goes, "many a little makes a mickle." We need to take care of the small things, or they will keep piling up until they are difficult to manage.
- Use StringBuilder or StringBuffer instead of String when building strings piece by piece.
- Reduce unnecessary log output.
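As a minimal sketch of the first point, the method below builds a comma-separated string with one StringBuilder rather than with repeated String concatenation in a loop. The class name JoinIds and the method join are hypothetical names chosen for this illustration.

```java
final class JoinIds {
    // Builds "0,1,2,...,n-1" with a single StringBuilder. Concatenating with
    // String inside the loop would create a new String object on every pass.
    static String join(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(5)); // prints 0,1,2,3,4
    }
}
```

The saving per call is small, but in code that runs for every request it reduces the number of short-lived objects the young generation has to collect.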
Even so, there are cases where we can do little. Parsing XML and JSON uses a lot of memory. Even if we use String as little as possible and trim logging as far as we can, parsing XML or JSON can still use a large amount of temporary memory, on the order of 10-100MB, and it is hard to avoid XML and JSON altogether. Just keep in mind that they carry a significant memory overhead.
If your application's memory footprint still keeps growing, it is time to start GC tuning. I divide the goals of GC tuning into two categories:
- Reduce the number of objects moved to the old generation
- Shorten the execution time of full GC
Reduce the number of objects moved to the old generation
In the Oracle JVM, every GC type except G1, which was introduced in JDK 7, is generational. Objects are created in the Eden space and then move back and forth between the Survivor spaces. Objects that are still alive after that are moved to the old generation. Some objects are so large that they are moved to the old generation immediately after being created in the Eden space. GC in the old generation takes longer than GC in the young generation, so reducing the number of objects moved to the old generation reduces the frequency of full GC. This might sound like trying to keep objects in the young generation, which is not possible. What you can do is adjust the size of the young generation.
Shorten full GC execution time
A single full GC takes significantly longer than a minor GC. If a full GC takes too long (for example, more than 1 second), connections to external services may time out.
- If you try to shorten full GC by shrinking the old generation, you may run into OutOfMemoryError or more frequent full GC.
- If you try to reduce the number of full GCs by enlarging the old generation, the execution time of each full GC increases.
Therefore, you need to set the old generation to an appropriate size.
Options that affect GC performance
At the end of "Understanding Java Garbage Collection", I warned against thinking, "Someone got a big performance gain from a certain GC option, so why don't I just use the same one?" The reason is that the number of objects and their lifetimes differ from service to service.
Consider a simple scenario: one task can run only when five conditions A, B, C, D, and E are met, while another needs only conditions A and B. Which is faster? Usually the task that needs only A and B.
The same is true of Java GC options. Setting many options does not necessarily make GC faster, and may make it slower. The basic rule of GC tuning is to apply different options to two or more servers, compare their performance, and then roll out only the options that have proved to improve performance. Keep this in mind.
The following table lists the memory-related GC options that affect performance:
Table 1: GC tuning options to check
Category | Option | Description
Heap space | -Xms | Initial heap size when the JVM starts
 | -Xmx | Maximum heap size
Young generation space | -XX:NewRatio | Ratio between the young generation and the old generation
 | -XX:NewSize | Size of the young generation
 | -XX:SurvivorRatio | Ratio between Eden and the Survivor spaces
The options I use most often are -Xms, -Xmx, and -XX:NewRatio. -Xms and -Xmx are required, and how you set -XX:NewRatio has a significant effect on GC performance.
You may ask how to set the size of the permanent generation (Perm). You can set it with -XX:PermSize and -XX:MaxPermSize, but do so only when an OutOfMemoryError is caused by insufficient Perm space.
Another option that affects GC performance is the GC type. The following table lists the relevant options available as of JDK 6.0:
Table 2: GC type options
Category | Option | Description
Serial GC | -XX:+UseSerialGC |
Parallel GC | -XX:+UseParallelGC -XX:ParallelGCThreads=<value> |
Parallel Compacting GC | -XX:+UseParallelOldGC |
CMS GC | -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=<value> -XX:+UseCMSInitiatingOccupancyOnly |
G1 | -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC | In JDK 6, both options must be set together
Except for G1, the GC type is selected by the first option listed in each row. Serial GC is usually the least used, because it is designed and optimized for client applications.
Many other options affect GC performance, but their effect is less obvious than that of the options above. Setting more options does not necessarily improve GC execution time.
GC Tuning Process
GC tuning follows much the same process as general performance tuning. The steps I follow are described below.
1. Monitor GC status
First, monitor the GC status to understand how GC affects the running system. For the specific methods, see the previous article, "Monitoring Java Garbage Collection".
2. Analyze the monitoring results and decide whether GC tuning is needed
Next, analyze the monitoring results and decide whether GC tuning is necessary. If GC takes only 0.1-0.3 seconds, there is generally no need to spend time on GC tuning. If GC takes 1-3 seconds, or even more than 10 seconds, GC tuning is needed right away.
However, if about 10GB of memory is allocated to Java and the memory size cannot be reduced, there is not much GC tuning can do. In that case, first ask why so much memory is needed. If you allocate only 1GB or 2GB and an OutOfMemoryError occurs, take a heap dump to find and fix the cause.
Note: A heap dump is a file containing the contents of Java memory in a defined format, which can be used to inspect the objects and data in memory. It can be created with the jmap command included in the JDK. The Java process is paused while the file is being written, so do not create one on a system that is serving live traffic.
3. Set GC type and memory size
If you decide to tune, choose the GC type and the memory size. If you have several servers, it is important to set different GC options on each one and compare the results.
4. Analyze the GC tuning results
After setting the GC options, collect data for at least 24 hours and then start the analysis. If you are lucky, you will find the most suitable GC options right away. If not, analyze the GC logs and how memory is being allocated, then keep adjusting the GC type and memory size until you find the best options for the system.
5. If the result is acceptable, apply the options to all servers and stop tuning
If the GC results are satisfactory, apply the options to all servers and finish GC tuning.
The following sections describe each step in more detail.
Monitor GC status and analyze GC results
The best way to check the GC status of a running web application server (WAS) is the jstat command. "Monitoring Java Garbage Collection" already explained how to use jstat, so here I go straight to how to read the results.
The following example shows jstat output from a JVM that has not been GC tuned:
$ jstat -gcutil 21719 1s
S0     S1    E      O      P      YGC   YGCT     FGC  FGCT    GCT
48.66  0.00  48.10  49.70  77.45  3428  172.623  3    59.050  231.673
48.66  0.00  48.10  49.70  77.45  3428  172.623  3    59.050  231.673
Look at YGC and YGCT. Dividing YGCT by YGC (172.623 / 3428) gives an average of about 0.050 seconds per young GC. In other words, a minor GC in the young generation takes 50 milliseconds on average, so there is no need to worry about young generation collection here.
Now look at FGCT and FGC. Dividing FGCT by FGC (59.050 / 3) gives an average of 19.68 seconds per full GC. With only 3 full GCs in total, each may have taken about 19.68 seconds, or two may have taken around 1 second each while the third took around 58 seconds. Either way, GC tuning is urgently needed.
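The same arithmetic can be sketched in Java. The class below is a hypothetical helper, not part of jstat or the JDK. It assumes the ten-column -gcutil layout shown above (S0 S1 E O P YGC YGCT FGC FGCT GCT); other JDK versions print different columns, so the indexes would need adjusting.

```java
final class GcUtilAverages {
    // Parses one data row of `jstat -gcutil` in the column order
    // S0 S1 E O P YGC YGCT FGC FGCT GCT and returns
    // { average minor GC seconds, average full GC seconds }.
    static double[] averagesInSeconds(String row) {
        String[] f = row.trim().split("\\s+");
        long ygc = Long.parseLong(f[5]);
        double ygct = Double.parseDouble(f[6]);
        long fgc = Long.parseLong(f[7]);
        double fgct = Double.parseDouble(f[8]);
        // Guard against division by zero when no GC of that kind has run yet.
        double avgMinor = ygc == 0 ? 0.0 : ygct / ygc;
        double avgFull = fgc == 0 ? 0.0 : fgct / fgc;
        return new double[] { avgMinor, avgFull };
    }

    public static void main(String[] args) {
        double[] avg = averagesInSeconds(
                "48.66 0.00 48.10 49.70 77.45 3428 172.623 3 59.050 231.673");
        System.out.println("average minor GC (s): " + avg[0]);
        System.out.println("average full GC (s): " + avg[1]);
    }
}
```

An average hides the spread between individual collections, which is why the per-event times in the -verbosegc log are still worth checking.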
You can check the results with jstat, but the best way to analyze GC is to start the JVM with the -verbosegc option. The previous article explained in detail how to generate and analyze this log. For analyzing -verbosegc logs, HPJMeter is my favorite tool because it is easy to use. With HPJMeter you can easily see how GC execution times are distributed and how often GC occurs.
GC tuning is not necessary if the GC execution time satisfies the following criteria.
- Minor GC runs quickly (within 50 milliseconds)
- Minor GC does not run frequently (about once every 10 seconds)
- Full GC runs quickly (within 1 second)
- Full GC does not run frequently (about once every 10 minutes)
The values in parentheses are not absolute; they depend on the state of the service. Some services may require full GC to finish within 0.9 seconds, while others can tolerate more. So check the GC results against the needs of each service before deciding whether to tune.
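The checklist above can be written down as a small predicate. This is only a sketch: the class and constant names are hypothetical, and the thresholds are the illustrative values from the list, which should be replaced with whatever each service actually requires.

```java
final class GcCriteria {
    // Illustrative thresholds taken from the checklist; not universal limits.
    static final double MAX_MINOR_SECONDS = 0.050;
    static final double MIN_MINOR_INTERVAL_SECONDS = 10.0;
    static final double MAX_FULL_SECONDS = 1.0;
    static final double MIN_FULL_INTERVAL_SECONDS = 600.0;

    // Returns true when at least one of the four criteria is violated.
    static boolean tuningNeeded(double avgMinorSeconds, double minorIntervalSeconds,
                                double avgFullSeconds, double fullIntervalSeconds) {
        boolean minorOk = avgMinorSeconds <= MAX_MINOR_SECONDS
                && minorIntervalSeconds >= MIN_MINOR_INTERVAL_SECONDS;
        boolean fullOk = avgFullSeconds <= MAX_FULL_SECONDS
                && fullIntervalSeconds >= MIN_FULL_INTERVAL_SECONDS;
        return !(minorOk && fullOk);
    }

    public static void main(String[] args) {
        // A full GC averaging 19.68 s fails the check regardless of the other values.
        System.out.println(tuningNeeded(0.050, 12.0, 19.68, 3600.0)); // prints true
        System.out.println(tuningNeeded(0.030, 15.0, 0.8, 900.0));    // prints false
    }
}
```

If no full GC has occurred at all, passing 0.0 for the average and Double.POSITIVE_INFINITY for the interval satisfies the full GC criteria.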
When checking the GC status, do not look only at how long minor GC and full GC take; also look at how many times GC runs. If the young generation is too small, minor GC runs too often (sometimes once a second or more). In addition, more objects are moved to the old generation, which makes full GC more frequent as well. So run jstat with the -gccapacity option to check how much space each area occupies.
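When jstat is not at hand, roughly similar capacity figures can be read from inside the JVM. This is a different technique from jstat -gccapacity, using the standard java.lang.management API. The class name PoolSizes is hypothetical, and the pool names printed depend on which collector the JVM is running.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

final class PoolSizes {
    // One line per memory pool: name, used, committed and max sizes in KB.
    static List<String> describe() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when the maximum is undefined.
            long maxKb = usage.getMax() < 0 ? -1 : usage.getMax() / 1024;
            lines.add(pool.getName() + ": used=" + usage.getUsed() / 1024
                    + "KB committed=" + usage.getCommitted() / 1024
                    + "KB max=" + maxKb + "KB");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```

This reports on the JVM the code runs in, so it suits a diagnostic endpoint or startup log. For inspecting another process, jstat remains the simpler tool.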
Set GC type and memory size
Set GC type
The Oracle JVM provides 5 GC types. Before JDK 7, the practical choices are Parallel GC, Parallel Compacting GC, and CMS GC. There is no fixed rule or standard for choosing among them.
So how do you choose the right GC type? The recommended approach is to apply all three to the application and compare. One thing is clear: CMS GC is generally faster than the Parallel GCs, which might suggest simply using CMS GC everywhere. But CMS GC is not always faster. Its full GC is usually fast, but when a concurrent mode failure occurs, it is slower than the Parallel GCs.
Concurrent mode failure
Let's take a closer look at concurrent mode failure.
The biggest difference between Parallel GC and CMS GC is compaction. Compaction removes fragmentation by moving allocated memory together, eliminating the gaps between allocated blocks.
In Parallel GC, compaction is performed on every full GC, so full GC takes longer. After the full GC, however, memory can be allocated quickly because the free space is contiguous.
CMS GC, in contrast, does not compact as it runs, so GC is faster. But without compaction, the memory freed by GC is left as scattered gaps. Because the free space is not contiguous, there may be no room for a large object. For example, even with 300MB free in the old generation, a 10MB object may not fit anywhere contiguously. In that case a "concurrent mode failure" warning is issued and compaction is triggered. With CMS GC, compaction can take longer than it does with the Parallel GCs, and it may cause other problems as well. For more detail on concurrent mode failure, see "Understanding CMS GC Logs", written by an Oracle engineer.
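The 300MB versus 10MB example can be illustrated with a toy model. The sketch below is not how HotSpot tracks free space; it only assumes that the free memory is split into equal gaps of 5MB (an assumption made for this illustration) to show why total free space and allocatable space differ.

```java
import java.util.Arrays;

final class FragmentationDemo {
    // Total free space: the sum of all free gaps, in MB.
    static int totalFreeMb(int[] freeGapsMb) {
        int total = 0;
        for (int gap : freeGapsMb) {
            total += gap;
        }
        return total;
    }

    // Without compaction, an object needs a single gap at least as large as itself.
    static boolean fits(int[] freeGapsMb, int objectMb) {
        for (int gap : freeGapsMb) {
            if (gap >= objectMb) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] gaps = new int[60];
        Arrays.fill(gaps, 5); // 60 gaps of 5MB each
        System.out.println(totalFreeMb(gaps)); // prints 300
        System.out.println(fits(gaps, 10));    // prints false
    }
}
```

Compaction would merge the 60 gaps into one 300MB region, after which the 10MB object fits easily.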
The conclusion is: find the GC type that suits your system.
Every system has a GC type that suits it best, and you need to find it. If you have 6 servers, I recommend configuring them in pairs with the same options, adding the -verbosegc option, and then analyzing and comparing the results.
Adjust Memory size
The relationship between memory size, the number of GC executions, and the time per GC is as follows:
- Large memory
  - Fewer GC executions
  - Longer GC execution time
- Small memory
  - Shorter GC execution time
  - More GC executions
There is no single right answer on whether to use a large or a small heap. A 10GB heap is fine if the server has the resources and full GC completes within 1 second. But in most cases GC does not cope well with a 10GB heap, and a full GC may take 10-30 seconds (depending on object sizes).
So how should the memory size be set? In general, I recommend 500MB. That does not mean you should start your WAS (web application server) with -Xms500m and -Xmx500m. Instead, look at how much memory remains in use after a full GC in the current, untuned state. If about 300MB remains after a full GC, a heap of 1GB is a good starting point: 300MB (default usage) + 500MB (minimum for the old generation) + 200MB (free space). In other words, leave at least 500MB of room in the old generation. If you have 3 servers, set them to 1GB, 1.5GB, and 2GB respectively and compare the results on each.
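The sizing rule of thumb above is simple addition. A minimal sketch follows, with hypothetical names; the 500MB and 200MB constants are the figures suggested in the text, not values the JVM defines.

```java
final class HeapSizing {
    static final int MIN_OLD_MB = 500; // minimum room for the old generation, per the text
    static final int FREE_MB = 200;    // free space allowance, per the text

    // Starting heap size in MB, given the memory still in use after a full GC.
    static int suggestedHeapMb(int usedAfterFullGcMb) {
        return usedAfterFullGcMb + MIN_OLD_MB + FREE_MB;
    }

    public static void main(String[] args) {
        System.out.println(suggestedHeapMb(300)); // prints 1000, roughly 1GB
    }
}
```

The result is only a starting point for the side-by-side comparison of 1GB, 1.5GB, and 2GB described above.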
In theory, GC speed should rank 1GB > 1.5GB > 2GB, so the 1GB heap should have the fastest GC of the three. But there is no guarantee that a full GC takes 1 second with 1GB and 2 seconds with 2GB. The actual time also depends on server performance and object sizes. So the best approach is to set up each variation and analyze the monitoring results.
When you set the memory size, there is one more option to set: -XX:NewRatio. NewRatio is the ratio of the old generation to the young generation. With -XX:NewRatio=1, young:old is 1:1, so a 1GB heap is split into roughly 500MB each. With NewRatio=2, young:old is 1:2. So the larger the value, the larger the old generation and the smaller the young generation.
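To make the ratio concrete, the sketch below splits a heap according to NewRatio, using old = NewRatio x young. The class is hypothetical and ignores the alignment and rounding the JVM applies, so real sizes reported by jstat will differ slightly.

```java
final class NewRatioSplit {
    // Returns { youngMb, oldMb } for a heap of heapMb with -XX:NewRatio=newRatio.
    // The young generation gets 1 / (newRatio + 1) of the heap.
    static long[] split(long heapMb, int newRatio) {
        long young = heapMb / (newRatio + 1);
        return new long[] { young, heapMb - young };
    }

    public static void main(String[] args) {
        for (int ratio = 1; ratio <= 3; ratio++) {
            long[] s = split(1024, ratio);
            System.out.println("NewRatio=" + ratio + " young=" + s[0] + "MB old=" + s[1] + "MB");
        }
        // prints:
        // NewRatio=1 young=512MB old=512MB
        // NewRatio=2 young=341MB old=683MB
        // NewRatio=3 young=256MB old=768MB
    }
}
```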
Setting NewRatio may look like a minor detail, but it can seriously affect overall GC performance. If the young generation is too small, objects are moved to the old generation, causing frequent full GC and longer GC times overall.
You might assume that NewRatio=1 gives the best results, but that is not necessarily so. A NewRatio of 2 or 3 more often gives good GC performance, and I have seen this in several real cases.
What is the quickest way to complete GC tuning? Comparing the results of performance tests is the fastest way to get results. Normally you set different options on each server and observe the GC status, ideally for 1-2 days. If you tune through performance tests instead, you must apply the same load and the same business operations to every server, and the mix of requests should match production. Preparing accurate load data is hard even for professional performance testers and takes a lot of effort. So the simpler approach is to apply the GC options to the production service and wait for the results, even though the wait is longer.
Analyze GC Tuning Results
After applying the GC options and -verbosegc, check with the tail command that the log is being written as expected. If an option is mistyped or the log is not being written, the time you spend waiting is wasted. If the log looks right, let the service run for 1-2 days and then analyze the results. The simplest way is to copy the log files to your local PC and analyze them with HPJMeter.
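A complementary check, which is an addition to the procedure described here rather than part of it, is to print the arguments the JVM actually received. The sketch below uses the standard RuntimeMXBean API; the class name JvmFlags is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class JvmFlags {
    // Arguments passed to the JVM itself (for example -Xmx or -XX options),
    // not the arguments passed to the application's main method.
    static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        for (String arg : inputArguments()) {
            System.out.println(arg);
        }
    }
}
```

Logging this list at startup makes it easier to confirm that each server in a comparison really runs with the option set intended for it.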
During analysis, focus on the following data, listed in the priority I use myself. The most important figure for deciding on GC options is the full GC execution time.
- Full GC time (average)
- Minor GC time (average)
- Full GC interval
- Minor GC interval
- Total full GC time
- Total minor GC time
- Total GC time
- Number of full GC executions
- Number of minor GC executions
If you are lucky you will find the right GC options on the first attempt, but usually you will not. Be careful while tuning: if you try to finish everything in one go, you may end up with an OutOfMemoryError.
Tuning case
So far the discussion of GC tuning has been theoretical. Now let's look at some actual tuning cases.
Case 1
This case is GC tuning for Service S. Service S had just been launched, and its full GC was taking too long.
First, look at the jstat -gcutil output:
S0     S1    E     O      P      YGC  YGCT   FGC  FGCT   GCT
12.16  0.00  5.18  63.78  20.32  54   2.047  5    6.946  8.993
At the start of tuning, the Perm area (P) is not a major concern. The YGC value is the one more worth looking at.
From these results, the average time per minor GC and per full GC works out as follows:
Table 3: Average minor GC and full GC time for Service S
GC type | GC count | Total GC time | Average
Minor GC | 54 | 2.047 s | 37 ms
Full GC | 5 | 6.946 s | 1,389 ms
37 ms is not bad for minor GC. But an average of 1.389 seconds for full GC means that timeouts can occur frequently whenever full GC runs, for example when the DB timeout is set to 1 second. So this service needs GC tuning.
Before starting GC tuning, check how memory is currently configured. The jstat -gccapacity option shows this. The result for Service S was:
NGCMN NGCMX NGC S0C S1C EC OGCMN OGCMX OGC OC PGCMN PGCMX PGC PC YGC FGC
212992.0 212992.0 212992.0 21248.0 21248.0 170496.0 1884160.0 1884160.0 1884160.0 1884160.0 262144.0 262144.0 262144.0 262144.0 54 5
The key figures are:
- Young generation size: 212,992 KB (about 208 MB)
- Old generation size: 1,884,160 KB (about 1.8 GB)
So the memory allocated outside the Perm area is 2GB, and young:old is roughly 1:9 (that is, NewRatio=9). To get more detail, -verbosegc was added to three instances of the service, each with a different NewRatio value and no other options:
- NewRatio=2
- NewRatio=3
- NewRatio=4
Fortunately, after a day of checking the GC logs, no full GC had occurred once NewRatio was set.
Why? Most objects are destroyed soon after they are created, so they are collected in the young generation before they are ever moved to the old generation.
In this situation there is no need to change any other option; just pick the best NewRatio value. How? Compare the average minor GC time for each NewRatio value.
The average minor GC time for the three settings was:
- NewRatio=2: 45 ms
- NewRatio=3: 34 ms
- NewRatio=4: 30 ms
NewRatio=4 has the shortest minor GC time, so I chose it as the best setting, even though it gives the smallest young generation of the three. After this option was applied, the service had no more full GCs.
For reference, here is the jstat -gcutil output one day after the JVM was restarted with the new option:
S0    S1    E      O      P      YGC   YGCT    FGC  FGCT   GCT
8.61  0.00  30.67  24.62  22.38  2424  30.219  0    0.000  30.219
You might think GC occurs rarely because the service receives few requests. However, minor GC ran 2,424 times in that period while full GC did not run even once.
Case 2
The next example is Service A. Our in-house application performance manager (APM) showed that the JVM of Service A periodically paused for a long time (more than 8 seconds without responding), so we decided to GC tune it. Investigation showed that full GC was taking too long and needed to be optimized.
Before starting the optimization, we added the -verbosegc option to the service. The result was as follows:
Figure 1: GC time before GC tuning
Figure 1 is one of the graphs HPJMeter produces automatically after analyzing the log. The x-axis is the time since the JVM started, and the y-axis is the duration of each GC. Green marks the CMS collections, which correspond to full GC, and blue marks the Parallel Scavenge collections, which correspond to minor GC.
I said earlier that CMS GC is the fastest, but here it sometimes took up to 15 seconds. What caused this? Recall what I said earlier: CMS becomes slow when compaction is performed. In addition, Service A was started with -Xms1g and -Xmx4g, and the operating system had allocated 4GB of memory to it.
So I changed the GC type from CMS to Parallel GC, set the memory size to 2GB, and set NewRatio to 3. After a while, jstat -gcutil showed the following:
S0    S1     E     O      P      YGC  YGCT    FGC  FGCT    GCT
0.00  30.48  3.31  26.54  37.01  226  11.131  4    11.758  22.890
Full GC improved to about 3 seconds on average, compared with 15 seconds with 4GB of memory. But 3 seconds is still not satisfactory, so I prepared the following six sets of options:
- -XX:+UseParallelGC -Xms1536m -Xmx1536m -XX:NewRatio=2
- -XX:+UseParallelGC -Xms1536m -Xmx1536m -XX:NewRatio=3
- -XX:+UseParallelGC -Xms1g -Xmx1g -XX:NewRatio=3
- -XX:+UseParallelOldGC -Xms1536m -Xmx1536m -XX:NewRatio=2
- -XX:+UseParallelOldGC -Xms1536m -Xmx1536m -XX:NewRatio=3
- -XX:+UseParallelOldGC -Xms1g -Xmx1g -XX:NewRatio=3
Which one was fastest? The results showed that the smaller the memory, the faster the GC. Figure 2 shows the GC duration graph for the sixth set of options, which gave the largest improvement. The slowest GC in the figure took 1.7 seconds, and the average dropped to under 1 second.
Figure 2: GC time after applying the sixth set of options
So I switched Service A to the sixth set of options. However, OutOfMemoryError then occurred every night. I will not go into the details. In short, a batch data processing job was causing a memory leak in the JVM. At that point, all the problems became clear.
It is dangerous to roll GC options out to all servers after only a short look at the GC logs. Remember that GC tuning goes smoothly and without failures only if you analyze how the system operates as carefully as you analyze the GC logs.
The two cases above show how GC tuning is carried out in practice. As I said, the GC options from these cases can be applied unchanged to a server with the same CPU, operating system version, and JDK version that runs the same functionality. However, do not apply them to your own services as they are, because they may not work for you.
Conclusion
I perform GC tuning mostly based on experience, without taking a heap dump and analyzing memory in detail, although an accurate picture of memory would likely give better tuning results. In general, analyzing memory objects may be the better choice when the memory load is low, but when the service is under heavy load and uses a lot of memory, I recommend tuning based on experience.
I have run performance tests with G1 GC on some services but have not yet applied it fully. G1 proved faster than any other GC type, but it requires upgrading to JDK 7, and its stability is not yet guaranteed. Nobody knows whether a critical bug is lurking. So it is not yet time to use G1 on a large scale.
Once JDK 7 has stabilized (which is not to say it is unstable now) and WAS products are optimized for JDK 7, G1 may run reliably on servers, and GC tuning may no longer be necessary.
More details on GC tuning can be found by searching SlideShare for related material. The one I recommend most is the presentation by Twitter engineer Attila Szegedi, "Everything I Ever Learned About JVM Performance Tuning @Twitter". It is worth taking the time to study.
Sangmin Lee, senior engineer of Performance Lab, NHN Company
Java GC Expert Series 3:GC tuning practices