This article introduces Java GC optimization practices. It is the third article in the "GC Expert" series. In the first article, "Understanding Java Garbage Collection", we looked at how several different GC algorithms work, how GC operates, and how the young generation differs from the old generation. By now, you should understand the five GC types in JDK 7 and how each of them affects performance.
In the second article, on monitoring Java garbage collection, we explained how the JVM actually runs GC, how to monitor GC data, and which tools are available for GC monitoring.
In this article, I will introduce some good GC tuning options based on real cases. I assume that you have understood the first two articles; if you have not, I recommend reading them first for a deeper understanding of this one.
Is GC optimization required?
To be more precise: does a Java-based service require GC optimization? GC optimization is not necessary for every Java service. It is generally not needed if the following already hold:
- The memory size is specified with the -Xms and -Xmx options.
- The -server option is used.
- The system does not produce too many timeout logs.
In other words, if you have not set the memory size and your system produces too many timeout logs, you need to perform GC optimization on your system.
However, remember: GC optimization is a last resort.
Think about the underlying reason for GC optimization. The garbage collector cleans up the objects created in Java. The amount of data the GC must clean up and the number of GC executions both depend on the number of objects the application creates. Therefore, to control GC execution, you must first reduce object creation.
As the saying goes, "many a little makes a mickle." We need to take care of small things from the start; otherwise they keep adding up until they become difficult to manage. For example:
- Use StringBuilder or StringBuffer instead of String when building strings.
- Reduce unnecessary log output.
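To make the first point concrete, here is a minimal Java sketch (class and method names are hypothetical) that compares string concatenation with `+=` in a loop against a single reused StringBuilder. The `+=` version creates a new String on every iteration, plus intermediate objects depending on the compiler, and all of them become garbage; the StringBuilder version appends into one growable buffer. StringBuffer offers the same API with synchronized methods, for buffers shared across threads.

```java
public class ConcatDemo {
    // Naive version: every += produces a new String, so the number of
    // short-lived objects grows with the number of iterations.
    static String joinWithPlus(String[] parts) {
        String result = "";
        for (String p : parts) {
            result += p + ",";
        }
        return result;
    }

    // Reuses one growable buffer for the whole loop, so far fewer
    // intermediate objects are left for the collector to clean up.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder(parts.length * 8);
        for (String p : parts) {
            sb.append(p).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        System.out.println(joinWithPlus(parts));    // prints "a,b,c,"
        System.out.println(joinWithBuilder(parts)); // prints "a,b,c,"
    }
}
```

A single `a + b + c` expression outside a loop is already compiled efficiently, so the gain matters mainly for concatenation inside loops.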
Even so, there are cases where we cannot do much. Parsing XML and JSON uses a lot of memory. Even if we use as few Strings as possible and keep log output to a minimum, parsing XML or JSON can still allocate a large amount of temporary memory, 10 MB or more, and it is hard to avoid XML and JSON altogether. Just keep in mind that XML and JSON bring a lot of memory overhead.
If the application's memory usage still keeps growing after such measures, you need to start GC optimization. The goals of GC optimization fall into two categories:
- Reduce the number of objects moved to the old generation
- Shorten the Full GC execution time
Reduce the number of objects moved to the old generation
In the Oracle JVM, every GC type except G1 GC (introduced in JDK 7) is generational. That is, objects are created in the Eden space and moved back and forth between the Survivor spaces, and objects that stay alive are eventually moved to the old generation. Some objects take up too much space and are moved to the old generation directly after being created in Eden. GC in the old generation takes longer than GC in the young generation, so reducing the number of objects moved to the old generation reduces how often Full GC runs. This may be misread as "keep objects in the young generation", but you cannot simply choose to do that. What you can do is adjust the size of the young generation.
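If you want to see these generational spaces from inside a running JVM, the standard java.lang.management API exposes them as memory pools. The sketch below (a hypothetical helper, not part of the original article) prints each heap pool with its current usage. Pool names depend on the JVM and the selected collector; with Parallel GC on HotSpot they are typically "PS Eden Space", "PS Survivor Space", and "PS Old Gen".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class HeapPools {
    // Names of all heap pools (Eden, Survivor, old generation, ...).
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache and Perm Gen/Metaspace
            }
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage == null) {
                continue;
            }
            // getMax() returns -1 when no maximum is defined for the pool.
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / 1024) + " KB";
            System.out.println(pool.getName() + ": used=" + (usage.getUsed() / 1024) + " KB, max=" + max);
        }
    }
}
```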
Reduce Full GC execution time
Compared with Minor GC, a single Full GC takes much longer. If Full GC takes too long (for example, more than 1 second), connections to external services may time out.
- If you try to reduce Full GC time by shrinking the old generation, you may get an OutOfMemoryError or more frequent Full GCs.
- If you reduce the number of Full GCs by enlarging the old generation, each Full GC takes longer.
Therefore, you need to set the old generation to an appropriate size.
Options that affect GC performance
At the end of "Understanding Java Garbage Collection", I warned against thinking "someone else got a significant performance improvement with a certain GC option, so why not just use it too?" The reason is that services differ in how many objects they create and how long those objects live.
Consider a simple analogy: if one task requires five conditions A, B, C, D, and E, and another task requires only A and B, which task will finish faster? Generally, the one that requires only A and B.
The same applies to Java GC options. Setting many options does not necessarily speed up GC; it may even make it slower. The basic rule of GC optimization is to apply different options to two or more servers, compare their performance, and then add to the application servers only the options that proved to improve performance. Remember this.
The following table lists memory-related GC options that affect performance:
Table 1: Options for GC optimization
Category | Option | Description
Heap space | -Xms | Initial heap size when the JVM starts
Heap space | -Xmx | Maximum heap size
Young generation space | -XX:NewRatio | Ratio of the old generation size to the young generation size
Young generation space | -XX:NewSize | Young generation size
Young generation space | -XX:SurvivorRatio | Ratio of the Eden space size to the size of one Survivor space
The options I use most often are -Xms, -Xmx, and -XX:NewRatio. -Xms and -Xmx are required, and how you set -XX:NewRatio has a significant impact on performance.
Some may ask how to set the Perm size. You can set it with -XX:PermSize and -XX:MaxPermSize, but only do so when an OutOfMemoryError occurs because the Perm space is insufficient.
Another option that will affect GC performance is the GC type. The following table lists the related settings that can be used in JDK 6.0:
Table 2: GC type options
Category | Option | Description
Serial GC | -XX:+UseSerialGC |
Parallel GC | -XX:+UseParallelGC -XX:ParallelGCThreads=value |
Parallel Compacting GC | -XX:+UseParallelOldGC |
CMS GC | -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=value -XX:+UseCMSInitiatingOccupancyOnly |
G1 | -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC | When using G1 in JDK 6, both options must be set together.
Except for G1, each GC type is selected by the first option in its row; the remaining options tune it. Serial GC is the least commonly used; it is designed and optimized for client applications.
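To confirm which collector a JVM actually ended up with after you pass one of these flags, you can ask the JVM itself. This minimal sketch (hypothetical class name) prints the garbage collector MXBean names and the -XX, -Xms, and -Xmx arguments the JVM was started with. On HotSpot the collector names typically look like "PS Scavenge"/"PS MarkSweep" for Parallel GC, "ParNew"/"ConcurrentMarkSweep" for CMS GC, and "G1 Young Generation"/"G1 Old Generation" for G1, but treat that mapping as informational rather than a stable contract.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {
    // Names of the collectors the running JVM reports (usually one per generation).
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Command-line JVM arguments that select the GC type or size the heap.
    static List<String> gcAndHeapFlags() {
        List<String> flags = new ArrayList<>();
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-XX:") || arg.startsWith("-Xms") || arg.startsWith("-Xmx")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println("Collectors: " + collectorNames());
        System.out.println("Flags:      " + gcAndHeapFlags());
    }
}
```

For example, running it with `java -XX:+UseParallelGC ActiveCollectors` lets you check that the flag took effect.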
There are many other options that affect GC, but their effect on performance is usually small. Moreover, setting more options does not necessarily reduce GC time.
GC optimization process
The GC optimization process is similar to the general performance improvement process. The steps are as follows.
1. Monitor GC status
First, monitor GC status to see how GC affects the system while it runs. For details, see the previous article on monitoring Java garbage collection.
2. Analyze the monitoring data and decide whether GC optimization is required
Next, analyze the monitoring results and decide whether GC optimization is necessary. If GC takes 0.1-0.3 seconds or less, GC optimization is generally not required. However, if GC takes 1-3 seconds, or more than 10 seconds, you need to optimize immediately.
However, if your application is allocated 10 GB of memory and that amount cannot be reduced, there is little GC optimization can do. In that case, first consider why so much memory is needed. If only 1 GB or 2 GB is allocated to the application and an OutOfMemoryError occurs, take a heap dump, find the cause of the memory shortage, and fix it.
Note: a heap dump writes the contents of memory to a file in a specific format, which you can use to inspect the objects and data in Java memory. You can create a heap dump file with the jmap command included in the JDK. The Java process is paused while the file is being created, so do not do this on a system in normal operation.
3. Set the GC type and memory size
If you decide to perform GC optimization, choose the GC type and set the memory size. If you have multiple servers, it is important to set different GC options on each server and compare the results.
4. Analyze the GC optimization results
After setting the GC options, collect GC data for at least 24 hours before analyzing it. If you are lucky, the analysis will reveal the most suitable GC options. Otherwise, analyze the GC logs and memory allocation, then look for the best options by adjusting the GC type and memory size.
5. If the results are satisfactory, apply the options to all servers and stop tuning
If the GC results are satisfactory, apply the options to all servers and stop GC optimization.
The following sections describe the detailed process in each step.
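The heap dump mentioned in step 2 is usually taken with a command such as `jmap -dump:live,format=b,file=heap.hprof <pid>`. On HotSpot JVMs the same kind of dump can also be triggered programmatically through the com.sun.management.HotSpotDiagnosticMXBean interface, as in this sketch (the class name and output path are hypothetical). As the note in step 2 says, the JVM pauses while the dump is written, so do not run this against a service under normal load.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HeapDumper {
    // Writes a heap dump in HPROF format. With live=true only reachable
    // objects are dumped, which typically involves a full GC first.
    // The target file must not already exist; newer JDKs expect a ".hprof" suffix.
    static void dumpHeap(String path, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, live);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output location: a uniquely named file in the temp directory.
        Path out = Paths.get(System.getProperty("java.io.tmpdir"),
                "heap-" + System.currentTimeMillis() + ".hprof");
        dumpHeap(out.toString(), true);
        System.out.println("Wrote " + Files.size(out) + " bytes to " + out);
    }
}
```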
Monitor GC status and analyze GC results
The best way to monitor the GC status of a Web Application Server (WAS) is to use the jstat command. The article on monitoring Java garbage collection explained how to use jstat, so here I will explain how to interpret its output.
The following example shows data from a JVM that has not been GC-optimized:
$ jstat -gcutil 21719 1s
S0     S1    E      O      P      YGC   YGCT     FGC  FGCT    GCT
48.66  0.00  48.10  49.70  77.45  3428  172.623  3    59.050  231.673
48.66  0.00  48.10  49.70  77.45  3428  172.623  3    59.050  231.673
Look at YGC and YGCT in the output. Dividing YGCT by YGC gives an average of 0.05 seconds per young GC, that is, an average Minor GC time of 50 milliseconds. With this result, there is no need to worry about young generation GC.
Now look at FGCT and FGC. Dividing FGCT by FGC gives an average Full GC time of 19.68 seconds. With three Full GCs in total, this could mean that each one took about 19.68 seconds, or that two took 1 second each and the third took about 57 seconds. Either way, GC optimization is urgently needed.
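The averages above are simply the time columns divided by the count columns. This small Java sketch (hypothetical class and method names) does that arithmetic for one `jstat -gcutil` sample. It looks columns up by name, because the layout differs between JDK versions: JDK 7 prints P for the permanent generation, while later JDKs print M and CCS instead.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class GcutilAverages {
    // Maps each jstat column name to its value in one sample row.
    // Non-numeric cells such as "-" become NaN.
    static Map<String, Double> parse(String header, String row) {
        String[] names = header.trim().split("\\s+");
        String[] cells = row.trim().split("\\s+");
        if (names.length != cells.length) {
            throw new IllegalArgumentException("header/row column count mismatch");
        }
        Map<String, Double> sample = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            double value;
            try {
                value = Double.parseDouble(cells[i]);
            } catch (NumberFormatException e) {
                value = Double.NaN;
            }
            sample.put(names[i], value);
        }
        return sample;
    }

    // Average seconds per collection; 0 when no collection has run yet.
    static double average(Map<String, Double> sample, String countColumn, String timeColumn) {
        double count = sample.get(countColumn);
        return count == 0 ? 0.0 : sample.get(timeColumn) / count;
    }

    public static void main(String[] args) {
        String header = "S0 S1 E O P YGC YGCT FGC FGCT GCT";
        String row = "48.66 0.00 48.10 49.70 77.45 3428 172.623 3 59.050 231.673";
        Map<String, Double> sample = parse(header, row);
        System.out.printf(Locale.ROOT, "avg Minor GC: %.3f s%n", average(sample, "YGC", "YGCT")); // 0.050 s
        System.out.printf(Locale.ROOT, "avg Full GC: %.3f s%n", average(sample, "FGC", "FGCT"));  // 19.683 s
    }
}
```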
You can also check GC with jstat, but the best way to analyze GC is to start the JVM with the -verbosegc option. The previous article described in detail how to generate these logs and how to analyze them. For analyzing -verbosegc logs, HPJMeter is my preferred tool because it is easy to use. With HPJMeter, you can easily see GC execution time and GC frequency.
If GC execution meets all of the following conditions, GC optimization is not necessary:
- Minor GC is fast (within 50 milliseconds)
- Minor GC is infrequent (about once every 10 seconds or less often)
- Full GC is fast (within 1 second)
- Full GC is infrequent (about once every 10 minutes or less often)
The values in parentheses are not absolute; they vary with the service. Some services may need Full GC to complete within 0.9 seconds, while others can tolerate longer pauses. Therefore, check the GC results and decide whether to perform GC optimization based on your service's requirements.
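The four conditions can be written down as a simple check. In this sketch the thresholds are the values from the list (50 ms, 10 seconds, 1 second, 10 minutes); as noted above, they are starting points to adjust per service, and the class and method names are hypothetical. The interval between collections is estimated as the observation window divided by the number of collections.

```java
public class TuningCheck {
    // Default thresholds taken from the checklist; adjust them per service.
    static final double MAX_AVG_MINOR_MS = 50;     // Minor GC within 50 ms
    static final double MIN_MINOR_INTERVAL_S = 10; // at most about once per 10 s
    static final double MAX_AVG_FULL_MS = 1000;    // Full GC within 1 s
    static final double MIN_FULL_INTERVAL_S = 600; // at most about once per 10 min

    // Average seconds between collections over an observation window.
    static double intervalSeconds(double windowSeconds, long collections) {
        return collections == 0 ? Double.POSITIVE_INFINITY : windowSeconds / collections;
    }

    // True when all four conditions hold, i.e. GC optimization is not needed.
    static boolean tuningUnnecessary(double avgMinorMs, double minorIntervalS,
                                     double avgFullMs, double fullIntervalS) {
        return avgMinorMs <= MAX_AVG_MINOR_MS
                && minorIntervalS >= MIN_MINOR_INTERVAL_S
                && avgFullMs <= MAX_AVG_FULL_MS
                && fullIntervalS >= MIN_FULL_INTERVAL_S;
    }

    public static void main(String[] args) {
        // Hypothetical one-hour window: 300 Minor GCs averaging 40 ms,
        // 2 Full GCs averaging 1,400 ms.
        double window = 3600;
        boolean ok = tuningUnnecessary(40, intervalSeconds(window, 300),
                                       1400, intervalSeconds(window, 2));
        // prints "Consider GC optimization" (Full GC exceeds 1 second)
        System.out.println(ok ? "No GC optimization needed" : "Consider GC optimization");
    }
}
```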
When checking GC status, do not look only at the time taken by Minor GC and Full GC. The number of GC executions is equally important. If the young generation is too small, Minor GC runs very frequently (sometimes once every second). A young generation that is too small also causes more objects to be moved to the old generation, which leads to frequent Full GC. Therefore, also run jstat with the -gccapacity option to check how much memory each space has been given.
Set GC type and memory size
Set GC type
The Oracle JVM provides five GC types, but if you are not using JDK 7, you choose among Parallel GC, Parallel Compacting GC, and CMS GC. There is no fixed rule or standard for which one to choose.
So how do you select the right GC type? The recommended approach is to apply all three to the application and compare them. One thing seems clear: CMS GC is generally faster than the Parallel GCs, so it is tempting to use CMS GC alone. However, CMS GC is not always faster. Its Full GC is usually fast, but when a concurrent mode failure occurs, it becomes slower than the Parallel GCs.
Concurrent mode failure
Let's take a closer look at when a concurrent mode failure occurs.
The biggest difference between Parallel GC and CMS GC is compaction. Compaction removes memory fragmentation by moving allocated objects together so that no gaps remain between them.
In Parallel GC, compaction is performed every time Full GC runs, which makes Full GC take longer. After Full GC, however, memory can be allocated from one contiguous free region, so allocation is faster.
In contrast, CMS GC does not compact memory, so GC itself is faster. Without compaction, however, the memory freed by GC remains as scattered free blocks. Because this free space is not contiguous, there may not be enough room to allocate large objects. For example, even if a considerable amount of free space remains in the old generation in total, there may be no contiguous block large enough for a 10 MB object. In that case a concurrent mode failure occurs and compaction is triggered. With CMS GC, this compaction can take longer than in the Parallel GCs and may cause other problems. For more details on concurrent mode failure, see the Oracle engineer's article "Understanding CMS GC Logs".
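The following toy model illustrates why fragmentation can make an allocation fail even when enough memory is free in total. It is a deliberate simplification, not how CMS actually manages its free lists: the old generation is represented as a list of free block sizes in MB (hypothetical numbers), allocation uses first-fit, and "compaction" merges all free space into one block.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FragmentationDemo {
    // Total free space, regardless of whether it is contiguous.
    static int totalFree(List<Integer> freeBlocksMb) {
        int sum = 0;
        for (int size : freeBlocksMb) {
            sum += size;
        }
        return sum;
    }

    // First-fit search: index of the first free block that can hold the
    // request, or -1 if no single contiguous block is large enough.
    static int firstFit(List<Integer> freeBlocksMb, int requestMb) {
        for (int i = 0; i < freeBlocksMb.size(); i++) {
            if (freeBlocksMb.get(i) >= requestMb) {
                return i;
            }
        }
        return -1;
    }

    // Compaction moves live objects together, leaving one contiguous free block.
    static List<Integer> compact(List<Integer> freeBlocksMb) {
        return Collections.singletonList(totalFree(freeBlocksMb));
    }

    public static void main(String[] args) {
        List<Integer> fragmented = Arrays.asList(4, 3, 2, 5); // 14 MB free in total
        int request = 10;                                      // a 10 MB object
        System.out.println("free=" + totalFree(fragmented) + " MB, fit=" + firstFit(fragmented, request));
        // prints "free=14 MB, fit=-1": enough space in total, but not contiguous
        System.out.println("after compaction, fit=" + firstFit(compact(fragmented), request));
        // prints "after compaction, fit=0"
    }
}
```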
In conclusion, every system has a GC type that suits it best, and you need to find it. If you have six servers, I recommend splitting them into pairs, giving both servers in each pair the same options, adding the -verbosegc option, and then analyzing and comparing the results.
Adjust memory size
The relationship between memory size, the number of GCs, and GC duration is as follows:
- Large memory
  - reduces the number of GCs
  - increases the duration of each GC
- Small memory
  - reduces the duration of each GC
  - increases the number of GCs
There is no single right answer as to whether large or small memory is better. If server resources are sufficient and Full GC can be kept within 1 second, 10 GB of memory is fine. In most cases, however, with 10 GB of memory Full GC takes 10 to 30 seconds (depending on object sizes), which is not acceptable.
So how should you set the memory size? Generally, I recommend 500 MB. This does not mean you should set your WAS memory options to -Xms500m and -Xmx500m. Instead, check the memory size after Full GC in the current, unoptimized setup. If about 300 MB is still occupied after Full GC, 1 GB is a good size: 500 MB (default) + 300 MB (minimum for the old generation) + 200 MB (free space). In other words, the old generation should get at least 500 MB. If you have three servers, set them to 1 GB, 1.5 GB, and 2 GB respectively, and check the results on each.
In theory, a single GC should be fastest with 1 GB, then 1.5 GB, then 2 GB. However, there is no guarantee that Full GC takes 1 second with 1 GB and 2 seconds with 2 GB; the actual time also depends on machine performance and object sizes. Therefore, the best approach is to try each option and analyze the monitoring results.
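The sizing rule in the example above is plain addition. This sketch (hypothetical names) encodes it: the base size, the old generation occupancy measured after Full GC, and the free headroom are inputs, and the 500/300/200 MB values are the ones from the example. It also prints fixed-size heap flags for the three candidate servers; setting -Xms equal to -Xmx mirrors the option sets used in the case studies later in this article.

```java
public class HeapSizing {
    // Heap size = base size + old generation occupancy after Full GC + free headroom.
    static long suggestedHeapMb(long baseMb, long oldAfterFullGcMb, long headroomMb) {
        return baseMb + oldAfterFullGcMb + headroomMb;
    }

    // Minimum old generation = occupancy after Full GC + free headroom.
    static long minOldGenMb(long oldAfterFullGcMb, long headroomMb) {
        return oldAfterFullGcMb + headroomMb;
    }

    // Fixed-size heap flags (initial size equal to maximum size).
    static String heapFlags(long heapMb) {
        return "-Xms" + heapMb + "m -Xmx" + heapMb + "m";
    }

    public static void main(String[] args) {
        // prints "1000 MB total, old generation >= 500 MB"
        System.out.println(suggestedHeapMb(500, 300, 200) + " MB total, old generation >= "
                + minOldGenMb(300, 200) + " MB");
        for (long mb : new long[] {1024, 1536, 2048}) { // candidates: 1 GB, 1.5 GB, 2 GB
            System.out.println(heapFlags(mb));
        }
    }
}
```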
Once the memory size is set, there is one more option to set: NewRatio. NewRatio is the ratio of the old generation to the young generation (the inverse of the young:old ratio). With -XX:NewRatio=1, young:old is 1:1, so with 1 GB of memory the young and old generations each get about 500 MB. With NewRatio=2, young:old is 1:2. Therefore, the larger the value, the larger the old generation and the smaller the young generation.
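Because NewRatio is old:young, the young generation receives heap / (NewRatio + 1). The sketch below (hypothetical names) computes that split for a few values, and also splits the young generation using -XX:SurvivorRatio (Eden size divided by the size of one Survivor space, with two Survivor spaces). It works in whole megabytes; a real JVM rounds region sizes to its own alignment, which is why jstat figures such as the 208 MB young generation in Case 1 later in this article do not match this arithmetic exactly. The SurvivorRatio of 8 is an assumed example value.

```java
public class GenerationSizes {
    // -XX:NewRatio = old / young, so young = heap / (NewRatio + 1).
    static long youngMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    static long oldMb(long heapMb, int newRatio) {
        return heapMb - youngMb(heapMb, newRatio);
    }

    // -XX:SurvivorRatio = Eden / one Survivor space; young = Eden + 2 Survivors.
    static long survivorMb(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    static long edenMb(long youngMb, int survivorRatio) {
        return youngMb - 2 * survivorMb(youngMb, survivorRatio);
    }

    public static void main(String[] args) {
        long heap = 1024;      // 1 GB heap
        int survivorRatio = 8; // assumed example value
        for (int ratio = 1; ratio <= 4; ratio++) {
            long young = youngMb(heap, ratio);
            System.out.printf("NewRatio=%d: young=%d MB (eden=%d, survivor=%d x 2), old=%d MB%n",
                    ratio, young, edenMb(young, survivorRatio),
                    survivorMb(young, survivorRatio), oldMb(heap, ratio));
        }
    }
}
```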
NewRatio may look like a minor setting, but it can seriously affect overall GC performance. If the young generation is too small, objects are moved to the old generation, causing frequent Full GC and more time spent in GC.
You may think that NewRatio=1 gives the best results, but that is not necessarily so. Setting NewRatio to 2 or 3 can improve overall GC behavior; I have seen several such cases.
What is the fastest way to complete GC optimization? Comparing the results of performance tests. When you set different options on each server and observe GC status, it is best to collect one to two days of data. If you optimize through performance testing, every server must receive the same load and business operations, and the request mix must match real business conditions. Preparing such precise load data is hard even for a professional performance tester and usually takes a lot of effort. A simpler approach is therefore to apply the candidate GC options to the running services and wait for the GC results, even though this takes longer.
Analysis of GC optimization results
After applying the GC options and setting -verbosegc, use the tail command to check that the logs are being written as expected. If an option is set incorrectly or nothing is logged, you will waste your time. If the logs look right, check and analyze the results after running for one to two days. The simplest way is to copy the log file to your local PC and analyze it with HPJMeter.
During the analysis, I focus on the following data, listed in my own order of priority. The most important factor in choosing GC options is the Full GC execution time.
- Full GC duration (average)
- Minor GC duration (average)
- Full GC interval
- Minor GC interval
- Total Full GC time
- Total Minor GC time
- Total GC time
- Number of Full GCs
- Number of Minor GCs
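When several servers run different option sets, the top of this priority list can serve as a sort order. This sketch (hypothetical class and field names) ranks results by average Full GC time, then average Minor GC time, then the interval between Full GCs (longer is better). The sample data are the three NewRatio settings from Case 1 later in this article, none of which produced a Full GC.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OptionRanking {
    // Measured GC figures for one candidate option set (one server).
    static final class Result {
        final String options;
        final double avgFullGcMs;
        final double avgMinorGcMs;
        final double fullGcIntervalS; // POSITIVE_INFINITY if no Full GC occurred

        Result(String options, double avgFullGcMs, double avgMinorGcMs, double fullGcIntervalS) {
            this.options = options;
            this.avgFullGcMs = avgFullGcMs;
            this.avgMinorGcMs = avgMinorGcMs;
            this.fullGcIntervalS = fullGcIntervalS;
        }
    }

    // Shorter average Full GC first, then shorter average Minor GC,
    // then the longer interval between Full GCs.
    static final Comparator<Result> BY_PRIORITY =
            Comparator.comparingDouble((Result r) -> r.avgFullGcMs)
                    .thenComparingDouble(r -> r.avgMinorGcMs)
                    .thenComparing(Comparator.comparingDouble((Result r) -> r.fullGcIntervalS).reversed());

    static List<Result> rank(List<Result> results) {
        List<Result> sorted = new ArrayList<>(results);
        sorted.sort(BY_PRIORITY);
        return sorted;
    }

    public static void main(String[] args) {
        double none = Double.POSITIVE_INFINITY; // no Full GC observed
        List<Result> results = new ArrayList<>();
        results.add(new Result("-XX:NewRatio=2", 0, 45, none));
        results.add(new Result("-XX:NewRatio=3", 0, 34, none));
        results.add(new Result("-XX:NewRatio=4", 0, 30, none));
        for (Result r : rank(results)) {
            System.out.println(r.options + " avg Minor GC " + r.avgMinorGcMs + " ms");
        }
        // first line printed: "-XX:NewRatio=4 avg Minor GC 30.0 ms"
    }
}
```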
If you are lucky, you will find the right GC options on the first try, but usually you will not. Be careful when performing GC optimization: if you try to finish it all at once, you may end up with an OutOfMemoryError.
Optimization Case
So far, the discussion of GC optimization has been theoretical. Now let's look at some actual GC optimization cases.
Case 1
This case is GC optimization for service S. For the newly developed service S, Full GC was taking too long.
Let's look at the jstat -gcutil output:
S0     S1    E     O      P      YGC  YGCT   FGC  FGCT   GCT
12.16  0.00  5.18  63.78  20.32  54   2.047  5    6.946  8.993
At the start of optimization, you need not worry much about the columns up to P (the permanent generation); the values from YGC onward are what matter.
From the above results, we can calculate the average time overhead for executing Minor GC and Full GC, as shown in the following table:
Table 3: Average time consumption of Minor GC and Full GC in service S
GC type | Number of GCs | Total GC time (s) | Average time
Minor GC | 54 | 2.047 | 37 ms
Full GC | 5 | 6.946 | 1,389 ms
An average Minor GC time of 37 ms is not bad, but an average Full GC time of 1.389 s means Full GC may cause frequent timeouts. For example, if the DB timeout is set to 1 s, timeouts will occur. Therefore, this system needs GC optimization.
Before starting GC optimization, check the current memory settings. You can use jstat -gccapacity to see them. Here is the result for service S:
NGCMN NGCMX NGC S0C S1C EC OGCMN OGCMX OGC OC PGCMN PGCMX PGC PC YGC FGC
212992.0 212992.0 212992.0 21248.0 21248.0 170496.0 1884160.0 1884160.0 1884160.0 1884160.0 262144.0 262144.0 262144.0 262144.0 54 5
The key data is as follows:
- New generation: 212,992 KB (about 208 MB)
- Old generation: 1,884,160 KB (about 1.8 GB)
Excluding the permanent generation, 2 GB of memory is allocated in total, with young:old at roughly 1:9 (NewRatio=9). To get more detailed information, the -verbosegc option was enabled on three instances of the system, each with a different NewRatio setting and no other options added:
- NewRatio = 2
- NewRatio = 3
- NewRatio = 4
When I checked GC one day later, I was lucky: no Full GC had occurred since NewRatio was set.
Why? Most objects in this system die soon after they are created, so they are collected in the young generation before they can be moved to the old generation.
In this case, no other options are needed; we only have to choose the best NewRatio. How do we choose it? By analyzing the average Minor GC time for each NewRatio value.
The average time consumed by Minor GC corresponding to the preceding three NewRatio settings is as follows:
- NewRatio = 2: 45 ms
- NewRatio = 3: 34 ms
- NewRatio = 4: 30 ms
With NewRatio=4, the average Minor GC time was the shortest, so we chose it even though it gives the smallest young generation of the three. After this option was applied, the service no longer had any Full GC.
The following is the jstat -gcutil result one day after the new option was applied:
S0    S1    E      O      P      YGC   YGCT    FGC  FGCT   GCT
8.61  0.00  30.67  24.62  22.38  2424  30.219  0    0.000  30.219
You might think GC is infrequent because the service receives few requests, but Minor GC ran 2,424 times without a single Full GC.
Case 2
This case concerns service A. Through the company's Application Performance Manager (APM), we found that the JVM of service A was experiencing long pauses (no response for more than 8 seconds), so we decided to perform GC optimization. Our investigation showed that Full GC was taking too long and needed to be optimized.
Before optimization, we added the -verbosegc option to the system. The result was as follows:
Figure 1: GC duration before GC optimization
This figure shows the result of HPJMeter's automatic analysis: the duration of each GC since the JVM started. The X-axis is the time elapsed since JVM startup, and the Y-axis is the response time of each GC. Green indicates CMS, the Full GC, and blue indicates Parallel Scavenge, the Minor GC.
As I said earlier, CMS GC is usually the fastest, yet here one collection took up to 15 seconds. What caused this? Recall what I said earlier: CMS becomes slow when memory compaction occurs. In addition, service A was configured with -Xms1g and -Xmx4g, and the memory actually allocated was 4 GB.
So I changed the GC type from CMS to Parallel GC, set the memory size to 2 GB, and set NewRatio to 3. After a while, jstat -gcutil showed the following:
S0    S1     E     O      P      YGC  YGCT    FGC  FGCT    GCT
0.00  30.48  3.31  26.54  37.01  226  11.131  4    11.758  22.890
Full GC became faster: an average of about 3 seconds, compared with 15 seconds with 4 GB of memory. However, 3 seconds was still unsatisfactory, so I designed the following six sets of options:
- -XX:+UseParallelGC -Xms1536m -Xmx1536m -XX:NewRatio=2
- -XX:+UseParallelGC -Xms1536m -Xmx1536m -XX:NewRatio=3
- -XX:+UseParallelGC -Xms1g -Xmx1g -XX:NewRatio=3
- -XX:+UseParallelOldGC -Xms1536m -Xmx1536m -XX:NewRatio=2
- -XX:+UseParallelOldGC -Xms1536m -Xmx1536m -XX:NewRatio=3
- -XX:+UseParallelOldGC -Xms1g -Xmx1g -XX:NewRatio=3
Which one was fastest? The smaller the memory, the better the result. The figure below shows the GC duration distribution for the sixth set of options, which gave the greatest GC improvement. The slowest GC took 1.7 seconds, and the average dropped below 1 second.
Figure 2: GC duration after applying the sixth set of options
So I changed the GC options of service A to the sixth set. However, an OutOfMemoryError then occurred every night. The cause turned out to be a nightly batch data-processing job that leaked memory in the JVM. With that, all the problems were identified.
It is very dangerous to apply GC tuning results to all servers after only a short observation period. Remember: the only way to perform GC optimization without failures is to analyze the service's operations as carefully as you analyze its GC logs.
These two cases show how GC optimization proceeds in practice. As I said, the GC options from these cases can be applied unchanged to servers with the same CPU, operating system, and JDK version that run the same functions. However, do not apply them to your own systems as-is, because they may not suit them.
Summary
I perform GC optimization based on experience, without taking a heap dump and analyzing memory in detail, although a precise analysis of memory may produce better results. If the memory load is low, analyzing the memory objects may be the better choice. However, if the service load is high and a lot of memory is used, I recommend GC optimization based on experience.
I have tested G1 GC on some services but have not yet used it widely. The results showed G1 GC running faster than any other GC type, but you must upgrade to JDK 7 to benefit from it, and G1's stability is not yet fully proven; no one knows whether it will cause serious bugs. So it is still too early to use G1 widely.
Once JDK 7 has matured (which is not to say it is unstable now) and WAS is optimized for JDK 7, G1 may run stably on servers, and at that point GC optimization may no longer be needed.
For more on GC optimization, search Slideshare for related material. The one I recommend most is a presentation by Twitter engineer Attila Szegedi on what he learned about JVM tuning at Twitter; take a look if you have time.
Author: Sangmin Lee, senior performance lab engineer, NHN company