Becoming a Java GC Expert (3): How to Optimize Java Garbage Collection

Last Update: 2015-04-19  Source: Internet


This is the third article in the "Becoming a Java GC Expert" series. In the first article, "Becoming a Java GC Expert Part I: Java Garbage Collection Mechanism", we learned how the different GC algorithms work, how GC runs, what the new and old generations are, the five GC types in JDK 7, and how each of them affects application performance.

In the second article, "Becoming a Java GC Expert Part II: How to Monitor Java Garbage Collection", I explained how the JVM actually performs garbage collection, how to monitor GC, and which tools make that work faster and more efficient. In this third article, I will explain some best practices for GC optimization based on real examples. This article assumes you understand the material in the previous two, so if you have not read them yet, please do so first.

Why GC Optimization Is Needed

Or, more precisely, is GC optimization necessary for Java-based services? Not always. GC optimization is not required if the running Java-based system has the following settings and behavior:

The memory size has been set with -Xms and -Xmx
The -server option is included
There are no error logs, such as timeout logs, in the system

In other words, if you have not set the memory size and the system is flooded with timeout logs, you need to perform GC optimization on your system.

However, always keep one thing in mind: GC optimization should be the last task you do.

Think about the fundamental reason for GC optimization. The garbage collector cleans up objects created by Java programs, and both the number of objects it has to clean up and the number of GC executions depend on how many objects are created. So to control GC, you should first reduce the number of objects you create.

As the saying goes, "many a little makes a mickle." Start with the small things; otherwise, the waste accumulates over time and becomes hard to manage.

Use StringBuilder or StringBuffer instead of String when building strings through repeated concatenation.
Keep log output to a minimum.
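To make the first point concrete, here is a minimal sketch (the class and method names are hypothetical) contrasting repeated String concatenation in a loop, which creates a new String object on every iteration, with appending into a single StringBuilder. It only illustrates the allocation pattern; it is not a benchmark.

```java
import java.util.List;

class LogLineBuilder {
    // Appends every part into one growable buffer, so the loop itself
    // does not leave a trail of intermediate String objects behind.
    static String joinWithBuilder(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(parts.get(i));
        }
        return sb.toString();
    }

    // The pattern to avoid: each += creates a new String object, so a loop
    // of n iterations produces roughly n short-lived Strings for the GC.
    static String joinWithConcat(List<String> parts) {
        String result = "";
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                result += ",";
            }
            result += parts.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> parts = List.of("a", "b", "c");
        System.out.println(joinWithBuilder(parts)); // prints a,b,c
    }
}
```

For a single expression such as `a + b + c`, the compiler already handles concatenation efficiently; the difference matters mainly inside loops. StringBuilder is the unsynchronized choice when the buffer stays within one thread, while StringBuffer adds synchronization.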

Still, there are cases where we are helpless. XML and JSON parsing, for instance, consumes a lot of memory: even if we use String as little as possible and keep logging to a minimum, parsing XML or JSON can use a large amount of temporary memory, around 10-100 MB. Yet it is hard to stop using XML and JSON, so just be aware that they take up a lot of memory.

If application memory usage improves after several rounds of such changes, you can start GC optimization.

I classify the goals of GC optimization into two:

One is to minimize the number of objects passed to the old generation.
The other is to reduce full GC execution time.

Minimizing the Number of Objects Passed to the Old Generation

Generational GC is the approach used by the Oracle JVM's collectors, except for G1 GC, which is available in JDK 7 and later. Objects are created in the Eden space, moved to and between the survivor spaces, and the objects that remain are eventually passed to the old generation. Some objects are large enough that they are passed directly to the old generation right after being created in Eden. GC in the old generation takes considerably longer than GC in the new generation, so reducing the number of objects passed to the old generation can significantly reduce the frequency of full GC. This might be misread as "keep objects in the new generation on purpose", but you cannot do that directly. Instead, you can adjust the size of the new generation.
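On HotSpot, the Eden, survivor and old spaces described above appear as separate heap memory pools. The sketch below (the `HeapPools` class is hypothetical) uses the standard `java.lang.management` API to list them with their current usage. Pool names such as "PS Eden Space" or "G1 Old Gen" depend on the JVM version and the collector in use, so treat any output as illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class HeapPools {
    // Returns one line per heap pool (for example Eden, Survivor and Old);
    // exact names vary with the JVM version and the selected collector.
    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            long usedKb = usage == null ? -1 : usage.getUsed() / 1024;
            lines.add(pool.getName() + ": used=" + usedKb + " KB");
        }
        return lines;
    }

    public static void main(String[] args) {
        describeHeapPools().forEach(System.out::println);
    }
}
```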

Reducing Full GC Execution Time

Full GC takes much longer to execute than minor GC. Therefore, if a full GC takes too long (more than 1 second), timeout errors may occur in components connected to the service.

If you try to reduce full GC execution time by shrinking the old generation, OutOfMemoryError may occur or the number of full GCs may increase.
Conversely, if you try to reduce the number of full GCs by enlarging the old generation, each full GC takes longer.

Therefore, you need to set the old generation to an "appropriate" size.

GC Parameters That Affect Performance

As I mentioned at the end of the second article, don't assume that "someone greatly improved performance with these GC parameters, so why don't we use the same ones?" Different web services create objects of different sizes and lifetimes.

Simply put, if a task runs under conditions A, B, C, D and E, and the same task runs under only A and B, which do you think will be faster? Intuitively, most people would say the task under A and B.

Java GC parameters work the same way: setting a parameter may not speed up GC at all and can even slow it down. The most basic principle of GC optimization is to apply different GC parameters to two or more servers, compare them, and then roll out only the parameters that have proven to improve performance or reduce GC execution time. Keep this in mind.

The following table lists the GC parameters that relate to memory size and can affect performance.

Table 1: Java parameters to consider for GC optimization

Classification	Parameter	Description
Heap area size	-Xms	Heap area size when the JVM starts
	-Xmx	Maximum heap area size
New generation size	-XX:NewRatio	Ratio of the new generation to the old generation
	-XX:NewSize	New generation size
	-XX:SurvivorRatio	Ratio of Eden space to survivor space
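To check which of the Table 1 parameters a running JVM was actually started with, a small sketch like the following can help. It assumes a standard JDK and uses `RuntimeMXBean.getInputArguments()` and `Runtime.maxMemory()`; the class and method names are made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class JvmMemoryFlags {
    // Keeps only the Table 1 flags (-Xms, -Xmx, -XX:NewRatio, -XX:NewSize,
    // -XX:SurvivorRatio) from a list of JVM start-up arguments.
    static List<String> memoryFlags(List<String> inputArgs) {
        List<String> result = new ArrayList<>();
        for (String arg : inputArgs) {
            if (arg.startsWith("-Xms") || arg.startsWith("-Xmx")
                    || arg.startsWith("-XX:NewRatio=")
                    || arg.startsWith("-XX:NewSize=")
                    || arg.startsWith("-XX:SurvivorRatio=")) {
                result.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Memory flags: " + memoryFlags(jvmArgs));
        // Approximately the effective maximum heap; on some collectors it is
        // slightly below -Xmx, and it is the JVM default when -Xmx is unset.
        System.out.println("Max heap (MB): " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```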

I often use -Xms, -Xmx and -XX:NewRatio when doing GC optimization. -Xms and -Xmx are essential, and how you set NewRatio can have a significant impact on GC performance. Some people may ask how to set the size of the Perm area. If an OutOfMemoryError occurs because the Perm space is insufficient, you can set it with the -XX:PermSize and -XX:MaxPermSize parameters.

Another parameter that may affect GC performance is the GC type. The following table lists all the GC types you can choose from (based on JDK 6.0).

Table 2: GC type parameters

Classification	Parameters	Note
Serial GC	-XX:+UseSerialGC
Parallel GC	-XX:+UseParallelGC -XX:ParallelGCThreads=value
Parallel Compacting GC	-XX:+UseParallelOldGC
CMS GC	-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=value -XX:+UseCMSInitiatingOccupancyOnly
G1	-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC	In JDK 6, these two parameters must be used together

Except for G1 GC, you switch the GC type with the parameter listed first in each row. The simplest GC type is Serial GC, which is optimized for client systems.

Many parameters affect GC performance, but the ones above have the most significant effect. Keep in mind that setting too many parameters does not necessarily reduce GC execution time.
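After switching GC types, it is worth confirming which collectors the JVM is actually using. This sketch (hypothetical class name) lists the standard `GarbageCollectorMXBean`s with their collection counts and accumulated times. For example, `-XX:+UseParallelGC` typically reports "PS Scavenge" and "PS MarkSweep", but the names vary by collector and JDK version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectors {
    // One line per collector the running JVM reports, with how many times it
    // has run and its accumulated time in milliseconds (-1 if unavailable).
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```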

GC Optimization Process

The GC optimization process is similar to most performance improvement processes. Here is the process I use.

1. Monitor GC status

First, monitor GC to check its state while the system is running. For monitoring methods, refer to the previous article, "Becoming a Java GC Expert Part II: How to Monitor Java Garbage Collection".

2. Analyze the monitoring results and decide whether to perform GC optimization

After checking the GC status, analyze the monitoring results to decide whether to optimize GC. If the analysis shows that GC takes only 0.1-0.3 seconds, there is no need to spend time on GC optimization. However, if GC takes 1-3 seconds, or more than 10 seconds, GC optimization is essential.

However, if you have already allocated about 10 GB of memory to Java and cannot reduce it, GC optimization cannot help much. Before optimizing GC, you need to figure out why you need to allocate so much memory. If an OutOfMemoryError occurs when you allocate 1 GB or 2 GB of memory, take a heap dump to find and eliminate the cause.

Note:

A heap dump is a file used to examine the objects and data in Java memory. You can create one with the jmap command included in the JDK. While the file is being created, the Java process pauses, so do not create it while the system is serving requests.

You can search the Internet for detailed descriptions of heap dumps. Korean readers can refer to the book I published last year: The Story of Troubleshooting for Java Developers and System Operators (Sangmin Lee, Hanbit Media, 2011, 416 pages).
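Besides `jmap -dump:format=b,file=<file> <pid>`, a HotSpot JVM can write a heap dump of itself through `com.sun.management.HotSpotDiagnosticMXBean`. The sketch below assumes a HotSpot-based JDK, and the `HeapDumper` class is hypothetical. Like jmap, it pauses the application while the file is written, so the same caution about not dumping during service applies.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

class HeapDumper {
    // Writes an HPROF heap dump of the current JVM into dir and returns its path.
    // live=true dumps only objects that are still reachable.
    static Path dump(Path dir, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // The target file must not exist yet, and some JDK versions
        // require the .hprof suffix, so a fresh *.hprof name is used.
        Path file = dir.resolve("heap-" + System.currentTimeMillis() + ".hprof");
        bean.dumpHeap(file.toString(), live);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        System.out.println("Heap dump written to " + dump(dir, true));
    }
}
```

The resulting .hprof file can then be opened in a heap analyzer of your choice to look for the objects that fill the heap.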

3. Adjust the GC type and memory size

If you have decided to perform GC optimization, select the GC type and set the memory size. If you have several servers, remember to apply different GC parameters to each server and compare the differences.

4. Analyze the results

After adjusting the GC parameters, collect data continuously for at least 24 hours and then analyze the results. If you are lucky, you will find the GC parameters best suited to your system. If not, analyze the logs to check how memory is allocated, then keep adjusting the GC type and memory size until you find the best parameters.

5. If the results are satisfactory, apply the parameters to all servers and finish GC optimization

Once you are satisfied with the GC optimization results, apply them to all servers. In the following sections, we will look at the specific tasks for each step.

Monitoring GC Status and Analyzing Results

The best way to check the GC status of a running Web Application Server (WAS) is the jstat command. I explained jstat in detail in the second article, "Becoming a Java GC Expert Part II: How to Monitor Java Garbage Collection", so this article focuses on the data.

The following example shows the state of a JVM before it is optimized for GC.

(Unfortunately, this is not a production server.)

```
$ jstat -gcutil 21719 1s
S0     S1     E      O      P      YGC    YGCT     FGC   FGCT     GCT
48.66  0.00   48.10  49.70  77.45  3428   172.623  3     59.050   231.673
48.66  0.00   48.10  49.70  77.45  3428   172.623  3     59.050   231.673
```

In the output above, look at YGC and YGCT first. YGCT/YGC gives 0.050 seconds (50 ms), meaning each GC in the new generation takes 50 ms on average. In this case, you don't need to worry about GC in the new generation.
Next, look at FGC and FGCT. FGCT/FGC gives 19.68 seconds, meaning each full GC takes 19.68 seconds on average. That could mean three executions of 19.68 seconds each, or two executions of 1 second each plus one of 58 seconds. In either case, GC optimization is required.
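Both averages are simply total time divided by count. A minimal sketch with the values from the jstat output plugged in (the class name is hypothetical):

```java
import java.util.Locale;

class JstatAverages {
    // Average seconds per collection = accumulated GC time / number of GCs.
    static double averageSeconds(double totalSeconds, long count) {
        if (count == 0) {
            return 0.0; // no collections yet, so there is nothing to average
        }
        return totalSeconds / count;
    }

    public static void main(String[] args) {
        // Values from the jstat -gcutil output above.
        double minorAvg = averageSeconds(172.623, 3428); // YGCT / YGC
        double fullAvg = averageSeconds(59.050, 3);      // FGCT / FGC
        System.out.printf(Locale.ROOT, "minor GC avg: %.3f s, full GC avg: %.2f s%n",
                minorAvg, fullAvg);
        // prints: minor GC avg: 0.050 s, full GC avg: 19.68 s
    }
}
```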

The jstat command makes GC status easy to check, but the best way to analyze GC is to generate logs with the -verbosegc parameter; I explained how to analyze these logs in the previous article. HPjmeter is my personal favorite tool for analyzing -verbosegc logs: it is easy to use and its results are easy to read. With HPjmeter you can easily see GC execution times and how often GC occurs. If GC execution meets all of the following conditions, GC optimization is not required.

Minor GC executes quickly (within 50 ms)
Minor GC does not execute frequently (about once every 10 seconds)
Full GC executes quickly (within 1 second)
Full GC does not execute frequently (about once every 10 minutes)
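The four conditions can be written as a simple check. This is only a sketch: the thresholds are the rough numbers from the list above, which are not absolute and should be adjusted per service, and the class and parameter names are hypothetical.

```java
class TuningCheck {
    // Rough thresholds from the list above; adjust them per service.
    static final double MAX_MINOR_GC_SECONDS = 0.050;
    static final double MIN_MINOR_GC_INTERVAL_SECONDS = 10;
    static final double MAX_FULL_GC_SECONDS = 1.0;
    static final double MIN_FULL_GC_INTERVAL_SECONDS = 600;

    // True when all four conditions hold, i.e. GC optimization is not required.
    // Pass Double.POSITIVE_INFINITY as an interval if that GC type never ran.
    static boolean optimizationNotRequired(double avgMinorSec, double minorIntervalSec,
                                           double avgFullSec, double fullIntervalSec) {
        return avgMinorSec < MAX_MINOR_GC_SECONDS
                && minorIntervalSec >= MIN_MINOR_GC_INTERVAL_SECONDS
                && avgFullSec < MAX_FULL_GC_SECONDS
                && fullIntervalSec >= MIN_FULL_GC_INTERVAL_SECONDS;
    }

    public static void main(String[] args) {
        // Made-up healthy case: 30 ms minor GC every 12 s, 0.5 s full GC every 15 min.
        System.out.println(optimizationNotRequired(0.030, 12, 0.5, 900));   // true
        // Full GC averaging 19.68 s, as in the jstat example: tuning is needed.
        System.out.println(optimizationNotRequired(0.030, 12, 19.68, 900)); // false
    }
}
```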

The numbers above are not absolute; they vary with the state of the service. Some services may be fine with a full GC that takes 0.9 seconds, while others may not. So set appropriate thresholds for each service when deciding whether to perform GC optimization.

When checking GC status, pay particular attention not to look only at minor and full GC execution times; the number of GC executions matters too. For example, if the new generation is too small, minor GC runs too frequently (sometimes more than once per second). In addition, an increase in the number of objects passed to the old generation increases the number of full GC executions. So don't forget to run jstat with the -gccapacity option to see how much of each area is being used.

Setting the GC Type and Memory Size

Setting the GC Type

Oracle JVM has five GC types, but if you are not using JDK 7, you should choose one of Parallel GC, Parallel Compacting GC and CMS GC. There is no clear rule for which one to choose.

So how do we choose? The most recommended way is to try all three. One thing is clear, though: CMS GC is faster than the parallel GCs. If your results confirm this, simply choose CMS GC. However, CMS GC is not always faster. In general, full GC is faster under CMS GC, but when a concurrent mode failure occurs, it becomes slower than the parallel GCs.

Concurrent Mode Failure

Let's take a closer look at concurrent mode failure.

The biggest difference between Parallel GC and CMS GC comes from compaction. Compaction removes the empty gaps between allocated objects to eliminate memory fragmentation.

In Parallel GC, compaction is performed whenever a full GC runs, which takes a lot of time. However, after the full GC, memory can be allocated faster because free space is contiguous and can be allocated sequentially.

In contrast, CMS GC does not perform compaction, so it runs faster. Without compaction, however, memory is left full of gaps, like a disk before defragmentation. This means there may not be enough contiguous space to store a large object: for example, even if the old generation has 300 MB free, a 10 MB object may not fit if no 10 MB of contiguous space is left. In this case, a "concurrent mode failure" warning appears and compaction is performed. Under CMS GC, this compaction takes much longer than under Parallel GC, and it can cause further problems. For a detailed description of concurrent mode failure, see "Understanding CMS GC Logs", written by an Oracle engineer.
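A toy model shows why total free space is not enough. The sketch below uses hypothetical names and ignores how CMS actually manages its free lists; it represents the old generation's free space as a list of contiguous holes. With 300 MB free in total, split into 5 MB holes, a 10 MB object still does not fit until the holes are merged by compaction.

```java
import java.util.Collections;
import java.util.List;

class FragmentationDemo {
    // Toy model: each element is the size (MB) of one contiguous free hole.
    static int totalFreeMb(List<Integer> holes) {
        int total = 0;
        for (int hole : holes) {
            total += hole;
        }
        return total;
    }

    // Without compaction, an object can be placed only if a single hole is big enough.
    static boolean fitsWithoutCompaction(List<Integer> holes, int objectMb) {
        for (int hole : holes) {
            if (hole >= objectMb) {
                return true;
            }
        }
        return false;
    }

    // Compaction slides live objects together, merging all holes into one.
    static List<Integer> compact(List<Integer> holes) {
        return List.of(totalFreeMb(holes));
    }

    public static void main(String[] args) {
        List<Integer> holes = Collections.nCopies(60, 5); // 60 holes x 5 MB = 300 MB free
        System.out.println(totalFreeMb(holes));                        // 300
        System.out.println(fitsWithoutCompaction(holes, 10));          // false
        System.out.println(fitsWithoutCompaction(compact(holes), 10)); // true
    }
}
```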

In summary, you need to find the type of GC that best suits your system.

Every system has a GC type that suits it best; you just have to find it. If you have six servers, I recommend giving each pair of servers the same parameters, adding the -verbosegc parameter, and analyzing the results.

Setting the Memory Size

The following table shows the relationship between the size of the memory space, the number of GC executions, and the GC execution time.

Large Memory space
- Reduce the number of GC executions
- Increase GC Execution time
Small Memory space
- Reduce GC Execution time
- Increase the number of GC executions

There is no single right answer for how to set the memory size. If server resources are sufficient and a full GC can complete within 1 second, even 10 GB is possible. However, most servers are not like that: with 10 GB of memory, a full GC may take 10-30 seconds. Of course, the execution time also depends on the size of the objects.

So how should we set the memory size? In general, I recommend 500 MB. Note, however, that this does not mean you should set the WAS memory to -Xms500m and -Xmx500m. Look at the state before GC optimization: if about 300 MB is still in use after a full GC, it is best to set the memory to 1 GB (300 MB for default usage + 500 MB minimum for the old generation + 200 MB of free memory). In other words, you should set aside at least 500 MB for the old generation. So if you have three servers, set their memory to 1 GB, 1.5 GB and 2 GB, and check the results.
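The sizing rule above is simple arithmetic. The following sketch assumes the author's 500 MB old-generation headroom and 200 MB free-memory figures (the class and method names are hypothetical); it computes the base size and prints matching -Xms/-Xmx pairs for the three candidate servers.

```java
class HeapSizing {
    // Author's rule of thumb: memory still used after a full GC
    // + 500 MB minimum for the old generation + 200 MB of free memory.
    static final int OLD_GEN_HEADROOM_MB = 500;
    static final int FREE_MB = 200;

    static int suggestedHeapMb(int usedAfterFullGcMb) {
        return usedAfterFullGcMb + OLD_GEN_HEADROOM_MB + FREE_MB;
    }

    // Same initial and maximum size, as in the examples later in the article.
    static String heapFlags(int heapMb) {
        return "-Xms" + heapMb + "m -Xmx" + heapMb + "m";
    }

    public static void main(String[] args) {
        System.out.println(suggestedHeapMb(300) + " MB"); // 1000 MB, roughly 1 GB
        // Candidate sizes for three servers: 1 GB, 1.5 GB and 2 GB.
        for (int mb : new int[] {1024, 1536, 2048}) {
            System.out.println(heapFlags(mb));
        }
    }
}
```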

In theory, GC should be fastest with 1 GB, then 1.5 GB, then 2 GB. However, that does not mean a full GC takes 1 second with 1 GB and 2 seconds with 2 GB; the time depends on server performance and object sizes. Therefore, the best approach is to collect as many measurements as possible and monitor them.

Along with the memory size, you should also set the NewRatio parameter. NewRatio sets the ratio of the new generation to the old generation: -XX:NewRatio=1 means new:old is 1:1, so with a 1 GB heap the new and old generations each get about 500 MB. NewRatio=2 means new:old is 1:2. So the larger the value, the larger the old generation and the smaller the new generation.
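The split implied by NewRatio can be worked out directly: with -XX:NewRatio=N the new generation gets 1/(N+1) of the heap. The sketch below uses hypothetical names and ignores the alignment and rounding the JVM applies, so real sizes differ slightly.

```java
class NewRatioMath {
    // -XX:NewRatio=N sets old:new to N:1, so the new generation receives
    // 1/(N+1) of the heap. Simplified: the JVM also aligns and rounds sizes.
    static long newGenMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    static long oldGenMb(long heapMb, int newRatio) {
        return heapMb - newGenMb(heapMb, newRatio);
    }

    public static void main(String[] args) {
        for (int ratio = 1; ratio <= 3; ratio++) {
            System.out.println("NewRatio=" + ratio + ": new=" + newGenMb(1024, ratio)
                    + " MB, old=" + oldGenMb(1024, ratio) + " MB");
        }
    }
}
```

For a 1 GB (1024 MB) heap this gives 512/512 MB for NewRatio=1, 341/683 MB for NewRatio=2 and 256/768 MB for NewRatio=3.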

This may seem minor, but NewRatio can significantly affect overall GC performance. If the new generation is too small, more objects are passed to the old generation, causing frequent full GCs and longer pause times.

You might assume NewRatio=1 is always best, but sometimes 2 or 3 works better; I have seen many such cases.

How can you finish GC optimization as quickly as possible? Comparing performance test results is the fastest way. Set different parameters on each server and monitor their status; I strongly recommend collecting at least one or two days of data. For a fair comparison, every server must receive the same load, and the mix of requests (for example, by URL) must be the same. However, controlling load precisely is difficult even for professional testers, and preparation takes a long time. It is therefore more practical to adjust the parameters and simply collect results over a longer period.

Analyzing GC Optimization Results

After setting the GC parameters and the -verbosegc parameter, use the tail command to make sure the logs are being written correctly. If the parameters are wrong or no logs are produced, you will waste your time. If the logs look right, keep collecting for 1-2 days. Then download the logs to your local PC and use HPjmeter to analyze the following:

Full GC execution time
Minor GC execution time
Full GC execution interval
Minor GC execution interval
Total full GC execution time
Total minor GC execution time
Total GC execution time
Number of full GC executions
Number of minor GC executions
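Once the log entries are parsed, the metrics in this list reduce to a few aggregations. The sketch below assumes a hypothetical `GcEvent` record (timestamp, duration, full or minor) that you would fill from your own log parser; it does not parse -verbosegc output itself, since that format varies with the JVM and its options. It needs Java 16 or later for records and `Stream.toList()`.

```java
import java.util.List;

class GcLogSummary {
    // Hypothetical parsed form of one GC log entry: when it started (seconds
    // since JVM start), how long it took (seconds), and whether it was a full GC.
    record GcEvent(double timestampSec, double durationSec, boolean full) {}

    // Number of executions of the selected GC kind (full or minor).
    static long count(List<GcEvent> events, boolean full) {
        return events.stream().filter(e -> e.full() == full).count();
    }

    // Total execution time of the selected GC kind.
    static double totalTime(List<GcEvent> events, boolean full) {
        return events.stream().filter(e -> e.full() == full)
                .mapToDouble(GcEvent::durationSec).sum();
    }

    // Average execution time; 0 when that kind of GC never ran.
    static double averageTime(List<GcEvent> events, boolean full) {
        long n = count(events, full);
        return n == 0 ? 0.0 : totalTime(events, full) / n;
    }

    // Average interval between consecutive executions of the selected kind.
    // Assumes events are sorted by timestamp; NaN if fewer than two executions.
    static double averageInterval(List<GcEvent> events, boolean full) {
        List<GcEvent> selected = events.stream().filter(e -> e.full() == full).toList();
        if (selected.size() < 2) {
            return Double.NaN;
        }
        double span = selected.get(selected.size() - 1).timestampSec()
                - selected.get(0).timestampSec();
        return span / (selected.size() - 1);
    }

    public static void main(String[] args) {
        List<GcEvent> events = List.of(
                new GcEvent(0.0, 0.04, false),
                new GcEvent(10.0, 0.06, false),
                new GcEvent(20.0, 0.05, false),
                new GcEvent(600.0, 0.8, true));
        System.out.println("minor count=" + count(events, false)
                + ", minor avg interval=" + averageInterval(events, false) + " s"
                + ", total GC time=" + (totalTime(events, false) + totalTime(events, true)) + " s");
    }
}
```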

Finding the best GC parameters on the first try is a stroke of luck, but in most cases luck is not on our side. Be as careful as possible during GC optimization: trying to finish it in a single step often leads to an OutOfMemoryError.

Optimization examples

Enough theory; let's look at some real GC optimization examples.

Example 1

The first example is the optimization of service S, a recently deployed service whose full GC took too long.

Here is the output of jstat -gcutil.

```
S0     S1     E      O      P      YGC   YGCT    FGC   FGCT    GCT
12.16  0.00   5.18   63.78  20.32  54    2.047   5     6.946   8.993
```

The Perm column is not important for the initial GC optimization; the values from YGC onward are what matter here.

The average execution times of minor GC and full GC are shown in the following table.

Table 3: Average execution times of minor GC and full GC for service S

GC type	Number of GC executions	GC execution time (s)	Average
Minor GC	54	2.047	37 ms
Full GC	5	6.946	1.389 s

The two most important figures are:

Actual new generation usage: 212,992 KB
Actual old generation usage: 1,884,160 KB
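A quick check of these two figures (the class name is hypothetical) shows where the 2 GB total and the 1:9 ratio come from:

```java
import java.util.Locale;

class GenerationRatio {
    // How many KB of old generation there are for each KB of new generation.
    static double oldPerNew(long newKb, long oldKb) {
        return (double) oldKb / newKb;
    }

    static long totalKb(long newKb, long oldKb) {
        return newKb + oldKb;
    }

    public static void main(String[] args) {
        long newKb = 212_992;
        long oldKb = 1_884_160;
        // 2,097,152 KB is exactly 2 GB (2 * 1024 * 1024 KB).
        System.out.printf(Locale.ROOT, "new:old = 1:%.2f, total = %.2f GB%n",
                oldPerNew(newKb, oldKb), totalKb(newKb, oldKb) / (1024.0 * 1024.0));
        // prints: new:old = 1:8.85, total = 2.00 GB
    }
}
```

The exact ratio is about 1:8.85, which the text rounds to 1:9.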

So the total memory, excluding the Perm space, was 2 GB, and the ratio of the new generation to the old generation was about 1:9. Data was then collected through jstat and -verbosegc logs, with three servers configured as follows:

NewRatio=2
NewRatio=3
NewRatio=4

One day later, the systems' GC logs showed that, fortunately, no full GC had occurred since the NewRatio parameter was set.

The average GC execution times were:

NewRatio=2: 45 ms
NewRatio=3: 34 ms
NewRatio=4: 30 ms

NewRatio=4 turned out to be the best parameter: even though it gives the smallest new generation, its GC time is the shortest. Since this parameter was applied, the system has not performed a single full GC.

To illustrate, here is the output of jstat -gcutil for service S after it had been running for some time:

```
S0     S1     E      O      P      YGC    YGCT    FGC   FGCT    GCT
8.61   0.00   30.67  24.62  22.38  2424   30.219  0     0.000   30.219
```

You might think GC runs less often because the server receives few requests. In fact, although no full GC was executed, minor GC was executed 2,424 times.

Example 2

The second example is service A. Our in-house Application Performance Management (APM) system showed that the JVM paused for a long time (more than 8 seconds), so we performed GC optimization. We set out to find why full GC was taking so long and to fix it.

The first step of the GC optimization was to add the -verbosegc parameter, which produced the following result.

Figure 1: STW time before GC optimization

This graph is one of those HPjmeter generates automatically. The x-axis shows JVM uptime and the y-axis shows the duration of each GC. Green dots (CMS) are full GC results; blue dots (Parallel Scavenge) are minor GC results.

I said earlier that CMS GC is the fastest, but the results above show that, for some reason, it took up to 15 seconds. What caused this? As mentioned earlier, CMS slows down when it has to compact memory. In addition, the service's memory was set to -Xms1g and -Xmx4g, and the full 4 GB was actually allocated.

So I changed the GC type from CMS to Parallel GC, reduced the memory to 2 GB, and set NewRatio to 3. A few hours later, jstat -gcutil showed the following:

```
S0     S1     E      O      P      YGC   YGCT    FGC   FGCT    GCT
0.00   30.48  3.31   26.54  37.01  226   11.131  4     11.758  22.890
```

Full GC now took about 3 seconds on average, compared with 15 seconds at 4 GB. But 3 seconds is still slow, so I designed the following six cases:

Case 1: -XX:+UseParallelGC -Xms1536m -Xmx1536m -XX:NewRatio=2
Case 2: -XX:+UseParallelGC -Xms1536m -Xmx1536m -XX:NewRatio=3
Case 3: -XX:+UseParallelGC -Xms1g -Xmx1g -XX:NewRatio=3
Case 4: -XX:+UseParallelOldGC -Xms1536m -Xmx1536m -XX:NewRatio=2
Case 5: -XX:+UseParallelOldGC -Xms1536m -Xmx1536m -XX:NewRatio=3
Case 6: -XX:+UseParallelOldGC -Xms1g -Xmx1g -XX:NewRatio=3

Which one was fastest? The results showed that the smaller the memory, the better. Figure 2 shows the results of Case 6, which gave the best GC performance: the maximum response time was only 1.7 seconds, and the average was under 1 second.

Figure 2: Time chart for Case 6

Based on these results, we applied the Case 6 GC parameters. However, an OutOfMemoryError then occurred every night. The exact cause is hard to explain here; in short, a batch program caused a memory leak. The issue has since been resolved.

It is dangerous to apply GC settings to all servers after analyzing GC logs for only a short time. Keep in mind that you must analyze the GC logs together with the application.

We reviewed two GC optimization examples. As I mentioned earlier, the GC parameters from these examples can be applied to other servers only if they have the same CPU, operating system and JDK version and run the same service. Do not apply the parameters I used directly to your services; they may not work well.

Conclusion

I performed GC optimization based on experience, without taking heap dumps or analyzing memory in detail. A precise memory analysis leads to better optimization, but such analysis generally suits services whose memory usage is relatively stable. If a service is heavily loaded and uses a large amount of memory, I strongly recommend basing GC optimization on previous experience.

I have applied G1 GC parameters to some services and run performance tests, but I have not used G1 in production yet. In those tests, G1 GC was faster than any other GC type. However, you need to upgrade to JDK 7 to use it, and its stability is not yet guaranteed; nobody knows whether a fatal error will occur. So it is not yet time to use it in production.

Someday, when JDK 7 is truly stable (which is not to say it is unstable now) and WAS products are optimized for JDK 7, G1 GC may finally work as expected, and we may no longer need GC optimization.

For more on GC optimization, look up related materials on SlideShare. I highly recommend "Everything I Ever Learned about JVM Performance Tuning @Twitter" by Attila Szegedi, a Twitter engineer. Please take the time to read it.

By Sangmin Lee, NHN Performance Engineering Lab.


English original: CUBRID; Chinese translation: ImportNew - Wang Xiaojie

Address: http://www.importnew.com/3146.html


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
