Java garbage collection optimization tutorial and example

Source: Internet
Author: User
Tags: manual garbage collection

Compared with other performance optimization activities, Java garbage collection tuning requires that you first understand the behavior of the entire application and the result you expect, rather than optimizing one part of the application in isolation. It is generally easier to follow this process:

Define your performance goals.
Test.
Measure the optimization result.
Compare with the goals.
Change the approach and test again.
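The "measure" step can be driven from inside the JVM itself. Below is a minimal sketch using the standard java.lang.management API; the class name GcStats and the throwaway allocation loop are our own illustration, not part of the original example:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints, for each collector the JVM exposes, how many collections have run
// and how much wall-clock time they consumed so far.
public class GcStats {
    public static void main(String[] args) {
        // Allocate some garbage so at least one collection is likely to occur.
        for (int i = 0; i < 10_000; i++) {
            byte[] garbage = new byte[16 * 1024];
        }
        long totalGcMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            totalGcMs += Math.max(0, gc.getCollectionTime());
        }
        System.out.println("total GC time (ms): " + totalGcMs);
    }
}
```

Comparing this counter before and after a load run gives a cheap first measurement, even before turning to full GC logs.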

Performance tuning goals matter most when they are concrete and measurable. Such goals include latency, throughput, and capacity; for more background, I recommend reading the corresponding chapters of the Garbage Collection Handbook. Let's see how to set and achieve such an optimization goal in practice, using the following sample code:

// Imports skipped for brevity
public class Producer implements Runnable {

    private static ScheduledExecutorService executorService =
            Executors.newScheduledThreadPool(2);

    private Deque<byte[]> deque;
    private int objectSize;
    private int queueSize;

    public Producer(int objectSize, int ttl) {
        this.deque = new ArrayDeque<byte[]>();
        this.objectSize = objectSize;
        // Runs every 100 ms adding 100 objects, i.e. 1,000 objects per second,
        // so a queue of ttl * 1000 objects keeps each object alive ~ttl seconds.
        this.queueSize = ttl * 1000;
    }

    @Override
    public void run() {
        for (int i = 0; i < 100; i++) {
            deque.add(new byte[objectSize]);
            if (deque.size() > queueSize) {
                deque.poll();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        executorService.scheduleAtFixedRate(
                new Producer(200 * 1024 * 1024 / 1000, 5), 0, 100, TimeUnit.MILLISECONDS);
        executorService.scheduleAtFixedRate(
                new Producer(50 * 1024 * 1024 / 1000, 120), 0, 100, TimeUnit.MILLISECONDS);
        TimeUnit.MINUTES.sleep(10);
        executorService.shutdownNow();
    }
}

The code submits two jobs that run every 100 ms. Each job simulates the lifecycle of a class of objects: first create them, let them "live" for a period of time, then forget them so that GC can reclaim the memory. When running this example, enable GC logging with the following parameters:
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps

We immediately see GC events in the log file, similar to the following:

2015-06-04T13:34:16.119-0200: 1.723: [GC (Allocation Failure) [PSYoungGen: 114016K->73191K(234496K)] 421540K->421269K(745984K), 0.0858176 secs] [Times: user=0.04 sys=0.06, real=0.09 secs]
2015-06-04T13:34:16.738-0200: 2.342: [GC (Allocation Failure) [PSYoungGen: 234462K->93677K(254976K)] 582540K->593275K(766464K), 0.2357086 secs] [Times: user=0.11 sys=0.14, real=0.24 secs]
2015-06-04T13:34:16.974-0200: 2.578: [Full GC (Ergonomics) [PSYoungGen: 93677K->70109K(254976K)] [ParOldGen: 499597K->511230K(761856K)] 593275K->581339K(1016832K), [Metaspace: 2936K->2936K(1056768K)], 0.0713174 secs] [Times: user=0.21 sys=0.02, real=0.07 secs]
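Log lines in this format can also be processed programmatically, which helps when comparing runs against a pause-time goal. A minimal sketch, assuming this exact log layout; the class name GcLogPause and the regular expression are our own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts the total stop-the-world duration of each GC event: the
// ", N secs]" that appears just before the [Times: ...] block.
public class GcLogPause {
    private static final Pattern PAUSE = Pattern.compile(", (\\d+\\.\\d+) secs\\]");

    public static List<Double> pauses(List<String> logLines) {
        List<Double> result = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find()) {
                result.add(Double.parseDouble(m.group(1)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "2015-06-04T13:34:16.119-0200: 1.723: [GC (Allocation Failure) "
            + "[PSYoungGen: 114016K->73191K(234496K)] 421540K->421269K(745984K), "
            + "0.0858176 secs] [Times: user=0.04 sys=0.06, real=0.09 secs]");
        System.out.println(pauses(sample)); // prints [0.0858176]
    }
}
```

From the extracted list it is straightforward to compute the maximum and total pause time for a run.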

Based on the information in the log, we can begin to improve performance. Remember three different goals:

Make sure the worst-case GC pause does not exceed a given threshold.
Make sure the total time application threads are stopped does not exceed a predefined threshold.
Reduce infrastructure costs while ensuring that we can still achieve reasonable latency and throughput goals.

For this purpose, three different configurations were each run for 10 minutes; the results, which differ widely, are summarized below:

Heap     GC algorithm               Effective work    Longest pause
-Xmx12g  -XX:+UseConcMarkSweepGC    89.8%             560 ms
-Xmx12g  -XX:+UseParallelGC         91.5%             1,104 ms
-Xmx8g   -XX:+UseConcMarkSweepGC    66.3%             1,610 ms

The experiment ran the same code with different GC algorithms and different heap sizes, measuring garbage collection pause durations and throughput. The details and results are explained in our Garbage Collection Handbook; as the examples there show, even simple configuration changes shift the balance between performance aspects such as latency and throughput.
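For intuition, the "effective work" percentage in a table like the one above can be derived as the share of wall-clock time not spent in GC pauses. A back-of-the-envelope sketch; the 61,200 ms total pause figure is our own assumption, chosen to reproduce the 89.8% row:

```java
// Derives GC throughput ("effective work") from run duration and total pause time.
public class GcThroughput {
    // runMillis: total wall-clock duration of the experiment.
    // pauseMillis: sum of all GC pause times during that run.
    static double effectiveWorkPercent(long runMillis, long pauseMillis) {
        return 100.0 * (runMillis - pauseMillis) / runMillis;
    }

    public static void main(String[] args) {
        // A 10-minute (600,000 ms) run with ~61,200 ms of total pauses
        // yields the 89.8% figure of the first configuration.
        System.out.printf("%.1f%%%n", effectiveWorkPercent(600_000, 61_200)); // prints 89.8%
    }
}
```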

Note: to keep the example as simple as possible, only a limited number of input parameters were changed; for example, different numbers of CPU cores or different heap layouts were not tested.




Java garbage collection optimization


1. Materials

JDK 5.0 garbage collection optimization ("Don't Pause")
Writing GC-friendly, leak-free code
JVM tuning summary
All JDK 6 options and default values

2. GC log printing

GC tuning is a highly empirical, hands-on task. GC logs are the prerequisite both for data on which to base decisions and for final verification:

-XX:+PrintGCDetails -XX:+PrintGCTimeStamps (when each GC occurred) -XX:+PrintGCApplicationStoppedTime (how long application threads were stopped) -XX:+PrintGCApplicationConcurrentTime (how long the application ran between pauses)

 
3. Collector selection


CMS collector: pause time first

Configuration parameter: -XX:+UseConcMarkSweepGC
Parameters that are on by default and need not be set: -XX:+UseParNewGC (collect the young generation in parallel), -XX:+CMSPermGenSweepingEnabled (let CMS collect the permanent generation), -XX:+UseCMSCompactAtFullCollection (compact the old generation during full GC)

Initial effect: with a 1 GB heap, the young generation is about 60 MB; minor GC takes about 5-20 ms and full GC about 130 ms.
Parallel collector: throughput first

Configuration parameters: -XX:+UseParallelGC -XX:+UseParallelOldGC (collect the old generation in parallel; supported since JDK 6.0)

The parameter -XX:+UseAdaptiveSizePolicy (dynamically adjust the young-generation size) is on by default and need not be set.

Initial effect: with a 1 GB heap, the young generation is about 90-110 MB (dynamically adjusted); minor GC takes about 5-20 ms; full GC takes about 1.3 and 1.1 seconds with and without UseParallelOldGC respectively, with little difference.

In addition, -XX:MaxGCPauseMillis=100 sets the desired maximum minor GC pause, and the JVM adjusts the young-generation size to try to meet it. In this test environment, however, objects die too quickly, so the parameter had little effect.
4. Tuning practices

A parallel-collector pause of up to 1 second is basically intolerable, so the CMS collector was chosen.

In the Mule 2.0 application under stress testing, short-lived objects are generated at a rate of many MB per second:

Because the default young generation of about 60 MB is too small, minor GC occurred frequently, each taking about 0.2 seconds.
With the CMS collector, MaxTenuringThreshold (the number of minor GCs an object must survive before being promoted to the old generation) defaults to 0, so surviving temporary objects entered the old generation directly, without passing through the Survivor areas, and before long a full GC occurred in the old generation.

Tuning these two parameters should improve both of the above situations, while avoiding a young generation that is too large: too much copying could make minor GC pauses too long.

Use -Xmn to set the young generation to about 1/3 of total memory. After observation, -Xmn500m was chosen, giving an actual young generation of about 460 MB. (The -XX:NewRatio setting had no effect; only -Xmn worked.)
Add the -XX:+PrintTenuringDistribution parameter to observe the total object size at each age, then set -XX:MaxTenuringThreshold=5.
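To make the role of MaxTenuringThreshold concrete, here is a conceptual simulation, not the JVM's actual implementation: an object that survives the threshold number of minor GCs in the Survivor space is promoted to the old generation. Class and method names are our own:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Conceptual sketch of tenuring: each minor GC increments the age of every
// surviving object; once an object's age reaches the threshold, it is
// "promoted" from the survivor list to the old-generation list.
public class TenuringSketch {
    static class Obj { int age = 0; }

    static int promotedAfter(int minorGcs, int threshold) {
        List<Obj> survivor = new ArrayList<>();
        List<Obj> oldGen = new ArrayList<>();
        survivor.add(new Obj()); // one long-lived object under observation
        for (int gc = 0; gc < minorGcs; gc++) {
            for (Iterator<Obj> it = survivor.iterator(); it.hasNext(); ) {
                Obj o = it.next();
                if (++o.age >= threshold) { // aged past the threshold: promote
                    it.remove();
                    oldGen.add(o);
                }
            }
        }
        return oldGen.size();
    }

    public static void main(String[] args) {
        // With threshold 5, the object is promoted only on the 5th minor GC.
        System.out.println(promotedAfter(4, 5)); // prints 0
        System.out.println(promotedAfter(5, 5)); // prints 1
    }
}
```

With threshold 0, as in the default CMS case described above, every surviving object is promoted at the very first minor GC, which is exactly why the old generation filled so quickly.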

After optimization, a minor GC occurs about every 1.1 seconds and takes 15-20 ms. At the same time, the old generation grows much more slowly, and it now takes a long time before a full GC occurs.

Final parameters:

-server -Xms1024m -Xmx1024m -Xmn500m -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=5 -XX:+ExplicitGCInvokesConcurrent

 

Finally, service throughput increased from 1,180 tps to 1,380 tps. Gaining roughly 17% performance by adjusting just two parameters is very cost-effective.
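As a quick arithmetic check of the figure above (the class name is our own):

```java
// Verifies the quoted improvement: 1180 tps -> 1380 tps is about a 17% gain.
public class TpsGain {
    static double gainPercent(double before, double after) {
        return 100.0 * (after - before) / before;
    }

    public static void main(String[] args) {
        System.out.printf("%.1f%%%n", gainPercent(1180, 1380)); // prints 16.9%
    }
}
```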


In addition, JDK 6 Update 7 ships with the VisualVM tool, which is essentially the NetBeans Profiler. Like JConsole, it can show thread state, plus the CPU time and memory use of objects and methods, making it an important reference for optimization.
