JVM Performance Tuning


excerpt from: http://uule.iteye.com/blog/2114697

Summary of JVM garbage collection and performance tuning

Several strategies for tuning the JVM



First, JVM memory model and garbage collection algorithm

1. According to the Java Virtual Machine specification, the JVM divides memory into:

    • New (Young Generation)
    • Tenured (old generation)
    • Permanent generation (Perm)

New and Tenured belong to the heap; heap memory is allocated according to the size specified by the JVM startup parameters (e.g. -Xmx3g). Perm is not part of the heap and is allocated directly by the virtual machine, but its size can be adjusted with -XX:PermSize and -XX:MaxPermSize.

    • Young generation (New): stores Java objects just allocated by the JVM
    • Old generation (Tenured): objects in the young generation that survive garbage collection are copied to the old generation
    • Permanent generation (Perm): permanently stores class and method meta-information; its size depends on the scale of the project and the number of classes and methods. 128M is generally sufficient; the guiding principle is to reserve 30% headroom.

New is further divided into several parts:

    • Eden: stores objects just allocated by the JVM
    • Survivor1
    • Survivor2: the two survivor spaces are the same size. When an object in Eden survives garbage collection, it is copied back and forth between the two survivor spaces; once a certain condition is met, such as the copy count, it is copied to Tenured. Clearly, the survivor spaces only lengthen an object's stay in the young generation, increasing its chance of being collected there.
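On HotSpot, the layout above maps onto a few standard flags; the values below are only illustrative, not recommendations:

```
-XX:NewSize=512m -XX:MaxNewSize=512m   # absolute young-generation size
-XX:SurvivorRatio=8                    # Eden : one Survivor space = 8 : 1
-XX:MaxTenuringThreshold=15            # copy count after which an object is promoted to Tenured
```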

2. Garbage collection algorithm

Garbage collection algorithms fall into three categories, all based on mark-sweep (mark-copy) algorithms:

    • Serial algorithm (single-threaded)
    • Parallel algorithm
    • Concurrent algorithm

Depending on the machine's hardware configuration, the JVM chooses an appropriate collection algorithm for each generation; for example, if the machine has more than one core, the parallel algorithm is selected for the young generation. For the selection details, refer to the JVM tuning documentation.

To explain briefly: the parallel algorithm performs multi-threaded garbage collection and suspends program execution during collection, while the concurrent algorithm also collects with multiple threads but does not stop the application. The concurrent algorithm is therefore suited to highly interactive programs. In practice, the concurrent collector trades a larger old generation for a smaller young generation, so its throughput is relatively low compared with the parallel collector.

Another question is, when does a garbage collection action execute?

    • When the young generation fills up, a normal (minor) GC is triggered, which collects only the young generation. To emphasize: "the young generation is full" means Eden is full; a full Survivor space does not trigger GC
    • When the old generation fills up, a full GC is triggered; a full GC collects the young generation at the same time
    • A full GC is also triggered when the permanent generation fills up, which causes unloading of class and method meta-information

Another question is when OutOfMemoryError is thrown: not only when memory is completely exhausted, but also when:

    • the JVM spends 98% of its time in garbage collection, and
    • each collection recovers less than 2% of the heap

When both conditions are met, an OutOfMemoryError is thrown, leaving the system a tiny window to do something on the way down, such as manually printing a heap dump.
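On HotSpot these two thresholds correspond to tunable flags; the values shown are the documented defaults:

```
-XX:+UseGCOverheadLimit   # the "GC overhead limit exceeded" check, on by default
-XX:GCTimeLimit=98        # OOM candidate when more than 98% of time goes to GC...
-XX:GCHeapFreeLimit=2     # ...while less than 2% of the heap is recovered
```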

Second, memory leaks and how to resolve them

1. Some symptoms before the system crashed:

    • Each garbage collection took longer and longer: normal GC grew from 10ms to about 50ms, and full GC from 0.5s to 4s-5s
    • Full GCs became more and more frequent, eventually occurring less than 1 minute apart
    • Old-generation memory usage kept growing, and the old generation released no memory after each full GC

The system gradually became unable to respond to new requests and approached the OutOfMemoryError threshold.

2. Generating a heap dump file

The current heap information was exported through the JMX MBean as an HPROF file of 3G (the entire heap size). If JMX is not enabled, the file can be generated with the Java jmap command instead.
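For illustration, a heap dump can also be triggered programmatically through the same HotSpot diagnostic MBean that JMX exposes. This is a minimal sketch; the class name HeapDumper is ours, not the original project's:

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {
    // Writes an HPROF snapshot of the current heap to the given path.
    public static void dump(String path, boolean liveOnly) throws Exception {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // liveOnly=true forces a collection first, so the dump holds only reachable objects
        bean.dumpHeap(path, liveOnly);
    }
}
```

Without JMX, the equivalent from the command line is jmap -dump:format=b,file=heap.hprof <pid>.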

3. Analyze the dump file

The next question is how to open this 3G heap file. Clearly an ordinary Windows system does not have that much memory; a well-equipped Linux machine is required. Of course, we can use X Window to forward the graphics from Linux to Windows. We considered opening the file with the following tools:

    1. VisualVM
    2. IBM HeapAnalyzer
    3. The HPROF tool bundled with the JDK

To ensure a reasonable load time with these tools, it is recommended to set their maximum memory to 6G. Even so, none of them could visualize the memory leak directly: VisualVM can show object sizes but not call stacks, while HeapAnalyzer can show call stacks but could not open a 3G file correctly. In the end we chose Eclipse's dedicated static memory analysis tool: MAT (Memory Analyzer Tool).

4. Analyzing memory leaks

With MAT we can clearly see which objects are suspected memory leaks, which objects occupy the most space, and the reference relationships between objects. In this case, many JbpmContext instances were being held in ThreadLocals; investigation showed the JBPM context was never closed.

We can also analyze thread state through MAT or JMX and see which object each thread is blocked on, which helps identify the system's bottleneck.

5. Revisiting the questions

Q: Why is the garbage collection time getting longer before the crash?

A: According to the memory model and garbage collection algorithm, garbage collection has two parts: marking and sweeping (copying). The marking part takes a constant time for a fixed memory size; what varies is the copying part. Because each garbage collection left some memory unreclaimed, the amount to be copied kept increasing, and collection took longer and longer. Garbage collection time can therefore itself serve as evidence of a memory leak.

Q: Why are there more and more full GC times?

A: Because memory kept accumulating and gradually exhausted the old generation, new objects had no more room to be allocated, which led to frequent garbage collection.

Q: Why are older generations taking up more memory?

A: Because young-generation memory could not be reclaimed, more and more of it was copied into the old generation.

Third, performance tuning

Besides the memory leak above, we also found that CPU utilization stayed below 3% for long stretches and system throughput was insufficient, a serious waste of resources for an 8-core x 16G, 64-bit Linux server.

While the CPU was underloaded, users occasionally experienced overly long request times, so we realized the program and the JVM had to be tuned. This was done in several ways:

    • Thread pool: solve the problem of long user response times
    • Connection pool
    • JVM startup parameters: adjust the memory ratio and garbage collection algorithm of each generation to improve throughput
    • Program algorithms: improve the program's logic and algorithms to improve performance

1. Java thread pool (java.util.concurrent.ThreadPoolExecutor)

Most applications on JVM 6 use the thread pool that ships with the JDK. This mature thread pool deserves careful explanation, because it does not behave quite the way one might imagine. The Java thread pool has several important configuration parameters:

    • corePoolSize: the number of core threads
    • maximumPoolSize: the maximum number of threads; tasks beyond this number are rejected, and users can customize rejection handling through the RejectedExecutionHandler interface
    • keepAliveTime: how long an idle thread stays alive
    • workQueue: the work queue, holding tasks awaiting execution

The Java thread pool takes a queue parameter (workQueue) to hold tasks awaiting execution, and depending on the queue chosen, the pool behaves completely differently:

    • SynchronousQueue: a waiting queue with no capacity; one thread's insert operation must wait for another thread's remove operation. With this queue, the pool allocates a new thread for every task.
    • LinkedBlockingQueue: an unbounded queue. With this queue, the pool ignores the maximumPoolSize parameter; all tasks are handled by corePoolSize threads, and unprocessed tasks queue up in the LinkedBlockingQueue.
    • ArrayBlockingQueue: a bounded queue. With a bounded queue and maximumPoolSize, the program is difficult to tune: a large queue with a small maximumPoolSize leads to low CPU load, while a small queue with a large pool means the queue never really comes into play.
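The second point is easy to verify: with an unbounded queue, the pool never grows past corePoolSize no matter how large maximumPoolSize is. A small sketch for illustration (the class name is ours):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueueChoiceDemo {
    public static int poolSizeAfterBurst() throws InterruptedException {
        // corePoolSize=2, maximumPoolSize=10, but the unbounded queue hides the maximum
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 10, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
        for (int i = 0; i < 20; i++) {
            pool.execute(() -> {
                try { Thread.sleep(50); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
        }
        int size = pool.getPoolSize();   // stays at corePoolSize: extra tasks just queue up
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return size;
    }
}
```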

In fact, our requirement is very simple: we want the thread pool to behave like the connection pool, with a configurable minimum and maximum number of threads. When minimum < tasks < maximum, new threads should be allocated; when tasks > maximum, the pool should wait for an idle thread to process the task.

However, the thread pool is designed so that tasks are put into the queue first; only when the queue cannot take a task is a new thread considered, and if the queue is full and no new thread can be created, the task is rejected. The design amounts to "queue first, execute later" and "reject rather than wait". As a result, depending on the queue parameter, blindly increasing maximumPoolSize will not raise throughput.

Of course, to reach our goal the thread pool has to be wrapped. Fortunately, ThreadPoolExecutor exposes enough customization hooks to help us get there. Our wrapping approach:

    • Use SynchronousQueue as the queue so that maximumPoolSize takes effect, preventing threads from being allocated without bound while letting us raise system throughput by increasing maximumPoolSize
    • Customize a RejectedExecutionHandler that, when the thread count exceeds maximumPoolSize, checks at intervals whether the thread pool can execute a new task, and puts the rejected task back into the pool when it can. The check interval depends on the size of keepAliveTime.
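A minimal sketch of this wrapping (the class and handler here are our illustration, not the original project's code): SynchronousQueue makes maximumPoolSize effective, and the rejection handler retries by offering the task back with a timeout, which succeeds as soon as a worker goes idle and polls the queue.

```java
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BlockingRetryPool {
    public static ThreadPoolExecutor create(int core, int max, long keepAliveSec) {
        RejectedExecutionHandler retry = (task, pool) -> {
            try {
                // Re-offer the task until a worker frees up and takes it from the
                // SynchronousQueue; the timeout doubles as the interval at which we
                // check whether the pool can execute a new task.
                while (!pool.isShutdown()
                        && !pool.getQueue().offer(task, keepAliveSec, TimeUnit.SECONDS)) {
                    // still no idle worker: keep waiting
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        return new ThreadPoolExecutor(core, max, keepAliveSec, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(), retry);
    }
}
```

Note the trade-off: the rejection handler blocks the submitting thread, which is exactly the "wait for an idle thread" behavior described above.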

2. Connection pool (org.apache.commons.dbcp.BasicDataSource)

When org.apache.commons.dbcp.BasicDataSource was first used with its default configuration, under heavy traffic we observed through JMX that many Tomcat threads were blocked on the lock of the Apache ObjectPool used inside BasicDataSource. The direct cause was that the connection pool's maximum connection count was set too small: the default BasicDataSource configuration uses a maximum of only 8 connections.

We also observed a problem when the system went unused for a long period, say 2 days: MySQL would drop the connections, leaving the connections cached in the pool unusable. To solve these problems, we studied BasicDataSource thoroughly and found some points to optimize:

    • MySQL supports 100 connections by default, so each connection pool should be configured according to the number of machines in the cluster; with 2 servers, for example, each is set to 60
    • initialSize: the number of connections opened up front
    • minEvictableIdleTimeMillis: the idle time allowed for each connection; connections idle longer than this are closed
    • timeBetweenEvictionRunsMillis: the run interval of the background thread that detects expired connections
    • maxActive: the maximum number of connections that can be allocated
    • maxIdle: the maximum idle count; when a connection finishes and the connection count is greater than maxIdle, the connection is closed immediately. Only connections between initialSize and maxIdle are periodically checked for expiration. This parameter is mainly used to raise throughput during access peaks.
    • How is initialSize maintained? Studying the code shows that BasicDataSource closes all expired connections and then opens connections back up to the initialSize count. Together with minEvictableIdleTimeMillis and timeBetweenEvictionRunsMillis, this ensures that all idle initialSize connections are reconnected regularly, avoiding the problem of MySQL dropping connections after long inactivity.
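Putting the parameters together, a BasicDataSource configuration along the lines described might look like this; the values are illustrative, not the original project's exact settings:

```
# illustrative BasicDataSource settings for one node of a 2-server cluster
initialSize=10
maxActive=60
maxIdle=30
minEvictableIdleTimeMillis=1800000     # close connections idle for more than 30 minutes
timeBetweenEvictionRunsMillis=600000   # evictor thread runs every 10 minutes
```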

3. JVM parameters

The JVM startup parameters let you configure memory and garbage collection. With no settings at all the JVM usually works well, but for well-equipped servers and specific applications, careful tuning is needed to reach the best performance. Through these settings we want to achieve some goals:

    • GC time is short enough
    • GC count is low enough
    • The interval between full GCs is long enough

The first two goals conflict: to keep GC time short the heap must be smaller, while to keep the GC count low the heap must be larger, so we can only strike a balance.

(1) For the JVM heap, the minimum and maximum can generally be set with -Xms and -Xmx. To prevent the garbage collector from shrinking and re-growing the heap between the minimum and the maximum, which costs extra time, we usually set the maximum and minimum to the same value.

(2) The young and old generations share heap memory in a default ratio (1:2). You can adjust the ratio between them with -XX:NewRatio, or set the young generation's absolute size with -XX:NewSize and -XX:MaxNewSize. Likewise, to prevent the young generation from shrinking, we usually set -XX:NewSize and -XX:MaxNewSize to the same value.

(3) How large should the young and old generations be? Unsurprisingly, there is no fixed answer to this question; otherwise there would be no tuning. Let us look at the effects of changing the sizes.

    • A larger young generation necessarily means a smaller old generation. A large young generation lengthens the interval between normal GCs but increases the time of each GC, and the small old generation leads to more frequent full GCs
    • A smaller young generation necessarily means a larger old generation. A small young generation leads to frequent normal GCs, but each GC is shorter, and the large old generation reduces the frequency of full GCs
    • How to choose should depend on the lifecycle distribution of the application's objects: if the application has many short-lived objects, choose a larger young generation; if there are relatively many long-lived objects, enlarge the old generation appropriately. Many applications, however, lack such obvious characteristics, so the choice should rest on two points: (A) keep full GCs as rare as possible, letting the old generation cache long-lived objects; the JVM's default 1:2 ratio exists for this reason; (B) observe the application for a period, note the old generation's memory usage at peak, and, without provoking full GCs, enlarge the young generation according to the actual situation, for example to a 1:1 ratio, but leave the old generation at least 1/3 of room to grow.

(4) On a well-equipped machine (multi-core, large memory), you can choose the parallel collection algorithm for the old generation: -XX:+UseParallelOldGC. The default is serial collection.

(5) Thread stack size: by default each thread opens a 1M stack to hold stack frames, call parameters, local variables, and so on. For most applications this default is too large; 256K is generally sufficient. In theory, with fixed memory, reducing the per-thread stack allows more threads, but in practice this is also limited by the operating system.

(6) The following parameters can be used to print heap dump and GC information:

    • -XX:HeapDumpPath
    • -XX:+PrintGCDetails
    • -XX:+PrintGCTimeStamps
    • -Xloggc:/usr/aaa/dump/heap_trace.txt

The following parameter controls printing heap information on OutOfMemoryError:

    • -XX:+HeapDumpOnOutOfMemoryError

Here is a Java parameter configuration used at one time (server: Linux 64-bit, 8-core x 16G):

JAVA_OPTS="$JAVA_OPTS -server -Xms3g -Xmx3g -Xss256k -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseParallelOldGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/aaa/dump -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/usr/aaa/dump/heap_trace.txt -XX:NewSize=1g -XX:MaxNewSize=1g"

Observation showed this configuration to be very stable: each normal GC takes around 10ms, and full GCs basically never occur, or occur only once in a very long while.

By analyzing the dump file we found a full GC occurring every hour: as long as the JMX service is enabled in the JVM, JMX triggers a full GC once an hour to clear stale references; refer to the attached documentation for details.
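This hourly collection comes from RMI distributed garbage collection, which JMX remoting relies on; on JDK 6 the interval historically defaulted to one hour and is controlled by system properties (defaults shown, in milliseconds):

```
-Dsun.rmi.dgc.client.gcInterval=3600000
-Dsun.rmi.dgc.server.gcInterval=3600000
```

Raising both values spaces out the periodic full GCs; -XX:+DisableExplicitGC suppresses them entirely, but at the cost of disabling all System.gc() requests.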

4. Program and algorithm tuning: not covered this time.

Resources:

http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html

Source: http://blog.csdn.net/chen77716/article/details/5695893

=======================================================================================

Tuning method

Everything has led up to this step: tuning. Before tuning, we need to keep the following principles in mind:

1. Most Java applications do not require GC tuning on the server;

2. For most Java applications with GC problems, the cause is not wrong parameter settings but problems in the code;

3. Before an application goes live, first consider setting the machine's JVM parameters to the optimal (most suitable) values;

4. Reduce the number of objects created;

5. Reduce the use of global variables and large objects;

6. GC tuning is a means of last resort;

7. In practice, analyzing and optimizing code for GC is far more common than tuning GC parameters;

GC tuning has two aims (http://www.360doc.com/content/13/0305/10/15643_269388816.shtml):

1. Reduce the number of objects promoted to the old generation to a minimum;

2. Reduce full GC execution time;

To achieve these aims, generally, the things to do are:

1. Reduce the use of global variables and large objects;

2. Adjust the young generation to the most suitable size;

3. Set the old generation to the most suitable size;

4. Select a suitable GC collector;

The four methods above use the word "suitable" several times. What exactly is suitable? In general, refer to the recommendations in the "collector combinations" and "initial memory allocation" sections above. But these recommendations are no panacea; they need to be adapted to your machines and applications. In practice, you can give two machines different GC parameters, compare them, and choose the parameters that actually improve performance or reduce GC time.

Truly skilled GC tuning is built on the practical experience of monitoring and tuning GC many times. The general steps of monitoring and tuning are:

1. Monitor the status of GC

Use the various JVM tools to view the current logs, analyze the current JVM parameter settings, analyze the current heap memory snapshot and GC logs, and decide, based on the actual memory partitioning of each area and the GC execution times, whether optimization is needed;
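One simple way to watch GC status from inside the JVM is the standard GarbageCollectorMXBean; this is only a sketch, and jstat or the GC log gives the same numbers externally:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcMonitor {
    // Returns { total GC count, accumulated GC time in ms } summed over all collectors
    // (the young- and old-generation collectors are reported as separate beans).
    public static long[] snapshot() {
        long count = 0, timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc.getCollectionCount() > 0) {
                count += gc.getCollectionCount();
                timeMs += gc.getCollectionTime();
            }
        }
        return new long[] { count, timeMs };
    }
}
```

Sampling this periodically and diffing the snapshots gives GC frequency and average pause time, the two metrics the checklist below is based on.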

2. Analyze the results and decide whether optimization is needed

If the parameters are set reasonably, the system has no timeout logs, GC frequency is not high, and GC time is not long, then GC optimization is unnecessary. If GC takes more than 1-3 seconds, or GC runs frequently, optimization is a must;

NOTE: GC tuning is generally not required if the following metrics are all met:

Minor GC execution time is less than 50ms;

Minor GC runs infrequently, roughly once every 10 seconds;

Full GC execution time is less than 1s;

Full GC runs infrequently, no more often than once every 10 minutes;

3. Adjust the GC type and memory allocation

If the memory allocation is too large or too small, or the GC collector is slow, these parameters should be adjusted first. Find one or a few machines for beta testing, then compare the performance of the optimized machines against the unoptimized ones, and make the final choice accordingly;

4. Continuous analysis and adjustment

Analyze and find the most suitable parameters through constant testing and adjustment;

5. Apply the parameters across the board

Once the most suitable parameters are found, apply them to all servers and follow up on the results.

Tuning instances

The content above is theory on paper; below, some real examples illustrate it:

Example 1:

Yesterday the author found some development test machines throwing java.lang.OutOfMemoryError: GC overhead limit exceeded. This exception means:

GC is taking too much time to free too little space. There are generally two causes: 1. the heap is too small; 2. there is an infinite loop or a large object;

The author first ruled out the second cause, because the same application also runs in production; if it had that problem, it would have crashed long ago. So the suspicion was that the heap on this machine was set too small;

Viewing with ps -ef | grep java showed:

The application's heap was set to only 768m, while the machine has 2g of memory, only this one Java application runs on it, and nothing else occupies the memory. In addition, this application is fairly large and needs more memory;

From the situation above, I judged that only the heap-area size settings needed to be changed (the adjusted settings were shown in a screenshot that is not reproduced here);

Tracking the application afterwards, the related exception did not appear again;

Example 2:(http://www.360doc.com/content/13/0305/10/15643_269388816.shtml)

A service system frequently lagged; analysis found that full GC took too long:

jstat -gcutil:

S0     S1    E     O      P      YGC  YGCT   FGC  FGCT   GCT
12.16  0.00  5.18  63.78  20.32  54   2.047  5    6.946  8.993

Analyzing the data above: young GC executed 54 times, taking 2.047 seconds, about 37ms per young GC, which is in the normal range; full GC executed 5 times, taking 6.946 seconds, averaging 1.389s each. The data shows the problem: full GC takes too long. Examining the system's parameters revealed NewRatio=9, i.e. a young-to-old generation size ratio of 1:9, which is the cause of the problem:

1. The young generation is too small, causing objects to be promoted to the old generation early and triggering old-generation full GCs;

2. The old generation is large, so each full GC is time-consuming;

The fix was to adjust the value of NewRatio to 4; after that, full GCs no longer occurred and only young GCs ran. This keeps objects cleaned up in the young generation instead of entering the old generation (a practice useful for some applications, but by no means something every application should do).

Example 3:

During performance testing of one application, memory occupancy was found to be high and full GC frequent. We used sudo -u admin -H jmap -dump:format=b,file=<file name>.hprof <pid> to dump the memory and generate a dump file, then analyzed it with the MAT plugin under Eclipse, and found:

(The MAT screenshot is not reproduced here.) As it showed, one thread had a problem: a large number of objects referenced by its LinkedBlockingQueue were never released, causing the thread to retain up to 378m of memory. The developers were notified to optimize the code and release the related objects.

Source: Java Series notes (4)-JVM monitoring and tuning

