This article transferred from: http://www.cnblogs.com/chen77716/archive/2010/06/26/2130807.html
A memory leak in a recent project led to a round of large-scale JVM performance tuning; this post records the experience.
First, JVM memory model and garbage collection algorithm
1. According to the Java Virtual Machine specification, the JVM divides the memory into:
- New (Young Generation)
- Tenured (old generation)
- Permanent generation (Perm)
New and Tenured both belong to heap memory; the heap is allocated out of the memory specified by JVM startup parameters (e.g. -Xmx3g). Perm is not part of the heap and is allocated directly by the virtual machine, but its size can be adjusted with the -XX:PermSize and -XX:MaxPermSize parameters.
- Young generation (New): holds Java objects the JVM has just allocated
- Old generation (Tenured): objects in the young generation that survive garbage collection are copied into the old generation
- Permanent generation (Perm): permanently stores class and method meta-information. Its size depends on the scale of the project and the number of classes and methods; 128M is generally sufficient, and the guiding principle is to keep 30% of the space in reserve.
New is further divided into several parts:
- Eden: holds objects the JVM has just allocated
- Survivor1
- Survivor2: the two survivor spaces are the same size. Objects in Eden that survive garbage collection are copied back and forth between the two survivor spaces, and once a certain condition is met (for example, the number of copies), they are promoted to Tenured. The survivor spaces simply lengthen an object's stay in the young generation, increasing the chance that it is garbage collected there.
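To see these generations on a concrete JVM, the standard java.lang.management API can list the memory pools. A minimal sketch (the class and method names are made up for illustration); note that exact pool names vary by collector and JVM version, and that on Java 8+ Perm no longer exists, having been replaced by Metaspace:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

public class GenerationPools {
    /** Returns the names of this JVM's memory pools (e.g. Eden, Survivor, Tenured/Old, Perm or Metaspace). */
    public static List<String> poolNames() {
        List<String> names = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            names.add(pool.getName());  // pool names are collector-specific
        }
        return names;
    }
}
```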
2. Garbage collection algorithm
Garbage collection algorithms can be divided into three categories, all based on mark-sweep (mark-copy) algorithms:
- Serial algorithm (single-threaded)
- Parallel algorithm
- Concurrent algorithm
The JVM chooses an appropriate collection algorithm for each memory generation according to the machine's hardware configuration; for example, on a machine with more than one core it selects the parallel algorithm for the young generation. For the selection details, refer to the JVM tuning documentation.
Briefly: the parallel algorithm collects with multiple threads and suspends program execution during collection, while the concurrent algorithm also collects with multiple threads but does not stop the application, which makes it suitable for highly interactive programs. From observation, the concurrent collector reduces the young generation's size by using a large old generation, which in turn gives it relatively lower throughput than the parallel collector.
Another question: when does a garbage collection actually run?
- When the young generation fills up, an ordinary (minor) GC is triggered, which collects only the young generation. To emphasize: "the young generation is full" means Eden is full; a full Survivor space does not trigger a GC.
- When the old generation fills up, a full GC is triggered; a full GC collects the young and old generations at the same time.
- When the permanent generation fills up, a full GC is also triggered, which causes classes and method meta-information to be unloaded.
Another point: OutOfMemoryError is thrown not at the moment memory runs out, but when
- the JVM spends 98% of its time in garbage collection, and
- each collection reclaims less than 2% of the heap.
When both conditions are met, OutOfMemoryError is thrown, which leaves the system a tiny window to do some final work before going down, such as manually printing a heap dump.
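The 98%/2% rule is internal to the collector, but an application can at least watch its own heap headroom through the standard Runtime API. A minimal sketch (the class and method names are made up; only Runtime's methods are real JDK API):

```java
public class HeapHeadroom {
    /** Fraction of the maximum heap currently in use, between 0.0 and 1.0. */
    public static double usedFraction() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();  // bytes currently occupied (live objects + garbage)
        return (double) used / rt.maxMemory();           // maxMemory() corresponds to the -Xmx limit
    }
}
```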
Second, memory leaks and how to resolve them
1. Symptoms before the system crashed:
- Each garbage collection took longer and longer: ordinary GCs stretched from 10ms to about 50ms, and full GCs from 0.5s to 4s or 5s
- Full GCs became more and more frequent, eventually happening less than 1 minute apart
- The old generation's memory usage kept growing, and no old-generation memory was released after each full GC
The system then could no longer respond to new requests and gradually approached the OutOfMemoryError threshold.
2. Generating the heap dump file
The current heap information was dumped through the JMX MBean, producing an hprof file of 3G (the size of the whole heap). If JMX is not enabled, the same file can be generated with the jmap command.
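For reference, the same dump can be produced from the command line with jmap; the PID and output path below are placeholders:

```
# JDK 6+: write a binary .hprof heap dump of a running JVM
jmap -dump:format=b,file=/usr/aaa/dump/heap.hprof <pid>

# JDK 5 syntax (writes heap.bin in the current directory):
jmap -heap:format=b <pid>
```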
3. Analyze the dump file
The next question was how to open this 3G heap-information file. Clearly an ordinary Windows machine does not have that much memory; a well-configured Linux box is required. (X-Window can be used to bring the graphical display from Linux over to a Windows desktop.) We considered opening the file with the following tools:
- VisualVM
- IBM HeapAnalyzer
- The hprof tool that ships with the JDK
To keep load times reasonable with these tools, it is recommended to set their maximum memory to 6G. Even then, none of them could show the memory leak intuitively: VisualVM can show object sizes but not call stacks, and HeapAnalyzer shows call stacks but could not open a 3G file correctly. In the end we chose Eclipse's dedicated static memory analysis tool: MAT (Memory Analyzer Tool).
4. Analyzing memory leaks
With MAT we could clearly see which objects were suspected of leaking, which objects occupied the most space, and the reference relationships between objects. In this case, a large number of JbpmContext instances were being held in ThreadLocals; investigation showed the jBPM context was never being closed.
MAT or JMX can also be used to analyze thread state: we can see which object a thread is blocked on and thereby identify the system's bottleneck.
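The same thread-state information that JMX exposes can also be read programmatically through the standard ThreadMXBean. A minimal sketch (the class and method names are made up; the java.lang.management calls are real):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class ThreadStates {
    /** Returns one "name: STATE (on <lock>)" line per live thread. */
    public static List<String> describeThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> out = new ArrayList<String>();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info == null) continue;              // thread may have exited in the meantime
            String lock = info.getLockName();        // object the thread is blocked/waiting on, if any
            out.add(info.getThreadName() + ": " + info.getThreadState()
                    + (lock != null ? " (on " + lock + ")" : ""));
        }
        return out;
    }
}
```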
5. Revisiting the questions
Q: Why did garbage collection take longer and longer before the crash?
A: According to the memory model and garbage collection algorithm, collection consists of two parts: marking and sweeping (copying). As long as the heap size is fixed, the marking time is roughly constant; what varies is the copying part. Because each collection left some memory unreclaimed, the amount to copy kept increasing, and collections took longer. Garbage collection time can therefore itself serve as evidence of a memory leak.
Q: Why did full GCs become more and more frequent?
A: Because memory kept accumulating and gradually exhausted the old generation, there was no more room to allocate new objects, which led to frequent garbage collection.
Q: Why did the old generation occupy more and more memory?
A: Because young-generation memory could not be reclaimed, more and more of it was copied into the old generation.
Third, performance tuning
Besides the memory leak above, we also found that CPU usage stayed below 3% for long stretches and system throughput was insufficient; for an 8-core x 16G, 64-bit Linux server, this is a serious waste of resources.
While the CPU load was low, users still occasionally experienced very long request times, so we realized that both the program and the JVM had to be tuned. The work fell into several areas:
- Thread pool: solve the long user response times
- Connection pool
- JVM startup parameters: adjust each generation's share of memory and the garbage collection algorithm to improve throughput
- Program algorithms: improve the program's logic to improve performance
1. Java thread pool (java.util.concurrent.ThreadPoolExecutor)
Most applications on JVM 6 use the thread pool that ships with the JDK. It is worth examining this mature thread pool closely, because its behavior may differ a little from what we imagine. The Java thread pool has several important configuration parameters:
- corePoolSize: the core (minimum) number of threads
- maximumPoolSize: the maximum number of threads; tasks beyond this number are rejected, and rejection handling can be customized through the RejectedExecutionHandler interface
- keepAliveTime: how long an idle thread stays alive
- workQueue: the work queue, holding tasks waiting to be executed
The thread pool takes a queue parameter (workQueue) to hold submitted tasks, and with different queue choices the pool behaves completely differently:
- SynchronousQueue: a queue with no capacity; one thread's insert operation must wait for another thread's remove operation. With this queue, the thread pool allocates a new thread for every task (up to maximumPoolSize).
- LinkedBlockingQueue: an unbounded queue. With this queue the thread pool ignores the maximumPoolSize parameter; all tasks are handled by only corePoolSize threads, and unprocessed tasks queue up inside the LinkedBlockingQueue.
- ArrayBlockingQueue: a bounded queue. With a bounded queue and maximumPoolSize, the program is hard to tune: a large queue with a small maximumPoolSize leads to low CPU load, while a small queue with a large pool leaves the queue with no real function.
In fact, our requirement is very simple: we want the thread pool to work like a connection pool, with a configurable minimum and maximum number of threads. When minimum < tasks < maximum, new threads should be allocated; when tasks > maximum, tasks should wait for an idle thread to process them.
The thread pool, however, is designed to put tasks into the queue first; only when the queue cannot accept a task does it consider starting a new thread, and if the queue is full and no new thread can be started, the task is rejected. The design amounts to "queue first, execute later", not "execute first, queue later". So, depending on the queue parameter, blindly increasing maximumPoolSize does not increase throughput.
Of course, to reach our goal we had to wrap the thread pool; fortunately, ThreadPoolExecutor exposes enough customization hooks. Our wrapping approach:
- Use SynchronousQueue as the queue parameter, so that maximumPoolSize takes effect, preventing threads from being allocated without bound while letting us raise throughput by increasing maximumPoolSize
- Customize a RejectedExecutionHandler that, when the thread count exceeds maximumPoolSize, checks at intervals whether the thread pool can execute a new task and, if the rejected task can be put back, re-submits it into the pool. The check interval depends on keepAliveTime.
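The wrapping above can be sketched as follows. This is a minimal illustration, not the project's actual code: the class name, pool sizes, and retry interval are all made up; only the ThreadPoolExecutor, SynchronousQueue, and RejectedExecutionHandler APIs are the real JDK ones.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryingPool {

    /** Pool that grows up to max threads, then makes submitters wait instead of dropping tasks. */
    public static ThreadPoolExecutor create(int core, int max, final long retryMs) {
        return new ThreadPoolExecutor(core, max, retryMs, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>(),        // no buffering: forces growth up to max
                new RejectedExecutionHandler() {         // over max: periodically retry the hand-off
                    public void rejectedExecution(Runnable r, ThreadPoolExecutor pool) {
                        try {
                            // Re-offer the task until an idle worker picks it up.
                            while (!pool.getQueue().offer(r, retryMs, TimeUnit.MILLISECONDS)) {
                                if (pool.isShutdown()) {
                                    throw new RejectedExecutionException("pool is shut down");
                                }
                            }
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            throw new RejectedExecutionException(e);
                        }
                    }
                });
    }

    /** Demo: submit more tasks than max threads; all of them still complete. */
    public static int runTasks(int nTasks) {
        ThreadPoolExecutor pool = create(2, 4, 50);
        final AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < nTasks; i++) {
            pool.execute(new Runnable() {
                public void run() {
                    try { Thread.sleep(5); } catch (InterruptedException ignored) { }
                    done.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try { pool.awaitTermination(30, TimeUnit.SECONDS); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return done.get();
    }
}
```

The key point is that SynchronousQueue makes maximumPoolSize actually take effect, and the handler turns a hard rejection into a bounded wait-and-retry.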
2. Connection pool (org.apache.commons.dbcp.BasicDataSource)
Because we had been running org.apache.commons.dbcp.BasicDataSource with its default configuration, under heavy traffic JMX showed that many Tomcat threads were blocked on the lock of the Apache ObjectPool used inside BasicDataSource. The direct cause was that the connection pool's maximum connection count was set too small: the default BasicDataSource configuration uses a maximum of only 8 connections.
We also observed that when the system goes unused for a longer period, say 2 days, MySQL disconnects the idle connections, leaving the connections cached in the pool unusable. To solve these problems we studied BasicDataSource thoroughly and found several points to optimize:
- MySQL supports 100 connections by default, so each connection pool should be configured according to the number of machines in the cluster; with 2 servers, for example, each pool can be set to 60
- initialSize: the number of connections that are kept open at all times
- minEvictableIdleTimeMillis: the idle time allowed for each connection; a connection idle longer than this is closed
- timeBetweenEvictionRunsMillis: the run interval of the background thread that detects expired connections
- maxActive: the maximum number of connections that can be handed out
- maxIdle: the maximum idle count. When a connection is returned and the current number of connections exceeds maxIdle, the connection is closed immediately. Only the connections between initialSize and maxIdle are periodically checked for expiry. This parameter mainly serves to raise throughput during access peaks.
- How is initialSize maintained? From reading the code: BasicDataSource closes all expired connections and then opens connections back up to the initialSize count. Working together with minEvictableIdleTimeMillis and timeBetweenEvictionRunsMillis, this ensures that all initialSize connections are periodically reconnected, which avoids the problem of MySQL dropping connections after long periods of inactivity.
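A configuration along these lines might look like the following Spring-style bean definition. The property names are the real DBCP 1.x ones discussed above, but the bean id, URL, and all values are illustrative, not this project's actual settings:

```xml
<!-- Hypothetical BasicDataSource configuration; values are examples only. -->
<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close">
  <property name="driverClassName" value="com.mysql.jdbc.Driver"/>
  <property name="url" value="jdbc:mysql://dbhost:3306/appdb"/>
  <property name="maxActive" value="60"/>            <!-- max connections handed out per node -->
  <property name="maxIdle" value="30"/>              <!-- idle connections above this are closed at once -->
  <property name="initialSize" value="10"/>          <!-- connections kept open at all times -->
  <property name="minEvictableIdleTimeMillis" value="1800000"/>   <!-- 30 min idle -> evict -->
  <property name="timeBetweenEvictionRunsMillis" value="600000"/> <!-- evictor runs every 10 min -->
</bean>
```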
3. JVM parameters
JVM startup parameters let you configure memory and garbage collection. With no settings at all the JVM generally works well, but for a well-configured server and a specific application, careful tuning is needed to reach the best performance. Through these settings we want to achieve some goals:
- GC pauses that are short enough
- GCs that are infrequent enough
- Intervals between full GCs that are long enough
The first two goals conflict: short GC times require a smaller heap, while fewer GCs require a larger heap, so we can only strike a balance.
(1) Setting the JVM heap: generally the minimum and maximum can be bounded with -Xms and -Xmx. To keep the garbage collector from spending extra time shrinking and growing the heap between the minimum and the maximum, we usually set the two to the same value.
(2) The young and old generations share the heap in the default ratio (1:2). The ratio between the two can be adjusted with -XX:NewRatio, or, for a given generation such as the young generation, an absolute size can be set with -XX:NewSize and -XX:MaxNewSize. Likewise, to prevent the young generation from shrinking and growing, we usually set -XX:NewSize and -XX:MaxNewSize to the same value.
(3) How large should the young and old generations be? There is no single answer to this question; otherwise there would be no tuning. Let's look at the effects of changing the sizes:
- A larger young generation necessarily means a smaller old generation. A large young generation lengthens the interval between ordinary GCs but increases the time of each GC; a small old generation leads to more frequent full GCs.
- A smaller young generation necessarily means a larger old generation. A small young generation leads to frequent ordinary GCs, but each GC is shorter; a large old generation reduces the frequency of full GCs.
- The choice should depend on the lifecycle distribution of the application's objects: if the application has a large number of temporary objects, choose a larger young generation; if persistent objects are relatively numerous, enlarge the old generation appropriately. Many applications, however, lack such obvious characteristics, and the choice should then rest on two points: (A) make full GCs as rare as possible, letting the old generation cache long-lived objects; the JVM's default 1:2 ratio exists for this reason; (B) observe the application for a period, note the peak old-generation memory usage, and, without provoking full GCs, increase the young generation according to the actual situation, for example to a 1:1 ratio. But leave the old generation at least 1/3 room for growth.
(4) On a well-configured machine (e.g. multi-core with large memory), the parallel collection algorithm can be selected for the old generation with -XX:+UseParallelOldGC (the default is serial collection).
(5) Thread stack settings: each thread opens a 1M stack by default to hold stack frames, call parameters, local variables, and so on. For most applications this default is too large; 256K is generally sufficient. In theory, with total memory fixed, reducing each thread's stack allows more threads, but in practice this is still limited by the operating system.
(6) The following parameters can be used to output heap dump and GC information:
- -XX:HeapDumpPath
- -XX:+PrintGCDetails
- -XX:+PrintGCTimeStamps
- -Xloggc:/usr/aaa/dump/heap_trace.txt
The following parameter prints heap information when an OutOfMemoryError occurs:
- -XX:+HeapDumpOnOutOfMemoryError
Here is the Java parameter configuration from one round of tuning (server: Linux 64-bit, 8-core x 16G):
JAVA_OPTS="$JAVA_OPTS -server -Xms3g -Xmx3g -Xss256k -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseParallelOldGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/aaa/dump -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/usr/aaa/dump/heap_trace.txt -XX:NewSize=1g -XX:MaxNewSize=1g"
After observation, this configuration is very stable: each ordinary GC takes around 10ms, and full GCs basically do not occur, or occur only once after a very long time.
By analyzing the dump files we found that a full GC occurred every hour: as long as the JMX service is enabled in the JVM, JMX performs a full GC once an hour to clear references; see the attached documentation for details.
4. Program algorithm tuning: not the focus this time.