First, let me introduce one person: Jon Masamitsu. I don't know his background in detail, but he works on the JVM at Sun, and I think everyone who wants to tune the JVM should read his blog. Many of the ideas and figures in this article are taken from it.
Blog link: http://blogs.sun.com/jonthecollector/
In his blog [1], he lists the three most important options for GC tuning:
Third place: the proportion of the young generation within the entire JVM heap;
Second place: the size of the entire JVM heap;
First place: choosing the appropriate GC collector, which is the subject of this article.
Basic Concepts
First, some background. The JVM heap is divided into generations (usually translated into Chinese as 'dai'). Objects with short lifecycles live in the young generation, and objects with long lifecycles are placed in the tenured generation, as illustrated in [2].
When a GC collects only the young generation, it is called a minor GC. When a GC collects the tenured generation, it is called a major GC or full GC. In general, minor GCs occur much more frequently than major GCs.
For more about generations, see [2] or the many articles available online.
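To see the generations on a running JVM, the standard java.lang.management API can list the heap memory pools. The following is a minimal sketch; the class name HeapPools is made up for illustration, and the pool names it prints depend on the collector and JVM version (with the serial collector, HotSpot typically reports names such as "Eden Space", "Survivor Space" and "Tenured Gen").

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original experiment.
class HeapPools {
    // Returns the names of all heap memory pools known to this JVM.
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                // getUsage() reports init/used/committed/max in bytes.
                System.out.println(pool.getName() + ": " + pool.getUsage());
            }
        }
    }
}
```

Running it under different collector options is a quick way to check how the young and tenured pools are named and sized on a given JVM.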
The figure from [3] clearly lists the GC collectors provided by the JVM.
Three of them are responsible for the young generation:
Serial:
The simplest collector. Only one thread performs GC, and the whole application is suspended while it runs (so-called "stop-the-world");
Parallel Scavenge:
Compared with Serial, it uses multiple threads to perform GC. It is still "stop-the-world" while it runs, but the pause may be shorter;
ParNew:
Basically the same as Parallel Scavenge. The only difference is that it has enhancements that allow it to be used together with CMS;
There are also three collectors responsible for the tenured generation:
Serial Old:
A single thread using the mark-sweep-compact method (I admit I don't know the details of mark-sweep-compact; for my purposes it is enough to remember that it is single-threaded). Its behavior is similar to Serial;
Parallel Old:
Likewise, the multi-threaded counterpart;
CMS:
The full name is "Concurrent Mark Sweep". It is the collector with the highest concurrency and the lowest pause time. It is called concurrent because, while it performs a GC task, the GC thread runs alongside the application threads and the application mostly does not need to be paused;
That makes six collectors. In practice, however, few people configure the young generation and tenured generation collectors separately when setting JVM parameters. Instead, the JVM provides several combined options:
-XX:+UseSerialGC:
Equivalent to "Serial" + "Serial Old". Intuitively this should be the worst-performing combination, and my experiments confirm it;
-XX:+UseParallelGC:
Equivalent to "Parallel Scavenge" + "Serial Old". That is, the young generation is collected by multiple threads, but the tenured generation by a single thread;
-XX:+UseParallelOldGC:
Equivalent to "Parallel Scavenge" + "Parallel Old". Both generations are collected by multiple threads in parallel;
-XX:+UseConcMarkSweepGC:
Equivalent to "ParNew" + "CMS" + "Serial Old". That is, the young generation uses the multi-threaded ParNew, and the tenured generation uses CMS to get the lowest pause time. However, CMS may run into a "concurrent mode failure" (discussed later); when that happens, the JVM falls back to Serial Old;
[3] also mentions another option, -XX:+UseParNewGC, but I rarely see anyone use it, so I will not cover it here.
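One way to confirm which collector pair an option actually selected is to ask the running JVM through the standard GarbageCollectorMXBean API. This is only a sketch; the class name ActiveCollectors is made up, and the reported names vary by JVM version (HotSpot of this era typically reports "Copy" and "MarkSweepCompact" under -XX:+UseSerialGC, "PS Scavenge" and "PS MarkSweep" under -XX:+UseParallelGC, and "ParNew" and "ConcurrentMarkSweep" under -XX:+UseConcMarkSweepGC).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original experiment.
class ActiveCollectors {
    // Returns the names of the collectors the current JVM is running with.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection count and accumulated collection time (ms) since JVM start.
            System.out.println(gc.getName()
                    + " count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
    }
}
```

Starting the same class with each of the options above should show a different pair of names, which makes the young/tenured pairing visible without reading the GC log.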
Experiments and results
Having said all this, let's verify it by experiment.
The system under test can be thought of as an in-memory database: all data lives in memory and all queries run in memory. The test workload both writes data into memory and issues a variety of queries. Because the system processes a large number of queries, it constantly creates short-lived objects, so minor GCs in the young generation are frequent. At the same time, new data is continuously written to the database, and that data is long-lived, so tenured generation usage keeps growing.
The experiment runs on a Dell server with 48 GB of memory and 2 CPUs with 4 cores each, for a total of 8 cores. For convenience, I gave the JVM only about 16 GB of memory.
SerialGC
First, look at the first figure:
The sample points in the figure are rather dense, so a small part of it is enlarged below:
The vertical axis shows system throughput in requests/second. I set the throughput calculation interval to 5 seconds, so every point in the figure is the average system throughput over a 5-second interval.
The horizontal axis shows time.
The following figures use the same settings as this one.
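The way each point in the figures is computed can be sketched as follows. This is an illustration under assumptions, not the measurement code used in the experiment: the class name, the method name and the sample completion times are all made up.

```java
import java.util.Arrays;

// Hypothetical sketch of windowed throughput, not the original measurement code.
class ThroughputWindows {
    // Buckets request completion times (seconds since start) into fixed-size
    // windows and returns the average requests/second for each window.
    static double[] perWindow(double[] completionTimesSec, double windowSec, double totalSec) {
        int windows = (int) Math.ceil(totalSec / windowSec);
        double[] throughput = new double[windows];
        for (double t : completionTimesSec) {
            int idx = (int) (t / windowSec);
            if (idx >= 0 && idx < windows) {
                throughput[idx] += 1.0;
            }
        }
        for (int i = 0; i < windows; i++) {
            throughput[i] /= windowSec; // requests in the window -> requests/second
        }
        return throughput;
    }

    public static void main(String[] args) {
        // Made-up completion times: 3 requests in [0,5), 2 in [5,10), 1 in [10,15).
        double[] times = {0.5, 1.0, 4.9, 5.0, 9.9, 12.0};
        System.out.println(Arrays.toString(perWindow(times, 5.0, 15.0)));
    }
}
```

With a 5-second window, anything that stops the application for part of a window lowers that window's point on the chart, which matters for reading the figures.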
This figure uses SerialGC. The parameter settings for the whole JVM are as follows:
java -jar -Xms10g -Xmx15g -XX:+UseSerialGC -XX:NewSize=6g -XX:MaxNewSize=6g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -Xloggc:./log/gc.log slaver.jar
Here, -XX:NewSize=6g -XX:MaxNewSize=6g
sets both the initial size and the maximum size of the young generation to 6 GB, so the young generation size does not change while the system runs. By default the JVM dynamically adjusts the ratio of young generation to tenured generation based on runtime conditions; fixing the size avoids that adjustment and makes the experiment results easier to observe.
-Xms10g -Xmx15g
sets the initial size of the JVM heap to 10 GB and its maximum size to 15 GB. The tenured generation is therefore (10-6) GB at the start and can grow to (15-6) GB (an approximation that ignores the size of the perm generation).
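The sizing arithmetic can be written out in the K units that the GC log uses. This is a small worked example, assuming the JVM's "g" suffix means 1024 * 1024 K and, as above, ignoring the perm generation; the class and method names are made up.

```java
// Hypothetical worked example of the heap sizing arithmetic.
class HeapSizing {
    static final long KB_PER_GB = 1024L * 1024L;

    // Tenured size approximated as total heap minus young generation (perm ignored).
    static long tenuredKb(long heapGb, long youngGb) {
        return (heapGb - youngGb) * KB_PER_GB;
    }

    public static void main(String[] args) {
        System.out.println("initial tenured: " + tenuredKb(10, 6) + "K"); // 4194304K
        System.out.println("maximum tenured: " + tenuredKb(15, 6) + "K"); // 9437184K
    }
}
```

The initial value, 4194304K, is the same number the heap dump later in this article reports as the tenured generation's total before the full GC.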
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -Xloggc:./log/gc.log
These parameters record GC details and heap changes in gc.log.
slaver.jar
is my program.
These parameters stay the same in the subsequent experiments. The only thing that changes is the choice of collector.
OK, back to the result figure.
An interesting phenomenon in this figure is the heavy jitter: a high point is almost always followed by a low point, which means the system throughput is constantly fluctuating. The reason is that in this experiment a minor GC occurs roughly every 7-9 seconds (as observed in the log), while throughput is computed over 5-second intervals, as mentioned above. A minor GC therefore falls into roughly every other interval, and the average throughput of an interval that contains a minor GC is lower.
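The jitter can be reproduced with a little arithmetic. The sketch below assumes a GC exactly every 8 seconds (the middle of the observed 7-9 second range), a pause of 0.9 seconds, and a made-up steady rate of 1000 requests/second while the application runs; none of these are measured values, and the class name is hypothetical.

```java
// Hypothetical model of how a stop-the-world pause shows up in 5-second windows.
class PauseImpact {
    // Average throughput of one window in which the application is fully stopped
    // for pauseSec seconds, assuming a constant steadyRate while it runs.
    static double windowThroughput(double steadyRate, double windowSec, double pauseSec) {
        return steadyRate * (windowSec - pauseSec) / windowSec;
    }

    // Marks which windows contain the start of a GC when one begins every gcPeriodSec seconds.
    static boolean[] windowsWithGc(double gcPeriodSec, double windowSec, int windows) {
        boolean[] hit = new boolean[windows];
        for (double t = gcPeriodSec; t < windows * windowSec; t += gcPeriodSec) {
            hit[(int) (t / windowSec)] = true;
        }
        return hit;
    }

    public static void main(String[] args) {
        System.out.println("window with a 0.9 s pause: "
                + windowThroughput(1000.0, 5.0, 0.9) + " req/s");
        boolean[] hit = windowsWithGc(8.0, 5.0, 10);
        for (int i = 0; i < hit.length; i++) {
            System.out.println("window " + i + (hit[i] ? " contains a GC" : " has no GC"));
        }
    }
}
```

Under these assumptions a window that contains a pause averages about 18% less than its neighbours, and windows with and without a GC mostly alternate, which matches the high-low pattern in the figure.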
To confirm this, I picked a minor GC entry at random from the log:
900.692: [GC 900.692: [DefNew: 5334222K->300781K(5662336K), 0.8988610 secs] 7654628K->2641401K(9856640K), 0.8990190 secs] [Times: user=0.86 sys=0.03, real=0.89 secs]
Here is what this entry means:
900.692
The GC occurred 900.692 seconds after the system started;
5334222K->300781K(5662336K)
The young generation shrank from 5334222K to 300781K in this GC, and the size of the young generation is 5662336K (note that this value is close to the 6g we set earlier). This step took 0.8988610 seconds;
7654628K->2641401K(9856640K)
The whole JVM heap shrank from 7654628K to 2641401K (the difference between these two values should be almost the same as the difference between the two values above, because this is a minor GC: the space freed in the whole heap is really the space freed in the young generation);
user=0.86 sys=0.03, real=0.89 secs
The GC's user time is 0.86 seconds, its sys time is 0.03 seconds, and its real time is 0.89 seconds (for details about user/sys/real, see [4]).
As mentioned earlier, SerialGC must suspend the entire application. This GC therefore suspended the application for 0.89 seconds, which is what caused the drop in system throughput.
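The claim that the two differences are "almost the same" can be checked by parsing the entry. The sketch below uses the log line quoted above, written in HotSpot's usual spacing; the class name and the regular expression are my own and only handle this DefNew format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for a single DefNew minor GC log entry.
class MinorGcLine {
    // Matches "[DefNew: before->after(capacity), t secs] before->after(capacity)".
    private static final Pattern P = Pattern.compile(
            "\\[DefNew: (\\d+)K->(\\d+)K\\((\\d+)K\\), [\\d.]+ secs\\] (\\d+)K->(\\d+)K\\((\\d+)K\\)");

    // Returns {youngFreedKb, heapFreedKb, promotedKb}.
    static long[] analyze(String line) {
        Matcher m = P.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a DefNew log line: " + line);
        }
        long youngFreed = Long.parseLong(m.group(1)) - Long.parseLong(m.group(2));
        long heapFreed = Long.parseLong(m.group(4)) - Long.parseLong(m.group(5));
        // Space that left the young generation without leaving the heap
        // was promoted into the tenured generation.
        return new long[] {youngFreed, heapFreed, youngFreed - heapFreed};
    }

    public static void main(String[] args) {
        String line = "900.692: [GC 900.692: [DefNew: 5334222K->300781K(5662336K), 0.8988610 secs] "
                + "7654628K->2641401K(9856640K), 0.8990190 secs] "
                + "[Times: user=0.86 sys=0.03, real=0.89 secs]";
        long[] r = analyze(line);
        System.out.println("young freed: " + r[0] + "K"); // 5033441K
        System.out.println("heap freed:  " + r[1] + "K"); // 5013227K
        System.out.println("promoted:    " + r[2] + "K"); // 20214K
    }
}
```

The small gap between the two differences, 20214K, is the amount of data that survived this minor GC and moved into the tenured generation, which is why tenured usage keeps creeping up between full GCs.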
A more interesting phenomenon is that near 1700 on the horizontal axis, the system throughput drops to almost 0. The intuitive guess is that a full GC occurred here, and sure enough, the log shows:
1700.750: [GC 1700.750: [DefNew: 5335802K->302541K(5662336K), 0.9367800 secs] 1701.687: [Tenured: 4210828K->4211008K(4211008K), 17.6799010 secs] 9526754K->4513370K(9873344K), [Perm : 11048K->11048K(21248K)], 18.6220490 secs] [Times: user=18.61 sys=0.01, real=18.62 secs]
[Tenured: 4210828K->4211008K(4211008K), 17.6799010 secs]
This shows that the tenured generation was collected and that the collection took 17.6799010 seconds (because the data I write is kept permanently, this GC found almost no dead objects in the tenured generation, so its usage barely changed: 4210828K->4211008K).
The whole GC took 18.62 seconds, which means the system was suspended for those 18 seconds and had almost no throughput during that time. Using SerialGC therefore means the system stops responding to external requests for well over ten seconds whenever a full GC occurs. If this were a web server, users would be unable to load pages for that long, which is not acceptable.
In addition, the log records the heap state before and after the full GC:
{Heap before GC invocations=204 (full 0):
...........................................
 tenured generation   total 4194304K, used 4190951K [0x00007fe2c82e0000, 0x00007fe3c82e0000, 0x00007fe5082e0000)
   the space 4194304K,  99% used [0x00007fe2c82e0000, 0x00007fe3c7f99dd8, 0x00007fe3c7f99e00, 0x00007fe3c82e0000)
........................................
........................................
Heap after GC invocations=205 (full 1):
....................................................................
 tenured generation   total 7018348K, used 4211008K [0x00007fe2c82e0000, 0x00007fe4748bb000, 0x00007fe5082e0000)
   the space 7018348K,  59% used [0x00007fe2c82e0000, 0x00007fe3c9330000, 0x00007fe3c9330000, 0x00007fe4748bb000)
......................................................
}
From the log, we can see that before the GC, tenured generation usage had reached 99%, meaning the tenured generation was full. We can therefore conclude that the full GC was triggered because the tenured generation filled up.
After the GC, the tenured generation's total space grew from 4194304K to 7018348K. Recall that we set the initial heap size to 10 GB and the maximum size to 15 GB; the extra 5 GB is there for exactly this kind of generation growth.
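The percentages and the growth can be recomputed from the totals in the heap dump. In this sketch the class name is made up, and rounding the percentage down is my assumption, chosen because it reproduces the logged 99% and 59%.

```java
// Hypothetical recomputation of the tenured generation figures from the heap dump.
class TenuredUsage {
    // Integer percentage, rounded down (assumed; it matches the values in the log).
    static long percentUsed(long usedKb, long totalKb) {
        return usedKb * 100 / totalKb;
    }

    public static void main(String[] args) {
        System.out.println("before: " + percentUsed(4190951L, 4194304L) + "% used"); // 99
        System.out.println("after:  " + percentUsed(4211008L, 7018348L) + "% used"); // 59
        System.out.println("growth: " + (7018348L - 4194304L) + "K");                // 2824044K
    }
}
```

The used amount is almost unchanged; the percentage falls only because the generation's total grew by 2824044K, roughly 2.7 GB of the 5 GB headroom.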
Finally, note that for both the minor GC and the major GC, the user time and the real time are nearly identical. User time is CPU time summed over all threads, while real time is wall-clock time, so nearly equal values mean the GC work ran on a single thread. This is consistent with how SerialGC works.
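This comparison can be reduced to one ratio: CPU time (user + sys) divided by real time. The sketch below applies it to the two log entries quoted above; the class and method names are made up, and reading a ratio near 1.0 as "single-threaded" is a rule of thumb rather than a guarantee.

```java
// Hypothetical helper for comparing CPU time with wall-clock time in a GC entry.
class GcParallelism {
    // Near 1.0 suggests one busy thread; well above 1.0 suggests several threads working.
    static double cpuToRealRatio(double user, double sys, double real) {
        return (user + sys) / real;
    }

    public static void main(String[] args) {
        System.out.println("minor GC ratio: " + cpuToRealRatio(0.86, 0.03, 0.89));
        System.out.println("full GC ratio:  " + cpuToRealRatio(18.61, 0.01, 18.62));
    }
}
```

Both entries give a ratio of about 1.0. With a multi-threaded collector on this 8-core machine, one would expect user time to be noticeably larger than real time.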
(To be continued)
Reference:
[1] The second most important GC tuning knob
[2] Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning
[3] Our Collectors
[4] user/sys/real time