Become a Java GC Expert (5): Principles of Java Performance Tuning and GC Tuning
Not every program needs tuning. If a program performs as well as expected, you don't have to put extra effort into improving its performance. However, it is hard to expect a program to meet its performance target as soon as debugging is finished, and that is when tuning becomes necessary. Regardless of the programming language, tuning an application requires extensive technical knowledge and a great deal of concentration. In addition, you cannot tune two programs in the same way, because each program has its own behavior and its own pattern of resource usage. For this reason, tuning requires more background knowledge than writing the program does. For example, you need to be familiar with virtual machines, operating systems, and computer architecture. Only when you apply this knowledge to your program can you tune it successfully.
Sometimes tuning a Java program requires only changing JVM parameters, such as GC parameters; at other times you need to modify the program code. Either way, you must first monitor the process that runs the Java program. This article therefore addresses the following questions:
- How to monitor Java programs?
- What parameters should be set for JVM?
- How do I determine whether to modify the code?
Necessary knowledge for Java program optimization
A Java program runs in the Java Virtual Machine (JVM). Therefore, to tune it, you need to understand how the JVM works. My earlier blog post, Understanding JVM Internals, gives a deeper explanation of the JVM.
This article focuses on GC and HotSpot. Knowledge of these two areas alone is not enough to tune every Java program, but in most cases these two factors determine the performance of a Java program.
It is worth noting that, from the operating system's point of view, the JVM is just another application process. To create a good runtime environment for the JVM, you also need to understand how the operating system allocates resources to processes. In other words, to tune Java programs you should understand how the operating system and the hardware work, in addition to the JVM.
Knowledge of the Java language itself is, of course, required. It is also important to understand locks and concurrency, class loading, and object creation.
When you start tuning a Java program, you need to bring all of this knowledge together.
Java program tuning process
Figure 1 shows a flowchart of the Java program tuning process, taken from the book Java Performance by Charlie Hunt and Binu John.
Figure 1: Java program tuning process
JVM Distribution Model
The JVM distribution model determines whether a Java program runs on a single JVM or on multiple JVMs. You can choose based on availability, responsiveness, and maintainability. When running JVMs on multiple servers, you can also choose between running several JVMs on each server and running one JVM per server. For example, on each server you can run one JVM with an 8 GB heap or four JVMs with a 2 GB heap each. You should decide the number based on the number of processor cores and the characteristics of the program. When responsiveness has priority, a 2 GB heap is better than an 8 GB heap, because a Full GC completes in less time. On the other hand, an 8 GB heap reduces the frequency of Full GC, and if your program uses an internal cache, the larger heap can raise the cache hit rate and thereby improve responsiveness. In short, to select a suitable model, consider the characteristics of the application and choose the model that plays to its strengths and avoids its weaknesses.
JVM Architecture
Choosing a JVM really means deciding between a 32-bit and a 64-bit JVM. Under the same conditions, you should prefer the 32-bit JVM, because it generally performs better than the 64-bit one. However, the maximum heap a 32-bit JVM can address is 4 GB, and in practice only about 2-3 GB can be allocated, whether on a 32-bit or a 64-bit OS. If you need a larger heap, a 64-bit JVM is the appropriate choice.
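Before deciding, it helps to confirm which kind of JVM a process is actually running on and how much heap it may use. The following is a minimal sketch, not part of the original article. It assumes a HotSpot JVM, because the `sun.arch.data.model` system property is HotSpot-specific and may be missing elsewhere; the class name `JvmArchCheck` and the `os.arch` fallback heuristic are illustrative assumptions.

```java
// Minimal sketch: report the JVM's data model (32/64-bit) and its heap ceiling.
final class JvmArchCheck {

    /** Returns "32", "64", or "unknown". */
    static String dataModel() {
        // HotSpot-specific property; other JVMs may not define it.
        String model = System.getProperty("sun.arch.data.model");
        if (model != null) {
            return model;
        }
        // Fallback heuristic (assumption): an os.arch value containing "64" means a 64-bit JVM.
        String arch = System.getProperty("os.arch", "");
        return arch.contains("64") ? "64" : "unknown";
    }

    /** Maximum amount of heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("JVM data model: " + dataModel() + "-bit");
        System.out.println("Max heap (MB):  " + maxHeapMb());
    }
}
```

Running this with different -Xmx values on a 32-bit JVM is a quick way to see where the practical heap limit lies on a given OS.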
Table 1: Performance Comparison (Data Source)
Benchmark | Time (sec) | Factor
C++ Opt | 23 | 1.0x
C++ Dbg | 197 | 8.6x
Java 64-bit | 134 | 5.8x
Java 32-bit | 290 | 12.6x
Java 32-bit GC * | 106 | 4.6x
Java 32-bit SPEC GC * | 89 | 3.7x
Scala | 82 | 3.6x
Scala low-level * | 67 | 2.9x
Scala low-level GC * | 58 | 2.5x
Go 6g | 161 | 7.0x
Go Pro * | 126 | 5.5x
The next step is to run the program and measure its performance. This step covers GC tuning, changes to the operating system configuration, and code modification. You can use system monitoring tools or profiling tools for this work.
Note: The tuning approach differs depending on whether you prioritize responsiveness or throughput. If stop-the-world (the suspension of program execution during GC) occurs from time to time, for example because of a Full GC, responsiveness drops even when throughput is high. Don't forget that tuning involves trade-offs, and not only between responsiveness and throughput. For example, you may have to spend more CPU resources to reduce memory usage, or accept lower responsiveness or throughput. The reverse can also happen, so tune according to the priority of each indicator.
The process in Figure 1 above can be applied to almost any Java program, including Swing applications. However, it is not well suited to the server programs that our company, NHN, uses to provide network services. The process in Figure 2 below is based on Figure 1 but is simpler and better suited to NHN.
Figure 2: Tuning process for NHN Java programs
In Figure 2, Select JVM means using a 32-bit JVM whenever possible, unless you need a 64-bit JVM to maintain a cache of several GB.
Now, following the process in Figure 2, let's look at what each step involves.
JVM Parameters
I will mainly explain how to set appropriate JVM parameters for a Web server program. Although it does not suit every case, the best GC algorithm, especially for Web server programs, is Concurrent Mark Sweep (CMS) GC, because low latency is very important. Of course, even with CMS a long stop-the-world pause can occur, typically because of fragmentation. However, adjusting the size of the new generation (New Area) or its proportion of the whole heap may solve this problem.
Specifying the size of the new generation is as important as specifying the size of the entire heap. You should use -XX:NewRatio to specify the size of the new generation relative to the old generation, or use -XX:NewSize to specify the size of the new generation directly. This setting matters because most objects do not survive for long. In a Web program, apart from cached data, most objects live only from the HttpRequest to the HttpResponse. This rarely takes more than 1 second, so these objects do not survive for more than 1 second. If the new generation is not large enough, objects are promoted to the old generation to make room for new objects. GC in the old generation (Old Area) costs much more than GC in the new generation, so the new generation must be large enough.
However, when the new generation grows beyond a certain size, responsiveness drops, because GC in the new generation basically copies data from one Survivor area to the other (From Space and To Space). In addition, stop-the-world occurs during GC in the new generation just as it does in the old generation. The larger the new generation, the larger the Survivor areas, and so the more data may be copied in each collection. For this reason, you should set NewRatio so that the new generation gets an appropriate size.
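To see how the heap is actually divided between the new generation and the old generation, you can query the memory pools of the running JVM. The following is a minimal sketch using the standard `java.lang.management` API; the class name `HeapPoolReport` is illustrative. Pool names depend on the JVM and the selected collector, so treat the names in the output as informational only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Minimal sketch: list the heap memory pools (Eden, Survivor, old generation) and their sizes.
final class HeapPoolReport {

    private static final long MB = 1024L * 1024L;

    /** Returns one line per heap memory pool with used, committed and max sizes in MB. */
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when the maximum is undefined.
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / MB) + " MB";
            sb.append(String.format("%-28s used=%d MB, committed=%d MB, max=%s%n",
                    pool.getName(), usage.getUsed() / MB, usage.getCommitted() / MB, max));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Comparing the output under different NewRatio or NewSize settings shows how the Eden and Survivor pools change in size.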
Table 2: Default value of NewRatio by platform and option
Platform and option | Default -XX:NewRatio
SPARC -server | 2
SPARC -client | 8
x86 -server | 8
x86 -client | 12
With a given NewRatio, the new generation takes up 1/(NewRatio + 1) of the entire heap. As the table above shows, the default NewRatio for SPARC -server is very small. When these defaults were chosen, SPARC systems were used for high-end applications far more than x86 systems, and the defaults were set accordingly. Since then, the performance of x86 systems has improved greatly, and it is now very common to use them as servers. It is therefore better to set NewRatio to 2 or 3, similar to the SPARC -server setting.
Alternatively, you can specify NewSize and MaxNewSize instead of NewRatio. The new generation is created with the size given by NewSize and can grow up to MaxNewSize. Eden (the area where newly created objects are stored) and the Survivor areas grow in proportion. Just as -Xms and -Xmx are set to the same value, it is also a good choice to set NewSize and MaxNewSize to the same value.
If both NewRatio and NewSize are specified, the larger of the two is used. Therefore, when the heap is created, the initial size of the new generation can be calculated with the following expression:
min(MaxNewSize, max(NewSize, heap / (NewRatio + 1)))
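The expression above can be worked through with a few sample values. The following is a minimal sketch that mirrors the expression as written in this article; it is not HotSpot's actual sizing code, which also applies alignment and other adjustments. The class name `NewGenSizing` and the sample numbers (a 1024 MB heap, NewSize of 128 MB, MaxNewSize of 512 MB) are hypothetical.

```java
// Minimal sketch: evaluate min(MaxNewSize, max(NewSize, heap / (NewRatio + 1))).
final class NewGenSizing {

    /** All sizes are in the same unit (megabytes in the examples below). */
    static long initialNewSize(long heap, long newRatio, long newSize, long maxNewSize) {
        long byRatio = heap / (newRatio + 1); // share of the heap implied by NewRatio
        return Math.min(maxNewSize, Math.max(newSize, byRatio));
    }

    public static void main(String[] args) {
        long heap = 1024;       // hypothetical heap size in MB
        long newSize = 128;     // hypothetical -XX:NewSize in MB
        long maxNewSize = 512;  // hypothetical -XX:MaxNewSize in MB

        // NewRatio=2: 1024 / 3 = 341, which is above NewSize and below MaxNewSize.
        System.out.println(initialNewSize(heap, 2, newSize, maxNewSize)); // prints 341
        // NewRatio=8: 1024 / 9 = 113, so NewSize (128) wins.
        System.out.println(initialNewSize(heap, 8, newSize, maxNewSize)); // prints 128
        // NewRatio=1: 1024 / 2 = 512, which equals the MaxNewSize cap.
        System.out.println(initialNewSize(heap, 1, newSize, maxNewSize)); // prints 512
    }
}
```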
In any case, you cannot find the right heap size and new generation size in a single attempt. Based on my experience running Web servers at NHN, I recommend starting with the following JVM parameters. After monitoring the program's performance with these parameters, you can choose a more suitable GC algorithm or configuration.
Table 3: Recommended JVM Parameters
Type | Parameters
Running mode | -server
Total heap size | Set -Xms and -Xmx to the same value.
New generation size | -XX:NewRatio: a value from 2 to 4. -XX:NewSize=? -XX:MaxNewSize=?. NewSize can be used instead of NewRatio.
Permanent generation size | -XX:PermSize=256m -XX:MaxPermSize=256m. Set a value that causes no problems at run time. This parameter does not affect performance.
GC log | -Xloggc:$CATALINA_BASE/logs/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps. Recording GC logs has no significant effect on Java program performance, so record them whenever possible.
GC algorithm | -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75. These settings are generally recommended, but other settings may work better depending on the program.
Heap dump when OOM occurs | -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$CATALINA_BASE/logs
Action after OOM | -XX:OnOutOfMemoryError=$CATALINA_HOME/bin/stop.sh or -XX:OnOutOfMemoryError=$CATALINA_HOME/bin/restart.sh. After the heap dump has been recorded, take the action that suits your operations policy.
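After changing startup scripts, it is easy to end up with a JVM that is not running with the parameters you intended. The following is a minimal sketch that prints the arguments the running JVM actually received and checks whether -Xms and -Xmx match, as Table 3 recommends. The class name `JvmFlagCheck` and its helper methods are illustrative. The comparison is purely textual, so `-Xms1g` and `-Xmx1024m` would be reported as different.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Minimal sketch: inspect the JVM's input arguments and verify -Xms equals -Xmx.
final class JvmFlagCheck {

    /** Returns the text after the given prefix (e.g. "-Xms"), or null if no argument starts with it. */
    static String valueOf(List<String> jvmArgs, String prefix) {
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                return arg.substring(prefix.length());
            }
        }
        return null;
    }

    /** True when both -Xms and -Xmx are present and textually identical (ignoring case). */
    static boolean heapIsFixed(List<String> jvmArgs) {
        String xms = valueOf(jvmArgs, "-Xms");
        String xmx = valueOf(jvmArgs, "-Xmx");
        return xms != null && xms.equalsIgnoreCase(xmx);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String arg : jvmArgs) {
            System.out.println(arg);
        }
        System.out.println("-Xms equals -Xmx: " + heapIsFixed(jvmArgs));
    }
}
```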
Measure Program Performance
To assess program performance, you need the following information:
- System throughput (TPS, OPS): Needed to understand the program's performance at a conceptual level.
- Requests per second (RPS): Strictly speaking, RPS is not the same as responsiveness, but you can treat it as such. This indicator shows how long it takes for the user to get the result of a request.
- Standard deviation of RPS: The RPS should be as even as possible. If it deviates noticeably, check the GC or the network system.
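The mean and standard deviation of RPS are simple to compute from per-interval samples. The following is a minimal sketch; the class name `RpsStats` and the sample values are hypothetical, and it uses the population standard deviation. A load-testing tool would normally report these figures for you.

```java
// Minimal sketch: mean and population standard deviation of RPS samples.
final class RpsStats {

    static double mean(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Population standard deviation: sqrt(sum((x - mean)^2) / n). */
    static double stdDev(double[] samples) {
        double m = mean(samples);
        double squared = 0.0;
        for (double s : samples) {
            squared += (s - m) * (s - m);
        }
        return Math.sqrt(squared / samples.length);
    }

    public static void main(String[] args) {
        // Hypothetical one-second RPS samples; the dip to 60 could coincide with a GC pause.
        double[] rps = {100, 102, 98, 100, 60};
        System.out.printf("mean=%.1f stddev=%.1f%n", mean(rps), stdDev(rps));
    }
}
```

A standard deviation that is large relative to the mean suggests looking at the GC log for pauses that line up with the dips.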
To get more accurate results, measure only after the program has fully warmed up, because by then the HotSpot JIT has compiled the bytecode into native machine code. In general, apply load to the target function with nGrinder or a similar tool for at least 10 minutes before measuring actual performance.
Practical optimization
If the nGrinder test results meet expectations, you do not need to tune the program. If they do not, you should tune it to solve the problem. The following sections describe the approach case by case.
Stop-the-world takes too long
A long stop-the-world time may be caused by inappropriate GC parameters or by a problem in the code. You can use a profiler or a heap dump to locate the cause, for example by checking the types and numbers of objects in the heap. If you find many unnecessary objects, you should improve the code. If you find no particular problem in object creation, it is better to simply adjust the GC parameters.
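Besides GC logs and heap dumps, the JVM exposes cumulative GC counters that give a quick first look at how much time collections take. The following is a minimal sketch using the standard `GarbageCollectorMXBean` API; the class name `GcPauseSummary` is illustrative. Note that `getCollectionTime()` reports approximate accumulated elapsed time, which for a concurrent collector such as CMS is not the same as stop-the-world pause time, so the GC log remains the authoritative source.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Minimal sketch: summarize collection count and accumulated time per collector.
final class GcPauseSummary {

    /** Average elapsed time per collection in ms; 0 when the count is zero or undefined. */
    static double averageMs(long count, long totalMs) {
        return count > 0 ? (double) totalMs / count : 0.0;
    }

    /** Returns one line per garbage collector registered in this JVM. */
    static String summary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();  // -1 if undefined for this collector
            long totalMs = gc.getCollectionTime(); // approximate elapsed ms, -1 if undefined
            sb.append(String.format("%-24s count=%d, total=%d ms, avg=%.1f ms%n",
                    gc.getName(), count, totalMs, averageMs(count, totalMs)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(summary());
    }
}
```

Calling `summary()` periodically and comparing successive values shows whether the old generation collector is running often or taking long on average.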
To adjust the GC parameters appropriately, you need a GC log that covers a long enough period, and you must know what causes the long stop-the-world. Want to know