1. GC related
The HotSpot virtual machine physically divides the heap into two parts: the young generation and the old generation.
Young generation: most newly created objects are allocated here. Because most objects become unreachable soon after they are created, many objects are created in the young generation and then disappear from it. The process by which objects disappear from this area is called a "minor GC".
Old generation: objects that did not become unreachable and survived the young generation are copied here. It is larger than the young generation, and because of its larger size, GC occurs much less often here than in the young generation. The process by which objects disappear from the old generation is called a "major GC" (or "full GC").
The permanent generation shown in the diagram is also known as the method area. It stores class information and constants, including string constants. Despite its name, this area is not where objects that survived the old generation are stored permanently. GC can also occur in this area, and GC events here are also counted as major GCs.
The young generation holds newly created objects and is divided into three spaces:
- One Eden space
- Two survivor spaces
Objects move through these spaces as follows:
- Most newly created objects are placed in the Eden space.
- After the first GC in the Eden space, the surviving objects are moved to one of the survivor spaces.
- On later GCs of the Eden space, the surviving objects are added to that same survivor space.
- When that survivor space becomes full, its surviving objects are moved to the other survivor space, and the full survivor space is emptied.
- After these steps repeat several times, objects that are still alive are moved to the old generation.
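The promotion path above can be illustrated with a toy model. This is a simplified sketch, not HotSpot's implementation: the class and field names are hypothetical, the tenuring threshold of 3 is an arbitrary assumption (HotSpot's threshold is configurable), and for simplicity every minor GC copies live objects from Eden and the occupied survivor space into the empty one, the usual copying scheme in which one survivor space is always empty.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not HotSpot code) of objects aging through the young generation. */
public class GenerationalModel {
    // Assumed promotion threshold: number of minor GCs an object must survive.
    static final int TENURING_THRESHOLD = 3;

    static final class Obj {
        final String name;
        int age = 0;          // minor GCs survived so far
        boolean live = true;  // false once the object becomes unreachable
        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> fromSurvivor = new ArrayList<>(); // survivor space currently holding objects
    List<Obj> toSurvivor = new ArrayList<>();   // empty survivor space
    final List<Obj> old = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);              // new objects start in Eden
        return o;
    }

    /** Copies live objects out of Eden and the occupied survivor space, promoting old-enough ones. */
    void minorGc() {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(fromSurvivor);
        for (Obj o : candidates) {
            if (!o.live) continue;                          // unreachable objects simply disappear
            o.age++;
            if (o.age >= TENURING_THRESHOLD) old.add(o);    // promotion to the old generation
            else toSurvivor.add(o);
        }
        eden.clear();
        fromSurvivor.clear();
        List<Obj> tmp = fromSurvivor;                       // swap roles of the two survivor spaces
        fromSurvivor = toSurvivor;
        toSurvivor = tmp;
    }

    public static void main(String[] args) {
        GenerationalModel heap = new GenerationalModel();
        heap.allocate("cache");                      // long-lived object
        Obj temp = heap.allocate("request-temp");    // short-lived object
        temp.live = false;                           // becomes unreachable right away
        for (int i = 0; i < 3; i++) heap.minorGc();
        System.out.println("old generation: " + heap.old.size() + " object(s)");
    }
}
```

Running main prints `old generation: 1 object(s)`: the short-lived object disappears in the first minor GC, while the long-lived one is promoted after surviving three.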
jstat is a monitoring tool provided with the HotSpot JVM. The following command prints the GC statistics of process <pid> every 1000 ms:
jstat -gc <pid> 1000
S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT
3008.0 3072.0 0.0 1511.1 343360.0 46383.0 699072.0 283690.2 75392.0 41064.3 2540 18.454 4 1.133 19.588
3008.0 3072.0 0.0 1511.1 343360.0 47530.9 699072.0 283690.2 75392.0 41064.3 2540 18.454 4 1.133 19.588
3008.0 3072.0 0.0 1511.1 343360.0 47793.0 699072.0 283690.2 75392.0 41064.3 2540 18.454 4 1.133 19.588
The YGC, YGCT, FGC, FGCT and GCT columns are important because they show how many GCs have run and how much time has been spent on GC.
For example, if YGC is 217 and YGCT is 0.928, a simple average shows that each young generation GC takes about 4 ms (0.004 seconds); calculated the same way, the average full GC takes 33 ms. (Applied to the output above, the same calculation gives 18.454 / 2540 ≈ 7.3 ms per minor GC and 1.133 / 4 ≈ 283 ms per full GC.)
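The averaging above is easy to script. The sketch below parses one header row and one data row of `jstat -gc` output (copied from the sample above) and divides each total GC time by its GC count. The class and method names are hypothetical, and it does not run jstat itself; you would feed it lines captured from the command.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: average GC times from one line of `jstat -gc` output. */
public class JstatAverages {
    /** Maps each jstat column name (YGC, YGCT, ...) to its value in one output row. */
    static Map<String, Double> parse(String header, String row) {
        String[] names = header.trim().split("\\s+");
        String[] values = row.trim().split("\\s+");
        if (names.length != values.length) {
            throw new IllegalArgumentException("header and row have different column counts");
        }
        Map<String, Double> columns = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            columns.put(names[i], Double.parseDouble(values[i]));
        }
        return columns;
    }

    /** Average time per GC in milliseconds; 0 when no GC has run yet. */
    static double averageMillis(double totalSeconds, double count) {
        return count == 0 ? 0.0 : totalSeconds * 1000.0 / count;
    }

    public static void main(String[] args) {
        String header = "S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT";
        String row = "3008.0 3072.0 0.0 1511.1 343360.0 46383.0 699072.0 283690.2 "
                + "75392.0 41064.3 2540 18.454 4 1.133 19.588";
        Map<String, Double> c = parse(header, row);
        System.out.printf(Locale.ROOT, "average minor GC: %.1f ms, average full GC: %.1f ms%n",
                averageMillis(c.get("YGCT"), c.get("YGC")),
                averageMillis(c.get("FGCT"), c.get("FGC")));
    }
}
```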
However, you often cannot analyze real GC problems by looking only at averages, because individual GC times can vary widely. For example, if two full GCs take 67 ms in total, one may have taken 10 ms and the other 57 ms. To see the time of each individual GC, it is better to use -verbose:gc than to rely on averages.
Why GC optimization is needed
More precisely, do Java-based services need GC optimization? Not always. GC optimization is not required if the running Java-based system already has the following parameters or behavior:
- The memory size has been set with -Xms and -Xmx
- The -server option is included
- There are no error logs, such as timeout logs, in the system
In other words, if you have not set the memory size and the system is flooded with timeout logs, you need to optimize GC for your system.
However, always remember one thing: GC optimization is the last task to do.
I have summarized two goals for GC optimization:
- Minimize the number of objects moved to the old generation
- Reduce the execution time of full GC
Minimizing the number of objects moved to the old generation
The generational GC mechanism is used by the Oracle JVM, except for the G1 GC available in JDK 7 and later. Objects are created in the Eden space, moved to a survivor space, and the objects that remain are eventually sent to the old generation. Some large objects are allocated directly in the old generation. GC in the old generation takes longer than GC in the young generation, so reducing the number of objects moved to the old generation can significantly reduce how often full GC runs. "Reducing the number of objects moved to the old generation" may be misread as choosing to keep objects in the young generation, but that is not possible. What you can do instead is adjust the size of the young generation.
Reducing full GC execution time
Full GC takes much longer than minor GC. If a full GC takes too long (more than 1 second), connected components may hit timeout errors.
- If you try to reduce full GC execution time by shrinking the old generation, OutOfMemoryError may occur more often, or the number of full GCs may increase.
- Conversely, if you try to reduce the number of full GCs by enlarging the old generation, each full GC takes longer.
Therefore, you need to set the old generation space to an "appropriate" value.
GC parameters that affect performance
As mentioned at the end of the second article, do not think, "Someone's performance improved greatly with these GC parameters, so why don't we use the same ones?" Different web services create objects of different sizes and lifetimes.
Put simply, if a task runs under conditions A, B, C, D and E, and the same task runs under only A and B, which one will be faster? Intuitively, most people would say the task with only A and B.
The same is true of Java GC parameters: setting more parameters does not necessarily make GC faster and may even make it slower. The most basic principle of GC optimization is to apply different GC parameters to two or more servers, compare them, and then apply only the parameters that have proven to improve performance or reduce GC time. Keep this in mind.
The following table lists the GC parameters related to memory size that can affect performance.
Table 1: Java parameters to consider for GC optimization
Category | Parameter | Description
Heap memory space | -Xms | Heap size when the JVM starts
Heap memory space | -Xmx | Maximum heap size
Young generation space | -XX:NewRatio | Ratio between the young and old generations
Young generation space | -XX:NewSize | Size of the young generation
Young generation space | -XX:SurvivorRatio | Ratio of the Eden space to a survivor space
I usually use -Xms, -Xmx and -XX:NewRatio when optimizing GC. -Xms and -Xmx are essential, and how you set NewRatio can have a significant impact on GC performance. Some people may ask how to set the size of the Perm area: when an OutOfMemoryError occurs because of insufficient Perm space, you can set it with the -XX:PermSize and -XX:MaxPermSize parameters.
Another parameter that may affect GC performance is the GC type. The following table lists the available GC types (based on JDK 6.0).
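To confirm which memory parameters and GC type a running JVM actually uses, you can query the standard java.lang.management API from inside the process. A minimal sketch follows (the class name is hypothetical). Collector names vary by JVM version and GC type; for example, Parallel GC is typically reported as "PS Scavenge" and "PS MarkSweep", so treat the printed names as informational.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

/** Prints the JVM flags, maximum heap and active collectors of the current process. */
public class JvmGcSettings {
    public static void main(String[] args) {
        // Flags passed on the command line, e.g. -Xms, -Xmx, -XX:NewRatio, -XX:+UseParallelGC
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);

        // Maximum heap the JVM will attempt to use (roughly what -Xmx sets)
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("max heap: " + maxHeapMb + " MB");

        // One bean per collector; counts and times are cumulative since JVM start
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```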
Deciding whether to optimize GC after analyzing the monitoring results
After examining the GC status, analyze the monitoring results to decide whether to optimize GC. If the analysis shows that GC takes only 0.1-0.3 seconds, you do not need to spend time optimizing it. However, if GC takes 1-3 seconds, or more than 10 seconds, GC optimization is imperative.
However, if you have already allocated about 10 GB of memory to Java and cannot reduce it, there is little room left to optimize GC. Before optimizing GC, figure out why you need to allocate so much memory. If an OutOfMemoryError occurs with 1 GB or 2 GB of memory, you should take a heap dump and eliminate the cause.
Attention:
A heap dump is a file used to examine the objects and data in Java memory. It can be created with the jmap command in the JDK. The Java program pauses while the file is being created, so do not create it while the system is serving traffic.
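Besides jmap (for example, `jmap -dump:live,format=b,file=heap.hprof <pid>`), a HotSpot JVM can write a heap dump of itself through the `com.sun.management.HotSpotDiagnosticMXBean` interface. The sketch below assumes a HotSpot-based JDK; the class name, output directory and file naming are illustrative choices. The same warning applies: the JVM pauses while the dump is written.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that writes a heap dump of the current JVM. */
public class HeapDumper {
    /** Writes an .hprof file into dir; liveOnly=true dumps only reachable objects. */
    static Path dump(Path dir, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Recent JDKs require the .hprof suffix, and the target file must not already exist.
        Path file = dir.resolve("heap-" + System.currentTimeMillis() + ".hprof");
        bean.dumpHeap(file.toString(), liveOnly);   // the JVM pauses while this runs
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path file = dump(Files.createTempDirectory("heapdump"), true);
        System.out.println("heap dump written to " + file + " (" + Files.size(file) + " bytes)");
    }
}
```

The resulting file can be opened with the same heap analysis tools you would use for a jmap dump.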
If GC execution time satisfies all of the following conditions, GC optimization is not required:
- Minor GC executes quickly (less than 50 ms)
- Minor GC does not run frequently (about once every 10 seconds)
- Full GC executes quickly (less than 1 second)
- Full GC does not run frequently (about once every 10 minutes)
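The four conditions above can be written as a simple check over jstat-style counters. This is a sketch: the class name is hypothetical, the thresholds are the rule-of-thumb values from the list, the observation window is an assumed input (for example, the JVM uptime over which YGC and FGC were counted), and the sample figures in main are hypothetical.

```java
/** Hypothetical check of GC statistics against the rule-of-thumb thresholds above. */
public class GcTuningCheck {
    static final double MAX_MINOR_GC_MS = 50;          // minor GC should take less than 50 ms
    static final double MIN_MINOR_GC_INTERVAL_S = 10;  // about one minor GC every 10 seconds
    static final double MAX_FULL_GC_MS = 1000;         // full GC should take less than 1 second
    static final double MIN_FULL_GC_INTERVAL_S = 600;  // about one full GC every 10 minutes

    /**
     * Returns true if GC optimization looks necessary.
     *
     * @param windowS observation window in seconds
     * @param ygc     number of minor GCs (jstat YGC)
     * @param ygctS   total minor GC time in seconds (jstat YGCT)
     * @param fgc     number of full GCs (jstat FGC)
     * @param fgctS   total full GC time in seconds (jstat FGCT)
     */
    static boolean tuningNeeded(double windowS, long ygc, double ygctS, long fgc, double fgctS) {
        boolean minorOk = ygc == 0
                || (ygctS * 1000 / ygc < MAX_MINOR_GC_MS && windowS / ygc >= MIN_MINOR_GC_INTERVAL_S);
        boolean fullOk = fgc == 0
                || (fgctS * 1000 / fgc < MAX_FULL_GC_MS && windowS / fgc >= MIN_FULL_GC_INTERVAL_S);
        return !(minorOk && fullOk);
    }

    public static void main(String[] args) {
        // Hypothetical one-hour window: 100 minor GCs (2.0 s total), 2 full GCs (1.0 s total)
        System.out.println(tuningNeeded(3600, 100, 2.0, 2, 1.0));   // false
        // Same window, but 5 full GCs taking 6.946 s in total (about 1.39 s each)
        System.out.println(tuningNeeded(3600, 100, 2.0, 5, 6.946)); // true
    }
}
```

As the next paragraph stresses, the thresholds are starting points; adjust the constants to the service.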
These numbers are not absolute; they depend on the service. Some services may be satisfied with a full GC that takes 0.9 seconds, while others may not. Therefore, set thresholds appropriate to each service when deciding whether to optimize GC.
Setting the memory size
The following table shows the relationship between the memory size, the number of GC executions, and the GC execution time.
Memory size | Number of GC executions | GC execution time
Large | Decreases | Increases
Small | Increases | Decreases
There is no single correct answer for how to set the memory size. If the server has enough resources and a full GC can finish within 1 second, 10 GB is possible. Most servers are not like that, however: with 10 GB of memory, a full GC may take 10-30 seconds. Of course, the execution time also depends on the sizes of the objects.
So how should we set the memory size? In general, I recommend about 500 MB for the old generation. Note that this does not mean setting the WAS memory parameters to -Xms500m and -Xmx500m. Check the state before GC optimization: if about 300 MB of memory is still in use after a full GC, it is best to set the memory to 1 GB (300 MB for default program usage + 500 MB minimum for the old generation + 200 MB of free memory). In other words, you should reserve at least 500 MB for the old generation. So if you have three operating servers, set their memory to 1 GB, 1.5 GB and 2 GB respectively, and check the results.
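The heap-size calculation above is plain addition; the sketch below makes the three terms explicit and prints candidate settings for the three servers. The class and method names are hypothetical, and setting -Xms equal to -Xmx mirrors the cases in Example 2 rather than being a requirement.

```java
/** Hypothetical worked example of the heap-sizing rule of thumb. */
public class HeapSizing {
    /** Heap (MB) = memory still in use after a full GC + old-generation minimum + free headroom. */
    static int suggestedHeapMb(int usedAfterFullGcMb, int oldGenMinimumMb, int freeMb) {
        return usedAfterFullGcMb + oldGenMinimumMb + freeMb;
    }

    public static void main(String[] args) {
        // Figures from the text: 300 MB in use after full GC, 500 MB old-generation minimum, 200 MB free
        int heapMb = suggestedHeapMb(300, 500, 200);
        System.out.println("suggested heap: -Xms" + heapMb + "m -Xmx" + heapMb + "m"); // 1000 MB, about 1 GB

        // Three candidate sizes to compare on three servers: 1 GB, 1.5 GB and 2 GB
        for (int mb : new int[] {1024, 1536, 2048}) {
            System.out.println("candidate: -Xms" + mb + "m -Xmx" + mb + "m");
        }
    }
}
```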
In theory, GC should be fastest with 1 GB, then 1.5 GB, then 2 GB. However, that does not mean a full GC takes 1 second with 1 GB and 2 seconds with 2 GB; the time depends on the server's performance and the sizes of the objects. Therefore, the best approach is to collect as many measurements as possible and monitor them.
Along with the memory size, you should also set the NewRatio parameter. NewRatio sets the ratio between the young and old generations: -XX:NewRatio=1 means the young and old generations are 1:1, so with 1 GB of memory each gets about 500 MB. NewRatio=2 means the young-to-old ratio is 1:2. The larger the value, the larger the old generation and the smaller the young generation.
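With -XX:NewRatio=N, the old generation is N times the size of the young generation, so the young generation gets roughly heap / (N + 1). The sketch below tabulates this for a 1 GB heap. The class name is hypothetical, and it ignores alignment and any interaction with -XX:NewSize, so the sizes the JVM actually reports can differ slightly.

```java
/** Hypothetical calculator for the young/old split implied by -XX:NewRatio. */
public class NewRatioSplit {
    /** Young-generation size in MB when old:young = newRatio:1. */
    static long youngMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    public static void main(String[] args) {
        long heapMb = 1024;
        for (int ratio = 1; ratio <= 4; ratio++) {
            long young = youngMb(heapMb, ratio);
            System.out.printf("NewRatio=%d: young %d MB, old %d MB%n", ratio, young, heapMb - young);
        }
    }
}
```

For NewRatio=1 this gives 512 MB for each generation, matching the roughly 500 MB split described above.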
This may seem unimportant, but NewRatio can significantly affect overall GC performance. If the young generation is small, more objects are moved to the old generation, causing frequent full GCs and longer pause times.
You might simply assume that NewRatio=1 is the best choice, but 2 or 3 is sometimes better, and I have seen many such cases.
What is the fastest way to complete GC optimization? Comparing the results of performance tests. Set different parameters on each server and monitor their status; checking at least one or two days of data is highly recommended. If you optimize GC through performance tests, however, you must make sure the same load is applied every time and that the ratio of requests, such as URLs, is consistent. Even for professional testers, controlling the load accurately is difficult and takes a lot of preparation. Therefore, it is relatively convenient and easy to adjust the parameters on the operating servers and collect the results over a longer period.
Example 1
The following example is GC optimization for service S, a recently deployed service whose full GC took too long.
Here is the output of jstat -gcutil:
S0 S1 E O P YGC YGCT FGC FGCT GCT
12.16 0.00 5.18 63.78 20.32 54 2.047 5 6.946 8.993
The Perm column (P) does not matter for the initial GC optimization; the values from YGC onward are more useful here.
The average execution times of minor GC and full GC are shown in the following table.
Table 3: Average execution times of minor GC and full GC for service S
GC type | Number of GC executions | Total GC execution time (s) | Average
Minor GC | 54 | 2.047 | 37.9 ms
Full GC | 5 | 6.946 | 1.389 s
The two most important figures are:
- Allocated young generation space: 212,992 KB
- Allocated old generation space: 1,884,160 KB
So, excluding the Perm space, the total memory is 2 GB and the ratio of the young generation to the old generation is about 1:9. Data was collected through jstat and -verbose:gc logs, and three servers were configured as follows:
- NewRatio=2
- NewRatio=3
- NewRatio=4
After one day, checking the system's GC logs showed that, fortunately, no full GC had occurred after NewRatio was set.
Why?
The average minor GC times were:
- NewRatio=2: 45 ms
- NewRatio=3: 34 ms
- NewRatio=4: 30 ms
NewRatio=4 turned out to be the best setting: although its young generation is the smallest, its GC time was the shortest. After this parameter was set, the system did not perform any full GC.
To illustrate this, here is the output of jstat -gcutil for service S after some time:
S0 S1 E O P YGC YGCT FGC FGCT GCT
8.61 0.00 30.67 24.62 22.38 2424 30.219 0 0.000 30.219
You might think GC ran less often simply because the server received fewer requests. In fact, although no full GC was executed, minor GC was executed 2,424 times.
Example 2
This is an example from service A. Through our in-house Application Performance Management (APM) system, we found that the JVM paused for a long time (more than 8 seconds), so we performed GC optimization. We looked for the reason full GC took so long and set out to fix it.
As the first step of GC optimization, we added the -verbose:gc parameter and got the following result.
Figure 1: STW time before GC optimization
Figure 1 is a graph generated automatically by HPjmeter. The x-axis represents the JVM's running time, and the y-axis represents the time taken by each GC. The green dots (CMS) represent full GC results, and the blue dots (Parallel Scavenge) represent minor GC results.
I said earlier that CMS GC is the fastest, but the results above show that a GC took up to 15 seconds. What caused this? As I mentioned before, CMS slows down when compaction is performed. In addition, the service's memory was set with -Xms1g and -Xmx4g, and 4 GB was actually allocated.
Therefore, I changed the GC type from CMS to Parallel GC, reduced the memory to 2 GB, and set NewRatio to 3. A few hours later, jstat -gcutil showed the following:
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 30.48 3.31 26.54 37.01 226 11.131 4 11.758 22.890
Full GC time dropped from 15 seconds with 4 GB to an average of about 3 seconds (11.758 s over 4 full GCs). But 3 seconds is still slow, so I designed the following six scenarios:
- Case 1: -XX:+UseParallelGC -Xms1536m -Xmx1536m -XX:NewRatio=2
- Case 2: -XX:+UseParallelGC -Xms1536m -Xmx1536m -XX:NewRatio=3
- Case 3: -XX:+UseParallelGC -Xms1g -Xmx1g -XX:NewRatio=3
- Case 4: -XX:+UseParallelOldGC -Xms1536m -Xmx1536m -XX:NewRatio=2
- Case 5: -XX:+UseParallelOldGC -Xms1536m -Xmx1536m -XX:NewRatio=3
- Case 6: -XX:+UseParallelOldGC -Xms1g -Xmx1g -XX:NewRatio=3
Which one was fastest? The results show that the smaller the memory, the better the result. Case 6, shown in the following figure, had the best GC performance: the maximum response time was only 1.7 seconds, and the average was within 1 second.
Figure 2: GC time chart for Case 6
Based on these results, we applied the Case 6 GC parameters. However, this led to an OutOfMemoryError every night. The exact reason is hard to explain here; in short, a batch program caused a memory leak. The issue has since been resolved.
It is dangerous to apply the same settings to all servers after analyzing GC logs for only a short time. Always analyze the GC logs together with the application.
We reviewed two examples of GC optimization. As I mentioned earlier, the GC parameters from these examples can be applied to other servers only if they have the same CPU, operating system and JDK version and run the same service. Do not apply the parameters I used directly to your own services; they may not work well.
Real tuning
If the test results meet expectations, you do not need to tune the program for performance. If they do not, you should tune it to solve the problem. The following sections explain how, by example.
Stop-the-world time is too long
Stop-the-world time may be too long because the GC parameters are inappropriate or the code is implemented incorrectly. You can locate the problem with analysis tools or heap dump files, for example by checking the types and numbers of objects in heap memory. If you find many unnecessary objects, it is best to improve the code. If object creation shows no particular problem, it is better to simply adjust the GC parameters.
To adjust GC parameters properly, you need a sufficiently long GC log, and you must know what causes long stop-the-world pauses. To learn more about choosing the right GC parameters, read my colleague's blog post: How to Monitor Java Garbage Collection.
Low CPU usage
When the system is blocked, both throughput and CPU usage drop. This may be caused by network problems or concurrency problems. To solve it, analyze thread dumps or use analysis tools. Read this article to learn more about thread dump analysis: How to Analyze Java Thread Dumps.
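Alongside jstack, the same thread states can be inspected from inside the JVM through `java.lang.management.ThreadMXBean`. The sketch below (class name hypothetical) counts threads by state, lists BLOCKED threads with the lock they are waiting for, and checks for deadlocks. It is a lightweight complement to a full thread dump, not a replacement for the analysis tools mentioned here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical in-process summary of thread states, similar to skimming a jstack dump. */
public class ThreadStateSummary {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** Counts live threads by state (RUNNABLE, BLOCKED, WAITING, TIMED_WAITING, ...). */
    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : THREADS.dumpAllThreads(false, false)) {
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("threads by state: " + countByState());

        // BLOCKED threads are waiting to enter a monitor: show the lock and its current owner.
        for (ThreadInfo info : THREADS.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                System.out.printf("%s is blocked on %s held by %s%n",
                        info.getThreadName(), info.getLockName(), info.getLockOwnerName());
            }
        }

        // Returns null when no threads are deadlocked on monitors or ownable synchronizers.
        long[] deadlocked = THREADS.findDeadlockedThreads();
        System.out.println("deadlocked threads: " + (deadlocked == null ? 0 : deadlocked.length));
    }
}
```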
Commercial analysis tools can analyze thread locks accurately, but most of the time CPU profiling in jvisualvm gives you enough information.
High CPU Usage
If throughput is low but CPU usage is high, inefficient code is the likely cause. In this case, use profiling tools to locate the performance bottlenecks in the code. Available tools include jvisualvm, Eclipse TPTP and JProbe.
Tuning method
It is recommended that you use the following methods to tune your program.
First, check whether performance tuning is necessary. Measuring performance is not simple, and you cannot expect a satisfactory result every time. Therefore, if the program already meets its performance requirements, you do not need to invest extra effort in tuning.
The problem usually lies in one place, and all you have to do is remove it. The Pareto principle (the 80/20 rule) also applies to performance tuning. This does not mean that poor performance in a module always comes from a single problem; rather, focus first on the problem with the greatest impact, then deal with the rest. Fix only one problem at a time.
You also need to consider the balloon effect: gaining in one place means losing in another. For example, caching can improve responsiveness, but as the cache grows, full GC takes longer. In general, if you want low memory usage, throughput and responsiveness may suffer. Therefore, you need to know what matters most to your program and what is secondary.
By now you should know how to tune the performance of a Java program. I had to omit some details to introduce the overall process of performance measurement, but I think this is enough to handle most Java web server programs.
2. Operation related
1. How to approach a performance requirement
1. When you receive a performance requirement, you usually need to understand the interface model, the usage scale and capacity of the whole interface, the system architecture and technical solution, and the distribution of database caches and indexes, and then design a reasonable test plan. The design principle is to simulate the production situation as realistically as possible.
2. Run the test plan with a single thread, collect the results, and compare the test metrics.
3. Increase the number of concurrent users linearly and check whether the results also grow linearly.
4. The flat region of the results usually represents the system's maximum capacity; analyze the results at that point, such as TPS and response time.
5. Analyze database TPS and QPS, the Redis hit rate, MQ throughput, Java GC, program execution time and other factors to improve system performance.
6. Use jstack to analyze the resource usage of each module.
7. After optimizing the program, test it with a large volume of data to make sure it runs without errors.
8. Combine the test summary report, database monitoring data and other data into an integrated test report.
9. Pay attention to whether the API layer uses long (persistent) or short connections; for long connections, check "Use KeepAlive" in JMeter.
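Steps 3 and 4 above, increasing concurrency step by step and looking for the flat region, can be automated once the TPS for each level is recorded. The sketch below uses a simple, assumed heuristic: a step counts as flat when its TPS gain falls below 10% of what linear scaling from the first measurement would predict. The class name, the data and the 10% threshold are hypothetical; choose a threshold that suits your own test.

```java
/** Hypothetical helper that finds where throughput stops scaling with concurrency. */
public class CapacityPlateau {
    /**
     * Returns the index of the last level before throughput flattens, or -1 if it keeps scaling.
     * A step is "flat" when its TPS gain is below minGainRatio times the gain predicted by
     * linear scaling from the first (lowest) load level.
     */
    static int kneeIndex(int[] concurrency, double[] tps, double minGainRatio) {
        double tpsPerUser = tps[0] / concurrency[0];
        for (int i = 1; i < tps.length; i++) {
            double expectedGain = (concurrency[i] - concurrency[i - 1]) * tpsPerUser;
            double actualGain = tps[i] - tps[i - 1];
            if (actualGain < minGainRatio * expectedGain) {
                return i - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical measurements: concurrent users vs. TPS from repeated test runs
        int[] users = {10, 20, 40, 80, 160};
        double[] tps = {100, 198, 390, 420, 425};
        int knee = kneeIndex(users, tps, 0.1);
        if (knee < 0) {
            System.out.println("throughput is still scaling; increase the load further");
        } else {
            System.out.println("throughput flattens after " + users[knee] + " users at about "
                    + tps[knee] + " TPS");
        }
    }
}
```

With the sample data, throughput stops scaling after 40 users (about 390 TPS); that load level is where response times and resource usage deserve a closer look.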
2. How to ensure JMeter's own performance
1. Start JMeter from the command line (non-GUI mode) to avoid performance problems caused by the GUI.
2. In command-line mode, record only the aggregate report; do not enable too many other monitoring reports.
3. Turn off unnecessary logging as much as possible, including logs from the test code and JMeter's own log.
4. Make sure the test scripts time things accurately: time the steps that need timing, and do not time preparation steps.
5. For very high concurrency, use distributed load generation: when local CPU and memory usage are high, edit jmeter.properties to add the remote servers (remote_hosts) and start them (see the JMeter user manual for details).
3. How to analyze test results
1. Use the 90% line, 95% line and maximum in the aggregate report to determine the program's typical response time.
2. Ask the developers to check the logs around the time of the maximum response time and analyze how the program executed the task.
3. Use jstack to analyze the program's current resource waits and the state of system resources.
The thread statuses that are worth watching are:
- Deadlock (focus on these)
- Runnable (executing)
- Waiting on condition (waiting for a resource; focus on these)
- Waiting for monitor entry (waiting to acquire a monitor; focus on these)
- Suspended (paused)
- Object.wait() or TIMED_WAITING (waiting on an object)
- Blocked (focus on these)
- Parked (stopped)
Example diagram
4. Analyze GC results, CPU, I/O and so on to determine whether the program makes full use of system resources.
5. Compare and analyze the program's response at each concurrency level.
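The 90% and 95% lines used in step 1 of the analysis are percentiles of the response-time distribution. The sketch below computes them with the nearest-rank method on a hypothetical sample; the class name is also hypothetical. JMeter's aggregate report may calculate percentiles slightly differently, so use it to understand the idea rather than to reproduce JMeter's numbers exactly.

```java
import java.util.Arrays;

/** Hypothetical nearest-rank percentile calculation for response times. */
public class ResponseTimePercentiles {
    /** Smallest sample such that at least p% of samples are at or below it (0 < p <= 100). */
    static long percentile(long[] sortedMs, double p) {
        if (sortedMs.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        int rank = (int) Math.ceil(p / 100.0 * sortedMs.length); // 1-based rank
        return sortedMs[rank - 1];
    }

    public static void main(String[] args) {
        // Hypothetical sample of 10 response times in milliseconds
        long[] ms = {120, 95, 130, 110, 1800, 105, 99, 140, 125, 115};
        long[] sorted = ms.clone();
        Arrays.sort(sorted);
        System.out.printf("90%% line: %d ms, 95%% line: %d ms, max: %d ms%n",
                percentile(sorted, 90), percentile(sorted, 95), sorted[sorted.length - 1]);
    }
}
```

Here a single slow request (1800 ms) leaves the 90% line at 140 ms but pushes the 95% line and the maximum to 1800 ms, which is why step 2 asks developers to check the logs around the maximum.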
JMeter Performance Test-GC related