Analysis of Memory leakage through GC output

Source: Internet
Author: User
Tags jprofiler

After SIP5.0, the Service's request volume increased explosively, and thus the problems that were not previously exposed were also exposed. Generally, a new version has been released for about a month in the past. Therefore, if a small memory leak occurs, it will not be seen any problems after the release is resumed within one month.
Therefore, in addition to optimizing the Memcache client and the SIP framework logic, the stress tests for Memory leakage started to be done. After locating and solving this problem, I feel that it is necessary to take a few steps for a large number of applications.
1. Multi-day testing under heavy pressure in the GC output environment. (You can add-verbose: gc-XX: + PrintGCDetails-XX: + HeapDumpOnOutOfMemoryError in JAVA_OPTS)
2. Check GC output logs to determine whether memory leakage exists. (Detailed instance descriptions are provided later)
3. in case of Memory leakage, jprofiler and other tools are used to troubleshoot memory leaks (the reason for this is that jprofiler and other tools have an impact on stress testing, making it difficult to cope with high pressure, it also makes the problem less easily exposed)
4. Solve the Problem and repeat Step 2.
Here, we will make a practical analysis and comparison of the GC logs for the stress testing of SIP in jdk1.5 and jdk1.6. We will roughly describe how to determine whether the application has memory leakage issues based on the output. (Here, the memory leakage problem is that there is a serious memory leakage in the mongoblockingqueue poll method in the jdk concurrent package written previously. The higher the call frequency, the more severe the memory leakage)
Both stress tests are about two days. The test plan is as follows:
The number of concurrent requests starts to 50. After each concurrent request is completed, the rest time is 0.1 seconds. After 10 minutes, the number of concurrent requests increases by 50. According to this rule, the number of concurrent requests increases to 500.
The old version of SIP is a stress test completed in the JDK1.5 environment,
The JDK version of the new version of SIP is 1.6,
The same as before, the press is 10.2.226.40, DELL1950, 8 CPU, 8 GB memory.
The press simulates the constant call request to a signed API.
Let's take a look at the specific content of the two logs (the content is mostly truncated for analysis)
First, let's talk about the log output structure: (1.6 and 1.5 are slightly different, but 1.6 is more detailed in time statistics)
[GC [<collector >:< starting occupancy1>-> <ending occupancy1>, <pause time1> secs] <starting occupancy3>-> <ending occupancy3>, <pause time3> secs]
<Collector> GC collector Name
<Starting occupancy1> memory occupied by the new generation before GC
<Ending occupancy1> memory occupied by the new generation after GC
<Pause time1> specifies the time when the jvm Suspends processing for the new generation of local data collection.
<Starting occupancy3> memory occupied by JVM Heap before GC
<Ending occupancy3> memory occupied by JVM Heap after GC
<Pause time3> total time for jvm to suspend processing during GC
Jdk1.5 log:
GC output at startup:
[GC [DefNew: 209792 K-> 4417 K (235968 K), 0.0201630 secs] 246722 K-> 41347 K (498112 K), 0.0204050 secs]
[GC [DefNew: 214209 K-> 4381 K (235968 K), 0.0139200 secs] 251139 K-> 41312 K (498112 K), 0.0141190 secs]
Output:
The first 209792 K of New Generation recovery, 4417 K after recovery, 205375 K after recovery, 41347 K after total recovery of Heap, 205375 K after recovery of 246722K, total recovery of K. This indicates that 100% of the records are withdrawn, and no new generation objects are promoted to the M era or permanent zone (the name is not necessarily accurate, but only indicates the meaning ).
Output of the second sentence:
According to the analysis, only 1 k of content was promoted to the Middle Ages.
After running for a period of time:
[GC [DefNew: 210686 K-> 979 K (235968 K), 0.0257140 secs] 278070 K-> 68379 K (498244 K), 0.0261820 secs]
[GC [DefNew: 210771 K-> 1129 K (235968 K), 0.0275160 secs] 278171 K-> 68544 K (498244 K), 0.0280050 secs]
Output of the first sentence:
The first 210686 K of New Generation recovery, 979 K after recovery, 209707 K after recovery, 68379 K after total recovery of Heap, 209691 K after recovery of K, total recovery of K. This indicates that 16 K is not recycled.
Output of the second sentence:
The first 210771 K of New Generation recovery, 1129 K after recovery, 209642 K after recovery, 68544 K after total recovery of Heap, 209627 K after recovery of 278171K, total recovery of K. This indicates that 15 K is not recycled.
Comparing the memory usage and Heap usage of the new generation at startup and now, we found that the usage of Heap increased significantly, while that of the new generation did not, while the total usage of Heap increased by 27 MB, this indicates that there may be Memory leakage. Although each leak contains a small number of bytes, the frequency is very high, and most leaked objects are upgraded to the generational or persistent generation.
After another period of time:
[GC [DefNew: 211554 K-> 1913 K (235968 K), 0.0461130 secs] 350102 K-> 140481 K (648160 K), 0.0469790 secs]
[GC [DefNew: 211707 K-> 2327 K (235968 K), 0.0546170 secs] 350275 K-> 140921 K (648160 K), 0.0555070 secs]
Output of the first sentence:
The first 211554 K of New Generation recovery, 1913 K of recovery, 209641 K of recovery, 140481 K of total Heap recovery, 209621 K of total recovery, K of total recovery. This indicates that 20 K is not recycled.
From the analysis, we can see that each leaked memory is only 10 KB, but the memory leakage is very obvious under a long period of stress testing, at this time, Heap has increased to 140 M, which is 100 M higher than that at startup. GC takes longer and longer time.
Subsequent phenomena:
After observing the log, we will find that the Full GC frequency is getting higher and higher, and the collection time is getting longer and longer. (Full GC will be executed on a regular basis, and will be executed if partial collection cannot meet the allocation requirements ).
[Full GC [Tenured: 786431 K-> 786431 K (786432 K), 3.4882390 secs] 1022399 K-> 1022399 K (1022400 K), [Perm: 36711 K-> 36711 K (98304 K)], 3.4887920 secs]
Java. lang. OutOfMemoryError: Java heap space
Dumping heap to java_pid771_hprof...
This statement indicates that the memory is actually exhausted.
Jdk1.6 log:
 
GC output at startup:
[GC [PSYoungGen: 221697 K-> 31960 K (229376 K)] 225788 K-> 36051 K (491520 K), 0.0521830 secs] [Times: user = 0.26 sys = 0.05, real = 0.05 secs]
[GC [PSYoungGen: 228568 K-> 32752 K (229376 K)] 232659 K-> 37036 K (491520 K), 0.0408620 secs] [Times: user = 0.21 sys = 0.02, real = 0.04 secs]
Output of the first sentence:
The first 221697 K of New Generation recovery, 31960 K after recovery, 189737 K after recovery, and 36051 K after 225788K of total Heap recovery, totaling 189737 K. 100% is recycled.
Output after running for a period of time:
[GC [PSYoungGen: 258944 K-> 2536 K (259328 K)] 853863 K-> 598135 K (997888 K), 0.0471620 secs] [Times: user = 0.15 sys = 0.00, real = 0.05 secs]
[GC [PSYoungGen: 259048 K-> 2624 K (259328 K)] 854647 K-> 598907 K (997888 K), 0.0462980 secs] [Times: user = 0.16 sys = 0.02, real = 0.04 secs]
Output of the first sentence:
The first 258944 K of New Generation recovery, 2536 K of recovery, 256408 K of recovery, 598135 K of total Heap recovery before 853863 K recovery, and 255728 K of total recovery. K is not recycled, but this does not mean Memory leakage. At the same time, we can see that the GC recovery time has not increased.
Output after running for a period of time:
[GC [PSYoungGen: 258904 K-> 2488 K (259264 K)] 969663 K-> 713923 K (1045696 K), 0.0485140 secs] [Times: user = 0.16 sys = 0.01, real = 0.04 secs]
[GC [PSYoungGen: 258872 K-> 2448 K (259328 K)] 970307 K-> 714563 K (1045760 K), 0.0473770 secs] [Times: user = 0.16 sys = 0.01, real = 0.05 secs]
Output of the first sentence:
The first 258904 K of the New Generation recovery, 2488 K after the recovery, 256416 K after the recovery, 713923 K after the total recovery of Heap, 255740 K after the recovery of K, the total recovery of K. 676K is not recycled, and the total Heap is also increased.
At this time, it looks like the situation is the same as that of 1.5. However, we can see that the Full GC execution is still performed once every-times, so we can continue to observe.
Output after running for more than one day:
[GC [PSYoungGen: 257016 K-> 3304 K (257984 K)] 1019358 K-> 766310 K (1044416 K), 0.0567120 secs] [Times: user = 0.18 sys = 0.01, real = 0.06 secs]
[GC [PSYoungGen: 257128 K-> 2920 K (258112 K)] 1020134 K-> 766622 K (1044544 K), 0.0549570 secs] [Times: user = 0.19 sys = 0.00, real = 0.05 secs]
We can find that the growth of Heap slows down.
Output after two days of running:
[GC [PSYoungGen: 256936 K-> 3584 K (257792 K)] 859561 K-> 606969 K (1044224 K), 0.0565910 secs] [Times: user = 0.18 sys = 0.01, real = 0.06 secs]
[GC [PSYoungGen: 256960 K-> 3368 K (257728 K)] 860345 K-> 607445 K (1044160 K), 0.0553780 secs] [Times: user = 0.18 sys = 0.01, real = 0.06 secs]
The Heap is reduced, and the memory leakage problem can be preliminarily ruled out. (In fact, in the jdk1.6 environment, we use jprofiler to observe the memory leakage point of concurrent. We found that the memory will continue to grow, but it will still be recycled after a while, therefore, we can partially explain the previous situation)
Summary:
Observe GC output in two dimensions. One is vertical comparison, that is, one recovery observation of Memory changes. One is horizontal comparison. For the comparison of memory allocation usage for a long period of time, this part requires a long period of observation and cannot be compared by just a few samples in a short period of time, because for sampling, the difference between Full GC and before and after GC, the difference in running duration, and the difference in instantaneous resource usage will affect judgment. At the same time, it is necessary to combine the Full GC occurrence time period, the time consumed by each GC collection as the auxiliary judgment standard.
By the way, you also need to pay attention to the settings of Heap YoungGen, OldGen, and PermGen. The larger the value, the longer the collection time, however, the frequency of executing Full GC may be relatively low, so we need to weigh it. These carefully understood GC basic design ideas will be more helpful, but the general use of the default is also good. In addition, you can configure some special GC, parallel, synchronization, and so on to make full use of multiple CPU resources.
The GC optimization can be done through many graphical tools now, or similar to the original analysis method I used, the advantage is that analysis can be performed at any time, anywhere, as long as you know the principle without using external tools. The original is always the best ^_^.

Author: ERDP Technical Architecture"

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.