Troubleshooting a JVM process memory leak


Once a problem is known to be caused by a memory leak, fixing it is straightforward; the hard part is identifying the leak in the first place. The platform in question handles international roaming traffic, and its main business process, AMMS, runs on the JVM. The trouble started with the following symptoms:

1. CPU usage of the AMMS process became abnormal: with no change in business volume, it rose from 1% or less to about 20% just before the business anomaly.

2. The process neither exited nor hung, but the business success rate dropped significantly and the socket connections to other modules became abnormal.

3. Restarting the AMMS process restored the business, but after a while the same problems reappeared.


Symptom 3 looks very much like a memory leak, so we started investigating in that direction. (27833 is the AMMS process ID.)

1. [/opt/roamware] jstat -gcutil 27833 1000
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 0.00 97.14 100.00 98.59 34785 134.042 9473 6525.139 6659.182
0.00 0.00 98.23 100.00 98.59 34785 134.042 9473 6525.139 6659.182
0.00 0.00 99.71 100.00 98.59 34785 134.042 9473 6525.139 6659.182
0.00 0.00 36.63 100.00 98.59 34785 134.042 9474 6525.908 6659.950
0.00 0.00 39.33 100.00 98.59 34785 134.042 9474 6525.908 6659.950
0.00 0.00 40.92 100.00 98.59 34785 134.042 9474 6525.908 6659.950
0.00 0.00 42.45 100.00 98.59 34785 134.042 9474 6525.908 6659.950
0.00 0.00 44.54 100.00 98.59 34785 134.042 9474 6525.908 6659.950
0.00 0.00 46.66 100.00 98.59 34785 134.042 9474 6525.908 6659.950
0.00 0.00 48.38 100.00 98.59 34785 134.042 9474 6525.908 6659.950
0.00 0.00 50.64 100.00 98.59 34785 134.042 9474 6525.908 6659.950
0.00 0.00 52.19 100.00 98.59 34785 134.042 9474 6525.908 6659.950


Focus on FGC and FGCT: FGC is the number of full GCs since the program started, and FGCT is the total time spent in full GC since start (in seconds).

As you can see, both values are very large. Note also that the old generation (O) stays at 100% even right after a full GC, and the young GC count (YGC) no longer increases while FGC keeps climbing. Why would the JVM keep running full GCs that reclaim almost nothing? A memory leak is the most likely explanation. The other possibility is that the heap configured for the JVM is simply too small for growing business concurrency, but that was ruled out earlier: the business volume had not changed.
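A quick check of those counters, computed from the sample above:

FGCT / FGC = 6525.139 s / 9473 ≈ 0.69 s per full GC on average
YGCT / YGC = 134.042 s / 34785 ≈ 0.004 s per young GC

In other words, almost all of the roughly 6659 seconds of total GC time has gone into full collections, and they keep coming without freeing the old generation.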


2. To pin the problem down further, enable the AMMS GC log and the heap-dump-on-OOM switch by modifying the AMMS startup options.

Before the modification:

nohup java -Dapp="$APPNAME" -server -Xmx1024m -Xms1024m -XX:+UseParallelOldGC \
-XX:+UseParallelGC \
-XX:ParallelGCThreads=8 \
com.roamware.sds2.fsmlib.FSMApp &


After the modification:

nohup java -Dapp="$APPNAME" -server -Xmx1024m -Xms1024m -XX:+UseParallelOldGC \
-XX:+UseParallelGC \
-XX:ParallelGCThreads=8 \
-verbose:gc \
-XX:+PrintGCDetails \
-XX:+PrintGCDateStamps -Xloggc:/logs/tmp/gc.log \
-XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/logs/tmp \
com.roamware.sds2.fsmlib.FSMApp &


After a few days of operation, /logs/tmp/gc.log can be analyzed. There are several dimensions along which to read the GC log:

1) Observe the trend of heap occupancy after each young GC;

2015-03-04T15:41:10.458+0800: 229192.000: [GC [PSYoungGen: 347688K->2064K(347904K)] 592499K->247183K(1048320K), 0.0033389 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2015-03-04T15:41:42.476+0800: 229224.018: [GC [PSYoungGen: 347664K->2096K(347904K)] 592783K->247523K(1048320K), 0.0032800 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]

The figures to watch are the total heap occupancy after each young GC: 247183K, then 247523K, creeping upward from one collection to the next.

2) Observe the trend of heap occupancy after each full GC;

2014-09-26T19:30:35.237+0800: 81312.973: [Full GC [PSYoungGen: 2160K->0K(347584K)] [ParOldGen: 700192K->102156K(700416K)] 702352K->102156K(1048000K) [PSPermGen: 33220K->32817K(65536K)], 0.3760642 secs] [Times: user=1.52 sys=0.03, real=0.38 secs]
2014-09-27T17:34:34.580+0800: 160751.869: [Full GC [PSYoungGen: 2208K->0K(347328K)] [ParOldGen: 700222K->128688K(700416K)] 702430K->128688K(1047744K) [PSPermGen: 32938K->32844K(65536K)], 0.3050225 secs] [Times: user=1.65 sys=0.00, real=0.31 secs]
2014-09-28T15:35:46.788+0800: 240023.635: [Full GC [PSYoungGen: 2384K->0K(347584K)] [ParOldGen: 700261K->134673K(700416K)] 702645K->

3) Observe how often full GCs occur (i.e. how many young GCs happen between two consecutive full GCs), which also helps when looking at how long each full GC takes.


If these values keep climbing over a longer observation period, as they do here (the old-generation occupancy after full GC grows from 102156K to 128688K to 134673K across three days), a memory leak becomes much more likely.
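As an aside, here is a minimal sketch (not part of the original toolchain; the file path and log format are assumed to match the ParallelOld output shown above) of how the old-generation occupancy after each full GC could be extracted from such a log:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Prints the old-generation occupancy after every full GC found in a ParallelOld GC log.
public class FullGcTrend {
    // Matches the "[ParOldGen: <before>K-><after>K(<capacity>K)]" section of a full GC line.
    private static final Pattern OLD_GEN =
            Pattern.compile("\\[ParOldGen:\\s*(\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");

    public static void main(String[] args) throws IOException {
        String logFile = args.length > 0 ? args[0] : "/logs/tmp/gc.log";
        for (String line : Files.readAllLines(Paths.get(logFile), StandardCharsets.UTF_8)) {
            if (!line.contains("Full GC")) {
                continue;                       // skip young GC lines
            }
            Matcher m = OLD_GEN.matcher(line);
            if (m.find()) {
                long after = Long.parseLong(m.group(2));      // old gen left after the collection
                long capacity = Long.parseLong(m.group(3));   // old gen capacity
                System.out.printf("old gen after full GC: %d K of %d K%n", after, capacity);
            }
        }
    }
}

A steadily rising "after" value across days is exactly the accumulation the excerpt above shows.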



3. Once a memory leak is suspected, you can analyze a heap dump with a tool. The option to write a dump automatically when memory is exhausted was already added to the AMMS startup options above; to generate a dump manually, the command is as follows:

[/opt/roamware] jmap -dump:format=b,file=/opt/roamware/amms_dump.bin 27833
Dumping heap to /opt/roamware/amms_dump.bin ...
Heap dump file created

where 27833 is the AMMS PID.


The recommended analysis tool is Eclipse Memory Analyzer (MAT); it is available both as an Eclipse plug-in and as a standalone version that can be installed directly on Windows.

The analysis shows which data structures have been resident in the heap for a long time.

(Screenshot: Heap2.jpg, http://s3.51cto.com/wyfs02/M00/5A/1E/wKiom1T2wKqD4uDYAAcqTW8y_Rs657.jpg)


As you can see here, data held in a ConcurrentHashMap cache (the FSM state machine) stays resident in the JVM heap for a long time, growing to as much as 763 MB. The next step was to search the business logs for evidence of state machines not being released properly.

Sure enough, the logs contained a large number of "Saved to persistent store" entries where the normal outcome would be a release. The final finding was that a field in the messages sent by an external network element did not conform to the platform's logic, so the corresponding state machines were never released.
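To illustrate the failure mode (the class and method names below are hypothetical, not the actual AMMS code): an unbounded ConcurrentHashMap used as a state-machine cache leaks whenever a session leaves through any path that skips the remove.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a state-machine cache; not the real AMMS implementation.
public class FsmCache {
    private final Map<String, Object> cache = new ConcurrentHashMap<String, Object>();

    public void onSessionStart(String sessionId, Object stateMachine) {
        cache.put(sessionId, stateMachine);              // state machine becomes heap-resident
    }

    public void onSessionEnd(String sessionId, boolean fieldConformsToPlatformLogic) {
        Object fsm = cache.get(sessionId);
        if (fieldConformsToPlatformLogic) {
            cache.remove(sessionId);                     // normal path: entry released
        } else {
            saveToPersistentStore(fsm);                  // abnormal path: "Saved to persistent store",
                                                         // but the map entry is never removed -> leak
        }
    }

    private void saveToPersistentStore(Object stateMachine) {
        // persist the state machine somewhere; the in-memory reference above still pins it
    }
}

Every malformed message then leaves one more state machine pinned in the old generation, which matches both the 763 MB retained by the cache and the steady growth seen in the GC log.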


Summary:

For a production system, memory leak problems can be divided into internally and externally triggered ones. This one was externally triggered, and that kind is much harder to catch during release testing.



This article is from the "Memory Fragments" blog; please keep the source: http://weikle.blog.51cto.com/3324327/1617277

