JVM memory GC scam


This article was published in the Netease Cloud Community with the authorization of its author, Yao Piaohai.




Overview

In day-to-day development, many Java programmers pay little attention to memory usage. If the programmer is lucky, or the system is never tested at scale or used heavily, the problem may never surface, reinforcing the impression that memory is effectively infinite and that the JVM garbage collector will take care of everything. When luck runs out, the symptoms appear: the process throws an OOM exception and stops accepting new requests, responses time out or never arrive within the expected window, and CPU usage swings up and down like a roller coaster. Because memory usage looks normal most of the time, and because performance testing is often insufficient, the application's real memory behavior stays unknown until it fails. In the obvious cases, programmers usually find the problem quickly or at least recognize a pattern.


Problem

Sometimes, though, the JVM fools you. It keeps collecting garbage, yet the heap is still full after every collection. The program has clearly run out of memory, but the JVM does not throw an OutOfMemoryError (OOM) to tell the programmer what is going on internally. It just keeps playing the good guy, dutifully running collection after collection, until the server's resources are exhausted and it can no longer respond to normal user requests. Let's walk through one of these situations and see how it feels to be cheated.


Status quo:

While a test was simulating users repeatedly sending requests to the system, a colleague reported by email that test-case requests had started failing after the system had been running for a while. After logging in to the test server, the first step was to look at the JVM parameter settings:

-server -Xms4g -Xmx4g -XX:MaxPermSize=256m -verbose:gc -XX:+PrintGCDetails -Xloggc:$CATALINA_BASE/logs/gc.log -XX:+PrintGCTimeStamps

Next, the top command was used to see what was happening on the server.


After observing for a while, CPU usage stayed at 100%, so the first assumption was that the program had hit a bug, perhaps a pathological regular expression or an undiscovered infinite loop in some code path. Easy enough to check: run jstack with the process id (jstack <pid>) to dump the thread stacks and inspect them directly. The output appears immediately; because it is long and most entries look similar, only part of it is examined here.
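A typical way to capture and skim such a dump might look like the following; the pid is a placeholder, not the actual process id from this incident:

# Write a full thread dump, including lock information, to a file
jstack -l <pid> > threads.txt

# Count threads by state to spot a pile-up of BLOCKED threads
grep "java.lang.Thread.State" threads.txt | sort | uniq -c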

From the stack dump we can see that all of the threads are in the BLOCKED state, and no business-related code appears anywhere in the stacks. Intuition starts to feel wrong at this point, but at least the two suspected causes have been ruled out. Next, let's look at the application's GC activity. Part of the GC log is shown below.

1403682.561: [GC [PSYoungGen: 1375104K->11376K(1386176K)] 4145665K->2782002K(4182400K), 0.0174410 secs] [Times: user=0.27 sys=0.00, real=0.02 secs]
1407799.743: [GC [PSYoungGen: 1386160K->11632K(1386432K)] 4156786K->2793538K(4182656K), 0.0285330 secs] [Times: user=0.48 sys=0.00, real=0.03 secs]
1409230.024: [GC [PSYoungGen: 1386416K->10688K(1377984K)] 4168322K->2803822K(4174208K), 0.0265000 secs] [Times: user=0.43 sys=0.00, real=0.02 secs]
1409230.051: [Full GC [PSYoungGen: 10688K->7014K(1377984K)] [PSOldGen: 2793134K->2796224K(2796224K)] 2803822K->2803238K(4174208K) [PSPermGen: 48439K->48439K(262144K)], 7.8892780 secs] [Times: user=7.92 sys=0.00, real=7.89 secs]
1410502.582: [Full GC [PSYoungGen: 1366336K->85344K(1377984K)] [PSOldGen: 2796224K->2796224K(2796224K)] 4162560K->2881568K(4174208K) [PSPermGen: 48577K->48577K(262144K)], 8.2720110 secs] [Times: user=8.29 sys=0.00, real=8.27 secs]

Explanations:

The first line:

1403682.561: [GC [PSYoungGen: 1375104K->11376K(1386176K)] 4145665K->2782002K(4182400K), 0.0174410 secs] [Times: user=0.27 sys=0.00, real=0.02 secs]

1403682.561: the point at which the event occurred, expressed as the number of seconds the JVM has been running. It can also be printed in an absolute date/time format.

PSYoungGen: the type of GC that occurred, indicating a young-generation (minor) collection.

1375104K: young generation usage before the collection

11376K: young generation usage after the collection

1386176K: total size of the young generation

4145665K: total heap usage before the collection

2782002K: total heap usage after the collection

4182400K: total heap size

user=0.27 and sys=0.00: the CPU time spent in user mode and in kernel (sys) mode.

real=0.02 secs: the actual wall-clock time the GC took.

Note: here the wall-clock (real) time is less than the sum of the user and sys times. User and sys measure CPU time only, while wall-clock time also includes any waiting or I/O blocking. Because the collection is performed by multiple GC threads and the machine has multiple CPUs, the CPU time adds up across threads and can exceed the elapsed time, which is why the sum of the two is greater than the real value. With a serial collector the two values are almost the same. The differences between the various collectors will be summarized in detail later.

The next two lines follow the same pattern and are not repeated. The fourth line contains the word "Full", indicating that the JVM performed a full GC, which reports two additional regions: PSOldGen, the old generation's usage before and after the collection together with its total size, and PSPermGen, the permanent generation's usage before and after the collection together with its total size. From the third and fourth lines we can see that the old generation had reached saturation, which is what triggered the full GC. Unfortunately, the fifth line shows another full GC right afterwards, and the pattern kept repeating, yet the system neither threw an OOM exception nor exited the process. As a result the service process on this machine stayed alive but was essentially unable to work normally.

Every GC, whether a young GC or a full GC, causes the JVM to stop the world (STW): the application's work is suspended while the collection runs, and user requests cannot be served during that time. A large number of full GCs in particular slows the system's response and carries a real risk of OOM. Even though young GCs are collected by multiple threads and each pause is short, frequent young GCs still have a non-negligible impact on the application. A full GC collects every region, including the young generation, the old generation, and the permanent generation, so it is far more expensive: the application is paused for much longer and cannot respond to user requests in time, which is why this situation deserves special attention. In general, besides an explicit call such as System.gc(), the JVM performs a full GC in the following situations (a minimal sketch that reproduces the repeated-full-GC pattern follows the list):

1. The memory of the old generation is insufficient.

2. The memory of the permanent generation is insufficient.

3. The average amount of data promoted from the young generation to the old generation in previous collections is calculated to be larger than the remaining space in the old generation.
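To make the symptom concrete, here is a minimal sketch: a hypothetical program (the class name NearlyFullHeap and the 64 KB chunk size are arbitrary, and this is not the original test system) that keeps almost every allocation strongly reachable. Run it with a small heap, for example -Xms64m -Xmx64m -verbose:gc -XX:+PrintGCDetails, and the GC log shows the same pattern: the heap stays close to full, collections become long and frequent, and depending on the collector and the overhead-limit settings the JVM may keep collecting for a long time before it ever throws an OOM.

import java.util.ArrayList;
import java.util.List;

// Minimal illustration: keep almost all allocations reachable so the heap
// stays close to full and each collection reclaims almost nothing.
public class NearlyFullHeap {

    // Strongly referenced list: nothing stored here can be reclaimed.
    private static final List<byte[]> retained = new ArrayList<>();

    public static void main(String[] args) throws InterruptedException {
        // Fill the heap until an allocation fails, then release a small
        // amount so the program can keep running near the heap limit.
        try {
            while (true) {
                retained.add(new byte[64 * 1024]); // 64 KB chunks
            }
        } catch (OutOfMemoryError almostFull) {
            for (int i = 0; i < 64 && !retained.isEmpty(); i++) {
                retained.remove(retained.size() - 1);
            }
        }
        // Keep producing short-lived garbage. With so little free space
        // left, collections become frequent and expensive, and GC time
        // starts to dominate the CPU -- the symptom described above.
        while (true) {
            byte[] garbage = new byte[64 * 1024];
            if (garbage.length < 0) {
                break; // never taken; keeps the allocation from being optimized away
            }
            Thread.sleep(1);
        }
    }
}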

Solution

Once the cause is known, you can use jmap -heap to look at the JVM's heap usage directly, or use jmap -dump to dump the heap to a file and open it with MAT for analysis. In a case like this the dumped file will be large, sometimes more than 10 GB, and the analysis is generally not performed on the production machine itself: copy the file to another machine that is not serving online traffic and has enough memory, then open it in MAT.
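For reference, the commands for this step might look like the following; the pid and the dump file name are placeholders:

# Inspect heap configuration and current usage of the target process
jmap -heap <pid>

# Dump the live objects in the heap to a binary file for offline analysis in MAT
jmap -dump:live,format=b,file=heap.hprof <pid>

Opening the dump in MAT produces a result like the following: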

The fourth row does not show any business-related information directly, while the fifth and sixth rows still do. Let's look at the concrete instances behind the fourth row first.

When the dump is opened, MAT's overview page lists the suspect object instances. Jump straight to the list and expand the details to see what is really there: in this case more than 300,000 objects. From there, work with the owners of the relevant business code so that instances that are no longer needed are released as soon as they have been used. The remaining rows can be resolved in the same way.


Summary

Looking back at the GC behavior above, the JVM kept running GC for us and consumed all of the CPU doing it, yet it never threw an exception to tell us that memory had run out; it feels as if we had been drawn into a huge Ponzi scheme. If we simply gave the JVM more memory, the pit would stay hidden a little longer; if the application were restarted on a schedule, the pit might never be discovered at all. What product developers generally want is for the application to surface this problem before users notice it. The JVM cannot make that judgment for us and will almost never throw an OOM exception in this state, but the GCTimeLimit and GCHeapFreeLimit parameters can be adjusted to redefine when an OutOfMemoryError is thrown. GCTimeLimit defaults to 98, meaning that if more than 98% of the time is being spent in GC, an OutOfMemoryError is thrown; GCHeapFreeLimit describes how much of the heap must still be free after a collection, and defaults to 2 (percent). Of course, the best approach is for development engineers to use the relevant container classes correctly from the start and to test the application thoroughly before it goes live. This article has used only one specific GC case to show how GC can deceive us; how to detect such problems promptly, and the finer details of GC and JVM memory, will be discussed with examples when the opportunity arises.
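As a rough sketch of how those thresholds are set (reusing the heap sizes from the earlier startup parameters; the values shown are simply the defaults, and -XX:+UseGCOverheadLimit, which is on by default with the parallel collector, is the switch that enables the check), the startup command could include:

java -server -Xms4g -Xmx4g \
     -XX:+UseGCOverheadLimit \
     -XX:GCTimeLimit=98 \
     -XX:GCHeapFreeLimit=2 \
     <other options> <main class>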


Note: the behavior described above is based on HotSpot VM 1.7.0_65.

References:

JVM GC ergonomics: http://docs.oracle.com/javase/8/docs/technotes/guides/vm/gc-ergonomics.html

The HotSpot JVM is a Ponzi scheme: http://it.deepinmind.com/gc/2014/04/01/hotspot-jvm-ponzi-scheme.html

Java memory leak analysis: http://doc.hz.netease.com/pages/viewpage.action?pageId=36468038



