1.1 Purpose
So that the next time the process is found hanging ("suspended animation"), it can be analyzed properly and a snapshot of the scene can be preserved right away.
1.2 Background
Recently we noticed that a Tomcat application on one of our servers was occasionally unreachable. After a period of observation, the symptoms at the time can be described briefly as follows: client requests got no response, yet the Tomcat process on the server was still alive; the business log showed no recent access entries, and even catalina.log under Tomcat contained no access records. From this we basically concluded that Tomcat was no longer able to serve requests.
2 Analysis steps
Based on the hang symptoms described above, my first thought was to check whether there was a network problem, so I started the analysis by following the data flow of a request. Because our business runs on an Nginx + Tomcat cluster, the path of a request can be described simply as: client -> Nginx -> Tomcat.
2.1 Checking the network at the Nginx layer
I changed the Nginx configuration so that requests were routed only to the Tomcat instance having the problem, and then checked access.log to see whether the network requests arrived. All of the current requests showed up in the log, so a network problem in front of Nginx could be ruled out.
2.2 Checking the network at the Tomcat layer
I then checked the Tomcat access log configured for the business application (xxxx.log) for access records and found none at all. Since in our deployment Nginx runs on the same machine as the Tomcat application, the network between them can also be ruled out. At this point we could basically conclude that the network was not the problem and that Tomcat itself had hung. The Tomcat log also contained OutOfMemoryError exceptions, so it is fairly certain that Tomcat hung because of an OOM.
3 Analyzing JVM Memory Overflow
3.1 Why a memory leak occurs
One of the most convenient things we learn about Java is that we do not have to manage memory allocation and deallocation ourselves; the JVM handles all of it. When a Java object is no longer referenced and heap memory runs low, the JVM performs a GC and reclaims the heap space those objects occupy. If an object is still referenced, however, the JVM cannot collect it, and when we create a new object the JVM may no longer be able to find enough heap memory to allocate for it, which causes an OOM. In general, an OOM happens because we keep objects in a container that has no size limit and no removal mechanism, which easily fills the heap.
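To make this concrete, here is a deliberately broken sketch of that pattern (the class and field names are made up for illustration): a static map that only ever grows, so its entries stay reachable and the GC can never reclaim them.

import java.util.HashMap;
import java.util.Map;

public class LeakExample {
    // Static, so every entry stays reachable for the whole life of the JVM.
    private static final Map<Integer, byte[]> CACHE = new HashMap<Integer, byte[]>();

    public static void main(String[] args) {
        int i = 0;
        while (true) {
            // Entries are added on every iteration but never removed, and the
            // map has no size limit, so the heap eventually fills up and the
            // JVM throws java.lang.OutOfMemoryError: Java heap space.
            CACHE.put(i++, new byte[1024 * 1024]); // roughly 1 MB per entry
        }
    }
}

Running this with a small heap (for example -Xmx64m) produces the OOM within seconds.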
3.2 Locating the problem quickly
When our application server is using too much memory, how do we locate the problem quickly? To do that, we first need a snapshot of the server JVM's memory at that point in time. The JDK provides several commands for this, such as jstack, jstat, jmap and jps. We should use them to preserve the scene as quickly as possible after the problem occurs.
3.2.1 Jstack
jstack lets you observe all the threads currently running in the JVM and the state each thread is in.
sudo jstack -F <pid>
The output reads as follows:
From the output above we can see that there is no deadlock in the Tomcat process and that every thread is in a waiting state. At this point we can telnet to Tomcat's port to see whether the Tomcat process responds to anything. Tomcat gave no response, which confirms that the application was hung and no longer handling requests.
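If you would rather script that liveness check than run telnet by hand, the same probe can be written in a few lines of Java. This is only a sketch: the host and port (127.0.0.1:8080) and the timeouts are assumptions and must be adjusted to the connector configured in server.xml.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

public class TomcatProbe {
    public static void main(String[] args) throws IOException {
        Socket socket = new Socket();
        // If Tomcat is hung, the connect may still succeed but no HTTP
        // response will come back before the read timeout fires.
        socket.connect(new InetSocketAddress("127.0.0.1", 8080), 3000); // 3 s connect timeout
        socket.setSoTimeout(5000);                                      // 5 s read timeout
        OutputStream out = socket.getOutputStream();
        out.write("GET / HTTP/1.0\r\nHost: localhost\r\n\r\n".getBytes("US-ASCII"));
        out.flush();
        BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
        // Either prints the status line, or times out / prints null when Tomcat is unresponsive.
        System.out.println("First response line: " + in.readLine());
        socket.close();
    }
}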
3.2.2 Jstat
This is one of the more important and useful commands in the JDK; it can be used to observe class loader, compiler and GC related information.
The specific parameters are as follows:
-class: statistics on class loader behavior
-compiler: statistics on compiler behavior
-gc: statistics on the heap during GC
-gccapacity: statistics on the heap capacity of the different generations (new, old, permanent)
-gccause: statistics on GC (same as -gcutil), plus the event that caused the GC
-gcnew: statistics on the new (young) generation during GC
-gcnewcapacity: statistics on new generation heap capacity
-gcold: statistics on the old generation during GC
-gcoldcapacity: statistics on old generation heap capacity
-gcpermcapacity: statistics on permanent generation heap capacity
-gcutil: statistics on heap utilization during GC
-printcompilation: I don't know what this one does; I have never used it.
Some commonly used invocations are:
sudo jstat -class 2083 1000 10 (sample once every second, 10 times in total)
To view the heap situation at the time:
sudo jstat -gcutil 20683 2000
Note: this screenshot was not taken at the time of the error.
The data captured when the problem occurred showed that no full GC had run at all; since there was no full GC in the log, it cannot be concluded that an overly long JVM GC pause was what froze the application.
3.2.3 Getting a memory snapshot
The JDK's jmap can capture a snapshot of the heap memory at a point in time.
Command: jmap -dump:format=b,file=heap.bin <pid>
file: path and file name to save the dump to
pid: the process id (on Windows, found via Task Manager; on Linux, via ps aux)
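As a side note, the same kind of dump can also be triggered from inside the application through the HotSpot diagnostic MXBean. This is a HotSpot-specific API in com.sun.management, not part of the Java SE standard, so treat the snippet below as a sketch that applies only to HotSpot JVMs.

import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean diagnostic = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // The second argument plays the same role as the "live" option of
        // jmap -dump: when true, only reachable objects are written.
        diagnostic.dumpHeap("heap.bin", true);
    }
}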
Dump files can be analyzed with Eclipse Memory Analyzer (MAT), http://www.eclipse.org/mat/ , which shows the number of objects in the dump, their memory consumption, thread states and so on.
From the figure above we can see that no object shows signs of a memory overflow.
We can also clearly see that HashMap memory usage in this project is relatively high; since our system returns its data as map structures, the high memory usage there is normal.
3.2.4 Observing the memory usage of the running JVM
To observe the memory usage of the running JVM we can also use the jmap command.
The parameters are as follows:
-heap: print the JVM heap configuration and usage
-histo: print a histogram of the JVM heap; the output includes class names, the number of instances and their total size
-histo:live: same as above, but counts only live objects
-permstat: print statistics for the permanent generation
Command usage:
jmap -heap 2083
This lets you observe the memory usage of the new generation (Eden space, from space, to space), the tenured generation and the perm generation.
Output content:
This is the JVM configuration of the Tomcat application before the error; the settings at the time can be clearly seen:
MaxHeapSize (heap size): 3500M
MaxNewSize (new generation size): 512M
PermSize (permanent generation size): 192M
NewRatio sets the ratio of the young generation (Eden plus the two survivor spaces) to the old generation (the permanent generation excluded). Set to 2, the ratio of young to old is 1:2, so the young generation takes up 1/3 of the whole heap.
SurvivorRatio sets the ratio of Eden to a survivor space within the young generation. Set to 8, the ratio of the two survivor spaces to Eden is 2:8, so a single survivor space takes up 1/10 of the entire young generation.
The new generation contains a space called Eden, which mainly holds newly created objects, plus two survivor spaces (from and to), which hold the objects that survive each garbage collection. The old generation holds memory objects with a long lifetime in the application. The permanent generation mainly holds the JVM's own reflective data, such as class objects and method objects.
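If you want to check the same generation breakdown from inside the running application rather than with jmap, the standard java.lang.management API exposes one memory pool per region. A minimal sketch follows; pool names vary by collector (for example "PS Eden Space" versus "Eden Space"), so it simply prints every pool it finds.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class HeapReport {
    public static void main(String[] args) {
        // Typically one pool each for Eden, Survivor, Tenured/Old and (on older JVMs) Perm.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            long mb = 1024 * 1024;
            long max = usage.getMax(); // -1 means the pool has no defined maximum
            System.out.println(pool.getName()
                    + ": used=" + (usage.getUsed() / mb) + "M"
                    + ", committed=" + (usage.getCommitted() / mb) + "M"
                    + ", max=" + (max < 0 ? "undefined" : (max / mb) + "M"));
        }
    }
}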
As can be seen from the figure above, the JVM's new generation is too small: MaxNewSize caps it at 512M out of a 3500M heap. The new generation is completely occupied, and the objects in it are still live, so the GC cannot reclaim them and no further objects can be placed there; meanwhile the Tomcat application keeps producing new objects. This leads to the OOM, and that is what caused Tomcat to hang.
4 Other causes of Tomcat suspended animation
The following causes of Tomcat hangs have been reported online:
1. A bug in the application itself results in a deadlock.
2. The load is too high and exceeds what the service can handle.
3. JVM GC takes too long, pausing the application.
Since no full GC showed up in this case, I cannot be sure whether this was one of the reasons Tomcat hung in my project.
4. A large number of TCP connections are stuck in CLOSE_WAIT.
netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
TIME_WAIT 48
CLOSE_WAIT 2228
ESTABLISHED 86
The three states commonly seen here are: ESTABLISHED, which means the connection is actively communicating; TIME_WAIT, which means our side has closed the connection and the socket is waiting for the timeout to expire; and CLOSE_WAIT, which means the remote side has closed the connection but our application has not yet closed its own socket.