Monitoring detected that the pages of the Tomcat-based back-office management system could not be accessed and sent an alert. When a problem like this arises, the first thing to do is restart the application. Yes, really: do not troubleshoot first, restart first. The first principle for production incidents is to restore service as quickly as possible, whatever the cause, and restarting the application is often the most effective way to do that.
After the restart the back-office system worked normally again, which was a relief. Small problems can usually be cleared this way. But what if a restart does not help? Then operations staff have to troubleshoot on an emergency basis, and if the problem is not solved within about ten minutes, or you estimate that it cannot be, switch over quickly following the disaster recovery manual. Don't tell me you have no rapid disaster recovery plan...
Back to the subject. Once the business is temporarily back to normal, do not leave it at that: the cause must be found. No outage happens without a reason; you just have not seen it yet.
Here are my troubleshooting steps, for reference only:
1. Rule out errors caused by code
Check the Tomcat logs. I found the error below, but it is a Tomcat socket error, which largely rules out a problem in the application code. In practice, application code does quite often run Tomcat to death; a memory overflow is one of the minor cases, and it can be investigated with the GC logs and a heap dump.
ClientAbortException: java.net.SocketTimeoutException
    at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:369)
    at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:368)
    at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:392)
    at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:381)
    at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89)
    at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:83)
    at com.qhfax.thrivefa.web.FileController.showImage(FileController.java:60)
    at com.qhfax.thrivefa.common.BaseController.showImage(BaseController.java:54)
    at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:175)
    at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:446)
    at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:434)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:938)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:870)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:961)
    at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:852)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:837)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:88)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
    at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketTimeoutException
    at org.apache.tomcat.util.net.NioBlockingSelector.write(NioBlockingSelector.java:123)
    at org.apache.tomcat.util.net.NioSelectorPool.write(NioSelectorPool.java:156)
    at org.apache.coyote.http11.InternalNioOutputBuffer.writeToSocket(InternalNioOutputBuffer.java:460)
    at org.apache.coyote.http11.InternalNioOutputBuffer.flushBuffer(InternalNioOutputBuffer.java:800)
    at org.apache.coyote.http11.InternalNioOutputBuffer.addToBB(InternalNioOutputBuffer.java:644)
    at org.apache.coyote.http11.InternalNioOutputBuffer.access$000(InternalNioOutputBuffer.java:46)
    at org.apache.coyote.http11.InternalNioOutputBuffer$SocketOutputBuffer.doWrite(InternalNioOutputBuffer.java:825)
    at org.apache.coyote.http11.filters.IdentityOutputFilter.doWrite(IdentityOutputFilter.java:118)
    at org.apache.coyote.http11.InternalNioOutputBuffer.doWrite(InternalNioOutputBuffer.java:610)
    at org.apache.coyote.Response.doWrite(Response.java:560)
    at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:364)
    ... 38 more
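When a log holds many traces like this one, it can help to pull out just the innermost cause of each. The following is a minimal sketch of that idea in Python, not part of the original troubleshooting; the function name root_cause is made up for illustration, and it assumes the trace follows the usual Java layout with "Caused by:" lines.

```python
def root_cause(trace):
    """Return the exception class named on the last 'Caused by:' line,
    or on the first line when the trace has no 'Caused by:' section.

    root_cause is a hypothetical helper name, not a Tomcat or JDK API.
    """
    lines = [ln.strip() for ln in trace.splitlines() if ln.strip()]
    if not lines:
        return None
    header = lines[0]
    for ln in lines:
        if ln.startswith("Caused by:"):
            # Later 'Caused by:' lines are deeper causes, so keep the last one.
            header = ln[len("Caused by:"):].strip()
    # The class name is everything before the first ':' (a message may follow).
    return header.split(":", 1)[0].strip()
```

Applied to the trace above, this should report java.net.SocketTimeoutException, a socket-level cause rather than an exception thrown by the application code.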
2. The value of monitoring
At the same time I quickly went through all the alert text messages and emails. Setting aside the web health-check alerts, there was indeed a finding:
Alarm Host: **********
Alarm Time: ********
Alarm level: Warning
Alarm information: Too many processes on ******
Warning Item: proc.num[]
Problem details: Number of processes: 106
Current status: OK: 106
Event ID: 16294
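The check behind an alert like this amounts to counting processes and comparing the count with a limit. Here is a rough Python sketch of that idea, assuming a Linux host where every process has a numeric directory under /proc. The function names and the threshold are hypothetical; the alert above reports the count (106) but not the limit the monitoring system was configured with.

```python
import os


def count_processes(proc_root="/proc"):
    """Count processes as the number of all-digit directories under proc_root."""
    return sum(
        1
        for name in os.listdir(proc_root)
        if name.isdigit() and os.path.isdir(os.path.join(proc_root, name))
    )


def too_many_processes(count, threshold):
    """Return True when the process count exceeds the (hypothetical) threshold."""
    return count > threshold
```

For example, too_many_processes(count_processes(), 100) would flag a host in the state reported above if the limit were 100, which is an assumed value.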
Then look at the monitoring system:
[Image: 001.PNG (http://s3.51cto.com/wyfs02/M01/70/0C/wKiom1WwVlHg2oOFAAD2Vf_HqK4218.jpg)]
This caught the culprit by the tail: we now had a first suspect for the outage. The next step was to log in to the console and pin down exactly what was causing the trouble.
3. Fixing the problem
Once we know the cause is too many processes, the fix is simple.
Use pstree to find out which process has too many instances. Ah, it turned out to be a scheduled task causing the trouble...
Use pkill to end the scheduled-task processes in one batch.
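The "which process has too many instances" step can also be scripted. Below is a small Python sketch, assuming a Linux host that exposes /proc/<pid>/comm; the function name processes_by_name is invented for this example. It only counts processes and leaves the pkill step to the operator.

```python
import os
from collections import Counter


def processes_by_name(proc_root="/proc"):
    """Count running processes per command name, read from <proc_root>/<pid>/comm."""
    counts = Counter()
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, pid, "comm")) as f:
                counts[f.read().strip()] += 1
        except OSError:
            continue  # the process exited between listing and reading
    return counts
```

Calling processes_by_name().most_common(5) lists the five most numerous command names, which is usually enough to spot a runaway scheduled task.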
4. Pinpointing the root cause
The alert only tells us that there were too many processes, which has nothing directly to do with Tomcat. To go further we need the system logs, and this is where centralized log management and visualization prove their worth.
[Image: 002.PNG (http://s3.51cto.com/wyfs02/M02/70/09/wKioL1WwWEXRpSCFAAHpN9VjIpI205.jpg)]
5. Background knowledge
The Linux kernel has a mechanism called the OOM killer (out-of-memory killer). It watches for processes that consume too much memory, especially those that grab a large amount in a short time, and kills a process to keep the system from running out of memory. A typical case: one day a machine suddenly cannot be reached over SSH but still answers ping. That is not a network fault; the sshd process has been killed by the OOM killer (many apparent "hangs" turn out to be this). After restarting the machine, the system log /var/log/messages will contain an error message similar to "Out of memory: Kill process 1865 (sshd)".
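Searching the system log for such messages is easy to automate. The sketch below is only an illustration: the function name find_oom_kills is made up, and the pattern assumes the message wording quoted above, also accepting the "Killed process" wording that some kernels print.

```python
import re

# Matches e.g. "Out of memory: Kill process 1865 (sshd)"; the optional "ed"
# covers kernels that log "Killed process" instead.
OOM_LINE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)", re.IGNORECASE)


def find_oom_kills(log_text):
    """Return a list of (pid, process_name) for each OOM-killer message found."""
    return [(int(m.group(1)), m.group(2)) for m in OOM_LINE.finditer(log_text)]
```

A typical use is to read the contents of /var/log/messages and pass them to find_oom_kills; an empty list means no matching message was found in that file.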
Two points to note:
1. The OOM killer is not a solution to memory leaks.
2. Sometimes free shows that plenty of memory is available, yet the OOM killer is still triggered, because the process may be occupying a special region of the memory address space.
So do not expect the OOM killer to prevent memory exhaustion, and do not expect free to reveal an OOM problem. Fellow operations engineers, enterprise-grade monitoring and centralized log management are your eyes!
This article is from the "Ops Road" blog; please keep this source link: http://vekergu.blog.51cto.com/9966832/1677437
Five-minute troubleshooting: rescuing a Tomcat that took the blame