Problem description
On an M4000 host running Solaris 10, the resin process is using too much CPU, reaching 70%, as shown below:
-bash-3.00$ ps -ef -o pid,pcpu,args|grep java
1511 0.1 /usr/java/bin/java -Dwebview.htdocs=/etc/opt/FJSVwvcnf/htdocs/FJSVwvbs -mx128m
2135 0.0 /usr/java/bin/java -server -Xmx128m -XX:+BackgroundCompilation -XX:PermSize=32m
15945 0.0 sh -c /svi/jdk150/jdk1.5.0_06/bin/java -server -Xms512m -Xmx3072m -XX:MaxPe
15946 70.7 /svi/jdk150/jdk1.5.0_06/bin/java -server -Xms512m -Xmx3072m -XX:MaxPermSize=
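Picking the busiest process out of such a listing can also be done with a small filter instead of by eye. The following is a minimal sketch, not part of the original session: busiest_pid is a hypothetical helper name, and it assumes the PID is in the first column and the CPU percentage in the second, as in the ps output above.

```shell
# busiest_pid (hypothetical helper): read "PID %CPU ARGS..." lines on stdin
# and print the PID of the line with the highest %CPU value.
# Lines whose second column is not a positive number (e.g. a header) are ignored.
busiest_pid() {
  awk '$2 + 0 > max { max = $2 + 0; pid = $1 }
       END { if (pid != "") print pid }'
}
```

It could be used as: ps -ef -o pid,pcpu,args | grep java | busiest_pid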
Troubleshooting process
1. First, find the LWPs with high CPU usage
-bash-3.00$ prstat -L -p 15946
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/LWPID
 15946 slview   3336M 3301M sleep   15    0   3:56:27 2.2% java/49
 15946 slview   3336M 3301M sleep    8    0   3:33:17 2.2% java/52
 15946 slview   3336M 3301M sleep   12    0   3:32:20 2.2% java/50
 15946 slview   3336M 3301M sleep   13    0   3:29:43 2.2% java/51
 15946 slview   3336M 3301M sleep   13    0   3:30:54 2.2% java/47
 15946 slview   3336M 3301M sleep   12    0   1:24:19 2.2% java/64
 15946 slview   3336M 3301M sleep   15    0   1:07:55 2.1% java/144
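The LWP IDs sit after the slash in the last column (PROCESS/LWPID) of the prstat -L output. As a rough sketch, assuming that column layout, they can be extracted together with their CPU share; top_lwps is a hypothetical helper name introduced only for illustration.

```shell
# top_lwps (hypothetical helper): read `prstat -L` output on stdin and print
# "LWPID CPU" for every row whose last field looks like name/number.
# The header row (PROCESS/LWPID) and summary rows are skipped because the
# part after the slash is not numeric or there is no slash at all.
top_lwps() {
  awk '{
         n = split($NF, part, "/")
         if (n == 2 && part[2] ~ /^[0-9]+$/) {
           # Adding 0 turns "2.2%" into the number 2.2
           print part[2], $(NF - 1) + 0
         }
       }'
}
```

For example, prstat -L -p 15946 1 1 | top_lwps would list the LWPs from a single sample, in the order prstat printed them.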
2. Find the mapping between LWPs and Java threads
-bash-3.00$ pstack 15946|grep lwp
----------------- lwp# 47 / thread# 47 --------------------
 ff2c49fc _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 48 / thread# 48 --------------------
 ff2c5cd0 lwp_cond_wait (1704928, 1704910, 0, 0)
 ff2c49fc _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 49 / thread# 49 --------------------
 ff2c49fc _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 50 / thread# 50 --------------------
 ff2c49fc _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 51 / thread# 51 --------------------
 ff2c49fc _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 52 / thread# 52 --------------------
 ff2c49fc _lwp_start (0, 0, 0, 0, 0, 0)
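Each pstack header line pairs an lwp# with a thread#, and in this output the two numbers are identical for every LWP. A small sketch for reducing the headers to a two-column mapping follows; lwp_thread_map is a hypothetical name, and the parsing assumes the header format shown above.

```shell
# lwp_thread_map (hypothetical helper): read pstack output on stdin and
# print "LWP THREAD" for each header line of the form
#   ----- lwp# 47 / thread# 47 -----
# Stack-frame lines such as "_lwp_start (...)" have no "thread#" and are skipped.
lwp_thread_map() {
  awk '/lwp#/ && /thread#/ {
         lwp = ""; thr = ""
         for (i = 1; i < NF; i++) {
           if ($i == "lwp#")    lwp = $(i + 1)
           if ($i == "thread#") thr = $(i + 1)
         }
         print lwp, thr
       }'
}
```

For example, pstack 15946 | lwp_thread_map prints one "LWP THREAD" pair per line, which makes it easy to confirm the one-to-one correspondence.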
3. Use jstack <pid> to capture the call stacks of all threads
$ jstack -m 15946
Thread t@50: (state = IN_VM)
 - java.lang.AbstractStringBuilder.expandCapacity(int) @bci=28, line=99 (Compiled frame; information may be imprecise)
 - per.xwnmp.flux.report.RptFluxHisQuery.GetFluxData(java.lang.String[], java.util.HashMap, java.lang.String, java.lang.String, java.lang.String, java.lang.String, java.lang.String) @bci=480, line=509 (Interpreted frame)
 - per.xwnmp.flux.report.RptFluxHisQuery.GenFluxReport(java.lang.String, java.lang.String[], java.lang.String, java.lang.String, java.lang.String, java.lang.String, java.lang.String) @bci=124, line=82 (Interpreted frame)
 - _nos._flux._flux._FluxPerfView_0Excel__jsp._jspService(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=930, line=162 (Interpreted frame)
 - com.caucho.jsp.JavaPage.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=9, line=75 (Interpreted frame)
 - com.caucho.jsp.Page.subservice(com.caucho.server.http.CauchoRequest, com.caucho.server.http.CauchoResponse) @bci=214, line=506 (I
 - com.caucho.server.TcpConnection.run() @bci=73, line=139 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=595 (Interpreted frame)
Thread t@52: (state = IN_VM)
 - per.xwnmp.flux.report.RptFluxHisQuery.GetFluxData(java.lang.String[], java.util.HashMap, java.lang.String, java.lang.String, java.lang.String, java.lang.String, java.lang.String) @bci=435, line=508 (Compiled frame; information may be imprecise)
 - per.xwnmp.flux.report.RptFluxHisQuery.GenFluxReport(java.lang.String, java.lang.String[], java.lang.String, java.lang.String, java.lang.String, java.lang.String, java.lang.String) @bci=124, line=82 (Interpreted frame)
 - _nos._flux._flux._FluxPerfView_0Excel__jsp._jspService(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=930, line=162 (Interpreted frame)
 - com.caucho.jsp.JavaPage.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=9, line=75 (Interpreted frame)
 - com.caucho.jsp.Page.subservice(com.caucho.server.http.CauchoRequest, com.caucho.server.http.CauchoResponse) @bci=214, line=506 (Interpreted frame)
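In the jstack output above, the number after t@ matches the LWP number reported by prstat and pstack (for example t@50 and java/50). Under that assumption, the stack of one busy LWP can be cut out of a full dump with a filter like the sketch below. thread_stack is a hypothetical helper name, and the script assumes a POSIX awk; on Solaris 10 that would typically mean nawk or /usr/xpg4/bin/awk rather than the default awk.

```shell
# thread_stack (hypothetical helper): print the stack of a single thread
# from a jstack dump read on stdin.
#   $1 = numeric thread id, e.g. 50 for the block starting "Thread t@50:"
thread_stack() {
  awk -v id="$1" '
    # A header line starts a new thread block; show it only if the id matches.
    /^Thread t@[0-9]+:/ { show = ($2 == ("t@" id ":")) }
    show { print }
  '
}
```

For example, jstack 15946 | thread_stack 50 would print only the frames belonging to t@50, so the hot LWPs found earlier can be inspected one at a time.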