找出java代碼中佔用cpu過多問題
當有java進程佔用過多CPU時,可能是邏輯出現的問題。如何排查問題所在呢?
1. 使用top工具列出所有進程,shitf + p 列出CPU佔用率較高進程
2. 找到問題進程號,使用top -H -p pid列出進程的所有線程
3. 然後shift + p 按照CPU使用率排序
4. 找出問題進程號,使用python列印出其16進位值,print("0x" % ppid),比如是:76a3
5. jstack pid > t.dat 記錄線程堆棧,vi 開啟找到76a3的線程號,結合源碼定位問題
下面使用一個死迴圈的例子進行講解:
2015-07-26 19:52:04Full thread dump OpenJDK 64-Bit Server VM (20.0-b11 mixed mode):"Attach Listener" daemon prio=10 tid=0x00007fa04c001000 nid=0x7ac9 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE"DestroyJavaVM" prio=10 tid=0x00007fa070007000 nid=0x7697 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE"ttwo" prio=10 tid=0x00007fa0700b2000 nid=0x76a3 runnable [0x00007fa074260000] java.lang.Thread.State: RUNNABLEat Main$T2.run(Main.java:31)at java.lang.Thread.run(Thread.java:679)"tone" prio=10 tid=0x00007fa0700b0000 nid=0x76a2 waiting on condition [0x00007fa074361000] java.lang.Thread.State: TIMED_WAITING (sleeping)at java.lang.Thread.sleep(Native Method)at Main$T1.run(Main.java:19)at java.lang.Thread.run(Thread.java:679)"Low Memory Detector" daemon prio=10 tid=0x00007fa070098800 nid=0x76a0 runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE"C2 CompilerThread1" daemon prio=10 tid=0x00007fa070096000 nid=0x769f waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE"C2 CompilerThread0" daemon prio=10 tid=0x00007fa070093800 nid=0x769e waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE"Signal Dispatcher" daemon prio=10 tid=0x00007fa070085000 nid=0x769d runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE"Finalizer" daemon prio=10 tid=0x00007fa070073000 nid=0x769c in Object.wait() [0x00007fa074973000] java.lang.Thread.State: WAITING (on object monitor)at java.lang.Object.wait(Native Method)- waiting on <0x00000000ec0b1310> (a java.lang.ref.ReferenceQueue$Lock)at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)- locked <0x00000000ec0b1310> (a java.lang.ref.ReferenceQueue$Lock)at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)"Reference Handler" daemon prio=10 tid=0x00007fa070071000 nid=0x769b in Object.wait() [0x00007fa074a74000] java.lang.Thread.State: WAITING (on object monitor)at java.lang.Object.wait(Native Method)- waiting on <0x00000000ec0b11e8> (a java.lang.ref.Reference$Lock)at java.lang.Object.wait(Object.java:502)at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)- locked <0x00000000ec0b11e8> (a java.lang.ref.Reference$Lock)"VM Thread" prio=10 tid=0x00007fa07006a800 nid=0x769a runnable "GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fa070011800 nid=0x7698 runnable "GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fa070013800 nid=0x7699 runnable "VM Periodic Task Thread" prio=10 tid=0x00007fa07009b000 nid=0x76a1 waiting on condition JNI global references: 865
上面的代碼啟動兩個線程,線程T1會佔用少量CPU,線程T2會佔滿一個CPU。
首先我們通過top列出進程,按照cpu使用率排序(shift + p)
可以看出出問題的進程號是30358
然後使用 top -H -p 30358 得到如下結果,然後使用shitf + p按照cpu使用率排序
這裡可以看到線程為30371佔用太多CPU,此線程有問題
接下來使用jstack 30358 > t.dat 記錄線程堆棧
使用python列印出出30371的16進位值 (print("0x" % 30371結果為76a3
線上程堆棧中找到76a3進程,然後仔細查看堆棧資訊
這裡可以看到ttwo線程名,運行到了Main的31行,此時我們在去原始碼中仔細查看附近的邏輯,問題一目瞭然