這是一個建立於 的文章,其中的資訊可能已經有所發展或是發生改變。
Stack trace是指堆棧回溯資訊,在目前時間,以當前方法的執行點開始,回溯調用它的方法的方法的執行點,然後繼續回溯,這樣就可以跟蹤整個方法的調用,大家比較熟悉的是JDK所帶的jstack
工具,可以把Java的所有線程的stack trace都列印出來。
它有什麼用呢?用處非常的大,當應用出現一些狀況的時候,比如某個模組不執行, 鎖競爭、CPU佔用非常高等問題, 又沒有足夠的log資訊可以分析,那麼可以查看stack trace資訊,看看線程都被阻塞或者運行在那些代碼上,然後定位問題所在。
對於Go開發的程式,有沒有類似jstack
這樣的利器呢,目前我還沒有看到,但是我們可以通過其它途徑也很方便的輸出goroutine的stack trace資訊。
本文介紹了幾種方法,尤其是最後介紹的方法比較有用。
異常退出情況下輸出stacktrace
通過panic
如果應用中有沒recover的panic,或者應用在啟動並執行時候出現運行時的異常,那麼程式自動會將當前的goroutine的stack trace列印出來。
比如下面的程式,如果你運行會拋出一個panic。
1234567891011121314151617181920212223242526 |
package mainimport ("time")func main() {go a()m1()}func m1() {m2()}func m2() {m3()}func m3() {panic("panic from m3")}func a() {time.Sleep(time.Hour)} |
輸出下面的stack trace:
123456789101112131415 |
dump go run p.gopanic: panic from m3goroutine 1 [running]:panic(0x596a0, 0xc42000a1a0)/usr/local/Cellar/go/1.7.4/libexec/src/runtime/panic.go:500 +0x1a1main.m3()/Users/yuepan/go/src/github.com/smallnest/dump/p.go:21 +0x6dmain.m2()/Users/yuepan/go/src/github.com/smallnest/dump/p.go:17 +0x14main.m1()/Users/yuepan/go/src/github.com/smallnest/dump/p.go:13 +0x14main.main()/Users/yuepan/go/src/github.com/smallnest/dump/p.go:9 +0x3aexit status 2 |
從這個資訊中我們可以看到p.go的第9行是main方法,它在這一行調用m1方法,m1方法在第13行調用m2方法,m2方法在第17行調用m3方法,m3方法在第21出現panic, 它們運行在goroutine 1中,當前goroutine 1的狀態是running狀態。
如果想讓它把所有的goroutine資訊都輸出出來,可以設定 GOTRACEBACK=1
:
1234567891011121314151617181920212223 |
GOTRACEBACK=1 go run p.gopanic: panic from m3goroutine 1 [running]:panic(0x596a0, 0xc42000a1b0)/usr/local/Cellar/go/1.7.4/libexec/src/runtime/panic.go:500 +0x1a1main.m3()/Users/yuepan/go/src/github.com/smallnest/dump/p.go:21 +0x6dmain.m2()/Users/yuepan/go/src/github.com/smallnest/dump/p.go:17 +0x14main.m1()/Users/yuepan/go/src/github.com/smallnest/dump/p.go:13 +0x14main.main()/Users/yuepan/go/src/github.com/smallnest/dump/p.go:9 +0x3agoroutine 4 [sleep]:time.Sleep(0x34630b8a000)/usr/local/Cellar/go/1.7.4/libexec/src/runtime/time.go:59 +0xe1main.a()/Users/yuepan/go/src/github.com/smallnest/dump/p.go:25 +0x30created by main.main/Users/yuepan/go/src/github.com/smallnest/dump/p.go:8 +0x35exit status 2 |
同樣你也可以分析這個stack trace的資訊,得到方法調用點的情況,同時這個資訊將兩個goroutine的stack trace都列印出來了,而且goroutine 4的狀態是sleep狀態。
Go官方文檔對這個環境變數有介紹:
The GOTRACEBACK variable controls the amount of output generated when a Go program fails due to an unrecovered panic or an unexpected runtime condition. By default, a failure prints a stack trace for the current goroutine, eliding functions internal to the run-time system, and then exits with exit code 2. The failure prints stack traces for all goroutines if there is no current goroutine or the failure is internal to the run-time. GOTRACEBACK=none omits the goroutine stack traces entirely. GOTRACEBACK=single (the default) behaves as described above. GOTRACEBACK=all adds stack traces for all user-created goroutines. GOTRACEBACK=system is like “all” but adds stack frames for run-time functions and shows goroutines created internally by the run-time. GOTRACEBACK=crash is like “system” but crashes in an operating system-specific manner instead of exiting. For example, on Unix systems, the crash raises SIGABRT to trigger a core dump. For historical reasons, the GOTRACEBACK settings 0, 1, and 2 are synonyms for none, all, and system, respectively. The runtime/debug package's SetTraceback function allows increasing the amount of output at run time, but it cannot reduce the amount below that specified by the environment variable. See https://golang.org/pkg/runtime/debug/#SetTraceback.
你可以設定 none
、all
、system
、single
、crash
,曆史原因, 你可以可是設定數字0
、1
、2
,分別代表none
、all
、system
。
通過SIGQUIT訊號
如果程式沒有發生panic,但是程式有問題,"假死“不工作,我們想看看哪兒出現了問題,可以給程式發送SIGQUIT
訊號,也可以輸出stack trace資訊。
比如下面的程式:
1234567891011121314151617181920212223242526 |
package mainimport ("time")func main() {go a()m1()}func m1() {m2()}func m2() {m3()}func m3() {time.Sleep(time.Hour)}func a() {time.Sleep(time.Hour)} |
你可以運行kill -SIGQUIT <pid>
殺死這個程式,程式在退出的時候輸出strack trace。
正常情況下輸出stacktrace
上面的情況是必須要求程式退出才能列印出stack trace資訊,但是有時候我們只是需要跟蹤分析一下程式的問題,而不希望程式中斷運行。所以我們需要其它的方法來執行。
你可以暴漏一個命令、一個API或者監聽一個訊號,然後調用相應的方法把stack trace列印出來。
列印出當前goroutine的 stacktrace
通過debug.PrintStack()
方法可以將當前所在的goroutine的stack trace列印出來,如下面的程式。
12345678910111213141516171819202122232425262728 |
package mainimport ("runtime/debug""time")func main() {go a()m1()}func m1() {m2()}func m2() {m3()}func m3() {debug.PrintStack()time.Sleep(time.Hour)}func a() {time.Sleep(time.Hour)} |
列印出所有goroutine的 stacktrace
可以通過pprof.Lookup("goroutine").WriteTo
將所有的goroutine的stack trace都列印出來,如下面的程式:
1234567891011121314151617181920212223242526272829 |
package mainimport ("os""runtime/pprof""time")func main() {go a()m1()}func m1() {m2()}func m2() {m3()}func m3() {pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)time.Sleep(time.Hour)}func a() {time.Sleep(time.Hour)} |
較完美的輸出 stacktrace
你可以使用runtime.Stack
得到所有的goroutine的stack trace資訊,事實上前面debug.PrintStack()
也是通過這個方法獲得的。
為了更方便的隨時的得到應用所有的goroutine的stack trace資訊,我們可以監聽SIGUSER1
訊號,當收到這個訊號的時候就將stack trace列印出來。發送訊號也很簡單,通過kill -SIGUSER1 <pid>
就可以,不必擔心kill
會將程式殺死,它只是發了一個訊號而已。
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849 |
package mainimport ("fmt""os""os/signal""runtime""syscall""time")func main() {setupSigusr1Trap()go a()m1()}func m1() {m2()}func m2() {m3()}func m3() {time.Sleep(time.Hour)}func a() {time.Sleep(time.Hour)}func setupSigusr1Trap() {c := make(chan os.Signal, 1)signal.Notify(c, syscall.SIGUSR1)go func() {for range c {DumpStacks()}}()}func DumpStacks() {buf := make([]byte, 16384)buf = buf[:runtime.Stack(buf, true)]fmt.Printf("=== BEGIN goroutine stack dump ===\n%s\n=== END goroutine stack dump ===", buf)} |
輸出結果很直觀,方便檢查。
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950 |
=== BEGIN goroutine stack dump ===goroutine 36 [running]:main.DumpStacks()/Users/yuepan/go/src/github.com/smallnest/dump/d3.go:47 +0x77main.setupSigusr1Trap.func1(0xc420070060)/Users/yuepan/go/src/github.com/smallnest/dump/d3.go:40 +0x73created by main.setupSigusr1Trap/Users/yuepan/go/src/github.com/smallnest/dump/d3.go:42 +0xecgoroutine 1 [sleep]:time.Sleep(0x34630b8a000)/usr/local/Cellar/go/1.7.4/libexec/src/runtime/time.go:59 +0xe1main.m3()/Users/yuepan/go/src/github.com/smallnest/dump/d3.go:28 +0x30main.m2()/Users/yuepan/go/src/github.com/smallnest/dump/d3.go:24 +0x14main.m1()/Users/yuepan/go/src/github.com/smallnest/dump/d3.go:20 +0x14main.main()/Users/yuepan/go/src/github.com/smallnest/dump/d3.go:16 +0x3fgoroutine 34 [syscall]:os/signal.signal_recv(0xff280)/usr/local/Cellar/go/1.7.4/libexec/src/runtime/sigqueue.go:116 +0x157os/signal.loop()/usr/local/Cellar/go/1.7.4/libexec/src/os/signal/signal_unix.go:22 +0x22created by os/signal.init.1/usr/local/Cellar/go/1.7.4/libexec/src/os/signal/signal_unix.go:28 +0x41goroutine 35 [select, locked to thread]:runtime.gopark(0xb5cc8, 0x0, 0xab3ef, 0x6, 0x18, 0x2)/usr/local/Cellar/go/1.7.4/libexec/src/runtime/proc.go:259 +0x13aruntime.selectgoImpl(0xc42008d730, 0x0, 0x18)/usr/local/Cellar/go/1.7.4/libexec/src/runtime/select.go:423 +0x11d9runtime.selectgo(0xc42008d730)/usr/local/Cellar/go/1.7.4/libexec/src/runtime/select.go:238 +0x1cruntime.ensureSigM.func1()/usr/local/Cellar/go/1.7.4/libexec/src/runtime/signal1_unix.go:304 +0x2d1runtime.goexit()/usr/local/Cellar/go/1.7.4/libexec/src/runtime/asm_amd64.s:2086 +0x1goroutine 37 [sleep]:time.Sleep(0x34630b8a000)/usr/local/Cellar/go/1.7.4/libexec/src/runtime/time.go:59 +0xe1main.a()/Users/yuepan/go/src/github.com/smallnest/dump/d3.go:32 +0x30created by main.main/Users/yuepan/go/src/github.com/smallnest/dump/d3.go:15 +0x3a=== END goroutine stack dump === |
當然,這段代碼也不是我原創的,這是docker程式碼程式庫中的一段代碼,很簡單,也很強大。
參考文檔
- http://stackoverflow.com/questions/19094099/how-to-dump-goroutine-stacktraces
- https://golang.org/pkg/runtime/