調試利器:dump goroutine 的 stacktrace

來源:互聯網
上載者:User
這是一個建立於 的文章,其中的資訊可能已經有所發展或是發生改變。

Stack trace是指堆棧回溯資訊,在目前時間,以當前方法的執行點開始,回溯調用它的方法的方法的執行點,然後繼續回溯,這樣就可以跟蹤整個方法的調用,大家比較熟悉的是JDK所帶的jstack工具,可以把Java的所有線程的stack trace都列印出來。

它有什麼用呢?用處非常的大,當應用出現一些狀況的時候,比如某個模組不執行, 鎖競爭、CPU佔用非常高等問題, 又沒有足夠的log資訊可以分析,那麼可以查看stack trace資訊,看看線程都被阻塞或者運行在那些代碼上,然後定位問題所在。

對於Go開發的程式,有沒有類似jstack這樣的利器呢,目前我還沒有看到,但是我們可以通過其它途徑也很方便的輸出goroutine的stack trace資訊。

本文介紹了幾種方法,尤其是最後介紹的方法比較有用。

異常退出情況下輸出stacktrace

通過panic

如果應用中有沒recover的panic,或者應用在啟動並執行時候出現運行時的異常,那麼程式自動會將當前的goroutine的stack trace列印出來。

比如下面的程式,如果你運行會拋出一個panic。

1234567891011121314151617181920212223242526
package mainimport ("time")func main() {go a()m1()}func m1() {m2()}func m2() {m3()}func m3() {panic("panic from m3")}func a() {time.Sleep(time.Hour)}

輸出下面的stack trace:

123456789101112131415
dump go run p.gopanic: panic from m3goroutine 1 [running]:panic(0x596a0, 0xc42000a1a0)/usr/local/Cellar/go/1.7.4/libexec/src/runtime/panic.go:500 +0x1a1main.m3()/Users/yuepan/go/src/github.com/smallnest/dump/p.go:21 +0x6dmain.m2()/Users/yuepan/go/src/github.com/smallnest/dump/p.go:17 +0x14main.m1()/Users/yuepan/go/src/github.com/smallnest/dump/p.go:13 +0x14main.main()/Users/yuepan/go/src/github.com/smallnest/dump/p.go:9 +0x3aexit status 2

從這個資訊中我們可以看到p.go的第9行是main方法,它在這一行調用m1方法,m1方法在第13行調用m2方法,m2方法在第17行調用m3方法,m3方法在第21出現panic, 它們運行在goroutine 1中,當前goroutine 1的狀態是running狀態。

如果想讓它把所有的goroutine資訊都輸出出來,可以設定 GOTRACEBACK=1:

1234567891011121314151617181920212223
GOTRACEBACK=1 go run p.gopanic: panic from m3goroutine 1 [running]:panic(0x596a0, 0xc42000a1b0)/usr/local/Cellar/go/1.7.4/libexec/src/runtime/panic.go:500 +0x1a1main.m3()/Users/yuepan/go/src/github.com/smallnest/dump/p.go:21 +0x6dmain.m2()/Users/yuepan/go/src/github.com/smallnest/dump/p.go:17 +0x14main.m1()/Users/yuepan/go/src/github.com/smallnest/dump/p.go:13 +0x14main.main()/Users/yuepan/go/src/github.com/smallnest/dump/p.go:9 +0x3agoroutine 4 [sleep]:time.Sleep(0x34630b8a000)/usr/local/Cellar/go/1.7.4/libexec/src/runtime/time.go:59 +0xe1main.a()/Users/yuepan/go/src/github.com/smallnest/dump/p.go:25 +0x30created by main.main/Users/yuepan/go/src/github.com/smallnest/dump/p.go:8 +0x35exit status 2

同樣你也可以分析這個stack trace的資訊,得到方法調用點的情況,同時這個資訊將兩個goroutine的stack trace都列印出來了,而且goroutine 4的狀態是sleep狀態。

Go官方文檔對這個環境變數有介紹:

The GOTRACEBACK variable controls the amount of output generated when a Go program fails due to an unrecovered panic or an unexpected runtime condition. By default, a failure prints a stack trace for the current goroutine, eliding functions internal to the run-time system, and then exits with exit code 2. The failure prints stack traces for all goroutines if there is no current goroutine or the failure is internal to the run-time. GOTRACEBACK=none omits the goroutine stack traces entirely. GOTRACEBACK=single (the default) behaves as described above. GOTRACEBACK=all adds stack traces for all user-created goroutines. GOTRACEBACK=system is like “all” but adds stack frames for run-time functions and shows goroutines created internally by the run-time. GOTRACEBACK=crash is like “system” but crashes in an operating system-specific manner instead of exiting. For example, on Unix systems, the crash raises SIGABRT to trigger a core dump. For historical reasons, the GOTRACEBACK settings 0, 1, and 2 are synonyms for none, all, and system, respectively. The runtime/debug package's SetTraceback function allows increasing the amount of output at run time, but it cannot reduce the amount below that specified by the environment variable. See https://golang.org/pkg/runtime/debug/#SetTraceback.

你可以設定 noneallsystemsinglecrash,曆史原因, 你可以可是設定數字012,分別代表noneallsystem

通過SIGQUIT訊號

如果程式沒有發生panic,但是程式有問題,"假死“不工作,我們想看看哪兒出現了問題,可以給程式發送SIGQUIT訊號,也可以輸出stack trace資訊。
比如下面的程式:

1234567891011121314151617181920212223242526
package mainimport ("time")func main() {go a()m1()}func m1() {m2()}func m2() {m3()}func m3() {time.Sleep(time.Hour)}func a() {time.Sleep(time.Hour)}

你可以運行kill -SIGQUIT <pid> 殺死這個程式,程式在退出的時候輸出strack trace。

正常情況下輸出stacktrace

上面的情況是必須要求程式退出才能列印出stack trace資訊,但是有時候我們只是需要跟蹤分析一下程式的問題,而不希望程式中斷運行。所以我們需要其它的方法來執行。

你可以暴漏一個命令、一個API或者監聽一個訊號,然後調用相應的方法把stack trace列印出來。

列印出當前goroutine的 stacktrace

通過debug.PrintStack()方法可以將當前所在的goroutine的stack trace列印出來,如下面的程式。

12345678910111213141516171819202122232425262728
package mainimport ("runtime/debug""time")func main() {go a()m1()}func m1() {m2()}func m2() {m3()}func m3() {debug.PrintStack()time.Sleep(time.Hour)}func a() {time.Sleep(time.Hour)}

列印出所有goroutine的 stacktrace

可以通過pprof.Lookup("goroutine").WriteTo將所有的goroutine的stack trace都列印出來,如下面的程式:

1234567891011121314151617181920212223242526272829
package mainimport ("os""runtime/pprof""time")func main() {go a()m1()}func m1() {m2()}func m2() {m3()}func m3() {pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)time.Sleep(time.Hour)}func a() {time.Sleep(time.Hour)}

較完美的輸出 stacktrace

你可以使用runtime.Stack得到所有的goroutine的stack trace資訊,事實上前面debug.PrintStack()也是通過這個方法獲得的。

為了更方便的隨時的得到應用所有的goroutine的stack trace資訊,我們可以監聽SIGUSER1訊號,當收到這個訊號的時候就將stack trace列印出來。發送訊號也很簡單,通過kill -SIGUSER1 <pid>就可以,不必擔心kill會將程式殺死,它只是發了一個訊號而已。

12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849
package mainimport ("fmt""os""os/signal""runtime""syscall""time")func main() {setupSigusr1Trap()go a()m1()}func m1() {m2()}func m2() {m3()}func m3() {time.Sleep(time.Hour)}func a() {time.Sleep(time.Hour)}func setupSigusr1Trap() {c := make(chan os.Signal, 1)signal.Notify(c, syscall.SIGUSR1)go func() {for range c {DumpStacks()}}()}func DumpStacks() {buf := make([]byte, 16384)buf = buf[:runtime.Stack(buf, true)]fmt.Printf("=== BEGIN goroutine stack dump ===\n%s\n=== END goroutine stack dump ===", buf)}

輸出結果很直觀,方便檢查。

1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950
=== BEGIN goroutine stack dump ===goroutine 36 [running]:main.DumpStacks()/Users/yuepan/go/src/github.com/smallnest/dump/d3.go:47 +0x77main.setupSigusr1Trap.func1(0xc420070060)/Users/yuepan/go/src/github.com/smallnest/dump/d3.go:40 +0x73created by main.setupSigusr1Trap/Users/yuepan/go/src/github.com/smallnest/dump/d3.go:42 +0xecgoroutine 1 [sleep]:time.Sleep(0x34630b8a000)/usr/local/Cellar/go/1.7.4/libexec/src/runtime/time.go:59 +0xe1main.m3()/Users/yuepan/go/src/github.com/smallnest/dump/d3.go:28 +0x30main.m2()/Users/yuepan/go/src/github.com/smallnest/dump/d3.go:24 +0x14main.m1()/Users/yuepan/go/src/github.com/smallnest/dump/d3.go:20 +0x14main.main()/Users/yuepan/go/src/github.com/smallnest/dump/d3.go:16 +0x3fgoroutine 34 [syscall]:os/signal.signal_recv(0xff280)/usr/local/Cellar/go/1.7.4/libexec/src/runtime/sigqueue.go:116 +0x157os/signal.loop()/usr/local/Cellar/go/1.7.4/libexec/src/os/signal/signal_unix.go:22 +0x22created by os/signal.init.1/usr/local/Cellar/go/1.7.4/libexec/src/os/signal/signal_unix.go:28 +0x41goroutine 35 [select, locked to thread]:runtime.gopark(0xb5cc8, 0x0, 0xab3ef, 0x6, 0x18, 0x2)/usr/local/Cellar/go/1.7.4/libexec/src/runtime/proc.go:259 +0x13aruntime.selectgoImpl(0xc42008d730, 0x0, 0x18)/usr/local/Cellar/go/1.7.4/libexec/src/runtime/select.go:423 +0x11d9runtime.selectgo(0xc42008d730)/usr/local/Cellar/go/1.7.4/libexec/src/runtime/select.go:238 +0x1cruntime.ensureSigM.func1()/usr/local/Cellar/go/1.7.4/libexec/src/runtime/signal1_unix.go:304 +0x2d1runtime.goexit()/usr/local/Cellar/go/1.7.4/libexec/src/runtime/asm_amd64.s:2086 +0x1goroutine 37 [sleep]:time.Sleep(0x34630b8a000)/usr/local/Cellar/go/1.7.4/libexec/src/runtime/time.go:59 +0xe1main.a()/Users/yuepan/go/src/github.com/smallnest/dump/d3.go:32 +0x30created by main.main/Users/yuepan/go/src/github.com/smallnest/dump/d3.go:15 +0x3a=== END goroutine stack dump ===

當然,這段代碼也不是我原創的,這是docker程式碼程式庫中的一段代碼,很簡單,也很強大。

參考文檔

  1. http://stackoverflow.com/questions/19094099/how-to-dump-goroutine-stacktraces
  2. https://golang.org/pkg/runtime/
相關文章

聯繫我們

該頁面正文內容均來源於網絡整理,並不代表阿里雲官方的觀點,該頁面所提到的產品和服務也與阿里云無關,如果該頁面內容對您造成了困擾,歡迎寫郵件給我們,收到郵件我們將在5個工作日內處理。

如果您發現本社區中有涉嫌抄襲的內容,歡迎發送郵件至: info-contact@alibabacloud.com 進行舉報並提供相關證據,工作人員會在 5 個工作天內聯絡您,一經查實,本站將立刻刪除涉嫌侵權內容。

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.