Debug Golang Apps with Pprof and Flame-graph


Objective

I recently implemented a log collection and reporting program in Golang (internally called the logger project). During the online grayscale test, logger showed very high CPU consumption (80%-100%). The project had been used online before for consuming NSQ tasks, with CPU usage steady at around 1%. The most recent change merely added a disk-queue-based producer/consumer service: the producer uses go-gin to implement an httpserver that writes received data to the disk queue, and a goroutine loops reading from the queue and POSTing the data. The httpserver load is not heavy (the QPS is modest), so it should not occupy that much CPU. A rough code review turned up nothing abnormal, so I used pprof and flame graphs to analyze and locate the problem.

Pprof

pprof, as I understand it, stands for program profile. Golang provides two official packages, runtime/pprof and net/http/pprof, which make it easy to collect a program's running stacks, goroutines, memory allocation and usage, IO, and other information into a .prof file; the .prof file can then be analyzed with go tool pprof. The two packages do the same thing and differ only in how they are used.

Runtime/pprof

If the program is a non-httpserver type, use this method, and embed the following code in the main function:

import (
    "flag"
    "log"
    "os"
    "runtime"
    "runtime/pprof"
)

var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to `file`")
var memprofile = flag.String("memprofile", "", "write memory profile to `file`")

func main() {
    flag.Parse()
    if *cpuprofile != "" {
        f, err := os.Create(*cpuprofile)
        if err != nil {
            log.Fatal("could not create CPU profile: ", err)
        }
        if err := pprof.StartCPUProfile(f); err != nil {
            log.Fatal("could not start CPU profile: ", err)
        }
        defer pprof.StopCPUProfile()
    }

    // ... rest of the program ...

    if *memprofile != "" {
        f, err := os.Create(*memprofile)
        if err != nil {
            log.Fatal("could not create memory profile: ", err)
        }
        runtime.GC() // get up-to-date statistics
        if err := pprof.WriteHeapProfile(f); err != nil {
            log.Fatal("could not write memory profile: ", err)
        }
        f.Close()
    }
}

Run the program

./logger -cpuprofile cpu.prof -memprofile mem.prof

This produces the cpu.prof and mem.prof files, which can then be analyzed with go tool pprof:

go tool pprof logger cpu.prof
go tool pprof logger mem.prof

Net/http/pprof

If the program is of type Httpserver, you only need to import the package:

import _ "net/http/pprof"
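
For a server built on the default net/http package, the import alone is sufficient, because the package registers the /debug/pprof handlers on http.DefaultServeMux. Below is a minimal sketch assuming a placeholder port and handler (not the logger program's actual code):

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof handlers on http.DefaultServeMux
)

func main() {
    // Placeholder business handler for illustration only.
    http.HandleFunc("/ping", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("pong"))
    })
    // nil means the default mux, so the pprof handlers are served automatically.
    log.Fatal(http.ListenAndServe(":4500", nil))
}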

If the httpserver is started with go-gin rather than the default net/http package, you need to register the corresponding /debug/pprof handlers manually; there is an encapsulated wrapper on GitHub:

import "github.com/DeanThompson/ginpprof"

...
router := gin.Default()
ginpprof.Wrap(router)
...

After importing the package, recompile and run the program, then open http://host:port/debug/pprof in a browser to see the information below, where host and port are the host and port the program is bound to. For example, for my own logger program the address is:

http://127.0.0.1:4500/debug/pprof/

/debug/pprof/

profiles:
0       block
62      goroutine
427     heap
0       mutex
12      threadcreate

full goroutine stack dump

Click on a profile to view its details. The data viewed through the browser does not directly reflect the program's performance; the go tool pprof command-line tool provides a much richer set of analysis features:

View heap Information

go tool pprof http://127.0.0.1:4500/debug/pprof/heap

View CPU sampling information for 30s

go tool pprof http://127.0.0.1:4500/debug/pprof/profile

See the official net/http/pprof documentation for additional features.
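
For example, the other endpoints listed above can be fetched in the same way:

go tool pprof http://127.0.0.1:4500/debug/pprof/goroutine
go tool pprof http://127.0.0.1:4500/debug/pprof/block

Note that the block and mutex profiles only contain data if sampling is enabled in the program; the snippet below is a sketch with illustrative rates, not values from the logger project:

import "runtime"

func init() {
    // Illustrative sampling rates; tune them for your own program.
    runtime.SetBlockProfileRate(1)     // record every blocking event
    runtime.SetMutexProfileFraction(1) // sample every mutex contention event
}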

Pprof CPU Analysis

After collecting the profile data, you can analyze the CPU hotspot code. Execute the following command:

go tool pprof http://127.0.0.1:4500/debug/pprof/profile

This collects 30s of profile data and then enters interactive mode in the terminal; type the top command:

~ # go tool pprof http://127.0.0.1:4500/debug/pprof/profile
Fetching profile over HTTP from http://127.0.0.1:4500/debug/pprof/profile
Saved profile in /home/vagrant/pprof/pprof.logger.samples.cpu.012.pb.gz
File: logger
Type: cpu
Time: Jan 2018 at 2:01pm (CST)
Duration: 30s, Total samples = 390ms ( 1.30%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 360ms, 92.31% of 390ms total
Showing top 10 nodes
      flat  flat%   sum%        cum   cum%
     120ms 30.77% 30.77%      180ms 46.15%  compress/flate.(*compressor).findMatch /usr/local/go/src/compress/flate/deflate.go
     100ms 25.64% 56.41%      310ms 79.49%  compress/flate.(*compressor).deflate /usr/local/go/src/compress/flate/deflate.go
      60ms 15.38% 71.79%       60ms 15.38%  compress/flate.matchLen /usr/local/go/src/compress/flate/deflate.go
      20ms  5.13% 76.92%       20ms  5.13%  compress/flate.(*huffmanBitWriter).indexTokens /usr/local/go/src/compress/flate/huffman_bit_writer.go
      10ms  2.56% 79.49%       10ms  2.56%  compress/flate.(*huffmanBitWriter).writeTokens /usr/local/go/src/compress/flate/huffman_bit_writer.go
      10ms  2.56% 82.05%       10ms  2.56%  hash/adler32.update /usr/local/go/src/hash/adler32/adler32.go
      10ms  2.56% 84.62%       10ms  2.56%  runtime.futex /usr/local/go/src/runtime/sys_linux_amd64.s
      10ms  2.56% 87.18%       10ms  2.56%  runtime.memclrNoHeapPointers /usr/local/go/src/runtime/memclr_amd64.s
      10ms  2.56% 89.74%       10ms  2.56%  runtime.pcvalue /usr/local/go/src/runtime/symtab.go
      10ms  2.56% 92.31%       10ms  2.56%  runtime.runqput /usr/local/go/src/runtime/runtime2.go
(pprof)

The statistics show that the top five entries are all data compression operations: the logger program enables gzip compression at level 9. To reduce the CPU consumed by compression, the compression level can be lowered.
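
As an illustration only (not the logger project's actual code), lowering the level when creating the gzip writer is enough; the compress helper below is hypothetical:

import (
    "bytes"
    "compress/gzip"
)

// compress gzips data with a cheaper compression level.
func compress(data []byte) ([]byte, error) {
    var buf bytes.Buffer
    // gzip.BestSpeed (level 1) uses far less CPU than gzip.BestCompression (level 9)
    // at the cost of a lower compression ratio.
    w, err := gzip.NewWriterLevel(&buf, gzip.BestSpeed)
    if err != nil {
        return nil, err
    }
    if _, err := w.Write(data); err != nil {
        return nil, err
    }
    if err := w.Close(); err != nil {
        return nil, err
    }
    return buf.Bytes(), nil
}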

Pprof MEM Analysis

Pprof also supports memory-related data analysis.

--alloc_space analyzes total allocated (temporary) memory

go tool pprof -alloc_space http://127.0.0.1:4500/debug/pprof/heap
~ # go tool pprof -alloc_space http://127.0.0.1:4500/debug/pprof/heap
Fetching profile over HTTP from http://127.0.0.1:4500/debug/pprof/heap
Saved profile in /home/vagrant/pprof/pprof.logger.alloc_objects.alloc_space.inuse_objects.inuse_space.006.pb.gz
File: logger
Type: alloc_space
Time: Jan 2018 at 2:21pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 47204.90MB, 99.16% of 47606.01MB total
Dropped nodes (cum <= 238.03MB)
Showing top 10 nodes
      flat  flat%   sum%        cum   cum%
28290.79MB 59.43% 59.43% 28290.79MB 59.43%  bytes.makeSlice /usr/local/go/src/bytes/buffer.go
 8706.78MB 18.29% 77.72% 10082.12MB 21.18%  compress/flate.NewWriter /usr/local/go/src/compress/flate/deflate.go
 8559.74MB 17.98% 95.70%  8559.74MB 17.98%  github.com/nsqio/go-diskqueue.(*diskQueue).readOne /home/vagrant/go/src/github.com/nsqio/go-diskqueue/diskqueue.go
 1343.78MB  2.82% 98.52%  1343.78MB  2.82%  compress/flate.(*compressor).init /usr/local/go/src/compress/flate/deflate.go
  298.81MB  0.63% 99.15%   298.81MB  0.63%  github.com/nsqio/go-nsq.ReadResponse /home/vagrant/go/src/github.com/nsqio/go-nsq/protocol.go
       2MB 0.0042% 99.15% 12097.28MB 25.41%  main.(*PostPublisher).Publish /home/vagrant/logger/src/handler.go
    1.50MB 0.0032% 99.15% 26358.53MB 55.37%  io/ioutil.ReadAll /usr/local/go/src/io/ioutil/ioutil.go
       1MB 0.0021% 99.16% 26378.74MB 55.41%  github.com/gin-gonic/gin.LoggerWithWriter.func1 /home/vagrant/go/src/github.com/gin-gonic/gin/logger.go
    0.50MB 0.0011% 99.16% 26434.42MB 55.53%  net/http.(*conn).serve /usr/local/go/src/net/http/server.go
         0     0% 99.16% 26357.03MB 55.36%  bytes.(*Buffer).ReadFrom /usr/local/go/src/bytes/buffer.go
(pprof)

--inuse_space analyzes resident memory

 go tool pprof -inuse_space http://127.0.0.1:4500/debug/pprof/heap
~ # go tool pprof -inuse_space http://127.0.0.1:4500/debug/pprof/heap
Fetching profile over HTTP from http://127.0.0.1:4500/debug/pprof/heap
Saved profile in /home/vagrant/pprof/pprof.logger.alloc_objects.alloc_space.inuse_objects.inuse_space.007.pb.gz
File: logger
Type: inuse_space
Time: Jan 2018 at 2:24pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 20441.23kB, 100% of 20441.23kB total
Showing top 10 nodes
      flat  flat%   sum%        cum   cum%
18071.75kB 88.41% 88.41% 18071.75kB 88.41%  bytes.makeSlice /usr/local/go/src/bytes/buffer.go
  815.27kB  3.99% 92.40%   815.27kB  3.99%  github.com/nsqio/go-diskqueue.(*diskQueue).readOne /home/vagrant/go/src/github.com/nsqio/go-diskqueue/diskqueue.go
  528.17kB  2.58% 94.98%   528.17kB  2.58%  regexp.(*bitState).reset /usr/local/go/src/regexp/backtrack.go
     514kB  2.51% 97.50%      514kB  2.51%  net/http.newBufioWriterSize /usr/local/go/src/bufio/bufio.go
  512.05kB  2.50%   100%   512.05kB  2.50%  net/http.(*persistConn).roundTrip /usr/local/go/src/net/http/transport.go
         0     0%   100%   528.17kB  2.58%  _/home/vagrant/logger/src/parser.ParserLogForMarco /home/vagrant/logger/src/parser/parser.go
         0     0%   100%   528.17kB  2.58%  bytes.(*Buffer).ReadFrom /usr/local/go/src/bytes/buffer.go
         0     0%   100% 17543.58kB 85.82%  bytes.(*Buffer).Write /usr/local/go/src/bytes/buffer.go
         0     0%   100% 17543.58kB 85.82%  bytes.(*Buffer).grow /usr/local/go/src/bytes/buffer.go
         0     0%   100%   528.17kB  2.58%  github.com/gin-gonic/gin.(*Context).Next /home/vagrant/go/src/github.com/gin-gonic/gin/context.go
(pprof)

The top allocations for resident and temporary memory show the program's current memory footprint and the hotspot code for memory use; combined with the source, you can analyze whether the hotspot code has bugs and whether there is room for optimization.
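
Most of the allocations above come from bytes.Buffer growth (bytes.makeSlice reached via io/ioutil.ReadAll and Buffer.Write). One common mitigation, sketched here purely as an illustration and not taken from the logger project, is to reuse buffers through a sync.Pool:

import (
    "bytes"
    "sync"
)

// bufPool reuses bytes.Buffer values so their underlying slices are not
// reallocated on every request.
var bufPool = sync.Pool{
    New: func() interface{} { return new(bytes.Buffer) },
}

func getBuf() *bytes.Buffer {
    return bufPool.Get().(*bytes.Buffer)
}

func putBuf(b *bytes.Buffer) {
    b.Reset() // drop contents but keep capacity
    bufPool.Put(b)
}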

Go-torch

The go tool pprof tool and the top command above let us locate the program's hotspot code, but they lack an overall view of the program's execution. Can we get at-a-glance statistics for the whole call stack, in the style of a flame graph? Thanks to Uber's open-source tool go-torch, profile information can be converted into a flame graph; installation and usage instructions can be found in the project.

After installing go-torch, run

go-torch -u http://127.0.0.1:4500

to generate a CPU flame graph.

At a glance, the compression operation inside the Publish function accounts for about 70% of the CPU.

The gin httpserver only takes up about 2% of the CPU, which is consistent with the results of the go tool pprof top analysis.

By default, go-torch collects the CPU profile; the commands below capture the memory flame graphs.

inuse_space flame graph

go-torch -inuse_space http://127.0.0.1:4500/debug/pprof/heap --colors=mem

alloc_space flame graph

go-torch -alloc_space http://127.0.0.1:4500/debug/pprof/heap --colors=mem

Logger 100% CPU Analysis

The use of go tool pprof and flame graphs has been described above; now the logger 100% CPU problem is reproduced and located with the flame graph.

First, reproduce the problem by load testing logger with wrk:

wrk -t1 -c100 -d30 --script=post.lua 'http://127.0.0.1:4500/marco/log'

View CPU Usage

Capture a 30s CPU profile and generate its flame graph.

The red-marked part of the startSink function in the figure shows runtime.selectgo consuming a lot of CPU, and above runtime.selectgo there are only runtime.sellock and runtime.selunlock. In other words, a large amount of CPU is spent in the select operation itself; the flame graph is flat-topped there, which marks the bottleneck.

View startSink Implementation

for {
    if exit == true {
        return
    }
    if moveForward {
        fakeRead = readChan
    } else {
        fakeRead = nil
    }

    select {
    case read := <-fakeRead:
        count++
        buf.Write(read)
    case <-done:
        diskQueue.Close()
        exit = true
    default:
        // pass
    }

    if count == globalConf.CntBatch || exit == true {
        hostPoolResponse := pool.Get()
        addr := hostPoolResponse.Host()
        err := handler.Publish(fmt.Sprintf("%s%s", addr, sinkConf.uri), buf.Bytes())
        hostPoolResponse.Mark(err)
        if err != nil {
            log.Error("%s", err.Error())
            moveForward = false
            time.Sleep(1 * time.Second)
            continue
        } else {
            moveForward = true
        }
        buf.Reset()
        count = 0
    }
}

The intent was to use moveForward to control whether fakeRead gets a value: when fakeRead is nil the whole select would block, so a default case was added to make the select non-blocking. But since nothing has been read, count does not increase and the sleep is never triggered. The result is a non-blocking select spinning in an idle loop, much like an empty while loop, which consumes a great deal of CPU.

Optimization

This part of the logic was reimplemented another way, which is not posted here; the focus of this article is on sharing the process of finding the problem. The improved flame graph was shown earlier.
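
The actual fix is not shown, but purely as an illustration of one way to avoid such a busy loop, the sketch below replaces the default case with a timeout case so the select blocks instead of spinning (the variable names mirror the snippet above and are assumptions):

for {
    if exit {
        return
    }
    if moveForward {
        fakeRead = readChan
    } else {
        fakeRead = nil
    }

    select {
    case read := <-fakeRead:
        count++
        buf.Write(read)
    case <-done:
        diskQueue.Close()
        exit = true
    case <-time.After(100 * time.Millisecond):
        // Wake up periodically instead of spinning, giving the batch a chance
        // to flush even when no new data arrives.
    }

    // ... batching and publish logic unchanged ...
}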

Summarize

A Golang application that compiles will usually run without obvious problems. When an application shows high CPU or memory consumption, or, as an HTTP server, long response times or lower-than-expected QPS, and code review alone cannot solve it, try using pprof and flame graphs to analyze and locate the problem; they work wonders. Of course, there are many other ways to debug and tune a Golang program, such as integrating with go test and using benchmark test cases to analyze hot code, or using go tool pprof to analyze the generated assembly code.
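
For example, a hot path can be profiled directly from a benchmark; the package, test file, and payload below are hypothetical:

// handler_test.go
package handler

import (
    "bytes"
    "compress/gzip"
    "testing"
)

// BenchmarkGzip measures the cost of gzip-compressing a sample payload.
func BenchmarkGzip(b *testing.B) {
    data := bytes.Repeat([]byte("sample log line\n"), 4096)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        var buf bytes.Buffer
        w := gzip.NewWriter(&buf)
        w.Write(data)
        w.Close()
    }
}

Run it with a CPU profile and inspect the result:

go test -bench=BenchmarkGzip -cpuprofile=cpu.prof
go tool pprof cpu.prof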
