Objective
We recently implemented a log collection and reporting service in Golang (internally called the logger project). During its online grayscale test we found the logger consuming a very high amount of CPU (80%-100%). An earlier version of this project had already been running online consuming NSQ tasks, with CPU usage steady at about 1%. The most recent change simply added a disk-queue-based producer/consumer service: the producer uses go-gin to implement an HTTP server that receives data and writes it to the disk queue, and the consumer is a goroutine loop that POSTs the data onward. The load on the HTTP server was not heavy (only a low QPS), so it should not occupy this much CPU. A rough code review found nothing abnormal, so we used pprof and flame graphs to analyze and locate the problem.
Pprof
My understanding is that pprof stands for program profile. Golang provides two official packages, runtime/pprof and net/http/pprof, which make it easy to collect the program's run-time stack, goroutine, memory allocation and usage, I/O, and other information into a .prof file, which can then be analyzed with go tool pprof. The two packages have the same effect; they only differ in how they are used.
Runtime/pprof
If the program is not an HTTP server, use this approach and embed the following code in the main function:
import "runtime/pprof"var cpuprofile = flag.String("cpuprofile", "", "write cpu profile `file`")var memprofile = flag.String("memprofile", "", "write memory profile to `file`")func main() { flag.Parse() if *cpuprofile != "" { f, err := os.Create(*cpuprofile) if err != nil { log.Fatal("could not create CPU profile: ", err) } if err := pprof.StartCPUProfile(f); err != nil { log.Fatal("could not start CPU profile: ", err) } defer pprof.StopCPUProfile() } // ... rest of the program ... if *memprofile != "" { f, err := os.Create(*memprofile) if err != nil { log.Fatal("could not create memory profile: ", err) } runtime.GC() // get up-to-date statistics if err := pprof.WriteHeapProfile(f); err != nil { log.Fatal("could not write memory profile: ", err) } f.Close() }}
Run the program
./logger -cpuprofile cpu.prof -memprofile mem.prof
This produces cpu.prof and mem.prof files, which can then be analyzed with go tool pprof:
go tool pprof logger cpu.prof
go tool pprof logger mem.prof
Net/http/pprof
If the program is an HTTP server, you only need to import the package:
import _ "net/http/pprof"
If the HTTP server is started with the go-gin package rather than the default http package, you need to manually register the handlers corresponding to /debug/pprof; there is a ready-made wrapper on GitHub:
import "github.com/DeanThompson/ginpprof"...router := gin.Default()ginpprof.Wrap(router)...
After importing the package, recompile and run the program, then visit http://host:port/debug/pprof in a browser, where host and port are the host and port the program is bound to. For example, for my own logger program the address is:
http://127.0.0.1:4500/debug/pprof/
/debug/pprof/

profiles:
0	block
62	goroutine
427	heap
0	mutex
12	threadcreate

full goroutine stack dump
Click a profile to view its details. The data shown in the browser does not directly reflect the program's performance; the go tool pprof command-line tool provides a much richer set of features:
View heap Information
go tool pprof http://127.0.0.1:4500/debug/pprof/heap
View CPU sampling information for 30s
go tool pprof http://127.0.0.1:4500/debug/pprof/profile
See the official net/http/pprof documentation for additional features.
Pprof CPU Analysis
After collecting the profile data, you can analyze the CPU hotspot code. Execute the following command:
go tool pprof http://127.0.0.1:4500/debug/pprof/profile
This collects 30s of profile data and then enters the interactive terminal mode; type the top command.
~ # go tool pprof http://127.0.0.1:4500/debug/pprof/profile
Fetching profile over HTTP from http://127.0.0.1:4500/debug/pprof/profile
Saved profile in /home/vagrant/pprof/pprof.logger.samples.cpu.012.pb.gz
File: logger
Type: cpu
Time: Jan 2018 at 2:01pm (CST)
Duration: 30s, Total samples = 390ms ( 1.30%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 360ms, 92.31% of 390ms total
Showing top 10 nodes out of ...
      flat  flat%   sum%        cum   cum%
     120ms 30.77% 30.77%      180ms 46.15%  compress/flate.(*compressor).findMatch /usr/local/go/src/compress/flate/deflate.go
     100ms 25.64% 56.41%      310ms 79.49%  compress/flate.(*compressor).deflate /usr/local/go/src/compress/flate/deflate.go
      60ms 15.38% 71.79%       60ms 15.38%  compress/flate.matchLen /usr/local/go/src/compress/flate/deflate.go
      20ms  5.13% 76.92%       20ms  5.13%  compress/flate.(*huffmanBitWriter).indexTokens /usr/local/go/src/compress/flate/huffman_bit_writer.go
      10ms  2.56% 79.49%       10ms  2.56%  compress/flate.(*huffmanBitWriter).writeTokens /usr/local/go/src/compress/flate/huffman_bit_writer.go
      10ms  2.56% 82.05%       10ms  2.56%  hash/adler32.update /usr/local/go/src/hash/adler32/adler32.go
      10ms  2.56% 84.62%       10ms  2.56%  runtime.futex /usr/local/go/src/runtime/sys_linux_amd64.s
      10ms  2.56% 87.18%       10ms  2.56%  runtime.memclrNoHeapPointers /usr/local/go/src/runtime/memclr_amd64.s
      10ms  2.56% 89.74%       10ms  2.56%  runtime.pcvalue /usr/local/go/src/runtime/symtab.go
      10ms  2.56% 92.31%       10ms  2.56%  runtime.runqput /usr/local/go/src/runtime/runtime2.go
(pprof)
From these statistics, the top 5 entries are all data-compression operations. The logger program has gzip compression enabled at compression level 9; if you want to reduce the CPU consumed by compression, you can lower the compression level.
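For example, if the payload is compressed with the standard compress/gzip package, the level is chosen when the writer is created. A sketch, not the logger's actual compression code:

import (
	"bytes"
	"compress/gzip"
)

func compress(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	// Level 9 (gzip.BestCompression) maximizes CPU cost; a lower level such as
	// gzip.BestSpeed (1) or gzip.DefaultCompression (-1) trades some compression
	// ratio for much less CPU.
	w, err := gzip.NewWriterLevel(&buf, gzip.BestSpeed)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}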
Pprof MEM Analysis
Pprof also supports analysis of memory-related data.
--alloc_space
Analyze total allocated memory (including temporary allocations that have since been freed):
go tool pprof -alloc_space http://127.0.0.1:4500/debug/pprof/heap
~ # go tool pprof -alloc_space http://127.0.0.1:4500/debug/pprof/heap
Fetching profile over HTTP from http://127.0.0.1:4500/debug/pprof/heap
Saved profile in /home/vagrant/pprof/pprof.logger.alloc_objects.alloc_space.inuse_objects.inuse_space.006.pb.gz
File: logger
Type: alloc_space
Time: Jan 2018 at 2:21pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 47204.90MB, 99.16% of 47606.01MB total
Dropped nodes (cum <= 238.03MB)
Showing top 10 nodes out of ...
      flat  flat%   sum%        cum   cum%
28290.79MB 59.43% 59.43% 28290.79MB 59.43%  bytes.makeSlice /usr/local/go/src/bytes/buffer.go
 8706.78MB 18.29% 77.72% 10082.12MB 21.18%  compress/flate.NewWriter /usr/local/go/src/compress/flate/deflate.go
 8559.74MB 17.98% 95.70%  8559.74MB 17.98%  github.com/nsqio/go-diskqueue.(*diskQueue).readOne /home/vagrant/go/src/github.com/nsqio/go-diskqueue/diskqueue.go
 1343.78MB  2.82% 98.52%  1343.78MB  2.82%  compress/flate.(*compressor).init /usr/local/go/src/compress/flate/deflate.go
  298.81MB  0.63% 99.15%   298.81MB  0.63%  github.com/nsqio/go-nsq.ReadResponse /home/vagrant/go/src/github.com/nsqio/go-nsq/protocol.go
       2MB 0.0042% 99.15% 12097.28MB 25.41%  main.(*PostPublisher).Publish /home/vagrant/logger/src/handler.go
    1.50MB 0.0032% 99.15% 26358.53MB 55.37%  io/ioutil.ReadAll /usr/local/go/src/io/ioutil/ioutil.go
       1MB 0.0021% 99.16% 26378.74MB 55.41%  github.com/gin-gonic/gin.LoggerWithWriter.func1 /home/vagrant/go/src/github.com/gin-gonic/gin/logger.go
    0.50MB 0.0011% 99.16% 26434.42MB 55.53%  net/http.(*conn).serve /usr/local/go/src/net/http/server.go
         0     0% 99.16% 26357.03MB 55.36%  bytes.(*Buffer).ReadFrom /usr/local/go/src/bytes/buffer.go
(pprof)
--inuse_space
Analyze resident (in-use) memory:
go tool pprof -inuse_space http://127.0.0.1:4500/debug/pprof/heap
~ # go tool pprof -inuse_space http://127.0.0.1:4500/debug/pprof/heap
Fetching profile over HTTP from http://127.0.0.1:4500/debug/pprof/heap
Saved profile in /home/vagrant/pprof/pprof.logger.alloc_objects.alloc_space.inuse_objects.inuse_space.007.pb.gz
File: logger
Type: inuse_space
Time: Jan 2018 at 2:24pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 20441.23kB, 100% of 20441.23kB total
Showing top 10 nodes out of ...
      flat  flat%   sum%        cum   cum%
18071.75kB 88.41% 88.41% 18071.75kB 88.41%  bytes.makeSlice /usr/local/go/src/bytes/buffer.go
  815.27kB  3.99% 92.40%   815.27kB  3.99%  github.com/nsqio/go-diskqueue.(*diskQueue).readOne /home/vagrant/go/src/github.com/nsqio/go-diskqueue/diskqueue.go
  528.17kB  2.58% 94.98%   528.17kB  2.58%  regexp.(*bitState).reset /usr/local/go/src/regexp/backtrack.go
     514kB  2.51% 97.50%      514kB  2.51%  net/http.newBufioWriterSize /usr/local/go/src/bufio/bufio.go
  512.05kB  2.50%   100%   512.05kB  2.50%  net/http.(*persistConn).roundTrip /usr/local/go/src/net/http/transport.go
         0     0%   100%   528.17kB  2.58%  _/home/vagrant/logger/src/parser.ParserLogForMarco /home/vagrant/logger/src/parser/parser.go
         0     0%   100%   528.17kB  2.58%  bytes.(*Buffer).ReadFrom /usr/local/go/src/bytes/buffer.go
         0     0%   100% 17543.58kB 85.82%  bytes.(*Buffer).Write /usr/local/go/src/bytes/buffer.go
         0     0%   100% 17543.58kB 85.82%  bytes.(*Buffer).grow /usr/local/go/src/bytes/buffer.go
         0     0%   100%   528.17kB  2.58%  github.com/gin-gonic/gin.(*Context).Next /home/vagrant/go/src/github.com/gin-gonic/gin/context.go
(pprof)
Using the top output for both resident and allocated memory, you can see the program's current memory footprint and the code responsible for the memory hotspots, then combine this with the source to check whether the hotspot code has bugs and whether there is room for optimization.
Go-torch
The go tool pprof top command above lets us locate the program's hotspot code, but it lacks an overall picture of how the program runs. Can we get an at-a-glance view of the statistics for the whole stack, like a flame graph? Thanks to Uber's open-source tool go-torch, we can convert the profile data into a flame graph; installation and usage instructions can be found in the project.
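For reference, installation roughly amounts to the following (per the go-torch README; go-torch also needs Brendan Gregg's FlameGraph scripts on the PATH, and the clone path below is only an example):

go get github.com/uber/go-torch
git clone https://github.com/brendangregg/FlameGraph.git
export PATH=$PATH:/path/to/FlameGraph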
After installing go-torch, run
go-torch -u http://127.0.0.1:4500
to generate a CPU flame graph.
At a glance, the compression operation inside the Publish function accounts for about 70% of the CPU, while the gin HTTP server takes only about 2%, which is consistent with the results of the go tool pprof top analysis.
By default, go-torch collects the CPU profile; the following commands capture memory flame graphs instead.
inuse_space
flame graph
go-torch -inuse_space http://127.0.0.1:4500/debug/pprof/heap --colors=mem
alloc_space
flame graph
go-torch -alloc_space http://127.0.0.1:4500/debug/pprof/heap --colors=mem
Logger 100% CPU Analysis
The usage of go tool pprof and flame graphs was described above; now let's reproduce the logger 100% CPU problem and analyze it with a flame graph.
First, look at the symptom: load-test the logger with wrk
wrk -t1 -c100 -d30 --script=post.lua 'http://127.0.0.1:4500/marco/log'
View the CPU usage.
Collect a 30s CPU profile and generate the flame graph.
The part marked in red in the figure shows that runtime.selectgo, called from the startSink function, consumes a large amount of CPU, and above runtime.selectgo there are only the runtime.sellock and runtime.selunlock operations. In other words, a large amount of CPU is spent in the select operation itself; the flame graph is flat-topped there, and that is the bottleneck.
Looking at the startSink implementation:
for {
	if exit == true {
		return
	}
	if moveForward {
		fakeRead = readChan
	} else {
		fakeRead = nil
	}

	select {
	case read := <-fakeRead:
		count++
		buf.Write(read)
	case <-done:
		diskQueue.Close()
		exit = true
	default:
		// pass
	}

	if count == globalConf.CntBatch || exit == true {
		hostPoolResponse := pool.Get()
		addr := hostPoolResponse.Host()
		err := handler.Publish(fmt.Sprintf("%s%s", addr, sinkConf.URI), buf.Bytes())
		hostPoolResponse.Mark(err)
		if err != nil {
			log.Error("%s", err.Error())
			moveForward = false
			time.Sleep(1 * time.Second)
			continue
		} else {
			moveForward = true
		}
		buf.Reset()
		count = 0
	}
}
The intent was to use moveForward to control whether fakeRead gets assigned: when fakeRead is nil, the whole select blocks, so a default case was added to make the select non-blocking. But when no data is being read, count never increases, so the batch-publish branch (and its error-path sleep) is never triggered. The result is a non-blocking select spinning in an idle loop, much like an empty while loop, which consumes a great deal of CPU.
Optimization
This part of the logic was reimplemented differently; the new code is not posted here, since the point of this article is to share the process of finding the problem. The flame graph after the fix was shown earlier.
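For illustration only (this is not the author's actual rewrite), one minimal change that removes the spin is to replace the default case with a timeout case, so the select always blocks when there is nothing to do but still wakes up periodically to run the batch-publish and retry logic below it:

	select {
	case read := <-fakeRead:
		count++
		buf.Write(read)
	case <-done:
		diskQueue.Close()
		exit = true
	case <-time.After(1 * time.Second):
		// Wake up at most once per second so the loop can fall through to the
		// publish/retry code without spinning when fakeRead is nil or readChan is idle.
	}

Note that time.After allocates a new timer on every iteration; in a busy loop a single time.Ticker is cheaper, but the idea is the same.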
Summary
A Golang application that compiles will usually run without obvious problems, but when it shows high CPU or memory consumption, or, as an HTTP server, long response times or QPS that will not go up, and a code review cannot explain it, try using pprof and flame graphs to analyze and locate the problem; it can work wonders. Of course, there are many other ways to debug and tune Golang programs, for example integrating directly with go test and using benchmark test cases to analyze hot code, or using go tool pprof to analyze the generated assembly.
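As an example of the go test route, a benchmark such as the following hypothetical one (placed in a _test.go file next to the compress sketch shown earlier) can be profiled directly:

import (
	"bytes"
	"testing"
)

// BenchmarkCompress exercises the (hypothetical) compress helper so that its
// CPU and memory cost show up in the generated profiles.
func BenchmarkCompress(b *testing.B) {
	data := bytes.Repeat([]byte("some log line "), 1024)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, err := compress(data); err != nil {
			b.Fatal(err)
		}
	}
}

Run it with, for example:
go test -bench=BenchmarkCompress -cpuprofile=cpu.prof -memprofile=mem.prof
go tool pprof cpu.prof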