Golang Memory analysis/Dynamic tracking

Golang pprof

When a Golang program uses more memory at runtime than you can account for, you need to figure out which code is responsible for the consumption. The compiled program is a black box, so how do you understand its memory use? Fortunately, Golang has some built-in mechanisms to help us analyze and track it.

In this situation, we can usually use Golang's pprof to analyze the memory usage of the Golang process.

A pprof example

We usually expose pprof information over an HTTP API for analysis, using the package net/http/pprof. The following is a simple example:

// The init function of net/http/pprof registers its handlers on http.DefaultServeMux.
// If you do not use http.DefaultServeMux to serve your HTTP API, look at that init
// function and register the handlers yourself.
import (
	"net/http"
	_ "net/http/pprof"
)

// somewhere in main():
go func() {
	http.ListenAndServe("0.0.0.0:8080", nil)
}()

Now we can start the process and visit http://localhost:8080/debug/pprof/ to see a simple page. Note: all of the data below, including the data collected by go tool pprof, depends on the pprof sampling rate inside the process, which defaults to one sample per 512KB allocated. When the data is not detailed enough, the sampling rate can be adjusted via runtime.MemProfileRate, but the more frequently we sample, the slower the process runs. The page shows:

/debug/pprof/

profiles:
0         block
136840    goroutine
902       heap
0         mutex
40        threadcreate

full goroutine stack dump

The page simply lists several built-in profiles and their counts. For example, there are 136,840 goroutines running; click the corresponding link to see the details.
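As mentioned in the note above, the sampling rate is controlled by runtime.MemProfileRate. Below is a minimal sketch of adjusting it (the 16KB value is only illustrative); it must be set before any significant allocation happens, so an init function is a convenient place:

package main

import "runtime"

// Lower runtime.MemProfileRate to sample more often (the default is 512 * 1024,
// i.e. one sample per 512KB allocated). More frequent sampling gives more
// detailed heap profiles but slows the process down.
func init() {
	runtime.MemProfileRate = 16 * 1024 // illustrative value
}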

When analyzing memory-related issues, click the heap item, i.e. visit http://127.0.0.1:8080/debug/pprof/heap?debug=1, and you will see output like this:

heap profile: 3190: 77516056 [54762: 612664248] @ heap/1048576
1: 29081600 [1: 29081600] @ 0x89368e 0x894cd9 0x8a5a9d 0x8a9b7c 0x8af578 0x8b4441 0x8b4c6d 0x8b8504 0x8b2bc3 0x45b1c1
#	0x89368d	github.com/syndtr/goleveldb/leveldb/memdb.(*DB).Put+0x59d
#	0x894cd8	xxxxx/storage/internal/memtable.(*memtable).Set+0x88
#	0x8a5a9c	xxxxx/storage.(*snapshotter).AppendCommitLog+0x1cc
#	0x8a9b7b	xxxxx/storage.(*store).Update+0x26b
#	0x8af577	xxxxx/config.(*config).Update+0xa7
#	0x8b4440	xxxxx/naming.(*naming).update+0x120
#	0x8b4c6c	xxxxx/naming.(*naming).instanceTimeout+0x27c
#	0x8b8503	xxxxx/naming.(*naming).(xxxxx/naming.instanceTimeout)-fm+0x63
......

# runtime.MemStats
# Alloc = 2463648064
# TotalAlloc = 31707239480
# Sys = 4831318840
# Lookups = 2690464
# Mallocs = 274619648
# Frees = 262711312
# HeapAlloc = 2463648064
# HeapSys = 3877830656
# HeapIdle = 854990848
# HeapInuse = 3022839808
# HeapReleased = 0
# HeapObjects = 11908336
# Stack = 655949824 / 655949824
# MSpan = 63329432 / 72040448
# MCache = 38400 / 49152
# BuckHashSys = 1706593
# GCSys = 170819584
# OtherSys = 52922583
# NextGC = 3570699312
# PauseNs = [1052815 217503 208124 233034 1146462 456882 1098525 530706 551702 419372 768322 596273 387826 455807 563621 587849 416204 599143 572823 488681 701731 656358 2476770 12141392 5827253 3508261 1715582 1295487 908563 788435 718700 0 0 ... 0]
# NumGC = 31
# DebugGC = false

There is a lot of output, but it is divided into two main parts. The first part consists of the runtime.MemProfileRecord records obtained via runtime.MemProfile(). Their meaning is:

heap profile: 3190 (inuse objects): 77516056 (inuse bytes) [54762 (alloc objects): 612664248 (alloc bytes)] @ heap/1048576 (2*MemProfileRate)
1: 29081600 [1: 29081600] (the first four numbers mean the same as in the first line, but from here on each line is a single record; the addresses that follow are the stack pointers of that record) @ 0x89368e 0x894cd9 0x8a5a9d 0x8a9b7c 0x8af578 0x8b4441 0x8b4c6d 0x8b8504 0x8b2bc3 0x45b1c1
#    0x89368d    github.com/syndtr/goleveldb/leveldb/memdb.(*DB).Put+0x59d    (stack frame information)

The second part is easier to understand: it is the runtime.MemStats information read via runtime.ReadMemStats(). The fields to focus on are listed below (a small sketch after the list shows how to read them in code):

    • Sys: the memory (virtual address space) the process has obtained from the system.
    • HeapAlloc: the amount of heap memory currently allocated by the process, usually the heap objects created by the user with new, including those not yet collected by the GC.
    • HeapSys: the heap memory the process has obtained from the system; because the Golang runtime uses a tcmalloc-style mechanism that caches part of the heap memory, this is also virtual address space.
    • PauseNs: the duration of each GC pause in nanoseconds, keeping at most the 256 most recent records.
    • NumGC: the number of GCs that have occurred.
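The same fields can also be read in code via runtime.ReadMemStats(), for example for periodic logging; a small sketch:

package main

import (
	"fmt"
	"runtime"
)

// Print the MemStats fields discussed above. PauseNs is a circular buffer;
// the most recent pause is at index (NumGC+255)%256.
func logMemStats() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("Sys=%d HeapAlloc=%d HeapSys=%d NumGC=%d LastPauseNs=%d\n",
		m.Sys, m.HeapAlloc, m.HeapSys, m.NumGC, m.PauseNs[(m.NumGC+255)%256])
}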

For users who are not familiar with pprof, it is hard to extract much useful information from the output above, so we need additional tools to interpret the pprof data more easily.

Go tool

We can run go tool pprof -inuse_space http://127.0.0.1:8080/debug/pprof/heap to connect to the process and look at the memory currently in use; this gives us an interactive command line.

Typing top10 shows the 10 functions holding the most in-use memory. This is typically used to spot memory being retained by object references that you did not expect.

(pprof) top10
1355.47MB of 1436.26MB total (94.38%)
Dropped 371 nodes (cum <= 7.18MB)
Showing top 10 nodes out of ... (cum >= 23.50MB)
      flat  flat%   sum%        cum   cum%
  512.96MB 35.71% 35.71%   512.96MB 35.71%  net/http.newBufioWriterSize
  503.93MB 35.09% 70.80%   503.93MB 35.09%  net/http.newBufioReader
  113.04MB  7.87% 78.67%   113.04MB  7.87%  runtime.rawstringtmp
   55.02MB  3.83% 82.50%    55.02MB  3.83%  runtime.malg
   45.01MB  3.13% 85.64%    45.01MB  3.13%  xxxxx/storage.(*node).Clone
   26.50MB  1.85% 87.48%    52.50MB  3.66%  context.WithCancel
   25.50MB  1.78% 89.26%    83.58MB  5.82%  runtime.systemstack
   25.01MB  1.74% 91.00%    58.51MB  4.07%  net/http.readRequest
      25MB  1.74% 92.74%    29.03MB  2.02%  runtime.mapassign
   23.50MB  1.64% 94.38%    23.50MB  1.64%  net/http.(*Server).newConn

Next we connect with go tool pprof -alloc_space http://127.0.0.1:8080/debug/pprof/heap to look at memory allocations instead. Entering top shows the function calls that have cumulatively allocated the most memory:

(pprof) top
523.38GB of 650.90GB total (80.41%)
Dropped 342 nodes (cum <= 3.25GB)
Showing top 10 nodes out of 106 (cum >= 28.02GB)
      flat  flat%   sum%        cum   cum%
  147.59GB 22.68% 22.68%   147.59GB 22.68%  runtime.rawstringtmp
  129.23GB 19.85% 42.53%   129.24GB 19.86%  runtime.mapassign
   48.23GB  7.41% 49.94%    48.23GB  7.41%  bytes.makeSlice
   46.25GB  7.11% 57.05%    71.06GB 10.92%  encoding/json.Unmarshal
   31.41GB  4.83% 61.87%   113.86GB 17.49%  net/http.readRequest
   30.55GB  4.69% 66.57%   171.20GB 26.30%  net/http.(*conn).readRequest
   22.95GB  3.53% 70.09%    22.95GB  3.53%  net/url.parse
   22.70GB  3.49% 73.58%    22.70GB  3.49%  runtime.stringtoslicebyte
   22.70GB  3.49% 77.07%    22.70GB  3.49%  runtime.makemap
   21.75GB  3.34% 80.41%    28.02GB  4.31%  context.WithCancel

We can see that string/[]byte conversions, map assignments, bytes.makeSlice, encoding/json.Unmarshal and similar calls have cumulatively allocated the most memory. We can now review the code and look for ways to reduce these calls or optimize the related logic.

When we do not know which of our own functions cause these calls, we can enter top -cum to find out. -cum accumulates the data along the call chain: for example, if function B is called by function A, B's allocations are also accounted to A, which makes the call chain easy to trace.

(pprof) top20 -cum
322890.40MB of 666518.53MB total (48.44%)
Dropped 342 nodes (cum <= 3332.59MB)
Showing top 20 nodes out of 106 (cum >= 122316.23MB)
        flat  flat%   sum%          cum   cum%
           0     0%     0%  643525.16MB 96.55%  runtime.goexit
   2184.63MB  0.33%  0.33%  620745.26MB 93.13%  net/http.(*conn).serve
           0     0%  0.33%  435300.50MB 65.31%  xxxxx/api/server.(*HTTPServer).ServeHTTP
   5865.22MB  0.88%  1.21%  435300.50MB 65.31%  xxxxx/api/server/router.(*httpRouter).ServeHTTP
           0     0%  1.21%  433121.39MB 64.98%  net/http.serverHandler.ServeHTTP
           0     0%  1.21%  430456.29MB 64.58%  xxxxx/api/server/filter.(*chain).Next
     43.50MB 0.0065%  1.21%  429469.71MB 64.43%  xxxxx/api/server/filter.TransURLTov1
           0     0%  1.21%  346440.39MB 51.98%  xxxxx/api/server/filter.Role30x
  31283.56MB  4.69%  5.91%  175309.48MB 26.30%  net/http.(*conn).readRequest
           0     0%  5.91%  153589.85MB 23.04%  github.com/julienschmidt/httprouter.(*Router).ServeHTTP
           0     0%  5.91%  153589.85MB 23.04%  github.com/julienschmidt/httprouter.(*Router).ServeHTTP-fm
           0     0%  5.91%  153540.85MB 23.04%  xxxxx/api/server/router.(*httpRouter).Register.func1
         2MB 0.0003%  5.91%  153117.78MB 22.97%  xxxxx/api/server/filter.Validate
 151134.52MB 22.68% 28.58%  151135.02MB 22.68%  runtime.rawstringtmp
           0     0% 28.58%  150714.90MB 22.61%  xxxxx/api/server/router/naming/v1.(*serviceRouter).(git.intra.weibo.com/platform/vintage/api/server/router/naming/v1.service)-fm
           0     0% 28.58%  150714.90MB 22.61%  xxxxx/api/server/router/naming/v1.(*serviceRouter).service
           0     0% 28.58%  141200.76MB 21.18%  net/http.Redirect
 132334.96MB 19.85% 48.44%  132342.95MB 19.86%  runtime.mapassign
        42MB 0.0063% 48.44%  125834.16MB 18.88%  xxxxx/api/server/router/naming/v1.heartbeat
           0     0% 48.44%  122316.23MB 18.35%  xxxxxx/config.(*config).Lookup

As shown above, we can easily find the chain of functions through which these allocations are made.

From the calling relationship in the code, filter.TransURLTov1 calls filter.Role30x, and the difference between their cum% values is 12.45%. So we can tell that the code inside filter.TransURLTov1 itself accounts for 12.45% of the total memory allocated by the whole process, which makes it worth optimizing.

We can also enter the web command, which opens an .svg picture in the browser, drawing these cumulative relationships as a call graph. Alternatively, run go tool pprof -alloc_space -cum -svg http://127.0.0.1:8080/debug/pprof/heap > heap.svg directly to generate the heap.svg picture.

Let's take a fragment of such a graph and analyze it:

Each block is a function in the recorded call stacks. The number on the arrow pointing into a block is the memory cumulatively allocated through that call stack, and the numbers on the arrows leaving the block are the amounts allocated by the functions it calls. The difference between them can simply be understood as the memory the function allocates itself, in addition to what its callees allocate. The numbers inside the block reflect the same thing and read: (memory allocated by the function itself) of (memory cumulatively allocated by the function).

The difference between --inuse/alloc_space and --inuse/alloc_objects

Typically:

    • Use --inuse_space to analyze the program's resident memory usage;
    • Use --alloc_objects to analyze temporary memory allocations; reducing them can speed up the program.

Go-torch

Besides using go tool pprof directly, we can also use flame graphs, which are more intuitive. go-torch can generate flame graphs for a Golang program directly; it also relies on pprof/go tool pprof underneath. For installation, see the project's introduction. Note that memory profiling is supported starting from version a4daa2b.

We can use

go-torch -alloc_space http://127.0.0.1:8080/debug/pprof/heap --colors=mem
go-torch -inuse_space http://127.0.0.1:8080/debug/pprof/heap --colors=mem

Note: the -alloc_space/-inuse_space options conflict with options such as -u/-b. When using -alloc_space/-inuse_space, append the pprof resource URL directly after the option instead of specifying it with -u/-b; this is due to how go-torch parses its arguments, which becomes clear after reading its source code. Also note that the URL for memory analysis must end with heap, because the default path is profile, which is used for CPU-related analysis.

With the two commands above we get two flame graphs, one for alloc_space and one for inuse_space, e.g. alloc_space.svg and inuse_space.svg. Viewing them in a browser, they look like the cross-section of a mountain range: the bottom is the call stack of each function, so the height of a peak corresponds to the call depth, and the width of a peak is proportional to the amount of memory used or allocated. We only need to watch for wide, flat peaks, which are usually the places worth optimizing.

Testing

We can use a similar approach when we need to profile tests or benchmarks run with go test. For example, we can use the built-in go test flags to generate pprof data and then analyze it with go tool pprof / go-torch (a self-contained example benchmark is sketched after the steps below).

    1. Generate Pprof files for CPU, mem
      go test -bench=BenchmarkStorageXXX -cpuprofile cpu.out -memprofile mem.out
    2. A binary file and 2 pprof data files are generated, such as
      storage.test cpu.out mem.out
    3. Then use go-torch to parse them, putting the binary file first
      # analyze CPU
      go-torch storage.test cpu.out
      # analyze memory
      go-torch --colors=mem -alloc_space storage.test mem.out
      go-torch --colors=mem -inuse_space storage.test mem.out
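For reference, here is a self-contained benchmark in the same spirit (the article's actual BenchmarkStorageXXX is not shown, so this stand-in simply exercises encoding/json); the flags above work the same way for any benchmark:

package storage

import (
	"encoding/json"
	"testing"
)

// A stand-in benchmark: run it with
//   go test -bench=BenchmarkStorageEncode -cpuprofile cpu.out -memprofile mem.out
func BenchmarkStorageEncode(b *testing.B) {
	rec := map[string]string{"key": "value"}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if _, err := json.Marshal(rec); err != nil {
			b.Fatal(err)
		}
	}
}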

Optimization recommendations

"Debugging performance issues in Go programs" offers some common optimization recommendations:

1 Merge multiple small objects into one large object

2 Reduce unnecessary levels of pointer indirection; prefer using values (copies) directly

For example, use bytes.Buffer instead of *bytes.Buffer, because with the pointer two objects must be allocated to complete the reference, as illustrated below.
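A minimal sketch of the difference (the type names are made up for illustration):

import "bytes"

// Two allocations: the writer struct and the bytes.Buffer it points to.
type writerWithPointer struct {
	buf *bytes.Buffer
}

// One allocation: the buffer is embedded by value inside the struct.
type writerWithValue struct {
	buf bytes.Buffer
}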

3 When local variables escape to the heap, aggregate them

The idea is the same as point 1: the core is to reduce the number of allocated objects and thus the GC pressure. For example, the following code

for k, v := range m {
	k, v := k, v // copy for capturing by the goroutine
	go func() {
		// use k and v
	}()
}

Can be modified to:

for k, v := range m {
	x := struct{ k, v string }{k, v} // copy for capturing by the goroutine
	go func() {
		// use x.k and x.v
	}()
}

After the change, the escaping object becomes x, reducing the two escaping objects k and v to one.

4 Pre-allocation of []byte

When we know exactly how many bytes a []byte will use, we can back it with an array in the struct to pre-allocate the memory. For example:

type X struct {
	buf      []byte
	bufArray [16]byte // buf usually does not grow beyond 16 bytes
}

func MakeX() *X {
	x := &X{}
	// Preinitialize buf with the backing array.
	x.buf = x.bufArray[:0]
	return x
}

5 Use smaller numeric types whenever possible

When a constant or counter field does not need a wide numeric range, we can usually declare it as a smaller type such as int8.
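A small sketch, with made-up type and field names, of choosing narrower types for bounded values:

// Status values fit easily in one byte, so uint8 is enough
// instead of the platform-sized int.
type Status uint8

const (
	StatusPending Status = iota
	StatusRunning
	StatusDone
)

type Task struct {
	Retries uint8 // small, bounded counter
	State   Status
}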

6 Reduce unnecessary pointers inside objects

When an object contains no pointers (note: strings, slices, maps and chans contain hidden pointers), it adds little to the GC's scanning work. For example, a 1GB []byte slice effectively contains only a handful of objects and does not affect garbage collection time. Therefore, we should remove pointers from objects wherever we can, as in the sketch below.
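A sketch of the idea (the types are made up for illustration): the GC must follow every pointer in the first layout, while the second keeps the element contents pointer-free:

// Every element carries pointers that the GC has to scan.
type recordWithPointers struct {
	name  *string
	value *int64
}

// The element contents contain no pointers, so a large slice of these
// adds almost nothing to GC scan time (the slice header itself still
// holds one pointer to the backing array).
type recordFlat struct {
	name  [32]byte
	value int64
}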

7 Use sync.Pool to cache commonly used objects
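A minimal sync.Pool sketch (not from the original text): reuse bytes.Buffer objects instead of allocating a new one per request:

package main

import (
	"bytes"
	"sync"
)

var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

func handle(payload []byte) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // clear anything left over from the previous user
	defer bufPool.Put(buf)

	buf.Write(payload)
	// ... use buf ...
}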

Attention

Go versions from go1.9 to go1.9.2 ship a go tool pprof with a bug that causes the memory analysis commands above to fail. Here is a fix:

cd $GOROOT/src/cmd/vendor/github.com/google
rm -r pprof
git clone https://github.com/google/pprof.git   # make sure it is at or after commit `e82ee9addc1b6c8e1d667ed6de0194241e1e03b5`
rm -r $GOROOT/pkg/darwin_amd64/cmd/vendor/github.com/google/pprof
cd $GOROOT/src/cmd/pprof
go build
mv pprof $GOROOT/pkg/tool/darwin_amd64/pprof