tags (space delimited): Go Memory Profiler Performance Tuning performance analysis
Note: The author of the original article is Dmitry Vyukov; the original is titled "Debugging performance issues in Go programs".
This is the Memory Profiler section of the original text.
The memory profiler shows which functions allocate heap memory. You can collect it in ways similar to the CPU profile: using go test --memprofile, using net/http/pprof via http://myserver:6060/debug/pprof/heap, or by calling runtime/pprof.WriteHeapProfile.
You can display either only the allocations live at the time of profile collection (the default, pprof --inuse_space flag), or all allocations since program start (pprof --alloc_space flag). The former is useful for profiles collected with net/http/pprof from live applications; the latter is useful for profiles collected at program end (otherwise you will see an almost empty profile).
Note: the memory profiler is sampling; that is, it collects information about only a subset of memory allocations. The probability of sampling an object is proportional to its size. You can change the sampling rate with the go test --memprofilerate flag, or by setting the runtime.MemProfileRate variable at program startup. A rate of 1 causes collection of information about all allocations, but it can slow down execution. The default sampling rate is 1 sample per 512KB of allocated memory.
You can also display either the number of allocated bytes or the number of allocated objects (the --inuse/alloc_space and --inuse/alloc_objects flags, respectively). The profiler tends to sample larger objects more often during profiling. But it is important to understand that large objects affect memory consumption and GC time, while a large number of small allocations affects execution speed (and GC time to some degree as well), so it can be useful to look at both.
Objects can be persistent or transient. If you have several large persistent objects allocated at program start, they will most likely be sampled by the profiler (because they are large). Such objects affect memory consumption and GC time, but they do not affect normal execution speed (no memory-management operations happen on them). On the other hand, if you have lots of objects with very short lifetimes, they can be barely represented in the profile (if you use the default --inuse_space mode), but they do significantly affect execution speed, because they are constantly allocated and freed. So, once again, it can be useful to look at both kinds of objects.
Therefore, generally, if you want to reduce memory consumption, look at the --inuse_space profile collected during normal program operation. If you want to improve execution speed, look at the --alloc_objects profile collected after significant running time or at program end.
There are several flags that control report granularity. --functions makes pprof report at the function level (the default). --lines makes pprof report at the source-line level, which is useful if hot functions allocate on different lines. There are also --addresses and --files, for reporting at the exact-instruction-address and file levels, respectively.
There is one useful option for memory profiles: you can look at them right in a browser (this requires that you imported net/http/pprof). If you open http://myserver:6060/debug/pprof/heap?debug=1, you should see a heap profile similar to:
heap profile: 4: 266528 [123: 11284472] @ heap/1048576
1: 262144 [4: 376832] @ 0x28d9f 0x2a201 0x2a28a 0x2624d 0x26188 0x94ca3 0x94a0b 0x17add6 0x17ae9f 0x1069d3 0xfe911 0xf0a3e 0xf0d22 0x21a70
#	0x2a201	cnew+0xc1	runtime/malloc.goc:718
#	0x2a28a	runtime.cnewarray+0x3a	runtime/malloc.goc:731
#	0x2624d	makeslice1+0x4d	runtime/slice.c:57
#	0x26188	runtime.makeslice+0x98	runtime/slice.c:38
#	0x94ca3	bytes.makeSlice+0x63	bytes/buffer.go:191
#	0x94a0b	bytes.(*Buffer).ReadFrom+0xcb	bytes/buffer.go:163
#	0x17add6	io/ioutil.readAll+0x156	io/ioutil/ioutil.go:32
#	0x17ae9f	io/ioutil.ReadAll+0x3f	io/ioutil/ioutil.go:41
#	0x1069d3	godoc/vfs.ReadFile+0x133	godoc/vfs/vfs.go:44
#	0xfe911	godoc.func·023+0x471	godoc/meta.go:80
#	0xf0a3e	godoc.(*Corpus).updateMetadata+0x9e	godoc/meta.go:101
#	0xf0d22	godoc.(*Corpus).refreshMetadataLoop+0x42	godoc/meta.go:141

2: 4096 [2: 4096] @ 0x28d9f 0x29059 0x1d252 0x1d450 0x106993 0xf1225 0xe1489 0xfbcad 0x21a70
#	0x1d252	newdefer+0x112	runtime/panic.c:49
#	0x1d450	runtime.deferproc+0x10	runtime/panic.c:132
#	0x106993	godoc/vfs.ReadFile+0xf3	godoc/vfs/vfs.go:43
#	0xf1225	godoc.(*Corpus).parseFile+0x75	godoc/parser.go:20
#	0xe1489	godoc.(*treeBuilder).newDirTree+0x8e9	godoc/dirtrees.go:108
#	0xfbcad	godoc.func·002+0x15d	godoc/dirtrees.go:100
The numbers at the beginning of each entry ("1: 262144 [4: 376832]") represent, respectively: the number of currently live objects, the amount of memory occupied by live objects, the total number of allocations, and the total amount of memory occupied by all allocations (including those already freed).
Optimizations are typically application-specific, but here are some common recommendations.
- Combine objects into larger objects. For example, replace *bytes.Buffer struct members with bytes.Buffer (you can preallocate the buffer later by calling bytes.Buffer.Grow). This reduces the number of memory allocations (faster) and reduces pressure on the garbage collector (faster garbage collections).
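A minimal sketch of this advice (the Writer type name here is hypothetical, chosen only for illustration):

```go
package main

import (
	"bytes"
	"fmt"
)

// Before: the buffer is a separate heap allocation behind a pointer.
type WriterWithPointer struct {
	buf *bytes.Buffer
}

// After: the buffer is embedded by value, so creating the struct is
// a single allocation and the GC has one fewer pointer to trace.
type Writer struct {
	buf bytes.Buffer
}

func main() {
	var w Writer
	// Grow preallocates the internal storage up front.
	w.buf.Grow(1024)
	w.buf.WriteString("hello")
	fmt.Println(w.buf.String())
}
```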
- Local variables that escape their declaration scope are promoted into heap allocations. The compiler generally cannot prove that several such variables have the same lifetime, so it allocates each of them separately. So you can apply the above advice to local variables as well. For example, replace:
for k, v := range m {
	k, v := k, v // copy for capturing by the goroutine
	go func() {
		// use k and v
	}()
}
with:
for k, v := range m {
	x := struct{ k, v string }{k, v} // copy for capturing by the goroutine
	go func() {
		// use x.k and x.v
	}()
}
This replaces two memory allocations with a single one. However, this optimization can hurt code readability, so use it reasonably.
A special case of allocation combining is slice array preallocation. If you know the typical size of a slice, you can preallocate a backing array for it as follows:
type X struct {
	buf      []byte
	bufArray [16]byte // Buf usually does not grow beyond 16 bytes.
}

func MakeX() *X {
	x := &X{}
	// Preinitialize buf with the backing array.
	x.buf = x.bufArray[:0]
	return x
}
- If possible, use smaller data types. For example, use int8 instead of int.
- Objects that do not contain any pointers (note that strings, slices, maps, and chans contain implicit pointers) are not scanned by the garbage collector. For example, a 1GB byte slice has virtually no effect on garbage collection time. So if you remove pointers from actively used objects, it can positively affect garbage collection time. Some possibilities: replace pointers with indices, or split the object into two parts, one of which contains no pointers.
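The indices-instead-of-pointers idea can be sketched as follows; the Node type and childSum function are hypothetical names invented for this example:

```go
package main

import "fmt"

// Pointer-free version of a tree: nodes live in one flat slice and
// refer to each other by index, so the garbage collector has no
// pointers to trace inside the node data.
type Node struct {
	value    int
	firstKid int32 // index of the first child in the slice, -1 if none
	nextSib  int32 // index of the next sibling, -1 if none
}

// childSum walks the direct children of nodes[root] via indices.
func childSum(nodes []Node, root int32) int {
	sum := 0
	for i := nodes[root].firstKid; i != -1; i = nodes[i].nextSib {
		sum += nodes[i].value
	}
	return sum
}

func main() {
	nodes := []Node{
		{value: 1, firstKid: 1, nextSib: -1}, // root
		{value: 2, firstKid: -1, nextSib: 2},
		{value: 3, firstKid: -1, nextSib: -1},
	}
	fmt.Println(childSum(nodes, 0)) // 2 + 3
}
```

A pointer-based version (children []*Node) would force the collector to trace every child pointer; here the int32 indices are not scanned at all.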
- Use freelists to reuse transient objects and reduce the number of allocations. The standard library contains the sync.Pool type, which allows the same object to be reused several times in between garbage collections. However, be aware that, as with any manual memory-management scheme, incorrect use of sync.Pool can lead to use-after-free bugs.
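A minimal sync.Pool sketch (the process function is a hypothetical example; note that buf.String() copies the data, so nothing escapes before Put):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// A pool of reusable buffers; New is called when the pool is empty.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

func process(s string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold old data
	buf.WriteString(s)
	out := buf.String() // copies the bytes out of the buffer
	// Return the buffer to the pool only when no references to its
	// contents escape; otherwise you risk use-after-free-style bugs.
	bufPool.Put(buf)
	return out
}

func main() {
	fmt.Println(process("hello"))
}
```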
You can use the garbage collector trace (see below) to get some more insight into memory issues.