Language Mechanics on Memory Profiling

## Prelude

This is the third post in a four-part series on the mechanics and design behind pointers, stacks, heaps, escape analysis, and value/pointer semantics in Go. This post focuses on heaps and escape analysis. (Translator's note: this post can be read as an advanced follow-up to the second one.)

Index of the four-part series:

1. [Language Mechanics On Stacks And Pointers](https://studygolang.com/articles/12443)
2. [Language Mechanics On Escape Analysis](https://studygolang.com/articles/12444)
3. [Language Mechanics On Memory Profiling](https://studygolang.com/articles/12445)
4. [Design Philosophy On Data And Semantics](https://studygolang.com/articles/12487)

Watch this video for a live demo of the code: [GopherCon Singapore - Escape Analysis](https://engineers.sg/video/go-concurrency-live-gophercon-sg-2017--1746)

## Introduction

In the previous post, I covered the basics of escape analysis using an example that shared a value up a goroutine's stack. What I did not show are the other scenarios that can cause a value to escape. To help with that, I am going to debug a program that allocates in surprising ways.

## The Program

I wanted to learn more about the `io` package, so I made up a small project. Given a stream of bytes, write a function that finds the string `elvis` and replaces it with the capitalized version, `Elvis`. We are talking about the King, so his name should always be capitalized.

Here is a link to the solution: [https://play.golang.org/p/n_SzF4Cer4](https://play.golang.org/p/n_SzF4Cer4)

Here is a link to the benchmarks: [https://play.golang.org/p/TnXrxJVfLV](https://play.golang.org/p/TnXrxJVfLV)

The code listing has two different functions that solve this problem. This post focuses on the `algOne` function because it uses the `io` package. You can experiment with `algTwo` yourself to compare its memory and CPU profiles.

Here is the input data and the expected output.

### Listing 1

```
Input:
abcelvisaelvisabcelviseelvisaelvisaabeeeelvise l v i saa bb e l v i saa elviselvielviselvielvielviselvi1elvielviselvis

Output:
abcElvisaElvisabcElviseElvisaElvisaabeeeElvise l v i saa bb e l v i saa ElviselviElviselvielviElviselvi1elviElvisElvis
```

Here is the complete `algOne` function.

### Listing 2

```go
func algOne(data []byte, find []byte, repl []byte, output *bytes.Buffer) {

	// Use a bytes Buffer to provide a stream to process.
	input := bytes.NewBuffer(data)

	// The number of bytes we are looking for.
	size := len(find)

	// Declare the buffers we need to process the stream.
	buf := make([]byte, size)
	end := size - 1

	// Read in an initial number of bytes we need to get started.
	if n, err := io.ReadFull(input, buf[:end]); err != nil {
		output.Write(buf[:n])
		return
	}

	for {

		// Read in one byte from the input stream.
		if _, err := io.ReadFull(input, buf[end:]); err != nil {

			// Flush the rest of the bytes we have.
			output.Write(buf[:end])
			return
		}

		// If we have a match, replace the bytes.
		if bytes.Compare(buf, find) == 0 {
			output.Write(repl)

			// Read a new initial number of bytes.
			if n, err := io.ReadFull(input, buf[:end]); err != nil {
				output.Write(buf[:n])
				return
			}

			continue
		}

		// Write the front byte since it has been compared.
		output.WriteByte(buf[0])

		// Slice that front byte out.
		copy(buf, buf[1:])
	}
}
```

What I want to know is how well this function performs and how much pressure it puts on the heap. To find out, we will run a benchmark.

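Before measuring anything, it can help to have an independent way to produce the expected output. The following is a minimal sketch, not part of the original post: the `capitalize` helper is a hypothetical name, and it works on a whole slice with the standard library's `bytes.ReplaceAll` rather than on a stream as `algOne` does. It is only meant as a reference to compare results against.

```go
package main

import (
	"bytes"
	"fmt"
)

// capitalize is a hypothetical reference helper, not part of the post's
// code. It replaces every "elvis" with "Elvis" using the standard library.
func capitalize(data []byte) []byte {
	return bytes.ReplaceAll(data, []byte("elvis"), []byte("Elvis"))
}

func main() {
	// A short prefix of the input data; an already capitalized
	// "Elvis" would pass through unchanged.
	in := []byte("abcelvisaelvisabcelvise")
	fmt.Printf("%s\n", capitalize(in)) // prints "abcElvisaElvisabcElvise"
}
```

In a test, the output of `algOne` for a given input could be compared byte for byte with what this helper returns.
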
## Benchmarking

Here is the benchmark function I wrote. It calls `algOne` to process the data stream.

### Listing 3

```go
func BenchmarkAlgorithmOne(b *testing.B) {
	var output bytes.Buffer
	in := assembleInputStream()
	find := []byte("elvis")
	repl := []byte("Elvis")

	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		output.Reset()
		algOne(in, find, repl, &output)
	}
}
```

With this benchmark function in place, we can run `go test` with the `-bench`, `-benchtime`, and `-benchmem` options.

### Listing 4

```
$ go test -run none -bench AlgorithmOne -benchtime 3s -benchmem
BenchmarkAlgorithmOne-8    2000000    2522 ns/op    117 B/op    2 allocs/op
```

After running the benchmark, we can see that `algOne` allocates 2 values totaling 117 bytes per operation. That is great to know, but we also need to find out which lines of code are causing the allocations. For that, we need to generate profiling data for the benchmark.

## Profiling

To generate the profile data, we run the benchmark again, this time asking for a memory profile with the `-memprofile` switch.

### Listing 5

```
$ go test -run none -bench AlgorithmOne -benchtime 3s -benchmem -memprofile mem.out
BenchmarkAlgorithmOne-8    2000000    2570 ns/op    117 B/op    2 allocs/op
```

Once the benchmark finishes, the test tool has produced two new files.

### Listing 6

```
~/code/go/src/.../memcpu$ ls -l
total 9248
-rw-r--r--  1 bill  staff      209 … 18:11 mem.out      (NEW)
-rwxr-xr-x  1 bill  staff  2847600 … 18:10 memcpu.test  (NEW)
-rw-r--r--  1 bill  staff     4761 May … 18:01 stream.go
-rw-r--r--  1 bill  staff        8 … 14:49 stream_test.go
```

The source code lives in the `memcpu` directory, with the `algOne` function in `stream.go` and the benchmark function in `stream_test.go`. The two new files are `mem.out` and `memcpu.test`. `mem.out` contains the profile data, and `memcpu.test` is the test binary, which we need for access to symbols when looking at the profile data.

With the profile data and the test binary in place, we can run the `pprof` tool to study the profile.

### Listing 7

```
$ go tool pprof -alloc_space memcpu.test mem.out
Entering interactive mode (type "help" for commands)
(pprof) _
```

When profiling memory, you will want to use the `-alloc_space` option instead of the default `-inuse_space` option to get at the information quickly. It shows where every allocation happens, regardless of whether the memory is still in use at the time the profile is taken.

At the `(pprof)` prompt, we can inspect the `algOne` function with the `list` command. This command takes a regular expression as its argument to find the functions you want to look at.

### Listing 8

```
(pprof) list algOne
Total: 335.03MB
ROUTINE ======================== .../memcpu.algOne in code/go/src/.../memcpu/stream.go
 335.03MB   335.03MB (flat, cum)   100% of Total
        .          .     78:
        .          .     79:// algOne is one way to solve the problem.
        .          .     80:func algOne(data []byte, find []byte, repl []byte, output *bytes.Buffer) {
        .          .     81:
        .          .     82: // Use a bytes Buffer to provide a stream to process.
 318.53MB   318.53MB     83: input := bytes.NewBuffer(data)
        .          .     84:
        .          .     85: // The number of bytes we are looking for.
        .          .     86: size := len(find)
        .          .     87:
        .          .     88: // Declare the buffers we need to process the stream.
  16.50MB    16.50MB     89: buf := make([]byte, size)
        .          .     90: end := size - 1
        .          .     91:
        .          .     92: // Read in an initial number of bytes we need to get started.
        .          .     93: if n, err := io.ReadFull(input, buf[:end]); err != nil || n < end {
        .          .     94:       output.Write(buf[:n])
(pprof) _
```

Based on this profile, we now know that `input` and the backing array of the `buf` slice are allocated on the heap. Since `input` is a pointer variable, the profile is really saying that the `bytes.Buffer` value the `input` pointer points to is being allocated. Let's focus on the `input` allocation first and figure out why it happens.

We could assume it is allocated because the call to `bytes.NewBuffer` shares the `bytes.Buffer` value it creates up the call stack.
However, the presence of a value in the `flat` column (the first column of the pprof output) tells me that the value is allocated because the `algOne` function shares it in a way that causes it to escape.

I know the `flat` column represents in-function allocations because of what the `list` command shows for the `Benchmark` function, which calls `algOne`.

### Listing 9

```
(pprof) list Benchmark
Total: 335.03MB
ROUTINE ======================== .../memcpu.BenchmarkAlgorithmOne in code/go/src/.../memcpu/stream_test.go
        0   335.03MB (flat, cum)   100% of Total
        .          .     18: find := []byte("elvis")
        .          .     19: repl := []byte("Elvis")
        .          .     20:
        .          .     21: b.ResetTimer()
        .          .     22:
        .   335.03MB     23: for i := 0; i < b.N; i++ {
        .          .     24:       output.Reset()
        .          .     25:       algOne(in, find, repl, &output)
        .          .     26: }
        .          .     27:}
        .          .     28:
(pprof) _
```

Since there is only a value in the `cum` column (the second column), this tells me the `Benchmark` function allocates nothing directly. All the allocations come from function calls made inside the loop. You can see that the allocation numbers from the two `list` calls match.

We still don't know why the `bytes.Buffer` value is allocated. This is where the `-gcflags "-m -m"` switch on `go build` comes in handy. The profiler can only tell you which values escape; the build command can tell you why.

## Compiler Reporting

Let's ask the compiler what decisions it makes about escape analysis for this code.

### Listing 10

```
$ go build -gcflags "-m -m"
```

This command produces a lot of output. We only need to search it for anything containing `stream.go:83`, since `stream.go` is the file that contains this code and line 83 contains the construction of the `bytes.Buffer` value. The search finds 6 lines.

### Listing 11

```
./stream.go:83: inlining call to bytes.NewBuffer func([]byte) *bytes.Buffer { return &bytes.Buffer literal }

./stream.go:83: &bytes.Buffer literal escapes to heap
./stream.go:83:   from ~r0 (assign-pair) at ./stream.go:83
./stream.go:83:   from input (assigned) at ./stream.go:83
./stream.go:83:   from input (interface-converted) at ./stream.go:93
./stream.go:83:   from input (passed to call[argument escapes]) at ./stream.go:93
```

The first line we found for `stream.go:83` is interesting.

### Listing 12

```
./stream.go:83: inlining call to bytes.NewBuffer func([]byte) *bytes.Buffer { return &bytes.Buffer literal }
```

It confirms that the `bytes.Buffer` value did not escape because it was shared up the call stack. `bytes.NewBuffer` was never actually called; the code inside the function was inlined.

So this is the piece of code I wrote:

### Listing 13

```go
input := bytes.NewBuffer(data)
```

Because the compiler chose to inline the `bytes.NewBuffer` call, the code I wrote was converted to this:

### Listing 14

```go
input := &bytes.Buffer{buf: data}
```

This means the `algOne` function constructs the `bytes.Buffer` value directly. So the question now is: what causes the value to escape from the `algOne` stack frame? The answer is in the other 5 lines of the search result.

### Listing 15

```
./stream.go:83: &bytes.Buffer literal escapes to heap
./stream.go:83:   from ~r0 (assign-pair) at ./stream.go:83
./stream.go:83:   from input (assigned) at ./stream.go:83
./stream.go:83:   from input (interface-converted) at ./stream.go:93
./stream.go:83:   from input (passed to call[argument escapes]) at ./stream.go:93
```

These lines tell us that the code at line 93 causes the escape. The `input` variable is being assigned to an interface value.

## Interfaces

I don't remember assigning anything to an interface value in this code. However, if you look at line 93, it becomes clear what is happening.

### Listing 16

```
 93     if n, err := io.ReadFull(input, buf[:end]); err != nil {
 94         output.Write(buf[:n])
```

The call to `io.ReadFull` causes the interface assignment. If you look at the definition of `io.ReadFull`, you can see that it accepts the `input` value through an interface type.

### Listing 17

```go
type Reader interface {
	Read(p []byte) (n int, err error)
}

func ReadFull(r Reader, buf []byte) (n int, err error) {
	return ReadAtLeast(r, buf, len(buf))
}
```

It appears that passing the address of the `bytes.Buffer` value down the call stack and storing it in the `Reader` interface value causes the escape. Now we know that using an interface has a cost: allocation and indirection. So if it is not clear how an interface makes the code better, you probably don't want to use one. Here are the guidelines I follow when deciding whether to use interfaces in my code.

Use an interface when:

- Users of the API need to provide an implementation detail.
- The API has multiple implementations that it needs to maintain internally.
- Parts of the API that can change have been identified and require decoupling.

Don't use an interface:

- For the sake of using an interface.
- To generalize an algorithm.
- When users can declare their own interfaces.

Now we can ask ourselves: does this algorithm really need the `io.ReadFull` function? The answer is no, because the `bytes.Buffer` type has a method set we can use. Calling methods on a value the function owns, instead of passing it to a function that takes an interface, can prevent allocations.

Let's change the code to stop using the `io` package and call the `Read` method directly on the `input` variable.

The change removes the need to import the `io` package. To keep all the line numbers the same, I use the blank identifier on the `io` import, which lets the now unused import stay in the list.

### Listing 18

```go
import (
	"bytes"
	"fmt"
	_ "io"
)

func algOne(data []byte, find []byte, repl []byte, output *bytes.Buffer) {

	// Use a bytes Buffer to provide a stream to process.
	input := bytes.NewBuffer(data)

	// The number of bytes we are looking for.
	size := len(find)

	// Declare the buffers we need to process the stream.
	buf := make([]byte, size)
	end := size - 1

	// Read in an initial number of bytes we need to get started.
	if n, err := input.Read(buf[:end]); err != nil || n < end {
		output.Write(buf[:n])
		return
	}

	for {

		// Read in one byte from the input stream.
		if _, err := input.Read(buf[end:]); err != nil {

			// Flush the rest of the bytes we have.
			output.Write(buf[:end])
			return
		}

		// If we have a match, replace the bytes.
		if bytes.Compare(buf, find) == 0 {
			output.Write(repl)

			// Read a new initial number of bytes.
			if n, err := input.Read(buf[:end]); err != nil || n < end {
				output.Write(buf[:n])
				return
			}

			continue
		}

		// Write the front byte since it has been compared.
		output.WriteByte(buf[0])

		// Slice that front byte out.
		copy(buf, buf[1:])
	}
}
```

When we run the benchmark against the modified code, we can see that the allocation for the `bytes.Buffer` value is gone.

### Listing 19

```
$ go test -run none -bench AlgorithmOne -benchtime 3s -benchmem -memprofile mem.out
BenchmarkAlgorithmOne-8    2000000    1814 ns/op    5 B/op    1 allocs/op
```

We see a performance improvement of about 29%: the code went from `2570 ns/op` to `1814 ns/op`. With that solved, we can focus on the allocation of the backing array for the `buf` slice. If we run the profiler again on the new profile data, we should be able to identify what causes the remaining allocation.

### Listing 20

```
$ go tool pprof -alloc_space memcpu.test mem.out
Entering interactive mode (type "help" for commands)
(pprof) list algOne
Total: 7.50MB
ROUTINE ======================== .../memcpu.BenchmarkAlgorithmOne in code/go/src/.../memcpu/stream_test.go
     11MB       11MB (flat, cum)   100% of Total
        .          .     84:
        .          .     85: // The number of bytes we are looking for.
        .          .     86: size := len(find)
        .          .     87:
        .          .     88: // Declare the buffers we need to process the stream.
     11MB       11MB     89: buf := make([]byte, size)
        .          .     90: end := size - 1
        .          .     91:
        .          .     92: // Read in an initial number of bytes we need to get started.
        .          .     93: if n, err := input.Read(buf[:end]); err != nil || n < end {
        .          .     94:       output.Write(buf[:n])
```

The only allocation left is on line 89, which is the backing array of the slice.

## Stack Frames

We want to know why the backing array for `buf` is allocated. Let's run `go build` again with the `-gcflags "-m -m"` option and search for `stream.go:89`.

### Listing 21

```
$ go build -gcflags "-m -m"
./stream.go:89: make([]byte, size) escapes to heap
./stream.go:89:   from make([]byte, size) (too large for stack) at ./stream.go:89
```

The report says the backing array is "too large for stack". This message is misleading. It is not that the backing array is too large, but that the compiler does not know its size at compile time.

A value can only be placed on the stack if the compiler knows its size at compile time. This is because the size of every stack frame, for every function, is calculated at compile time. If the compiler does not know the size of a value, the value is placed on the heap.

To show this, let's temporarily hard-code the size of the slice to 5 and run the benchmark again.

### Listing 22

```go
buf := make([]byte, 5)
```

This time when we run the benchmark, the allocations are gone.

### Listing 23

```
$ go test -run none -bench AlgorithmOne -benchtime 3s -benchmem
BenchmarkAlgorithmOne-8    3000000    1720 ns/op    0 B/op    0 allocs/op
```

If you look at the compiler report once more, you will see that nothing escapes.

### Listing 24

```
$ go build -gcflags "-m -m"
./stream.go:83: algOne &bytes.Buffer literal does not escape
./stream.go:89: algOne make([]byte, 5) does not escape
```

Obviously we can't hard-code the size of the slice, so we will have to live with 1 allocation for this algorithm.

## Allocations and Performance

Compare the performance gains we got with each refactoring step.

### Listing 25

```
Before any optimization
BenchmarkAlgorithmOne-8    2000000    2570 ns/op    117 B/op    2 allocs/op

Removing the bytes.Buffer allocation
BenchmarkAlgorithmOne-8    2000000    1814 ns/op      5 B/op    1 allocs/op

Removing the backing array allocation
BenchmarkAlgorithmOne-8    3000000    1720 ns/op      0 B/op    0 allocs/op
```

Removing the allocation of the `bytes.Buffer` value gave us a performance increase of about 29%, and removing all the allocations gave us a speedup of about 33%. Allocations are one of the places where application performance can suffer.

## Conclusion

Go has some amazing tooling that lets you understand the decisions the compiler makes about escape analysis. Based on this information, you can refactor code so that values that don't need to be on the heap stay on the stack. You are not going to write software with zero allocations, but you want to minimize allocations where possible.

That said, never write code with performance as your first priority, because you don't want to be guessing about performance. Writing correct code is your first priority. This means focusing first on integrity, readability, and simplicity. Once you have a working program, determine whether it is fast enough. If it is not, use the tooling the language provides to find and fix the performance issues.
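
The interface-related escape traced in this post can be reproduced in isolation. The sketch below is not from the original post and makes several assumptions: the `readVia`, `throughInterface`, and `direct` names are hypothetical, `readVia` is marked `noinline` so the compiler cannot see through the call, and the buffers are package-level variables so that only the `bytes.Buffer` value is counted. It uses `testing.AllocsPerRun` from the standard library to count allocations per call.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"testing"
)

// Package-level buffers, so the measurement only counts the bytes.Buffer.
var (
	data = []byte("abcelvis")
	buf  = make([]byte, 4)
)

// readVia is a hypothetical helper. It calls Read through the io.Reader
// interface, so the compiler has to assume the receiver may escape.
//
//go:noinline
func readVia(r io.Reader, p []byte) (int, error) {
	return r.Read(p)
}

// throughInterface hands the buffer to a function that takes an io.Reader.
func throughInterface() {
	input := bytes.NewBuffer(data)
	readVia(input, buf)
}

// direct calls the Read method on the concrete *bytes.Buffer value.
func direct() {
	input := bytes.NewBuffer(data)
	input.Read(buf)
}

func main() {
	fmt.Println("through interface:", testing.AllocsPerRun(1000, throughInterface))
	fmt.Println("direct method:    ", testing.AllocsPerRun(1000, direct))
}
```

With a standard optimized build, the interface version should report more allocations per run than the direct method call, which mirrors the difference between Listing 2 and Listing 18. The exact numbers depend on the compiler version and build flags.
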

Via: https://www.ardanlabs.com/blog/2017/06/language-mechanics-on-memory-profiling.html

Author: William Kennedy; Translator: gogeof; Proofreader: polaris1119

This article was originally translated by GCTT and published by the Go Language Chinese Network. Want to join the translators and contribute to open source? You are welcome to join GCTT!
Translations are published solely for learning and exchange, under the terms of the CC-BY-NC-SA license. If our work infringes on your interests, please contact us promptly.
Reposting under the CC-BY-NC-SA license is welcome; please credit and keep the original/translation links and the author/translator information.
The article represents only the author's knowledge and views.
