- 1. Background
- 2. Slice
- 2.1 Internal structure
- 2.2 Pre-overwrite values
- 3. String
- 3.1 Redistribution
- 3.2 Conversion
- 4. Escape analysis
- 4.1 Improving performance
- 4.2 Fled to the heap
- 4.3 Escape assignment
- 4.4 Size Allocation
- 5. Version differences
- 6. Conclusion
1. Background
Last Thursday a colleague forwarded a reply that hej8875 had posted under a thread in the Go community:
package main

import "fmt"

func main() {
	s := []byte("")
	s1 := append(s, 'a')
	s2 := append(s, 'b')
	//fmt.Println(s1, "==========", s2)
	fmt.Println(string(s1), "==========", string(s2))
}
// Something I can't make sense of: with the line commented out, the output is b ========== b
// With the line uncommented, the output is [97] ========== [98] a ========== b
This reply is more interesting, and more puzzling, than the original post. The author tried it and got the same result, then discussed it with colleagues. At first it looked quite simple, but working it out turned out to involve a lot of background knowledge, so the process is worth sharing.
2. Slice
2.1 Internal structure
Let's set aside the commented-out line //fmt.Println(s1, "==========", s2) for now and come back to it later. With it commented out, the output is b ========== b, not the expected a and b. We know that a slice does not store the actual values itself; it holds a reference to a segment of an array. Its internal structure is:
type slice struct {
	data uintptr
	len  int
	cap  int
}
Here data is a pointer to an element of the underlying array, and len is the number of array elements the slice references. cap is the number of elements available in the array counting from the element data points to; cap minus len is how many more elements can be appended to the slice before the data has to be copied to a new array. For example:
s := make([]byte, 5)
s = s[2:4] // creates a new slice and assigns it to s; its reference into the underlying array changes as well
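To make the relationship between data, len and cap concrete, here is a minimal sketch. The helper name describe is an assumption introduced only for illustration.

```go
package main

import "fmt"

// describe reports the length and capacity of a byte slice.
func describe(s []byte) (length, capacity int) {
	return len(s), cap(s)
}

func main() {
	s := make([]byte, 5)
	fmt.Println(describe(s)) // 5 5

	// Reslicing moves the data pointer to element 2.
	// Elements 2, 3 and 4 stay reachable, so cap is 3.
	s = s[2:4]
	fmt.Println(describe(s)) // 2 3
}
```

The capacity after reslicing is counted from the new start of the slice to the end of the array, which is why it drops from 5 to 3.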
2.2 Pre-overwrite values
Back to the question. We can infer that s in the line s := []byte("") actually references a byte array.
Its capacity is 32 and its length is 0:
s := []byte("")
fmt.Println(cap(s), len(s))
// Output: 32 0
The key point is that the append in s1 := append(s, 'a') does not modify the original slice s, and it has no way to, because everything in Go is passed by value. When s is passed to append, the slice header is copied; 'a' is appended through that copy, which is returned as s1. The length of s1 grows by 1, but the length of s is still 0:
s := []byte("")
fmt.Println(cap(s), len(s))
s1 := append(s, 'a')
fmt.Println(cap(s1), len(s1))
// Output:
// 32 0
// 32 1
Because s and s1 point to the same array, appending 'a' through s1 (underlying array [0] = 'a') also writes to the array that s points to, while s itself does not change. This is why append in Go is normally written as:
s = append(s, 'a')
append returns the new slice, which needs to be assigned back to s. Without that assignment the information recorded in s falls behind, and the next append starts from that stale state. It looks like an append, but it actually overwrites the value written by the previous append.
So the answer to the question is: the later append of 'b' overwrites the earlier append of 'a', so the output is b b.
Suppose the underlying array is arr; the comments show what happens:
s := []byte("")
s1 := append(s, 'a') // equivalent to arr[0] = 'a'
s2 := append(s, 'b') // equivalent to arr[0] = 'b'
fmt.Println(string(s1), "==========", string(s2)) // just prints the same array twice
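The overwrite does not depend on the string conversion itself. As a sketch, the same effect can be reproduced with an explicit capacity. Using make([]byte, 0, 32) in place of the 32-byte buffer is an assumption for the demonstration, and the helper names appendTwice and appendChained are hypothetical.

```go
package main

import "fmt"

// appendTwice appends 'a' and 'b' to the same slice s without
// reassigning s, and returns both results as strings.
func appendTwice(s []byte) (string, string) {
	s1 := append(s, 'a') // writes element 0 of the shared array
	s2 := append(s, 'b') // writes element 0 again
	return string(s1), string(s2)
}

// appendChained assigns each append result back to s.
func appendChained(s []byte) string {
	s = append(s, 'a')
	s = append(s, 'b')
	return string(s)
}

func main() {
	fmt.Println(appendTwice(make([]byte, 0, 32)))   // b b
	fmt.Println(appendChained(make([]byte, 0, 32))) // ab
}
```

When the result is assigned back, the second append sees a length of 1 and writes to element 1, so nothing is overwritten.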
3. String
3.1 Redistribution
Teacher, can you take it up a notch? Sure. Let's look at another question:
s := []byte{}
s1 := append(s, 'a')
s2 := append(s, 'b')
fmt.Println(string(s1), ",", string(s2))
fmt.Println(cap(s), len(s))
Guess what the output is.
The answer is a , b and 0 0, as expected.
In the example in section 2.2 the output was 32 0. The crux seems to be here: one slice is a plain []byte{}, the other is an empty string converted with []byte(""). A length of 0 is easy to understand, but why is the capacity 32 instead of the expected 0?
Capacity is how many elements the array can hold, and append does not reallocate as long as the capacity is sufficient. Here capacity minus length is 32, which is plenty for appending a and b. We can verify this with make (use one of the two declarations of s):
// Option 1: append reallocates; output: a , b
s := make([]byte, 0, 0)
// Option 2: append does not reallocate; output: b , b, because a capacity of 1 is enough for the append
// s := make([]byte, 0, 1)
s1 := append(s, 'a')
s2 := append(s, 'b')
fmt.Println(string(s1), ",", string(s2))
Reallocation means that append checks the capacity of the slice and, if it is not enough, creates a new, larger array and copies the original data into it. With make([]byte, 0, 0) the capacity of s is certainly not enough, so s1 and s2 each use their own newly copied array, and the result is naturally the expected a, b.
We can check that the capacity grows after reallocation by printing s1:
s := make([]byte, 0, 0)
s1 := append(s, 'a')
fmt.Println(cap(s1), len(s1))
// Output: 8 1. The capacity grew after reallocation
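Whether two appends share an array can also be checked directly by comparing the addresses of their first elements. This is a small sketch; the helper name sharesArray is an assumption made for illustration.

```go
package main

import "fmt"

// sharesArray reports whether two appends to a zero-length slice with
// the given capacity end up writing into the same underlying array.
func sharesArray(capacity int) bool {
	s := make([]byte, 0, capacity)
	s1 := append(s, 'a')
	s2 := append(s, 'b')
	return &s1[0] == &s2[0]
}

func main() {
	fmt.Println(sharesArray(0)) // false: each append allocates its own array
	fmt.Println(sharesArray(1)) // true: both appends reuse the existing array
}
```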
3.2 Conversion
Why is the capacity of the slice converted from an empty string 32, rather than 0 or 8?
Time to bring out the ultimate weapon: reading the source. With the official Go tools we can print the assembly the compiler generates and find the call, which saves us from searching blindly through a huge code base.
-gcflags passes arguments to the Go compiler. -S -S prints the assembly call information together with the data, while a single -S prints only the call information.
go run -gcflags '-S -S' main.go
Here is the output:
0x0000 00000 () TEXT "".main(SB), $264-0
0x003e 00062 () MOVQ AX, (SP)
0x0042 00066 () XORPS X0, X0
0x0045 00069 () MOVUPS X0, 8(SP)
0x004a 00074 () PCDATA $0, $0
0x004a 00074 () CALL runtime.stringtoslicebyte(SB)
0x004f 00079 () MOVQ 32(SP), AX
b , b
Go uses Plan 9 assembly syntax. It is somewhat hard to read as a whole, but we can still spot the key line we need:
CALL runtime.stringtoslicebyte(SB)
Locate the source in src\runtime\string.go.
The origin of the capacity 32 can be seen in the stringtoslicebyte function; see the comments:
const tmpStringBufSize = 32

type tmpBuf [tmpStringBufSize]byte

func stringtoslicebyte(buf *tmpBuf, s string) []byte {
	var b []byte
	if buf != nil && len(s) <= len(buf) {
		*buf = tmpBuf{}  // tmpBuf has a default capacity of 32
		b = buf[:len(s)] // creates a new slice with capacity 32 and length 0, assigned to b
	} else {
		b = rawbyteslice(len(s))
	}
	copy(b, s) // s is an empty string, so the copy also has length 0
	return b
}
Then what about the else branch, the rawbyteslice function?
func rawbyteslice(size int) (b []byte) {
	cap := roundupsize(uintptr(size))
	p := mallocgc(cap, nil, false)
	if cap != uintptr(size) {
		memclrNoHeapPointers(add(p, uintptr(size)), cap-uintptr(size))
	}
	*(*slice)(unsafe.Pointer(&b)) = slice{p, size, int(cap)}
	return
}
On the else branch the capacity is not 32. Taking that branch does not change the conclusion about overwriting, which can be tested as follows:
s := []byte(strings.Repeat("c", 33))
s1 := append(s, 'a')
s2 := append(s, 'b')
fmt.Println(string(s1), ",", string(s2))
// cccccccccccccccccccccccccccccccccb , cccccccccccccccccccccccccccccccccb
4. Escape analysis
Teacher, can you take it up a notch? When do we take the else branch? And you've been talking for ages without filling the hole you dug: why does uncommenting the line produce the expected output a, b? Why does uncommenting it even change the capacity?
s := []byte("")
fmt.Println(cap(s), len(s))
s1 := append(s, 'a')
s2 := append(s, 'b')
fmt.Println(s1, ",", s2)
fmt.Println(string(s1), ",", string(s2))
// Output:
// 0 0
// [97] , [98]
// a , b
This is easiest to explain with escape analysis, so let's first look at what escape analysis is.
4.1 Improving performance
If a function or subroutine has a local object and returns a pointer to it, that pointer may be referenced anywhere else, so we say the pointer has "escaped". Escape analysis is the technique for analyzing the scope of such pointers, and its benefit is better performance:
- The biggest benefit is probably reduced GC pressure: objects that do not escape are allocated on the stack, their resources are reclaimed when the function returns, and no GC mark-and-sweep is needed for them.
- Once escape analysis has determined which variables can live on the stack, they benefit from stack allocation being faster than heap allocation.
- Synchronization elimination: if an object's methods take a synchronization lock but only one thread accesses it at run time, the machine code generated after escape analysis can run without the lock.
Go runs escape analysis at compile time to decide whether an object goes on the stack or on the heap: objects that do not escape go on the stack, and objects that may escape go on the heap.
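As a rough illustration of the definition above, the sketch below contrasts a function that returns a pointer to a local variable with one that keeps its local data to itself. The function names are hypothetical, and where each variable really ends up is the compiler's decision (it can change with inlining and Go version), so the comments describe what is typical rather than guaranteed.

```go
package main

import "fmt"

// newCounter returns a pointer to its local variable. The pointer
// outlives the call, so n is a candidate for heap allocation.
func newCounter() *int {
	n := 0
	return &n
}

// sum never lets a reference to nums leave the function, so the
// compiler is free to keep the array on the stack.
func sum() int {
	nums := [3]int{1, 2, 3}
	total := 0
	for _, v := range nums {
		total += v
	}
	return total
}

func main() {
	c := newCounter()
	*c += 2
	fmt.Println(*c, sum()) // 2 6
}
```

Compiling a file like this with go tool compile -m prints the compiler's escape decisions for each variable.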
4.2 Fled to the heap
With the line uncommented: during escape analysis the Go compiler detects that fmt.Println holds a reference to the array underlying s (through s1 and s2), so it decides to allocate that array on the heap. For a string-to-[]byte conversion, the default 32-byte buffer is used only when allocating on the stack, not when allocating on the heap.
Running the following command prints the escape information. This command only compiles the program and does not run it, whereas the go run -gcflags used above passes arguments to the compiler and also runs the program.
go tool compile -m main.go
With fmt.Println(s1, ",", s2) uncommented, ([]byte)("") escapes to the heap:
main.go:23:13: s1 escapes to heap
main.go:20:13: ([]byte)("") escapes to heap // escapes to the heap
main.go:23:18: "," escapes to heap
main.go:23:18: s2 escapes to heap
main.go:24:20: string(s1) escapes to heap
main.go:24:20: string(s1) escapes to heap
main.go:24:26: "," escapes to heap
main.go:24:37: string(s2) escapes to heap
main.go:24:37: string(s2) escapes to heap
main.go:23:13: main ... argument does not escape
main.go:24:13: main ... argument does not escape
With //fmt.Println(s1, ",", s2) commented out, it does not escape to the heap:
go tool compile -m main.go
main.go escapes to heap
main.go escapes to heap
main.go "," escapes to heap
main.go escapes to heap
main.go escapes to heap
main.go:20:13: main ([]byte)("") does not escape // does not escape
main.go:24:13: main ... argument does not escape
4.3 Escape assignment
Next, locate the place where stringtoslicebyte is called, in src\cmd\compile\internal\gc\walk.go. For ease of understanding, the code below is abridged:
const (
	EscUnknown = iota
	EscNone // the result or argument does not escape to the heap
)

case OSTRARRAYBYTE:
	a := nodnil() // the array defaults to nil
	if n.Esc == EscNone {
		// create a temporary array on the stack for the slice
		t := types.NewArray(types.Types[TUINT8], tmpstringbufsize)
		a = nod(OADDR, temp(t), nil)
	}
	n = mkcall("stringtoslicebyte", n.Type, init, a, conv(n.Left, types.Types[TSTRING]))
When the value does not escape, a 32-byte array t is allocated. When it escapes, nothing is allocated and the array is nil, so the capacity of s is 0. Appending a and b to s to get s1 and s2 then necessarily copies, so no value gets overwritten and the result is the expected a, b. Looking at stringtoslicebyte again makes this clear:
func stringtoslicebyte(buf *tmpBuf, s string) []byte {
	var b []byte
	if buf != nil && len(s) <= len(buf) {
		*buf = tmpBuf{}
		b = buf[:len(s)]
	} else {
		b = rawbyteslice(len(s))
	}
	copy(b, s)
	return b
}
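The two branches can be imitated in ordinary user code to see where the capacities 32 and 0 come from. This is only a sketch: toBytes and tmpBufSize are hypothetical names, and make stands in for rawbyteslice without its size-class rounding.

```go
package main

import "fmt"

const tmpBufSize = 32 // hypothetical name mirroring the runtime constant

// toBytes mimics the two branches of the conversion. buf plays the role
// of the stack buffer; a nil buf corresponds to the escaping case.
func toBytes(buf *[tmpBufSize]byte, s string) []byte {
	var b []byte
	if buf != nil && len(s) <= len(buf) {
		*buf = [tmpBufSize]byte{}
		b = buf[:len(s)] // capacity is the whole buffer: 32
	} else {
		// Assumption: make replaces rawbyteslice here, so there is
		// no rounding up to an allocation size class.
		b = make([]byte, len(s))
	}
	copy(b, s)
	return b
}

func main() {
	var buf [tmpBufSize]byte
	noEscape := toBytes(&buf, "")
	escape := toBytes(nil, "")
	fmt.Println(cap(noEscape), len(noEscape)) // 32 0
	fmt.Println(cap(escape), len(escape))     // 0 0
}
```

With the buffer present, the empty result still carries 32 bytes of spare capacity, which is exactly what lets the second append overwrite the first.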
4.4 Size Allocation
Without escape, the default capacity is 32. What is the allocation strategy when the value escapes?
s := []byte("a")
fmt.Println(cap(s))
s1 := append(s, 'a')
s2 := append(s, 'b')
fmt.Print(s1, s2)
For an empty string the output is 0; for the string "a" the output is 8.
The size is determined by the roundupsize function and the class_to_size variable in src\runtime\size.go.
The size-class steps are generated by src\runtime\mksizeclasses.go.
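The idea of rounding a requested size up to a size class can be sketched as follows. The table is only an illustrative subset of small size classes, not the full generated table, and roundUp is a hypothetical stand-in for the runtime function.

```go
package main

import "fmt"

// sizeClasses is an illustrative subset of allocation size classes.
// The real table is generated and much longer.
var sizeClasses = []int{0, 8, 16, 32, 48, 64, 80, 96, 112, 128}

// roundUp returns the smallest listed size class that fits size,
// or size itself when it is larger than every listed class.
func roundUp(size int) int {
	for _, c := range sizeClasses {
		if size <= c {
			return c
		}
	}
	return size
}

func main() {
	fmt.Println(roundUp(0), roundUp(1), roundUp(33)) // 0 8 48
}
```

This matches the observations above: an empty string yields capacity 0, and a one-byte string yields capacity 8.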
5. Version differences
Teacher, can you take it up a notch? Teacher, everything you said is wrong; I ran it and got the opposite result. Yes, you are right, and the author is right too. After all, we write our programs with Go, and if Go changes under the hood, the results change with it. While researching this, the author found that the stringtoslicebyte source quoted in other blogs is:
func stringtoslicebyte(s String) (b Slice) {
	b.array = runtime·mallocgc(s.len, 0, FlagNoScan|FlagNoZero);
	b.len = s.len;
	b.cap = s.len;
	runtime·memmove(b.array, s.str, s.len);
}
With that version of the source the result is also the expected one, because no default 32-byte array is allocated.
Going further back through older code, version 1.3.2 looks like this:
func stringtoslicebyte(s String) (b Slice) {
	uintptr cap;
	cap = runtime·roundupsize(s.len);
	b.array = runtime·mallocgc(cap, 0, FlagNoScan|FlagNoZero);
	b.len = s.len;
	b.cap = cap;
	runtime·memmove(b.array, s.str, s.len);
	if(cap != b.len)
		runtime·memclr(b.array+b.len, cap-b.len);
}
Version 1.6.4:
func stringtoslicebyte(buf *tmpBuf, s string) []byte {
	var b []byte
	if buf != nil && len(s) <= len(buf) {
		b = buf[:len(s):len(s)]
	} else {
		b = rawbyteslice(len(s))
	}
	copy(b, s)
	return b
}
Even older:
struct __go_open_array
__go_string_to_byte_array (String str)
{
  uintptr cap;
  unsigned char *data;
  struct __go_open_array ret;

  cap = runtime_roundupsize (str.len);
  data = (unsigned char *) runtime_mallocgc (cap, 0, FlagNoScan | FlagNoZero);
  __builtin_memcpy (data, str.str, str.len);
  if (cap != (uintptr) str.len)
    __builtin_memset (data + str.len, 0, cap - (uintptr) str.len);
  ret.__values = (void *) data;
  ret.__count = str.len;
  ret.__capacity = str.len;
  return ret;
}
The author tested on version 1.6.4 and the results were indeed reversed: there it was the commented-out version that produced the expected a, b. This article uses 1.10.2.
6. Conclusion
Teacher, can you take it up a notch? If I go on any longer, the day will be gone.
To summarize:
- With the line commented out, the output is b, b. Nothing escapes, so the default 32-byte array is allocated; both appends write to element [0] of that array, the later value overwrites the earlier one, and the result is b, b.
- With the line uncommented, the output is a, b. Because fmt.Println references the array behind s, escape analysis finds that it escapes, and since the string is empty, an empty array is allocated. Each of the two appends works on its own newly allocated slice, so the output is a, b.
Notes:
- In the source directory, gc stands for Go compiler, not Garbage Collection; the gc in gcflags has the same meaning.
- Also, code written this way is pointless and strongly discouraged. The result of []byte("string") should be treated as read-only; otherwise it easily leads to bugs that are hard to track down.
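If two independent results really are needed from the same starting slice, one option is to copy before appending, and another is to cap the capacity with a full slice expression so that append has to reallocate. The sketch below shows both; appendCopy is a hypothetical helper, and make([]byte, 0, 32) stands in for a slice that has spare capacity.

```go
package main

import "fmt"

// appendCopy appends c to a private copy of s, leaving the array
// behind s untouched.
func appendCopy(s []byte, c byte) []byte {
	out := make([]byte, len(s), len(s)+1)
	copy(out, s)
	return append(out, c)
}

func main() {
	s := make([]byte, 0, 32) // spare capacity, as in the 32-byte buffer case

	s1 := appendCopy(s, 'a')
	s2 := appendCopy(s, 'b')
	fmt.Println(string(s1), string(s2)) // a b

	// A full slice expression limits the capacity to the length,
	// so every append on t has to allocate a new array.
	t := s[:len(s):len(s)]
	t1 := append(t, 'a')
	t2 := append(t, 'b')
	fmt.Println(string(t1), string(t2)) // a b
}
```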
6.1 References
Original posts are: https://gocn.io/question/1852
https://gocn.io/article/355
https://go-review.googlesource.com/c/gofrontend/+/30827
http://golang-examples.tumblr.com/post/86403044869/conversion-between-byte-and-string-dont-share