Go string to []byte trap

Source: Internet
Author: User
    • 1. Background
    • 2. Slice
      • 2.1 Internal structure
      • 2.2 Pre-overwrite values
    • 3. String
      • 3.1 Redistribution
      • 3.2 Conversion
    • 4. Escape analysis
      • 4.1 Improving performance
      • 4.2 Fled to the heap
      • 4.3 Escape assignment
      • 4.4 Size Allocation
    • 5. Version differences
    • 6. Conclusion
      • 6.1 References

1. Background

Last Thursday a colleague shared a post from the Go community. Under it, user hej8875 left the following reply:

package main

import "fmt"

func main() {
    s := []byte("")
    s1 := append(s, 'a')
    s2 := append(s, 'b')
    //fmt.Println(s1, "==========", s2)
    fmt.Println(string(s1), "==========", string(s2))
}

// Something here I cannot understand: with the line commented out, the output is b ========== b
// With the line uncommented, the output is [97] ========== [98] and a ========== b

That reply turned out to be more interesting and more puzzling than the original post. The author tested it and got the same result, then discussed it with colleagues. At first it looked simple, but working through it touched a surprising number of topics, so the process is worth sharing.

2. Slice

2.1 Internal structure

Let's set aside the commented-out line //fmt.Println(s1, "==========", s2) for now and come back to it later. The output b ========== b is not the a and b we expected. A slice does not store the values itself; it holds a reference to a segment of an array. Its internal structure is:

type slice struct {
    data uintptr
    len  int
    cap  int
}

Here data is a pointer to an array element, and len is the number of array elements the slice references. cap is the number of elements available in the array, counting from the element data points to. cap minus len is how many more elements can be appended to the slice before the data has to be copied to a new array. For example:

s := make([]byte, 5)

s = s[2:4] // creates a new slice and assigns it to s; its reference into the underlying array changes as well
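The relationship between len, cap, and the shared array can be checked with a short sketch. The helper name describe is made up for illustration; the values follow from Go's slicing rules.

```go
package main

import "fmt"

// describe is a hypothetical helper that reports a slice's length and capacity.
func describe(s []byte) (length, capacity int) {
	return len(s), cap(s)
}

func main() {
	s := make([]byte, 5) // len 5, cap 5
	l, c := describe(s)
	fmt.Println(l, c) // 5 5

	t := s[2:4] // new slice header: starts at s[2], len 2, cap 5-2 = 3
	l, c = describe(t)
	fmt.Println(l, c) // 2 3

	t[0] = 'x'               // writes through to the array shared with s
	fmt.Println(s[2] == 'x') // true
}
```

Slicing never copies the elements; only the three-field header is new.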

2.2 Pre-overwrite values

Back to the question. We can infer that in the line s := []byte(""), s actually refers to a byte array.

Its capacity is 32 and its length is 0:

s := []byte("")
fmt.Println(cap(s), len(s))
// Output: 32 0

The key point is that s1 := append(s, 'a') does not modify the original slice s, and it cannot, because everything in Go is passed by value. When s is passed to append, the slice header is copied. The 'a' is appended through that copy, which is returned as s1. The length of s1 grows by 1, but the length of s is still 0:

s := []byte("")
fmt.Println(cap(s), len(s))
s1 := append(s, 'a')
fmt.Println(cap(s1), len(s1))
// Output:
// 32 0
// 32 1

Because s and s1 point to the same array, appending 'a' to produce s1 (underlying array[0] = 'a') also writes to the array that s points to, but s itself does not change. This is why append in Go is always written as:

s = append(s, 'a')

append returns the new slice, which has to be assigned back to s. Without the assignment, the length recorded in s lags behind, and the next append starts from that stale length. It looks like an append, but it actually overwrites the value written by the previous append.

So the answer to the question is: the later append of 'b' overwrites the 'a' written by the earlier append, so the output is b b.

Suppose the underlying array is arr. Then, as the comments show:

s := []byte("")
s1 := append(s, 'a') // equivalent to arr[0] = 'a'
s2 := append(s, 'b') // equivalent to arr[0] = 'b'
fmt.Println(string(s1), "==========", string(s2)) // just prints the same array twice
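To make the shared array visible, here is a minimal sketch that replaces the hidden buffer with an explicit local array. The names arr and overwriteDemo are illustrative assumptions; the point is only that both appends land in arr[0].

```go
package main

import "fmt"

// overwriteDemo appends twice to the same empty slice of an explicit array
// and reports arr[0] after each append, plus both results as strings.
func overwriteDemo() (byte, byte, string, string) {
	var arr [32]byte
	s := arr[:0] // len 0, cap 32, backed by arr

	s1 := append(s, 'a') // writes arr[0] = 'a'
	afterFirst := arr[0]

	s2 := append(s, 'b') // s still has len 0, so this writes arr[0] = 'b'
	afterSecond := arr[0]

	return afterFirst, afterSecond, string(s1), string(s2)
}

func main() {
	first, second, r1, r2 := overwriteDemo()
	fmt.Println(string(first), string(second)) // a b
	fmt.Println(r1, r2)                        // b b
}
```

s1 and s2 are two headers of length 1 over the same element, so whichever append runs last decides what both print.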

3. String

3.1 Redistribution

Teacher, can you take it up a notch? Sure. Let's look at another question:

s := []byte{}
s1 := append(s, 'a')
s2 := append(s, 'b')
fmt.Println(string(s1), ",", string(s2))
fmt.Println(cap(s), len(s))

Guess what the output is?

The answer is: a , b and 0 0, as expected.

In the example in section 2.2 the output was 32 0. This looks like the crux of the matter: one slice is a plain []byte{}, the other is an empty string converted with []byte(""). A length of 0 is easy to understand, but why is the capacity 32 rather than what we expected?

Capacity is how many elements the underlying array can hold, and append does not reallocate as long as the capacity is sufficient. Here capacity minus length is 32, which is more than enough to append a and b. We can verify this with make:

// append reallocates; the output is a , b
s := make([]byte, 0, 0)
// append does not reallocate; the output is b , b, because a capacity of 1 is enough for the append
// s := make([]byte, 0, 1)
s1 := append(s, 'a')
s2 := append(s, 'b')
fmt.Println(string(s1), ",", string(s2))

Reallocation means that append checks the capacity of the slice and, if it is not enough, allocates a larger array and copies the original data into it. With make([]byte, 0, 0) the capacity of s is certainly not enough, so s1 and s2 each use their own freshly copied array, and the result is naturally the expected a , b.

Testing shows that the capacity grows after reallocation. Printing s1:

s := make([]byte, 0, 0)
s1 := append(s, 'a')
fmt.Println(cap(s1), len(s1))
// Output: 8 1. The capacity grew after reallocation.
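The two make variants can be compared side by side with a small sketch. The function name appendBoth is an assumption made for this example. The exact capacity after reallocation depends on the Go version and platform, so it is not asserted here.

```go
package main

import "fmt"

// appendBoth appends 'a' and 'b' to the same empty slice of the given
// capacity, without reassigning, and returns both results as strings.
func appendBoth(capacity int) (string, string) {
	s := make([]byte, 0, capacity)
	s1 := append(s, 'a')
	s2 := append(s, 'b')
	return string(s1), string(s2)
}

func main() {
	x, y := appendBoth(0) // no room: each append allocates its own array
	fmt.Println(x, ",", y) // a , b

	x, y = appendBoth(1) // room for one element: both appends share arr[0]
	fmt.Println(x, ",", y) // b , b
}
```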

3.2 Conversion

Why does converting an empty string give a slice with capacity 32, rather than 0 or 8?

Time to bring out the ultimate weapon: reading the source. The official Go tooling can show which calls appear in the compiled assembly, which beats searching through large swaths of source by hand.

-gcflags passes parameters to the Go compiler. -S -S prints the assembly call information together with the data, while a single -S prints only the call information.

go run -gcflags '-S -S' main.go

Here is the output:

0x0000 00000 ()    TEXT    "".main(SB), $264-0
0x003e 00062 ()    MOVQ    AX, (SP)
0x0042 00066 ()    XORPS   X0, X0
0x0045 00069 ()    MOVUPS  X0, 8(SP)
0x004a 00074 ()    PCDATA  $0, $0
0x004a 00074 ()    CALL    runtime.stringtoslicebyte(SB)
0x004f 00079 ()    MOVQ    32(SP), AX
b , b

Go uses Plan 9 assembly syntax. The listing as a whole is hard to follow, but we can still pick out the key point we need:

CALL    runtime.stringtoslicebyte(SB)

Locate the source code in src\runtime\string.go :

The origin of the capacity 32 can be seen in the stringtoslicebyte function; see the comments:

const tmpStringBufSize = 32

type tmpBuf [tmpStringBufSize]byte

func stringtoslicebyte(buf *tmpBuf, s string) []byte {
    var b []byte
    if buf != nil && len(s) <= len(buf) {
        *buf = tmpBuf{}  // tmpBuf has a default capacity of 32
        b = buf[:len(s)] // creates a new slice with capacity 32 and length 0, assigned to b
    } else {
        b = rawbyteslice(len(s))
    }
    copy(b, s) // s is the empty string, so zero bytes are copied
    return b
}

Then why does it not take the else branch and call rawbyteslice?

func rawbyteslice(size int) (b []byte) {
    cap := roundupsize(uintptr(size))
    p := mallocgc(cap, nil, false)
    if cap != uintptr(size) {
        memclrNoHeapPointers(add(p, uintptr(size)), cap-uintptr(size))
    }

    *(*slice)(unsafe.Pointer(&b)) = slice{p, size, int(cap)}
    return
}

If it takes the else branch, the capacity is not 32. Even so, the conclusion (overwriting) is unaffected, as the following test shows:

s := []byte(strings.Repeat("c", 33))
s1 := append(s, 'a')
s2 := append(s, 'b')
fmt.Println(string(s1), ",", string(s2))
// cccccccccccccccccccccccccccccccccb , cccccccccccccccccccccccccccccccccb

4. Escape analysis

Teacher, can you take it up a notch? When does it take the else branch? You have been talking for ages and the hole still is not filled: why does uncommenting the line give the expected output a , b? And why does uncommenting it even change the capacity?

s := []byte("")
fmt.Println(cap(s), len(s))
s1 := append(s, 'a')
s2 := append(s, 'b')
fmt.Println(s1, ",", s2)
fmt.Println(string(s1), ",", string(s2))
// Output:
// 0 0
// [97] , [98]
// a , b

Escape analysis makes this easier to explain. First, what is escape analysis?

4.1 Improving performance

If a function or subroutine has a local object and returns a pointer to it, that pointer may be referenced anywhere else, so the pointer has successfully "escaped". Escape analysis is the technique of analyzing the reach of such pointers, and its benefit is better performance:

    • The biggest benefit is less pressure on the GC. Objects that do not escape are allocated on the stack, and when the function returns their memory is reclaimed without any GC mark and sweep.
    • Because escape analysis determines which variables can live on the stack, and stack allocation is faster than heap allocation, performance improves.
    • Lock elision: if an object's methods take a synchronization lock but only one thread accesses the object at run time, the machine code generated after escape analysis can run without the lock.

Go runs escape analysis at compile time to decide whether an object goes on the stack or on the heap: objects that do not escape go on the stack, and objects that may escape go on the heap.
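As a minimal sketch of the distinction, the two functions below differ only in whether a pointer to the local outlives the call. The type and function names are made up for illustration. Building with -gcflags=-m would typically flag p in newPoint as moved to the heap, though the exact wording of the diagnostics varies by compiler version.

```go
package main

import "fmt"

type point struct{ x, y int }

// newPoint returns a pointer to a local variable. The pointer outlives
// the call, so the compiler has to treat p as escaping.
func newPoint(x, y int) *point {
	p := point{x, y}
	return &p
}

// sum uses a local point by value only. Nothing refers to p after the
// function returns, so it is a candidate for stack allocation.
func sum(x, y int) int {
	p := point{x, y}
	return p.x + p.y
}

func main() {
	p := newPoint(1, 2)
	fmt.Println(p.x, p.y)  // 1 2
	fmt.Println(sum(1, 2)) // 3
}
```

Both functions behave the same from the caller's point of view; only the allocation site differs.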

4.2 Fled to the heap

With the line uncommented: during escape analysis the Go compiler detects that fmt.Println references s, so it decides to allocate the array under s on the heap. In the string-to-[]byte conversion, the default 32-byte array is only ever allocated on the stack, never on the heap.

Run the following command to get the escape information. This command only compiles the program and does not run it, whereas the go run -gcflags form used above passes parameters to the compiler and also runs the program.

go tool compile -m main.go

With fmt.Println(s1, ",", s2) uncommented, ([]byte)("") escapes to the heap:

main.go:23:13: s1 escapes to heap
main.go:20:13: ([]byte)("") escapes to heap // escapes to the heap
main.go:23:18: "," escapes to heap
main.go:23:18: s2 escapes to heap
main.go:24:20: string(s1) escapes to heap
main.go:24:20: string(s1) escapes to heap
main.go:24:26: "," escapes to heap
main.go:24:37: string(s2) escapes to heap
main.go:24:37: string(s2) escapes to heap
main.go:23:13: main ... argument does not escape
main.go:24:13: main ... argument does not escape

With //fmt.Println(s1, ",", s2) commented out, it does not escape to the heap:

go tool compile -m main.go
main.go:24:20: string(s1) escapes to heap
main.go:24:20: string(s1) escapes to heap
main.go:24:26: "," escapes to heap
main.go:24:37: string(s2) escapes to heap
main.go:24:37: string(s2) escapes to heap
main.go:20:13: main ([]byte)("") does not escape // does not escape
main.go:24:13: main ... argument does not escape

4.3 Escape assignment

Next, locate the place where stringtoslicebyte is called, in the src\cmd\compile\internal\gc\walk.go file. For ease of understanding, the code below is abridged:

const (
    EscUnknown = iota
    EscNone    // the result or parameter does not escape to the heap
)

case OSTRARRAYBYTE:
    a := nodnil() // the array defaults to nil
    if n.Esc == EscNone {
        // create a temporary array on the stack for the slice
        t := types.NewArray(types.Types[TUINT8], tmpstringbufsize)
        a = nod(OADDR, temp(t), nil)
    }
    n = mkcall("stringtoslicebyte", n.Type, init, a, conv(n.Left, types.Types[TSTRING]))

In the no-escape case a 32-byte array t is allocated. In the escape case nothing is allocated and the array is nil, so the capacity of s is 0. Appending a and b to s to produce s1 and s2 then has to copy, so no value is overwritten and the result is the expected a , b. Looking at stringtoslicebyte again makes this clear:

func stringtoslicebyte(buf *tmpBuf, s string) []byte {
    var b []byte
    if buf != nil && len(s) <= len(buf) {
        *buf = tmpBuf{}
        b = buf[:len(s)]
    } else {
        b = rawbyteslice(len(s))
    }
    copy(b, s)
    return b
}
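The two branches can be modelled in ordinary Go. This is a sketch, not the runtime code: toBytes and tmpSize are hypothetical names, and make stands in for rawbyteslice, so the size-class rounding of the real allocator is deliberately left out.

```go
package main

import "fmt"

const tmpSize = 32 // mirrors the 32-byte temporary buffer described above

// toBytes is a hypothetical user-land model of the conversion logic.
// A non-nil buf plays the role of the stack buffer (no escape);
// a nil buf models the escaping case.
func toBytes(buf *[tmpSize]byte, s string) []byte {
	var b []byte
	if buf != nil && len(s) <= len(buf) {
		*buf = [tmpSize]byte{}
		b = buf[:len(s)] // capacity is the whole buffer: 32
	} else {
		b = make([]byte, len(s)) // stand-in for rawbyteslice, without rounding
	}
	copy(b, s)
	return b
}

func main() {
	var buf [tmpSize]byte
	noEscape := toBytes(&buf, "")
	escape := toBytes(nil, "")
	fmt.Println(cap(noEscape), len(noEscape)) // 32 0
	fmt.Println(cap(escape), len(escape))     // 0 0
}
```

With capacity 32 the two appends share one array; with capacity 0 each append has to allocate its own.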

4.4 Size Allocation

When nothing escapes, the default capacity is 32. What is the allocation strategy when it does escape?

s := []byte("a")
fmt.Println(cap(s))
s1 := append(s, 'a')
s2 := append(s, 'b')
fmt.Print(s1, s2)

With the empty string the output is 0. With the string "a" the output is 8.

The size depends on the roundupsize function in src\runtime\msize.go and the class_to_size variable.

The size classes themselves are generated by src\runtime\mksizeclasses.go.
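The rounding idea can be sketched with a toy table. The sizeClasses values and the roundUp function below are illustrative assumptions, not the runtime's generated table, which is much longer and differs between Go versions.

```go
package main

import "fmt"

// sizeClasses is an illustrative table, not the runtime's generated one.
var sizeClasses = []int{0, 8, 16, 32, 48, 64}

// roundUp returns the smallest class that can hold size bytes, or size
// itself when it exceeds the largest class in this toy table.
func roundUp(size int) int {
	for _, c := range sizeClasses {
		if size <= c {
			return c
		}
	}
	return size
}

func main() {
	fmt.Println(roundUp(0))  // 0, like the escaping empty string
	fmt.Println(roundUp(1))  // 8, like the escaping "a"
	fmt.Println(roundUp(33)) // 48
}
```

Rounding up to a class is why a 1-byte string ends up with spare capacity that later appends can silently share.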

5. Version differences

Teacher, can you take it up a notch? Everything you said is wrong; my results are the opposite of yours. Yes, you are right, and so is the author. We write our programs in Go, after all, and when Go's internals change, the results change too. While researching, the author found that another blog shows this source for stringtoslicebyte:

func stringtoslicebyte(s String) (b Slice) {
    b.array = runtime·mallocgc(s.len, 0, FlagNoScan|FlagNoZero);
    b.len = s.len;
    b.cap = s.len;
    runtime·memmove(b.array, s.str, s.len);
}

With that version of the source the result is also as expected, because no default 32-byte array is allocated.

Going further back through the older code, version 1.3.2 looks like this:

func stringtoslicebyte(s String) (b Slice) {
    uintptr cap;

    cap = runtime·roundupsize(s.len);
    b.array = runtime·mallocgc(cap, 0, FlagNoScan|FlagNoZero);
    b.len = s.len;
    b.cap = cap;
    runtime·memmove(b.array, s.str, s.len);
    if(cap != b.len)
        runtime·memclr(b.array+b.len, cap-b.len);
}

Version 1.6.4:

func stringtoslicebyte(buf *tmpBuf, s string) []byte {
    var b []byte
    if buf != nil && len(s) <= len(buf) {
        b = buf[:len(s):len(s)]
    } else {
        b = rawbyteslice(len(s))
    }
    copy(b, s)
    return b
}
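The difference in 1.6.4 is the three-index slice expression buf[:len(s):len(s)], which caps the capacity at the length. A small sketch of its effect, using an explicit array and the made-up name appendCapped:

```go
package main

import "fmt"

// appendCapped appends twice to an empty slice whose capacity has been
// limited to 0 with a three-index slice expression.
func appendCapped() (string, string, byte) {
	var arr [32]byte
	s := arr[:0:0] // len 0, cap 0: append has no room inside arr

	s1 := append(s, 'a') // allocates a new array
	s2 := append(s, 'b') // allocates another new array

	return string(s1), string(s2), arr[0]
}

func main() {
	r1, r2, first := appendCapped()
	fmt.Println(r1, ",", r2) // a , b
	fmt.Println(first)       // 0: arr was never written
}
```

Because the capacity equals the length, every append is forced to copy, so the results no longer share storage.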

Even older:

struct __go_open_array
__go_string_to_byte_array (String str)
{
  uintptr cap;
  unsigned char *data;
  struct __go_open_array ret;

  cap = runtime_roundupsize (str.len);
  data = (unsigned char *) runtime_mallocgc (cap, 0, FlagNoScan | FlagNoZero);
  __builtin_memcpy (data, str.str, str.len);
  if (cap != (uintptr) str.len)
    __builtin_memset (data + str.len, 0, cap - (uintptr) str.len);
  ret.__values = (void *) data;
  ret.__count = str.len;
  ret.__capacity = str.len;
  return ret;
}

The author tested on version 1.6.4 and the results were indeed reversed: with the line commented out, the output is instead the expected a , b. This article uses 1.10.2.

6. Conclusion

Teacher, can you take it up a notch? Any further and we would be here all day.

To summarize:

    1. With the line commented out, the output is b , b. Nothing escapes, so the default 32-byte array is allocated. Both appends write to index 0 of that array, the later value overwrites the earlier one, and the result is b , b.
    2. With the line uncommented, the output is a , b. Because fmt.Println references s, escape analysis finds that it escapes, and since the string is empty a zero-capacity slice is allocated. Each of the two appends then operates on its own newly allocated slice, so the output is a , b.

Notes:

    1. In the source directory, gc means Go compiler, not garbage collection. The gc in gcflags has the same meaning.
    2. Code written this way serves no purpose and is strongly discouraged. Treat the result of []byte("string") as read-only, or you will end up with bugs that are hard to track down.

6.1 References

Original post: https://gocn.io/question/1852

https://gocn.io/article/355

https://go-review.googlesource.com/c/gofrontend/+/30827

http://golang-examples.tumblr.com/post/86403044869/conversion-between-byte-and-string-dont-share
