## Golang string vs. []byte

Why does converting between string and []byte have a cost? Why does the built-in function `copy` have a special case, `copy(dst []byte, src string) int`? Both string and []byte are backed by arrays, so why is []byte more flexible than string, and why is concatenation with it faster ([dynamic string splicing performance comparison](https://sheepbao.github.io/post/golang_string_connect_performance/))? Today I read the source code to find out. Everything below is my own humble opinion; if you have different views or additions, feel free to email me ([About me](https://sheepbao.github.io/about/)).

### What is a string?

What is a string? Here is the explanation from the standard library package `builtin`:

```go
// string is the set of all strings of 8-bit bytes, conventionally but not
// necessarily representing UTF-8-encoded text. A string may be empty, but
// not nil. Values of string type are immutable.
type string string
```

Simply put, a string is a sequence of 8-bit bytes, usually but not necessarily UTF-8-encoded text. A string can be empty, but it cannot be nil, and the value of a string cannot be changed.

Different languages implement strings differently. In Go's source code, `src/runtime/string.go`, a string is defined as follows:

```go
type stringStruct struct {
	str unsafe.Pointer
	len int
}
```

As you can see, `str` is a pointer to the first element of an array, and the other field, `len`, is the length. What kind of array is it? Look at where a `stringStruct` is instantiated:

```go
func gostringnocopy(str *byte) string {
	ss := stringStruct{str: unsafe.Pointer(str), len: findnull(str)}
	s := *(*string)(unsafe.Pointer(&ss))
	return s
}
```

Ha, it is actually a byte array. Also note that, under the hood, a string is really this struct.

### What is []byte?

First, in Go, `byte` is an alias for `uint8`.
Next, the slice structure is defined in Go's source code, `src/runtime/slice.go`, as follows:

```go
type slice struct {
	array unsafe.Pointer
	len   int
	cap   int
}
```

`array` is a pointer to an array, `len` is the length, and `cap` is the capacity. Apart from `cap`, it looks just like the string structure, but in fact the two differ a lot.

### Differences between string and []byte

A string's value cannot be changed, but a string variable can be reassigned. Let's use the string structure to explain. Underneath, every string is a struct like `stringStruct{str: str_point, len: str_len}`. The `str` pointer points to the address of a string constant, and the contents at that address cannot be changed because they are read-only; the pointer itself, however, can be made to point to a different address. Let's compare how reassignment works for string and for []byte:

```go
s := "A1" // "A1" is stored in read-only memory; the str pointer in s's struct points to it
s = "A2"  // the str pointer in s's struct now points to a different block holding "A2"; the bytes of "A1" are untouched
```

The difference with []byte is that the contents of the array itself can be changed in place:

```go
b := []byte("A1") // allocate an array holding 'A', '1'; the array pointer in b's struct points to it
b[1] = '2'        // modify the array in place: b is now "A2", and no new memory is allocated
```

Because the contents a string points to cannot be changed, every modification of a string (a concatenation, for example) produces a new string in newly allocated memory, and the previously allocated space is later reclaimed by the GC. This is the root cause of inefficient string operations.

### Converting between string and []byte

To convert a string to []byte, the syntax is `[]byte(string)`. The source code is as follows:

```go
func stringtoslicebyte(buf *tmpBuf, s string) []byte {
	var b []byte
	if buf != nil && len(s) <= len(buf) {
		*buf = tmpBuf{}
		b = buf[:len(s)]
	} else {
		b = rawbyteslice(len(s))
	}
	copy(b, s)
	return b
}

func rawstring(size int) (s string, b []byte) {
	p := mallocgc(uintptr(size), nil, false)

	stringStructOf(&s).str = p
	stringStructOf(&s).len = size

	*(*slice)(unsafe.Pointer(&b)) = slice{p, size, size}

	return
}
```

As you can see, `b` is newly allocated, and then `s` is copied into `b`. The reason `copy` can copy a string directly into a []byte is that the Go source implements a `slicestringcopy` function for this case; see `src/runtime/slice.go` for details.

To convert []byte to string, the syntax is `string([]byte)`. The source code is as follows:

```go
func slicebytetostring(buf *tmpBuf, b []byte) string {
	l := len(b)
	if l == 0 {
		// Turns out to be a relatively common case.
		// Consider that you want to parse out data between parens in "foo()bar",
		// you find the indices and convert the subslice to string.
		return ""
	}
	if raceenabled && l > 0 {
		racereadrangepc(unsafe.Pointer(&b[0]),
			uintptr(l),
			getcallerpc(unsafe.Pointer(&buf)),
			funcPC(slicebytetostring))
	}
	if msanenabled && l > 0 {
		msanread(unsafe.Pointer(&b[0]), uintptr(l))
	}
	s, c := rawstringtmp(buf, l)
	copy(c, b)
	return s
}

func rawstringtmp(buf *tmpBuf, l int) (s string, b []byte) {
	if buf != nil && l <= len(buf) {
		b = buf[:l]
		s = slicebytetostringtmp(b)
	} else {
		s, b = rawstring(l)
	}
	return
}
```

Again, `s` is newly allocated, and then `b` is copied into `s`. Because every conversion between string and []byte involves a new memory allocation, the cost is not zero. But don't misunderstand: on today's machines this cost is hardly worth mentioning. Still, suppose (purely hypothetically) that you need to convert between string and []byte very frequently: is there a way to do it without allocating new memory? The answer is yes.

```go
package string_slicebyte_test

import (
	"log"
	"reflect"
	"testing"
	"unsafe"
)

// Note: reflect.StringHeader and reflect.SliceHeader are deprecated in newer
// Go versions; since Go 1.20, unsafe.String, unsafe.StringData, unsafe.Slice
// and unsafe.SliceData are the recommended way to do this.

// stringtoslicebyte builds a []byte header that shares s's memory (no copy).
func stringtoslicebyte(s string) []byte {
	sh := (*reflect.StringHeader)(unsafe.Pointer(&s))
	bh := reflect.SliceHeader{
		Data: sh.Data,
		Len:  sh.Len,
		Cap:  sh.Len,
	}
	return *(*[]byte)(unsafe.Pointer(&bh))
}

// slicebytetostring builds a string header that shares b's memory (no copy).
func slicebytetostring(b []byte) string {
	bh := (*reflect.SliceHeader)(unsafe.Pointer(&b))
	sh := reflect.StringHeader{
		Data: bh.Data,
		Len:  bh.Len,
	}
	return *(*string)(unsafe.Pointer(&sh))
}

func TestStringSliceByte(t *testing.T) {
	s1 := "abc"
	b1 := []byte("def")
	copy(b1, s1) // the copy(dst []byte, src string) special case
	log.Println(s1, b1)

	s := "hello"
	b2 := stringtoslicebyte(s)
	log.Println(b2)
	// b2[0] = 'x' // writing here crashes with "unexpected fault address"

	b3 := []byte("test")
	s3 := slicebytetostring(b3)
	log.Println(s3)
}
```

It works, but I strongly recommend against converting types this way. When a string is converted to []byte with `stringtoslicebyte`, both share the same memory, and if that string lives in read-only memory (as a string literal does), any write to the slice will crash the entire process. This is a fatal runtime error that cannot be caught with `recover`.

### How to choose?

Since a string is a sequence of bytes, and a []byte can also express a sequence of bytes, which one should you use in practice?

* Strings can be compared directly with `==`, but []byte values cannot, so a []byte cannot be used as a map key.
* Because you cannot modify a single character of a string, use []byte when you need to manipulate individual characters.
* A string value cannot be nil, so if you want to convey extra meaning by returning nil, use []byte.
* []byte slices are very flexible; use []byte when you want the features of slices.
* Use []byte when a lot of string processing is required; the performance is much better.

In the end, performance cannot be separated from the scenario, so choose according to your actual use case.
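The guidelines in "How to choose?" can be illustrated with one small program. This is a sketch, not a benchmark: `countKeys`, `upperFirst`, `valueOf` and `joinBytes` are hypothetical helper names chosen for this example, and the `key=value` format parsed by `valueOf` is an assumption made only to show nil versus empty.

```go
package main

import (
	"bytes"
	"fmt"
)

// countKeys: a []byte cannot be a map key, so each key is converted to string.
func countKeys(keys [][]byte) map[string]int {
	m := make(map[string]int)
	for _, k := range keys {
		m[string(k)]++
	}
	return m
}

// upperFirst changes one byte in place; with a string, a new string would be built.
func upperFirst(b []byte) {
	if len(b) > 0 && 'a' <= b[0] && b[0] <= 'z' {
		b[0] -= 'a' - 'A'
	}
}

// valueOf returns the bytes after "key=" in line, or nil if the key is absent.
// A nil result ("absent") differs from an empty non-nil one ("present but empty").
func valueOf(line []byte, key string) []byte {
	prefix := append([]byte(key), '=')
	if !bytes.HasPrefix(line, prefix) {
		return nil
	}
	return line[len(prefix):]
}

// joinBytes appends every part to one growing []byte buffer instead of
// allocating a new string for each concatenation step.
func joinBytes(parts []string, sep string) string {
	var buf []byte
	for i, p := range parts {
		if i > 0 {
			buf = append(buf, sep...) // append, like copy, accepts a string source
		}
		buf = append(buf, p...)
	}
	return string(buf)
}

func main() {
	// Strings compare with ==; []byte values need bytes.Equal.
	fmt.Println("go" == "go", bytes.Equal([]byte("go"), []byte("go"))) // true true

	fmt.Println(countKeys([][]byte{[]byte("a"), []byte("b"), []byte("a")})) // map[a:2 b:1]

	word := []byte("gopher")
	upperFirst(word)
	fmt.Println(string(word)) // Gopher

	fmt.Println(valueOf([]byte("name="), "name") == nil) // false: present but empty
	fmt.Println(valueOf([]byte("age=3"), "name") == nil) // true: absent

	fmt.Println(joinBytes([]string{"a", "b", "c"}, ",")) // a,b,c
}
```

For real code, `strings.Builder` and `bytes.Buffer` wrap the same append-to-one-buffer idea shown in `joinBytes`.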