Golang string vs. []byte

Source: Internet
Author: User

Why do conversions between string and []byte come at a cost?
Why does the built-in function copy have a special case, copy(dst []byte, src string) int?
Both string and []byte are backed by arrays, so why is []byte more flexible than string, and why does it perform better for concatenation (see the performance comparison of dynamic string concatenation)?
Today I read the source to explore these questions a bit.
All of the following views are my personal, humble opinion. If you have different suggestions or additions, you are welcome to email me.

What is a string?

What is a string? The builtin package of the standard library explains:

    type string

    string is the set of all strings of 8-bit bytes, conventionally but not necessarily representing UTF-8-encoded text. A string may be empty, but not nil. Values of string type are immutable.

In a nutshell, a string is a collection of 8-bit bytes, usually but not necessarily UTF-8-encoded text. A string can be empty, but it cannot be nil, and its value cannot be changed.
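The points in this definition can be checked with a small sketch. The helper name byteAndRuneCount is made up for illustration; it only relies on len and the standard unicode/utf8 package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount returns the length of s in bytes and in runes.
// (Hypothetical helper, not part of the standard library.)
func byteAndRuneCount(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// "h\u00e9llo" contains one two-byte character, so bytes and runes differ.
	nb, nr := byteAndRuneCount("h\u00e9llo")
	fmt.Println(nb, nr)

	// A string may hold bytes that are not valid UTF-8.
	raw := string([]byte{0xff, 0xfe})
	fmt.Println(len(raw), utf8.ValidString(raw))

	// The zero value of a string is "", never nil.
	var empty string
	fmt.Println(empty == "")
}
```

len counts bytes, not characters, which is why the two numbers differ for non-ASCII text.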
Different languages implement strings differently. In the Go source code, src/runtime/string.go defines the string as follows:

    type stringStruct struct {
        str unsafe.Pointer
        len int
    }

You can see that str is a pointer to the first address of an array, and the other field, len, is the length. What is this array? Look at where a stringStruct is instantiated:

    func gostringnocopy(str *byte) string {
        ss := stringStruct{str: unsafe.Pointer(str), len: findnull(str)}
        s := *(*string)(unsafe.Pointer(&ss))
        return s
    }

As it turns out, it is a byte array. Note also that a string is really a struct.

What is []byte?

First, in Go, byte is an alias of uint8. The slice structure is defined in the Go source at src/runtime/slice.go:

    type slice struct {
        array unsafe.Pointer
        len   int
        cap   int
    }

array is a pointer to the underlying array, len is the length, and cap is the capacity. Apart from cap, it looks just like the string structure.
In fact, though, there are big differences between the two.
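The two layouts can be observed from ordinary code. The sketch below assumes the standard gc toolchain, where a string header occupies two machine words and a slice header three; headerWords is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords reports how many machine words the string header and the
// []byte header occupy. (Hypothetical helper for illustration.)
func headerWords() (strWords, sliceWords uintptr) {
	var s string
	var b []byte
	word := unsafe.Sizeof(uintptr(0))
	return unsafe.Sizeof(s) / word, unsafe.Sizeof(b) / word
}

func main() {
	sw, bw := headerWords()
	// With the gc toolchain: pointer+len for string, pointer+len+cap for slice.
	fmt.Println(sw, bw)
}
```

The extra word in the slice header is cap, which is what lets a slice grow in place with append.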

Difference

The value of the string cannot be changed

As mentioned above, the value of a string cannot be changed. That statement is incomplete: the value of a string cannot be changed, but it can be replaced. In terms of the string structure, every string is, underneath, a struct like stringStruct{str: str_point, len: str_len}. The str pointer points to the address of a string constant, and the content at that address cannot be changed because it is read-only. The pointer itself, however, can be pointed at a different address. Compare changing a string variable with changing a []byte variable:

    s := "A1" // allocate memory holding "A1"; the str pointer in s points to this memory
    s = "A2"  // allocate new memory for "A2"; the str pointer in s now points to this new memory

The difference between []byte and string is that with []byte, the contents of the underlying array can be changed when the variable is changed.

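    s := []byte{1} // allocate memory for an array holding 1; the array pointer in s points to this array
    s[0] = 2       // change the contents of the array to 2 in place; no new memory is allocated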
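The contrast can be sketched as two small functions. setFirst and withFirst are hypothetical names; the point is only that a []byte element can be assigned directly, while the equivalent s[0] = 'B' on a string does not compile, so a new string has to be built.

```go
package main

import "fmt"

// setFirst overwrites the first byte of b in place.
func setFirst(b []byte, v byte) {
	if len(b) > 0 {
		b[0] = v
	}
}

// withFirst returns a new string whose first byte is v; s itself is untouched.
func withFirst(s string, v byte) string {
	if len(s) == 0 {
		return s
	}
	b := []byte(s) // copies the bytes of s
	b[0] = v
	return string(b) // copies again into a new string
}

func main() {
	b := []byte("A1")
	setFirst(b, 'B')
	fmt.Println(string(b))

	s := "A1"
	t := withFirst(s, 'B')
	fmt.Println(s, t)
}
```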

Because the memory a string points to cannot be modified, every change to a string allocates new memory, and the previously allocated space has to be reclaimed by the GC. This is the root cause of inefficient string operations.
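A rough sketch of what this means for concatenation follows. Both function names are made up for illustration, and the initial capacity of 64 is an arbitrary assumption; no timing figures are claimed here.

```go
package main

import "fmt"

// joinWithString builds the result by repeated string concatenation.
// Each += produces a new string value, so earlier results become garbage.
func joinWithString(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// joinWithBytes appends into one growable []byte and converts once at the end.
func joinWithBytes(parts []string) string {
	buf := make([]byte, 0, 64) // arbitrary starting capacity
	for _, p := range parts {
		buf = append(buf, p...) // append accepts a string source for []byte
	}
	return string(buf)
}

func main() {
	parts := []string{"a", "b", "c"}
	fmt.Println(joinWithString(parts) == joinWithBytes(parts))
}
```

The two functions return the same text; they differ only in how many intermediate allocations they make along the way.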

Converting between string and []byte

Converting a string to []byte uses the syntax []byte(string). The source is as follows:

    func stringtoslicebyte(buf *tmpBuf, s string) []byte {
        var b []byte
        if buf != nil && len(s) <= len(buf) {
            *buf = tmpBuf{}
            b = buf[:len(s)]
        } else {
            b = rawbyteslice(len(s))
        }
        copy(b, s)
        return b
    }

    func rawstring(size int) (s string, b []byte) {
        p := mallocgc(uintptr(size), nil, false)
        stringStructOf(&s).str = p
        stringStructOf(&s).len = size
        *(*slice)(unsafe.Pointer(&b)) = slice{p, size, size}
        return
    }

You can see that b is newly allocated and s is then copied into b. As for why copy can copy a string directly into a []byte: the Go source implements a dedicated slicestringcopy function for this, which you can see in src/runtime/slice.go.
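From user code, this special case of copy looks like the sketch below. copyPrefix is a hypothetical wrapper; copy itself is the built-in and returns the number of bytes copied, which is the smaller of len(dst) and len(src).

```go
package main

import "fmt"

// copyPrefix copies as many bytes of src as fit into dst and returns the count.
func copyPrefix(dst []byte, src string) int {
	return copy(dst, src) // no intermediate []byte(src) conversion needed
}

func main() {
	dst := make([]byte, 3)
	n := copyPrefix(dst, "hello")
	fmt.Println(n, string(dst)) // prints: 3 hel
}
```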

Converting []byte to string uses the syntax string([]byte). The source is as follows:

    func slicebytetostring(buf *tmpBuf, b []byte) string {
        l := len(b)
        if l == 0 {
            // Turns out to be a relatively common case.
            // Consider that you want to parse out data between parens in "foo()bar",
            // you find the indices and convert the subslice to string.
            return ""
        }
        if raceenabled && l > 0 {
            racereadrangepc(unsafe.Pointer(&b[0]),
                uintptr(l),
                getcallerpc(unsafe.Pointer(&buf)),
                funcPC(slicebytetostring))
        }
        if msanenabled && l > 0 {
            msanread(unsafe.Pointer(&b[0]), uintptr(l))
        }
        s, c := rawstringtmp(buf, l)
        copy(c, b)
        return s
    }

    func rawstringtmp(buf *tmpBuf, l int) (s string, b []byte) {
        if buf != nil && l <= len(buf) {
            b = buf[:l]
            s = slicebytetostringtmp(b)
        } else {
            s, b = rawstring(l)
        }
        return
    }

Again, you can see that s is newly allocated and b is then copied into s.
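One visible consequence of this copy is that the string and the slice are independent after the conversion. A minimal sketch, with stringSnapshot as a hypothetical helper name:

```go
package main

import "fmt"

// stringSnapshot converts b to a string and then overwrites b's first byte.
// Because string(b) copies, the returned string keeps the old contents.
func stringSnapshot(b []byte) string {
	s := string(b)
	if len(b) > 0 {
		b[0] = 'X'
	}
	return s
}

func main() {
	b := []byte("abc")
	s := stringSnapshot(b)
	fmt.Println(s, string(b))
}
```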
Because converting between string and []byte in either direction allocates new memory, the cost is not trivial. But readers should not misunderstand: on current machines, this cost is usually negligible. Still, suppose (purely hypothetically) that you need to convert between string and []byte very frequently and want to avoid any new memory allocation. Is there a way? The answer is yes.

    package string_slicebyte_test

    import (
        "log"
        "reflect"
        "testing"
        "unsafe"
    )

    func stringtoslicebyte(s string) []byte {
        sh := (*reflect.StringHeader)(unsafe.Pointer(&s))
        bh := reflect.SliceHeader{
            Data: sh.Data,
            Len:  sh.Len,
            Cap:  sh.Len,
        }
        return *(*[]byte)(unsafe.Pointer(&bh))
    }

    func slicebytetostring(b []byte) string {
        bh := (*reflect.SliceHeader)(unsafe.Pointer(&b))
        sh := reflect.StringHeader{
            Data: bh.Data,
            Len:  bh.Len,
        }
        return *(*string)(unsafe.Pointer(&sh))
    }

    func TestStringSliceByte(t *testing.T) {
        s1 := "abc"
        b1 := []byte("def")
        copy(b1, s1)
        log.Println(s1, b1)

        s := "hello"
        b2 := stringtoslicebyte(s)
        log.Println(b2)
        // b2[0] = byte(99) unexpected fault address

        b3 := []byte("test")
        s3 := slicebytetostring(b3)
        log.Println(s3)
    }

Although the answer is yes, using this method to convert types is strongly discouraged. If a string is converted to []byte by stringtoslicebyte, the two share the same memory, and the original string's memory is read-only. Modifying it crashes the entire process, and the error cannot be caught with recover.
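For reference, and assuming Go 1.20 or later, the unsafe package offers unsafe.String and unsafe.SliceData, which can express the []byte to string direction without the reflect headers. This is only a sketch, not something the original article uses, and the same warning applies: the string shares memory with the slice, so the slice must not be modified afterwards. bytesAsString is a hypothetical name.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesAsString returns a string that shares memory with b (requires Go 1.20+).
// The caller must not modify b while the returned string is in use.
func bytesAsString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	b := []byte("test")
	s := bytesAsString(b)
	fmt.Println(s)
}
```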

How to choose?

Since a string is a sequence of bytes, and []byte can also represent a sequence of bytes, which one should you use in practice?

    • string values can be compared directly and []byte values cannot, so []byte cannot be used as a map key.
    • Because you cannot modify an individual character in a string, use []byte when you need to operate at the granularity of a single character.
    • A string value cannot be nil, so if you want to convey extra meaning by returning nil, use []byte.
    • []byte slices are very flexible, so use []byte when you want the features of a slice.
    • When you need to do a lot of string processing, use []byte; the performance is much better.

Finally, talking about performance without a concrete scenario is meaningless; choose according to the actual scenario.
