In-depth analysis of Slice implementation in Go

Source: Internet
Author: User

Slices are a basic data structure in Go for managing collections of data. They are designed around the idea of a dynamic array, giving developers a structure that can grow and shrink automatically. A slice itself, however, is neither a dynamic array nor an array pointer. Reslicing, append, and copy are the common slice operations. Slices can also be indexed and iterated over.

I. Slices and arrays

How should you choose between slices and arrays? Let's look at the question in detail.

In C, an array variable is implicitly used as a pointer. In Go, by contrast, an array is a value type: assignment and passing to a function both copy the entire array.

package main

import "fmt"

func main() {
    arrayA := [2]int{100, 200}
    var arrayB [2]int
    arrayB = arrayA // copies every element of arrayA
    fmt.Printf("arrayA : %p , %v\n", &arrayA, arrayA)
    fmt.Printf("arrayB : %p , %v\n", &arrayB, arrayB)
    testArray(arrayA) // passes another copy
}

func testArray(x [2]int) {
    fmt.Printf("func Array : %p , %v\n", &x, x)
}

Printing results:

arrayA : 0xc4200bebf0 , [100 200]
arrayB : 0xc4200bec00 , [100 200]
func Array : 0xc4200bec30 , [100 200]

As you can see, the three memory addresses differ, which confirms that array assignment and argument passing in Go both copy the value. So what is the problem with that?

Every time an array is passed as an argument, it is copied again. If the array holds 1 million int elements, one copy takes about 8,000,000 bytes, roughly 8 MB, on a 64-bit machine. That is a lot of memory. A natural alternative is to pass a pointer to the array instead.
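The size estimate can be checked with unsafe.Sizeof. The following is a minimal sketch: the bigArray type is a hypothetical name introduced for illustration, and int64 is used instead of int so that each element is 8 bytes on every platform.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bigArray is a hypothetical array type holding one million 8-byte elements.
type bigArray [1000000]int64

// sizeOfBigArray reports how many bytes a single value of bigArray occupies,
// which is also how many bytes are copied when it is passed by value.
func sizeOfBigArray() uintptr {
	return unsafe.Sizeof(bigArray{})
}

func main() {
	fmt.Println(sizeOfBigArray()) // 8000000
}
```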

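package main

import "fmt"

func main() {
    arrayA := [2]int{100, 200}
    testArrayPoint1(&arrayA) // 1. pass a pointer to the array
    arrayB := arrayA[:]
    testArrayPoint2(&arrayB) // 2. pass a pointer to a slice of the array
    fmt.Printf("arrayA : %p , %v\n", &arrayA, arrayA)
}

// A *[2]int and a *[]int are different types, so each needs its own function.
func testArrayPoint1(x *[2]int) {
    fmt.Printf("func Array : %p , %v\n", x, *x)
    (*x)[1] += 100
}

func testArrayPoint2(x *[]int) {
    fmt.Printf("func Array : %p , %v\n", x, *x)
    (*x)[1] += 100
}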

Printing results:

func Array : 0xc4200b0140 , [100 200]
func Array : 0xc4200b0180 , [100 300]
arrayA : 0xc4200b0140 , [100 400]

This confirms that an array pointer achieves what we want. Even if the array held 1 billion elements, passing it would only require an 8-byte pointer on the stack. That uses memory far more efficiently and performs better than copying.

The pointer has a drawback, though. The printed results show that the addresses on the first and third lines are the same: if what the original pointer points to is changed, the pointer inside the function follows that change.

This is where slices show their advantage. Passing a slice saves memory just as well, and it handles shared memory in a reasonable way. The second line of the output is the slice, and the address of the slice differs from that of the original array.

From this we can draw the conclusion that:

Passing a large array to a function by value consumes a lot of memory, and passing a slice avoids the problem. A slice refers to its underlying array, so passing it copies only a small header and not the elements, which is more efficient than passing the array.
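As a small illustration of that conclusion, the sketch below passes a slice of an array to a function. The function name addHundred is hypothetical. Only the slice header is copied, so the write inside the function is visible in the caller's array.

```go
package main

import "fmt"

// addHundred modifies the second element through the slice it receives.
// Only the slice header (pointer, len, cap) is copied; the elements are shared.
func addHundred(s []int) {
	s[1] += 100
}

func main() {
	array := [2]int{100, 200}
	addHundred(array[:])
	fmt.Println(array) // [100 300]
}
```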

However, there are still counter-examples.

package main

import "testing"

func array() [1024]int {
    var x [1024]int
    for i := 0; i < len(x); i++ {
        x[i] = i
    }
    return x
}

func slice() []int {
    x := make([]int, 1024)
    for i := 0; i < len(x); i++ {
        x[i] = i
    }
    return x
}

func BenchmarkArray(b *testing.B) {
    for i := 0; i < b.N; i++ {
        array()
    }
}

func BenchmarkSlice(b *testing.B) {
    for i := 0; i < b.N; i++ {
        slice()
    }
}

We run a benchmark with inlining and optimization disabled and observe the heap allocations made for the slice.

  go test -bench . -benchmem -gcflags "-N -l"

The output comparison is "surprising":

BenchmarkArray-4          500000              3637 ns/op               0 B/op          0 allocs/op
BenchmarkSlice-4          300000              4055 ns/op            8192 B/op          1 allocs/op

To explain these results: the array benchmark ran on 4 cores for 500000 iterations, each iteration took 3637 ns on average, 0 bytes were allocated on the heap per iteration, and the number of allocations was 0.

The slice result is slightly "worse". It also ran on 4 cores, for 300000 iterations at 4055 ns each on average, but every iteration allocated 8192 bytes on the heap in 1 allocation.

So slices are not always a suitable replacement for arrays: the underlying array of a slice may be allocated on the heap, and copying a small array on the stack is not necessarily more expensive than calling make.

II. Data structure of slices

A slice itself is neither a dynamic array nor an array pointer. Its internal data structure refers to the underlying array through a pointer and sets related properties that restrict reads and writes to a specified region. The slice itself is a read-only object that works like a wrapper around an array pointer.

A slice is a reference to a contiguous segment of an array, so a slice is a reference type (in this respect it is closer to an array type in C/C++ or a list type in Python). The segment can be the entire array or a subset of its items identified by a start index and an end index. Note that the item at the end index is not included in the slice. A slice provides a dynamic window onto the array it points to.

The index of an item within a slice may be smaller than the index of the same element in the underlying array. Unlike an array, a slice's length can change at run time, from a minimum of 0 up to the length of the underlying array: a slice is a variable-length array.

The Slice data structure is defined as follows:

type slice struct {
    array unsafe.Pointer
    len   int
    cap   int
}

The structure of a slice has 3 parts: array is a pointer to the underlying array, len is the current length of the slice, and cap is its current capacity. cap is always greater than or equal to len.
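The three-field layout can be observed from ordinary code. The sketch below is only an illustration: headerWords is a hypothetical helper that measures a slice value in machine words, and the result assumes the pointer, len, and cap fields are each one word wide, as in the struct above.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords returns the size of a slice value measured in machine words.
func headerWords() uintptr {
	var s []int
	return unsafe.Sizeof(s) / unsafe.Sizeof(uintptr(0))
}

func main() {
	s := make([]int, 3, 10)
	fmt.Println(len(s), cap(s)) // 3 10
	fmt.Println(headerWords())  // 3
}
```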

To get the memory address of a slice's data, you can do this:

s := make([]byte, 200)
ptr := unsafe.Pointer(&s[0])

What about the reverse, constructing a slice from a memory address in Go?

var ptr unsafe.Pointer
var s1 = struct {
    addr uintptr
    len  int
    cap  int
}{uintptr(ptr), length, length} // ptr must be converted to uintptr; length is assumed to be defined elsewhere
s := *(*[]byte)(unsafe.Pointer(&s1))

This builds a stand-in struct that spells out the data layout of a slice.

There is, of course, a more direct approach: Go's reflect package has a corresponding data structure, SliceHeader, which we can use to construct a slice.

var o []byte
sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&o))
sliceHeader.Cap = length
sliceHeader.Len = length
sliceHeader.Data = uintptr(ptr)
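On Go 1.17 and later, unsafe.Slice offers a third way to build a slice from an address, and reflect.SliceHeader has been marked deprecated since Go 1.20. The sketch below is an alternative to the two approaches above and does not come from the original article; bytesAt and viewOf are hypothetical helper names.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesAt builds a []byte of the given length over the memory starting at ptr.
// The caller must guarantee that ptr points to at least length valid bytes.
func bytesAt(ptr unsafe.Pointer, length int) []byte {
	return unsafe.Slice((*byte)(ptr), length)
}

// viewOf is a hypothetical helper that exposes a 4-byte array through bytesAt.
func viewOf(backing *[4]byte) []byte {
	return bytesAt(unsafe.Pointer(&backing[0]), len(backing))
}

func main() {
	backing := [4]byte{1, 2, 3, 4}
	s := viewOf(&backing)
	s[0] = 9 // writes through to backing
	fmt.Println(s, backing, len(s), cap(s)) // [9 2 3 4] [9 2 3 4] 4 4
}
```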

III. Creating slices

The make function lets you specify the length dynamically at run time, bypassing the restriction that the length of an array type must be a compile-time constant.

Slices can be created in two forms: with make (or a literal), and as nil or empty slices.

1. Make and slice literals

func makeslice(et *_type, len, cap int) slice {
    // Get the maximum capacity allowed for this element type.
    maxElements := maxSliceCap(et.size)

    // Check the length: it must lie in [0, maxElements].
    if len < 0 || uintptr(len) > maxElements {
        panic(errorString("makeslice: len out of range"))
    }

    // Check the capacity: it must lie in [len, maxElements].
    if cap < len || uintptr(cap) > maxElements {
        panic(errorString("makeslice: cap out of range"))
    }

    // Allocate memory according to the capacity.
    p := mallocgc(et.size*uintptr(cap), et, true)

    // Return the slice, which holds the address of the allocated memory.
    return slice{p, len, cap}
}

There is also an int64 version:

func makeslice64(et *_type, len64, cap64 int64) slice {
    len := int(len64)
    if int64(len) != len64 {
        panic(errorString("makeslice: len out of range"))
    }

    cap := int(cap64)
    if int64(cap) != cap64 {
        panic(errorString("makeslice: cap out of range"))
    }

    return makeslice(et, len, cap)
}

It works the same way as above, with the extra step of converting int64 to int.

Consider a slice with len = 4 and cap = 6 created with the make function. Memory for 6 values of type int is allocated. Because len = 4, the last 2 elements are not accessible yet, but the capacity is there. Every element of the array is 0.
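The len = 4, cap = 6 case can be reproduced directly. This is a minimal sketch; newSlice is a hypothetical helper name.

```go
package main

import "fmt"

// newSlice returns a slice with length 4 and capacity 6, as described above.
func newSlice() []int {
	return make([]int, 4, 6)
}

func main() {
	s := newSlice()
	fmt.Println(s, len(s), cap(s)) // [0 0 0 0] 4 6

	// Reslicing up to the capacity exposes the two reserved elements.
	full := s[:cap(s)]
	fmt.Println(full, len(full)) // [0 0 0 0 0 0] 6
}
```

Indexing s[4] directly would panic, because indexing is checked against len, not cap.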

Besides the make function, a slice can also be created with a literal.

A literal with six values creates a slice with len = 6 and cap = 6, and each element of the underlying array is initialized at that point. Note that nothing may be written inside the []: if you write a number there, the result is an array and not a slice.

A slice can also be created by slicing an existing array. In the original illustration, Slice A is a slice with len = 3 and cap = 3: it starts at position 2 of the original array (counting from 0) and runs through position 4 (position 5 is excluded). Similarly, Slice B is a slice with len = 2 and cap = 4.
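The sketch below shows a slice literal, an array literal, and a few slice expressions side by side. The values and index choices are assumptions made for illustration and are not taken from the original figure; describe is a hypothetical helper.

```go
package main

import "fmt"

// describe formats a slice together with its length and capacity.
func describe(s []int) string {
	return fmt.Sprintf("%v len=%d cap=%d", s, len(s), cap(s))
}

func main() {
	literal := []int{10, 20, 30, 40, 50, 60} // a slice: nothing inside []
	array := [6]int{10, 20, 30, 40, 50, 60}  // an array: the length is part of the type

	fmt.Println(describe(literal))      // [10 20 30 40 50 60] len=6 cap=6
	fmt.Println(describe(array[2:5]))   // [30 40 50] len=3 cap=4
	fmt.Println(describe(array[2:5:5])) // [30 40 50] len=3 cap=3
	fmt.Println(describe(array[1:3:5])) // [20 30] len=2 cap=4
}
```

With the two-index form the capacity runs to the end of the array; the optional third index caps it earlier.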

2. Nil and empty slices

Nil slices and empty slices are also commonly used.

var slice []int

Nil slices are used by many standard-library and built-in functions. A nil slice is the way to describe a slice that does not exist; for example, when a function hits an error, the slice it returns is a nil slice. The array pointer of a nil slice is nil.

An empty slice is typically used to represent an empty collection. For example, when a database query finds no results, it can return an empty slice.

slice := make([]int, 0) // an empty slice created with make
slice = []int{}         // the same thing written as a literal

The difference between an empty slice and a nil slice is that the pointer of an empty slice is not nil: it points to a memory address, but no memory is allocated for elements, that is, the underlying array contains 0 elements.

One last point: whether you use a nil slice or an empty slice, calling the built-in functions append, len, and cap on it has the same effect.
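The following sketch compares the two kinds of slice. The helper name appendOne is hypothetical.

```go
package main

import "fmt"

// appendOne appends the value 1 to s; it works the same for nil and empty slices.
func appendOne(s []int) []int {
	return append(s, 1)
}

func main() {
	var nilSlice []int
	emptySlice := []int{}

	fmt.Println(nilSlice == nil, emptySlice == nil) // true false
	fmt.Println(len(nilSlice), cap(nilSlice))       // 0 0
	fmt.Println(len(emptySlice), cap(emptySlice))   // 0 0

	fmt.Println(appendOne(nilSlice), appendOne(emptySlice)) // [1] [1]
}
```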

IV. Slice expansion

When a slice is full, it has to grow. How does it grow, and what is the strategy?

func growslice(et *_type, old slice, cap int) slice {
    if raceenabled {
        callerpc := getcallerpc(unsafe.Pointer(&et))
        racereadrangepc(old.array, uintptr(old.len*int(et.size)), callerpc, funcPC(growslice))
    }
    if msanenabled {
        msanread(old.array, uintptr(old.len*int(et.size)))
    }

    if et.size == 0 {
        // If the new capacity is smaller than the old capacity,
        // this would shrink the slice, so panic directly.
        if cap < old.cap {
            panic(errorString("growslice: cap out of range"))
        }

        // If the element size is 0 and growth is still requested,
        // return a slice with the new capacity.
        return slice{unsafe.Pointer(&zerobase), old.len, cap}
    }

    // This is the growth strategy.
    newcap := old.cap
    doublecap := newcap + newcap
    if cap > doublecap {
        newcap = cap
    } else {
        if old.len < 1024 {
            newcap = doublecap
        } else {
            for newcap < cap {
                newcap += newcap / 4
            }
        }
    }

    // Compute the memory sizes for the length and capacity of the new slice.
    var lenmem, newlenmem, capmem uintptr
    const ptrSize = unsafe.Sizeof((*byte)(nil))
    switch et.size {
    case 1:
        lenmem = uintptr(old.len)
        newlenmem = uintptr(cap)
        capmem = roundupsize(uintptr(newcap))
        newcap = int(capmem)
    case ptrSize:
        lenmem = uintptr(old.len) * ptrSize
        newlenmem = uintptr(cap) * ptrSize
        capmem = roundupsize(uintptr(newcap) * ptrSize)
        newcap = int(capmem / ptrSize)
    default:
        lenmem = uintptr(old.len) * et.size
        newlenmem = uintptr(cap) * et.size
        capmem = roundupsize(uintptr(newcap) * et.size)
        newcap = int(capmem / et.size)
    }

    // Reject illegal values: the capacity must not shrink,
    // and it must not exceed the maximum capacity.
    if cap < old.cap || uintptr(newcap) > maxSliceCap(et.size) {
        panic(errorString("growslice: cap out of range"))
    }

    var p unsafe.Pointer
    if et.kind&kindNoPointers != 0 {
        // Allocate the memory for the grown slice.
        p = mallocgc(capmem, nil, false)
        // Copy lenmem bytes from old.array to p.
        memmove(p, old.array, lenmem)
        // Starting at p + newlenmem, clear the remaining capmem-newlenmem bytes
        // to make room for the subsequent append() operation.
        memclrNoHeapPointers(add(p, newlenmem), capmem-newlenmem)
    } else {
        // Allocate a new array for the new slice:
        // capmem bytes, initialized to zero.
        p = mallocgc(capmem, et, true)
        if !writeBarrier.enabled {
            // If the write barrier is not enabled, copy lenmem bytes
            // from old.array to p in one go.
            memmove(p, old.array, lenmem)
        } else {
            // Otherwise copy the old slice's values one element at a time.
            for i := uintptr(0); i < lenmem; i += et.size {
                typedmemmove(et, add(p, i), add(old.array, i))
            }
        }
    }

    // Return the new slice with the capacity updated to the grown value.
    return slice{p, old.len, newcap}
}

That is the implementation of growth. There are two main questions: what the growth strategy is, and whether growth produces a new memory address or appends after the original one.

1. Expansion strategy

First, the growth strategy.

package main

import "fmt"

func main() {
    slice := []int{10, 20, 30, 40}
    newSlice := append(slice, 50) // slice is full, so append must grow it
    fmt.Printf("Before slice = %v, Pointer = %p, len = %d, cap = %d\n", slice, &slice, len(slice), cap(slice))
    fmt.Printf("Before newSlice = %v, Pointer = %p, len = %d, cap = %d\n", newSlice, &newSlice, len(newSlice), cap(newSlice))
    newSlice[1] += 10
    fmt.Printf("After slice = %v, Pointer = %p, len = %d, cap = %d\n", slice, &slice, len(slice), cap(slice))
    fmt.Printf("After newSlice = %v, Pointer = %p, len = %d, cap = %d\n", newSlice, &newSlice, len(newSlice), cap(newSlice))
}

Output Result:

Before slice = [10 20 30 40], Pointer = 0xc4200b0140, len = 4, cap = 4
Before newSlice = [10 20 30 40 50], Pointer = 0xc4200b0180, len = 5, cap = 8
After slice = [10 20 30 40], Pointer = 0xc4200b0140, len = 4, cap = 4
After newSlice = [10 30 30 40 50], Pointer = 0xc4200b0180, len = 5, cap = 8

The above process is represented by a graph.

From the diagram we can see that the new slice differs from the previous one: changing a value through the new slice does not affect the original array, because the new slice points to a completely new array. The capacity has changed as well. What exactly happened in between?

The strategy for slicing expansion in Go is this:

If the slice holds fewer than 1024 elements, its capacity doubles when it grows. The example above confirms this: the capacity doubled from 4 to 8.

Once the number of elements reaches 1024, the growth factor becomes 1.25, that is, the capacity increases by one quarter of its current value each time.

Note: Capacity expansion is for the original capacity, not for the length of the original array.
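One way to watch the strategy at work is to append repeatedly and record each capacity the runtime chooses. This is only an observational sketch: capacities is a hypothetical helper, and the exact numbers it prints depend on the Go version and on allocator size classes, so no expected output is shown.

```go
package main

import "fmt"

// capacities appends n elements one at a time and records every distinct
// capacity the slice takes on along the way.
func capacities(n int) []int {
	var s []int
	var caps []int
	last := -1
	for i := 0; i < n; i++ {
		s = append(s, i)
		if cap(s) != last {
			last = cap(s)
			caps = append(caps, last)
		}
	}
	return caps
}

func main() {
	fmt.Println(capacities(2000))
}
```

Each change in the printed sequence corresponds to one call into the growth path described above.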

2. New array or old array?

Is the array after growth always a new one? Not necessarily; there are two situations.

Situation One:

package main

import "fmt"

func main() {
    array := [4]int{10, 20, 30, 40}
    slice := array[0:2]           // len 2, cap 4: there is spare capacity
    newSlice := append(slice, 50) // fits in the existing array, so nothing is reallocated
    fmt.Printf("Before slice = %v, Pointer = %p, len = %d, cap = %d\n", slice, &slice, len(slice), cap(slice))
    fmt.Printf("Before newSlice = %v, Pointer = %p, len = %d, cap = %d\n", newSlice, &newSlice, len(newSlice), cap(newSlice))
    newSlice[1] += 10
    fmt.Printf("After slice = %v, Pointer = %p, len = %d, cap = %d\n", slice, &slice, len(slice), cap(slice))
    fmt.Printf("After newSlice = %v, Pointer = %p, len = %d, cap = %d\n", newSlice, &newSlice, len(newSlice), cap(newSlice))
    fmt.Printf("After array = %v\n", array)
}

Print output:

Before slice = [10 20], Pointer = 0xc4200c0040, len = 2, cap = 4
Before newSlice = [10 20 50], Pointer = 0xc4200c0060, len = 3, cap = 4
After slice = [10 30], Pointer = 0xc4200c0040, len = 2, cap = 4
After newSlice = [10 30 50], Pointer = 0xc4200c0060, len = 3, cap = 4
After array = [10 30 50 40]

The above process can be represented by a diagram.

The output shows that in this situation the array after the append is not a new one: the array is the same before and after, so modifying a value through the new slice also affects the old slice. The append() operation also changed a value inside the original array. A single append() touches this many places; if several slices share the original array, all of them are affected, and an obscure bug is produced without anyone noticing.

In this situation the underlying array still has spare capacity, so append() writes directly into the original array, and the slice returned by append() still points to the original array.

This situation also arises easily when a slice is created with a slice expression that passes a third index for the cap: if the cap of the resulting slice is not equal to the total capacity of the underlying array, this is what happens.

slice := array[1:2:3]

The above situation is very dangerous and extremely prone to bugs.
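To make the danger concrete, the sketch below appends to a slice created as array[1:2:3]. The array contents and the helper name appendShared are assumptions made for illustration.

```go
package main

import "fmt"

// appendShared slices arr as arr[1:2:3] (len 1, cap 2) and appends v.
// Because one slot of capacity is left, append writes into arr itself.
func appendShared(arr *[4]int, v int) []int {
	s := arr[1:2:3]
	return append(s, v)
}

func main() {
	array := [4]int{10, 20, 30, 40}
	newSlice := appendShared(&array, 50)
	fmt.Println(newSlice) // [20 50]
	fmt.Println(array)    // [10 20 50 40]
}
```

The element 30 in the original array was silently overwritten by the append.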

When creating slices with slice expressions, always stay alert to the cap value, to avoid bugs caused by sharing the original array.

Situation Two:

Situation two is the example shown under the expansion strategy. There a new array was created because the original array had reached its full capacity; when more room is needed, Go allocates a new memory region, copies the original values over, and then performs the append(). This situation does not affect the original array at all.

It is therefore recommended to avoid situation one as far as possible and to rely on situation two to avoid bugs.
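One possible way to steer an append into situation two is to clamp the capacity to the length with a three-index slice expression, so that append has no spare room and must allocate. This is a sketch of that idea and not something the original article prescribes; appendIsolated is a hypothetical helper.

```go
package main

import "fmt"

// appendIsolated appends v to a view of the first n elements of arr whose
// capacity is clamped to n, so append has to allocate a new backing array.
func appendIsolated(arr *[4]int, n, v int) []int {
	s := arr[0:n:n]
	return append(s, v)
}

func main() {
	array := [4]int{10, 20, 30, 40}
	newSlice := appendIsolated(&array, 2, 50)
	newSlice[1] += 10
	fmt.Println(newSlice) // [10 30 50]
	fmt.Println(array)    // [10 20 30 40]
}
```

The cost is an extra allocation and copy on the first append, in exchange for leaving the original array untouched.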

V. Slice copy

There are 2 copy functions for slices.

func slicecopy(to, fm slice, width uintptr) int {
    // If the source or the destination slice has length 0, nothing needs copying.
    if fm.len == 0 || to.len == 0 {
        return 0
    }

    // n is the length of the shorter of the two slices.
    n := fm.len
    if to.len < n {
        n = to.len
    }

    // If the element width is 0 there is nothing to copy; return the shorter length.
    if width == 0 {
        return n
    }

    // If the race detector is enabled
    if raceenabled {
        callerpc := getcallerpc(unsafe.Pointer(&to))
        pc := funcPC(slicecopy)
        racewriterangepc(to.array, uintptr(n*int(width)), callerpc, pc)
        racereadrangepc(fm.array, uintptr(n*int(width)), callerpc, pc)
    }

    // If the memory sanitizer (msan) is enabled
    if msanenabled {
        msanwrite(to.array, uintptr(n*int(width)))
        msanread(fm.array, uintptr(n*int(width)))
    }

    size := uintptr(n) * width
    if size == 1 {
        // TODO: is this still worth it with new memmove impl?
        // If there is only one byte, copy it directly through the pointers.
        *(*byte)(to.array) = *(*byte)(fm.array) // known to be a byte pointer
    } else {
        // Otherwise copy size bytes from fm.array to to.array.
        memmove(to.array, fm.array, size)
    }
    return n
}

slicecopy copies elements from the source slice (fm) to the destination slice (to) and returns the number of elements copied; the element types of the two slices must be identical. The number of elements copied is determined by the shorter slice: once the shorter slice has been covered, the copy is complete.

For example:

package main

import "fmt"

func main() {
    array := []int{10, 20, 30, 40}
    slice := make([]int, 6)
    n := copy(slice, array) // copies min(len(slice), len(array)) elements
    fmt.Println(n, slice)
}
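The "shorter slice wins" rule can be seen by copying the same source into destinations of different lengths. This is a minimal sketch; copyInto is a hypothetical helper.

```go
package main

import "fmt"

// copyInto allocates a destination of length dstLen, copies src into it,
// and returns the number of elements copied together with the destination.
func copyInto(dstLen int, src []int) (int, []int) {
	dst := make([]int, dstLen)
	n := copy(dst, src)
	return n, dst
}

func main() {
	src := []int{10, 20, 30, 40}

	n, dst := copyInto(6, src)
	fmt.Println(n, dst) // 4 [10 20 30 40 0 0]

	n, dst = copyInto(2, src)
	fmt.Println(n, dst) // 2 [10 20]
}
```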

There is one more copy function, slicestringcopy. It is similar to slicecopy, so it is not described again; the comments are in the code.

func slicestringcopy(to []byte, fm string) int {
    // If the source string or the destination slice has length 0, nothing needs copying.
    if len(fm) == 0 || len(to) == 0 {
        return 0
    }

    // n is the shorter of the two lengths.
    n := len(fm)
    if len(to) < n {
        n = len(to)
    }

    // If the race detector is enabled
    if raceenabled {
        callerpc := getcallerpc(unsafe.Pointer(&to))
        pc := funcPC(slicestringcopy)
        racewriterangepc(unsafe.Pointer(&to[0]), uintptr(n), callerpc, pc)
    }

    // If the memory sanitizer (msan) is enabled
    if msanenabled {
        msanwrite(unsafe.Pointer(&to[0]), uintptr(n))
    }

    // Copy the bytes of the string into the byte slice.
    memmove(unsafe.Pointer(&to[0]), stringStructOf(&fm).str, uintptr(n))
    return n
}

Here is another example:

package main

import "fmt"

func main() {
    slice := make([]byte, 3)
    n := copy(slice, "abcdef") // only the first 3 bytes fit
    fmt.Println(n, slice)
}

Output:

3 [97 98 99]

Speaking of copies, there is one pitfall with slices to be aware of.

package main

import "fmt"

func main() {
    slice := []int{10, 20, 30, 40}
    for index, value := range slice {
        // value is a copy of slice[index]; &slice[index] is the element itself
        fmt.Printf("value = %d , value-addr = %x , slice-addr = %x\n", value, &value, &slice[index])
    }
}

Output:

value = 10 , value-addr = c4200aedf8 , slice-addr = c4200b0320
value = 20 , value-addr = c4200aedf8 , slice-addr = c4200b0328
value = 30 , value-addr = c4200aedf8 , slice-addr = c4200b0330
value = 40 , value-addr = c4200aedf8 , slice-addr = c4200b0338

From the results above we can see that when you traverse a slice with range, value is a copy of the element in the slice, which is why the address of value printed in each iteration is the same. (Since Go 1.22 each iteration gets its own value variable, so these addresses may differ, but value is still a copy.)

Because value is a copy and not a reference, modifying value does not change the element in the original slice; to do that, use &slice[index] to get the element's real address.
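The difference is easy to demonstrate. In the sketch below, both function names are hypothetical: one writes to the range copy and the other writes through the index.

```go
package main

import "fmt"

// incrementCopies adds 1 to the range variable only; the slice is unchanged.
func incrementCopies(s []int) {
	for _, value := range s {
		value += 1 // changes only the local copy
		_ = value
	}
}

// incrementInPlace adds 1 to each element through its index.
func incrementInPlace(s []int) {
	for i := range s {
		s[i] += 1
	}
}

func main() {
	slice := []int{10, 20, 30, 40}
	incrementCopies(slice)
	fmt.Println(slice) // [10 20 30 40]
	incrementInPlace(slice)
	fmt.Println(slice) // [11 21 31 41]
}
```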


GitHub Repo:halfrost-field

Follow:halfrost GitHub

source:https://halfrost.com/go_slice/
