Original: How to avoid Go gotchas
TL;DR
Wikipedia's definition of a "gotcha":
A gotcha is a valid construct in a system, program or programming language that works as documented but is counter-intuitive and almost invites mistakes because it is both easy to invoke and unexpected or unreasonable in its outcome.
(Source: Wikipedia)
Go has its share of gotchas, and there are many excellent articles that discuss them. What those articles cover is important, especially for Go beginners, who can easily fall into these traps.
But one question puzzled me for a long time: why did I hardly ever run into most of the gotchas those articles discuss? Really, the best-known ones, such as the "nil interface" or "slice append" problems, never confused me. Somehow I had been avoiding them since I started using Go.
Eventually I found the answer: I had been fortunate enough to read many articles explaining the internal implementation of Go's data structures, and I had learned the basics of how Go works on the inside. That knowledge was enough to give me a solid intuition about Go and to keep me out of these traps.
Remember Wikipedia's definition: a gotcha is "a valid construct ... but is counter-intuitive".
So you have only two options:
- Fix the language
- Fix your intuition
The second is obviously the better choice. Once you have a clear picture in your head of how slices or interfaces work under the hood, it is almost impossible to fall into those traps.
This way of learning worked for me, and I believe it can work for others too. That is why I decided to gather some basic knowledge of Go internals in this article, hoping to help others build a clear intuition about how various data structures are represented in memory.
Let's start with some basics:
- Pointers
- Arrays and slices
- Append
- Interfaces
- Empty interface
Pointers
Go is actually quite close to the hardware. When you create a 64-bit integer variable (int64), you know exactly how much memory it occupies, and you can use unsafe.Sizeof() to get the size of any type.
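As a small illustration of the point about unsafe.Sizeof, the sketch below prints the size of a few basic types. The helper name sizes is made up for this example; only the fixed-width results are guaranteed, while the sizes of int and of pointers depend on the platform.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sizes (a hypothetical helper) returns the size in bytes of a few basic types.
// int32 and int64 have fixed sizes; int and pointer sizes depend on the platform.
func sizes() (i32, i64, word, ptr uintptr) {
	var a int32
	var b int64
	var c int
	var d *int64
	return unsafe.Sizeof(a), unsafe.Sizeof(b), unsafe.Sizeof(c), unsafe.Sizeof(d)
}

func main() {
	i32, i64, word, ptr := sizes()
	fmt.Println("int32 bytes:", i32)  // 4
	fmt.Println("int64 bytes:", i64)  // 8
	fmt.Println("int bytes:", word)   // platform dependent
	fmt.Println("*int64 bytes:", ptr) // platform dependent
}
```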
I often picture the sizes of variables, arrays, and data structures as blocks of memory. A visual representation gives a more intuitive understanding of these types and helps explain some behavior and performance issues.
As a warm-up, let's visualize the most basic Go types:
Image.png
Suppose you're on a 32-bit machine (I know you probably only have 64-bit machines now ...). You can clearly see that int64 takes twice as much memory as int32.
The internal representation of a pointer is slightly more complex: it occupies one block of memory that holds the address of another block, where the actual data is stored. The term "dereferencing a pointer" means following the address stored in the pointer variable to reach the memory block it points to. You can picture a pointer in memory like this:
Image.png
Memory addresses are usually written in hexadecimal, hence the "0x ..." in the figure. Remember that the pointer's value is stored in one place and the data it points to is stored in another; this will help us later.
For Go newcomers with no prior experience of pointers, passing function parameters "by value" can be confusing. As you may already know, all arguments in Go are passed by value, that is, by copying.
This is illustrated below:
Image.png
In the first example, all the memory blocks are copied. In real code a variable may occupy far more than 2 blocks, perhaps even 2 million, and copying all of them would be very expensive. In the second example, only the single block holding the address of the actual data is copied, which is cheap and efficient.
Obviously, in the first example, modifying the variable p inside Foo() does not modify the original data, but in the second example it definitely modifies the original memory block that p points to.
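The difference between the two cases can be sketched in a few lines. The type Point and the functions byValue and byPointer are hypothetical names chosen for this illustration; they are not taken from the figure.

```go
package main

import "fmt"

// Point is a hypothetical type used only for this illustration.
type Point struct {
	X, Y int
}

// byValue receives a copy of the struct; the caller's value is untouched.
func byValue(p Point) {
	p.X = 100
}

// byPointer receives a copy of the address; both copies point at the same data.
func byPointer(p *Point) {
	p.X = 100
}

func main() {
	p := Point{X: 1, Y: 2}

	byValue(p)
	fmt.Println(p.X) // 1: only the copy was modified

	byPointer(&p)
	fmt.Println(p.X) // 100: the original was modified through the pointer
}
```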
Understanding these key internals will help you avoid most gotchas. Let's go a little deeper.
Arrays and Slices
Beginners are often confused by slices and arrays, so let's look at arrays first.
Arrays
// Three alternative ways to declare an array of five ints (pick one):
var arr [5]int
var arr = [5]int{1, 2, 3, 4, 5}
var arr = [...]int{1, 2, 3, 4, 5}
Arrays are just contiguous chunks of memory. If you read the Go runtime source code (src/runtime/malloc.go), you will find that creating an array is essentially allocating a piece of memory of the given size. Does that remind you of the classic malloc? It's good old malloc, just smarter :)
// newarray allocates an array of n elements of type typ.
func newarray(typ *_type, n int) unsafe.Pointer {
	if n < 0 || uintptr(n) > maxSliceCap(typ.size) {
		panic(plainError("runtime: allocation size out of range"))
	}
	return mallocgc(typ.size*uintptr(n), typ, true)
}
This means we can picture an array simply as a set of memory blocks sitting next to each other:
Image.png
Array elements are always initialized to the zero value of their type; in our [5]int example, that value is 0. We can access each element by its index and get the array length with the built-in function len().
When you index into the array and assign like this:
var arr [5]int
arr[4] = 42
you take the fifth (4+1) element and change its value:
Image.png
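A short runnable sketch of the array behavior described above. The helper fifthSet is a made-up name; the last few lines also show that an array is a value, so assigning it copies every element.

```go
package main

import "fmt"

// fifthSet (a hypothetical helper) returns a [5]int whose fifth
// element (index 4) is set to v.
func fifthSet(v int) [5]int {
	var arr [5]int // all elements start at the zero value, 0
	arr[4] = v
	return arr
}

func main() {
	arr := fifthSet(42)
	fmt.Println(arr)      // [0 0 0 0 42]
	fmt.Println(len(arr)) // 5

	// Arrays are values: assignment copies every element.
	other := arr
	other[0] = 7
	fmt.Println(arr[0], other[0]) // 0 7
}
```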
Now we are ready to explore slices.
Slices
At first glance a slice looks like an array, and even the declaration syntax is similar:
var foo []int
But if we read the Go source (src/runtime/slice.go), we see that a slice is actually a structure with three fields: a pointer to an array, the length of the slice, and its capacity:
type slice struct {
	array unsafe.Pointer
	len   int
	cap   int
}
When a new slice is created, the Go runtime creates this three-block object in memory, with the array pointer set to nil and len and cap set to 0. Let's look at its visual representation:
Image.png
You can use make to initialize a slice of a given size:
foo = make([]int, 5)
This creates a slice backed by an array of 5 elements, each initialized to 0, with len and cap both set to 5.
Cap is the capacity: the upper limit on how far the slice can grow within its current array, leaving room for possible future growth. You can specify it with the make([]int, len, cap) syntax. In practice you will rarely need to pay special attention to it, but the concept is important to understand.
foo = make([]int, 3, 5)
Here are the two examples:
Image.png
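The three states discussed so far (a freshly declared slice and the two make calls) can be inspected directly. This is a minimal sketch; describe is a hypothetical helper introduced only for this example.

```go
package main

import "fmt"

// describe (a hypothetical helper) reports whether s is nil,
// along with its length and capacity.
func describe(s []int) (isNil bool, length, capacity int) {
	return s == nil, len(s), cap(s)
}

func main() {
	var foo []int // nil slice: array pointer is nil, len and cap are 0
	fmt.Println(describe(foo)) // true 0 0

	foo = make([]int, 5)
	fmt.Println(describe(foo)) // false 5 5

	foo = make([]int, 3, 5)
	fmt.Println(describe(foo)) // false 3 5

	fmt.Println(foo) // [0 0 0]: only the first len elements are visible
}
```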
When you change the value of an element in a slice, you are actually changing an element of the array the slice points to.
foo = make([]int, 5)
foo[3] = 42
foo[4] = 100
Image.png
That's easy to understand. Now let's make things a little more complicated: what happens if we create a sub-slice of the original slice and change one of its elements?
foo = make([]int, 5)
foo[3] = 42
foo[4] = 100
bar := foo[1:4]
bar[1] = 99
Image.png
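Running the same steps and printing both slices makes the sharing visible. The wrapper function shared is a hypothetical name used to keep the example self-contained.

```go
package main

import "fmt"

// shared (a hypothetical helper) builds foo and a sub-slice bar,
// then writes through bar.
func shared() (foo, bar []int) {
	foo = make([]int, 5)
	foo[3] = 42
	foo[4] = 100

	bar = foo[1:4] // bar views foo[1], foo[2], foo[3]
	bar[1] = 99    // same memory as foo[2]
	return foo, bar
}

func main() {
	foo, bar := shared()
	fmt.Println(foo)                // [0 0 99 42 100]
	fmt.Println(bar)                // [0 99 42]
	fmt.Println(len(bar), cap(bar)) // 3 4
}
```

Note that cap(bar) is 4, not 3: the sub-slice can still see the rest of foo's array past its own length.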
As you can see from the diagram, by modifying an element of bar we actually modified the underlying array, which is the same array that foo points to. This happens in real code too. You might write something like this:
var digitRegexp = regexp.MustCompile("[0-9]+")

func FindDigits(filename string) []byte {
	b, _ := ioutil.ReadFile(filename)
	return digitRegexp.Find(b)
}
Suppose the file we read into the slice is 10MB but contains only 3 digits. Intuitively we might expect the function to return just 3 bytes of data, but the returned slice still points into the original array, so the entire array stays in memory no matter how large it is.
Image.png
This is a very common gotcha with Go slices: you may not be able to predict how much memory is kept alive just to use one small slice. But once you have a picture of the slice's internal representation in your mind, I bet you will spot it almost every time.
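One common remedy, not shown in the article itself, is to copy the interesting bytes into a new, small slice so that the large array is no longer referenced. The sketch below uses a hypothetical copyDigits function that works on an in-memory buffer instead of a file, to keep the example self-contained.

```go
package main

import (
	"fmt"
	"regexp"
)

var digitRegexp = regexp.MustCompile("[0-9]+")

// copyDigits is a hypothetical variant of FindDigits that works on an
// in-memory buffer and returns a copy of the match, so the result does
// not keep the (possibly huge) buffer b reachable.
func copyDigits(b []byte) []byte {
	m := digitRegexp.Find(b)
	if m == nil {
		return nil
	}
	c := make([]byte, len(m))
	copy(c, m)
	return c
}

func main() {
	big := make([]byte, 1<<20) // stand-in for a large file's contents
	copy(big[1000:], "123")

	digits := copyDigits(big)
	fmt.Println(string(digits), len(digits), cap(digits)) // 123 3 3
}
```

The returned slice has its own 3-byte array, so the 1MB buffer can be garbage-collected once nothing else refers to it.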
Append
Having covered slices themselves, let's look at the built-in function append(). Its job is essentially to add a value to a slice, but internally it does a fair amount of work to allocate memory smartly and efficiently when needed.
Take a look at the following code:
a := make([]int, 32)
a = append(a, 1)
Remember cap, the capacity of a slice? Capacity is the room the slice has to grow. append checks whether the slice still has spare capacity and, if it does not, allocates more memory. Allocating memory is quite an expensive operation, so append tries to anticipate future growth: instead of asking for room for just 1 more element, it asks for room for 32 more, doubling the original capacity. Allocating more memory in one go is typically cheaper and faster than allocating a little memory many times.
What is confusing here is that, for various reasons, allocating more memory typically means allocating a new, large enough block at a different address and then copying the data from the current block into the new one. In other words, the address of the array the slice points to changes as well. Visually:
Image.png
Clearly there are now two underlying arrays, the original one and the newly allocated one. Smells like a gotcha, doesn't it? If no other slice points to the original array, it will be freed later by the garbage collector. But there is indeed a gotcha caused by append here. What if we create a sub-slice b and then append a value to a, assuming that both slices still share the same underlying array?
a := make([]int, 32)
b := a[1:16]
a = append(a, 1)
a[2] = 42
Image.png
As the illustration shows, slices a and b now point to two different arrays, which may be counter-intuitive for beginners. So, as a rule of thumb, take extra care with sub-slices, especially when append is involved.
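The divergence is easy to observe by running the snippet and reading the same logical element through both slices. The wrapper appendThenWrite is a hypothetical name; b[1] and a[2] referred to the same memory before the append.

```go
package main

import "fmt"

// appendThenWrite (a hypothetical helper) reproduces the snippet above
// and returns both slices.
func appendThenWrite() (a, b []int) {
	a = make([]int, 32) // len 32, cap 32: no room left to grow in place
	b = a[1:16]         // b shares a's backing array
	a = append(a, 1)    // capacity exceeded, so a now points to a new array
	a[2] = 42           // written to the new array only
	return a, b
}

func main() {
	a, b := appendThenWrite()
	fmt.Println(a[2])                // 42
	fmt.Println(b[1])                // 0: b still views the old array
	fmt.Println(len(a), cap(a) > 32) // 33 true
}
```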
When growing a slice, append doubles the capacity only up to a threshold (1024 in the runtime described here). Beyond that, it grows the slice more gradually and relies on so-called memory size classes, so that the growth is no more than about 12.5%. Requesting 64 bytes for a 32-byte array is fine, but if the slice holds 4GB or more, allocating another 4GB just to add one element would be far too costly, which is the situation this rule is designed for. The exact thresholds and growth factors are implementation details that have changed between Go releases.
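Because the growth policy is an implementation detail, the simplest way to see what your own Go version does is to measure it. The sketch below (growthSteps is a hypothetical helper) records the capacity every time append has to reallocate; the printed numbers depend on the Go version and platform, so none are listed here.

```go
package main

import "fmt"

// growthSteps (a hypothetical helper) appends n elements one at a time
// and records the new capacity each time append had to reallocate.
func growthSteps(n int) []int {
	var s []int
	var caps []int
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, i)
		if cap(s) != before {
			caps = append(caps, cap(s))
		}
	}
	return caps
}

func main() {
	// The exact sequence depends on the Go version and platform.
	fmt.Println(growthSteps(5000))
}
```

Whatever the exact sequence, the number of reallocations is far smaller than the number of appended elements, which is the point of growing in large steps.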
Interfaces
This is the most confusing part for many people. It takes time to understand how to use interfaces properly in Go, especially for programmers with a lot of experience in other object-oriented languages. One source of this confusion is that nil has a different meaning in the context of interfaces.
To understand this part, let's look at the source code again.
What does an interface look like under the hood? Here is an excerpt from src/runtime/runtime2.go:
type iface struct {
	tab  *itab
	data unsafe.Pointer
}
itab stands for interface table. It is also a structure, one that holds the metadata needed about the interface and the underlying type:
type itab struct {
	inter  *interfacetype
	_type  *_type
	link   *itab
	bad    int32
	unused int32
	fun    [1]uintptr // variable sized
}
We won't delve into how interface type assertions are implemented, but it is important to understand that an interface is a composite of interface and static type information plus a pointer to the actual variable (the data field in iface). Let's create a variable err of the interface type error and visualize its structure:
var err error
Image.png
What you see in this picture is the famous nil interface. When you return a nil error from a function, this is the object you return. It carries information about the interface (itab.inter), but the data field and itab._type are nil. A check like if err == nil {} evaluates to true for this object.
func foo() error {
	var err error // nil
	return err
}

err := foo()
if err == nil {...} // true
A well-known gotcha is returning a variable of type *os.PathError whose value is nil.
func foo() error {
	var err *os.PathError // nil
	return err
}

err := foo()
if err == nil {...} // false
Unless you know what an interface looks like in memory, these two pieces of code look almost identical. Now let's see how a nil variable of type *os.PathError gets wrapped in the error interface.
Image.png
You can clearly see the *os.PathError: it is just a block of memory holding nil, because the zero value of a pointer is nil. But the error actually returned from foo() is a more complex structure, holding information about the interface, information about the concrete type, and the address of the memory holding the nil pointer. Can you spot the difference?
In both examples we created a nil, but there is a huge difference between an interface that holds a variable whose value is nil and an interface that holds no variable at all. With this understanding of the internal structure of interfaces, let's look at the two easily confused examples side by side:
Image.png
There should be no more confusion about similar problems now.
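Both cases can be checked in one runnable program. The function names below are hypothetical, and the third function shows one common way to avoid the problem (returning a literal nil when there is no error), which is a suggestion added here rather than something taken from the article.

```go
package main

import (
	"fmt"
	"os"
)

// nilInterface returns an error interface that holds no type and no value.
func nilInterface() error {
	var err error
	return err
}

// typedNil returns an error interface wrapping a nil *os.PathError:
// the type part is set, so the interface itself is not nil.
func typedNil() error {
	var err *os.PathError
	return err
}

// fixed (hypothetical) returns a literal nil unless there really is an error.
func fixed(fail bool) error {
	var err *os.PathError
	if fail {
		err = &os.PathError{Op: "open", Path: "/hypothetical", Err: os.ErrNotExist}
	}
	if err != nil {
		return err
	}
	return nil
}

func main() {
	fmt.Println(nilInterface() == nil) // true
	fmt.Println(typedNil() == nil)     // false
	fmt.Println(fixed(false) == nil)   // true
	fmt.Println(fixed(true) == nil)    // false
}
```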
Empty interface
Next, let's talk about the empty interface, interface{}. In the Go source (src/runtime/runtime2.go) it is implemented with its own structure, eface:
type eface struct {
	_type *_type
	data  unsafe.Pointer
}
It is much like iface, but it lacks the interface table. Because the empty interface is, by definition, implemented by every static type, eface does not need one. When you wrap something into an interface{}, explicitly or implicitly (for example, by passing it as a function argument), this structure is what actually gets stored:
func foo() interface{} {
	foo := int64(42)
	return foo
}
Image.png
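Since an interface{} value stores the concrete type next to the data, that type can be inspected and the value recovered with a type assertion. A minimal sketch, with box and unbox as hypothetical helper names:

```go
package main

import "fmt"

// box wraps an int64 in an empty interface.
func box() interface{} {
	foo := int64(42)
	return foo
}

// unbox (hypothetical) recovers the int64 with a type assertion;
// ok is false if the interface holds some other type.
func unbox(v interface{}) (n int64, ok bool) {
	n, ok = v.(int64)
	return n, ok
}

func main() {
	v := box()
	fmt.Printf("%T %v\n", v, v) // int64 42

	n, ok := unbox(v)
	fmt.Println(n, ok) // 42 true

	_, ok = unbox("forty-two")
	fmt.Println(ok) // false
}
```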
One gotcha related to interface{} is that you cannot simply assign a slice of concrete types to a slice of interfaces, or vice versa. For example:
func foo() []interface{} {
	return []int{1, 2, 3}
}
This code causes a compilation error:
$ go build
cannot use []int literal (type []int) as type []interface {} in return argument
It is confusing at first. Why can we do this conversion for a single variable but not for a slice? Once we know what the empty interface really is (take another look at the diagram above), it becomes clear that this "conversion" is a rather expensive operation that involves allocating a lot of memory and takes O(n) time and space. And one of Go's design principles is that if you need to do something expensive, you do it explicitly, not implicitly.
Image.png
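Doing the conversion explicitly looks like the sketch below. The function name toInterfaces is made up for this example; the loop makes the O(n) cost visible, since every element is wrapped individually into a newly allocated slice.

```go
package main

import "fmt"

// toInterfaces (a hypothetical helper) converts []int to []interface{}
// explicitly: it allocates a new slice and wraps each element.
func toInterfaces(in []int) []interface{} {
	out := make([]interface{}, len(in))
	for i, v := range in {
		out[i] = v
	}
	return out
}

func main() {
	vals := toInterfaces([]int{1, 2, 3})
	fmt.Println(vals...)   // 1 2 3
	fmt.Println(len(vals)) // 3
}
```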
Conclusion
Not every gotcha can be explained by learning Go's internals. Some exist simply because your past experience differs from how Go does things, and we all have different backgrounds and experience. Still, with a little understanding of how Go works inside, you can avoid most of them.
I hope the explanations in this article help you build an intuition about what happens inside your Go programs, and I believe this knowledge will make you a better developer. Go is your friend, and knowing it a little deeper won't hurt :)
If you want to learn more about Go internals, here is a selection of articles:
- Go Data Structures
- Go Data Structures: Interfaces
- Go Slices: usage and internals
- Gopher Puzzlers
And, of course, don't forget these indispensable resources :)
- Go Source Code
- Effective Go
- Go Spec
Happy hacking!
In addition, the author gave a related talk at Golang BCN on November 16.
If you are interested, see the slides from that talk: How to Avoid Go gotchas.pdf