This article was translated by yhx for Bole Online and proofread by Huang Li-min. Do not reprint without permission.
English source: Sergey Matyukevich.
- Go Language Insider (1): Key concepts and project structure
- Go Language Insider (2): A deep dive into the Go compiler
This article discusses the Go linker, object files, and relocations.
Why pay attention to these things? If you want to learn the internals of any big project, the first thing to do is split it into components or modules. Next, you need to understand the interfaces these modules expose to each other. In Go, the compiler, the linker, and the runtime are such high-level modules. The interface between the compiler and the linker is the object file, so that is where we start today.
Generating a Go object file
Let's do an experiment: write a very simple program, compile it, and look at the object file that is generated. In this example, I wrote the following program:
    package main

    func main() {
        print(1)
    }
It's very simple and straightforward, isn't it? Now let's compile:
    go tool 6g test.go
This command generates an object file named test.6. To examine its internal structure, we will use the goobj library. This library is used inside the Go source tree, mainly in unit tests that check whether object files are generated correctly in various situations. For this post, I wrote a simple program that prints the output of the goobj library to the terminal. The source code is available at github.com/s-matyukevich/goobj_explorer.
First, you need to download and install my program:
    go get github.com/s-matyukevich/goobj_explorer
Next, execute the following command:
    goobj_explorer -o test.6
You should now see the goobj.Package structure printed in your terminal.
Exploring the object file
The most interesting part of the object file is the Syms array. This is in fact a symbol table. Everything you define in the program, including functions, global variables, types, constants, and so on, is written to this table. Let's look at the entry that stores the main function. (Note: I have removed the contents of Reloc and Func from the output; we will discuss both later.)
    &goobj.Sym{
        SymID: goobj.SymID{Name:"main.main", Version:0},
        Kind:  1,
        DupOK: false,
        Size:  48,
        Type:  goobj.SymID{},
        Data:  goobj.Data{Offset:137, Size:44},
        Reloc: ...,
        Func:  ...,
    }
The names of the fields in the goobj.Sym struct are largely self-explanatory:
Field | Description
SymID | The unique symbol ID. It consists of the symbol's name and version. Versions help distinguish between symbols that share the same name.
Kind | Indicates which kind the symbol belongs to (more on this later).
DupOK | Indicates whether duplicates (symbols with the same name) are allowed.
Size | The size of the symbol data.
Type | A reference to another symbol that represents the symbol's type, if one exists.
Data | Contains the binary data. The meaning of this field depends on the kind of symbol: for a function it is the machine code, for a string symbol it is the raw string, and so on.
Reloc | The list of relocations (more on this later).
Func | Contains metadata specific to function symbols (more on this later).
Now let's look at the different kinds of symbols. All symbol kinds are defined as constants in the goobj package. Below are the first few of them:
    const (
        _ SymKind = iota

        // readonly, executable
        STEXT
        SELFRXSECT

        // readonly, non-executable
        STYPE
        SSTRING
        SGOSTRING
        SGOFUNC
        SRODATA
        SFUNCTAB
        STYPELINK
        SSYMTAB // TODO: move to unmapped section
        SPCLNTAB
        SELFROSECT
        ...
As we can see, the main.main symbol has kind 1, which corresponds to the STEXT constant. STEXT marks a symbol that contains executable code. Next, let's look at the Reloc array. It consists of the following structs:
    type Reloc struct {
        Offset int
        Size   int
        Sym    SymID
        Add    int
        Type   int
    }
Each relocation means that the bytes in the interval [Offset, Offset+Size) must be replaced with a specific address. That address is computed by adding Add bytes to the address of the Sym symbol.
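The rule above can be sketched in a few lines of Go. This is only an illustration, not the linker's actual code: the Reloc type is a local copy with Sym simplified to a string, and the function name applyReloc, the little-endian byte order, and the restriction to 4- and 8-byte fields are all assumptions made for the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Reloc is a local copy of the struct shown in the article.
// Sym is simplified to a plain name for this sketch.
type Reloc struct {
	Offset int
	Size   int
	Sym    string
	Add    int
	Type   int
}

// applyReloc overwrites data[r.Offset : r.Offset+r.Size] with symAddr+r.Add.
// Assumptions: little-endian byte order and a 4- or 8-byte field.
func applyReloc(data []byte, r Reloc, symAddr uint64) error {
	if r.Size != 4 && r.Size != 8 {
		return fmt.Errorf("unsupported relocation size %d", r.Size)
	}
	if r.Offset < 0 || r.Offset+r.Size > len(data) {
		return fmt.Errorf("relocation [%d, %d) is outside data of length %d",
			r.Offset, r.Offset+r.Size, len(data))
	}
	value := symAddr + uint64(r.Add)
	field := data[r.Offset : r.Offset+r.Size]
	if r.Size == 4 {
		binary.LittleEndian.PutUint32(field, uint32(value))
	} else {
		binary.LittleEndian.PutUint64(field, value)
	}
	return nil
}

func main() {
	data := make([]byte, 8) // placeholder bytes, as a compiler would emit them
	// "example.sym" and the address 0x401000 are made up for the demo.
	r := Reloc{Offset: 2, Size: 4, Sym: "example.sym", Add: 16}
	if err := applyReloc(data, r, 0x401000); err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", data) // 00 00 10 10 40 00 00 00
}
```

The four patched bytes hold 0x401010, that is, the symbol address plus Add, stored least significant byte first.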
Understanding relocations
Now let's use an example to explain how relocations work. To do that, we need to compile our program with the -S flag so that the compiler prints the generated assembly code:
    go tool 6g -S test.go
Let's take a look at the assembly code and find the main function.
    "".main t=1 size=48 value=0 args=0x0 locals=0x8
    0x0000 00000 (test.go:3)  TEXT      "".main+0(SB),$8-0
    0x0000 00000 (test.go:3)  MOVQ      (TLS),CX
    0x0009 00009 (test.go:3)  CMPQ      SP,16(CX)
    0x000d 00013 (test.go:3)  JHI       ,22
    0x000f 00015 (test.go:3)  CALL      ,runtime.morestack_noctxt(SB)
    0x0014 00020 (test.go:3)  JMP       ,0
    0x0016 00022 (test.go:3)  SUBQ      $8,SP
    0x001a 00026 (test.go:3)  FUNCDATA  $0,gclocals·3280bececceccd33cb74587feedb1f9f+0(SB)
    0x001a 00026 (test.go:3)  FUNCDATA  $1,gclocals·3280bececceccd33cb74587feedb1f9f+0(SB)
    0x001a 00026 (test.go:4)  MOVQ      $1,(SP)
    0x0022 00034 (test.go:4)  PCDATA    $0,$0
    0x0022 00034 (test.go:4)  CALL      ,runtime.printint(SB)
    0x0027 00039 (test.go:5)  ADDQ      $8,SP
    0x002b 00043 (test.go:5)  RET       ,
In subsequent articles, we will analyze this code more carefully to figure out how the Go runtime works. Here, we are only interested in the following line:
    0x0022 00034 (test.go:4)  CALL      ,runtime.printint(SB)
This instruction sits at offset 0x0022 (hexadecimal), or 34 (decimal), within the function data. It represents the call to the runtime.printint function. The problem is that, at compile time, the compiler does not know where runtime.printint will end up. That symbol lives in a different object file that the compiler knows nothing about. So the compiler emits a relocation instead. Here is the relocation that corresponds to this call (copied from the output of the goobj_explorer tool):
    {
        Offset: 35,
        Size:   4,
        Sym:    goobj.SymID{Name:"runtime.printint", Version:0},
        Add:    0,
        Type:   3,
    },
This relocation tells the linker to fill the 4 bytes starting at offset 35 with the address of the start of the runtime.printint symbol. Offset 35 from the start of the main function is exactly the operand of the call instruction we saw above: the instruction starts at offset 34, its first byte is the call opcode, and the following four bytes hold the address the instruction needs.
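To make the byte layout concrete, here is a small sketch of how those four bytes could be filled in. It rests on assumptions that go beyond the text: the amd64 direct call opcode is 0xE8, and its 4-byte operand is a displacement measured from the address of the next instruction rather than an absolute address (the enum comment "direct PC-relative call" later in this article points the same way). The addresses 0x401000 and 0x402000 and the helper name callDisplacement are invented for the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// callDisplacement returns the 32-bit value stored in a PC-relative call:
// the distance from the end of the 4-byte operand field to the target.
func callDisplacement(funcAddr, fieldOffset, target uint64) uint32 {
	next := funcAddr + fieldOffset + 4 // address of the instruction after the call
	return uint32(target - next)
}

func main() {
	code := make([]byte, 48) // main.main is 48 bytes long
	code[34] = 0xE8          // amd64 opcode for a direct near call

	// Hypothetical addresses chosen by the linker.
	const mainAddr, printintAddr = 0x401000, 0x402000

	d := callDisplacement(mainAddr, 35, printintAddr)
	binary.LittleEndian.PutUint32(code[35:39], d)
	fmt.Printf("% x\n", code[34:39]) // e8 d9 0f 00 00
}
```

Note that offset 35 plus 4 bytes gives 39 (0x0027), which is where the next instruction, ADDQ, begins in the listing.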
How does the linker work?
Now that we understand relocations, we can understand how the linker works. The following outline is heavily simplified, but it captures the main idea:
- The linker gathers all the symbols from all packages referenced by the main package and places them into one large byte array (the binary image).
- For each symbol, the linker calculates its address in this image.
- It then applies the relocations defined for every symbol. This is easy now, because the linker knows the exact address of every symbol referenced by those relocations.
- The linker prepares the headers required by the ELF format (on Linux) or the PE format (on Windows), and then emits an executable file.
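The first three steps can be sketched as a toy linker in Go. This is a deliberately minimal illustration, not how the Go linker is implemented: the types symbol and reloc, the function link, the base address, and the choice of 4-byte little-endian absolute addresses are all assumptions, and the final step (writing ELF or PE headers) is left out entirely.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// reloc and symbol are simplified, hypothetical versions of the
// structures found in an object file.
type reloc struct {
	offset int    // where in the symbol's data to patch
	sym    string // symbol whose address is needed
	add    int    // constant added to that address
}

type symbol struct {
	name   string
	data   []byte
	relocs []reloc
}

// link lays the symbols out one after another starting at base, then
// patches every relocation with a 4-byte little-endian absolute address.
func link(base uint32, syms []symbol) ([]byte, map[string]uint32, error) {
	// Steps 1 and 2: build the image and record each symbol's address.
	var image []byte
	addr := make(map[string]uint32)
	start := make([]int, len(syms))
	for i, s := range syms {
		start[i] = len(image)
		addr[s.name] = base + uint32(len(image))
		image = append(image, s.data...)
	}
	// Step 3: apply relocations, now that every address is known.
	for i, s := range syms {
		for _, r := range s.relocs {
			target, ok := addr[r.sym]
			if !ok {
				return nil, nil, fmt.Errorf("undefined symbol %q", r.sym)
			}
			if r.offset < 0 || r.offset+4 > len(s.data) {
				return nil, nil, fmt.Errorf("relocation in %q is out of range", s.name)
			}
			pos := start[i] + r.offset
			binary.LittleEndian.PutUint32(image[pos:pos+4], target+uint32(r.add))
		}
	}
	return image, addr, nil
}

func main() {
	syms := []symbol{
		{name: "main.main", data: make([]byte, 8),
			relocs: []reloc{{offset: 1, sym: "runtime.printint"}}},
		{name: "runtime.printint", data: make([]byte, 4)},
	}
	image, addr, err := link(0x1000, syms)
	if err != nil {
		panic(err)
	}
	fmt.Printf("runtime.printint at %#x\n", addr["runtime.printint"])
	fmt.Printf("% x\n", image)
}
```

Splitting the work into a layout pass and a patching pass is what makes relocations simple to apply: by the time the second loop runs, every symbol referenced by a relocation already has its final address.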
Understanding TLS
Careful readers may have noticed in the goobj_explorer output that the compiler generated a strange relocation for the main function. It does not correspond to any function call, and it even points to an empty symbol:
    {
        Offset: 5,
        Size:   4,
        Sym:    goobj.SymID{},
        Add:    0,
        Type:   9,
    },
So what does this relocation do? We can see that it has an offset of 5 bytes and a size of 4 bytes. The assembly instruction that covers this offset is:
    0x0000 00000 (test.go:3)  MOVQ      (TLS),CX
This instruction starts at offset 0 and occupies 9 bytes (because the next instruction starts at offset 9). We can guess that the relocation replaces this strange TLS operand with some address, but what exactly is TLS, and what address does it use?
TLS stands for thread-local storage. The technique is used in many programming languages. Put simply, it lets each thread have its own copy of a variable, and each copy refers to a different location in memory.
In Go, TLS stores a pointer to a G struct. The struct it points to holds the internal details of a particular goroutine (more on this in later articles). So when you access this variable from different goroutines, you always get the struct that belongs to the current goroutine. The linker knows where this variable is located, and it is exactly this variable that the instruction above moves into the CX register. On AMD64, TLS is implemented using the FS register, so the instruction we saw earlier effectively translates to MOVQ FS, CX.
To conclude the discussion of relocations, here is the enum that contains all the relocation types:
    // Reloc.type
    enum
    {
        R_ADDR = 1,
        R_SIZE,
        R_CALL,     // relocation for direct PC-relative call
        R_CALLARM,  // relocation for ARM direct call
        R_CALLIND,  // marker for indirect call (no actual relocating necessary)
        R_CONST,
        R_PCREL,
        R_TLS,
        R_TLS_LE,   // TLS local exec offset from TLS segment register
        R_TLS_IE,   // TLS initial exec offset from TLS base pointer
        R_GOTOFF,
        R_PLT0,
        R_PLT1,
        R_PLT2,
        R_USEFIELD,
    };
As you can see from this enum, relocation type 3 is R_CALL, and relocation type 9, counting from R_ADDR = 1, is R_TLS_LE, one of the TLS relocations. These names match the behavior we discussed above.
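Counting through the enum by hand is error-prone, so here is a quick check written in Go. The constant block is a transcription of the enum as listed above (it is not taken from any Go package), with iota + 1 reproducing the explicit starting value R_ADDR = 1.

```go
package main

import "fmt"

// Transcription of the relocation enum listed in the article.
// Values start at 1, as in the original (R_ADDR = 1).
const (
	R_ADDR = iota + 1
	R_SIZE
	R_CALL
	R_CALLARM
	R_CALLIND
	R_CONST
	R_PCREL
	R_TLS
	R_TLS_LE
	R_TLS_IE
	R_GOTOFF
	R_PLT0
	R_PLT1
	R_PLT2
	R_USEFIELD
)

func main() {
	fmt.Println(R_CALL, R_TLS, R_TLS_LE) // 3 8 9
}
```

With this numbering, the Type: 3 relocation seen for the runtime.printint call is R_CALL, and the Type: 9 relocation seen for the MOVQ (TLS),CX instruction falls on R_TLS_LE.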
More on Go object files
In the next articles, we will continue our discussion of object files. I will also give you the information you need to understand how the Go runtime works. If you have any questions, feel free to ask them in the comments.