This article was translated by yhx of Bole Online and proofread by Huang Li-min. Do not reprint without permission.
English source: Sergey Matyukevich.
The startup process is key to understanding how the Go runtime works, so analyzing it is essential if you want to dig deeper into Go. This fifth part of the series therefore focuses on the Go runtime, and in particular on how a Go program starts up. You will learn about the following:
- Go language startup process
- How the variable-size stacks are implemented
- Implementation mechanism of TLS
Note that this post contains a lot of assembly code, so some prior familiarity is needed (the quick guide to Go's assembler is a good starting point). Let's get started!
Finding the entry point
First we need to find the first function that executes when a Go program starts. To do so, we write an extremely simple Go application:
    package main

    func main() {
        print(123)
    }
Then, compile and link:
    go tool 6g test.go
    go tool 6l test.6
This generates an executable named 6.out in the current directory. The next step uses objdump, a tool available on Linux. On Windows or Mac you will need to find an equivalent tool or skip this step. Run the following command:
    objdump -f 6.out
You can see the output information that contains the start address:
    6.out:     file format elf64-x86-64
    architecture: i386:x86-64, flags 0x00000112:
    EXEC_P, HAS_SYMS, D_PAGED
    start address 0x000000000042f160
Next, we disassemble the executable and find out which function is located at the start address:
    objdump -d 6.out > disassemble.txt
Now we can open disassemble.txt and search for "42f160". This gives the following result:
    000000000042f160 <_rt0_amd64_linux>:
      42f160:   48 8d 74 24 08          lea    0x8(%rsp),%rsi
      42f165:   48 8b 3c 24             mov    (%rsp),%rdi
      42f169:   48 8d 05 10 00 00 00    lea    0x10(%rip),%rax        # 42f180 <main>
      42f170:   ff e0                   jmpq   *%rax
Good, we have found it. On my machine the entry point is _rt0_amd64_linux (it depends on the OS and the machine architecture).
Startup sequence
Now we need to find this function in the Go runtime sources. It is located in the rt0_linux_amd64.s file. If you look at the Go runtime package, you will find many file names that contain the name of an OS or a machine architecture. When the runtime package is built, only the files that match the current system and architecture are selected, and the rest are skipped. Let's take a look at rt0_linux_amd64.s:
    TEXT _rt0_amd64_linux(SB),NOSPLIT,$-8
        LEAQ    8(SP), SI // argv
        MOVQ    0(SP), DI // argc
        MOVQ    $main(SB), AX
        JMP     AX

    TEXT main(SB),NOSPLIT,$-8
        MOVQ    $runtime·rt0_go(SB), AX
        JMP     AX
The _rt0_amd64_linux function is very simple. It saves the arguments (argc and argv) in registers (DI and SI) and then jumps to the main function. The arguments are on the stack and are accessed through SP (the stack pointer). The main function is also very simple: it just jumps to runtime.rt0_go. The runtime.rt0_go function is a bit more complicated, so I'll split it into several parts and discuss each part in turn.
The first part is this:
    MOVQ    DI, AX          // argc
    MOVQ    SI, BX          // argv
    SUBQ    $(4*8+7), SP    // 2args 2auto
    ANDQ    $~15, SP
    MOVQ    AX, 16(SP)
    MOVQ    BX, 24(SP)
Here we copy the previously saved command-line argument values into the AX and BX registers. We also decrease the stack pointer to reserve 4*8 bytes (room for two arguments and two local variables, as the source comment notes) and align it to a 16-byte boundary. Finally, we store the arguments back on the stack.
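The stack-pointer arithmetic in this fragment can be mirrored in ordinary Go. The sketch below is only an illustration: alignStack is a hypothetical helper, not part of the runtime, and the starting address is made up.

```go
package main

import "fmt"

// alignStack mirrors SUBQ $(4*8+7), SP followed by ANDQ $~15, SP.
// It is a hypothetical helper for illustration, not runtime code.
func alignStack(sp uintptr) uintptr {
	sp -= 4*8 + 7 // room for four 8-byte slots, plus slack for rounding
	sp &^= 15     // clear the low four bits: round down to a multiple of 16
	return sp
}

func main() {
	old := uintptr(0x7ffe1008) // made-up stack pointer value
	sp := alignStack(old)
	fmt.Printf("old=%#x new=%#x aligned=%v reserved=%d\n",
		old, sp, sp%16 == 0, old-sp)
}
```

Whatever the starting value, the result is a multiple of 16 and lies at least 32 bytes below the original stack pointer, which is what the assembly relies on.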
    // create istack out of the given (operating system) stack.
    // _cgo_init may update stackguard.
    MOVQ    $runtime·g0(SB), DI
    LEAQ    (-64*1024+104)(SP), BX
    MOVQ    BX, g_stackguard0(DI)
    MOVQ    BX, g_stackguard1(DI)
    MOVQ    BX, (g_stack+stack_lo)(DI)
    MOVQ    SP, (g_stack+stack_hi)(DI)
The second part is trickier. First, we load the address of the global variable runtime.g0 into the DI register. This variable is defined in the proc1.go file and has the runtime.g type. Go creates a variable of this type for each goroutine in the system. As you might guess, runtime.g0 describes the root goroutine. We then initialize the fields that describe the stack of the root goroutine. The meaning of stack.lo and stack.hi should be clear: they are pointers to the beginning and the end of the current goroutine's stack. But what are stackguard0 and stackguard1? To understand these two fields, we'll set aside the analysis of the runtime.rt0_go function for a moment and look at how stacks grow in Go.
Implementation of variable size stacks in Go
Go uses variable-size stacks. Each goroutine starts with a small stack, and the stack is resized when its usage reaches a certain threshold. Obviously, there must be some mechanism that checks whether the threshold has been reached. In fact, this check is performed at the beginning of each function. To see how it works, let's compile our sample program again with the -S flag (it prints the generated assembly code). The beginning of the main function looks like this:
    "".main t=1 size=48 value=0 args=0x0 locals=0x8
        0x0000 00000 (test.go:3)    TEXT    "".main+0(SB),$8-0
        0x0000 00000 (test.go:3)    MOVQ    (TLS),CX
        0x0009 00009 (test.go:3)    CMPQ    SP,16(CX)
        0x000d 00013 (test.go:3)    JHI     ,22
        0x000f 00015 (test.go:3)    CALL    ,runtime.morestack_noctxt(SB)
        0x0014 00020 (test.go:3)    JMP     ,0
        0x0016 00022 (test.go:3)    SUBQ    $8,SP
First, we load a value from thread-local storage (TLS) into the CX register (I described TLS in a previous post). This value is a pointer to the runtime.g structure that corresponds to the current goroutine. We then compare the stack pointer with the value at offset 16 in the runtime.g struct. The struct begins with the stack field, which holds the two 8-byte pointers lo and hi, so offset 16 is the stackguard0 field.
This is how we detect whether the stack threshold has been reached. If the stack pointer is still above stackguard0, execution jumps straight to the function body. Otherwise we call runtime.morestack_noctxt and jump back to the start of the function to repeat the check, until enough space has been allocated for the stack. The stackguard1 field is very similar to stackguard0, but it is used in the C stack growth prologue instead of the Go one. The inner workings of runtime.morestack_noctxt are also very interesting, and we will discuss them later. For now, let's go back to the startup process.
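To make the prologue easier to follow, here is a rough Go model of it. The types are simplified stand-ins for the runtime ones, needsMoreStack is a hypothetical name, and grow is only a placeholder for runtime.morestack_noctxt: it lowers the limits in place, which is my simplification and not a description of what the runtime actually does. All addresses are made up.

```go
package main

import (
	"fmt"
	"unsafe"
)

type stack struct {
	lo, hi uintptr
}

type g struct {
	stack       stack
	stackguard0 uintptr
	stackguard1 uintptr
}

// needsMoreStack mirrors CMPQ SP, 16(CX) followed by JHI: the function
// body runs only when SP is strictly above the guard.
func needsMoreStack(sp uintptr, gp *g) bool {
	return sp <= gp.stackguard0
}

// grow is a hypothetical stand-in for runtime.morestack_noctxt; here it
// only lowers the limits so that the repeated check eventually passes.
func grow(gp *g) {
	gp.stack.lo -= 4096
	gp.stackguard0 -= 4096
}

func main() {
	gp := &g{stack: stack{lo: 0x10000, hi: 0x12000}, stackguard0: 0x10000}
	sp := uintptr(0xf800) // made-up SP that is already below the guard

	// On a 64-bit platform the two pointers in stack take 16 bytes.
	fmt.Println("offset of stackguard0:", unsafe.Offsetof(gp.stackguard0))

	for needsMoreStack(sp, gp) { // JMP 0: repeat the check after growing
		grow(gp)
	}
	fmt.Printf("guard=%#x sp=%#x\n", gp.stackguard0, sp)
}
```

The loop corresponds to the JMP back to offset 0 in the listing: the check is simply repeated after every call that grows the stack.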
Continuing the Go startup process
Before moving on, let's look at the next piece of code in the runtime.rt0_go function:
    // find out information about the processor we're on
    MOVQ    $0, AX
    CPUID
    CMPQ    AX, $0
    JE      nocpuinfo

    // Figure out how to serialize RDTSC.
    // On Intel processors LFENCE is enough. AMD requires MFENCE.
    // Don't know about the rest, so let's do MFENCE.
    CMPL    BX, $0x756e6547  // "Genu"
    JNE     notintel
    CMPL    DX, $0x49656e69  // "ineI"
    JNE     notintel
    CMPL    CX, $0x6c65746e  // "ntel"
    JNE     notintel
    MOVB    $1, runtime·lfenceBeforeRdtsc(SB)
    notintel:

    MOVQ    $1, AX
    CPUID
    MOVL    CX, runtime·cpuid_ecx(SB)
    MOVL    DX, runtime·cpuid_edx(SB)
    nocpuinfo:
This part is not essential for understanding the main Go concepts, so we only look at it briefly. The code identifies the CPU type of the system. If it is an Intel CPU, it sets the runtime.lfenceBeforeRdtsc variable, which is used only in runtime.cputicks. Depending on runtime.lfenceBeforeRdtsc, that function uses different assembly instructions to obtain the CPU tick count. Finally, we execute the CPUID instruction and save the results to runtime.cpuid_ecx and runtime.cpuid_edx. These variables are used in alg.go to select a suitable hashing algorithm for the machine's architecture.
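The three magic constants in the comparison are easier to read once decoded. The small sketch below (vendorString is a hypothetical helper of mine) rebuilds the vendor string from the register values, assuming the usual CPUID layout of EBX, EDX, ECX, each stored little-endian.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// vendorString rebuilds the 12-byte CPUID vendor string from the three
// registers, in the order EBX, EDX, ECX, each stored little-endian.
func vendorString(ebx, edx, ecx uint32) string {
	buf := make([]byte, 12)
	binary.LittleEndian.PutUint32(buf[0:4], ebx)
	binary.LittleEndian.PutUint32(buf[4:8], edx)
	binary.LittleEndian.PutUint32(buf[8:12], ecx)
	return string(buf)
}

func main() {
	// The three constants compared against BX, DX and CX in rt0_go.
	fmt.Println(vendorString(0x756e6547, 0x49656e69, 0x6c65746e))
}
```

Running it prints GenuineIntel, which is why the assembly comments read "Genu", "ineI" and "ntel".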
OK, let's continue to analyze another part of the code:
    // if there is a _cgo_init, call it.
    MOVQ    _cgo_init(SB), AX
    TESTQ   AX, AX
    JZ      needtls
    // g0 already in DI
    MOVQ    DI, CX  // Win64 uses CX for first parameter
    MOVQ    $setg_gcc<>(SB), SI
    CALL    AX

    // update stackguard after _cgo_init
    MOVQ    $runtime·g0(SB), CX
    MOVQ    (g_stack+stack_lo)(CX), AX
    ADDQ    $const__StackGuard, AX
    MOVQ    AX, g_stackguard0(CX)
    MOVQ    AX, g_stackguard1(CX)

    CMPL    runtime·iswindows(SB), $0
    JEQ     ok
This code runs only when cgo is enabled. cgo is a separate topic that I may cover in a later post. Here we just want to understand the basic startup flow, so let's skip this part.
The next section of code is responsible for setting up TLS:
    needtls:
    // skip TLS setup on Plan 9
    CMPL    runtime·isplan9(SB), $1
    JEQ     ok
    // skip TLS setup on Solaris
    CMPL    runtime·issolaris(SB), $1
    JEQ     ok

    LEAQ    runtime·tls0(SB), DI
    CALL    runtime·settls(SB)

    // store through it, to make sure it works
    get_tls(BX)
    MOVQ    $0x123, g(BX)
    MOVQ    runtime·tls0(SB), AX
    CMPQ    AX, $0x123
    JEQ     2(PC)
    MOVL    AX, 0   // abort
I have mentioned TLS several times before. Now it's time to see how it is actually implemented.
TLS Internal implementation
If you read through the previous code, it's easy to see that only two lines do the real work:
    LEAQ    runtime·tls0(SB), DI
    CALL    runtime·settls(SB)
The rest of the code either skips TLS setup on systems that do not support it or checks that TLS works correctly. These two lines store the address of the runtime.tls0 variable in the DI register and then call the runtime.settls function. The code of this function is as follows:
    // set tls base to DI
    TEXT runtime·settls(SB),NOSPLIT,$32
        ADDQ    $8, DI  // ELF wants to use -8(FS)

        MOVQ    DI, SI
        MOVQ    $0x1002, DI     // ARCH_SET_FS
        MOVQ    $158, AX        // arch_prctl
        SYSCALL
        CMPQ    AX, $0xfffffffffffff001
        JLS     2(PC)
        MOVL    $0xf1, 0xf1  // crash
        RET
As the comments show, this function makes an arch_prctl system call and passes ARCH_SET_FS as an argument. We can also see that the system call sets the base address of the FS segment register. In this case, we point TLS at the runtime.tls0 variable.
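The check right after SYSCALL may look cryptic, so here is how I read it, restated in Go. This is a sketch only: failed is a hypothetical helper, and it relies on the Linux convention that a raw system call reports an error as a small negative number in AX.

```go
package main

import "fmt"

// failed mirrors CMPQ AX, $0xfffffffffffff001 followed by JLS 2(PC):
// the crash instruction is skipped when AX, read as an unsigned value,
// is lower than or equal to the constant.
func failed(ax uint64) bool {
	return ax > 0xfffffffffffff001
}

func main() {
	einval := int64(-22) // -EINVAL as a raw Linux system call would return it
	fmt.Println(failed(0), failed(uint64(einval)))
}
```

A return value of 0 passes the check, while a negative errno, seen as an unsigned 64-bit value, is very large and falls through to the deliberate crash.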
Do you remember the assembly instructions at the beginning of main?
    0x0000 00000 (test.go:3)    MOVQ    (TLS),CX
As I explained earlier, this instruction loads the address of the runtime.g struct instance into the CX register. This struct describes the current goroutine and is stored in TLS. Now we can see how this instruction is translated into machine code. Open the disassemble.txt file created earlier, search for the main.main function, and you will see that its first instruction is:
    400c00:   64 48 8b 0c 25 f0 ff ff ff    mov    %fs:0xfffffffffffffff0,%rcx
The colon in this instruction (%fs:0xfffffffffffffff0) indicates segment addressing: the address is computed relative to the base held in the FS segment register.
Back to the startup process
Finally, let's look at the last two parts of the runtime.rt0_go function:
    ok:
    // set the per-goroutine and per-mach "registers"
    get_tls(BX)
    LEAQ    runtime·g0(SB), CX
    MOVQ    CX, g(BX)
    LEAQ    runtime·m0(SB), AX

    // save m->g0 = g0
    MOVQ    CX, m_g0(AX)
    // save m0 to g0->m
    MOVQ    AX, g_m(CX)
Here we load the TLS address into the BX register and save the address of the runtime.g0 variable in TLS. We also initialize the runtime.m0 variable. If runtime.g0 represents the root goroutine, then runtime.m0 corresponds to the OS thread that runs it. We may look at runtime.g0 and runtime.m0 more closely in a later post.
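The two stores at the end of this fragment simply make the two structures point at each other. A minimal Go sketch, with g and m reduced to the two fields involved (the real runtime types are far larger, and link is a hypothetical helper name):

```go
package main

import "fmt"

// g and m are simplified stand-ins for runtime.g and runtime.m;
// only the two fields written by this fragment are modeled.
type g struct {
	m *m
}

type m struct {
	g0 *g
}

var (
	g0 g
	m0 m
)

// link mirrors MOVQ CX, m_g0(AX) and MOVQ AX, g_m(CX).
func link() {
	m0.g0 = &g0 // save m->g0 = g0
	g0.m = &m0  // save m0 to g0->m
}

func main() {
	link()
	fmt.Println(m0.g0 == &g0, g0.m == &m0)
}
```

After link runs, each structure can be reached from the other.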
The last part of the startup process initializes the arguments and calls various functions, but that is a separate topic.
More about Golang
We have looked at the Go startup process and at the internal mechanism behind its stacks. What remains is the last part of the startup process, which will be the subject of the next post. To be notified of new posts, follow @altoros.