All parts: Part 1 | Part 2 | Part 3 | Part 4 | Part 5
The bootstrapping process is the key to understanding how the Go runtime works. Learning it is essential if you want to move forward with Go. So the fifth installment in our Golang Internals series is dedicated to the Go runtime and, specifically, the Go bootstrap process. This time you'll learn about:
- Go bootstrapping
- Resizable stacks implementation
- Internal TLS implementation
Note that this post contains a lot of assembler code and you'll need at least some basic knowledge of it to proceed (here is a quick guide to Go's assembler). So let's get going!
Finding an entry point
First, we need to find what function is executed immediately after we start a Go program. To do this, we'll write a simple Go app:
package main

func main() {
	print(123)
}
Then we need to compile and link it:
go tool 6g test.go
go tool 6l test.6
This will create an executable file called 6.out in your current directory. The next step involves the objdump tool, which is specific to Linux. Windows and Mac users can find analogs or skip this step altogether. Now run the following command:
objdump -f 6.out
You should get output that contains the start address:
6.out:     file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x000000000042f160
Next, we need to disassemble our executable and find what function is located at this address:
objdump -d 6.out > disassemble.txt
Then we need to open the disassemble.txt file and search for "42f160." Here's what I got:
000000000042f160 <_rt0_amd64_linux>:
  42f160:	48 8d 74 24 08       	lea    0x8(%rsp),%rsi
  42f165:	48 8b 3c 24          	mov    (%rsp),%rdi
  42f169:	48 8d 05 10 00 00 00 	lea    0x10(%rip),%rax        # 42f180
  42f170:	ff e0                	jmpq   *%rax
Nice, we have found it! The entry point for my OS and architecture is a function called _rt0_amd64_linux.
The starting sequence
Now we need to find this function in the Go runtime sources. It is located in the rt0_linux_amd64.s file. If you look inside the Go runtime package, you can find many filenames with postfixes related to OS and architecture names. When the runtime is built, only the files that correspond to the current OS and architecture are selected. The rest are skipped. Let's take a closer look at rt0_linux_amd64.s:
TEXT _rt0_amd64_linux(SB),NOSPLIT,$-8
	LEAQ	8(SP), SI // argv
	MOVQ	0(SP), DI // argc
	MOVQ	$main(SB), AX
	JMP	AX

TEXT main(SB),NOSPLIT,$-8
	MOVQ	$runtime·rt0_go(SB), AX
	JMP	AX
The _rt0_amd64_linux function is very simple. It saves the arguments (argc and argv) in registers (DI and SI) and jumps to the main function. The arguments are located on the stack and can be accessed via the SP (stack pointer) register. The main function is also very simple. It jumps to runtime.rt0_go. The runtime.rt0_go function is longer and more complicated, so I'll break it into small parts and describe each one separately.
The first section goes like this:
	MOVQ	DI, AX		// argc
	MOVQ	SI, BX		// argv
	SUBQ	$(4*8+7), SP		// 2args 2auto
	ANDQ	$~15, SP
	MOVQ	AX, 16(SP)
	MOVQ	BX, 24(SP)
Here, we put the previously saved command line argument values inside the AX and BX registers and decrease the stack pointer. The subtraction reserves room for four eight-byte slots (two arguments and two automatic variables), and the AND instruction then aligns the stack pointer to a 16-byte boundary. Finally, we move the arguments back to the stack.
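To make the stack pointer arithmetic concrete, here is a minimal Go sketch of what the SUBQ and ANDQ instructions compute. The function name reserveAndAlign and the sample stack pointer value are made up for illustration; they do not come from the runtime.

```go
package main

import "fmt"

// reserveAndAlign mirrors SUBQ $(4*8+7), SP followed by ANDQ $~15, SP:
// it subtracts 39 bytes and clears the low four bits, which rounds the
// result down to a multiple of 16.
func reserveAndAlign(sp uint64) uint64 {
	sp -= 4*8 + 7
	sp &^= 15
	return sp
}

func main() {
	// A made-up initial stack pointer, used only for illustration.
	sp := uint64(0x7ffd1008)
	aligned := reserveAndAlign(sp)
	fmt.Printf("before: %#x\n", sp)
	fmt.Printf("after:  %#x\n", aligned)
	fmt.Println("bytes reserved:", sp-aligned)
}
```

Whatever the starting value is, the result is a multiple of 16 and leaves at least 32 bytes free, which is enough for the four eight-byte slots.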
	// create istack out of the given (operating system) stack.
	// _cgo_init may update stackguard.
	MOVQ	$runtime·g0(SB), DI
	LEAQ	(-64*1024+104)(SP), BX
	MOVQ	BX, g_stackguard0(DI)
	MOVQ	BX, g_stackguard1(DI)
	MOVQ	BX, (g_stack+stack_lo)(DI)
	MOVQ	SP, (g_stack+stack_hi)(DI)
The second part is a bit more tricky. First, we load the address of the global runtime.g0 variable into the DI register. This variable is defined in the proc1.go file and belongs to the runtime.g type. Variables of this type are created for each goroutine in the system. As you can guess, runtime.g0 describes a root goroutine. Then we initialize the fields that describe the stack of the root goroutine. The meaning of stack.lo and stack.hi should be clear. These are pointers to the beginning and the end of the stack for the current goroutine, but what are the stackguard0 and stackguard1 fields? To understand this, we need to set aside the investigation of the runtime.rt0_go function and take a closer look at stack growth in Go.
Resizable stack implementation in Go
The Go language uses resizable stacks. Each goroutine starts with a small stack and its size changes each time a certain threshold is reached. Obviously, there is a way to check whether we have reached this threshold or not. In fact, the check is performed at the beginning of each function. To see how it works, let's compile our sample program one more time with the -S flag (this will show the generated assembler code). The beginning of the main function looks like this:
"".main t=1 size=48 value=0 args=0x0 locals=0x8
	0x0000 00000 (test.go:3)	TEXT	"".main+0(SB),$8-0
	0x0000 00000 (test.go:3)	MOVQ	(TLS),CX
	0x0009 00009 (test.go:3)	CMPQ	SP,16(CX)
	0x000d 00013 (test.go:3)	JHI	,22
	0x000f 00015 (test.go:3)	CALL	,runtime.morestack_noctxt(SB)
	0x0014 00020 (test.go:3)	JMP	,0
	0x0016 00022 (test.go:3)	SUBQ	$8,SP
First, we load a value from thread local storage (TLS) to the CX register (I have already explained what TLS is in one of my previous posts). This value contains a pointer to the runtime.g structure that corresponds to the current goroutine. Then we compare the stack pointer to the value located at an offset of 16 bytes in the runtime.g structure. We can easily calculate that this corresponds to the stackguard0 field, because it follows the stack field, which holds two eight-byte pointers (lo and hi).
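The offset calculation can be checked with a small Go program. This is only a sketch: the stack and g types below are reduced, hypothetical mirrors of the first fields of the runtime structures as described in this post, and the helper stackguard0Offset is an illustrative name.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stack is a hypothetical mirror of the runtime's stack bounds.
type stack struct {
	lo uintptr
	hi uintptr
}

// g is a hypothetical, heavily reduced mirror of runtime.g. Only the
// leading fields discussed in the text are modeled.
type g struct {
	stack       stack
	stackguard0 uintptr
	stackguard1 uintptr
}

// wordSize is the size of a pointer on the platform this program runs on.
const wordSize = unsafe.Sizeof(uintptr(0))

// stackguard0Offset reports where stackguard0 sits inside the mirror struct.
func stackguard0Offset() uintptr {
	return unsafe.Offsetof(g{}.stackguard0)
}

func main() {
	fmt.Println("word size:", wordSize)
	fmt.Println("stackguard0 offset:", stackguard0Offset())
}
```

On a 64-bit platform the word size is 8 bytes, so the two pointers in stack push stackguard0 to offset 16, which matches the 16(CX) operand in the listing.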
So, this is how we check if we have reached the stack threshold. If the stack pointer is still above stackguard0, there is enough room and we jump straight to the function body. Otherwise, we call the runtime.morestack_noctxt function and jump back to the start of the function to repeat the check, until enough memory has been allocated for the stack. The stackguard1 field works very similarly to stackguard0, but it is used inside the C stack growth prologue instead of Go. The inner workings of runtime.morestack_noctxt are also a very interesting topic, but we'll discuss it later. For now, let's return to the bootstrap process.
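The check-and-retry loop in the prologue can be modeled in a few lines of Go. This is a simplified simulation, not the runtime's algorithm: the goroutine type, the guardGap constant, and the doubling strategy in moreStack are all assumptions made for illustration, and the real runtime also relocates the stack pointer when it copies a stack.

```go
package main

import "fmt"

// goroutine is a hypothetical stand-in for runtime.g. Addresses are
// plain integers and the stack grows toward lower addresses.
type goroutine struct {
	stackLo     uint64
	stackHi     uint64
	stackguard0 uint64
}

// guardGap is an illustrative distance between stack.lo and stackguard0.
const guardGap = 128

// newGoroutine builds a goroutine whose stack ends at hi and spans size bytes.
func newGoroutine(hi, size uint64) *goroutine {
	lo := hi - size
	return &goroutine{stackLo: lo, stackHi: hi, stackguard0: lo + guardGap}
}

// moreStack simulates growth by doubling the stack size (an assumption).
func (g *goroutine) moreStack() {
	size := g.stackHi - g.stackLo
	g.stackLo = g.stackHi - 2*size
	g.stackguard0 = g.stackLo + guardGap
}

// enter mimics CMPQ SP,16(CX) / JHI / CALL morestack / JMP 0:
// as long as sp is not above stackguard0, grow the stack and check again.
func (g *goroutine) enter(sp uint64) (grows int) {
	for sp <= g.stackguard0 {
		g.moreStack()
		grows++
	}
	return grows
}

func main() {
	gr := newGoroutine(0x10000, 2048)
	fmt.Println("growth calls for a shallow frame:", gr.enter(0xff00))
	fmt.Println("growth calls for a deep frame:", gr.enter(0xf000))
}
```

The point of the sketch is the shape of the loop: the comparison is cheap and runs on every call, while the expensive growth path only runs when the guard is crossed.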
Continuing the investigation of Go bootstrapping
We'll proceed with the starting sequence by looking at the next portion of code inside the runtime.rt0_go function:
	// find out information about the processor we're on
	MOVQ	$0, AX
	CPUID
	CMPQ	AX, $0
	JE	nocpuinfo

	// Figure out how to serialize RDTSC.
	// On Intel processors LFENCE is enough. AMD requires MFENCE.
	// Don't know about the rest, so let's do MFENCE.
	CMPL	BX, $0x756E6547  // "Genu"
	JNE	notintel
	CMPL	DX, $0x49656E69  // "ineI"
	JNE	notintel
	CMPL	CX, $0x6C65746E  // "ntel"
	JNE	notintel
	MOVB	$1, runtime·lfenceBeforeRdtsc(SB)
notintel:

	MOVQ	$1, AX
	CPUID
	MOVL	CX, runtime·cpuid_ecx(SB)
	MOVL	DX, runtime·cpuid_edx(SB)
nocpuinfo:
This is not crucial for understanding major Go concepts, so we'll look through it briefly. Here, we are trying to figure out what processor we are using. If it is Intel, we set the runtime·lfenceBeforeRdtsc variable. The runtime·cputicks method is the only place where this variable is used. This method uses a different fence instruction (LFENCE or MFENCE) before reading CPU ticks, depending on the value of runtime·lfenceBeforeRdtsc. Finally, we execute the CPUID instruction once more, this time with AX set to 1, and save the result in the runtime·cpuid_ecx and runtime·cpuid_edx variables. These are used in the alg.go file to select a proper hashing algorithm that is natively supported by your computer's architecture.
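The three magic constants in the comparison are simply the vendor string packed into registers. The following Go sketch decodes them; the function name vendorString is an illustrative choice, and the register order (BX, DX, CX) follows the comparisons in the listing above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// vendorString rebuilds the 12-byte CPUID vendor ID from the three
// register values, each stored in little-endian byte order.
func vendorString(ebx, edx, ecx uint32) string {
	var b [12]byte
	binary.LittleEndian.PutUint32(b[0:4], ebx)
	binary.LittleEndian.PutUint32(b[4:8], edx)
	binary.LittleEndian.PutUint32(b[8:12], ecx)
	return string(b[:])
}

func main() {
	// The constants compared against BX, DX, and CX in rt0_go.
	fmt.Println(vendorString(0x756E6547, 0x49656E69, 0x6C65746E))
}
```

This is why the comments next to the constants read "Genu", "ineI", and "ntel": each register carries four characters of the vendor name.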
Ok, let's move on and examine another portion of code:
	// if there is an _cgo_init, call it.
	MOVQ	_cgo_init(SB), AX
	TESTQ	AX, AX
	JZ	needtls
	// g0 already in DI
	MOVQ	DI, CX	// Win64 uses CX for first parameter
	MOVQ	$setg_gcc<>(SB), SI
	CALL	AX

	// update stackguard after _cgo_init
	MOVQ	$runtime·g0(SB), CX
	MOVQ	(g_stack+stack_lo)(CX), AX
	ADDQ	$const__StackGuard, AX
	MOVQ	AX, g_stackguard0(CX)
	MOVQ	AX, g_stackguard1(CX)

	CMPL	runtime·iswindows(SB), $0
	JEQ ok
This fragment is only executed if cgo is enabled. Cgo is a topic for a separate discussion and we might talk about it in one of the upcoming posts. At this point, we only want to understand the basic bootstrap workflow, so we'll skip it.
The next code fragment is responsible for setting up TLS:
needtls:
	// skip TLS setup on Plan 9
	CMPL	runtime·isplan9(SB), $1
	JEQ ok
	// skip TLS setup on Solaris
	CMPL	runtime·issolaris(SB), $1
	JEQ ok

	LEAQ	runtime·tls0(SB), DI
	CALL	runtime·settls(SB)

	// store through it, to make sure it works
	get_tls(BX)
	MOVQ	$0x123, g(BX)
	MOVQ	runtime·tls0(SB), AX
	CMPQ	AX, $0x123
	JEQ 2(PC)
	MOVL	AX, 0	// abort
I have already mentioned TLS before. Now it's time to understand how it is implemented.
Internal TLS Implementation
If you look at the previous code fragment carefully, you can easily see that the only lines that do actual work are:
	LEAQ	runtime·tls0(SB), DI
	CALL	runtime·settls(SB)
All the other stuff is used to skip TLS setup when it is not supported on your OS and to check that TLS works correctly. The lines above store the address of the runtime·tls0 variable in the DI register and call the runtime·settls function. The code of this function is shown below:
// set tls base to DI
TEXT runtime·settls(SB),NOSPLIT,$32
	ADDQ	$8, DI	// ELF wants to use -8(FS)

	MOVQ	DI, SI
	MOVQ	$0x1002, DI	// ARCH_SET_FS
	MOVQ	$158, AX	// arch_prctl
	SYSCALL
	CMPQ	AX, $0xfffffffffffff001
	JLS	2(PC)
	MOVL	$0xf1, 0xf1  // crash
	RET
From the comments, we can understand that this function makes an arch_prctl system call and passes ARCH_SET_FS as an argument. We can also see that this system call sets a base for the FS segment register. In our case, we set TLS to point to the runtime·tls0 variable.
Do you remember the instruction that we saw at the beginning of the assembler code for the main function?
	0x0000 00000 (test.go:3)	MOVQ	(TLS),CX
I have previously explained that it loads the address of the runtime.g structure instance into the CX register. This structure describes the current goroutine and is stored in thread local storage. Now we can find out how this instruction is translated into machine code. If you open the previously created disassemble.txt file and look for the main.main function, the first instruction inside it should look like this:
  400c00:	64 48 8b 0c 25 f0 ff 	mov    %fs:0xfffffffffffffff0,%rcx
The colon in this instruction (%fs:0xfffffffffffffff0) stands for segmentation addressing (you can read more on it here).
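The large hexadecimal number in the operand is easier to read as a signed value. Here is a short Go sketch that reinterprets it and applies it to a base address; the function names and the sample FS base are made up for illustration.

```go
package main

import "fmt"

// signedDisplacement reinterprets the 64-bit pattern printed by objdump
// as a two's complement signed offset.
func signedDisplacement(raw uint64) int64 {
	return int64(raw)
}

// effectiveAddress adds the displacement to a segment base. Unsigned
// addition wraps around, which gives the same result as adding the
// signed offset.
func effectiveAddress(fsBase, raw uint64) uint64 {
	return fsBase + raw
}

func main() {
	const disp = 0xfffffffffffffff0
	fmt.Println("displacement:", signedDisplacement(disp))
	// 0x1000 is a made-up FS base, used only for illustration.
	fmt.Printf("address: %#x\n", effectiveAddress(0x1000, disp))
}
```

In other words, the instruction reads the eight bytes located 16 bytes below the address stored in the FS base.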
Returning to the starting sequence
Finally, let's look at the last two parts of the runtime.rt0_go function:
ok:
	// set the per-goroutine and per-mach "registers"
	get_tls(BX)
	LEAQ	runtime·g0(SB), CX
	MOVQ	CX, g(BX)
	LEAQ	runtime·m0(SB), AX

	// save m->g0 = g0
	MOVQ	CX, m_g0(AX)
	// save m0 to g0->m
	MOVQ	AX, g_m(CX)
Here, we load the TLS address into the BX register and save the address of the runtime·g0 variable in TLS. We also initialize the runtime.m0 variable. If runtime.g0 stands for the root goroutine, then runtime.m0 corresponds to the root operating system thread used to run this goroutine. We may take a closer look at the runtime.g0 and runtime.m0 structures in upcoming blog posts.
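The pointer wiring performed by this fragment can be written out in Go. This is only a sketch under stated assumptions: the g and m types are hypothetical one-field stand-ins for runtime.g and runtime.m, tlsG is a plain variable standing in for the TLS slot, and linkRoot is an illustrative name.

```go
package main

import "fmt"

// m is a hypothetical stand-in for runtime.m (an OS thread).
type m struct {
	g0 *g
}

// g is a hypothetical stand-in for runtime.g (a goroutine).
type g struct {
	m *m
}

var (
	g0   g  // root goroutine
	m0   m  // root OS thread
	tlsG *g // stands in for the TLS slot written by MOVQ CX, g(BX)
)

// linkRoot performs the same three stores as the assembly fragment.
func linkRoot() {
	tlsG = &g0  // MOVQ CX, g(BX)
	m0.g0 = &g0 // MOVQ CX, m_g0(AX)
	g0.m = &m0  // MOVQ AX, g_m(CX)
}

func main() {
	linkRoot()
	fmt.Println("TLS points at g0:", tlsG == &g0)
	fmt.Println("g0 and m0 point at each other:", tlsG.m.g0 == tlsG)
}
```

After these stores, any code that can read the TLS slot can reach both the current goroutine and the thread it runs on.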
The final part of the starting sequence initializes arguments and calls different functions, but it is a topic for a separate discussion.
More on Golang
So, we have learned the inner mechanisms of the bootstrap process and found out how stacks are implemented. To move forward, we need to analyze the last part of the starting sequence. That'll be the subject of my next post. If you want to get notified as soon as it comes out, hit the subscribe button below or follow @altoros.
About the author: Sergey Matyukevich is a Cloud Engineer and Go Developer at Altoros. With 6+ years in software engineering, he is an expert on cloud automation and designing architectures for complex cloud-based systems. An active member of the Go community, Sergey is a frequent contributor to open-source projects, such as Ubuntu and Juju Charms.