This article was translated by yhx for Bole Online and proofread by Huang Li-min. Do not reprint without permission.
English source: Siarhei Matsiukevich.
- Go Language Insider (1): Key concepts and project structure
- Go Language Insider (2): A deep dive into the Go compiler
- Go Language Insider (3): Linker, object files, and relocations
- Go Language Insider (4): Object files and function metadata
- Go Language Insider (5): The runtime bootstrap process
This article continues the Go Language Insider series. It explores the Go bootstrap process in more detail, which is key to understanding the Go runtime. In this part we look at the second half of the startup sequence: how arguments are initialized, which functions are called, and so on.
Boot order
We pick up where we left off last time, in the runtime.rt0_go function. There is still a part of it that we have not analyzed:
    CLD                         // convention is D is always left cleared
    CALL    runtime·check(SB)

    MOVL    16(SP), AX          // copy argc
    MOVL    AX, 0(SP)
    MOVQ    24(SP), AX          // copy argv
    MOVQ    AX, 8(SP)
    CALL    runtime·args(SB)
    CALL    runtime·osinit(SB)
    CALL    runtime·schedinit(SB)
The first instruction (CLD) clears the direction flag of the FLAGS register. This flag controls the direction in which string instructions move through memory.
Next comes a call to the runtime.check function, which is not very valuable for our analysis. In this function, the runtime creates instances of all built-in types, checks their sizes and some other parameters, and so on. If anything is wrong, it panics. Readers can easily explore the code of this function on their own.
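The flavor of these checks can be sketched with the public unsafe package. The function name checkSizes and the particular set of assertions are illustrative assumptions, not the runtime's actual code, and the sketch returns an error where the runtime would panic.

```go
package main

import (
	"fmt"
	"unsafe"
)

// checkSizes mimics the spirit of runtime.check: it creates values of
// built-in types and verifies that their sizes are what the language
// specification guarantees.
func checkSizes() error {
	var (
		a int8
		b uint16
		c int32
		d uint64
		e float32
		f float64
		p unsafe.Pointer
		u uintptr
	)
	checks := []struct {
		name      string
		got, want uintptr
	}{
		{"int8", unsafe.Sizeof(a), 1},
		{"uint16", unsafe.Sizeof(b), 2},
		{"int32", unsafe.Sizeof(c), 4},
		{"uint64", unsafe.Sizeof(d), 8},
		{"float32", unsafe.Sizeof(e), 4},
		{"float64", unsafe.Sizeof(f), 8},
		// A pointer and a uintptr must occupy the same number of bytes.
		{"unsafe.Pointer", unsafe.Sizeof(p), unsafe.Sizeof(u)},
	}
	for _, chk := range checks {
		if chk.got != chk.want {
			return fmt.Errorf("bad size for %s: got %d, want %d", chk.name, chk.got, chk.want)
		}
	}
	return nil
}

func main() {
	if err := checkSizes(); err != nil {
		panic(err)
	}
	fmt.Println("all size checks passed")
}
```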
Analyzing arguments
The next function called after runtime.check is runtime.args, which is more interesting. In addition to storing the arguments (argc and argv) in static variables, on Linux systems it also parses the ELF auxiliary vector and initializes the addresses of some system calls.
This requires some explanation. When the operating system loads a program into memory, it initializes the program's initial stack with data in a predefined format. At the top of the stack are the arguments and the pointers to environment variables. At the bottom of the stack, we can find the ELF auxiliary vector, which is an array of records that store other useful information, such as the number and size of the program headers. For more information on ELF auxiliary vectors, please refer to this article.
The runtime.args function is responsible for parsing this vector. Of all the information stored in the auxiliary vector, the runtime only uses startupRandomData, which mainly serves to initialize hash functions, and the pointers to the locations of some system calls. The following variables are initialized here:
    __vdso_time_sym
    __vdso_gettimeofday_sym
    __vdso_clock_gettime_sym
They are used to get the current time in different functions. All of these variables have their default values. This allows Golang to invoke the corresponding function using the vsyscall mechanism.
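To make the shape of the auxiliary vector more concrete, here is a minimal sketch that decodes one from a byte slice. It assumes a 64-bit little-endian layout in which every record is a (tag, value) pair of two 8-byte words, ending with an AT_NULL tag. The tag numbers come from the Linux ELF ABI. The names parseAuxv and encode are hypothetical helpers for this example and have nothing to do with how runtime.args is written.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Selected auxiliary vector tags from the Linux ELF ABI.
const (
	atNull   = 0  // terminates the vector
	atPhent  = 4  // size of one program header entry
	atPhnum  = 5  // number of program headers
	atRandom = 25 // address of 16 random bytes
)

// parseAuxv decodes a sequence of (tag, value) pairs, 8 bytes each,
// and stops at the first AT_NULL tag.
func parseAuxv(raw []byte) map[uint64]uint64 {
	out := make(map[uint64]uint64)
	for len(raw) >= 16 {
		tag := binary.LittleEndian.Uint64(raw[0:8])
		val := binary.LittleEndian.Uint64(raw[8:16])
		if tag == atNull {
			break
		}
		out[tag] = val
		raw = raw[16:]
	}
	return out
}

// encode builds a synthetic vector for the demo (hypothetical helper).
func encode(words ...uint64) []byte {
	buf := make([]byte, 8*len(words))
	for i, w := range words {
		binary.LittleEndian.PutUint64(buf[8*i:], w)
	}
	return buf
}

func main() {
	raw := encode(atPhent, 56, atPhnum, 9, atRandom, 0x7ffd1234, atNull, 0)
	aux := parseAuxv(raw)
	fmt.Printf("phent=%d phnum=%d random=%#x\n", aux[atPhent], aux[atPhnum], aux[atRandom])
}
```

On a Linux machine the same function could be fed the contents of /proc/self/auxv instead of the synthetic data used here.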
The runtime.osinit function
The next function called during the startup sequence is runtime.osinit. On a Linux system, the only thing it does is initialize the ncpu variable, which stores the number of CPUs in the system. This is done through a system call.
The runtime.schedinit function
Then runtime.schedinit is called, and this function is more interesting. First, it obtains a pointer to the current goroutine, which is a pointer to a g struct. We already covered how this pointer is stored when we discussed the TLS implementation. Next, it calls runtime.raceinit. We are not going to discuss runtime.raceinit here, because it is normally not called unless race detection is enabled. After that, runtime.schedinit calls several other initialization functions.
Let's look at them one at a time.
Initializing traceback
The runtime.tracebackinit function is responsible for initializing traceback. A traceback is the stack of functions that were called before we reached the current execution point. For example, we see one every time a panic occurs. A traceback is generated by calling the runtime.gentraceback function. For this function to work, we need to know the addresses of some built-in functions (for example, because we don't want them to be included in the traceback). runtime.tracebackinit is responsible for initializing these addresses.
Verifying linker symbols
Linker symbols are data that the linker emits into the executable and the object file. Most of this data was discussed in part 3 of this series. In the runtime package, linker symbols are mapped to the moduledata struct. The runtime.moduledataverify function runs some checks against this data to make sure that it has the correct structure and is not corrupted.
Initializing the stack pool
To understand the next step, you need to know how stack growth works in Go. When a new goroutine is created, it is given a small, fixed-size stack. When the stack reaches a certain threshold, its size is doubled and all the data from the original stack is copied to the new one.
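The doubling-and-copying idea can be shown with a toy model. This is only a sketch: the stack type, the push method, and the 2 KB starting size are assumptions made for illustration. A real goroutine stack grows downward and the runtime also has to adjust pointers into the stack, which this model ignores.

```go
package main

import "fmt"

const initialStack = 2 << 10 // assumed starting size for this sketch

// stack is a toy goroutine stack: the used bytes sit at the start of buf.
type stack struct {
	buf  []byte
	used int
}

func newStack() *stack {
	return &stack{buf: make([]byte, initialStack)}
}

// push stores data on the stack. Whenever the data would not fit, the
// backing storage is doubled and the old contents are copied over.
func (s *stack) push(data []byte) {
	for s.used+len(data) > len(s.buf) {
		bigger := make([]byte, 2*len(s.buf))
		copy(bigger, s.buf[:s.used])
		s.buf = bigger
	}
	copy(s.buf[s.used:], data)
	s.used += len(data)
}

func main() {
	s := newStack()
	for i := 0; i < 5; i++ {
		s.push(make([]byte, 1500))
		fmt.Printf("used=%d capacity=%d\n", s.used, len(s.buf))
	}
}
```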
There are many details here, such as how Go determines that the threshold has been reached and how it adjusts the pointers into the stack. I covered some of this when I introduced stackguard0 and function metadata in earlier posts. For more information, you can refer to this document.
Go uses a stack pool to cache stacks that are currently unused. The pool is an array initialized in the runtime.stackinit function. Each item in this array is a linked list of stacks of the same size.
This step also initializes another variable, runtime.stackFreeQueue. It also holds a linked list of stacks, but these stacks are added during garbage collection and the list is cleared once the collection finishes. Note that only stacks of 2 KB, 4 KB, 8 KB, and 16 KB are cached. Larger stacks are allocated directly.
Initializing the memory allocator
The memory allocation process is described in detail in this source code comment. If you want to understand how memory allocation works in Go, I strongly recommend reading it. I will analyze memory allocation in detail in a later post. The memory allocator is initialized in the runtime.mallocinit function, so let's take a closer look at it.
Initializing size classes
The first thing runtime.mallocinit does is call another function, initSizes, which calculates the size classes. But how big should each class be? When allocating a small object (less than 32 KB), the Go runtime first rounds its size up to one of the class sizes defined by the runtime. The allocated block can therefore only have one of a fixed set of sizes, and it is typically larger than the requested size. This wastes a small amount of memory, but it lets the runtime reuse memory blocks much more effectively.
The initSizes function is responsible for calculating these classes. At the beginning of this function, we can see the following code:
    align := 8
    for size := align; size <= _MaxSmallSize; size += align {
        if size&(size-1) == 0 {
            if size >= 2048 {
                align = 256
            } else if size >= 128 {
                align = size / 8
            } else if size >= 16 {
                align = 16
            }
        }
        ...
    }
As we can see, the two smallest classes are 8 bytes and 16 bytes. After that, there is a new class every 16 bytes up to 128 bytes. From 128 bytes to 2048 bytes, classes are spaced size/8 bytes apart, where size is the most recent power of two. Above 2048 bytes, there is a new class every 256 bytes.
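To see which sizes this rule produces, the loop can be run on its own. The sketch below is not the runtime's initSizes: it only records every size the loop visits, while the real function goes on to merge some neighbouring classes. The names candidateSizes and roundUp are hypothetical.

```go
package main

import "fmt"

const maxSmallSize = 32 << 10 // small objects are below 32 KB

// candidateSizes applies the alignment rule and returns every size the
// loop visits, in increasing order.
func candidateSizes() []int {
	var sizes []int
	align := 8
	for size := align; size <= maxSmallSize; size += align {
		if size&(size-1) == 0 { // power of two: possibly bump the alignment
			if size >= 2048 {
				align = 256
			} else if size >= 128 {
				align = size / 8
			} else if size >= 16 {
				align = 16
			}
		}
		sizes = append(sizes, size)
	}
	return sizes
}

// roundUp returns the smallest candidate size that can hold n bytes,
// or -1 if n is too large for a small-object class.
func roundUp(sizes []int, n int) int {
	for _, s := range sizes {
		if s >= n {
			return s
		}
	}
	return -1
}

func main() {
	sizes := candidateSizes()
	fmt.Println("first sizes:", sizes[:12])
	fmt.Println("100 bytes rounds up to", roundUp(sizes, 100))
	fmt.Println("2049 bytes rounds up to", roundUp(sizes, 2049))
}
```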
The initSizes function initializes the class_to_size array, which maps a class (here, its index in the global class list) to the size of its memory blocks. It also initializes class_to_allocnpages, which stores how many memory pages must be obtained from the OS for objects of a given class. In addition, the two arrays size_to_class8 and size_to_class128 are initialized here. They are used to find the class index that corresponds to an object size. The former is used for objects smaller than 1 KB, and the latter for objects of 1 to 32 KB.
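The two-table lookup can be sketched as follows. This is an illustration built on assumptions, not the runtime's code: the class list is the unmerged output of the alignment loop, class index 0 is left unused by convention, and the names lookup, newLookup, and classFor are invented. The first table is indexed in 8-byte steps and the second in 128-byte steps, which works because every class size up to 1 KB is a multiple of 8 and every larger one is a multiple of 128.

```go
package main

import "fmt"

const (
	maxSmallSize = 32 << 10 // upper bound for small objects
	smallLimit   = 1024     // sizes up to here use the 8-byte table
)

// lookup holds the class list and the two size-to-class tables.
type lookup struct {
	classToSize []int   // class index -> block size; index 0 is unused
	small       []uint8 // indexed by ceil(size/8)
	large       []uint8 // indexed by ceil((size-smallLimit)/128)
}

func newLookup() *lookup {
	l := &lookup{classToSize: []int{0}}
	align := 8
	for size := align; size <= maxSmallSize; size += align {
		if size&(size-1) == 0 {
			switch {
			case size >= 2048:
				align = 256
			case size >= 128:
				align = size / 8
			case size >= 16:
				align = 16
			}
		}
		l.classToSize = append(l.classToSize, size)
	}
	l.small = make([]uint8, smallLimit/8+1)
	l.large = make([]uint8, (maxSmallSize-smallLimit)/128+1)
	class := 1
	// For each table slot, store the first class big enough for that size.
	for size := 0; size <= smallLimit; size += 8 {
		for l.classToSize[class] < size {
			class++
		}
		l.small[size/8] = uint8(class)
	}
	for size := smallLimit; size <= maxSmallSize; size += 128 {
		for l.classToSize[class] < size {
			class++
		}
		l.large[(size-smallLimit)/128] = uint8(class)
	}
	return l
}

// classFor returns the class index for an object of the given size,
// which must not exceed maxSmallSize.
func (l *lookup) classFor(size int) int {
	if size > smallLimit {
		return int(l.large[(size-smallLimit+127)/128])
	}
	return int(l.small[(size+7)/8])
}

func main() {
	l := newLookup()
	for _, n := range []int{1, 100, 1024, 1025, 3000} {
		c := l.classFor(n)
		fmt.Printf("%d bytes -> class %d (%d bytes)\n", n, c, l.classToSize[c])
	}
}
```

The point of the tables is that finding a class becomes a shift and an array index instead of a search through the class list.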
Virtual memory reservation
The next thing mallocinit does is reserve virtual memory from the operating system in advance for future allocations. Let's see how this works on the x64 architecture. First, we need to initialize the following variables:
    arenaSize := round(_MaxMem, _PageSize)
    bitmapSize = arenaSize / (ptrSize * 8 / 4)
    spansSize = arenaSize / _PageSize * ptrSize
    spansSize = round(spansSize, _PageSize)
- bitmapSize is the amount of memory required for the garbage collector bitmap. This bitmap is a special region of memory that records where pointers are located in memory and whether the objects they point to have been marked by the garbage collector. This space is managed by the garbage collector. For each allocated byte, we need two bits to store the information, which is why the size of the bitmap is calculated as arenaSize / (ptrSize * 8 / 4).
- spansSize is the amount of memory required to store the array of pointers to all memory spans. A memory span is a structure that wraps a block of memory used for object allocations.
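Plugging numbers into these formulas gives a feel for the sizes involved. The values below are assumptions for the sake of the arithmetic: 8-byte pointers, an 8 KB runtime page, and a 512 GB arena limit. The formulas are the ones quoted in the text, and the function name reservationSizes is made up.

```go
package main

import "fmt"

const (
	ptrSize  = 8         // bytes per pointer on x64
	pageSize = 8 << 10   // assumed runtime page size (8 KB)
	maxMem   = 512 << 30 // assumed arena limit (512 GB)
)

// round rounds n up to a multiple of a, which must be a power of two.
func round(n, a uint64) uint64 {
	return (n + a - 1) &^ (a - 1)
}

// reservationSizes evaluates the three size formulas.
func reservationSizes() (arena, bitmap, spans uint64) {
	arena = round(maxMem, pageSize)
	bitmap = arena / (ptrSize * 8 / 4)
	spans = round(arena/pageSize*ptrSize, pageSize)
	return arena, bitmap, spans
}

func main() {
	arena, bitmap, spans := reservationSizes()
	fmt.Printf("arena:  %d GB\n", arena>>30)
	fmt.Printf("bitmap: %d GB\n", bitmap>>30)
	fmt.Printf("spans:  %d MB\n", spans>>20)
}
```

Under these assumptions the bitmap is one sixteenth of the arena and the spans array needs one pointer per page. Only address space is reserved at this point, so none of this is backed by physical memory yet.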
Once all of these variables have been calculated, the actual reservation is performed:
    pSize = bitmapSize + spansSize + arenaSize + _PageSize
    p = uintptr(sysReserve(unsafe.Pointer(p), pSize, &reserved))
Finally, we initialize the global variable mheap_, which serves as central storage for all memory-related objects.
    p1 := round(p, _PageSize)

    mheap_.spans = (**mspan)(unsafe.Pointer(p1))
    mheap_.bitmap = p1 + spansSize
    mheap_.arena_start = p1 + (spansSize + bitmapSize)
    mheap_.arena_used = mheap_.arena_start
    mheap_.arena_end = p + pSize
    mheap_.arena_reserved = reserved
Note that mheap_.arena_used is initially equal to mheap_.arena_start, because no space has been allocated for any objects yet.
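The resulting layout is easy to check with small numbers. The sketch below repeats the address arithmetic with plain integers. The layout type and the carve function are hypothetical, and the sizes in main are tiny values picked for readability, not real runtime values.

```go
package main

import "fmt"

// layout mirrors the address fields set on mheap_ in the excerpt.
type layout struct {
	spans, bitmap, arenaStart, arenaUsed, arenaEnd uintptr
}

// round rounds n up to a multiple of a, which must be a power of two.
func round(n, a uintptr) uintptr {
	return (n + a - 1) &^ (a - 1)
}

// carve splits a reserved region starting at p into the spans array,
// the bitmap, and the arena, in that order.
func carve(p, pSize, spansSize, bitmapSize, pageSize uintptr) layout {
	p1 := round(p, pageSize)
	l := layout{
		spans:      p1,
		bitmap:     p1 + spansSize,
		arenaStart: p1 + spansSize + bitmapSize,
		arenaEnd:   p + pSize,
	}
	l.arenaUsed = l.arenaStart // nothing has been allocated yet
	return l
}

func main() {
	const (
		pageSize   = 0x2000
		spansSize  = 0x2000
		bitmapSize = 0x4000
		arenaSize  = 0x10000
	)
	pSize := uintptr(bitmapSize + spansSize + arenaSize + pageSize)
	l := carve(0x10010, pSize, spansSize, bitmapSize, pageSize)
	fmt.Printf("spans=%#x bitmap=%#x arena=[%#x, %#x)\n", l.spans, l.bitmap, l.arenaStart, l.arenaEnd)
}
```

The extra page added to pSize is what keeps the arena at its full size even when p has to be rounded up to a page boundary.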
Initializing the heap
Next, the mHeap_Init function is called to initialize the heap. The first thing it does is initialize the allocators:
    fixAlloc_Init(&h.spanalloc, unsafe.Sizeof(mspan{}), recordspan, unsafe.Pointer(h), &memstats.mspan_sys)
    fixAlloc_Init(&h.cachealloc, unsafe.Sizeof(mcache{}), nil, nil, &memstats.mcache_sys)
    fixAlloc_Init(&h.specialfinalizeralloc, unsafe.Sizeof(specialfinalizer{}), nil, nil, &memstats.other_sys)
    fixAlloc_Init(&h.specialprofilealloc, unsafe.Sizeof(specialprofile{}), nil, nil, &memstats.other_sys)
To better understand what an allocator is, let's look at how it is used. Every time we want to allocate a new mspan, mcache, specialfinalizer, or specialprofile struct, the corresponding allocator is invoked through the fixAlloc_Alloc function. The main part of this function is as follows:
    if uintptr(f.nchunk) < f.size {
        f.chunk = (*uint8)(persistentalloc(_FixAllocChunk, 0, f.stat))
        f.nchunk = _FixAllocChunk
    }
It allocates memory, but instead of allocating only the actual size of the struct (f.size bytes), it sets aside _FixAllocChunk bytes (currently 16 KB). The remaining space is kept in the allocator, so the next allocation of the same struct type does not need to call persistentalloc, which can be time-consuming.
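The chunking idea can be modelled in a few lines. This is a simplified sketch: the type and method names are invented, make stands in for persistentalloc, the 16 KB chunk size is taken as an assumption, and reuse of freed blocks is left out entirely.

```go
package main

import "fmt"

const fixAllocChunk = 16 << 10 // assumed chunk size (16 KB)

// fixAlloc hands out fixed-size blocks carved from larger chunks.
type fixAlloc struct {
	size   int    // size of each block in bytes
	chunk  []byte // unused remainder of the current chunk
	chunks int    // number of chunks requested so far, for illustration
}

// alloc returns one block. A new chunk is fetched only when the current
// one cannot hold another block.
func (f *fixAlloc) alloc() []byte {
	if len(f.chunk) < f.size {
		f.chunk = make([]byte, fixAllocChunk) // stands in for persistentalloc
		f.chunks++
	}
	block := f.chunk[:f.size:f.size]
	f.chunk = f.chunk[f.size:]
	return block
}

func main() {
	f := &fixAlloc{size: 64}
	for i := 0; i < 1000; i++ {
		f.alloc()
	}
	fmt.Printf("1000 blocks of %d bytes needed %d chunks\n", f.size, f.chunks)
}
```

With 64-byte blocks, one chunk serves 256 allocations before the slower path is taken again.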
The persistentalloc function is responsible for allocating memory that should not be garbage collected. Its workflow is as follows:
- If the requested block is larger than 64 KB, it is allocated directly from OS memory.
- Otherwise, a persistent allocator has to be found first:
  - A persistent allocator is attached to each processor, which avoids the need for locks when working with it. So the runtime tries to use the persistent allocator of the current processor.
  - If information about the current processor cannot be obtained, the global allocator is used.
- If the allocator does not have enough free memory in its cache, more memory is requested from the OS.
- The requested amount of memory is returned from the allocator's cache.
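The steps above can be sketched as a small bump allocator. Everything here is a model under assumptions: sysAlloc only simulates a request to the OS, the 64 KB threshold and the 256 KB refill size are assumed values, the caller passes the current processor's allocator explicitly (nil meaning unknown), and alignment and locking of the global allocator are left out.

```go
package main

import "fmt"

const (
	maxBlock  = 64 << 10  // larger requests go straight to the OS
	chunkSize = 256 << 10 // refill size, an assumption of this sketch
)

// persistentAlloc is a bump allocator over memory that is never freed.
type persistentAlloc struct {
	buf     []byte // unused remainder of the current chunk
	refills int    // number of chunks requested, for illustration
}

var (
	globalAlloc persistentAlloc // fallback when no processor is known
	osAllocs    int             // counts simulated OS allocations
)

// sysAlloc stands in for a request to the operating system.
func sysAlloc(n int) []byte {
	osAllocs++
	return make([]byte, n)
}

// persistentalloc follows the listed steps. local is the current
// processor's allocator, or nil if the processor is unknown.
func persistentalloc(size int, local *persistentAlloc) []byte {
	if size > maxBlock {
		return sysAlloc(size)
	}
	a := local
	if a == nil {
		a = &globalAlloc // a real implementation would need a lock here
	}
	if len(a.buf) < size {
		a.buf = sysAlloc(chunkSize)
		a.refills++
	}
	block := a.buf[:size:size]
	a.buf = a.buf[size:]
	return block
}

func main() {
	var p0 persistentAlloc
	persistentalloc(128, &p0)
	persistentalloc(4096, &p0)
	persistentalloc(100<<10, &p0)
	fmt.Println("refills:", p0.refills, "simulated OS allocations:", osAllocs)
}
```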
The persistentalloc and fixAlloc_Alloc functions work in very similar ways. You could say that together they implement two levels of caching. You should also be aware that persistentalloc is used not only in fixAlloc_Alloc, but also in many other places where persistent memory is needed.
Let's go back to the mHeap_Init function. One more question to answer is how the four structures whose allocators are initialized at the beginning of this function are used:
- mspan is a wrapper for a block of memory that should be garbage collected. We mentioned it earlier when discussing size classes. A new mspan is created when an object of a particular size class needs to be allocated.
- mcache is a struct attached to each processor. It is responsible for caching spans. Each processor has its own mcache mainly to avoid locking.
- specialfinalizer (allocated by specialfinalizeralloc) is the struct that is allocated when the runtime.SetFinalizer function is called, which we do when we want the system to run some cleanup code once an object is collected. For example, the os.NewFile function associates a finalizer with each new file, and this finalizer is responsible for closing the OS file descriptor.
- specialprofile (allocated by specialprofilealloc) is a struct used by the memory profiler.
After the allocators are initialized, mHeap_Init calls the mSpanList_Init function to initialize the linked lists. This is very simple: all it does is initialize the first entry of each linked list. The mheap struct contains several such linked lists.
- mheap.free and mheap.busy are arrays that hold the free and busy lists of spans for large objects (larger than 32 KB but smaller than 1 MB). Each possible size has a corresponding entry in the array. Sizes are measured in pages: the first list in the array manages spans of one page, the second manages spans of two pages, and so on.
- mheap.freelarge and mheap.busylarge are the free and busy lists for objects larger than 1 MB.
The next step is to initialize mheap.central, which manages the spans for small objects (less than 32 KB). In mheap.central, the lists are grouped by size class. The initialization is very similar to what we saw earlier: it simply initializes the linked list of every free list.
Initializing the cache
We are now almost done with the initialization of the memory allocator. The last thing left in the mallocinit function is the initialization of mcache:
    _g_ := getg()
    _g_.m.mcache = allocmcache()
First, we obtain the current goroutine. Each goroutine contains a pointer to an m struct, which is a wrapper around an operating system thread. The mcache field of this struct is initialized in these lines. The allocmcache function calls fixAlloc_Alloc to initialize a new mcache struct. We have already discussed how this struct is allocated and what it is for.
The attentive reader may notice that I said earlier that each mcache is attached to a processor, but now we see that it is attached to the m struct, which corresponds to an OS thread, not a processor. This is not an error: an mcache is initialized only for threads that are currently executing, and it is handed over to another thread's m struct whenever the processor switches threads.
More about the Go boot process
In the next post, we will continue with the bootstrap process and look at how the garbage collector is initialized and how the main goroutine is started. In the meantime, feel free to share your thoughts in the comments.