This article was written some time ago, so some of the information in it may have evolved or changed. Translated by yhx of Bole Online and proofread by Tang Yuhua. Do not reprint without permission.
Original English article: Sergey Matyukevich.
This series of blog posts is intended for readers who already know the basics of Go and want to explore its internals in more depth. This post looks at the structure of the Go source code and at some internal details of the Go compiler. After reading it, you should be able to answer the following three questions:
1. What is the Go source code structure like?
2. How does the Go compiler work?
3. What is the basic structure of the node tree used by the Go compiler?
Let's get started.
Whenever you start learning a new programming language, you can find plenty of "Hello World" tutorials, beginner's guides, and documents about the language's main concepts, syntax, and even its standard library. However, when you look for more in-depth information, such as how the data structures allocated by the runtime are laid out in memory, or what assembly code is generated when a built-in function is called, you quickly find that it is not easy to come by. Obviously, the answers are hidden in the source code. In my experience, though, you can easily spend hours groping through the source code and come away with nothing.
I won't pretend to know everything, and I won't try to cover everything. But I hope this post helps you start exploring the Go source code on your own.
Before we get started, we need our own copy of the Go source code. Getting it is easy; just run the following command:
git clone https://github.com/golang/go
Note that the main development branch of the repository changes constantly. This post uses the release-branch.go1.4 branch, which you can switch to with git checkout release-branch.go1.4.
Figuring out the project structure
If you look in the /src folder of the Go repository, you will see a lot of subfolders. Most of them contain the source files of the Go standard library. The project follows the standard naming conventions, so each package lives in its own folder, and the folder has the same name as the package. Apart from the standard library, the directory contains many other things. In my view, the most useful directories are:
Folder | Description
/src/cmd/ | Contains various command-line tools.
/src/cmd/go/ | Contains the source code of the Go tool, which downloads and builds Go source files and installs Go packages. In doing so, it collects all the source files and invokes the Go compiler and linker.
/src/cmd/dist/ | Contains a tool that builds all the other command-line tools, as well as all the packages in the standard library. To find out which libraries a particular tool or package uses, analyze the source code here.
/src/cmd/gc/ | Contains the architecture-independent part of the Go compiler.
/src/cmd/ld/ | Contains the architecture-independent part of the Go linker. The architecture-dependent parts live in directories whose names end in l (5l, 6l, 8l, 9l), following the same naming convention as the compiler's architecture-dependent directories (5g, 6g, 8g, 9g).
/src/cmd/5a/, 6a, 8a, and 9a | Contain the Go assemblers for the different architectures. Go assembly language does not map directly onto the assembly language of any particular machine. Instead, each architecture has its own assembler that translates Go assembly into machine code. More details are available in the Go assembler documentation.
/src/lib9/, /src/libbio, /src/liblink | Various libraries used by the compiler, the linker, and the runtime.
/src/runtime/ | The most important Go package. It is included in every program by default and contains all of the runtime functionality, such as memory management, garbage collection, goroutines, and so on.
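Here is a small sketch that makes the naming convention concrete. It checks whether a file's package clause matches the name of its directory. It uses the standard library's go/parser package in PackageClauseOnly mode, so only the package clause is parsed. The function name packageMatchesDir and the sample paths are hypothetical. Keep in mind that this is a convention, not a rule the compiler enforces. Commands such as /src/cmd/go declare package main, so they fail the check on purpose.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"path/filepath"
)

// packageMatchesDir parses only the package clause of src and reports
// whether the declared package name equals the name of the directory
// that contains filename. It also returns the declared package name.
func packageMatchesDir(filename, src string) (bool, string, error) {
	fset := token.NewFileSet()
	// PackageClauseOnly stops parsing right after the package clause.
	f, err := parser.ParseFile(fset, filename, src, parser.PackageClauseOnly)
	if err != nil {
		return false, "", err
	}
	name := f.Name.Name
	dir := filepath.Base(filepath.Dir(filename))
	return name == dir, name, nil
}

func main() {
	// A library package: the directory and package names match.
	ok, name, err := packageMatchesDir("src/strings/strings.go", "package strings\n")
	fmt.Println(ok, name, err)

	// A command: package main lives in a directory named after the tool.
	ok, name, err = packageMatchesDir("src/cmd/go/main.go", "package main\n")
	fmt.Println(ok, name, err)
}
```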
Go compiler internals
As mentioned above, the architecture-independent part of the Go compiler is located in the /src/cmd/gc directory. Its entry point is in the lex.c file. Apart from routine work such as parsing command-line arguments, the compiler does the following:
1. Initialize some of the common data structures.
2. Iterate over all of the supplied Go source files and call the yyparse function for each one. This function does the actual parsing. The Go compiler uses Bison as its parser generator, and the grammar is described in the go.y file (I'll cover it in more detail later). This step produces a complete parse tree in which each node represents an element of the program being compiled.
3. Recursively traverse the resulting tree and modify it. For example, fill in type information for nodes whose types are only implied, and rewrite some language elements, such as type conversions, into calls to functions in the runtime package, among other things.
4. Once the parse tree has been processed, perform the actual compilation by translating the nodes into assembler code.
5. Create the object file on disk and write the generated code to it, together with additional data structures such as symbol tables.
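The standard library has its own Go parser in go/parser and go/ast. It is a separate implementation from the Bison-generated parser in /src/cmd/gc, but it can still illustrate the shape of the first three steps. There is one shared data structure (a token.FileSet holding position information), one parse per file, and a recursive walk over each resulting tree. The sketch assumes the files arrive as an in-memory map from file name to source text. The helper countFuncDecls is hypothetical, and code generation and object file output (steps 4 and 5) are not modeled.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// countFuncDecls loosely mirrors steps 1-3: it sets up a shared FileSet,
// parses every supplied file into a tree, and then walks each tree,
// counting function and method declarations.
func countFuncDecls(files map[string]string) (int, error) {
	// Step 1: a common data structure shared by all files (position info).
	fset := token.NewFileSet()

	// Sort the names so files are processed in a deterministic order.
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)

	count := 0
	for _, name := range names {
		// Step 2: parse one file into a syntax tree.
		f, err := parser.ParseFile(fset, name, files[name], 0)
		if err != nil {
			return 0, err
		}
		// Step 3: walk the tree recursively and inspect every node.
		ast.Inspect(f, func(n ast.Node) bool {
			if _, ok := n.(*ast.FuncDecl); ok {
				count++
			}
			return true // keep descending into child nodes
		})
	}
	return count, nil
}

func main() {
	files := map[string]string{
		"a.go": "package p\nfunc A() {}\n",
		"b.go": "package p\ntype T struct{}\nfunc B() {}\nfunc (T) M() {}\n",
	}
	n, err := countFuncDecls(files)
	fmt.Println(n, err) // 3 <nil>
}
```

Sorting the file names matters only for reproducibility. Go randomizes map iteration order, and a real compiler would report errors in a stable order.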
Diving into the Go grammar
Now let's go a step further. The go.y file contains the grammar of the language, so it is a good starting point for exploring the Go compiler and the key to understanding the language's syntax rules. Most of the file consists of declarations like the following:
xfndcl:
     LFUNC fndcl fnbody

fndcl:
     sym '(' oarg_type_list_ocomma ')' fnres
|    '(' oarg_type_list_ocomma ')' sym '(' oarg_type_list_ocomma ')' fnres
This declaration defines two nodes, xfndcl and fndcl. The fndcl node can take one of two forms. The first corresponds to the following construct:
somefunction(x int, y int) int
The second form corresponds to the following syntax structure:
(t *sometype) somefunction(x int, y int) int
The xfndcl node consists of the keyword func, stored in the constant LFUNC, followed by the fndcl and fnbody nodes.
A very important feature of Bison (or YACC) grammars is that you can attach arbitrary C code to each rule. That code is executed whenever a fragment of the source file that matches the rule is found. Inside this code, the resulting node is referred to as $$ and its child nodes as $1, $2, and so on.
This is easier to understand with an example. Consider the following simplified code:
fndcl:
     sym '(' oarg_type_list_ocomma ')' fnres
     {
         Node *t;

         t = nod(OTFUNC, N, N);        /* function type node */
         t->list = $3;                 /* parameter list */
         t->rlist = $5;                /* result list */

         $$ = nod(ODCLFUNC, N, N);     /* function declaration node */
         $$->nname = newname($1);
         $$->nname->ntype = t;
         declare($$->nname, PFUNC);
     }
|    '(' oarg_type_list_ocomma ')' sym '(' oarg_type_list_ocomma ')' fnres
First, a new node is created to hold the type information for the function declaration. Its parameter list refers to $3 and its result list to $5. Then the result node $$ is created, and the function's name and its type node are stored in it. As you can see, there is not necessarily a direct correspondence between the definitions in go.y and the node structure.
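The Go transliteration below of the simplified C action shows how $$ is built out of $1, $3 and $5. The Op constants, the trimmed Node struct and the helpers nod, newname and reduceFndcl are hypothetical stand-ins written for this sketch. The real compiler's Node struct has many more fields, and its nod and newname functions are written in C with different signatures.

```go
package main

import "fmt"

// Op is a hypothetical, simplified version of the node operator.
type Op int

const (
	ONAME     Op = iota // a declared name
	ODCLFIELD           // one parameter or result
	OTFUNC              // a function type
	ODCLFUNC            // a function declaration
)

// Node is a hypothetical, heavily trimmed stand-in for the compiler's Node struct.
type Node struct {
	Op    Op
	Sym   string  // name of the symbol, if any
	List  []*Node // for OTFUNC: parameters (t->list)
	Rlist []*Node // for OTFUNC: results (t->rlist)
	Nname *Node   // for ODCLFUNC: the function name ($$->nname)
	Ntype *Node   // for ONAME: the name's type (nname->ntype)
}

func nod(op Op) *Node { return &Node{Op: op} }

func newname(sym string) *Node { return &Node{Op: ONAME, Sym: sym} }

// reduceFndcl mimics the action for
//   fndcl: sym '(' oarg_type_list_ocomma ')' fnres
// Its parameters play the roles of $1, $3 and $5; the return value is $$.
func reduceFndcl(sym1 string, args3, res5 []*Node) *Node {
	t := nod(OTFUNC)
	t.List = args3
	t.Rlist = res5

	decl := nod(ODCLFUNC)
	decl.Nname = newname(sym1)
	decl.Nname.Ntype = t
	// The real action also calls declare($$->nname, PFUNC) to enter the
	// name into the current scope; that bookkeeping is omitted here.
	return decl
}

func main() {
	params := []*Node{{Op: ODCLFIELD, Sym: "x"}, {Op: ODCLFIELD, Sym: "y"}}
	results := []*Node{{Op: ODCLFIELD}}
	fn := reduceFndcl("somefunction", params, results)
	fmt.Println(fn.Nname.Sym, len(fn.Nname.Ntype.List), len(fn.Nname.Ntype.Rlist))
}
```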
Understanding nodes
It's time to look at what a node actually is. First of all, a node is a struct; in this version of the compiler it is defined in the go.h file in /src/cmd/gc. The struct has a large number of fields, because there are many kinds of nodes and different kinds need different properties. Here are the fields I consider most important:
Node struct field | Description
Op | The node operation. Every node has this field, and it is what distinguishes one kind of node from another. In the previous example, it took the values OTFUNC (operation type function) and ODCLFUNC (operation declaration function).
Type | A reference to a struct that holds type information. Some nodes have no type information, for example control flow statements such as if, switch, or for.
Val | For nodes that represent constants, this field stores the constant value.
Now that you understand the basic structure of the node tree, you can put this knowledge to use. In the next post, we will take a simple Go program as an example and analyze how the Go compiler compiles it.