This series of blog posts is intended for those who are already familiar with the basics of Go and would like to get a deeper insight into its internals. Today's post is dedicated to the structure of Go source code and some internal details of the Go compiler. After reading this, you should be able to answer the following questions:
1. What is the structure of Go source code?
2. How does the Go compiler work?
3. What's the basic structure of a node tree in Go?
Getting started
When you start learning a new programming language, you can usually find a lot of "hello-world" tutorials, beginner guides, and books with details on the main language concepts, syntax, and even the standard library. However, getting information on such things as the layout of major data structures that the language runtime allocates, or what assembly code is generated when you call a built-in function, is not so easy. Obviously, the answers lie inside the source code, but, from my own experience, you can spend hours wandering through it without making much progress.
I won't pretend to be an expert on the topic, nor will I attempt to describe every possible aspect. Instead, the goal is to demonstrate how you can decipher Go sources on your own.
Before we begin, we certainly need our own copy of the Go source files. There is nothing special about getting them. Just execute:
git clone https://github.com/golang/go
Please note that the code in the master branch is constantly changing, so I use the release-branch.go1.4 branch in this blog post.
Understanding Project Structure
If you go to the /src folder of the Go repository, you can see a lot of folders. Most of them contain source files of the standard Go library. Standard naming conventions are always applied here, so each package is inside a folder whose name corresponds directly to the package name. Apart from the standard library, there is a lot of other stuff. In my opinion, the most important and useful folders are:
Folder | Description
/src/cmd/ | Contains different command line tools.
/src/cmd/go/ | Contains source files of the go tool, which downloads and builds Go source files and installs packages. While doing this, it collects all source files and calls the Go linker and Go compiler command line tools.
/src/cmd/dist/ | Contains a tool responsible for building all other command line tools and all the packages from the standard library. You may want to analyze its source code to understand what libraries are used in every particular tool or package.
/src/cmd/gc/ | This is the architecture-independent part of the Go compiler.
/src/cmd/ld/ | The architecture-independent part of the Go linker. Architecture-dependent parts are located in folders with the "l" postfix (for example, /src/cmd/6l/ for amd64), which follow the same naming conventions as the compiler.
/src/cmd/5a/, /src/cmd/6a/, /src/cmd/8a/, /src/cmd/9a/ | Here you can find Go assembler compilers for different architectures. The Go assembler is a form of assembly language that does not map precisely to the assembler of the underlying machine. Instead, there is a distinct compiler for each architecture that translates the Go assembler to the machine's assembler. You can find more details here.
/src/lib9/, /src/libbio/, /src/liblink/ | Different libraries that are used inside the compiler, linker, and runtime package.
/src/runtime/ | The most important Go package, which is indirectly included in all programs. It contains the entire runtime functionality, such as memory management, garbage collection, goroutine creation, etc.
Inside the Go compiler
As I said above, the architecture-independent part of the Go compiler is located in the /src/cmd/gc/ folder. The entry point is located in the lex.c file. Apart from some common stuff, such as parsing command line arguments, the compiler does the following:
Initializes some common data structures.
Iterates through all of the provided Go files and calls the yyparse method for each file. This causes actual parsing to occur. The Go compiler uses Bison as the parser generator. The grammar for the language is fully described in the go.y file (I'll provide more details on it later). As a result, this step generates a complete parse tree where each node represents an element of the compiled program.
Recursively iterates through the generated tree several times and applies some modifications, e.g., defines type information for the nodes that should be implicitly typed, rewrites some language elements, such as typecasting, into calls to some functions in the runtime package, and does some other work.
Performs the actual compilation after the parse tree is complete. Nodes are translated into assembler code.
Creates the object file that contains the generated assembly code with some additional data structures, such as the symbols table, which is generated and written to the disk.
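The Go 1.4 compiler performs these steps in C, but the same "parse, then walk the tree" shape can be tried from Go itself with the standard go/parser and go/ast packages. Treat the sketch below only as an analogy, not as the compiler's code: parser.ParseFile stands in for the yyparse step, and a single ast.Inspect call stands in for one of the recursive passes over the tree. The input source and the countNodes helper are made up for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is a hypothetical input file used only for this illustration.
const src = `package demo

func add(x int, y int) int { return x + y }

func main() { println(add(1, 2)) }
`

// countNodes parses one file (roughly the yyparse step) and then walks
// the resulting tree once (roughly one of the compiler's passes),
// counting function declarations and call expressions.
func countNodes(source string) (funcs, calls int, err error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return 0, 0, err
	}
	ast.Inspect(file, func(n ast.Node) bool {
		switch n.(type) {
		case *ast.FuncDecl:
			funcs++
		case *ast.CallExpr:
			calls++
		}
		return true // keep descending into child nodes
	})
	return funcs, calls, nil
}

func main() {
	funcs, calls, err := countNodes(src)
	if err != nil {
		panic(err)
	}
	fmt.Printf("functions: %d, calls: %d\n", funcs, calls)
}
```

The real compiler makes several such passes and rewrites nodes along the way; this sketch only reads the tree.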
Diving into Go grammar
Now let's take a closer look at the second step. The go.y file, which contains the language grammar, is a good starting point for investigating the Go compiler and the key to understanding the language syntax. The main part of this file consists of declarations similar to the following:
xfndcl:
    LFUNC fndcl fnbody
fndcl:
    sym '(' oarg_type_list_ocomma ')' fnres
|   '(' oarg_type_list_ocomma ')' sym '(' oarg_type_list_ocomma ')' fnres
In this declaration, the xfndcl and fndcl nodes are defined. The fndcl node can be in one of two forms. The first form corresponds to the following language construct:
somefunction(x int, y int) int
And the second one to this language construct:
(t *sometype) somefunction(x int, y int) int
The xfndcl node consists of the keyword func, which is stored in the constant LFUNC, followed by the fndcl and fnbody nodes.
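For comparison, in today's go/ast package (a separate, tool-oriented parser, not the Go 1.4 C compiler) both fndcl alternatives produce the same node type, *ast.FuncDecl, and differ only in whether its Recv field is set. The sketch parses both constructs above and reports which form each one matched; the sometype declaration and the function bodies are placeholders added so the text parses as a complete file.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src holds both forms of fndcl; sometype and the bodies are
// placeholders so the text parses as a complete Go file.
const src = `package demo

type sometype struct{}

func somefunction(x int, y int) int { return x + y }

func (t *sometype) somefunction(x int, y int) int { return x - y }
`

// classify returns one entry per function declaration, telling whether
// it matched the plain-function form or the method (receiver) form.
func classify(source string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // skip the type declaration
		}
		if fn.Recv == nil {
			out = append(out, "function "+fn.Name.Name)
		} else {
			out = append(out, "method "+fn.Name.Name)
		}
	}
	return out, nil
}

func main() {
	kinds, err := classify(src)
	if err != nil {
		panic(err)
	}
	for _, k := range kinds {
		fmt.Println(k)
	}
}
```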
An important feature of Bison (or YACC) grammar is that it allows placing arbitrary C code next to each node definition. The code is executed every time a match for this node definition is found in the source code. Here, you can refer to the result node as $$ and to the child nodes as $1, $2, ...
It is easier to understand this through an example. Note that the following code is a shortened version of the actual code.
fndcl:
    sym '(' oarg_type_list_ocomma ')' fnres
    {
        t = nod(OTFUNC, N, N);
        t->list = $3;
        t->rlist = $5;
        $$ = nod(ODCLFUNC, N, N);
        $$->nname = newname($1);
        $$->nname->ntype = t;
        declare($$->nname, PFUNC);
    }
|   '(' oarg_type_list_ocomma ')' sym '(' oarg_type_list_ocomma ')' fnres
First, a new node is created, which contains type information for the function declaration. The $3 argument list and the $5 result list are referenced from this node. Then, the $$ result node is created. It stores the function name and the type node. As you can see, there may be no direct correspondence between definitions in the go.y file and the node structure.
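For readers who prefer Go to C, here is a rough Go transliteration of the semantic action above. Everything in it is hypothetical: the Node struct, the Op constants, and the newName and declare helpers only mimic the names used in the go.y excerpt (nod, newname, declare) and are not the compiler's real API. The params and results arguments play the role of $3 and $5, sym plays $1, and the returned node is $$.

```go
package main

import "fmt"

// Op mimics the compiler's node operation codes (hypothetical subset).
type Op int

const (
	ONAME    Op = iota // a named identifier
	OTFUNC             // operation: type function
	ODCLFUNC           // operation: declaration function
)

// Node is a hypothetical, heavily simplified stand-in for the compiler's node.
type Node struct {
	Op    Op
	Sym   string  // identifier name, used by ONAME nodes
	List  []*Node // parameters ($3) on an OTFUNC node
	Rlist []*Node // results ($5) on an OTFUNC node
	Nname *Node   // function name on an ODCLFUNC node
	Ntype *Node   // type node attached to the name
}

// declared is a hypothetical symbol table filled by declare.
var declared = map[string]*Node{}

func newName(sym string) *Node { return &Node{Op: ONAME, Sym: sym} }

// declare records the name; the PFUNC storage class is omitted here.
func declare(n *Node) { declared[n.Sym] = n }

// fndclAction mirrors the C action: build the OTFUNC type node, then the
// ODCLFUNC result node ($$) that points at the name and its type.
func fndclAction(sym string, params, results []*Node) *Node {
	t := &Node{Op: OTFUNC, List: params, Rlist: results}
	decl := &Node{Op: ODCLFUNC}
	decl.Nname = newName(sym)
	decl.Nname.Ntype = t
	declare(decl.Nname)
	return decl
}

func main() {
	// somefunction(x int, y int) int, with parameter and result nodes
	// reduced to bare names for brevity.
	params := []*Node{newName("x"), newName("y")}
	results := []*Node{newName("int")}
	d := fndclAction("somefunction", params, results)
	fmt.Println(d.Op == ODCLFUNC, d.Nname.Ntype.Op == OTFUNC, len(d.Nname.Ntype.List))
}
```

Note that the parameter list ends up two levels below the declaration node, under the name's type node, even though it sits directly inside the fndcl rule in the grammar.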
Understanding Nodes
Now it's time to take a look at what a node actually is. First of all, a node is a struct (you can find a definition here). This struct contains a large number of properties, since it needs to support different kinds of nodes and different nodes have different attributes. Below is a description of several fields that I think are important to understand.
Node struct field | Description
op | Node operation. Each node has this field. It distinguishes different kinds of nodes from each other. In our previous example, those were OTFUNC (operation type function) and ODCLFUNC (operation declaration function).
type | This is a reference to another struct with type information for nodes that have type information (some nodes have no type, e.g., control flow statements such as if, switch, or for).
val | This field contains the actual values for nodes that represent literals.
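The important design choice here is that a single struct serves every kind of node, and op tells you which of the other fields are meaningful. Below is a hypothetical Go sketch of that idea reduced to the three fields from the table; the Type and Val structs, the Op names OLITERAL and OIF, and the describe helper are made up for illustration and only loosely follow the compiler's naming.

```go
package main

import "fmt"

// Op distinguishes kinds of nodes, like the op field in the table.
type Op int

const (
	OLITERAL Op = iota // a literal such as 42
	OIF                // an if statement
)

// Type is a hypothetical placeholder for the compiler's type information.
type Type struct{ Name string }

// Val is a hypothetical placeholder for a literal's value.
type Val struct{ Int int64 }

// Node uses one struct for every kind of node; which fields are set
// depends on Op.
type Node struct {
	Op   Op
	Type *Type // nil for nodes without a type, e.g. control flow
	Val  Val   // meaningful only when Op == OLITERAL
}

// describe dispatches on Op, the way compiler passes switch on the op field.
func describe(n *Node) string {
	switch n.Op {
	case OLITERAL:
		return fmt.Sprintf("literal %d of type %s", n.Val.Int, n.Type.Name)
	case OIF:
		return fmt.Sprintf("if statement, has type: %t", n.Type != nil)
	default:
		return "unknown node"
	}
}

func main() {
	lit := &Node{Op: OLITERAL, Type: &Type{Name: "int"}, Val: Val{Int: 42}}
	stmt := &Node{Op: OIF} // control flow: Type stays nil
	fmt.Println(describe(lit))
	fmt.Println(describe(stmt))
}
```

The trade-off is memory: every node carries every field, whether or not its op uses it, in exchange for one uniform type that all compiler passes can walk.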
Now that you understand the basic structure of the node tree, you can put your knowledge into practice. In the next post, we'll investigate what exactly the Go compiler generates, using a simple Go application as an example.
Read all parts of the series: Part 1 | Part 2 | Part 3 | Part 4 | Part 5
About the Author: Sergey Matyukevich is a Cloud Engineer and Go Developer at Altoros. With 6+ years in software engineering, he's an expert on cloud automation and designing architectures for complex cloud-based systems. An active member of the Go community, Sergey is a frequent contributor to open-source projects, such as Ubuntu and Juju Charms.