cmd/compile
Contains the main packages that make up the Go compiler. The compiler can be logically divided into four phases, which we describe briefly below together with the packages that contain their code.
You may sometimes hear the terms "front-end" and "back-end" when referring to the compiler. Roughly speaking, these correspond to the first two and the last two phases listed here. A third term, "middle-end", often refers to much of the work that happens in the second phase.
Note that the go/* family of packages, such as go/parser and go/types, is not related to the compiler. Because the compiler was originally written in C, the go/* packages were developed to make it possible to write tools that work with Go code, such as gofmt and vet.
To clarify, the name "gc" stands for "Go compiler" and has little to do with the uppercase "GC", which stands for garbage collection.
1. Parsing
cmd/compile/internal/syntax
(lexer, parser, syntax tree)
In the first phase of compilation, the source code is tokenized (lexical analysis) and parsed (syntax analysis), and a syntax tree is constructed for each source file. (LCTT translator's note: a token is one of a set of predefined, recognizable strings, usually made up of a name and a value, where the name is a lexical category such as identifier, keyword, separator, operator, literal, or comment. A syntax tree, or abstract syntax tree (AST), represents the syntactic structure of a program as a tree, in which leaf nodes are usually operands and the other nodes are operators.)
Each syntax tree is an exact representation of the respective source file, with nodes corresponding to the various elements of the source, such as expressions, declarations, and statements. The syntax tree also includes position information, which is used for error reporting and for creating debugging information.
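The following is a minimal sketch of what "nodes with position information" looks like in practice. The compiler's own syntax package is internal and cannot be imported, so this example uses the unrelated go/parser, go/ast, and go/token packages from the standard library as a stand-in. The source snippet and the describe helper are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is a tiny, hypothetical source file used only for this illustration.
const src = `package demo

func add(a, b int) int {
	return a + b
}
`

// describe parses src and returns one line for each function declaration,
// return statement, and binary expression, prefixed with its position.
func describe(src string) ([]string, error) {
	fset := token.NewFileSet() // records position information for every node
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []string
	ast.Inspect(file, func(n ast.Node) bool {
		switch n := n.(type) {
		case *ast.FuncDecl:
			out = append(out, fmt.Sprintf("%s: declaration of func %s", fset.Position(n.Pos()), n.Name.Name))
		case *ast.ReturnStmt:
			out = append(out, fmt.Sprintf("%s: return statement", fset.Position(n.Pos())))
		case *ast.BinaryExpr:
			out = append(out, fmt.Sprintf("%s: binary expression %s", fset.Position(n.Pos()), n.Op))
		}
		return true // keep walking into child nodes
	})
	return out, nil
}

func main() {
	lines, err := describe(src)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Each printed line pairs a declaration, statement, or expression with the file, line, and column it came from, which is the same kind of information the compiler relies on for error messages and debug data.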
2. Type-checking and AST transformations
cmd/compile/internal/gc
(create compiler AST, type-checking, AST transformations)
The gc package includes an AST definition carried over from when the compiler was written in C. All of its code is written in terms of that AST, so the first thing the gc package must do is convert the syntax package's syntax tree to the compiler's AST representation. This extra step may be refactored away in the future.
The AST is then type-checked. The first steps are name resolution and type inference, which determine which object belongs to which identifier and what type each expression has. Type-checking also includes certain extra checks, such as "declared and not used", as well as determining whether a function terminates.
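As a rough illustration of name resolution, type inference, and the unused-variable check, the sketch below runs the standard library's go/types checker over a small snippet. This is not the compiler's own type checker, only a separate package that performs comparable checks. The snippet and the check helper are hypothetical, and the exact wording of the reported error differs between Go versions.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// src declares x without using it and lets the checker infer y's type.
const src = `package demo

func f() int {
	x := 1
	y := 2.5
	return int(y)
}
`

// check type-checks src. It returns the error messages that were reported
// and the inferred type of every identifier that src defines.
func check(src string) (errs []string, typesOf map[string]string, err error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, nil, err
	}
	conf := types.Config{
		// Collect errors instead of stopping at the first one.
		Error: func(e error) { errs = append(errs, e.Error()) },
	}
	info := &types.Info{Defs: make(map[*ast.Ident]types.Object)}
	conf.Check("demo", fset, []*ast.File{file}, info) // errors arrive via conf.Error

	typesOf = make(map[string]string)
	for id, obj := range info.Defs {
		if obj != nil { // the package name identifier has no object
			typesOf[id.Name] = obj.Type().String()
		}
	}
	return errs, typesOf, nil
}

func main() {
	errs, typesOf, err := check(src)
	if err != nil {
		panic(err)
	}
	for _, e := range errs {
		fmt.Println("error:", e)
	}
	fmt.Println("x:", typesOf["x"], "y:", typesOf["y"])
}
```

The checker resolves x and y to their objects, infers int and float64 from the untyped constants, and reports that x is never used.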
Certain transformations are also done on the AST. Some nodes are refined based on type information, such as string additions being split from the arithmetic addition node type. Other examples are dead code elimination, function call inlining, and escape analysis. (LCTT translator's note: escape analysis is a method of determining the scope within which a pointer remains reachable.)
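To watch inlining and escape analysis at work, one option is to build a small program with the compiler's -m flag, which prints optimization decisions. The program below is a hypothetical example. Which functions are inlined and which values are moved to the heap depends on the compiler version, so no particular diagnostics are promised here.

```go
package main

import "fmt"

type point struct{ x, y int }

// newPoint returns the address of a local variable. The pointer outlives
// the call, so escape analysis has to decide whether p can stay on the
// stack (for instance after inlining) or must be allocated on the heap.
func newPoint(x, y int) *point {
	p := point{x, y}
	return &p
}

// sum is small, which makes it a likely candidate for inlining.
func sum(p *point) int { return p.x + p.y }

func main() {
	// Build with:  go build -gcflags=-m
	// to print the compiler's inlining and escape analysis decisions.
	fmt.Println(sum(newPoint(1, 2)))
}
```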
3. Generic SSA
cmd/compile/internal/gc
(Convert to SSA)
cmd/compile/internal/ssa
(SSA passes and rules)
(LCTT translator's note: compilers for many common high-level languages cannot complete all of their work in a single scan of the source code or AST. Instead they scan multiple times, completing part of the work each time and feeding the output of one scan into the next, until the target code is finally generated. Each such scan is called a pass. The result of every pass before the last is called an intermediate representation; the AST and SSA discussed in this article are both intermediate representations. SSA, static single-assignment form, is a property of an intermediate representation that requires each variable to be assigned exactly once and to be defined before it is used.)
In this phase, the AST is converted into static single-assignment (SSA) form, a lower-level intermediate representation with specific properties that make it easier to implement optimizations and to eventually generate machine code from it.
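The single-assignment property can be sketched with a toy renamer for straight-line code. This is not how the compiler builds SSA. It ignores control flow, so it needs no phi functions, and the assign type and toSSA function are hypothetical names used only for this illustration.

```go
package main

import "fmt"

// assign is a hypothetical three-address statement: dst = a op b.
type assign struct {
	dst, a, op, b string
}

// toSSA renames the variables of straight-line code so that every name is
// assigned exactly once. Operands that were never assigned (inputs and
// constants) keep their original spelling.
func toSSA(prog []assign) []assign {
	version := map[string]int{} // latest version number of each variable

	use := func(name string) string {
		if v, ok := version[name]; ok {
			return fmt.Sprintf("%s%d", name, v)
		}
		return name
	}

	out := make([]assign, 0, len(prog))
	for _, s := range prog {
		a, b := use(s.a), use(s.b) // read operands before defining dst
		version[s.dst]++           // every assignment creates a new version
		dst := fmt.Sprintf("%s%d", s.dst, version[s.dst])
		out = append(out, assign{dst, a, s.op, b})
	}
	return out
}

func main() {
	prog := []assign{
		{"x", "a", "+", "b"},
		{"x", "x", "*", "2"}, // x is assigned a second time
		{"y", "x", "-", "a"},
	}
	for _, s := range toSSA(prog) {
		fmt.Printf("%s = %s %s %s\n", s.dst, s.a, s.op, s.b)
	}
}
```

After renaming, the two assignments to x become two distinct names, so every use refers to exactly one definition. That is the property that makes many optimizations simpler to implement.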
During this conversion, function intrinsics are applied. These are special functions that the compiler has been taught to replace with heavily optimized code on a case-by-case basis. (LCTT translator's note: intrinsics are functions defined by the language itself; the compiler typically handles them by emitting the instruction sequence that implements the function rather than a call instruction, somewhat like an inlined function.)
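As a small, hedged example, functions in math/bits are commonly cited as candidates for intrinsics. On architectures with a suitable instruction the compiler may emit that instruction instead of calling the library code, although whether this happens depends on the target and the compiler version. The popcount wrapper below is a hypothetical name.

```go
package main

import (
	"fmt"
	"math/bits"
)

// popcount counts the set bits in x. The source-level call goes to
// bits.OnesCount64; the compiler is free to substitute optimized
// machine code for it where the target supports that.
func popcount(x uint64) int {
	return bits.OnesCount64(x)
}

func main() {
	fmt.Println(popcount(0xFF))
}
```

The result is the same whether or not the call is treated as an intrinsic. Only the generated code differs.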
Certain nodes are also lowered into simpler components during the AST to SSA conversion, so that the rest of the compiler can work with them. For instance, the copy builtin is replaced by memory moves, and range loops are rewritten into for loops. For historical reasons, some of these currently happen before the conversion to SSA, but the long-term plan is to move all of them here.
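The range rewrite can be pictured at the source level. The two functions below are a hand-written approximation: sumRange uses a range loop, and sumFor spells out roughly what such a loop means in terms of a plain for loop. The compiler performs this rewrite on its internal representation, not on source text, and both function names are hypothetical.

```go
package main

import "fmt"

// sumRange iterates with a range loop.
func sumRange(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// sumFor is an approximate hand-lowered form of sumRange.
func sumFor(xs []int) int {
	total := 0
	n := len(xs) // the length is evaluated once, before the loop starts
	for i := 0; i < n; i++ {
		x := xs[i]
		total += x
	}
	return total
}

func main() {
	xs := []int{1, 2, 3, 4}
	fmt.Println(sumRange(xs), sumFor(xs))
}
```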
Then, a series of machine-independent passes and rules are applied. These do not concern any single computer architecture, and thus run on all GOARCH variants.
Some examples of these generic passes include dead code elimination, removal of unneeded nil checks, and removal of unused branches. The generic rewrite rules mainly concern expressions, such as replacing some expressions with constant values and optimizing multiplications and float operations.
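A toy version of an expression rewrite rule is sketched below: it folds constant subexpressions and simplifies multiplication by one. The real rules operate on SSA values and are far more extensive. The expr type and the fold, konst, variable, and binary helpers are hypothetical names invented for this example.

```go
package main

import "fmt"

// expr is a hypothetical expression node: a constant, a variable, or a
// binary operation ("+" or "*").
type expr struct {
	op          string // "const", "var", "+", or "*"
	val         int    // used when op == "const"
	name        string // used when op == "var"
	left, right *expr  // used for binary operations
}

func konst(v int) *expr                  { return &expr{op: "const", val: v} }
func variable(n string) *expr            { return &expr{op: "var", name: n} }
func binary(op string, l, r *expr) *expr { return &expr{op: op, left: l, right: r} }

// fold applies two rewrite rules bottom-up:
//   - an operation on two constants becomes a single constant
//   - x * 1 becomes x
func fold(e *expr) *expr {
	if e.op == "const" || e.op == "var" {
		return e
	}
	l, r := fold(e.left), fold(e.right)
	if l.op == "const" && r.op == "const" {
		switch e.op {
		case "+":
			return konst(l.val + r.val)
		case "*":
			return konst(l.val * r.val)
		}
	}
	if e.op == "*" && r.op == "const" && r.val == 1 {
		return l
	}
	return binary(e.op, l, r)
}

// String renders the expression with explicit parentheses.
func (e *expr) String() string {
	switch e.op {
	case "const":
		return fmt.Sprint(e.val)
	case "var":
		return e.name
	}
	return "(" + e.left.String() + " " + e.op + " " + e.right.String() + ")"
}

func main() {
	e := binary("+",
		binary("*", variable("x"), binary("+", konst(3), konst(-2))),
		binary("*", konst(2), konst(5)))
	fmt.Println(e, "=>", fold(e))
}
```

Because the rules run bottom-up, folding 3 + -2 into 1 exposes the x * 1 pattern, which is then simplified as well. Rewrite systems rely on this kind of cascading.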
4. Generating machine code
cmd/compile/internal/ssa
(SSA lowering and architecture-specific passes)
cmd/internal/obj
(Machine code generation)
The machine-dependent phase of the compiler begins with the "lower" pass, which rewrites generic values into their machine-specific variants. For example, on amd64 memory operands are possible, so many load-store operations may be combined.
Note that the lower pass runs all machine-specific rewrite rules, and thus it currently applies lots of optimizations too.
Once the SSA has been "lowered" and is more specific to the target architecture, the final code optimization passes are run. These include yet another dead code elimination pass, moving values closer to their uses, the removal of local variables that are never read from, and register allocation.
Other important pieces of work done as part of this step include stack frame layout, which assigns stack offsets to local variables, and pointer liveness analysis, which computes which on-stack pointers are live at each GC safe point.
At the end of the SSA generation phase, Go functions have been transformed into a series of obj.Prog instructions. These are passed to the assembler (cmd/internal/obj), which turns them into machine code and writes out the final object file. The object file will also contain reflection data, export data, and debugging information.
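One way to look at the result of these phases is to ask the toolchain to print what it generated for a small function. The program below is a hypothetical example. The comments name two inspection options that exist in the standard toolchain; their output format varies between Go versions and target architectures.

```go
package main

import "fmt"

// add is deliberately trivial so that its generated code is short.
// The noinline directive keeps it as a separate function, which makes
// it easier to find in the listings.
//
//go:noinline
func add(a, b int) int {
	return a + b
}

func main() {
	// To print the assembly listing produced for this package:
	//   go build -gcflags=-S
	// To dump the SSA passes applied to add into an ssa.html file:
	//   GOSSAFUNC=add go build
	fmt.Println(add(2, 3))
}
```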
Further reading
To dig deeper into how the SSA package works, including its passes and rules, head to cmd/compile/internal/ssa/README.md.
Via:https://github.com/golang/go/blob/master/src/cmd/compile/readme.md
Author: Mvdan Translator: STEPHENXS proofreading: Pityonline, WXY
This article was originally compiled by LCTT and is proudly presented by Linux China.