Go in Go
As of the Go 1.5 release, the entire system is now written in Go (and a little assembler).
C is gone.
Side note: there is still C code in gccgo.
This talk is about the gc toolchain, not gccgo.
Why was it written in C before?
Bootstrapping.
(Also, Go was not intended primarily as a compiler implementation language.)
Why move the compiler to Go?
Not simply validation; we have more pragmatic motives:
Go is easier to write (correctly) than C.
Go is easier to debug than C (even without a debugger).
Go is the only language you'd need to know, which encourages contributions.
Go has better modularity, tooling, testing, profiling, etc.
Go makes parallel execution trivial.
Already seeing benefits, and it's early yet.
Design document: golang.org/s/go13compiler
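The claim about parallel execution can be illustrated with a minimal sketch. The process function below is a hypothetical placeholder for any independent unit of work (for example, compiling one function); it is not taken from the compiler.

```go
package main

import (
	"fmt"
	"sync"
)

// process stands in for an independent unit of work.
// It is a hypothetical placeholder, not compiler code.
func process(n int) int { return n * n }

// processAll runs process on every input concurrently and
// returns the results in input order.
func processAll(inputs []int) []int {
	results := make([]int, len(inputs))
	var wg sync.WaitGroup
	for i, n := range inputs {
		wg.Add(1)
		go func(i, n int) {
			defer wg.Done()
			results[i] = process(n) // each goroutine writes only its own slot
		}(i, n)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(processAll([]int{1, 2, 3, 4}))
}
```

Each goroutine writes to a distinct slice element, so no lock is needed; the WaitGroup only marks completion.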
Why move the runtime to Go?
We had our own C compiler just to compile the runtime.
We needed a compiler with the same ABI as Go, such as segmented stacks.
Switching the runtime to Go means we can get rid of the C compiler.
That's more important than converting the compiler to Go.
(All the reasons for moving the compiler apply to the runtime as well.)
Now only one language in the runtime; easier integration, stack management, etc.
As always, simplicity is the overriding consideration.
History
Why do we have our own tool chain at all?
Our own ABI?
Our own file formats?
History, familiarity, and ease of moving forward. And speed.
Many of Go's big changes would be much harder with GCC or LLVM.
news.ycombinator.com/item?id=8817990
Significant improvements
With the freedom to improve the tool chain, and with the changes made during the move to Go:
Linker rearchitecture
New garbage collector
Stack maps
Contiguous stacks
Write barriers
The last three are all but impossible in C.
(Gccgo will have segmented stacks and imprecise (stack) collection for a while yet.)
Goroutine Stack
Until 1.2: stacks were segmented.
1.3: stacks were contiguous unless executing C code (runtime).
1.4: stacks made contiguous by restricting C to the system stack.
1.5: stacks made contiguous by eliminating C.
These were each huge steps, made quickly (led by khr@).
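The stack mechanics themselves are not visible from Go code, but their effect is. As a rough sketch (the function names are illustrative), a goroutine can recurse far past its small initial stack because the runtime grows the stack on demand:

```go
package main

import "fmt"

// sumTo adds 1..n using deliberately non-tail recursion, so every
// call keeps a frame on the stack until the recursion unwinds.
func sumTo(n int64) int64 {
	if n == 0 {
		return 0
	}
	return n + sumTo(n-1)
}

// deepInGoroutine runs sumTo on a fresh goroutine and returns its result.
func deepInGoroutine(n int64) int64 {
	done := make(chan int64)
	go func() { done <- sumTo(n) }()
	return <-done
}

func main() {
	fmt.Println(deepInGoroutine(100000))
}
```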
Converting the runtime
Mostly done by hand with machine assistance.
Challenge to implement the runtime in a safe language.
Some use of unsafe to deal with pointers as raw bits in the GC, for instance.
But less than you might think.
The translator (next sections) helped for some of the translation.
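What "pointers as raw bits" means can be sketched with package unsafe. This is only an illustration of the idea, not code from the runtime; addrBits and aligned are hypothetical helpers.

```go
package main

import (
	"fmt"
	"unsafe"
)

// addrBits returns the address stored in p as a plain integer.
func addrBits(p *int64) uintptr {
	return uintptr(unsafe.Pointer(p))
}

// aligned reports whether p's address is a multiple of the
// alignment Go guarantees for an int64 variable.
func aligned(p *int64) bool {
	return addrBits(p)%unsafe.Alignof(*p) == 0
}

func main() {
	var x int64
	fmt.Printf("address bits: %#x, aligned: %v\n", addrBits(&x), aligned(&x))
}
```

Once a pointer has been converted to uintptr, the garbage collector no longer treats it as a reference, which is why such code is confined to a few places.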
Converting the compiler
Why translate it, not write it from scratch? Correctness, testing.
Steps:
Write a custom translator from C to Go.
Run the translator, iterate until success.
Measure success by bit-identical output.
Clean up the code by hand.
Turn it from C-in-Go to idiomatic Go (still happening).
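The "bit-identical output" check in the steps above amounts to a byte-for-byte comparison of what the old and new compilers produce. A minimal sketch, assuming both outputs are already in memory (firstDiff is a hypothetical helper):

```go
package main

import (
	"bytes"
	"fmt"
)

// firstDiff returns the offset of the first byte at which a and b
// differ, or -1 if they are bit-identical.
func firstDiff(a, b []byte) int {
	if bytes.Equal(a, b) {
		return -1
	}
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	return n // one input is a strict prefix of the other
}

func main() {
	oldOut := []byte("MOVQ AX, BX\n")
	newOut := []byte("MOVQ AX, CX\n")
	fmt.Println(firstDiff(oldOut, newOut))
}
```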
Translator
First output was C line-by-line translated to (bad!) Go.
Tool to do this written by rsc@ (talked about at GopherCon 2014).
Custom written for the job, not a general C-to-Go translator.
Steps:
Parse C code using new simple C parser (yacc)
Remove or rewrite C-isms such as *p++ as an expression
Walk the C parse tree, print the C code in Go syntax
Compile the output
Run, compare generated code
Repeat
The yacc grammar was translated by sam-powered hands.
Translator Configuration
Aided by hand-written rewrite rules.
Also diff-like rewrites for things such as using the standard library:
    diff {
    -   g.Rpo = obj.Calloc(g.Num*sizeof(g.Rpo[0]), 1).([]*Flow)
    -   idom = obj.Calloc(g.Num*sizeof(idom[0]), 1).([]int32)
    -   if g.Rpo == nil || idom == nil {
    -       Fatal("out of memory")
    -   }
    +   g.Rpo = make([]*Flow, g.Num)
    +   idom = make([]int32, g.Num)
    }
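The rewrite works because make returns zeroed slices and never returns nil on failure, so the calloc-style nil check has nothing to test. A small sketch, with Flow as a hypothetical stand-in for the compiler's flow-graph node type:

```go
package main

import "fmt"

// Flow is a hypothetical stand-in for the compiler's flow-graph node type.
type Flow struct{ ID int }

// newTables allocates the two tables from the rewrite rule. make
// returns zeroed slices, so no clearing or nil check is needed.
func newTables(n int) (rpo []*Flow, idom []int32) {
	rpo = make([]*Flow, n)
	idom = make([]int32, n)
	return rpo, idom
}

func main() {
	rpo, idom := newTables(3)
	fmt.Println(len(rpo), rpo[0] == nil, len(idom), idom[0])
}
```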
Another example
This one due to semantic difference between the languages.
    diff {
    -   if nreg == 64 {
    -       mask = ^0 // can't rely on C to shift by 64
    -   } else {
    -       mask = (1 << uint(nreg)) - 1
    -   }
    +   mask = (1 << uint(nreg)) - 1
    }
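The special case can go because Go defines shifts by a count at or beyond the operand width: for an unsigned value the result is 0, and subtracting 1 then wraps to all ones. A minimal sketch, assuming a 64-bit mask (regMask is an illustrative name):

```go
package main

import "fmt"

// regMask returns a mask with the low nreg bits set.
// For nreg == 64 the shift yields 0 and the subtraction wraps
// around to all ones, so no special case is needed.
func regMask(nreg uint) uint64 {
	return (uint64(1) << nreg) - 1
}

func main() {
	fmt.Printf("%#x\n", regMask(3))
	fmt.Printf("%#x\n", regMask(64))
}
```

In C, shifting by the full width of the type is undefined behavior, which is why the original code tested for 64 explicitly.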
Grind
Once in Go, new tool grind deployed (by rsc@):
Parses Go, type checks
Records a list of edits to perform: "insert this text at this position"
At end, applies the edits to the source (hard to edit AST).
Changes guided by profiling and other analysis:
Removes dead code
Removes gotos
Removes unused labels, needless indirections, etc.
Moves var declarations nearer to first use
rsc.io/grind
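The "record edits, apply at the end" approach can be sketched as follows. This is not grind's code; the edit type and apply function are hypothetical, and the example rewrite (folding a var declaration into its first assignment) is only an illustration of moving a declaration to first use.

```go
package main

import (
	"fmt"
	"sort"
)

// edit replaces src[start:end] with text. An insertion has start == end.
type edit struct {
	start, end int
	text       string
}

// apply returns src with all edits applied. Edits must not overlap.
// They are applied from the end of the text backwards so that the
// offsets of earlier edits stay valid.
func apply(src string, edits []edit) string {
	sorted := append([]edit(nil), edits...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].start > sorted[j].start })
	for _, e := range sorted {
		src = src[:e.start] + e.text + src[e.end:]
	}
	return src
}

func main() {
	src := "var x int\nx = 7\n"
	edits := []edit{
		{start: 0, end: 10, text: ""},    // delete "var x int\n"
		{start: 12, end: 13, text: ":="}, // turn "=" into ":="
	}
	fmt.Print(apply(src, edits))
}
```

Working on byte offsets in the source text avoids having to mutate and reprint the syntax tree, which matches the note above that the AST is hard to edit.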
Performance problems
Output from translator was poor Go, and ran about 10X slower.
Most of this slowdown has been recovered.
Problems with C to Go:
C patterns can be poor Go; e.g.: complex for loops
C stack variables never escape; Go compiler isn't as sure
Interfaces such as fmt.Stringer vs. C's varargs
No unions in Go, so use structs instead: bloat
Variable declarations in wrong place
C compiler didn't free much memory, but Go has a GC.
Adds CPU and memory overhead.
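The union point can be made concrete with a rough sketch. The value type below is hypothetical; it shows what a field-for-field translation of a three-member C union looks like in Go, where every member gets its own storage.

```go
package main

import (
	"fmt"
	"unsafe"
)

// value is a hypothetical direct translation of a C union with three
// members. In C the members would share storage; in Go each field
// occupies its own space.
type value struct {
	i int64
	f float64
	s string
}

// sizes returns the size of the whole struct and of its largest field.
func sizes() (whole, largest uintptr) {
	var v value
	return unsafe.Sizeof(v), unsafe.Sizeof(v.s)
}

func main() {
	whole, largest := sizes()
	fmt.Println("struct:", whole, "bytes; largest member:", largest, "bytes")
}
```

A C union would need only as much room as its largest member, so the struct form is strictly bigger.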
Performance fixes
Profile! (Never done before!)
Move vars closer to first use
Split vars into multiple
Replace code in the compiler with code in the library: e.g. math/big
Use interface or other tricks to combine struct fields
Better escape analysis (drchase@).
Hand tuning code and data layout
Use tools like grind, gofmt -r and eg for much of this.
Removing interface argument from a debugging print library got 15% overall!
More remains to be done.
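As an illustration of leaning on the library, math/big provides exact arbitrary-precision arithmetic of the kind a compiler needs for constants. The sketch below only shows the package in use; pow2 is a hypothetical helper, not compiler code.

```go
package main

import (
	"fmt"
	"math/big"
)

// pow2 returns 2**n exactly, however large n is.
func pow2(n uint) *big.Int {
	return new(big.Int).Lsh(big.NewInt(1), n)
}

func main() {
	fmt.Println(pow2(100)) // far beyond the range of uint64
}
```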
Technical Benefits
Other benefits of the conversion:
Garbage collection means no more worry about introducing a dangling pointer.
Chance to clean up the back ends.
Unified 386 and amd64 architectures throughout the tool chain.
New architectures are easier to add.
Unified the tools: now one compiler, one assembler, one linker.
Compiler
GOOS=YYY GOARCH=XXX go tool compile
One compiler; no more 6g, 8g etc.
About 50K lines of portable code.
Even the registerizer is portable now; architectures well characterized.
Non-portable: peepholing, details like registers bound to instructions.
Typically around 10% of the portable LOC.
Assembler
GOOS=YYY GOARCH=XXX go tool asm
New assembler, all in Go, written from scratch by r@.
Clean, idiomatic Go code.
Less than 4000 lines, <10% machine-dependent.
Almost completely compatible with previous yacc and C assemblers.
How is this possible?
Shared syntax originating in the Plan 9 assemblers
Unified back-end logic (old liblink, now internal/obj)
Linker
GOOS=YYY GOARCH=XXX go tool link
Mostly hand- and machine-translated from C code.
New library, internal/obj, part of the original linker, captures details about machines, writes object files.
27000 lines summed across 4 architectures, mostly tables (plus some ugliness).
Example benefit: one print routine to print any instruction for any architecture.
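The single print routine follows from keeping machine details in tables. The sketch below is a simplified illustration under stated assumptions: Prog, opNames, and the opcode numbering are hypothetical and are not the real internal/obj types.

```go
package main

import "fmt"

// Prog is a hypothetical, simplified machine-independent instruction.
type Prog struct {
	Arch     string
	Op       int
	From, To string
}

// opNames holds one opcode name table per architecture.
// The entries here are illustrative only.
var opNames = map[string][]string{
	"amd64": {"MOVQ", "ADDQ"},
	"arm":   {"MOVW", "ADD"},
}

// String prints any instruction for any architecture by looking the
// opcode up in that architecture's table.
func (p Prog) String() string {
	names := opNames[p.Arch]
	op := fmt.Sprintf("OP(%d)", p.Op) // fallback for unknown opcodes
	if p.Op >= 0 && p.Op < len(names) {
		op = names[p.Op]
	}
	return fmt.Sprintf("%s %s, %s", op, p.From, p.To)
}

func main() {
	fmt.Println(Prog{Arch: "amd64", Op: 0, From: "AX", To: "BX"})
	fmt.Println(Prog{Arch: "arm", Op: 1, From: "R0", To: "R1"})
}
```

Adding an architecture in this scheme means adding a table, not another printer.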
Bootstrap
With no C compiler, bootstrapping requires a Go compiler.
Therefore need to build or download a working Go installation to build 1.5 from source.
We use Go 1.4+ as the base to build the 1.5+ tool chain.
Details: golang.org/s/go15bootstrap
Future
Much work still to do, but 1.5 is mostly set.
Plans for the future:
Better escape analysis
New compiler back end using SSA (much easier in Go than C).
Will allow much more optimization.
Generate machine descriptions from PDFs (or maybe XML).
Will have a purely machine-generated instruction definition:
"Read in PDF, write out an assembler configuration"
Already deployed for the disassemblers.
Conclusions
Getting rid of C was a huge advance for the Go project: the code is cleaner, more testable, easier to profile, and easier to work on.
The new unified tool chain reduces code size and improves maintainability.
Flexible tool chain; portability still paramount.
Thank you
Rob Pike
Google
r@golang.org
http://golang.org/
Via talks.golang.org