Golang Internals, Part 3: The Linker, Object Files, and Relocations


Today, I'll talk about the Go linker, Go object files, and relocations.

Why should we care about these things? Well, if you want to learn the internals of any large project, the first thing you need to do is split it into components or modules. Second, you need to understand what interface these modules provide to each other. In Go, these high-level modules are the compiler, linker, and runtime. The interface that the compiler provides and the linker consumes is an object file, and that's where we'll start our investigation today.

Generating a Go object file

Let's do a practical experiment: write a super simple program, compile it, and see what object file will be produced. In my case, the program is as follows:

package main

func main() {
	print(1)
}

Really straightforward, isn't it? Now we need to compile it (6g is the amd64 compiler shipped with Go 1.4 and earlier; newer releases use go tool compile instead):

go tool 6g test.go

This command produces the test.6 object file. To investigate its internal structure, we are going to use the goobj library. It is employed internally in the Go source code, mainly for implementing a set of unit tests that verify whether object files are generated correctly in different situations. For this blog post, I wrote a very simple program that prints the output generated from the goobj library to the console. You can take a look at the sources of this program.

First of all, you need to download and install my program:

go get github.com/s-matyukevich/goobj_explorer

Then execute the following command:

goobj_explorer -o test.6

Now you should be able to see the goobj.Package structure in your console.

Investigating the object file

The most interesting part of our object file is the Syms array. This is actually a symbol table. Everything that is defined in your program (functions, global variables, types, constants, etc.) is written to this table. Let's look at the entry that corresponds to the main function. (Note that I have cut the Reloc and Func fields from the output for now. We'll discuss them later.)

&goobj.Sym{
    SymID: goobj.SymID{Name:"main.main", Version:0},
    Kind:  1,
    DupOK: false,
    Size:  48,
    Type:  goobj.SymID{},
    Data:  goobj.Data{Offset:137, Size:44},
    Reloc: ...,
    Func:  ...,
}

The names of the fields in the goobj.Sym structure are pretty self-explanatory:

Field  Description
SymID  The unique symbol ID that consists of the symbol's name and version. Versions help to differentiate symbols with identical names.
Kind   Indicates to what kind the symbol belongs (more details later).
DupOK  Indicates whether duplicates (symbols with the same name) are allowed.
Size   The size of the symbol data.
Type   A reference to another symbol that represents a symbol type, if any.
Data   Contains binary data. This field has different meanings for symbols of different kinds, e.g., assembly code for functions, raw string content for string symbols, etc.
Reloc  The list of relocations (more details will be provided later).
Func   Contains special function metadata for function symbols (see more details below).

Now, let's look at different kinds of symbols. All possible kinds of symbols are defined as constants in the goobj package. Below, I copied the first part of these constants:

const (
	_ SymKind = iota

	// readonly, executable
	STEXT
	SELFRXSECT

	// readonly, non-executable
	STYPE
	SSTRING
	SGOSTRING
	SGOFUNC
	SRODATA
	SFUNCTAB
	STYPELINK
	SSYMTAB // TODO: move to unmapped section
	SPCLNTAB
	SELFROSECT
	...
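
The numeric values behind these names follow from iota. The sketch below uses local copies of the first few constants (it does not import goobj, and kindName is a hypothetical helper added only for printing) to show which name the value 1 from the Kind field maps to:

```go
package main

import "fmt"

// SymKind is a local stand-in for the goobj type of the same name.
type SymKind int

// Local copy of the first few constants from the listing above.
const (
	_ SymKind = iota // value 0 is skipped

	// readonly, executable
	STEXT
	SELFRXSECT

	// readonly, non-executable
	STYPE
	SSTRING
)

// kindName is a hypothetical helper for printing; it is not part of goobj.
func kindName(k SymKind) string {
	switch k {
	case STEXT:
		return "STEXT"
	case SELFRXSECT:
		return "SELFRXSECT"
	case STYPE:
		return "STYPE"
	case SSTRING:
		return "SSTRING"
	}
	return fmt.Sprintf("SymKind(%d)", int(k))
}

func main() {
	// 1 is the Kind value printed for main.main.
	fmt.Println(kindName(1)) // prints "STEXT"
}
```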

As we can see, the main.main symbol belongs to kind 1, which corresponds to the STEXT constant. STEXT is a symbol that contains executable code. Now, let's look at the Reloc array. It consists of the following structs:

type Reloc struct {
	Offset int
	Size   int
	Sym    SymID
	Add    int
	Type   int
}

Each relocation implies that the bytes situated at the [Offset, Offset+Size) interval should be replaced with a specified address. This address is calculated by summing the location of the Sym symbol with the Add number of bytes.
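
As a minimal sketch of that rule, the code below patches a byte slice with the sum of a symbol address and Add. It is an illustration under assumptions, not the linker's code: the Reloc type is a simplified local copy (Sym is a plain string and Type is ignored), applyReloc and the address map are hypothetical, and the value is written little-endian as on amd64.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Reloc mirrors the fields described above; Sym is simplified to a plain name.
type Reloc struct {
	Offset int
	Size   int
	Sym    string
	Add    int
}

// applyReloc overwrites data[Offset:Offset+Size] with symAddr[r.Sym] + r.Add,
// encoded little-endian. Only 4- and 8-byte relocations are handled.
func applyReloc(data []byte, r Reloc, symAddr map[string]uint64) error {
	base, ok := symAddr[r.Sym]
	if !ok {
		return fmt.Errorf("unknown symbol %q", r.Sym)
	}
	if r.Offset < 0 || r.Offset+r.Size > len(data) {
		return fmt.Errorf("relocation out of range")
	}
	v := base + uint64(r.Add)
	switch r.Size {
	case 4:
		binary.LittleEndian.PutUint32(data[r.Offset:], uint32(v))
	case 8:
		binary.LittleEndian.PutUint64(data[r.Offset:], v)
	default:
		return fmt.Errorf("unsupported size %d", r.Size)
	}
	return nil
}

func main() {
	data := make([]byte, 8)
	addrs := map[string]uint64{"example.sym": 0x1000} // hypothetical address
	r := Reloc{Offset: 2, Size: 4, Sym: "example.sym", Add: 0x10}
	if err := applyReloc(data, r, addrs); err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", data) // prints "00 00 10 10 00 00 00 00"
}
```

A real linker also looks at the relocation type, because some kinds store something other than an absolute address.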

Understanding Relocations

Now let's use an example and see how relocations work. To do this, we need to compile our program using the -S switch that will print the generated assembly code:

go tool 6g -S test.go

Let's look through the assembler output and try to find the main function.

"".main t=1 size=48 value=0 args=0x0 locals=0x8
0x0000 00000 (test.go:3) TEXT "".main+0(SB),$8-0
0x0000 00000 (test.go:3) MOVQ (TLS),CX
0x0009 00009 (test.go:3) CMPQ SP,16(CX)
0x000d 00013 (test.go:3) JHI ,22
0x000f 00015 (test.go:3) CALL ,runtime.morestack_noctxt(SB)
0x0014 00020 (test.go:3) JMP ,0
0x0016 00022 (test.go:3) SUBQ $8,SP
0x001a 00026 (test.go:3) FUNCDATA $0,gclocals·3280bececceccd33cb74587feedb1f9f+0(SB)
0x001a 00026 (test.go:3) FUNCDATA $1,gclocals·3280bececceccd33cb74587feedb1f9f+0(SB)
0x001a 00026 (test.go:4) MOVQ $1,(SP)
0x0022 00034 (test.go:4) PCDATA $0,$0
0x0022 00034 (test.go:4) CALL ,runtime.printint(SB)
0x0027 00039 (test.go:5) ADDQ $8,SP
0x002b 00043 (test.go:5) RET ,

In later blog posts, we'll have a closer look at this code and try to understand how the Go runtime works. For now, we are interested in the following line:

0x0022 00034 (test.go:4) CALL ,runtime.printint(SB)

This command is located at an offset of 0x0022 (in hex) or 00034 (decimal) within the function data. This line is actually responsible for calling the runtime.printint function. The issue is that the compiler does not know the exact address of the runtime.printint function during compilation. This function is located in a different object file that the compiler knows nothing about. In such cases, it uses relocations. Below is the exact relocation that corresponds to this method call (I copied it from the first output of the goobj_explorer utility):

{
    Offset: 35,
    Size:   4,
    Sym:    goobj.SymID{Name:"runtime.printint", Version:0},
    Add:    0,
    Type:   3,
},

This relocation tells the linker that, starting from an offset of 35 bytes, it needs to replace 4 bytes of data with the address of the starting point of the runtime.printint symbol. But an offset of 35 bytes from the main function data is actually an argument of the call instruction that we have previously seen. (The instruction starts from an offset of 34 bytes. One byte corresponds to the call instruction code and four bytes to the address operand of this instruction.)
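
On x86-64, a direct CALL with opcode 0xE8 stores a 32-bit displacement relative to the address of the next instruction, so what ends up in those four bytes is a difference rather than an absolute address. The sketch below shows that arithmetic for the relocation at offset 35. The patchCall helper and both addresses are hypothetical and chosen only for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// patchCall fills the 4-byte operand of a CALL rel32 instruction.
// code is the function body, off is the relocation offset inside it,
// funcAddr is the address where the body is placed, target is the callee.
func patchCall(code []byte, off int, funcAddr, target uint64) {
	next := funcAddr + uint64(off) + 4 // address of the instruction after CALL
	disp := int32(int64(target) - int64(next))
	binary.LittleEndian.PutUint32(code[off:], uint32(disp))
}

func main() {
	code := make([]byte, 44) // the symbol data of main.main is 44 bytes long
	code[34] = 0xE8          // CALL opcode at offset 34
	// Hypothetical addresses for main.main and runtime.printint.
	const mainAddr, printintAddr = 0x401000, 0x402000
	patchCall(code, 35, mainAddr, printintAddr)
	fmt.Printf("% x\n", code[34:39]) // prints "e8 d9 0f 00 00"
}
```

Here the next instruction sits at 0x401027, so the stored displacement is 0x402000 - 0x401027 = 0xfd9.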

How the linker operates

Now that we understand this, we can figure out how the linker works. The following schema is very simplified, but it reflects the main idea:

    • The linker gathers all the symbols from all the packages that are referenced from the main package and loads them into one big byte array (or a binary image).
    • For each symbol, the linker calculates an address in this image.
    • Then it applies the relocations defined for every symbol. It's easy now, since the linker knows the exact addresses of all other symbols referenced from those relocations.
    • The linker prepares all the headers necessary for the Executable and Linkable Format (ELF) on Linux or the Portable Executable (PE) format on Windows. Then, it generates an executable file with the results.
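
The first three steps above can be sketched in a few lines of Go. This is a toy model under stated assumptions, not the real linker: Sym and Reloc are simplified local types, link is a hypothetical function, every relocation is treated as a 4-byte absolute address, and the header-generation step is left out.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Reloc and Sym are simplified stand-ins for the goobj structures.
type Reloc struct {
	Offset, Size int
	Sym          string
	Add          int
}

type Sym struct {
	Name   string
	Data   []byte
	Relocs []Reloc
}

// link concatenates the symbols into one image, records each symbol's
// address, and then patches every relocation with an absolute address.
func link(syms []Sym, base uint32) ([]byte, map[string]uint32, error) {
	var image []byte
	addr := make(map[string]uint32)
	start := make([]int, len(syms))

	// Steps 1 and 2: load the data and assign an address to every symbol.
	for i, s := range syms {
		start[i] = len(image)
		addr[s.Name] = base + uint32(len(image))
		image = append(image, s.Data...)
	}

	// Step 3: apply relocations now that every address is known.
	for i, s := range syms {
		for _, r := range s.Relocs {
			target, ok := addr[r.Sym]
			if !ok {
				return nil, nil, fmt.Errorf("%s: undefined symbol %s", s.Name, r.Sym)
			}
			if r.Size != 4 || r.Offset < 0 || r.Offset+4 > len(s.Data) {
				return nil, nil, fmt.Errorf("%s: unsupported relocation", s.Name)
			}
			at := start[i] + r.Offset
			binary.LittleEndian.PutUint32(image[at:], target+uint32(r.Add))
		}
	}
	return image, addr, nil
}

func main() {
	syms := []Sym{
		{Name: "a", Data: make([]byte, 8), Relocs: []Reloc{{Offset: 4, Size: 4, Sym: "b"}}},
		{Name: "b", Data: []byte{1, 2, 3, 4}},
	}
	image, addr, err := link(syms, 0x1000) // 0x1000 is a hypothetical base address
	if err != nil {
		panic(err)
	}
	// prints "b at 0x1008, image: 00 00 00 00 08 10 00 00 01 02 03 04"
	fmt.Printf("b at %#x, image: % x\n", addr["b"], image)
}
```

Doing the layout in a separate pass before patching is what lets symbol "a" refer to "b" even though "b" comes later in the image.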

Understanding TLS

A careful reader will notice a strange relocation in the output of the goobj_explorer utility for the main method. It doesn't correspond to any method call and even points to an empty symbol:

{
    Offset: 5,
    Size:   4,
    Sym:    goobj.SymID{},
    Add:    0,
    Type:   9,
},

So, what does this relocation do? We can see that it has an offset of 5 bytes and its size is 4 bytes. At this offset, there is a command:

0x0000 00000 (test.go:3) MOVQ (TLS),CX

It starts at an offset of 0 and occupies 9 bytes (since the next command starts at an offset of 9 bytes). We can guess that this relocation replaces the strange (TLS) statement with some address, but what is TLS and what address does it use?
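
The offset arithmetic used here and for the earlier call relocation can be checked mechanically. The sketch below takes the instruction start offsets from the assembly listing and finds the instruction that contains a given relocation. The starts table and the instrFor helper are written for this illustration only.

```go
package main

import "fmt"

// Instruction start offsets taken from the assembly listing for main.main.
// The final 44 is Data.Size from the symbol table entry and acts as an end marker.
var starts = []int{0, 9, 13, 15, 20, 22, 26, 34, 39, 43, 44}

// instrFor returns the start and length of the instruction that fully
// contains the byte range [off, off+size), or ok=false if none does.
func instrFor(off, size int) (start, length int, ok bool) {
	for i := 0; i+1 < len(starts); i++ {
		if off >= starts[i] && off+size <= starts[i+1] {
			return starts[i], starts[i+1] - starts[i], true
		}
	}
	return 0, 0, false
}

func main() {
	// The two relocations of main.main as {Offset, Size} pairs.
	for _, r := range [][2]int{{5, 4}, {35, 4}} {
		start, length, ok := instrFor(r[0], r[1])
		fmt.Println(r[0], r[1], start, length, ok)
	}
	// prints:
	// 5 4 0 9 true
	// 35 4 34 5 true
}
```

In both cases the relocation covers the last four bytes of its instruction.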

TLS is an abbreviation for Thread Local Storage. This technology is used in many programming languages. In short, it enables us to have a variable that points to different memory locations when used by different threads.

In Go, TLS is used to store a pointer to the G structure that contains internal details of a particular goroutine (more details on this in later blog posts). So, there is a variable that, when accessed from different goroutines, always points to a structure with internal details of this goroutine. The location of this variable is known to the linker, and this variable is exactly what was moved to the CX register in the previous command. TLS can be implemented differently for different architectures. For AMD64, TLS is implemented via the FS register, so our previous command is translated into MOVQ FS, CX.

To end our discussion on relocations, I am going to show you the enumerated type (enum) that contains all the different types of relocations:

// Reloc.type
enum
{
	R_ADDR = 1,
	R_SIZE,
	R_CALL, // relocation for direct PC-relative call
	R_CALLARM, // relocation for ARM direct call
	R_CALLIND, // marker for indirect call (no actual relocating necessary)
	R_CONST,
	R_PCREL,
	R_TLS,
	R_TLS_LE, // TLS local exec offset from TLS segment register
	R_TLS_IE, // TLS initial exec offset from TLS base pointer
	R_GOTOFF,
	R_PLT0,
	R_PLT1,
	R_PLT2,
	R_USEFIELD,
};

As you can see from this enum (counting from R_ADDR = 1), relocation type 3 is R_CALL and relocation type 9 is R_TLS_LE. These enum names perfectly explain the behaviour that we discussed previously.
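
To map the numeric Type values from the goobj_explorer output back to names without counting by hand, the enum can be transcribed into Go. The RelocType type below is a local transcription made for this post, not an existing Go API.

```go
package main

import "fmt"

// RelocType is a local Go transcription of the C enum above.
type RelocType int

const (
	R_ADDR RelocType = iota + 1 // the C enum starts at 1
	R_SIZE
	R_CALL
	R_CALLARM
	R_CALLIND
	R_CONST
	R_PCREL
	R_TLS
	R_TLS_LE
	R_TLS_IE
	R_GOTOFF
	R_PLT0
	R_PLT1
	R_PLT2
	R_USEFIELD
)

// names lists the constants in declaration order; names[0] is R_ADDR.
var names = []string{
	"R_ADDR", "R_SIZE", "R_CALL", "R_CALLARM", "R_CALLIND",
	"R_CONST", "R_PCREL", "R_TLS", "R_TLS_LE", "R_TLS_IE",
	"R_GOTOFF", "R_PLT0", "R_PLT1", "R_PLT2", "R_USEFIELD",
}

// String returns the enum name, or a numeric form for unknown values.
func (t RelocType) String() string {
	if t < R_ADDR || int(t) > len(names) {
		return fmt.Sprintf("RelocType(%d)", int(t))
	}
	return names[t-1]
}

func main() {
	// 3 and 9 are the Type values of the two relocations of main.main.
	fmt.Println(RelocType(3), RelocType(9)) // prints "R_CALL R_TLS_LE"
}
```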

More on Go object files

In the next post, we'll continue our discussion on object files. I'll also provide more information necessary for you to move forward and understand how the Go runtime works. If you have any questions, feel free to ask them in the comments.


About the Author: Sergey Matyukevich is a Cloud Engineer and Go Developer at Altoros. With 6+ years in software engineering, he is an expert in cloud automation and designing architectures for complex cloud-based systems. An active member of the Go community, Sergey is a frequent contributor to open-source projects, such as Ubuntu and Juju Charms.
