"Computer Systems: A Programmer's Perspective" Chapter 7: Linking

Last Update:2016-04-09 Source: Internet

Author: User


Linking is the process of collecting and combining various pieces of code and data into a single file that can be loaded (copied) into memory and executed.

When linking can be performed

At compile time, when the source code is translated into machine code
At load time, when the program is loaded into memory and executed by the loader
At run time, by application programs

Linkers make separate compilation possible.


On early computer systems, linking was performed manually. On modern systems, linking is performed automatically by programs called linkers.

7.1 Compiler Driver

Most compilation systems provide a compiler driver that invokes the language preprocessor, compiler, assembler, and linker as needed on behalf of the user.

Example: the function main() calls swap, which swaps the two elements of the external global array buf. This example is used throughout the chapter to analyze how linking works.

    /* main.c */
    void swap();

    int buf[2] = {1, 2};

    int main()
    {
        swap();
        return 0;
    }

    /* swap.c */
    extern int buf[];

    int *bufp0 = &buf[0];
    int *bufp1;

    void swap()
    {
        int temp;

        bufp1 = &buf[1];
        temp = *bufp0;
        *bufp0 = *bufp1;
        *bufp1 = temp;
    }

7.2 Static Links

A static linker takes as input a collection of relocatable object files and command-line arguments, and generates as output a fully linked executable object file that can be loaded and run. The input relocatable object files consist of various code and data sections. Instructions are in one section, initialized global variables are in another section, and uninitialized variables are in yet another section.

To build the executable, the linker must perform two main tasks: symbol resolution and relocation.

Symbol resolution. Object files define and reference symbols. The purpose of symbol resolution is to associate each symbol reference with exactly one symbol definition.
Relocation. Compilers and assemblers generate code and data sections that start at address 0. The linker relocates these sections by associating a memory location with each symbol definition, and then modifying all references to those symbols so that they point to that memory location.

Some basic facts about linkers: Object files are merely collections of blocks of bytes. Some of these blocks contain program code, others contain program data, and others contain data structures that guide the linker and the loader. The linker concatenates blocks together, decides on run-time locations for the concatenated blocks, and modifies various locations within the code and data blocks. Linkers have minimal understanding of the target machine. The compilers and assemblers that generate the object files have already done most of the work.

7.3 Object Files

Object files come in three forms:

Relocatable object file. Contains binary code and data in a form that can be combined with other relocatable object files at compile time to create an executable object file.
Executable object file. Contains binary code and data in a form that can be copied directly into memory and executed.
Shared object file. A special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run time.

Compilers and assemblers generate relocatable object files (including shared object files). Linkers generate executable object files. Technically, an object module is a sequence of bytes, and an object file is an object module stored on disk in a file.


Object file formats vary from system to system.

7.4 Relocatable Object Files

The format of a typical ELF relocatable object file is shown on p. 451 of the book. The ELF header begins with a 16-byte sequence that describes the word size and byte ordering of the system that generated the file. The rest of the ELF header contains information that allows a linker to parse and interpret the object file. This includes the size of the ELF header, the object file type (relocatable, executable, or shared), the machine type (for example, IA32), the file offset of the section header table, and the size and number of entries in the section header table. The locations and sizes of the various sections are described by the section header table, which contains a fixed-size entry for each section in the object file.
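The header fields described above can be read with very little code. The following is a minimal sketch, assuming the byte positions and constant values given in the ELF specification (magic bytes at offsets 0 to 3, word-size class at offset 4, byte order at offset 5, and the 16-bit file type immediately after the 16-byte identification sequence). The names `ElfSummary` and `elf_summarize` are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Constant values follow the ELF specification; the identifiers
   themselves are hypothetical names chosen for this sketch. */
enum { ELF_IDENT_SIZE = 16 };                        /* leading byte sequence */
enum { CLASS_32 = 1, CLASS_64 = 2 };                 /* byte 4: word size */
enum { DATA_LSB = 1, DATA_MSB = 2 };                 /* byte 5: byte order */
enum { TYPE_REL = 1, TYPE_EXEC = 2, TYPE_DYN = 3 };  /* file type field */

typedef struct {
    int word_bits;      /* 32 or 64 */
    int little_endian;  /* 1 for little-endian, 0 for big-endian */
    int type;           /* TYPE_REL, TYPE_EXEC, TYPE_DYN, or another value */
} ElfSummary;

/* Summarize the first 18 bytes of an ELF file image held in buf.
   Returns 0 on success, -1 if buf does not look like an ELF header. */
int elf_summarize(const unsigned char *buf, size_t len, ElfSummary *out)
{
    if (buf == NULL || out == NULL || len < ELF_IDENT_SIZE + 2)
        return -1;
    /* The literal is split so that 'E' is not read as part of the hex escape. */
    if (memcmp(buf, "\x7f" "ELF", 4) != 0)
        return -1;
    if (buf[4] != CLASS_32 && buf[4] != CLASS_64)
        return -1;
    if (buf[5] != DATA_LSB && buf[5] != DATA_MSB)
        return -1;

    out->word_bits = (buf[4] == CLASS_32) ? 32 : 64;
    out->little_endian = (buf[5] == DATA_LSB);
    /* The type field is stored in the byte order the file itself declares. */
    out->type = out->little_endian ? (buf[16] | (buf[17] << 8))
                                   : ((buf[16] << 8) | buf[17]);
    return 0;
}
```

Passing the first 18 bytes of a `.o` file produced on a 32-bit little-endian system would be expected to yield `word_bits == 32`, `little_endian == 1`, and `type == TYPE_REL`.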

The sections themselves are sandwiched between the ELF header and the section header table. A typical ELF relocatable object file contains the following sections:

.text: The machine code of the compiled program.
.rodata: Read-only data.
.data: Initialized global C variables. Local C variables are maintained at run time on the stack and do not appear in either the .data or the .bss section.
.bss: Uninitialized global C variables. This section occupies no actual space in the object file; it is merely a placeholder. Object file formats distinguish between initialized and uninitialized variables for space efficiency: uninitialized variables do not have to occupy any actual disk space in the object file.
.symtab: A symbol table with information about the functions and global variables that are defined and referenced in the program. Every relocatable object file has a symbol table in .symtab.
.rel.text: A list of locations in the .text section that will need to be modified when the linker combines this object file with others. In general, any instruction that calls an external function or references a global variable will need to be modified. Instructions that call local functions, on the other hand, do not need to be modified. Note that relocation information is not needed in executable object files and is usually omitted unless the user explicitly instructs the linker to include it.
.rel.data: Relocation information for any global variables that are referenced or defined by the module. In general, any initialized global variable whose initial value is the address of a global variable or of an externally defined function will need to be modified.
.debug: A debugging symbol table with entries for local variables and typedefs defined in the program, global variables defined and referenced in the program, and the original C source file.
.line: A mapping between line numbers in the original C source program and machine code instructions in the .text section.
.strtab: A string table for the symbol tables in the .symtab and .debug sections, and for the section names in the section headers.
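To make the section list concrete, here is a small sketch showing where a typical ELF toolchain would be expected to place each kind of C definition. The variable and function names are hypothetical, and the exact placement is an assumption: some compilers or compiler options put uninitialized globals in a common block rather than directly in .bss.

```c
/* Hypothetical names; the section noted beside each definition is the
   placement assumed for a typical ELF toolchain. */
int counter = 42;               /* initialized global: .data */
int total;                      /* uninitialized global: .bss, no disk space */
const char banner[] = "hello";  /* read-only data: .rodata */

int bump(int step)              /* machine code: .text */
{
    /* Local variable: lives on the stack at run time,
       so it appears in neither .data nor .bss. */
    int local = counter + step;
    total += local;
    return local;
}
```

Compiling this file to a relocatable object file and listing its sections and symbols with the tools from section 7.13 is one way to check the assumed placement on a given system.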

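7.5 Symbols and Symbol Tables

Each relocatable object module m has a symbol table that contains information about the symbols defined and referenced by m. In the context of a linker, there are three different kinds of symbols:

1. Global symbols that are defined by module m and can be referenced by other modules. Global linker symbols correspond to nonstatic C functions and to global variables defined without the C static attribute.

2. Global symbols that are referenced by module m but defined by some other module. Such symbols are called externals and correspond to C functions and variables defined in other modules.

3. Local symbols that are defined and referenced exclusively by module m. Some local linker symbols correspond to C functions and global variables defined with the static attribute.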
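The three kinds of symbols can be seen side by side in a single source file. This is an illustrative sketch with hypothetical names (`calls`, `clamp`, `limit_hits`, `measure`, `measure_calls`); the only external symbol it references is the standard library function `strlen`.

```c
#include <string.h>  /* declares strlen, which is defined in another module */

static int calls;              /* local symbol: static, visible only in this file */

static size_t clamp(size_t n)  /* local symbol: static function */
{
    return n > 100 ? 100 : n;
}

int limit_hits;                /* global symbol: defined here, usable by other modules */

size_t measure(const char *s)  /* global symbol: nonstatic function */
{
    calls++;
    size_t n = strlen(s);      /* external symbol: referenced here, defined elsewhere */
    if (n > 100)
        limit_hits++;
    return clamp(n);
}

int measure_calls(void)        /* global symbol exposing the local counter */
{
    return calls;
}
```

Another module could call `measure` or read `limit_hits`, but it could not refer to `calls` or `clamp` by name, because the linker treats those as local to this module.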

7.6 Symbol Resolution

7.6.1 How Linkers Resolve Multiply Defined Global Symbols

At compile time, the compiler exports each global symbol to the assembler as either strong or weak, and the assembler encodes this information implicitly in the symbol table of the relocatable object file. Functions and initialized global variables get strong symbols. Uninitialized global variables get weak symbols.

Given this notion of strong and weak symbols, UNIX linkers use the following rules for dealing with multiply defined symbols:

Rule 1: Multiple strong symbols are not allowed.
Rule 2: Given a strong symbol and multiple weak symbols, choose the strong symbol.
Rule 3: Given multiple weak symbols, choose any one of the weak symbols.
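The three rules can be expressed as a small decision function. This is a toy model rather than real linker code: `Strength` and `choose_definition` are hypothetical names, and because Rule 3 allows any weak symbol to be chosen, this sketch arbitrarily keeps the first one it sees.

```c
#include <stddef.h>

typedef enum { WEAK, STRONG } Strength;

/* Apply the three rules to all definitions of a single symbol name.
   Returns the index of the chosen definition, or -1 if there is no
   definition at all or if more than one definition is strong. */
int choose_definition(const Strength *defs, size_t n)
{
    int chosen = -1;
    int strong_seen = 0;

    for (size_t i = 0; i < n; i++) {
        if (defs[i] == STRONG) {
            if (strong_seen)
                return -1;        /* Rule 1: multiple strong symbols are an error */
            strong_seen = 1;
            chosen = (int)i;      /* Rule 2: a strong symbol overrides weak ones */
        } else if (chosen < 0) {
            chosen = (int)i;      /* Rule 3: any weak symbol will do; keep the first */
        }
    }
    return chosen;
}
```

For the definitions `{WEAK, STRONG, WEAK}` the function selects index 1, and for `{STRONG, WEAK, STRONG}` it reports an error.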

7.6.2 Linking with Static Libraries

On UNIX systems, static libraries are stored on disk in a particular file format known as an archive. An archive is a collection of concatenated relocatable object files, with a header that describes the size and location of each member object file. Archive filenames are denoted with the .a suffix.

7.6.3 How Linkers Use Static Libraries to Resolve References

During the symbol resolution phase, the linker scans the relocatable object files and archives from left to right, in the same order in which they appear on the compiler driver's command line. During this scan, the linker maintains a set E of relocatable object files that will be merged to form the executable, a set U of unresolved symbols (symbols that are referenced but not yet defined), and a set D of symbols that have been defined in previous input files. Initially, E, U, and D are empty.

1. For each input file f on the command line, the linker determines whether f is an object file or an archive. If f is an object file, the linker adds f to E, updates U and D to reflect the symbol definitions and references in f, and proceeds to the next input file.

2. If f is an archive, the linker attempts to match the unresolved symbols in U against the symbols defined by the members of the archive. If some archive member m defines a symbol that resolves a reference in U, then m is added to E, and the linker updates U and D to reflect the symbol definitions and references in m. This process iterates over the member object files in the archive until U and D no longer change. At that point, any member object files not contained in E are simply discarded, and the linker proceeds to the next input file.

3. If U is nonempty when the linker finishes scanning the input files on the command line, it prints an error and terminates. Otherwise, it merges and relocates the object files in E to build the output executable file.

This algorithm can result in some baffling link-time errors, because the ordering of libraries and object files on the command line is significant. If the library that defines a symbol appears on the command line before the object file that references that symbol, the reference will not be resolved and linking will fail. The general rule for libraries is to place them at the end of the command line.

On the other hand, if the libraries are not independent of each other, they must be ordered so that for each symbol s that is referenced externally by a member of an archive, at least one definition of s follows a reference to s on the command line.

Libraries can be repeated on the command line if that is necessary to satisfy the dependence requirements.
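The scan over E, U, and D, and its sensitivity to command-line order, can be simulated in a few lines. This is a toy model under stated assumptions: each symbol is represented by one bit of an unsigned mask (so at most 32 symbols and 32 members per archive), E is tracked only as a count, and the names `ObjFile`, `Input`, `LinkState`, and `scan_inputs` are hypothetical.

```c
#include <stddef.h>

/* One object file: bit masks of the symbols it defines and references. */
typedef struct { unsigned defs, refs; } ObjFile;

/* One command-line input: a single object file or an archive of members. */
typedef struct {
    int is_archive;
    const ObjFile *members;
    size_t count;
} Input;

typedef struct {
    unsigned U;      /* unresolved symbols */
    unsigned D;      /* symbols defined so far */
    size_t e_count;  /* number of files added to E */
} LinkState;

static void add_to_E(LinkState *st, const ObjFile *f)
{
    st->e_count++;
    st->D |= f->defs;
    st->U = (st->U | f->refs) & ~st->D;  /* new references minus everything defined */
}

/* Scan the inputs left to right. U == 0 in the result means the link succeeds. */
LinkState scan_inputs(const Input *in, size_t n)
{
    LinkState st = {0, 0, 0};

    for (size_t i = 0; i < n; i++) {
        if (!in[i].is_archive) {
            add_to_E(&st, &in[i].members[0]);
            continue;
        }
        unsigned taken = 0;  /* archive members already added to E */
        int changed;
        do {                 /* repeat until U and D stop changing */
            changed = 0;
            for (size_t m = 0; m < in[i].count && m < 32; m++) {
                const ObjFile *f = &in[i].members[m];
                if (!(taken & (1u << m)) && (f->defs & st.U)) {
                    taken |= 1u << m;
                    add_to_E(&st, f);
                    changed = 1;
                }
            }
        } while (changed);
    }
    return st;
}
```

With an object file that references one symbol and an archive whose first member defines it, scanning the object file first leaves U empty and adds two files to E. Scanning the archive first adds nothing from it, so the reference stays in U and the link fails.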

7.7 Relocation

Once the linker has completed the symbol resolution step, it has associated each symbol reference in the code with exactly one symbol definition (that is, a symbol table entry in one of its input object modules). At this point, the linker knows the exact sizes of the code and data sections in its input object modules. It is now ready to begin the relocation step, in which it merges the input modules and assigns a run-time address to each symbol.

Relocation consists of two steps:

1. Relocating sections and symbol definitions. In this step, the linker merges all sections of the same type into a new aggregate section of the same type. The linker then assigns run-time memory addresses to the new aggregate sections, to each section defined by the input modules, and to each symbol defined by the input modules. When this step is complete, every instruction and global variable in the program has a unique run-time memory address.

2. Relocating symbol references within sections. In this step, the linker modifies every symbol reference in the bodies of the code and data sections so that they point to the correct run-time addresses. To perform this step, the linker relies on data structures in the relocatable object modules known as relocation entries.
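The first step amounts to laying out the input sections one after another and deriving symbol addresses from that layout. The sketch below is a simplification that ignores alignment padding between sections; `layout_section` and `symbol_address` are hypothetical names, and the base address and sizes in the usage note are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* sizes[i] is the size in bytes of one kind of section (for example .text)
   from input module i. Stores the run-time start address of each module's
   section in addrs[i] and returns the address just past the merged section.
   Alignment padding is ignored in this sketch. */
uint32_t layout_section(uint32_t base, const uint32_t *sizes,
                        uint32_t *addrs, size_t n)
{
    uint32_t next = base;
    for (size_t i = 0; i < n; i++) {
        addrs[i] = next;
        next += sizes[i];
    }
    return next;
}

/* A symbol's run-time address is the run-time address of the section that
   defines it plus the symbol's offset within that section. */
uint32_t symbol_address(uint32_t section_addr, uint32_t offset_in_section)
{
    return section_addr + offset_in_section;
}
```

For two modules whose code sections are 0x14 and 0x28 bytes long, merged at a hypothetical base of 0x80483b4, the second module's code would start at 0x80483c8 and the merged section would end at 0x80483f0.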

7.7.1 Relocation Entries

When an assembler generates an object module, it does not know where the code and data will ultimately be stored in memory. Nor does it know the locations of any externally defined functions or global variables that the module references. So whenever the assembler encounters a reference to an object whose ultimate location is unknown, it generates a relocation entry that tells the linker how to modify the reference when it merges the object file into an executable. Relocation entries for code are placed in .rel.text. Relocation entries for initialized data are placed in .rel.data.

ELF defines 11 different relocation types. We are concerned with only the two most basic ones:

R_386_PC32: Relocate a reference that uses a 32-bit PC-relative address.
R_386_32: Relocate a reference that uses a 32-bit absolute address.
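The arithmetic behind these two relocation types can be sketched as a single function. This is a simplified model, not a linker implementation: `relocate_ref` is a hypothetical name, the numeric type codes are the ones assigned by the i386 ELF ABI, and `addend` stands for the value already stored at the reference before relocation.

```c
#include <stdint.h>

/* Type codes as assigned by the i386 ELF ABI. */
enum { R_386_32 = 1, R_386_PC32 = 2 };

/* Compute the new 32-bit value of one reference.
   section_addr: run-time address of the section containing the reference
   offset:       offset of the reference within that section
   symbol_addr:  run-time address of the referenced symbol
   addend:       value stored at the reference before relocation */
uint32_t relocate_ref(int type, uint32_t section_addr, uint32_t offset,
                      uint32_t symbol_addr, uint32_t addend)
{
    if (type == R_386_PC32) {
        /* PC-relative: the distance from the reference to the symbol. */
        uint32_t ref_addr = section_addr + offset;
        return symbol_addr + addend - ref_addr;
    }
    if (type == R_386_32) {
        /* Absolute: the symbol's address, adjusted by the addend. */
        return symbol_addr + addend;
    }
    return addend;  /* unknown type: leave the reference unchanged */
}
```

As a worked case with made-up addresses: a PC-relative reference at offset 0x7 in a section placed at 0x80483b4, targeting a symbol at 0x80483c8 with a stored value of -4, is rewritten to 0x80483c8 - 4 - 0x80483bb = 0x9.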

7.7.2 Relocating Symbol References

7.8 Executable Object Files

The format of an executable object file is similar to that of a relocatable object file. The ELF header describes the overall format of the file. It also includes the program's entry point, which is the address of the first instruction to execute when the program runs. The .text, .rodata, and .data sections are similar to those in a relocatable object file, except that these sections have been relocated to their eventual run-time memory addresses. The .init section defines a small function, called _init, that is called by the program's initialization code. Because the executable is fully linked (relocated), it needs no .rel sections.

ELF executables are designed to be easy to load into memory, with contiguous chunks of the executable file mapped to contiguous memory segments. The segment header table describes this mapping.

7.9 Loading Executable Object Files

To run an executable object file p, type its name on the UNIX shell command line:

`unix> ./p`

Because p is not a built-in shell command, the shell assumes that p is an executable object file and runs it by invoking some memory-resident operating system code known as the loader. Any UNIX program can invoke the loader by calling the execve function. The loader copies the code and data in the executable object file from disk into memory, and then runs the program by jumping to its first instruction, or entry point. This process of copying the program into memory and then running it is known as loading.

Every UNIX program has a run-time memory image. For example, on 32-bit Linux systems, the code segment always starts at address 0x8048000. The data segment is at the next 4 KB aligned address. The run-time heap follows at the first 4 KB aligned address past the read/write segment and grows upward through calls to the malloc library. There is also a segment reserved for shared libraries. The user stack always starts at the largest legal user address and grows downward (toward lower memory addresses). The segment starting above the stack is reserved for the code and data of the memory-resident part of the operating system (the kernel).
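The phrase "next 4 KB aligned address" corresponds to a standard rounding computation. The following is a minimal sketch with a hypothetical helper name, `align_up`; the addresses in the usage note are illustrative and are not taken from a real executable.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the nearest multiple of alignment.
   alignment must be a nonzero power of two, such as 4096 for 4 KB. */
uint32_t align_up(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

If a code segment starting at 0x8048000 ended at the made-up address 0x8048448, `align_up(0x8048448, 4096)` gives 0x8049000 as the first 4 KB aligned address after it. An address that is already aligned is returned unchanged.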

Guided by the segment header table in the executable, the loader copies the relevant contents of the executable file into the code and data segments. Next, the loader jumps to the program's entry point, which is always the address of the _start symbol. The startup code at the _start address is defined in the object file crt1.o and is the same for all C programs. After calling initialization routines in the .text and .init sections, the startup code calls the atexit routine, which appends a list of routines that should be called when the application terminates normally. The exit function runs the functions registered by atexit and then returns control to the operating system by calling _exit. Next, the startup code calls the application's main routine, which begins executing our C code. After the application returns, the startup code calls the _exit routine, which returns control to the operating system.
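Application code can use the same atexit mechanism that the startup code relies on. This is a small sketch with hypothetical handler names; it shows only the registration side, since the handlers themselves run when the program terminates normally.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical cleanup handlers for this sketch. */
static void flush_logs(void)  { puts("flush_logs"); }
static void close_files(void) { puts("close_files"); }

/* Register both handlers. Returns 0 if both registrations succeeded. */
int register_cleanup(void)
{
    /* Registered functions run in reverse order of registration when
       exit() is called or main returns: close_files, then flush_logs. */
    if (atexit(flush_logs) != 0)
        return -1;
    if (atexit(close_files) != 0)
        return -1;
    return 0;
}
```

Calling `register_cleanup()` early in main is enough. No explicit call to the handlers is needed, because exit invokes them before control goes back to the operating system through _exit.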

How loading works:

Each program in a UNIX system runs in the context of a process with its own virtual address space. When the shell runs a program, the parent shell process forks a child process that is a duplicate of the parent. The child process invokes the loader through the execve system call. The loader deletes the child's existing virtual memory segments and creates a new set of code, data, heap, and stack segments. The new stack and heap segments are initialized to zero. The new code and data segments are initialized to the contents of the executable file by mapping pages in the virtual address space to page-sized chunks of the executable file. Finally, the loader jumps to the _start address, which eventually calls the application's main routine. Aside from some header information, there is no copying of data from disk to memory during loading. The copying is deferred until the CPU references a mapped virtual page, at which point the operating system automatically transfers the page from disk to memory using its paging mechanism.
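The fork-then-execve sequence that a shell performs can be sketched with the POSIX calls named above. This is a minimal illustration, not a shell implementation: `run_program` is a hypothetical helper, the sketch assumes a POSIX system, and error handling is reduced to return codes.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;  /* the current environment, passed on to the new program */

/* Run the executable at path with the given argument vector and wait for it.
   Returns the program's exit status, or -1 if it could not be run to a
   normal exit. */
int run_program(const char *path, char *const argv[])
{
    pid_t pid = fork();          /* child starts as a duplicate of the parent */
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, environ);  /* invoke the loader; returns only on failure */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling `run_program("/bin/sh", args)` with `args` set to `{"sh", "-c", "exit 7", NULL}` would be expected to return 7 on a system where /bin/sh exists.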

7.10 Dynamic Linking with Shared Libraries

Shared libraries are a modern innovation that addresses the disadvantages of static libraries. A shared library is an object module that, at run time, can be loaded at an arbitrary memory address and linked with a program in memory. This process, known as dynamic linking, is performed by a program called a dynamic linker. Shared libraries are also referred to as shared objects, and on UNIX systems they are typically denoted by the .so suffix.

7.11 Loading and Linking Shared Libraries from Applications

Examples of dynamic linking in the real world:

Distributing software
Building high-performance Web servers

7.12 Position-Independent Code (PIC)

PIC data references

PIC function calls

7.13 Tools for Manipulating Object Files

ar: Creates static libraries, and inserts, deletes, lists, and extracts members.
strings: Lists all of the printable strings contained in an object file.
strip: Deletes symbol table information from an object file.
nm: Lists the symbols defined in the symbol table of an object file.
size: Lists the names and sizes of the sections in an object file.
readelf: Displays the complete structure of an object file, including the information encoded in the ELF header.
objdump: Displays the information in an object file. Its most useful function is disassembling the binary instructions in the .text section.
ldd: Lists the shared libraries that an executable needs at run time.


