Chapter 7 covers linking. Linking can be performed at compile time by a static linker, or at load time and run time by a dynamic linker. The linker operates on binary files called object files, which come in three forms: relocatable, executable, and shared.
The two main tasks of the linker are symbol resolution and relocation. Symbol resolution binds each global symbol in an object file to a unique definition; relocation determines the final memory address of each symbol and modifies the references to those symbols accordingly.
Static linkers are invoked by compiler drivers such as GCC.
They combine multiple relocatable object files into a single executable object file.
The loader maps the contents of an executable file into memory and runs the program.
Shared libraries compiled as position-independent code can be loaded at any address and shared by multiple processes at run time. Applications can also use the dynamic linker at run time to load and link shared libraries and to access their functions and data.
1. Compiler driver
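1. Most compilation systems provide a compiler driver that invokes the language preprocessor, compiler, assembler, and linker as needed on behalf of the user.
(1) C preprocessor: source file main.c -> ASCII intermediate file main.i
(2) C compiler: main.i -> ASCII assembly-language file main.s
(3) Assembler: main.s -> relocatable object file main.o
2. The driver runs the linker program ld, which combines the .o files with the necessary system object files to create the executable file.
3. Run the executable: ./name-of-executable
4. The shell calls a function in the operating system called the loader, which copies the code and data of the executable file into memory and then transfers control to the beginning of the program.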
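2. Static linking
1. A static linker takes a set of relocatable object files and command-line arguments as input and generates a fully linked executable object file that can be loaded and run as output.
2. The input relocatable object files consist of various code and data sections.
3. Instructions are in one section, initialized global variables are in another, and uninitialized variables are in yet another.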
To build an executable file, the linker must perform two main tasks:
-Symbol resolution
Object files define and reference symbols. The purpose of symbol resolution is to associate each symbol reference with exactly one symbol definition.
-Relocation
Compilers and assemblers generate code and data sections that start at address 0. The linker relocates these sections by associating a memory location with each symbol definition and then modifying all references to those symbols so that they point to that memory location.
3. Object files
-Compilers and assemblers generate relocatable object files (including shared object files). Linkers generate executable object files. Technically, an object module is a sequence of bytes, and an object file is an object module stored on disk in a file.
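Object files come in three forms: relocatable object files, executable object files, and shared object files.
Object file formats: Linux uses the Executable and Linkable Format (ELF)
Windows uses the Portable Executable (PE) format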
4. Relocatable object files
The ELF header begins with a 16-byte sequence that describes the word size and byte ordering of the system that generated the file. The rest of the ELF header contains information that allows a linker to parse and interpret the object file. This includes the size of the ELF header, the object file type (relocatable, executable, or shared), the machine type (such as IA32), the file offset of the section header table, and the size and number of entries in the section header table. The locations and sizes of the various sections are described by the section header table, which contains a fixed-size entry for each section in the object file.
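The first few header fields described above can be decoded with very little code. The following is a minimal sketch, not a full ELF parser: the struct and function names are hypothetical, and it reads only the magic bytes, the class and byte-order bytes of the 16-byte identification sequence, and the 2-byte file type that follows it.

```c
#include <assert.h>
#include <stddef.h>

/* Decoded view of the fields this sketch looks at (hypothetical type). */
struct elf_summary {
    int bits;           /* 32 or 64, from byte 4 of the identification sequence */
    int little_endian;  /* 1 if byte 5 says least-significant byte first */
    unsigned type;      /* 1 = relocatable, 2 = executable, 3 = shared */
};

/* Returns 0 on success, -1 if buf is too short or is not an ELF header. */
int elf_summarize(const unsigned char *buf, size_t len, struct elf_summary *out)
{
    assert(buf != NULL && out != NULL);

    if (len < 18)
        return -1;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;                       /* missing ELF magic number */
    if ((buf[4] != 1 && buf[4] != 2) || (buf[5] != 1 && buf[5] != 2))
        return -1;                       /* unknown word size or byte order */

    out->bits = (buf[4] == 1) ? 32 : 64;
    out->little_endian = (buf[5] == 1);

    /* The file type is a 2-byte field right after the 16-byte sequence,
       stored in the byte order announced by byte 5. */
    if (out->little_endian)
        out->type = (unsigned)buf[16] | ((unsigned)buf[17] << 8);
    else
        out->type = ((unsigned)buf[16] << 8) | (unsigned)buf[17];
    return 0;
}
```

Passing the first 18 bytes of a file produced by `gcc -c` should report type 1 (relocatable), assuming the toolchain emits ELF.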
Sandwiched between the ELF header and the section header table are the sections themselves. A typical ELF relocatable object file contains the following sections:
- .text: The machine code of the compiled program.
- .rodata: Read-only data.
- .data: Initialized global C variables. Local C variables are maintained on the stack at run time and appear in neither the .data section nor the .bss section.
- .bss: Uninitialized global C variables. This section occupies no actual space in the object file; it is merely a placeholder. Object file formats distinguish between initialized and uninitialized variables for space efficiency: uninitialized variables do not need to occupy any actual disk space in the object file.
- .symtab: A symbol table with information about the functions and global variables that are defined and referenced in the program. Every relocatable object file has a symbol table in .symtab.
- .rel.text: A list of locations in the .text section that need to be modified when the linker combines this object file with others. In general, any instruction that calls an external function or references a global variable needs to be modified. Instructions that call local functions, on the other hand, do not need to be modified. Note that relocation information is not needed in executable object files, so it is usually omitted unless the user explicitly instructs the linker to include it.
- .rel.data: Relocation information for any global variables that are referenced or defined by the module. In general, any initialized global variable whose initial value is the address of a global variable or of an externally defined function needs to be modified.
- .debug: A debugging symbol table with entries for the local variables and type definitions defined in the program, the global variables defined and referenced in the program, and the original C source file.
- .line: A mapping between line numbers in the original C source file and machine code instructions in the .text section.
- .strtab: A string table for the symbol tables in the .symtab and .debug sections and for the section names in the section headers.
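To connect the section list to source code, here is a small sketch of where a typical compiler tends to place different C definitions. The variable and function names are made up for illustration, and the exact placement depends on the compiler and its flags, so the comments say "typically".

```c
#include <assert.h>

int initialized_global = 42;            /* typically .data */
int uninitialized_global;               /* typically .bss (may show up as COMMON in the .o) */
static int zeroed_static;               /* typically .bss */
const int lookup_table[3] = {1, 2, 3};  /* typically .rodata */

/* The function body is machine code, so it lands in .text. */
int sum_table(void)
{
    int total = 0;                      /* local: stack or register, in no section */
    for (int i = 0; i < 3; i++)
        total += lookup_table[i];
    return total;
}

/* .bss takes no space in the file, yet its variables read as zero at startup. */
void check_startup_values(void)
{
    assert(initialized_global == 42);
    assert(uninitialized_global == 0);
    assert(zeroed_static == 0);
}
```

Compiling this with `gcc -c` and inspecting the result with `readelf -S` or `nm` is one way to check the actual placement on a given system.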
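5. Symbols and symbol tables
Each relocatable object module m has a symbol table that contains information about the symbols defined and referenced by m.
In the context of a linker, there are three kinds of symbols:
1. Global symbols that are defined by module m and can be referenced by other modules. Global linker symbols correspond to nonstatic C functions and to global variables defined without the C static attribute.
2. Global symbols that are defined by some other module and referenced by module m. These are called external symbols and correspond to C functions and variables defined in other modules.
3. Local symbols that are defined and referenced exclusively by module m. These correspond to C functions and global variables defined with the static attribute.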
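The three kinds of symbols can be seen side by side in a single module. The sketch below uses hypothetical names; `strlen` stands in for a symbol defined in another module (here, the C library).

```c
#include <assert.h>
#include <string.h>

int shared_counter = 0;          /* global symbol: defined here, visible to other modules */

static int private_calls = 0;    /* local symbol: static, visible only inside this module */

static int clamp(int v)          /* local symbol: static function */
{
    return v < 0 ? 0 : v;
}

/* record is a global symbol (nonstatic function).
   strlen is an external symbol: referenced here, defined elsewhere. */
int record(const char *msg)
{
    assert(msg != NULL);
    private_calls++;
    shared_counter = clamp(shared_counter + (int)strlen(msg));
    return private_calls;
}
```

Another module could declare `extern int shared_counter;` and call `record`, but it has no way to name `private_calls` or `clamp`.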
6. Symbol resolution
Functions and initialized global variables are strong symbols; uninitialized global variables are weak symbols. Based on this definition of strong and weak symbols, Unix linkers use the following rules to handle multiply defined symbols:
- Rule 1: Multiple strong symbols with the same name are not allowed.
- Rule 2: Given a strong symbol and multiple weak symbols with the same name, choose the strong symbol.
- Rule 3: Given multiple weak symbols with the same name, choose any one of the weak symbols.
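The three rules can be expressed as a short selection routine. This is only an illustrative model with hypothetical types and names, not how any real linker is implemented; for Rule 3, where any weak symbol may be chosen, this sketch simply takes the first one.

```c
#include <assert.h>
#include <stddef.h>

/* One definition of a given symbol name, as found in some module (hypothetical type). */
struct definition {
    const char *module;  /* file that contains the definition */
    int strong;          /* 1 = function or initialized global, 0 = uninitialized global */
};

/*
 * Applies the three rules to n definitions of the same name.
 * Returns the index of the chosen definition, or -1 on a
 * multiple-definition error (Rule 1) or when n is 0.
 */
int choose_definition(const struct definition *defs, size_t n)
{
    assert(defs != NULL || n == 0);

    int chosen = -1;
    for (size_t i = 0; i < n; i++) {
        if (defs[i].strong) {
            if (chosen >= 0 && defs[chosen].strong)
                return -1;       /* Rule 1: two strong symbols */
            chosen = (int)i;     /* Rule 2: a strong symbol beats weak ones */
        } else if (chosen < 0) {
            chosen = (int)i;     /* Rule 3: any weak symbol; take the first */
        }
    }
    return chosen;
}
```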
During the symbol resolution phase, the linker scans the relocatable object files and archives from left to right, in the same order in which they appear on the compiler driver's command line. During this scan, the linker maintains a set E of relocatable object files (the files in this set are merged to form the executable), a set U of unresolved symbols (symbols that are referenced but not yet defined), and a set D of symbols defined in previous input files. Initially, E, U, and D are empty.
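The scan can be modeled in a few lines, which also shows why the order of files on the command line matters. The sketch below is a simplification under stated assumptions: symbols are bits in a mask, all names are hypothetical, an archive has at most 32 members, and the handling of archive members follows the textbook's description of the algorithm (a member is pulled into E only if it defines a symbol currently in U, repeated until nothing changes).

```c
#include <assert.h>
#include <stddef.h>

/* Symbols are bits in a mask; the names are hypothetical. */
enum { SYM_MAIN = 1 << 0, SYM_ADDVEC = 1 << 1, SYM_MULTVEC = 1 << 2 };

struct object { unsigned defs; unsigned refs; };

struct input {
    int is_archive;                /* 0: one relocatable object file, 1: archive */
    const struct object *members;  /* one member for an object file, several for an archive */
    size_t count;
};

struct link_state { unsigned U, D; size_t files_in_E; };

static void add_to_E(struct link_state *s, const struct object *o)
{
    s->files_in_E++;
    s->D |= o->defs;                     /* record new definitions */
    s->U = (s->U | o->refs) & ~s->D;     /* add references, drop the resolved ones */
}

/* Returns the final set U; a nonzero result means unresolved references. */
unsigned scan_inputs(const struct input *in, size_t n, struct link_state *s)
{
    assert(in != NULL && s != NULL);
    s->U = s->D = 0;
    s->files_in_E = 0;

    for (size_t i = 0; i < n; i++) {
        if (!in[i].is_archive) {
            add_to_E(s, &in[i].members[0]);
            continue;
        }
        /* Archive: pull in members that define something in U until nothing changes. */
        unsigned taken = 0;
        int changed = 1;
        while (changed) {
            changed = 0;
            for (size_t m = 0; m < in[i].count; m++) {
                if (!(taken & (1u << m)) && (in[i].members[m].defs & s->U)) {
                    add_to_E(s, &in[i].members[m]);
                    taken |= 1u << m;
                    changed = 1;
                }
            }
        }
    }
    return s->U;
}
```

With an object file that references `SYM_ADDVEC` followed by an archive that defines it, the scan ends with U empty. With the archive listed first, U is still empty when the archive is scanned, so no member is pulled in and the reference stays unresolved.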
7. Relocation
Relocation consists of two steps:
1. Relocating sections and symbol definitions. In this step, the linker merges all sections of the same type into a new aggregate section of the same type. The linker then assigns run-time memory addresses to the new aggregate sections, to each section defined by the input modules, and to each symbol defined by the input modules. When this step is complete, every instruction and global variable in the program has a unique run-time memory address.
2. Relocating symbol references within sections. In this step, the linker modifies every symbol reference in the code and data sections so that it points to the correct run-time address. To perform this step, the linker relies on data structures in the relocatable object modules known as relocation entries.
8. Executable object files
The format of an executable object file is similar to that of a relocatable object file. The ELF header describes the overall format of the file. It also includes the program's entry point, which is the address of the first instruction to execute when the program runs. The .text, .rodata, and .data sections are similar to those in a relocatable object file, except that these sections have been relocated to their final run-time memory addresses. The .init section defines a small function called _init, which is called by the program's initialization code.
9. Loading executable object files
Every Unix program has a run-time memory image. For example, on 32-bit Linux systems, the code segment always starts at address 0x08048000. The data segment is located at the next 4 KB aligned address. The run-time heap starts at the first 4 KB aligned address past the read/write segment and grows upward through calls to the malloc library. There is also a segment reserved for shared libraries. The user stack always starts at the largest legal user address and grows downward (toward lower memory addresses). The segment that starts above the stack is reserved for the code and data of the memory-resident part of the operating system (that is, the kernel).
10. Dynamic linking with shared libraries
Shared libraries are a modern innovation that addresses the disadvantages of static libraries. A shared library is an object module that, at run time, can be loaded at an arbitrary memory address and linked with a program in memory. This process is known as dynamic linking and is performed by a program called a dynamic linker. Shared libraries are also referred to as shared objects, and on Unix systems they are typically denoted by the .so suffix.
11. Loading and linking shared libraries from applications
Examples of dynamic linking in the real world:
-Distributing software
-Building high-performance Web servers
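On Linux and other POSIX systems, an application can ask the dynamic linker to load a library at run time through the `dlopen`, `dlsym`, and `dlclose` interface. The following is a minimal sketch: the helper name `call_unary` is hypothetical, and it assumes the looked-up symbol really is a function of type `double f(double)`, which the code cannot verify.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/*
 * Loads the shared library at `path`, looks up `name`, treats it as
 * double f(double), and stores f(x) in *result.
 * Returns 0 on success, -1 if loading or symbol lookup fails.
 */
int call_unary(const char *path, const char *name, double x, double *result)
{
    assert(path != NULL && name != NULL && result != NULL);

    void *handle = dlopen(path, RTLD_LAZY);   /* load and link the library */
    if (handle == NULL)
        return -1;                            /* dlerror() would describe the failure */

    double (*fn)(double);
    *(void **)(&fn) = dlsym(handle, name);    /* cast idiom suggested by POSIX */
    if (fn == NULL) {
        dlclose(handle);
        return -1;
    }

    *result = fn(x);
    dlclose(handle);                          /* unload if no one else uses it */
    return 0;
}
```

On a glibc-based system, a call such as `call_unary("libm.so.6", "cos", 0.0, &y)` would be a typical use; older toolchains need `-ldl` on the link line.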
12. Position-independent code (PIC)
PIC data references
PIC function calls
Computer Systems: A Programmer's Perspective, Chapter 7