"In-depth understanding of computer Systems" Reading notes chapter seventh links

Source: Internet
Author: User

Chapter 7: Linking

Linking is the process of collecting and combining various pieces of code and data into a single file that can be loaded (copied) into memory and executed.
    • When linking can be performed

      • At compile time, when the source code is translated into machine code.
      • At load time, when the program is loaded into memory and executed by the loader.
      • At run time, by application programs.
    • On modern systems, linking is performed automatically by programs called linkers.
    • The key role of the linker: it makes separate compilation possible.

7.1 Compiler Drivers

What the driver does: (1) runs the C preprocessor, which translates the C source file (.c) into an ASCII intermediate file (.i); (2) runs the C compiler, which translates the .i file into an assembly-language file (.s); (3) runs the assembler, which translates the .s file into a relocatable object file (.o); (4) runs the linker, which combines the pieces into an executable object file.

7.2 Static Linking

The linker has two tasks: (1) Symbol resolution: object files define and reference symbols, and the linker associates each symbol reference with exactly one symbol definition. (2) Relocation: the linker associates a memory location with each symbol definition and then modifies all references to those symbols so that they point to that location.

Basic fact: an object file is merely a collection of blocks of bytes.

7.3 Object Files

Object files come in three forms:

• Relocatable object file: contains binary code and data in a form that can be combined with other relocatable object files to create an executable object file.

• Executable object file: contains binary code and data in a form that can be copied directly into memory and executed.

• Shared object file: a special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run time.

Compilers and assemblers generate relocatable object files; linkers generate executable object files.

7.4 Relocatable Object Files

Format of a typical ELF relocatable object file:

.text: The machine code of the compiled program.

.rodata: Read-only data, such as the format strings in printf statements and the jump tables for switch statements.

.data: Initialized global C variables.

.bss: Uninitialized global C variables. This section occupies no actual space in the object file; it is merely a placeholder.

.symtab: A symbol table with information about functions and global variables that are defined and referenced in the program.

.rel.text: A list of locations in the .text section that need to be modified when the linker combines this object file with others.

.rel.data: Relocation information for any global variables that are referenced or defined by the module.

.debug: A debugging symbol table with entries for local variables and type definitions defined in the program, global variables defined and referenced in the program, and the original C source file.

.line: A mapping between line numbers in the original C source program and machine code instructions in the .text section.

.strtab: A string table for the symbol tables in the .symtab and .debug sections and for the section names in the section headers.
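
As a rough illustration of the section list above, the sketch below annotates where a typical compiler would place each object. All names are hypothetical, and the exact placement is up to the toolchain (for example, an uninitialized global may be recorded as COMMON rather than .bss in a relocatable file).

```c
#include <stdio.h>

int initialized_global = 42;           /* initialized global: typically .data */
int uninitialized_global;              /* uninitialized global: typically .bss (or COMMON) */
static const char greeting[] = "hi";   /* read-only data: typically .rodata */

/* Machine code for functions goes in .text. A compiler may turn a dense
   switch like this one into a jump table stored in .rodata. */
int classify(int code)
{
    switch (code) {
    case 0:  return 10;
    case 1:  return 11;
    case 2:  return 12;
    case 3:  return 13;
    default: return -1;
    }
}

/* The format string literal passed to printf is also read-only data. */
void show_globals(void)
{
    printf("%s %d %d\n", greeting, initialized_global, uninitialized_global);
}
```

Compiling this with gcc -c and inspecting the result with readelf -S or objdump -h lists the sections that were actually produced.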

7.5 Symbols and Symbol Tables

Each relocatable object module m has a symbol table that contains information about the symbols defined and referenced by m. In the context of a linker, there are three kinds of symbols:

• Global symbols that are defined by module m and can be referenced by other modules. Global linker symbols correspond to nonstatic C functions and to global variables defined without the C static attribute.

• Global symbols that are referenced by module m but defined by some other module. These are called external symbols and correspond to C functions and variables defined in other modules.

• Local symbols that are defined and referenced exclusively by module m. Some local linker symbols correspond to C functions and global variables defined with the static attribute. These symbols are visible anywhere within module m but cannot be referenced by other modules. The sections of the object file and the name of the source file that corresponds to module m also get local symbols.
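
The three kinds of symbols can be seen in a single small module. This is a minimal sketch with hypothetical names; strlen stands in for "a function defined in another module" because it is defined in the C library.

```c
#include <string.h>   /* declares strlen, which is defined in the C library */

int shared_counter = 0;          /* global symbol: defined here, visible to other modules */

static int private_counter = 0;  /* local symbol: static, visible only inside this module */

/* Local symbol: a static function cannot be referenced by other modules. */
static int bump_private(void)
{
    return ++private_counter;
}

/* Global symbol: a nonstatic function. */
int count_chars(const char *s)
{
    bump_private();
    shared_counter += 1;
    /* strlen is an external symbol: referenced here, defined elsewhere. */
    return (int)strlen(s);
}
```

Note that ordinary (non-static) local variables inside a function get no linker symbol at all; they live on the stack at run time.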

Each symbol table entry has several fields. name is a byte offset into the string table that points to the null-terminated string name of the symbol. value is the symbol's address. For relocatable modules, value is an offset from the beginning of the section where the object is defined.

Each symbol is associated with some section of the object file, given by the section field, which is an index into the section header table. There are three special pseudo-sections that have no entries in the section header table: ABS is for symbols that should not be relocated; UNDEF is for undefined symbols, that is, symbols referenced in this object module but defined elsewhere; COMMON is for uninitialized data objects that have not yet been allocated a location.
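
To make the name and section fields concrete, here is a simplified model of a symbol table and its string table. The struct layout, the pseudo-section codes, and the sample names are assumptions made for illustration; real ELF entries carry more fields and use different encodings.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical symbol table entry. */
struct sym_entry {
    int name;     /* byte offset of the symbol's name in the string table */
    int value;    /* offset within the defining section (relocatable modules) */
    int section;  /* section header table index, or a pseudo-section code */
};

/* Hypothetical codes for the three pseudo-sections. */
enum { SEC_UNDEF = -1, SEC_ABS = -2, SEC_COMMON = -3 };

/* A string table is a run of null-terminated names laid end to end:
   "main" starts at offset 1, "buf" at 6, "swap" at 10. */
static const char strtab[] = "\0main\0buf\0swap";

/* Follow the name offset into the string table. */
const char *symbol_name(const struct sym_entry *s)
{
    return strtab + s->name;
}

/* A symbol in UNDEF is referenced in this module but defined elsewhere. */
int is_external(const struct sym_entry *s)
{
    return s->section == SEC_UNDEF;
}

/* Find the entry with the given name, or return NULL if there is none. */
const struct sym_entry *find_symbol(const struct sym_entry *tab, size_t n,
                                    const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(symbol_name(&tab[i]), name) == 0)
            return &tab[i];
    return NULL;
}
```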

7.6 Symbol Resolution

The linker resolves symbol references by associating each reference with exactly one symbol definition from the symbol tables of its input relocatable object files. The compiler allows only one definition of each local symbol per module. The compiler also ensures that static local variables, which get local linker symbols, have unique names.

Functions and initialized global variables are strong symbols; uninitialized global variables are weak symbols. Given these definitions, Unix linkers use the following rules for dealing with multiply defined symbols:

• Rule 1: Multiple strong symbols are not allowed.

• Rule 2: If you have a strong symbol and multiple weak symbols, select the strong symbol.

• Rule 3: If there are multiple weak symbols, select one of these weak symbols.
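
The three rules can be modeled as a small function over all the definitions of one symbol name. This is only a simulation of the decision, with hypothetical types and names; it is not how any particular linker is implemented, and the choice of the first weak symbol under Rule 3 is an arbitrary assumption.

```c
#include <stddef.h>

enum strength { WEAK, STRONG };

/* One definition of a symbol name found in some input module (hypothetical model). */
struct definition {
    const char *module;       /* defining module, for illustration only */
    enum strength strength;
};

/* Apply the rules to all definitions of one symbol name.
   Returns the index of the chosen definition, or -1 for a link error
   (Rule 1: more than one strong definition). Also returns -1 when
   n == 0, since there is nothing to choose. */
int resolve(const struct definition *defs, size_t n)
{
    int chosen = -1;
    int strong_seen = 0;

    for (size_t i = 0; i < n; i++) {
        if (defs[i].strength == STRONG) {
            if (strong_seen)
                return -1;       /* Rule 1: multiple strong symbols */
            strong_seen = 1;
            chosen = (int)i;     /* Rule 2: the strong symbol wins */
        } else if (chosen < 0) {
            chosen = (int)i;     /* Rule 3: any weak symbol; this sketch keeps the first */
        }
    }
    return chosen;
}
```

Rules 2 and 3 are a common source of subtle bugs, because a reference can silently bind to a definition in another module that has a different type.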

All compilation systems provide a mechanism for packaging related object modules into a single file called a static library, which can then be supplied as input to the linker. To see why, consider the alternatives. One approach would be to put all the standard C functions in a single relocatable object module (say, libc.o) that application programmers link into their executables:

unix> gcc main.c /usr/lib/libc.o

A big disadvantage is that every executable file in the system would then contain a complete copy of the standard functions, which would waste a great deal of disk space. We could address some of these problems by creating a separate relocatable file for each standard function and storing them in a well-known directory:

unix> gcc main.c /usr/lib/printf.o /usr/lib/scanf.o ...

On Unix systems, static libraries are stored on disk in a particular file format known as an archive. An archive is a collection of concatenated relocatable object files, with a header that describes the size and location of each member object file. Archive file names are denoted by the .a suffix. To build the executable, we compile and link the input files main2.o and libvector.a:

unix> gcc -O2 -c main2.c

unix> gcc -static -o p2 main2.o ./libvector.a
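
These notes name libvector.a, addvec.c, multvec.c, and main2.c but do not show their contents. The following is a minimal sketch of what such files might contain; the function signatures and the helper sum_first_pair are assumptions, and the three pieces are shown in one block only for brevity.

```c
/* addvec.c (assumed contents): element-wise sum of two vectors */
void addvec(const int *x, const int *y, int *z, int n)
{
    for (int i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}

/* multvec.c (assumed contents): element-wise product of two vectors */
void multvec(const int *x, const int *y, int *z, int n)
{
    for (int i = 0; i < n; i++)
        z[i] = x[i] * y[i];
}

/* What main2.c might do with the library. Only addvec is referenced, so
   when linking against the archive the linker would copy in only the
   member that defines it. */
int sum_first_pair(void)
{
    int x[2] = {1, 2};
    int y[2] = {3, 4};
    int z[2];

    addvec(x, y, z, 2);
    return z[0] + z[1];   /* (1 + 3) + (2 + 4) */
}
```

Copying in only the referenced members is what makes archives cheaper than linking one large libc.o into every program.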

7.7 Relocation

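Relocation consists of two steps:

• Relocating sections and symbol definitions. In this step, the linker merges all sections of the same type into a new aggregate section of that type.

• Relocating symbol references within sections. In this step, the linker modifies every symbol reference in the code and data sections so that it points to the correct run-time address.

When an assembler generates an object module, it does not know where the code and data will ultimately be stored in memory. Nor does it know the locations of any externally defined functions or global variables referenced by the module. So whenever the assembler encounters a reference to an object whose ultimate location is unknown, it generates a relocation entry that tells the linker how to modify the reference. Relocation entries for code are placed in .rel.text. Relocation entries for initialized data are placed in .rel.data.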

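The two most basic relocation types:

• R_386_PC32: Relocate a reference that uses a 32-bit PC-relative address. A PC-relative address is an offset from the current run-time value of the program counter (PC). When the CPU executes an instruction that uses PC-relative addressing, it forms the effective address by adding the 32-bit value encoded in the instruction to the current run-time value of the PC.

• R_386_32: Relocate a reference that uses a 32-bit absolute address. With absolute addressing, the CPU uses the 32-bit value encoded in the instruction directly as the effective address, without further modification.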
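
The arithmetic behind the two relocation types can be sketched as a single function. The struct, the enum names (REL_386_32 and REL_386_PC32, modeled on the two types above), and the sample addresses in the note that follows are assumptions chosen for illustration, not the output of a real link.

```c
#include <stdint.h>

/* Hypothetical, simplified relocation entry. */
enum reloc_type { REL_386_32, REL_386_PC32 };

struct reloc {
    uint32_t offset;        /* offset of the reference within its section */
    enum reloc_type type;   /* how to compute the new value */
};

/* Compute the value the linker writes into the reference.
   section_addr: run-time address chosen for the section holding the reference
   symbol_addr:  run-time address chosen for the referenced symbol
   addend:       value already stored at the reference in the object file */
uint32_t relocate(const struct reloc *r, uint32_t section_addr,
                  uint32_t symbol_addr, int32_t addend)
{
    if (r->type == REL_386_PC32) {
        /* run-time address of the reference itself */
        uint32_t ref_addr = section_addr + r->offset;
        return symbol_addr + (uint32_t)addend - ref_addr;
    }
    /* REL_386_32: the absolute address of the symbol, plus the addend */
    return symbol_addr + (uint32_t)addend;
}
```

For example, with a section placed at 0x80483b4, a reference at offset 0x7, a target symbol at 0x80483c8, and a stored value of -4, the PC-relative result is 0x9. The reference occupies 4 bytes starting at 0x80483bb, so the next instruction is at 0x80483bf, and 0x80483bf + 0x9 gives the target address 0x80483c8.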

7.8 Executable Object Files

The format of an executable object file is similar to that of a relocatable object file. The ELF header describes the overall format of the file. It also includes the program's entry point, which is the address of the first instruction to execute when the program runs.

The .init section defines a small function, called _init, that is called by the program's initialization code. Because the executable is fully linked (relocated), it needs no .rel sections.

7.9 Loading Executable Object Files

To run an executable object file p, we can type its name on the Unix shell command line:

unix> ./p

Because p does not correspond to a built-in shell command, the shell assumes that p is an executable object file, which it runs by invoking memory-resident operating system code known as the loader. The loader copies the code and data in the executable object file from disk into memory, and then runs the program by jumping to its first instruction, or entry point. This process of copying the program into memory and then running it is known as loading.

On 32-bit Linux systems, the code segment always starts at address 0x08048000. The data segment follows at the next 4 KB aligned address. The run-time heap follows on the first 4 KB aligned address past the read/write segment and grows upward through calls to the malloc library.
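
One way to look at this layout on a live system is to sample an address from each region. The sketch below uses hypothetical names, and the values it reports depend on the platform: on modern systems with address space randomization and position-independent executables, they will not match the fixed 32-bit addresses quoted above.

```c
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

int data_object = 1;   /* lives in the read/write (data) segment */

struct layout {
    uintptr_t code;    /* address of a function (code segment) */
    uintptr_t data;    /* address of an initialized global (data segment) */
    uintptr_t heap;    /* address returned by malloc (run-time heap) */
    uintptr_t stack;   /* address of a local variable (user stack) */
};

/* Fill in one sample address per region.
   Returns 0 on success, -1 if malloc fails. */
int sample_layout(struct layout *out)
{
    int local = 0;
    void *block = malloc(16);

    if (block == NULL)
        return -1;
    out->code  = (uintptr_t)&sample_layout;
    out->data  = (uintptr_t)&data_object;
    out->heap  = (uintptr_t)block;
    out->stack = (uintptr_t)&local;
    free(block);
    return 0;
}

/* Print the sampled addresses in hexadecimal. */
void print_layout(const struct layout *l)
{
    printf("code  0x%" PRIxPTR "\n", l->code);
    printf("data  0x%" PRIxPTR "\n", l->data);
    printf("heap  0x%" PRIxPTR "\n", l->heap);
    printf("stack 0x%" PRIxPTR "\n", l->stack);
}
```

Calling sample_layout and then print_layout from main shows the four addresses for the current run.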

7.10 Dynamic Linking with Shared Libraries

Shared libraries are a modern innovation that addresses the disadvantages of static libraries. A shared library is an object module that, at run time, can be loaded at an arbitrary memory address and linked with a program in memory. This process is known as dynamic linking and is performed by a program called a dynamic linker.

Shared libraries are also referred to as shared objects, and on Unix systems they are typically denoted by the .so suffix. Microsoft operating systems make heavy use of shared libraries, which they call DLLs (dynamic link libraries).

Shared libraries are shared in two ways. First, in any given file system, there is exactly one .so file for a particular library. The code and data in this .so file are shared by all of the executable object files that reference the library, as opposed to the contents of static libraries, which are copied and embedded in the executables that reference them. Second, a single copy of the .text section of a shared library in memory can be shared by different running processes.

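To build a shared library, invoke the compiler driver with the following special directives for the compiler and linker:

unix> gcc -shared -fPIC -o libvector.so addvec.c multvec.c

The -fPIC flag directs the compiler to generate position-independent code. The -shared flag directs the linker to create a shared object file.

unix> gcc -o p2 main2.c ./libvector.so

This creates an executable object file p2 in a form that can be linked with libvector.so at run time.

The dynamic linker then finishes the linking task by performing the following relocations:

• Relocating the text and data of libc.so into some memory segment.

• Relocating the text and data of libvector.so into another memory segment.

• Relocating any references in p2 to symbols defined by libc.so and libvector.so.

7.11 Loading and Linking Shared Libraries from Applications

#include <dlfcn.h>

void *dlopen(const char *filename, int flag);

Returns: pointer to a handle if OK, NULL on error.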

The dlopen function loads and links the shared library filename. External symbols in filename are resolved using libraries previously opened with the RTLD_GLOBAL flag. If the current executable was compiled with the -rdynamic flag, its global symbols are also available for symbol resolution.

#include <dlfcn.h>

void *dlsym(void *handle, const char *symbol);

Returns: pointer to the symbol if OK, NULL on error.

The dlsym function takes a handle to a previously opened shared library and a symbol name, and returns the address of the symbol if it exists, or NULL otherwise.

#include <dlfcn.h>

int dlclose(void *handle);

Returns: 0 if OK, -1 on error.

The dlclose function unloads the shared library if no other shared libraries are still using it.

#include <dlfcn.h>

const char *dlerror(void);

Returns: an error message if the previous call to dlopen, dlsym, or dlclose failed, NULL if the previous call succeeded.

The dlerror function returns a string describing the most recent error that occurred as a result of calling dlopen, dlsym, or dlclose, or NULL if no error occurred.
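
The four calls are typically used together in the order open, look up, call, close. The following is a minimal sketch of that sequence; the helper name call_from_library and the assumed signature of the looked-up function are not from these notes, and the library path is supplied by the caller.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Assumed signature of the routine to call, e.g. a vector function. */
typedef void (*vecfn)(int *, int *, int *, int);

/* Load the shared library at `path`, look up the function `name`, and
   call it as fn(x, y, z, n). Returns 0 on success, -1 on any failure. */
int call_from_library(const char *path, const char *name,
                      int *x, int *y, int *z, int n)
{
    void *handle = dlopen(path, RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "%s\n", dlerror());
        return -1;
    }

    dlerror();                          /* clear any stale error state */
    void *sym = dlsym(handle, name);
    const char *err = dlerror();
    if (err != NULL || sym == NULL) {
        fprintf(stderr, "%s\n", err != NULL ? err : "symbol resolved to NULL");
        dlclose(handle);
        return -1;
    }

    /* ISO C leaves this object-to-function pointer cast undefined;
       POSIX systems are expected to support it for dlsym results. */
    vecfn fn = (vecfn)sym;
    fn(x, y, z, n);

    if (dlclose(handle) != 0) {
        fprintf(stderr, "%s\n", dlerror());
        return -1;
    }
    return 0;
}
```

With the shared library from section 7.10, a call might look like call_from_library("./libvector.so", "addvec", x, y, z, 2). On some systems the program must be linked with -ldl.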

7.12 Position-Independent Code (PIC)

How can multiple processes share a single copy of a program? One approach is to assign each shared library a dedicated chunk of the address space ahead of time, and then require the loader to always load the shared library at that address. A better approach is to compile library code so that it can be loaded and executed at any address without being modified by the linker. Such code is known as position-independent code (PIC).

No matter where we load an object module (including a shared object module) in memory, the data segment is always allocated immediately after the code segment. To exploit this fact, the compiler creates a table called the global offset table (GOT) at the beginning of the data segment.
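
The GOT holds one entry for each global data object the module references, and the dynamic linker fills each entry with the object's absolute address at load time. The idea can be modeled in plain C as one level of indirection through a table. This is only an analogy with hypothetical names: the real table is generated by the compiler and patched by the dynamic linker, and nothing below touches it.

```c
/* Hypothetical model of GOT-style indirection, written in plain C. */

int addcnt = 7;              /* stands in for a global defined in some shared module */

static int *model_got[1];    /* one table slot per referenced global */

/* Stand-in for the load-time step that stores each global's absolute address. */
void fill_model_got(void)
{
    model_got[0] = &addcnt;
}

/* Position-independent style access: the code names only a table slot,
   never the variable's absolute address. */
int read_via_got(void)
{
    return *model_got[0];
}

/* Writes go through the same slot. */
void increment_via_got(void)
{
    *model_got[0] += 1;
}
```

Because the code refers only to the table, the same instructions work wherever the module is loaded; only the table contents change from process to process.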
