Static linking of programs, dynamic linking and loading

Source: Internet
Author: User

I. The complete process of compiling and linking a program

We usually use GCC to build executable programs. The command gcc hello.c produces an executable named a.out by default.

The command gcc hello.c, which compiles and links in one go, can be broken down into four major steps:

      • Preprocessing
      • Compilation
      • Assembly
      • Linking

Figure: GCC compilation process

1. Preprocessing

Preprocessing mainly does the following:

    • Removes all #define directives and expands all macro definitions.
    • Handles all conditional compilation directives, such as #if, #ifdef, #elif, #else, and #endif.
    • Processes #include directives by inserting the contents of the included file at the location of the directive.
    • Deletes all comments, both // and /* */.
    • Adds line numbers and file name markers, so that the compiler can produce line numbers for debugging and for error and warning messages.
    • Keeps all #pragma directives, because the compiler needs them.

The following command is typically used for preprocessing:

$ gcc -E hello.c -o hello.i

The -E option tells GCC to stop after preprocessing. The same result can be obtained with the following command:

$ cpp hello.c > hello.i    /* cpp: the C preprocessor */

Running cat hello.i shows the preprocessed code.
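As a rough illustration of the preprocessing steps listed above, consider the small file below. It is a made-up example: the file name demo.c, the macros SQUARE and USE_FAST_PATH, and the function square_plus_flag are assumptions for illustration and are not part of the original hello.c.

```c
/* demo.c: a hypothetical file used only to illustrate preprocessing */
#define SQUARE(x) ((x) * (x))   /* expanded textually wherever it is used */
#define USE_FAST_PATH 1         /* controls the conditional block below */

/* Returns x*x plus a flag chosen by the conditional compilation block. */
int square_plus_flag(int x)
{
#if USE_FAST_PATH
    int flag = 1;   /* this branch survives preprocessing */
#else
    int flag = 0;   /* this branch is dropped before compilation */
#endif
    return SQUARE(x) + flag;   /* becomes ((x) * (x)) + flag */
}
```

Running gcc -E demo.c on such a file should show the function with the #define lines and the comments gone, only the int flag = 1; branch kept, and the return statement expanded to ((x) * (x)) + flag.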

2. Compilation

Compilation runs the preprocessed file through lexical analysis, syntax analysis, semantic analysis, and optimization, and produces the corresponding assembly code.

$ gcc -S hello.i -o hello.s

Or

$ /usr/lib/gcc/i486-linux-gnu/4.4/cc1 hello.c

Note: Current versions of GCC merge preprocessing and compilation into a single step, performed by the cc1 program. GCC itself is a driver that wraps several background programs and invokes them according to its arguments: the preprocessor and compiler cc1, the assembler as, and the linker ld.

3. Assembly

The assembler translates assembly code into machine instructions; almost every assembly statement corresponds to one machine instruction. Assembly is simpler than compilation: the assembler translates the statements one by one, using a table that maps assembly instructions to machine instructions.

$ gcc -c hello.c -o hello.o

Or

$ as hello.s -o hello.o

Because hello.o contains machine code, it cannot be read as plain text (opening it in vi shows garbage).

4. Linking

The linker ld is invoked to link the whole set of object files the program needs, together with the library files they depend on, and produce the final executable.

$ ld -static crt1.o crti.o crtbeginT.o hello.o -start-group -lgcc -lgcc_eh -lc -end-group crtend.o crtn.o

(The path names of the files are omitted.)

This is how the hello world program is compiled and linked. So what exactly do the compiler and the linker do?

The compilation process can be divided into six steps: scanning (lexical analysis), parsing, semantic analysis, source code optimization, code generation, and target code optimization.

Lexical analysis: the scanner splits the character sequence of the source code into a series of tokens. The lex tool can be used to generate a scanner.

Parsing: the parser builds a syntax tree from the tokens produced by the scanner. The yacc tool (Yet Another Compiler Compiler) can be used to generate a parser.

Semantic analysis: the compiler can only analyze static semantics (semantics that can be determined at compile time); dynamic semantics can only be determined at run time.

Source code optimization: the source code optimizer translates the whole syntax tree into intermediate code, which is independent of the target machine and the runtime environment. Intermediate code lets the compiler be split into a front end and a back end: the front end generates machine-independent intermediate code, and the back end translates the intermediate code into target machine code.

Code generation: the code generator translates the intermediate code into target machine code.

Target code optimization: the target code optimizer improves the generated target code.
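To make the scanning step more concrete, here is a minimal hand-written scanner sketch. It does not reflect how lex or GCC work internally; the function name count_tokens and the token rules (integers, identifiers, and one-character operators or punctuation) are simplifying assumptions.

```c
#include <ctype.h>

/* Counts the tokens in a source string. Recognized tokens in this sketch:
   runs of digits, identifiers (letters, digits, underscore), and any other
   single non-space character. Whitespace only separates tokens. */
int count_tokens(const char *src)
{
    int count = 0;
    while (*src) {
        if (isspace((unsigned char)*src)) {
            src++;                                   /* skip whitespace */
        } else if (isdigit((unsigned char)*src)) {
            while (isdigit((unsigned char)*src))
                src++;                               /* integer literal */
            count++;
        } else if (isalpha((unsigned char)*src) || *src == '_') {
            while (isalnum((unsigned char)*src) || *src == '_')
                src++;                               /* identifier */
            count++;
        } else {
            src++;                                   /* one-character token */
            count++;
        }
    }
    return count;
}
```

For the input array[index] = (index + 4) * (2 + 6), this sketch finds 16 tokens. A real scanner would also record the type and text of each token so that the parser can build the syntax tree from them.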

The main job of linking is to handle the places where modules reference one another, so that the modules fit together correctly.

The main steps of linking are address and storage allocation, symbol resolution, and relocation.

Linking is divided into static linking and dynamic linking.

Static linking copies the code needed from static libraries directly into the executable at link time, so the executable is relatively large.

Dynamic linking adds only some descriptive information at link time; the corresponding dynamic libraries are loaded from the system into memory when the program runs.

The approximate procedure for static linking is as follows:

Figure: static linking


II. What an object file looks like (using the ELF file format on Linux as an example)

Sandwiched between the ELF header and the section header table are the sections themselves. A typical ELF relocatable object file contains the following sections:

  • .text: The machine code of the compiled program.
  • .rodata: Read-only data, such as the format strings in printf statements and the jump tables for switch statements.
  • .data: Initialized global C variables. Local C variables are kept on the stack at run time and appear in neither the .data section nor the .bss section.
  • .bss: Uninitialized global C variables. This section occupies no actual space in the object file; it is merely a placeholder. Object file formats distinguish between initialized and uninitialized variables for space efficiency: uninitialized variables do not need to occupy any actual disk space in the object file.
  • .symtab: A symbol table with information about the functions and global variables defined and referenced in the program (including referenced external variables and functions, but not local variables). Some programmers mistakenly assume that a program must be compiled with the -g option to get symbol table information. In fact, every relocatable object file has a symbol table in .symtab. However, unlike the symbol table inside a compiler, the .symtab symbol table does not contain entries for local variables.
  • .rel.text: A list of locations in the .text section that must be modified when the linker combines this object file with others. In general, any instruction that calls an external function or references a global variable needs to be modified. This includes global variables defined in the same object file, because segments of the same kind are merged at link time and the addresses of the data change. Instructions that call local functions, on the other hand, do not need to be modified. Note that relocation information is not needed in an executable object file, so it is usually omitted unless the user explicitly instructs the linker to include it.
  • .rel.data: Relocation information for any global variables that are defined or referenced by the module. In general, any initialized global variable whose initial value is the address of a global variable or of an externally defined function needs to be modified.
  • .debug: A debugging symbol table with entries for the local variables and type definitions in the program, for the global variables defined and referenced in the program, and for the original C source file. It is present only if the compiler driver is invoked with the -g option.
  • .line: A mapping between line numbers in the original C source program and machine instructions in the .text section. It is present only if the compiler driver is invoked with the -g option.
  • .strtab: A string table for the symbol tables in the .symtab and .debug sections and for the section names in the section headers. A string table is a sequence of null-terminated strings.
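The sketch below shows where a few kinds of C definitions would typically end up, following the section list above. The variable and function names are invented for illustration, and the placements in the comments describe the usual behavior of GCC on Linux; the exact result depends on the compiler version and options.

```c
int initialized_global = 42;        /* initialized global: typically .data */
int uninitialized_global;           /* uninitialized global: typically .bss, zero at startup */
static int file_local_counter = 7;  /* initialized static: .data, local symbol in .symtab */
const char greeting[] = "hello";    /* read-only data: typically .rodata */

/* The body of the function is machine code in .text. The variable `local`
   lives on the stack (or in a register) and gets no .symtab entry. */
int sum_of_all(void)
{
    int local = 1;
    return initialized_global + uninitialized_global + file_local_counter + local;
}
```

One way to check this is to compile the file with gcc -c and inspect the object file with readelf -S (section headers) or nm (symbols).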

Side note: why is uninitialized data called .bss?

The term .bss is universally used for uninitialized data. It began as an acronym for the "Block Started by Symbol" directive in IBM 704 assembly language (around 1957) and has stuck ever since. A simple way to remember the difference between .data and .bss is to think of "bss" as short for "Better Save Space!"

III. Static linking

Virtual memory is a storage system built on the main memory and auxiliary storage hierarchy, implemented by additional hardware together with the operating system's memory management software.

As the name implies, virtual memory is virtual: it does not physically exist but is a "system" managed jointly by hardware and software. It provides three important capabilities: (1) it treats main memory as a cache for an address space stored on disk, keeping only the active areas in main memory and transferring data between disk and main memory as needed (this is where concepts such as swap space and paging come in), which makes efficient use of main memory; (2) it simplifies memory management by giving each process a uniform address space, addressed by virtual addresses; (3) it gives each process a separate address space, protecting the address space of each process from corruption by other processes.

Virtual memory and the virtual address space are two different concepts: virtual memory is an imaginary storage device, while the virtual address space is the imaginary range of addresses it presents. Their relationship is similar to the relationship between main memory and the physical memory space.

Linking:

Linking is the process of collecting and combining pieces of code and data into a single file, that is, merging different object files into a final executable. Note that this merging works on files and does not itself involve memory. Linking can happen at three points: (1) at compile time, which is what we usually call static linking; (2) at load time; (3) at run time. Load-time and run-time linking are together called dynamic linking. This section mainly covers static linking; load-time linking is covered in the section on loading, and run-time linking is not discussed.

1. What is static linking?

Static linking combines multiple object files into one executable, for example linking a.o and b.o into an executable named ab.

2. What are the parts of the static linking process?

Static linking consists of two parts: space and address allocation, and symbol resolution and relocation.

(1) Space and address allocation

How does the linker combine a.o and b.o?

There are two possible approaches. The first is simply to stack the input object files one after another in order. The second is to merge similar segments: as the name implies, segments with the same name from different object files are merged into a single segment, for example:

Figure 1

This is the strategy the linker actually uses to merge object files. Consider the following two source files:

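/* a.c */
extern int shared;
void swap(int *a, int *b);
int main(void) { int a = 100; swap(&a, &shared); return 0; }
/* b.c */
int shared = 1;
/* XOR swap, split into three statements to avoid undefined behavior */
void swap(int *a, int *b) { *a ^= *b; *b ^= *a; *a ^= *b; }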

Compiling these two files produces two object files, a.o and b.o:

$ gcc -c a.c b.c

The code involves three symbols: shared, swap, and main.

The whole static linking process takes two steps:

Step one: space and address allocation. The linker scans all input object files, obtains the length, attributes, and position of each of their segments, and collects all symbol definitions and symbol references from their symbol tables into a single global symbol table. With this information the linker can merge the input segments, compute the length and position of each merged segment in the output file, and establish the mapping.

A question may arise here: what kind of mapping is established? Figure 1 above gives a rough idea. The mapping is between the executable file and the process virtual address space. But the program has not yet run and no process exists, so where does a process address space come from? This is where virtual memory matters: although no process exists yet, the virtual address space of every process has the same layout, so it is perfectly valid to assign an address to each segment of the executable, and even to each symbol. Note: before linking, the virtual address of every segment in an object file is 0, because no virtual space has been assigned yet. After linking, every segment in the executable has been assigned its virtual address.
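The address assignment in step one can be sketched as follows. This is a simplified model rather than the actual algorithm of ld: the struct and function names are hypothetical, only .text and .data are modeled, and alignment and padding are ignored.

```c
#include <stddef.h>

/* Hypothetical, simplified view of one input object file:
   just the sizes of its .text and .data segments. */
struct input_object {
    unsigned text_size;
    unsigned data_size;
};

/* Virtual addresses assigned to the segments of one input object. */
struct placement {
    unsigned text_addr;
    unsigned data_addr;
};

/* Merge similar segments: all .text pieces are laid out back to back from
   text_base, and all .data pieces back to back from data_base. */
void assign_addresses(const struct input_object *in, struct placement *out,
                      size_t count, unsigned text_base, unsigned data_base)
{
    unsigned text_cursor = text_base;
    unsigned data_cursor = data_base;
    for (size_t i = 0; i < count; i++) {
        out[i].text_addr = text_cursor;   /* this object's code starts here */
        out[i].data_addr = data_cursor;   /* this object's data starts here */
        text_cursor += in[i].text_size;
        data_cursor += in[i].data_size;
    }
}
```

Once every input segment has a virtual address, the address of each symbol follows by adding the symbol's offset within its segment to the start address of that segment.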

Step two: symbol resolution and relocation

First, symbol resolution. Resolving symbols means associating each symbol reference with exactly one symbol definition from the symbol tables of the input relocatable object files.

If no definition is found, the linker reports an error (an undefined reference).
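Symbol resolution can be pictured as a lookup in the global symbol table built in step one. The sketch below assumes a simple array as the table; the struct layout and the function name resolve_symbol are illustrative and are not the data structures of a real linker.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry of the global symbol table. */
struct symbol {
    const char *name;
    int defined;        /* 1 if some input object file defines the symbol */
    unsigned address;   /* virtual address assigned in step one */
};

/* Returns the definition that a reference to `name` resolves to,
   or NULL if no input object defines it (an undefined reference). */
const struct symbol *resolve_symbol(const struct symbol *table, size_t count,
                                    const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i].defined && strcmp(table[i].name, name) == 0)
            return &table[i];
    }
    return NULL;
}
```

A real linker uses a hash table rather than a linear scan and also has rules for duplicate definitions, which this sketch leaves out.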

Second, relocation.

Processors differ in the format of their instructions and in their addressing modes. Here we use a 32-bit x86 processor and describe two relocation types.

x86 basic relocation types

Macro          Value    Relocation calculation
R_386_32       1        Absolute addressing: S + A
R_386_PC32     2        Relative addressing: S + A - P

Note:

A: the value stored at the location being patched. On a 32-bit CPU, with R_386_PC32 it is 0xFFFFFFFC (-4), reflecting the four bytes that hold the address; with R_386_32 it is 0.

P: the location being patched. Consider the following code:

...

1023: 11 11 11

1026: e8 fc ff ff ff

102b: 11 11 11

...

The fc byte right after the e8 opcode is where the patched location starts, i.e. P = 0x1027.

S: the actual address of the symbol, that is, the virtual address of the symbol obtained during space and address allocation in step one.

For example, suppose that in the linked executable the virtual address of the main function is 0x1000, that of the swap function is 0x2000, and that of the variable shared is 0x3000.

Absolute addressing: relocating the reference to the variable shared.

  • S: the actual address of shared, 0x3000.

  • A: the value at the patched location, 0.

So the relocated value is S + A = 0x3000 + 0 = 0x3000, unchanged.

Relative addressing: relocating the reference to the symbol swap.

  • S: the actual address of the symbol swap, 0x2000.

  • A: the value at the patched location, 0xFFFFFFFC (-4).

  • P: the patched location, 0x1027.

The relocated value is S + A - P = 0x2000 + (-4) - 0x1027 = 0xfd5. The patched code is:

...

1023: 11 11 11

1026: e8 d5 0f 00 00

102b: 11 11 11

...

Notice the familiar pattern? The address of the next instruction (the PC) is 0x102b, and adding the patched value to it gives

0x102b + 0xfd5 = 0x2000, exactly the address of the swap function.
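The two calculations from the table can be written as small C functions. This is a sketch of the arithmetic only: the function names are invented for illustration, and a real linker reads S, A, and P from the symbol table, the section contents, and the relocation entries.

```c
#include <stdint.h>

/* R_386_32: absolute addressing, S + A. */
uint32_t reloc_386_32(uint32_t S, int32_t A)
{
    return S + (uint32_t)A;
}

/* R_386_PC32: relative addressing, S + A - P (wraps modulo 2^32). */
uint32_t reloc_386_pc32(uint32_t S, int32_t A, uint32_t P)
{
    return S + (uint32_t)A - P;
}

/* Writes a 32-bit value at buf in little-endian byte order,
   the order in which x86 stores it inside an instruction. */
void patch_le32(uint8_t *buf, uint32_t value)
{
    buf[0] = (uint8_t)(value & 0xff);
    buf[1] = (uint8_t)((value >> 8) & 0xff);
    buf[2] = (uint8_t)((value >> 16) & 0xff);
    buf[3] = (uint8_t)((value >> 24) & 0xff);
}
```

With the example values, reloc_386_pc32(0x2000, -4, 0x1027) gives 0xfd5, and patch_le32 stores it as the bytes d5 0f 00 00, which matches the patched call instruction shown above.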

Everything above is independent of the C standard library; it only covers linking two C programs with each other and says nothing about how a call such as printf is handled. This is where the concept of a static library comes in. A static library can be thought of simply as a collection of object files, that is, many object files packed into a single archive file. Linking against a static library works like this: the linker ld consults the global symbol table to find the symbols that are still unresolved, locates the object files in the library that define them, extracts those files from the library, and links them into the executable. This means that only the few object files actually needed end up in the final executable; the libraries are not copied into it wholesale.
