"In-depth understanding of computer Systems" chapter seventh study notes (first draft)

Source: Internet
Author: User

Chapter 7: Linking

Linking is the process of collecting and combining various pieces of code and data into a single file that can be loaded (copied) into memory and executed. Linking can be performed at compile time, when the source code is translated into machine code; at load time, when the program is loaded into memory and executed by the loader; or even at run time, by application programs. On early computer systems, linking was performed manually. On modern systems, linking is performed automatically by programs called linkers.

    • Understanding linkers helps you build large programs
    • Understanding linkers helps you avoid dangerous programming errors
    • Understanding linkers helps you understand how language scoping rules are implemented
    • Understanding linkers helps you understand other important systems concepts
    • Understanding linkers enables you to exploit shared libraries

I. Compiler Drivers

Most compilation systems provide a compiler driver that invokes the language preprocessor, compiler, assembler, and linker, as needed, on behalf of the user.

gcc -O2 -g -o p main.c swap.c

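The command above translates the example program from ASCII source files into an executable object file. The driver runs the preprocessor, the compiler, and the assembler on each source file to produce the relocatable object files main.o and swap.o, and then runs the linker to combine them into the executable p.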

II. Static Linking

A static linker such as the Unix ld program takes as input a collection of relocatable object files and command-line arguments, and generates as output a fully linked executable object file that can be loaded and run. The input relocatable object files consist of various code and data sections: instructions are in one section, initialized global variables are in another, and uninitialized variables are in yet another.

1. To build the executable, the linker must perform two main tasks:

    • Symbol resolution: Object files define and reference symbols. The purpose of symbol resolution is to associate each symbol reference with exactly one symbol definition.
    • Relocation: Compilers and assemblers generate code and data sections that start at address 0. The linker relocates these sections by associating a memory location with each symbol definition, and then modifying all references to those symbols so that they point to this memory location.

2. Some basic facts about linkers: Object files are merely collections of blocks of bytes. Some of these blocks contain program code, others contain program data, and others contain data structures that guide the linker and the loader. The linker concatenates blocks together, decides on run-time locations for the concatenated blocks, and modifies various locations within the code and data blocks. The compilers and assemblers that generate the object files have already done most of the work.

III. Object Files

1. Object files come in three forms:

    • Relocatable object file: contains binary code and data in a form that can be combined with other relocatable object files at compile time to create an executable object file
    • Executable object file: contains binary code and data in a form that can be copied directly into memory and executed
    • Shared object file: a special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run time

2. Compilers and assemblers generate relocatable object files (including shared object files). Linkers generate executable object files. Technically, an object module is a sequence of bytes, and an object file is an object module stored on disk in a file.

IV. Relocatable Object Files

1. The format of a typical ELF relocatable object file is shown on p. 451 of the textbook. The ELF header begins with a 16-byte sequence that describes the word size and byte ordering of the system that generated the file. The rest of the ELF header contains information that allows a linker to parse and interpret the object file. This includes the size of the ELF header, the object file type (e.g., relocatable, executable, or shared), the machine type (e.g., IA32), the file offset of the section header table, and the size and number of entries in the section header table. The locations and sizes of the various sections are described by the section header table, which contains a fixed-size entry for each section in the object file.
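The 16-byte sequence at the start of the ELF header can be decoded by hand. The following is a minimal sketch, not a full ELF parser: the function names elf_word_size and elf_byte_order are made up for this example, while the byte positions (class at index 4, data encoding at index 5) and the magic bytes 0x7f 'E' 'L' 'F' come from the ELF format itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte positions inside the 16-byte identification array of an ELF header. */
enum { EI_CLASS = 4, EI_DATA = 5, EI_NIDENT = 16 };

/* Returns nonzero if the buffer starts with the ELF magic number. */
static int has_elf_magic(const unsigned char *ident, size_t len)
{
    return len >= EI_NIDENT && memcmp(ident, "\177ELF", 4) == 0;
}

/* Word size in bits (32 or 64), or 0 if the bytes are not a valid ELF ident. */
int elf_word_size(const unsigned char *ident, size_t len)
{
    assert(ident != NULL);
    if (!has_elf_magic(ident, len))
        return 0;
    if (ident[EI_CLASS] == 1) return 32;    /* ELFCLASS32 */
    if (ident[EI_CLASS] == 2) return 64;    /* ELFCLASS64 */
    return 0;
}

/* 'L' for little-endian, 'B' for big-endian, '?' if unknown or not ELF. */
char elf_byte_order(const unsigned char *ident, size_t len)
{
    assert(ident != NULL);
    if (!has_elf_magic(ident, len))
        return '?';
    if (ident[EI_DATA] == 1) return 'L';    /* ELFDATA2LSB */
    if (ident[EI_DATA] == 2) return 'B';    /* ELFDATA2MSB */
    return '?';
}
```

To try it on a real file, read the first 16 bytes of any object file into a buffer and pass the buffer to both functions.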

2. Sandwiched between the ELF header and the section header table are the sections themselves. A typical ELF relocatable object file contains the following sections:

    • .text  The machine code of the compiled program

    • .rodata  Read-only data

    • .data  Initialized global C variables. Local C variables are maintained on the stack at run time and appear in neither the .data nor the .bss section

    • .bss  Uninitialized global C variables. This section occupies no actual space in the object file; it is merely a placeholder. Object file formats distinguish between initialized and uninitialized variables for space efficiency: uninitialized variables do not have to occupy any actual disk space in the object file

    • .symtab  A symbol table with information about functions and global variables that are defined and referenced in the program. Every relocatable object file has a symbol table in .symtab

    • .rel.text  A list of locations in the .text section that will need to be modified when the linker combines this object file with others. In general, any instruction that calls an external function or references a global variable will need to be modified. On the other hand, instructions that call local functions do not need to be modified. Note that relocation information is not needed in executable object files, and is usually omitted unless the user explicitly instructs the linker to include it

    • .rel.data  Relocation information for any global variables that are referenced or defined by the module. In general, any initialized global variable whose initial value is the address of a global variable or of an externally defined function will need to be modified

    • .debug  A debugging symbol table with entries for local variables and typedefs defined in the program, global variables defined and referenced in the program, and the original C source file

    • .line  A mapping between line numbers in the original C source program and machine code instructions in the .text section

    • .strtab  A string table for the symbol tables in the .symtab and .debug sections, and for the section names in the section headers. A string table is a sequence of null-terminated character strings
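As a rough illustration of the list above, the sketch below annotates where a typical compiler places each kind of C object. The variable and function names are invented for this example, and the exact placement is ultimately up to the compiler and its options, so treat the comments as the usual case rather than a guarantee.

```c
#include <assert.h>

int initialized_global = 42;        /* initialized global: .data */
int uninitialized_global;           /* uninitialized global: .bss, reads as zero at run time */
const char greeting[] = "hello";    /* read-only data: usually .rodata */

/* The machine code for this function goes in .text. */
int next_id(void)
{
    static int calls;               /* uninitialized static local: .bss, not the stack */
    int step = 1;                   /* automatic variable: stack or register, in no section */

    assert(step > 0);
    calls += step;
    return initialized_global + uninitialized_global + calls;
}
```

After compiling with gcc -c, the section sizes can be inspected with a tool such as readelf -S or objdump -h to see which sections grew.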

V. Symbols and Symbol Tables

1. Each relocatable object module m has a symbol table that contains information about the symbols defined and referenced by m.

2. In the context of a linker, there are three different kinds of symbols:

    • Global symbols that are defined by module m and can be referenced by other modules. Global linker symbols correspond to nonstatic C functions and to global variables defined without the C static attribute
    • Global symbols that are referenced by module m but defined by some other module. Such symbols are called externals and correspond to C functions and variables defined in other modules
    • Local symbols that are defined and referenced exclusively by module m. Some local linker symbols correspond to C functions and global variables defined with the static attribute. These symbols are visible anywhere within module m, but cannot be referenced by other modules. The sections in the object file and the name of the source file that corresponds to module m also get local symbols

3. Local procedure variables defined with the C static attribute are not managed on the stack. Instead, the compiler allocates space in .data or .bss for each definition and creates a local linker symbol with a unique name in the symbol table.

4. Use the static attribute to hide variable and function names. Any global variable or function declared with the static attribute is private to its module.
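The two uses of static described in points 3 and 4 can be seen in one small module. This is a sketch with invented names (hidden_counter, bump, public_bump, f, g); the point is only that the two static locals named x are distinct objects with their own storage, and that the static global and static function would not be visible to other modules.

```c
#include <assert.h>

static int hidden_counter;      /* local linker symbol: private to this module */

static int bump(void)           /* static function: also private to this module */
{
    return ++hidden_counter;
}

int public_bump(void)           /* nonstatic function: a global linker symbol */
{
    return bump();
}

int f(void)
{
    static int x = 0;           /* lives in .data or .bss, keeps its value across calls */
    return ++x;
}

int g(void)
{
    static int x = 10;          /* a different object from the x in f() */
    assert(x >= 10);
    return ++x;
}
```

Because both x variables need storage outside the stack, the compiler has to give them distinct local symbol names in the symbol table, even though the C source uses the same identifier.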

5. Each entry in the .symtab section has the following fields:

    • name: the byte offset into the string table of the symbol's null-terminated string name
    • value: the symbol's address. For relocatable modules, the value is an offset from the beginning of the section where the object is defined. For executable object files, the value is an absolute run-time address
    • size: the size of the object, in bytes
    • type: usually either data or function
    • binding: whether the symbol is local or global
    • section: an index into the section header table that identifies the section the symbol belongs to
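The fields above map naturally onto a C structure. The following is a simplified sketch for illustration only: the struct layout, field widths, and the helper names symbol_name and find_symbol are assumptions made for this example and do not match the exact on-disk ELF encoding. It shows how the name field is an offset into the string table rather than a string stored in the entry itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sym_type    { SYM_DATA, SYM_FUNC };
enum sym_binding { SYM_LOCAL, SYM_GLOBAL };

/* Simplified symbol table entry; the fields follow the list above. */
struct symbol {
    unsigned name;              /* byte offset of the name in the string table */
    unsigned value;             /* section offset, or absolute run-time address */
    unsigned size;              /* object size in bytes */
    enum sym_type type;         /* data or function */
    enum sym_binding binding;   /* local or global */
    unsigned section;           /* index into the section header table */
};

/* Look up the symbol's name in the string table; NULL if the offset is bad. */
const char *symbol_name(const struct symbol *sym, const char *strtab, size_t strtab_len)
{
    assert(sym != NULL && strtab != NULL);
    if (sym->name >= strtab_len)
        return NULL;
    return strtab + sym->name;
}

/* Linear search of a symbol table by name; returns NULL when not found. */
const struct symbol *find_symbol(const struct symbol *tab, size_t n,
                                 const char *strtab, size_t strtab_len,
                                 const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++) {
        const char *s = symbol_name(&tab[i], strtab, strtab_len);
        if (s != NULL && strcmp(s, name) == 0)
            return &tab[i];
    }
    return NULL;
}
```

Real tools such as readelf -s print the same information for an actual object file.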

VI. Symbol Resolution

6.1 How Linkers Resolve Multiply Defined Global Symbols

1. At compile time, the compiler exports each global symbol to the assembler as either strong or weak, and the assembler encodes this information implicitly in the symbol table of the relocatable object file. Functions and initialized global variables get strong symbols. Uninitialized global variables get weak symbols.

2. Given this notion of strong and weak symbols, Unix linkers use the following rules for dealing with multiply defined symbols:

    • Rule 1: Multiple strong symbols are not allowed
    • Rule 2: Given a strong symbol and multiple weak symbols, choose the strong symbol
    • Rule 3: Given multiple weak symbols, choose any of the weak symbols
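The three rules can be modeled in a few lines of C. This is a toy model, not linker source code: the struct definition and the function choose_definition are invented for this sketch, and where Rule 3 allows any weak symbol, the sketch simply takes the first one it sees.

```c
#include <assert.h>
#include <stddef.h>

enum strength { WEAK, STRONG };

/* One definition of a given symbol name, as seen in one module. */
struct definition {
    const char *module;         /* hypothetical module name, for illustration only */
    enum strength strength;
};

/*
 * Apply the three rules to all definitions of one symbol name.
 * Returns the index of the chosen definition, or -1 if the set is
 * empty or contains more than one strong definition (a link error).
 */
int choose_definition(const struct definition *defs, size_t n)
{
    int chosen = -1;
    int strong_seen = 0;

    assert(defs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (defs[i].strength == STRONG) {
            if (strong_seen)
                return -1;      /* Rule 1: multiple strong symbols */
            strong_seen = 1;
            chosen = (int)i;    /* Rule 2: the strong symbol wins */
        } else if (chosen < 0) {
            chosen = (int)i;    /* Rule 3: any weak one; here, the first */
        }
    }
    return chosen;
}
```

Rules 2 and 3 are the dangerous ones in practice: if two modules define the same weak name with different types, the linker silently picks one definition and the other module keeps using it with the wrong type.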

6.2 Linking with Static Libraries

1. Programs that use functions from the standard C library and the math library can be compiled and linked with a command line of the form:

gcc main.c /usr/lib/libm.a /usr/lib/libc.a

2. On Unix systems, static libraries are stored on disk in a particular file format known as an archive. An archive is a collection of concatenated relocatable object files, with a header that describes the size and location of each member object file. Archive file names are denoted by the .a suffix.

    1. The process of linking to a static library

6.3 How Linkers Use Static Libraries to Resolve References

1. During the symbol resolution phase, the linker scans the relocatable object files and archives from left to right, in the same order that they appear on the compiler driver's command line. During this scan, the linker maintains a set E of relocatable object files that will be merged to form the executable, a set U of unresolved symbols (symbols that are referenced but not yet defined), and a set D of symbols that have been defined in previous input files. Initially, E, U, and D are empty.

    • For each input file f on the command line, the linker determines whether f is an object file or an archive. If f is an object file, the linker adds f to E, updates U and D to reflect the symbol definitions and references in f, and proceeds to the next input file

    • If f is an archive, the linker attempts to match the unresolved symbols in U against the symbols defined by the members of the archive. If some archive member m defines a symbol that resolves a reference in U, then m is added to E, and the linker updates U and D to reflect the symbol definitions and references in m. This process iterates over the member object files in the archive until U and D no longer change. At that point, any member object files not contained in E are simply discarded, and the linker proceeds to the next input file

    • If U is nonempty when the linker finishes scanning the input files on the command line, it prints an error and terminates. Otherwise, it merges and relocates the object files in E to build the output executable file

2. This algorithm can result in some baffling link-time errors, because the ordering of libraries and object files on the command line is significant. If the library that defines a symbol appears on the command line before the object file that references that symbol, the reference will not be resolved and linking will fail. The general rule for libraries is to place them at the end of the command line.

3. On the other hand, if the libraries are not independent, they must be ordered so that for each symbol s that is referenced externally by a member of an archive, at least one definition of s follows a reference to s on the command line.

4. Libraries can be repeated on the command line if necessary to satisfy the dependence requirements.
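The scan over E, U, and D described above is easy to simulate, which also makes the ordering problem concrete. The sketch below is a toy model under several assumptions: symbols are represented as bits in an unsigned integer, an archive has at most 32 members, E is tracked only as a count of linked files, and all type and function names (symset, object, input, link_state, resolve) are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned symset;            /* one bit per symbol; enough for a toy model */

struct object {                     /* a relocatable object file */
    symset defs;                    /* symbols it defines */
    symset refs;                    /* symbols it references */
};

struct input {                      /* one input on the command line */
    const struct object *members;   /* one member for a plain object file */
    size_t nmembers;
    int is_archive;
};

struct link_state { symset U, D; unsigned nlinked; };

static void add_object(struct link_state *st, const struct object *o)
{
    st->nlinked++;                          /* o joins E */
    st->D |= o->defs;
    st->U = (st->U | o->refs) & ~st->D;     /* referenced but still undefined */
}

/* Scan inputs left to right. Returns 0 on success, -1 if U is nonempty at the end. */
int resolve(const struct input *in, size_t n, struct link_state *st)
{
    assert(in != NULL && st != NULL);
    st->U = st->D = 0;
    st->nlinked = 0;
    for (size_t i = 0; i < n; i++) {
        if (!in[i].is_archive) {
            add_object(st, &in[i].members[0]);
            continue;
        }
        unsigned taken = 0;                 /* archive members already pulled in */
        int changed = 1;
        while (changed) {                   /* repeat until U and D stop changing */
            changed = 0;
            for (size_t m = 0; m < in[i].nmembers; m++) {
                const struct object *o = &in[i].members[m];
                if (!(taken & (1u << m)) && (o->defs & st->U)) {
                    taken |= 1u << m;
                    add_object(st, o);
                    changed = 1;
                }
            }
        }
    }
    return st->U == 0 ? 0 : -1;
}
```

Passing an object file followed by the archive that defines its symbols succeeds, while passing the same two inputs in the opposite order leaves the reference in U, because the archive was scanned while U was still empty.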

VII. Relocation

1. Once the linker has completed the symbol resolution step, it has associated each symbol reference in the code with exactly one symbol definition (that is, a symbol table entry in one of its input object modules). At this point, the linker knows the exact sizes of the code and data sections in its input object modules. It is now ready to begin the relocation step, in which it merges the input modules and assigns a run-time address to each symbol. Relocation consists of two steps:

    • Relocating sections and symbol definitions. In this step, the linker merges all sections of the same type into a new aggregate section of the same type. The linker then assigns run-time memory addresses to the new aggregate sections, to each section defined by the input modules, and to each symbol defined by the input modules. When this step is complete, every instruction and global variable in the program has a unique run-time memory address.

    • Relocating symbol references within sections. In this step, the linker modifies every symbol reference in the bodies of the code and data sections so that it points to the correct run-time address. To perform this step, the linker relies on data structures in the relocatable object modules known as relocation entries
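The first of the two steps amounts to simple address arithmetic. The following sketch lays out the same-type sections of several modules back to back and then computes a symbol's run-time address from its section offset. It ignores alignment padding, and the function names merge_sections and symbol_address are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Lay out the same-type sections of n input modules back to back,
 * starting at base. sizes[i] is the section size contributed by
 * module i; addrs[i] receives that piece's run-time address.
 * Returns the total size of the aggregate section.
 */
unsigned merge_sections(unsigned base, const unsigned *sizes, size_t n, unsigned *addrs)
{
    unsigned offset = 0;

    assert(sizes != NULL && addrs != NULL);
    for (size_t i = 0; i < n; i++) {
        addrs[i] = base + offset;
        offset += sizes[i];
    }
    return offset;
}

/* A symbol's run-time address: its section's new address plus its section offset. */
unsigned symbol_address(unsigned section_addr, unsigned symbol_offset)
{
    return section_addr + symbol_offset;
}
```

For example, with a base of 0x80483b4 and two .text pieces of 0x14 and 0x27 bytes (made-up sizes), the second piece starts at 0x80483c8, and a function at offset 0 in that piece gets that same address.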

7.1 Relocation Entries

1. When an assembler generates an object module, it does not know where the code and data will ultimately be stored in memory. Nor does it know the locations of any externally defined functions or global variables that are referenced by the module. So whenever the assembler encounters a reference to an object whose ultimate location is unknown, it generates a relocation entry that tells the linker how to modify the reference when it merges the object file into an executable. Relocation entries for code are placed in .rel.text. Relocation entries for initialized data are placed in .rel.data.

2. ELF defines 11 different relocation types. The two most basic relocation types are:

    • R_386_PC32: relocate a reference that uses a 32-bit PC-relative address.
    • R_386_32: relocate a reference that uses a 32-bit absolute address.

7.2 Relocating Symbol References

The linker modifies every symbol reference in the code and data sections so that it points to the correct run-time address.

    1. Relocating PC-relative references
    2. Relocating absolute references
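The two relocation types differ only in the arithmetic applied to the reference. The sketch below patches a 32-bit reference inside a byte buffer for either type. It is a simplification under stated assumptions: a real ELF relocation entry stores a symbol index rather than the symbol's address, the struct reloc and the function relocate are invented for this example, and the addresses in the usage note are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { R_386_32 = 1, R_386_PC32 = 2 };

/* Simplified relocation entry: the symbol's address is assumed to be already assigned. */
struct reloc {
    uint32_t offset;        /* offset of the reference within its section */
    uint32_t symbol_addr;   /* run-time address of the referenced symbol */
    int type;               /* R_386_32 or R_386_PC32 */
};

/*
 * Patch one 32-bit reference inside the section bytes sec, which will
 * be loaded at run-time address sec_addr. The value already stored at
 * the reference is used as an addend.
 */
void relocate(unsigned char *sec, uint32_t sec_addr, const struct reloc *r)
{
    uint32_t addend, result;

    assert(sec != NULL && r != NULL);
    memcpy(&addend, sec + r->offset, sizeof addend);
    if (r->type == R_386_PC32) {
        uint32_t refaddr = sec_addr + r->offset;    /* run-time address of the reference */
        result = r->symbol_addr + addend - refaddr;
    } else {
        result = r->symbol_addr + addend;           /* R_386_32: absolute address */
    }
    memcpy(sec + r->offset, &result, sizeof result);
}
```

As a worked case, take a call whose 4-byte operand sits at offset 0x7 in a section loaded at 0x80483b4, holds the initial value -4, and targets a function at 0x80483c8. The reference lives at 0x80483bb, so the patched value is 0x80483c8 + (-4) - 0x80483bb = 0x9. At run time the PC points just past the operand, at 0x80483bf, and 0x80483bf + 0x9 gives 0x80483c8, which is why the initial value is -4.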

VIII. Executable Object Files

1. The format of an executable object file is similar to that of a relocatable object file. The ELF header describes the overall format of the file. It also includes the program's entry point, which is the address of the first instruction to execute when the program runs. The .text, .rodata, and .data sections are similar to those in a relocatable object file, except that these sections have been relocated to their eventual run-time memory addresses. The .init section defines a small function, called _init, that is called by the program's initialization code. Since the executable is fully linked (relocated), it needs no .rel sections.

2. ELF executables are designed to be easy to load into memory, with contiguous chunks of the executable file mapped to contiguous memory segments. This mapping is described by the segment header table.

IX. Loading Executable Object Files

1. To run an executable object file p, type its name on the Unix shell command line:

unix> ./p

Since p does not correspond to a built-in shell command, the shell assumes that p is an executable object file, which it runs by invoking some memory-resident operating system code known as the loader. Any Unix program can invoke the loader by calling the execve function. The loader copies the code and data in the executable object file from disk into memory, and then runs the program by jumping to its first instruction, or entry point. This process of copying the program into memory and then running it is known as loading.
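What the shell does here can be approximated with fork, execve, and waitpid. The following is a minimal sketch assuming a POSIX system; the wrapper name run_program is invented for this example, and a real shell does considerably more (path search, job control, redirection).

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Run the executable at path in a child process and wait for it.
 * Returns the child's exit status, or -1 on failure.
 */
int run_program(const char *path, char *const argv[])
{
    int status;
    pid_t pid;

    assert(path != NULL && argv != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, environ);    /* invokes the loader; returns only on failure */
        _exit(127);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The call to execve replaces the child's memory image with that of the new program, so the line after it runs only when the file could not be loaded.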

2. Every Unix program has a run-time memory image. For example, on 32-bit Linux systems the code segment starts at address 0x08048000. The data segment follows at the next 4 KB aligned address. The run-time heap follows on the first 4 KB aligned address past the read/write segment and grows upward via calls to the malloc library. There is also a segment that is reserved for shared libraries. The user stack always starts at the largest legal user address and grows downward (toward lower memory addresses). The segment starting above the stack is reserved for the code and data of the memory-resident part of the operating system (that is, the kernel).

3. Guided by the segment header table in the executable, the loader copies chunks of the executable file into the code and data segments. Next, the loader jumps to the program's entry point, which is the address of the _start symbol. The startup code at the _start address is defined in the object file crt1.o and is the same for all C programs. After calling initialization routines from the .text and .init sections, the startup code calls the atexit routine, which appends a list of routines that should be called when the application terminates normally. The exit function runs the functions registered by atexit, and then returns control to the operating system by calling _exit. Next, the startup code calls the application's main routine, which begins executing our C code. After the application returns, the startup code calls the _exit routine, which returns control to the operating system.

X. Dynamic Linking with Shared Libraries

1. Concept: Shared libraries are a modern innovation that addresses the disadvantages of static libraries. A shared library is an object module that, at run time, can be loaded at an arbitrary memory address and linked with a program in memory. This process is known as dynamic linking and is performed by a program called a dynamic linker. Shared libraries are also referred to as shared objects, and on Unix systems they are typically denoted by the .so suffix.

A shared library such as libvector.so is built by passing the -shared and -fPIC options to the compiler driver. A program is then linked against it:

gcc -o p2 main.c ./libvector.so

2. The dynamic linker finishes the linking task by performing the following relocations:

    • Relocating the text and data of libc.so into some memory segment
    • Relocating the text and data of libvector.so into another memory segment
    • Relocating any references in p2 to symbols defined by libc.so and libvector.so
    • Finally, the dynamic linker passes control to the application. From this point on, the locations of the shared libraries are fixed and do not change during execution of the program

XI. Loading and Linking Shared Libraries from Applications

Idea: Package each function that generates dynamic content in a shared library.

XII. Position-Independent Code

Concept: Compile library code so that it can be loaded and executed at any address without being modified by the linker. Such code is known as position-independent code (PIC).

    1. PIC data references
    2. PIC function calls

XIII. Tools for Manipulating Object Files
