Reading notes 7: "Computer Systems: A Programmer's Perspective"

Source: Internet
Author: User

7.1 Compiler Driver

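1. Most compilation systems provide a compiler driver that invokes the language preprocessor, compiler, assembler, and linker on the user's behalf, as needed.

    (1) C preprocessor: source program main.c -> ASCII intermediate file main.i
    (2) C compiler: main.i -> ASCII assembly-language file main.s
    (3) Assembler: main.s -> relocatable object file

2. The driver runs the linker program ld, which combines the various .o files with the necessary system object files to create an executable file.

3. Run the executable: ./name-of-executable

4. The shell calls a function in the operating system called the loader, which copies the code and data of the executable file into memory and then transfers control to the beginning of the program.

7.2 Static Linking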

A static linker such as the Unix ld program takes a set of relocatable object files and command-line arguments as input and generates a fully linked executable object file that can be loaded and run. The input relocatable object files consist of various code and data sections: instructions are in one section, initialized global variables in another, and uninitialized variables in yet another.

To build an executable file, the linker must perform two main tasks:

    • Symbol resolution. Object files define and reference symbols. The purpose of symbol resolution is to associate each symbol reference with exactly one symbol definition.
    • Relocation. Compilers and assemblers generate code and data sections that start at address 0. The linker relocates these sections by associating a memory location with each symbol definition and then modifying all references to those symbols so that they point to that memory location.

Some basic facts about linkers: an object file is merely a collection of blocks of bytes. Some of these blocks contain program code, others contain program data, and others contain data structures that guide the linker and the loader. The linker concatenates the blocks, decides on run-time locations for the concatenated blocks, and modifies various locations within the code and data blocks. The linker knows very little about the target machine; the compiler and assembler that generated the object file have already done most of the work.

7.3 Object Files
    • Compilers and assemblers generate relocatable object files (including shared object files). Linkers generate executable object files. Technically, an object module is a sequence of bytes, and an object file is an object module stored on disk in a file.
    • Object file formats vary from system to system.
7.4 Relocatable Object Files

The format of a typical ELF relocatable object file is shown on p. 451 of the book. The ELF header begins with a 16-byte sequence that describes the word size and byte ordering of the system that generated the file. The rest of the ELF header contains information that allows a linker to parse and interpret the object file. This includes the size of the ELF header, the object file type (for example relocatable, executable, or shared), the machine type (for example IA32), the file offset of the section header table, and the size and number of entries in the section header table. The locations and sizes of the various sections are described by the section header table, which contains a fixed-size entry for each section in the object file.
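As a rough illustration of the fields listed above, the following Python sketch decodes an ELF header from raw bytes with the standard struct module. The function name parse_elf_header and the dictionary keys are hypothetical names chosen for this note; the field order follows the standard 32-bit and 64-bit ELF header layouts.

```python
import struct

# Field layouts that follow the 16-byte e_ident array.
_FIELDS_32 = "HHIIIIIHHHHHH"   # 32-bit ELF: addresses and offsets are 4 bytes
_FIELDS_64 = "HHIQQQIHHHHHH"   # 64-bit ELF: addresses and offsets are 8 bytes
_TYPES = {1: "relocatable", 2: "executable", 3: "shared"}


def parse_elf_header(data: bytes) -> dict:
    """Decode the ELF header fields that a linker relies on."""
    ident = data[:16]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    word_bits = {1: 32, 2: 64}[ident[4]]    # EI_CLASS: word size
    order = {1: "<", 2: ">"}[ident[5]]      # EI_DATA: byte ordering
    fmt = order + (_FIELDS_32 if word_bits == 32 else _FIELDS_64)
    (e_type, e_machine, _version, _entry, _phoff, e_shoff, _flags,
     e_ehsize, _phentsize, _phnum, e_shentsize, e_shnum,
     _shstrndx) = struct.unpack_from(fmt, data, 16)
    return {
        "word_bits": word_bits,
        "byte_order": "little" if order == "<" else "big",
        "type": _TYPES.get(e_type, "other"),
        "machine": e_machine,               # e.g. 3 is EM_386 (IA32)
        "header_size": e_ehsize,
        "section_header_offset": e_shoff,
        "section_header_entry_size": e_shentsize,
        "section_header_count": e_shnum,
    }
```

For a real file, reading the first 64 bytes is enough, for example parse_elf_header(open("main.o", "rb").read(64)), where main.o is an assumed file name.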

Sandwiched between the ELF header and the section header table are the sections themselves. A typical ELF relocatable object file contains the following sections:

  • .text: The machine code of the compiled program.
  • .rodata: Read-only data.
  • .data: Initialized global C variables. Local C variables are kept on the stack at run time and appear in neither the .data section nor the .bss section.
  • .bss: Uninitialized global C variables. This section occupies no actual space in the object file; it is merely a placeholder. Object file formats distinguish between initialized and uninitialized variables for space efficiency: uninitialized variables do not need to occupy any actual disk space in the object file.
  • .symtab: A symbol table with information about the functions and global variables that are defined and referenced in the program. Every relocatable object file has a symbol table in .symtab.
  • .rel.text: A list of locations in the .text section that must be modified when the linker combines this object file with others. In general, any instruction that calls an external function or references a global variable needs to be modified. Instructions that call local functions, on the other hand, do not. Note that relocation information is not needed in executable object files, so it is usually omitted unless the user explicitly instructs the linker to include it.
  • .rel.data: Relocation information for any global variables that are referenced or defined by the module. In general, any initialized global variable whose initial value is the address of a global variable or of an externally defined function needs to be modified.
  • .debug: A debugging symbol table with entries for local variables and type definitions defined in the program, global variables defined and referenced in the program, and the original C source file.
  • .line: A mapping between line numbers in the original C source file and machine instructions in the .text section.
  • .strtab: A string table for the symbol tables in the .symtab and .debug sections and for the section names in the section headers.
7.5 Symbols and Symbol Tables

In the context of a linker, there are three different kinds of symbols:

    • Global symbols that are defined by module m and can be referenced by other modules
    • Global symbols that are defined by some other module and referenced by module m
    • Local symbols that are defined and referenced exclusively by module m
7.6 Symbol Resolution
7.6.1 How Linkers Resolve Multiply Defined Global Symbols

At compile time, the compiler exports each global symbol to the assembler as either strong or weak, and the assembler encodes this information implicitly in the symbol table of the relocatable object file. Functions and initialized global variables are strong symbols; uninitialized global variables are weak symbols.

Given this notion of strong and weak symbols, Unix linkers use the following rules to deal with multiply defined symbols:

    • Rule 1: Multiple strong symbols with the same name are not allowed.
    • Rule 2: Given a strong symbol and multiple weak symbols, choose the strong symbol.
    • Rule 3: Given multiple weak symbols, choose any one of the weak symbols.
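
The three rules can be sketched as a small Python function. This is only a model of the decision, not how any real linker is implemented: resolve_symbol and the (module, strength) pairs are hypothetical, and where Rule 3 allows any weak definition this sketch simply takes the first one.

```python
def resolve_symbol(name, definitions):
    """Return the module whose definition of `name` would be chosen.

    `definitions` is a list of (module, strength) pairs, where strength
    is either "strong" or "weak".
    """
    strong = [module for module, strength in definitions if strength == "strong"]
    weak = [module for module, strength in definitions if strength == "weak"]
    if len(strong) > 1:
        # Rule 1: more than one strong definition is a link error.
        raise ValueError(f"multiple definition of '{name}' in {strong}")
    if strong:
        # Rule 2: a single strong definition beats any number of weak ones.
        return strong[0]
    if weak:
        # Rule 3: only weak definitions; any may be chosen (here, the first).
        return weak[0]
    raise LookupError(f"undefined reference to '{name}'")
```

For example, resolve_symbol("x", [("foo.o", "weak"), ("bar.o", "strong")]) returns "bar.o", while two strong definitions of the same name raise an error.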
7.6.2 Linking with Static Libraries

On Unix systems, static libraries are stored on disk in a particular file format known as an archive. An archive is a collection of concatenated relocatable object files, with a header that describes the size and location of each member object file. Archive file names are denoted by the .a suffix.
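
To make the archive layout concrete, here is a minimal Python sketch that walks the common ar format: an 8-byte magic string followed by members, each preceded by a 60-byte ASCII header. The function name list_archive_members is hypothetical, and the sketch assumes short member names; the special symbol-table and long-name members that GNU ar may emit are not interpreted.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60  # name(16) mtime(12) uid(6) gid(6) mode(8) size(10) end(2)


def list_archive_members(data: bytes):
    """Return (name, data_offset, size) for each member of an ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    pos = len(AR_MAGIC)
    while pos + HEADER_SIZE <= len(data):
        header = data[pos:pos + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError("corrupt member header")
        # Names are space-padded; GNU ar also appends a trailing "/".
        name = header[0:16].decode("ascii").rstrip(" /")
        size = int(header[48:58].decode("ascii"))
        members.append((name, pos + HEADER_SIZE, size))
        # Member data is padded to an even number of bytes.
        pos += HEADER_SIZE + size + (size % 2)
    return members
```

The returned offset and size are exactly the "size and location" information that lets a linker pull a single member out of the archive.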

7.6.3 How Linkers Use Static Libraries to Resolve References

During the symbol resolution phase, the linker scans the relocatable object files and archives from left to right, in the same order in which they appear on the compiler driver's command line. During this scan, the linker maintains a set E of relocatable object files (the files in this set will be merged to form the executable), a set U of unresolved symbols (symbols that have been referenced but not yet defined), and a set D of symbols that have been defined in previous input files. Initially, E, U, and D are empty.

    1. For each input file f on the command line, the linker determines whether f is an object file or an archive. If f is an object file, the linker adds f to E, updates U and D to reflect the symbol definitions and references in f, and proceeds to the next input file.
    2. If f is an archive, the linker attempts to match the unresolved symbols in U against the symbols defined by the members of the archive. If some archive member m defines a symbol that resolves a reference in U, then m is added to E, and the linker updates U and D to reflect the symbol definitions and references in m. This process repeats over the member object files of the archive until U and D no longer change. At that point, any member object file not contained in E is simply discarded, and the linker proceeds to the next input file.
    3. If U is nonempty when the linker finishes scanning the input files on the command line, it prints an error and terminates. Otherwise, it merges and relocates the object files in E to build the output executable file.
    • This algorithm can cause some baffling link-time errors, because the ordering of libraries and object files on the command line is significant. If the library that defines a symbol appears on the command line before the object file that references it, the reference is not resolved and linking fails. The general rule for libraries is to place them at the end of the command line.
    • On the other hand, if the libraries are not independent of one another, they must be ordered so that for each symbol s that is referenced externally by a member of an archive, at least one definition of s follows a reference to s on the command line.

If necessary to satisfy the dependence requirements, a library can be repeated on the command line.
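
The scan over E, U, and D described in this section can be simulated in a few lines of Python. This is a sketch under simplifying assumptions: object files are modeled as a pair of symbol sets, archives as lists of member names, and the names resolve, objects, and archives are all hypothetical.

```python
def resolve(command_line, objects, archives):
    """Simulate the linker's left-to-right symbol resolution scan.

    objects:  {file name: (defined symbols, referenced symbols)}
    archives: {archive name: [member object file names]}
    Returns E, the ordered list of object files to merge.
    """
    E, U, D = [], set(), set()

    def add(obj):
        defined, referenced = objects[obj]
        E.append(obj)
        D.update(defined)
        U.update(referenced)
        U.difference_update(D)   # anything already defined is resolved

    for f in command_line:
        if f in archives:
            changed = True
            while changed:       # repeat until U and D stop changing
                changed = False
                for member in archives[f]:
                    if member not in E and U & set(objects[member][0]):
                        add(member)
                        changed = True
        else:
            add(f)
    if U:
        raise LookupError("undefined reference to: " + ", ".join(sorted(U)))
    return E


objects = {
    "main.o": ({"main"}, {"addvec"}),
    "addvec.o": ({"addvec"}, set()),
    "multvec.o": ({"multvec"}, set()),
}
archives = {"libvector.a": ["addvec.o", "multvec.o"]}
```

With these inputs, resolve(["main.o", "libvector.a"], objects, archives) selects main.o and addvec.o and discards multvec.o, whereas putting libvector.a first raises an undefined-reference error because U is still empty when the archive is scanned.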

7.7 Relocation

Once the linker has completed the symbol resolution step, it has associated each symbol reference in the code with exactly one symbol definition (that is, a symbol table entry in one of its input object modules). At this point, the linker knows the exact sizes of the code and data sections in its input object modules. It is now ready to begin the relocation step, in which it merges the input modules and assigns a run-time address to each symbol.

Relocation consists of two steps:

    1. Relocating sections and symbol definitions. In this step, the linker merges all sections of the same type into a new aggregate section of the same type. The linker then assigns run-time memory addresses to the new aggregate sections, to each section defined by the input modules, and to each symbol defined by the input modules. When this step is complete, every instruction and global variable in the program has a unique run-time memory address.
    2. Relocating symbol references within sections. In this step, the linker modifies every symbol reference in the bodies of the code and data sections so that it points to the correct run-time address. To perform this step, the linker relies on data structures in the relocatable object modules known as relocation entries.
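
Step 1 can be pictured with a short Python sketch that lays the input sections out back to back and derives symbol addresses from them. The function relocate_sections, the module dictionaries, and the base addresses are illustrative assumptions, and alignment padding between sections is ignored.

```python
def relocate_sections(modules, base_addresses):
    """Merge same-named sections and assign run-time addresses.

    modules: list of {"name": str,
                      "sections": {section name: size in bytes},
                      "symbols": {symbol: (section name, offset)}}
    base_addresses: {section name: start of the aggregate section}
    Returns (section_addr, symbol_addr).
    """
    next_free = dict(base_addresses)
    section_addr = {}
    symbol_addr = {}
    for mod in modules:
        # Place this module's sections right after the previous module's.
        for sec, size in mod["sections"].items():
            section_addr[(mod["name"], sec)] = next_free[sec]
            next_free[sec] += size
        # A symbol's address is its section's address plus its offset.
        for sym, (sec, off) in mod["symbols"].items():
            symbol_addr[sym] = section_addr[(mod["name"], sec)] + off
    return section_addr, symbol_addr


modules = [
    {"name": "main.o", "sections": {".text": 0x14, ".data": 8},
     "symbols": {"main": (".text", 0), "buf": (".data", 0)}},
    {"name": "swap.o", "sections": {".text": 0x28, ".data": 4},
     "symbols": {"swap": (".text", 0), "bufp0": (".data", 0)}},
]
sections, symbols = relocate_sections(
    modules, {".text": 0x80483B4, ".data": 0x8049454})
```

Under these assumed sizes, main lands at 0x80483b4 and swap at 0x80483c8, since the .text section of main.o occupies 0x14 bytes.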
7.7.1 Relocation Entries

When an assembler generates an object module, it does not know where the code and data will ultimately be stored in memory. Nor does it know the locations of any externally defined functions or global variables that the module references. So whenever the assembler encounters a reference to an object whose ultimate location is unknown, it generates a relocation entry that tells the linker how to modify the reference when it merges the object file into an executable. Relocation entries for code are placed in .rel.text. Relocation entries for initialized data are placed in .rel.data.

ELF defines 11 different relocation types. We are concerned with only the two most basic ones:

    • R_386_PC32: Relocate a reference that uses a 32-bit PC-relative address.
    • R_386_32: Relocate a reference that uses a 32-bit absolute address.
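
The arithmetic behind the two relocation types can be written out as a small Python sketch. The function name relocate_reference and the addresses in the usage note are assumptions for illustration; addend stands for the value the assembler left at the reference site (for a call instruction this is typically -4, because the PC points 4 bytes past the start of the 32-bit operand when the call executes).

```python
def relocate_reference(rel_type, section_addr, offset, addend, symbol_addr):
    """Return the 32-bit value the linker writes at section_addr + offset."""
    if rel_type == "R_386_PC32":
        # PC-relative: distance from the reference site to the target.
        ref_addr = section_addr + offset
        value = symbol_addr + addend - ref_addr
    elif rel_type == "R_386_32":
        # Absolute: the target's run-time address itself.
        value = symbol_addr + addend
    else:
        raise ValueError(f"unsupported relocation type: {rel_type}")
    return value & 0xFFFFFFFF   # stored as a 32-bit quantity
```

For instance, with a .text section at 0x80483b4, a call operand at offset 0x7, an addend of -4, and a target function at 0x80483c8, the PC-relative value is 0x9: at run time the PC is 0x80483bf, and 0x80483bf + 0x9 is 0x80483c8.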
7.8 Executable Object Files

The format of an executable object file is similar to that of a relocatable object file. The ELF header describes the overall format of the file. It also includes the program's entry point, which is the address of the first instruction to execute when the program runs. The .text, .rodata, and .data sections are similar to those in a relocatable object file, except that they have been relocated to their final run-time memory addresses. The .init section defines a small function, called _init, that is called by the program's initialization code. Because the executable is fully linked (relocated), it no longer needs the .rel sections.

ELF executables are designed to be easy to load into memory: contiguous chunks of the executable file are mapped to contiguous memory segments. The segment header table describes this mapping.

7.9 Loading Executable Object Files

Every Unix program has a run-time memory image. For example, on 32-bit Linux systems the code segment always starts at address 0x08048000. The data segment follows at the next 4 KB aligned address. The run-time heap starts at the first 4 KB aligned address past the read/write segment and grows upward through calls to the malloc library. There is also a segment reserved for shared libraries. The user stack always starts at the largest legal user address and grows downward (toward lower memory addresses). The segment that starts above the stack is reserved for the code and data of the memory-resident part of the operating system (the kernel).
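
The "next 4 KB aligned address" rule is plain rounding arithmetic, sketched below in Python. The helper name align_up and the segment size in the usage note are made up for illustration.

```python
PAGE_SIZE = 4096  # 4 KB


def align_up(addr, alignment=PAGE_SIZE):
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)
```

If a code segment starting at 0x08048000 occupied 0x448 bytes, it would end at 0x08048448, and align_up(0x08048448) gives 0x08049000 as the start of the next segment. An address that is already aligned is returned unchanged.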

Guided by the segment header table in the executable, the loader copies the relevant contents of the executable file into the code and data segments. Next, the loader jumps to the program's entry point, which is the address of the _start symbol. The startup code at the _start address is defined in the object file crt1.o and is the same for all C programs. After calling the initialization routines from the .text and .init sections, the startup code calls the atexit routine, which registers a list of routines that should be called when the application terminates normally. The exit function runs the functions registered by atexit and then returns control to the operating system by calling _exit. Next, the startup code calls the application's main routine, which begins executing our C code. After the application returns, the startup code calls the _exit routine, which returns control to the operating system.

How loading works:

Each program in a Unix system runs in the context of a process with its own virtual address space. When the shell runs a program, the parent shell process forks a child process that is a duplicate of the parent. The child process invokes the loader through the execve system call. The loader deletes the child's existing virtual memory segments and creates a new set of code, data, heap, and stack segments. The new stack and heap segments are initialized to zero. The new code and data segments are initialized to the contents of the executable file by mapping pages in the virtual address space to page-sized chunks of the executable file. Finally, the loader jumps to the _start address, which eventually calls the application's main routine. Aside from some header information, no data is copied from disk to memory during loading. The copying is deferred until the CPU references a mapped virtual page, at which point the operating system automatically transfers the page from disk to memory using its paging mechanism.
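
The fork-then-execve sequence that a shell performs can be imitated from Python on a Unix system with the os module. This is a minimal sketch: run_program is a hypothetical helper, and a real shell does considerably more (job control, signal handling, redirection).

```python
import os
import sys


def run_program(path, argv, env=None):
    """Fork a child, replace its image with `path`, and wait for it."""
    pid = os.fork()                  # child starts as a copy of the parent
    if pid == 0:
        try:
            # execve asks the kernel's loader to replace this process's
            # memory image with the executable at `path`.
            os.execve(path, argv, dict(os.environ) if env is None else env)
        except OSError:
            os._exit(127)            # exec failed; leave the child at once
    _, status = os.waitpid(pid, 0)   # parent waits for the child to finish
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None


if __name__ == "__main__":
    code = run_program(sys.executable,
                       [sys.executable, "-c", "print('hello from the child')"])
    print("child exit status:", code)
```

The Python interpreter is used as the program to load only because its path is always available through sys.executable; any executable path would do.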

7.10 Dynamic Linking with Shared Libraries

Shared libraries are a modern innovation that addresses the disadvantages of static libraries. A shared library is an object module that, at run time, can be loaded at an arbitrary memory address and linked with a program in memory. This process is known as dynamic linking and is performed by a program called a dynamic linker. Shared libraries are also referred to as shared objects, and on Unix systems they are typically denoted by the .so suffix.

7.11 Tools for Manipulating Object Files
    • ar: Creates static libraries, and inserts, deletes, lists, and extracts members.
    • strings: Lists all of the printable strings contained in an object file.
    • strip: Deletes symbol table information from an object file.
    • nm: Lists the symbols defined in the symbol table of an object file.
    • size: Lists the names and sizes of the sections in an object file.
    • readelf: Displays all of the information in an object file.
    • objdump: Disassembles the instructions in an object file.
    • ldd: Lists the shared libraries that an executable needs at run time.
