A Deep Understanding of Computer Systems (Computer Systems: A Programmer's Perspective), Chapter 7: Linking

Source: Internet
Author: User

Linking is the process of collecting and combining various pieces of code and data into a single file that can be loaded (copied) into memory and executed. Linking can be performed at compile time, when the source code is translated into machine code; at load time, when the program is loaded into memory and executed by the loader; and even at run time, by application programs. On early computer systems, linking was performed manually. On modern systems, it is performed automatically by programs called linkers.

7.1 Compiler drivers

Most compilation systems provide a compiler driver that invokes the language preprocessor, compiler, assembler, and linker, as needed, on behalf of the user.

7.2 Static linking

A static linker such as the Unix ld program takes as input a collection of relocatable object files and command-line arguments, and generates as output a fully linked executable object file that can be loaded and run. The input relocatable object files consist of various code and data sections: instructions are in one section, initialized global variables are in another section, and uninitialized variables are in yet another section.
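To make the three kinds of sections concrete, here is a small C file annotated with where a typical ELF toolchain such as GCC places each definition. The names are hypothetical, and the placement comments describe common GCC behavior rather than a guarantee: GCC 10 and later default to -fno-common, which puts an uninitialized global like uninitialized_global in .bss, while older versions (or -fcommon) emit it as a COMMON symbol. You can check the result by compiling with gcc -c and inspecting the object file with readelf -S or objdump -t.

```c
#include <assert.h>

/* Hypothetical definitions; comments give typical GCC/ELF placement. */
int initialized_global = 42;      /* .data: initialized global */
int uninitialized_global;         /* .bss (COMMON with -fcommon): uninitialized global */
static int file_local_counter;    /* .bss: static, implicitly zero */
const char greeting[] = "hello";  /* .rodata: read-only data */

/* .text: the machine code for this function */
int add_to_counter(int delta) {
    int local = delta;            /* in no section: lives on the stack at run time */
    file_local_counter += local;
    return file_local_counter;
}
```

Calling add_to_counter(5) and then add_to_counter(2) returns 5 and then 7, because file_local_counter starts at zero like every .bss variable.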

To build the executable, the linker must perform two main tasks:

* Symbol resolution. Object files define and reference symbols. The purpose of symbol resolution is to associate each symbol reference with exactly one symbol definition.

* Relocation. Compilers and assemblers generate code and data sections that start at address 0. The linker relocates these sections by associating a memory location with each symbol definition, and then modifying all of the references to those symbols so that they point to this memory location.


Some basic facts about linkers: object files are merely collections of blocks of bytes. Some of these blocks contain program code, others contain program data, and others contain data structures that guide the linker and loader. A linker concatenates blocks together, decides on run-time locations for the concatenated blocks, and modifies various locations within the code and data blocks. The compilers and assemblers that generate the object files have already done most of the work.

7.3 Object files

Compilers and assemblers generate relocatable object files (including shared object files). Linkers generate executable object files. Technically, an object module is a sequence of bytes, and an object file is an object module stored in a file on disk.

7.4 Relocatable object files

The format of a typical ELF relocatable object file is shown in the book (p. 451). The ELF header begins with a 16-byte sequence that describes the word size and byte ordering of the system that generated the file. The rest of the ELF header contains information that allows a linker to parse and interpret the object file. This includes the size of the ELF header, the object file type (e.g., relocatable, executable, or shared), the machine type (e.g., IA32), the file offset of the section header table, and the size and number of entries in the section header table. The locations and sizes of the various sections are described by the section header table, which contains a fixed-size entry for each section in the object file.
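As a sketch, assuming a 64-bit Linux system whose C library provides <elf.h> (glibc does), the hypothetical read_elf_header function below reads the ELF header of a file and checks the 16-byte e_ident sequence: the magic number, the word size (EI_CLASS), and the byte order (EI_DATA). The other fields mentioned above are available as e_type (object file type), e_machine (machine type), e_ehsize (header size), and e_shoff, e_shentsize, and e_shnum (the section header table's file offset, entry size, and entry count).

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the ELF header at the start of `path`.
   Assumes a 64-bit ELF file; returns 0 on success, -1 otherwise. */
int read_elf_header(const char *path, Elf64_Ehdr *hdr) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t n = fread(hdr, sizeof *hdr, 1, f);
    fclose(f);
    if (n != 1) return -1;
    /* e_ident starts with the magic bytes 0x7f 'E' 'L' 'F' */
    if (memcmp(hdr->e_ident, ELFMAG, SELFMAG) != 0) return -1;
    /* EI_CLASS records the word size */
    if (hdr->e_ident[EI_CLASS] != ELFCLASS64) return -1;
    return 0;
}

/* Describe the byte order recorded in e_ident[EI_DATA]. */
const char *byte_order(const Elf64_Ehdr *hdr) {
    switch (hdr->e_ident[EI_DATA]) {
    case ELFDATA2LSB: return "little-endian";
    case ELFDATA2MSB: return "big-endian";
    default:          return "unknown";
    }
}
```

On Linux, read_elf_header("/proc/self/exe", &h) inspects the running program itself; a position-independent executable reports e_type == ET_DYN rather than ET_EXEC.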

Sandwiched between the ELF header and the section header table are the sections themselves. A typical ELF relocatable object file contains the following sections:

.text: The machine code of the compiled program.

.rodata: Read-only data, such as the format strings in printf statements and jump tables for switch statements.

.data: Initialized global C variables. Local C variables are maintained at run time on the stack and appear in neither the .data nor the .bss section.

.bss: Uninitialized global C variables. This section occupies no actual space in the object file; it is merely a placeholder. Object file formats distinguish between initialized and uninitialized variables for space efficiency: uninitialized variables do not have to occupy any actual disk space in the object file.

.symtab: A symbol table with information about functions and global variables that are defined and referenced in the program. Every relocatable object file has a symbol table in .symtab.

.rel.text: A list of locations in the .text section that will need to be modified when the linker combines this object file with others. In general, any instruction that calls an external function or references a global variable will need to be modified. On the other hand, instructions that call local functions do not need to be modified. Note that relocation information is not needed in executable object files, and is usually omitted unless the user explicitly instructs the linker to include it.

.rel.data: Relocation information for any global variables that are referenced or defined by the module. In general, any initialized global variable whose initial value is the address of a global variable or of an externally defined function will need to be modified.

.debug: A debugging symbol table with entries for local variables and typedefs defined in the program, global variables defined and referenced in the program, and the original C source file.

.line: A mapping between line numbers in the original C source program and machine code instructions in the .text section.

 

.strtab: A string table for the symbol tables in the .symtab and .debug sections and for the section names in the section headers. A string table is a sequence of null-terminated character strings.
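The .symtab and .strtab sections work together: each symbol table entry stores its name as a byte offset into .strtab rather than as a string. The sketch below hand-builds a tiny symbol table with the Elf64_Sym type from <elf.h> (assuming a glibc-style header); the symbols, section indexes, and sizes are hypothetical. Entry 0 is the reserved null symbol, and a symbol that is referenced but not defined in this module, like swap here, has section index SHN_UNDEF.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical .strtab: NUL-terminated names; offset 0 is the empty name.
   Offsets: "main" = 1, "swap" = 6, "buf" = 11. */
static const char strtab[] = "\0main\0swap\0buf";

/* Hypothetical .symtab for a module that defines main and buf and calls swap. */
static const Elf64_Sym symtab[] = {
    {0},  /* index 0: reserved null symbol */
    { .st_name = 1, .st_info = ELF64_ST_INFO(STB_GLOBAL, STT_FUNC),
      .st_shndx = 1, .st_value = 0, .st_size = 0x21 },  /* main, in hypothetical section 1 (.text) */
    { .st_name = 6, .st_info = ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE),
      .st_shndx = SHN_UNDEF },                          /* swap: referenced, not defined here */
    { .st_name = 11, .st_info = ELF64_ST_INFO(STB_GLOBAL, STT_OBJECT),
      .st_shndx = 3, .st_value = 0, .st_size = 8 },     /* buf, in hypothetical section 3 (.data) */
};

/* Look a symbol up by name, resolving each entry's st_name through .strtab. */
const Elf64_Sym *find_symbol(const char *name) {
    for (size_t i = 1; i < sizeof symtab / sizeof symtab[0]; i++)
        if (strcmp(&strtab[symtab[i].st_name], name) == 0)
            return &symtab[i];
    return NULL;
}
```

find_symbol("swap") returns an entry whose st_shndx is SHN_UNDEF, which is exactly the kind of reference the linker must resolve against some other module.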

7.6 Symbol resolution

7.6.1 How linkers resolve multiply defined global symbols

At compile time, the compiler exports each global symbol to the assembler as either strong or weak, and the assembler encodes this information implicitly in the symbol table of the relocatable object file. Functions and initialized global variables get strong symbols; uninitialized global variables get weak symbols.

Given this notion of strong and weak symbols, Unix linkers use the following rules for dealing with multiply defined symbols:

Rule 1: Multiple strong symbols with the same name are not allowed.

Rule 2: Given a strong symbol and multiple weak symbols with the same name, choose the strong symbol.

Rule 3: Given multiple weak symbols with the same name, choose any of the weak symbols.
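The three rules can be sketched as a small decision function. This is a hypothetical model, not linker code: each definition of one global name is reduced to its strength, and the function returns which one wins, or -1 for the link error of Rule 1. Rule 3 lets the linker pick any weak definition; this sketch arbitrarily takes the first. Note that GCC 10 and later default to -fno-common, so two uninitialized definitions of the same global in different files are usually reported as a multiple-definition error unless you compile with -fcommon.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: the strength of one definition of a global name. */
typedef enum { WEAK, STRONG } Strength;

/* Given every definition of the same global name, return the index of the
   definition the linker keeps, or -1 for a link error. */
int resolve_definitions(const Strength defs[], size_t n) {
    assert(n > 0);
    int strong = -1;
    for (size_t i = 0; i < n; i++) {
        if (defs[i] == STRONG) {
            if (strong != -1)
                return -1;          /* Rule 1: a second strong definition */
            strong = (int)i;
        }
    }
    if (strong != -1)
        return strong;              /* Rule 2: the strong definition wins */
    return 0;                       /* Rule 3: any weak one; this sketch takes the first */
}
```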

 

7.6.2 Linking with static libraries

On Unix systems, static libraries are stored on disk in a particular file format known as an archive. An archive is a collection of concatenated relocatable object files, with a header that describes the size and location of each member object file. Archive filenames are denoted with the .a suffix.
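As an illustration of the archive layout, assuming the common System V/GNU ar format: the file begins with the 8-byte magic string "!<arch>\n", and each member object file is preceded by a fixed 60-byte ASCII header holding its name, modification date, owner and group IDs, mode, and decimal size, terminated by a backquote and a newline. The hypothetical first_member function parses the header of the first member. It ignores the special members GNU ar uses for the symbol index ("/") and for long names ("//"), so it is a sketch rather than a full reader.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define AR_MAGIC     "!<arch>\n"  /* global header at the start of every archive */
#define AR_MAGIC_LEN 8
#define AR_HDR_LEN   60           /* fixed-size ASCII header before each member */

/* Hypothetical helper: parse the header of the first member in buf[0..len).
   On success, copy the space-trimmed name into name (17 bytes) and the
   member's size in bytes into *size, and return 0; otherwise return -1. */
int first_member(const char *buf, size_t len, char name[17], long *size) {
    if (len < AR_MAGIC_LEN + AR_HDR_LEN || memcmp(buf, AR_MAGIC, AR_MAGIC_LEN) != 0)
        return -1;
    const char *hdr = buf + AR_MAGIC_LEN;
    if (memcmp(hdr + 58, "`\n", 2) != 0)     /* ar_fmag: end-of-header marker */
        return -1;
    memcpy(name, hdr, 16);                   /* ar_name: bytes 0..15, space-padded */
    name[16] = '\0';
    for (int i = 15; i >= 0 && name[i] == ' '; i--)
        name[i] = '\0';
    char digits[11];
    memcpy(digits, hdr + 48, 10);            /* ar_size: bytes 48..57, decimal */
    digits[10] = '\0';
    *size = strtol(digits, NULL, 10);
    return 0;
}
```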

7.6.3 How linkers use static libraries to resolve references (p. 460)

During the symbol resolution phase, the linker scans the relocatable object files and archives left to right in the same sequential order that they appear on the compiler driver's command line. During this scan, the linker maintains a set E of relocatable object files that will be merged to form the executable, a set U of unresolved symbols (i.e., symbols referred to but not yet defined), and a set D of symbols that have been defined in previous input files. Initially, E, U, and D are empty.

* For each input file f on the command line, the linker determines whether f is an object file or an archive. If f is an object file, the linker adds f to E, updates U and D to reflect the symbol definitions and references in f, and proceeds to the next input file.

* If f is an archive, the linker attempts to match the unresolved symbols in U against the symbols defined by the members of the archive. If some archive member m defines a symbol that resolves a reference in U, then m is added to E, and the linker updates U and D to reflect the symbol definitions and references in m. This process iterates over the member object files in the archive until a fixed point is reached where U and D no longer change. At this point, any member object files not contained in E are simply discarded, and the linker proceeds to the next input file.

* If U is nonempty when the linker finishes scanning the input files on the command line, it prints an error and terminates. Otherwise, it merges and relocates the object files in E to build the output executable file.
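The scan can be simulated directly. The sketch below is a simplified, hypothetical model: each object file is reduced to the names it defines and references, an archive is an array of such objects, and E is implicit (an object joins E when add_object is called). It does not check for duplicate definitions. link_inputs returns the number of symbols left in U, so zero means the link succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAMES   4   /* hypothetical per-object limit on defs/refs */
#define MAX_SYMS    16  /* hypothetical capacity of U and D */
#define MAX_MEMBERS 8   /* hypothetical archive size limit */

/* Simplified object file: the global names it defines and references. */
typedef struct { const char *defs[MAX_NAMES]; const char *refs[MAX_NAMES]; } Object;

/* An input file: a single object (n_members == 1) or an archive of members. */
typedef struct { bool is_archive; const Object *members; size_t n_members; } Input;

typedef struct { const char *names[MAX_SYMS]; size_t n; } SymSet;

static bool contains(const SymSet *s, const char *name) {
    for (size_t i = 0; i < s->n; i++)
        if (strcmp(s->names[i], name) == 0) return true;
    return false;
}

static void insert(SymSet *s, const char *name) {
    if (contains(s, name)) return;
    assert(s->n < MAX_SYMS);
    s->names[s->n++] = name;
}

static void erase(SymSet *s, const char *name) {
    for (size_t i = 0; i < s->n; i++)
        if (strcmp(s->names[i], name) == 0) { s->names[i] = s->names[--s->n]; return; }
}

/* Add o to E: its definitions go into D (and leave U);
   its references that are not yet defined go into U. */
static void add_object(const Object *o, SymSet *U, SymSet *D) {
    for (size_t i = 0; i < MAX_NAMES && o->defs[i]; i++) {
        insert(D, o->defs[i]);
        erase(U, o->defs[i]);
    }
    for (size_t i = 0; i < MAX_NAMES && o->refs[i]; i++)
        if (!contains(D, o->refs[i])) insert(U, o->refs[i]);
}

/* Does archive member o define any symbol currently in U? */
static bool resolves_something(const Object *o, const SymSet *U) {
    for (size_t i = 0; i < MAX_NAMES && o->defs[i]; i++)
        if (contains(U, o->defs[i])) return true;
    return false;
}

/* Scan the inputs left to right; return |U| at the end (0 means success). */
size_t link_inputs(const Input *in, size_t n) {
    SymSet U = {0}, D = {0};
    for (size_t f = 0; f < n; f++) {
        if (!in[f].is_archive) {
            add_object(&in[f].members[0], &U, &D);
            continue;
        }
        assert(in[f].n_members <= MAX_MEMBERS);
        bool in_E[MAX_MEMBERS] = {false};
        bool changed = true;
        while (changed) {               /* iterate to a fixed point */
            changed = false;
            for (size_t m = 0; m < in[f].n_members; m++) {
                if (!in_E[m] && resolves_something(&in[f].members[m], &U)) {
                    in_E[m] = true;
                    add_object(&in[f].members[m], &U, &D);
                    changed = true;
                }
            }
        }
        /* members with in_E[m] == false are simply discarded */
    }
    return U.n;
}
```

With hypothetical inputs main.o (references addvec and printf), an archive holding addvec.o and multvec.o, and an archive holding printf.o, the order main.o, vector archive, libc archive leaves U empty. Moving the vector archive in front of main.o leaves addvec unresolved, which is the ordering problem discussed in the next paragraph.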

This algorithm can result in some baffling link-time errors, because the ordering of libraries and object files on the command line is significant. If the library that defines a symbol appears on the command line before the object file that references that symbol, the reference will not be resolved and linking will fail. The general rule for libraries is to place them at the end of the command line.

On the other hand, if the members of different libraries are not independent, the libraries must be ordered so that for each symbol s that is referenced externally by a member of an archive, at least one definition of s follows a reference to s on the command line.

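If necessary, libraries can be repeated on the command line to satisfy these dependences. For example, if foo.c calls a function in libx.a that calls a function in liby.a, which in turn calls a function in libx.a, then libx.a must appear again after liby.a: gcc foo.c libx.a liby.a libx.a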

7.7 Relocation

Once the linker has completed the symbol resolution step, it has associated each symbol reference in the code with exactly one symbol definition (i.e., a symbol table entry in one of its input object modules). At this point, the linker knows the exact sizes of the code and data sections in its input object modules. It is now ready to begin the relocation step, in which it merges the input modules and assigns run-time addresses to each symbol. Relocation consists of two steps:

* Relocating sections and symbol definitions. In this step, the linker merges all sections of the same type into a new aggregate section of the same type; for example, the .data sections from the input modules are all merged into the one section that will become the .data section of the output executable. The linker then assigns run-time memory addresses to the new aggregate sections, to each section defined by the input modules, and to each symbol defined by the input modules. When this step is complete, every instruction and global variable in the program has a unique run-time memory address.

* Relocating symbol references within sections. In this step, the linker modifies every symbol reference in the bodies of the code and data sections so that it points to the correct run-time address. To perform this step, the linker relies on data structures in the relocatable object modules known as relocation entries.
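The first step amounts to laying out the input sections back to back. The hypothetical merge_sections function below models this for one section type: given each module's section size and alignment, it assigns each input section a run-time start address within the aggregate section. Real linkers also follow a linker script and handle many section types at once; the sizes and addresses used here are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to a multiple of align (align must be a power of two). */
static uint32_t align_up(uint32_t addr, uint32_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Hypothetical sketch: place the same-type sections of n input modules one
   after another starting at base, recording each one's start address in
   start[i]. Returns the end address of the aggregate section. */
uint32_t merge_sections(uint32_t base, const uint32_t size[], const uint32_t align[],
                        uint32_t start[], size_t n) {
    uint32_t addr = base;
    for (size_t i = 0; i < n; i++) {
        addr = align_up(addr, align[i]);
        start[i] = addr;
        addr += size[i];
    }
    return addr;
}
```

A symbol defined at offset k within module i's section then gets the run-time address start[i] + k.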

7.7.1 Relocation entries

When an assembler generates an object module, it does not know where the code and data will ultimately be stored in memory. Nor does it know the locations of any externally defined functions or global variables that are referenced by the module. So whenever the assembler encounters a reference to an object whose ultimate location is unknown, it generates a relocation entry that tells the linker how to modify the reference when it merges the object file into an executable. Relocation entries for code are placed in .rel.text. Relocation entries for initialized data are placed in .rel.data.

ELF defines 11 different relocation types. We are concerned with only the two most basic ones:

* R_386_PC32: Relocate a reference that uses a 32-bit PC-relative address, that is, an offset from the current run-time value of the program counter (PC).

* R_386_32: Relocate a reference that uses a 32-bit absolute address.
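Section 7.7.2 is empty in these notes, so here is a sketch of what relocating a single reference means for the two types above. The relocate function, its stand-in type names, and its parameter names are hypothetical; the arithmetic follows the usual IA32 semantics. For R_386_32 the linker stores the symbol's absolute address plus the addend. For R_386_PC32 it stores the distance from the reference to the symbol plus the addend; for a call instruction the assembler leaves an addend of -4, because the CPU adds the offset to the address of the next instruction, which begins 4 bytes after the 32-bit field. The addresses in the usage below are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for R_386_PC32 and R_386_32. */
typedef enum { R_PC32, R_ABS32 } RelocType;

/* Compute the 32-bit value the linker writes at a reference.
   sec_addr: run-time address of the section holding the reference
   offset:   offset of the reference within that section
   sym_addr: run-time address of the referenced symbol
   addend:   value the assembler left at the reference (e.g. -4 for a call) */
uint32_t relocate(RelocType type, uint32_t sec_addr, uint32_t offset,
                  uint32_t sym_addr, int32_t addend) {
    if (type == R_PC32) {
        uint32_t refaddr = sec_addr + offset;          /* run-time address of the reference */
        return sym_addr + (uint32_t)addend - refaddr;  /* PC-relative; wraps modulo 2^32 */
    }
    return sym_addr + (uint32_t)addend;                /* absolute address */
}
```

With the .text section at 0x80483b4, a call whose 32-bit field sits at offset 0x7, and the target at 0x80483c8, relocate(R_PC32, 0x80483b4, 0x7, 0x80483c8, -4) yields 0x9: the next instruction starts at 0x80483bf, and 0x80483bf + 0x9 = 0x80483c8.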

7.7.2 Relocating symbol references (p. 462)

 

7.8 Executable object files (p. 465)

The format of an executable object file is similar to that of a relocatable object file. The ELF header describes the overall format of the file. It also includes the program's entry point, which is the address of the first instruction to execute when the program runs. The .text, .rodata, and .data sections are similar to those in a relocatable object file, except that these sections have been relocated to their eventual run-time memory addresses. The .init section defines a small function, called _init, that will be called by the program's initialization code. Since the executable is fully linked (relocated), it needs no .rel sections.

ELF executables are designed to be easy to load into memory, with contiguous chunks of the executable file mapped to contiguous memory segments. This mapping is described by the segment header table (also called the program header table).
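One constraint behind this mapping, taken here from the ELF specification rather than from the text above: a loadable segment can be mapped directly from the file only if its virtual address and its file offset are congruent modulo the segment's alignment, that is, vaddr mod align = off mod align. The sketch below checks that condition and picks the lowest conforming address at or above a hint; the Segment type, the helper names, and the numbers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified loadable segment: file offset, address, alignment. */
typedef struct { uint64_t off, vaddr, align; } Segment;

/* The segment can be mapped straight from the file only if its address
   and file offset agree modulo the alignment. */
bool mappable(const Segment *s) {
    assert(s->align != 0);
    return s->vaddr % s->align == s->off % s->align;
}

/* Pick the lowest address at or above `hint` that satisfies the congruence. */
uint64_t choose_vaddr(uint64_t hint, uint64_t off, uint64_t align) {
    assert(align != 0);
    uint64_t base = hint - hint % align;   /* round the hint down to the alignment */
    uint64_t v = base + off % align;
    return v >= hint ? v : v + align;
}
```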

7.9 Loading executable object files

To run an executable object file p, we can type its name at the Unix shell's command line:

unix> ./p

Since p does not correspond to a built-in shell command, the shell assumes that p is an executable object file, which it runs for us by invoking some memory-resident operating system code known as the loader. Any Unix program can invoke the loader by calling the execve function. The loader copies the code and data in the executable object file from disk into memory, and then runs the program by jumping to its first instruction, or entry point. This process of copying the program into memory and then running it is known as loading.
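The same loading step can be triggered from C. The sketch below uses the POSIX calls fork, execve, and waitpid to run a program in a child process, much as a shell does; run_program is a hypothetical helper, and passing the parent's environ as the new environment is a choice made for this example. If execve succeeds it never returns, because the loader has replaced the child's memory image with the new program.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run the executable at `path` in a child process and
   return its exit status, or -1 if it could not be started or waited for. */
int run_program(const char *path, char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execve(path, argv, environ);   /* returns only if loading failed */
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, running /bin/sh with the arguments -c and "exit 3" returns 3, while a path that cannot be loaded makes the child exit with 127.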

Figure 7-13 Linux run-time memory image (p. 466)

Every Unix program has a run-time memory image. For example, on 32-bit Linux systems, the code segment always starts at address 0x08048000. The data segment follows at the next 4 KB aligned address. The run-time heap follows on the first 4 KB aligned address past the read/write segment and grows up via calls to the malloc library. There is also a segment that is reserved for shared libraries. The user stack always starts at the largest legal user address and grows down (toward lower memory addresses). The segment starting above the stack is reserved for the code and data in the memory-resident part of the operating system known as the kernel.
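The relative order of these regions can be observed from a running program. The hypothetical sample_layout function records one address from the code, data, heap, and stack regions. On a modern 64-bit Linux system the actual numbers differ from the 32-bit layout above, and position-independent executables plus address space layout randomization change them on every run, but code typically lies below data, and the heap below the stack.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

int initialized_data = 1;          /* data segment (.data) */

/* One sample address from each region of the running process. */
typedef struct { uintptr_t text, data, heap, stack; } Layout;

/* Hypothetical helper: record where code, data, heap, and stack live. */
Layout sample_layout(void) {
    int local = 0;                 /* user stack */
    void *block = malloc(16);      /* run-time heap (small request) */
    Layout l = {
        /* Converting a function pointer to an integer is implementation-defined;
           GCC and Clang on Linux yield the code address. */
        .text  = (uintptr_t)&sample_layout,
        .data  = (uintptr_t)&initialized_data,
        .heap  = (uintptr_t)block,
        .stack = (uintptr_t)&local,
    };
    free(block);
    return l;
}
```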

Guided by the segment header table in the executable, the loader copies chunks of the executable into the code and data segments. Next, the loader jumps to the program's entry point, which is always the address of the _start symbol. The startup code at the _start address is defined in the object file crt1.o and is the same for all C programs. After calling initialization routines from the .text and .init sections, the startup code calls the atexit routine, which appends a list of routines that should be called when the application terminates normally. (The exit function runs the functions registered by atexit and then returns control to the operating system by calling _exit.) Next, the startup code calls the application's main routine, which begins executing our C code. After the application returns, the startup code calls the _exit routine, which returns control to the operating system.
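The registration-and-exit behavior can be checked in a child process. The sketch below (hypothetical helper names, POSIX fork and pipe assumed) registers two handlers with atexit and then calls exit. The C standard requires exit to call registered functions in the reverse order of their registration, so the parent reads "BA" from the pipe.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;   /* pipe write end, used only in the child */

static void first_registered(void)  { ssize_t r = write(report_fd, "A", 1); (void)r; }
static void second_registered(void) { ssize_t r = write(report_fd, "B", 1); (void)r; }

/* Hypothetical helper: fork a child that registers two exit handlers and
   calls exit(); store the order the handlers ran in (as a string) in out. */
int exit_handler_order(char out[3]) {
    int fds[2];
    if (pipe(fds) < 0) return -1;
    fflush(NULL);                    /* keep the child from re-flushing our buffers */
    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }
    if (pid == 0) {                  /* child */
        close(fds[0]);
        report_fd = fds[1];
        atexit(first_registered);
        atexit(second_registered);
        exit(0);                     /* runs the handlers, then calls _exit */
    }
    close(fds[1]);                   /* parent: read until the child's end closes */
    size_t got = 0;
    ssize_t n;
    while (got < 2 && (n = read(fds[0], out + got, 2 - got)) > 0)
        got += (size_t)n;
    out[got] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return 0;
}
```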

7.10 Dynamic linking with shared libraries

Shared libraries are a modern innovation that addresses the disadvantages of static libraries. A shared library is an object module that, at run time, can be loaded at an arbitrary memory address and linked with a program in memory. This process is known as dynamic linking and is performed by a program called a dynamic linker. Shared libraries are also referred to as shared objects, and on Unix systems they are typically denoted by the .so suffix.
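The dynamic linker also lets a running program load a shared library on demand. The sketch below uses the POSIX dlopen, dlsym, and dlclose functions; call_shared_double is a hypothetical helper, and it assumes the looked-up symbol has type double(double). The library name libm.so.6 in the usage below is glibc-specific, and with glibc older than 2.34 you must also link with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: load `lib` at run time, look up `sym` (assumed to be a
   double(double) function), call it on x, and unload the library.
   Returns 0 on success and stores the result in *out; -1 on failure. */
int call_shared_double(const char *lib, const char *sym, double x, double *out) {
    void *handle = dlopen(lib, RTLD_LAZY);
    if (!handle) return -1;
    double (*fn)(double) = (double (*)(double))dlsym(handle, sym);
    if (!fn) { dlclose(handle); return -1; }
    *out = fn(x);
    dlclose(handle);
    return 0;
}
```

For example, call_shared_double("libm.so.6", "cos", 0.0, &r) loads the math library at run time and sets r to 1.0.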

Figure 7-15 Dynamic linking with shared libraries (p. 468)

 
