CSAPP Reading Notes (5): Linking

Source: Internet
Author: User

Linking here means the linking step that follows compilation. According to the book, understanding how source code is turned into a binary executable helps programmers build large projects. From my own point of view, it deepens my understanding of programs and continues what I learned in compiler principles. Although the chapter is titled "Linking", it actually walks through the whole compile-and-link process and gives a first look at how a program runs. Enough preamble; let's start. (All examples in this chapter are on UNIX.)

For C++, source code is generally converted in three steps: preprocessing, compilation, and linking.

Step 1: preprocessing. Preprocessor directives are replaced with actual code or values. For example, #include <stdio.h> is replaced with the contents of stdio.h at the point where it appears.

Step 2: compilation. Taking the output of step 1 as input, the compiler generates one relocatable object file per source file. This file is already binary, but it still cannot be run directly.

Step 3: linking. The separate relocatable object files in the project are combined into a single file. Some special processing is needed along the way (described later), and the header information an executable requires is added.

After these three steps we have a real executable program.

To begin, we need to understand the format of a relocatable object file and of an executable object file on UNIX:

[Figure (a): format of a relocatable object file]
[Figure (b): format of an executable object file]

From figures (a) and (b) we can see that the two formats are roughly the same: both have the .text, .rodata, .data, .bss, .symtab, .debug, .line, and .strtab sections, an ELF header, and a section header table. The difference is that (a) additionally has .rel.text and .rel.data, where "rel" is short for "relocation". These two sections hold relocation information, that is, information about references to external data or functions, which is used during linking. (b) additionally has an .init section, a piece of code that runs when the program is initialized. It generally includes some startup code and the entry to main.

The sections are used as follows:

.text: the code of the program.
.rodata: read-only data, such as string constants.
.data: initialized global variables, which occupy disk space. For example, int a[100000] = {1}; (an array initialized to all zeros is typically placed in .bss instead).
.bss: uninitialized global variables, which do not occupy disk space. For example, int a[1000000];
.symtab: the symbol table, which records the definitions of and references to functions and global variables.
.rel.text: function references in .text whose locations must be determined during linking.
.rel.data: global variable references whose locations must be determined during linking.
.debug: created for debug builds; contains local variables, source code, and other information.
.line: created for debug builds; records the mapping between generated code and source code.
.strtab: stores the strings used in .symtab and .debug.

How these sections are mapped into the virtual address space of a process is shown in the figure:

[Figure: virtual address space of a process]

Basically, user space takes up 75% of the total address range. 0xc0000000 to 0xffffffff is kernel space, which user code is not allowed to access. This is one reason a stray pointer operation sometimes produces an access violation at run time (reported as a segmentation fault on UNIX; 0xc0000005 is the corresponding exception code on Windows). Note that the region starting at 0x08048000 is the read-only segment, containing .init, .text, and .rodata. After it comes the read/write segment, containing .data and .bss. Above the global variables is the heap, and the stack grows downward from 0xc0000000 - 1. In addition, the region starting at 0x40000000 is where shared libraries (that is, dynamic libraries) are loaded.
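The address ranges quoted above can be checked with a little arithmetic. The following is only a minimal sketch: the constant names, the region_of function, and the coarse region labels are my own illustrative assumptions, and the boundaries are the 32-bit values from these notes, not something every system uses.

```python
# Boundaries quoted in the notes for a 32-bit UNIX process.
# Names and region labels are illustrative assumptions.
CODE_START = 0x08048000         # read-only segment (.init, .text, .rodata) starts here
SHARED_LIB_START = 0x40000000   # shared libraries are mapped from here
KERNEL_START = 0xC0000000       # kernel space begins here
ADDRESS_SPACE_END = 0x100000000 # one past 0xffffffff


def region_of(addr: int) -> str:
    """Return a coarse label for a 32-bit virtual address."""
    if not 0 <= addr < ADDRESS_SPACE_END:
        raise ValueError("not a 32-bit address")
    if addr >= KERNEL_START:
        return "kernel"
    if addr >= SHARED_LIB_START:
        return "shared libraries / stack"
    if addr >= CODE_START:
        return "program image / heap"
    return "unused low memory"


def user_fraction() -> float:
    """Fraction of the 4 GiB range that lies below the kernel boundary."""
    return KERNEL_START / ADDRESS_SPACE_END


if __name__ == "__main__":
    print(user_fraction())  # 0.75
    for addr in (0x0, 0x08048000, 0x40000000, 0xBFFFFFFF, 0xC0000000):
        print(hex(addr), region_of(addr))
```

The first printed value confirms the 75% figure: 0xc0000000 is exactly three quarters of the 4 GiB range.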
The heap and the stack are therefore separated by the shared-library region, so from this point of view it is wrong to say that the stack will eat into the heap's space. (The common saying is that the stack and the heap grow toward each other. That is because the other parts of a program are fixed at run time and only the heap and the stack change dynamically. And because physical memory is generally smaller than virtual memory, the two do compete for space.)

Next, let's look at the linking process itself. There are two kinds of linking: static and dynamic.

First, static linking. Static linking copies all the needed sections, such as .text, from the relocatable object files into the generated executable. The final program does not depend on any other file at run time.

The basic algorithm for static linking is as follows. Three sets are defined: E holds the .o files (or .o members of .a archives) waiting to be merged, U holds the undefined symbols, and D holds the defined symbols. When the algorithm starts, E, U, and D are all empty. Each input file is then handled in order:

If it is a .o file, put it into E, add the symbols it defines to D, and add the symbols it references but does not define to U.
If it is a .a archive, traverse its .o members in order. If a member defines a symbol that is in U, add that member to E and update U and D. Repeat until E, U, and D no longer change.

When all input files have been processed and U is not empty, the linker prints the information and reports a link error. That is the basic process of static linking.

Now dynamic linking. Dynamic linking loads the required library files at run time. At link time only the entry points of the library functions are linked in, not the library code itself, so the running program depends on the shared library files. The linking algorithm is basically the same as for static linking, though the details differ.
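The E, U, D algorithm described above can be sketched in a few lines. This is a simplified model, not a real linker: the ObjectFile and Archive types, the resolve function, and the file and symbol names (main.o, util.o, libc.a and its members) are all hypothetical, and multiply-defined symbols are not checked.

```python
from typing import NamedTuple


class ObjectFile(NamedTuple):
    name: str
    defines: frozenset      # symbols this file defines
    references: frozenset   # symbols this file uses


class Archive(NamedTuple):
    name: str
    members: tuple          # ObjectFile entries, in archive order


class LinkError(Exception):
    pass


def add_object(label, obj, E, U, D):
    """Put obj into E, then update the defined (D) and undefined (U) sets."""
    E.append(label)
    D.update(obj.defines)
    U.update(obj.references)
    U.difference_update(D)


def resolve(inputs):
    """Process inputs in the given order; return (E, D) or raise LinkError."""
    E, U, D = [], set(), set()
    for item in inputs:
        if isinstance(item, ObjectFile):
            add_object(item.name, item, E, U, D)
            continue
        changed = True
        while changed:  # rescan the archive until E, U, and D stop changing
            changed = False
            for member in item.members:
                label = f"{item.name}({member.name})"
                if label not in E and member.defines & U:
                    add_object(label, member, E, U, D)
                    changed = True
    if U:
        raise LinkError("undefined symbols: " + ", ".join(sorted(U)))
    return E, D


def obj(name, defines=(), references=()):
    return ObjectFile(name, frozenset(defines), frozenset(references))


# Hypothetical inputs, for illustration only.
MAIN = obj("main.o", {"main"}, {"printf", "helper"})
UTIL = obj("util.o", {"helper"})
LIBC = Archive("libc.a", (obj("write.o", {"write"}),
                          obj("printf.o", {"printf"}, {"write"}),
                          obj("unused.o", {"qsort"})))

if __name__ == "__main__":
    E, D = resolve([MAIN, UTIL, LIBC])
    print(E)  # ['main.o', 'util.o', 'libc.a(printf.o)', 'libc.a(write.o)']
```

In this sketch unused.o is never added to E, because nothing in U needs it. Passing the archive first, as in resolve([LIBC, MAIN, UTIL]), raises LinkError: U is still empty when the archive is scanned, so printf is never resolved.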
Comparing the two: a statically linked program does not depend on library files at run time, and calls to library functions are fast. However, when several processes on the same machine use the same library, each carries its own copy of the code, so many duplicate code segments sit in memory and waste space. A dynamically linked program loads library functions at run time, which makes access slower than in the static case (in fact, apart from the first load into memory it is not much slower), but when several processes on the same machine use the same library, only one copy of its code segment exists in memory. This technique relies on virtual memory management: because virtual pages are mapped to physical pages, the shared library's pages in different processes can be mapped to the same physical pages, which achieves the sharing.

In general, this chapter gave me not only an understanding of the basic linking process but also a preliminary understanding of some basic principles of how a process runs.
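The page-sharing idea can be illustrated with a toy model. Everything here is an assumption made for illustration: FrameAllocator, map_library, frames_needed, the library name "libc", and the virtual page numbers are hypothetical, and a real operating system manages page tables very differently in detail.

```python
class FrameAllocator:
    """Hands out physical frame numbers; the same key always gets the same frame."""

    def __init__(self):
        self._frames = {}

    def frame_for(self, key):
        # A new key receives the next unused frame number.
        return self._frames.setdefault(key, len(self._frames))

    def frames_in_use(self):
        return len(self._frames)


def map_library(allocator, pid, base_vpage, lib, num_pages, shared):
    """Return one process's page table {virtual page: frame} for lib's code."""
    table = {}
    for i in range(num_pages):
        # Shared code is keyed by the library page alone, so every process
        # lands on the same frame; a private copy is keyed per process.
        key = (lib, i) if shared else (pid, lib, i)
        table[base_vpage + i] = allocator.frame_for(key)
    return table


def frames_needed(num_processes, num_pages, shared):
    """Physical frames used when several processes map the same library code."""
    allocator = FrameAllocator()
    for pid in range(num_processes):
        # Each process may place the library at a different virtual page.
        map_library(allocator, pid, 0x40000 + pid * 0x100, "libc", num_pages, shared)
    return allocator.frames_in_use()


if __name__ == "__main__":
    print(frames_needed(3, 4, shared=False))  # 12
    print(frames_needed(3, 4, shared=True))   # 4
```

With three processes and four pages of library code, private copies need twelve frames while the shared mapping needs four, even though each process sees the library at its own virtual address.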
