The four stages of building a C program: preprocessing, compilation, assembly, and linking

Source: Internet
Author: User

A C program (source code) must be compiled and linked before it becomes a program that can run on hardware (executable code).

Compilation process

The compilation process can be divided into two phases: compilation and assembly.

Compile

Compilation reads the source program (a character stream), performs lexical and syntax analysis, and translates the high-level language statements into functionally equivalent assembly code. Compiling a source file involves two main stages:

The first stage is preprocessing, which precedes compilation proper. The preprocessing stage modifies the contents of the source file according to the preprocessing directives placed in it. For example, #include is a preprocessing directive that inserts the contents of a header file into the source file. Modifying the source file before it is compiled provides a lot of flexibility for coping with the constraints of different computers and operating systems. The code required for one environment may differ from the code required for another because the available hardware or operating system is different. In many cases, you can put the code for different environments in the same file and let the preprocessing stage adapt it to the current environment.

The preprocessor mainly handles the following:

(1) Macro definition directives, such as #define a B

For this directive, the preprocessor replaces every occurrence of the identifier a in the program with B, except where a appears inside a string constant. There is also #undef, which cancels the definition of a macro so that later occurrences of the name are no longer replaced.

(2) Conditional compilation directives, such as #ifdef, #ifndef, #else, #elif, #endif, and so on.

These directives let programmers control which code the compiler processes by defining different macros. Based on these conditions, the preprocessor filters out the code that is not needed.

(3) Header file inclusion directives, such as #include "FileName" or #include <FileName>.

A header file typically defines a large number of macros with #define (most commonly character constants), along with declarations of various external symbols. The main purpose of a header file is to make certain definitions available to many different C source files. A source file that needs these definitions simply adds an #include directive instead of repeating them. The preprocessor copies all the definitions in the header file into the output file it produces for the compiler. Header files included in a C source program can be supplied by the system, in which case they are typically placed in the /usr/include directory and are included with angle brackets (< >). Developers can also write their own header files, which usually live in the same directory as the C source files and are included with double quotation marks ("").

(4) Special symbols. The preprocessor recognizes some special symbols.

For example, __LINE__ appearing in the source program is replaced by the current line number (a decimal number), and __FILE__ is replaced by the name of the C source file currently being compiled. The preprocessor replaces these symbols with the appropriate values wherever they appear in the source program.

What the preprocessor does is essentially a textual substitution over the source program. The result is an output file that contains no macro definitions, no conditional compilation directives, and no special symbols. This file means the same thing as the unpreprocessed source file, but its content is different. Next, this output file is handed to the compiler to be translated into machine instructions.

The second stage is compilation and optimization. After preprocessing, the output file contains only constants (such as numbers and strings), variable definitions, and C keywords, identifiers, and symbols, such as main, if, else, for, while, {, }, +, -, *, \, and so on.

The compiler performs lexical analysis and syntax analysis and, after confirming that all statements conform to the grammar, translates them into an equivalent intermediate representation or assembly code.

Optimization is one of the harder techniques in a compilation system. It involves not only compiler technology itself but also depends heavily on the hardware environment of the machine. One kind of optimization works on the intermediate code and does not depend on a specific computer. The other kind mainly targets the generation of target code.

For the first kind, the main work includes common subexpression elimination, loop optimization (loop-invariant code motion, strength reduction, transformation of loop control conditions, constant folding, and so on), copy propagation, and removal of useless assignments.

The second kind is closely tied to the hardware structure of the machine. The most important consideration is how to make full use of the values held in the machine's hardware registers so as to reduce the number of memory accesses. In addition, how to select and schedule instructions according to the characteristics of the hardware (such as pipelining, RISC, CISC, VLIW, and so on) so that the target code is shorter and executes more efficiently is also an important research topic.

Assembly

Assembly is the process of translating assembly language code into machine instructions for the target machine. Each C source file handled by the translation system ends up, after this step, as a corresponding object file. The object file stores machine language code that is equivalent to the source program. An object file consists of segments. Typically there are at least two segments in an object file:

Code segment: this segment mainly contains the program's instructions. It is generally readable and executable, but generally not writable.

Data segment: this segment mainly holds the global variables and static data used by the program. It is generally readable and writable, and on modern systems it is usually not executable.

There are three main types of object files in the UNIX environment:

(1) Relocatable file

It contains code and data suitable for linking with other object files to create an executable file or a shared object file.

(2) Shared object file

This file holds code and data suitable for linking in two contexts. In the first, the linker processes it together with other relocatable files and shared object files to create another object file. In the second, the dynamic linker combines it with an executable file and other shared object files to create a process image.

(3) Executable file

It contains a program that the operating system can execute by creating a process. The assembler actually produces only the first type of object file. The other two require further processing, which is the job of the linker.

Link process

The object files generated by the assembler cannot be executed right away, because they may still contain many unresolved references.

For example, a function in one source file might refer to a symbol defined in another source file (such as a variable or a function), or the program might call a function from a library. All of these references have to be resolved by the linker.

The main task of the linker is to connect the related object files to one another, that is, to connect each symbol referenced in one file with the definition of that symbol in another file, so that all these object files become a unified whole that the operating system can load and execute.

Depending on how the developer chooses to link library functions, linking can be divided into two types:

(1) Static linking

With static linking, the code of the function is copied from the static library into the final executable program. That code is then loaded into the virtual address space of the process when the program runs. A static library is really a collection of object files, each of which contains the code for one function or a group of related functions in the library.

(2) Dynamic linking

With dynamic linking, the code of the function is placed in an object file called a dynamic-link library or shared object. The linker then only records the name of the shared object and a small amount of other bookkeeping information in the final executable program. When the executable runs, the contents of the dynamic-link library are mapped into the virtual address space of the process. The dynamic linker finds the appropriate function code based on the information recorded in the executable program.

Function calls in an executable can use either dynamic or static linking. Dynamic linking makes the final executable smaller and saves memory when a shared object is used by several processes, because only one copy of the shared object's code is kept in memory. However, dynamic linking is not necessarily better than static linking. In some cases, dynamic linking carries a performance penalty.

The GCC compiler that we use on Linux bundles the stages above so that a single command completes the whole build. This is convenient, but it makes it harder for beginners to understand the compilation process. The stages that GCC drives are as follows:

Preprocessing

Converts .c files into .i files

The GCC command used is: gcc -E

The corresponding preprocessing command is cpp

Compilation

Converts .c/.h files into .s files

The GCC command used is: gcc -S

The corresponding compile command is cc -S

Assembly

Converts .s files into .o files

The GCC command used is: gcc -c

The corresponding assembly command is as

Linking

Converts .o files into an executable program

The GCC command used is: gcc

The corresponding link command is ld

To summarize, the build process consists of the four stages above: preprocessing, compilation, assembly, and linking. Understanding the work done in each of these stages helps us understand how header files and libraries are handled. A clear picture of the compile and link process also helps us locate errors while programming and make the most of the compiler's error detection.
