Detailed explanation of the C compilation and linking process

Source: Internet
Author: User

To run on hardware, a C program (source code) must be compiled and linked into an executable program (executable code). Compilation is the process of translating source code in text form into an object file in machine-language form. Linking is the process of combining the object files, the operating system's startup code, and the library files used by the program into the final executable code. The process is illustrated as follows:

From the figure, we can see that the entire build is divided into two processes: compilation and linking. The compilation process consists of the part enclosed in braces, and the rest is the linking process.

Compilation process


The compilation process can be divided into two phases: compilation and assembly.

  (1) Compilation

Compilation reads the source program (a character stream), performs lexical and syntax analysis on it, and converts the high-level language statements into functionally equivalent assembly code. Compiling a source file involves two main stages:

The first stage is the preprocessing phase, which takes place before the compilation phase proper. In the preprocessing phase, the content of the source file is modified according to the preprocessing directives placed in the file. For example, the #include directive is a preprocessing directive that inserts the content of a header file into the .c file. Modifying source files before compilation in this way provides great flexibility for adapting to the constraints of different computers and operating systems. The code required by one environment may differ from the code required by another because the available hardware or operating system is different. In many cases, the code for different environments can be kept in the same file and then adjusted in the preprocessing phase to suit the current environment.

The preprocessor mainly handles the following:

(1) Macro definition directives, such as #define A B

For this directive, the preprocessor replaces every occurrence of the token A in the program with B, but an A inside a string constant is not replaced. In addition, #undef cancels the definition of a macro so that later occurrences of the name are no longer replaced.
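The replacement rules above can be sketched in a few lines of C. The macro name A follows the text; the value 42 and the function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>

#define A 42   /* from here on, the preprocessor replaces the token A with 42 */

/* The token A is expanded, so this function returns 42. */
int macro_value(void)
{
    return A;
}

/* The A inside the string literal is not a token and is left alone. */
const char *macro_in_string(void)
{
    return "A";
}

#undef A       /* A is no longer a macro after this line */

/* A is an ordinary identifier again, so it can be used as a parameter name. */
int after_undef(int A)
{
    return A + 1;
}

/* Hypothetical self-check that records the expected behavior. */
void check_macros(void)
{
    assert(macro_value() == 42);
    assert(macro_in_string()[0] == 'A');
    assert(after_undef(1) == 2);
}
```

Calling check_macros() from a main function should pass silently if the preprocessor behaves as described.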

(2) Conditional compilation directives, such as #ifdef, #ifndef, #else, #elif, and #endif.

These directives let the programmer decide which code the compiler will process by defining different macros. The preprocessor filters out the code that is not needed according to these conditions.
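As a minimal sketch of conditional compilation, the following C fragment selects code with the directives listed above. The macro names USE_FAST_PATH, BUFFER_SIZE, and BUFFER_KIND and both functions are assumptions made up for this example.

```c
#include <assert.h>

/* USE_FAST_PATH is a hypothetical macro; defining it selects the first branch. */
#define USE_FAST_PATH

#ifdef USE_FAST_PATH
/* Kept by the preprocessor because USE_FAST_PATH is defined. */
int scale(int x) { return x * 2; }
#else
/* Filtered out before the compiler proper ever sees it. */
int scale(int x) { return x + x; }
#endif

#ifndef BUFFER_SIZE
/* Provide a default only when no earlier definition exists. */
#define BUFFER_SIZE 256
#endif

#if BUFFER_SIZE >= 1024
#define BUFFER_KIND 2
#elif BUFFER_SIZE >= 128
#define BUFFER_KIND 1
#else
#define BUFFER_KIND 0
#endif

/* Returns the value chosen by the #if / #elif / #else chain above. */
int buffer_kind(void) { return BUFFER_KIND; }

/* Hypothetical self-check that records the expected behavior. */
void check_conditionals(void)
{
    assert(scale(21) == 42);
    assert(buffer_kind() == 1);
}
```

Only one definition of scale survives preprocessing, which is why the two definitions do not conflict.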

(3) Header file inclusion directives, such as #include "FILENAME" or #include <FILENAME>.

Header files use the #define directive to define a large number of macros (most commonly symbolic constants) and also contain declarations of various external symbols. The purpose of header files is to make these definitions available to multiple C source programs. A C source program that needs the definitions only has to add an #include directive instead of repeating the definitions itself. The preprocessor adds all the definitions in the header file to its output file so that the compiler can process them. Header files included by a C source program can be provided by the system. These header files are generally placed in the /usr/include directory and are included with angle brackets (<>). Developers can also define their own header files. These files are generally placed in the same directory as the C source program and are included with double quotation marks ("").
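A short sketch of what system headers supply, using three standard headers included with angle brackets. The function names are hypothetical; CHAR_BIT, strlen, and assert are standard C.

```c
#include <assert.h>   /* system header: supplies the assert macro */
#include <limits.h>   /* system header: supplies macros such as CHAR_BIT */
#include <string.h>   /* system header: supplies declarations such as strlen() */

/* Uses a macro that limits.h defines with #define. */
int bits_in_int(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}

/* Uses a function that string.h only declares; its code comes from the C library. */
int name_length(const char *name)
{
    return (int)strlen(name);
}

/* Hypothetical self-check that records the expected behavior. */
void check_headers(void)
{
    assert(bits_in_int() >= 16);
    assert(name_length("stdio.h") == 7);
}
```

A project's own header would be included the same way, but with double quotation marks instead of angle brackets.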

(4) Special symbols. The preprocessor recognizes some special symbols.

For example, __LINE__ appearing in the source program is interpreted as the current line number (a decimal number), and __FILE__ is interpreted as the name of the C source file currently being compiled. The preprocessor replaces these symbols with the appropriate values in the source program.

The preprocessor essentially performs substitution on the source program. The result is an output file that contains no macro definitions, conditional compilation directives, or special symbols. This file has the same meaning as the source file before preprocessing, but its content is different. Next, this output file serves as the input to the compiler proper, which translates it toward machine instructions.
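The two special symbols can be observed directly. In this sketch the function names are hypothetical; __LINE__ and __FILE__ are the standard predefined macros.

```c
#include <assert.h>
#include <string.h>

/* Each __LINE__ is replaced by the decimal number of the line it appears on. */
int line_a(void) { return __LINE__; }
int line_b(void) { return __LINE__; }   /* one line further down than line_a */

/* __FILE__ is replaced by a string literal naming the file being compiled. */
const char *this_file(void) { return __FILE__; }

/* Hypothetical self-check that records the expected behavior. */
void check_special_symbols(void)
{
    assert(line_b() == line_a() + 1);
    assert(strlen(this_file()) > 0);
}
```

The exact numbers and file name depend on where the code is saved, so the check only compares the two line numbers with each other.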

The second stage is compilation and optimization. The preprocessed output file contains only constants such as numbers and strings, variable definitions, and C language identifiers, keywords, and symbols such as main, if, else, for, while, {, }, +, -, *, /, and so on.

The compiler performs lexical analysis and syntax analysis on this file and, after confirming that all statements comply with the syntax rules, translates them into an equivalent intermediate representation or assembly code.

Optimization is one of the more difficult techniques in a compilation system. It involves not only compilation technology itself but also the hardware environment of the machine. One part of optimization works on the intermediate code and does not depend on a specific computer. The other part mainly targets the generation of the target code.

For the former kind of optimization, the main work is common subexpression elimination, loop optimization (loop-invariant code motion, strength reduction, changing loop control conditions, constant folding, and so on), copy propagation, and the elimination of useless assignments.

The latter kind of optimization is closely related to the hardware structure of the machine. The main consideration is how to make full use of the variable values held in the machine's hardware registers in order to reduce the number of memory accesses. In addition, how to adjust the instructions according to the way the hardware executes them (pipelining, RISC, CISC, VLIW, and so on) so that the target code is shorter and executes more efficiently is also an important research topic.
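To make a few of these machine-independent optimizations concrete, the sketch below applies them by hand to small functions. A real compiler performs such transformations on its intermediate code, not on the C source; all function names here are hypothetical.

```c
#include <assert.h>

/* Straightforward version, as a programmer might write it. */
int sum_naive(const int *v, int n, int a, int b)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        int unused = a - b;                  /* useless assignment: never read */
        (void)unused;
        total += v[i] * (a * b) + (a * b);   /* a * b is computed twice per iteration */
    }
    return total;
}

/* After common subexpression elimination, loop-invariant code motion,
   and removal of the useless assignment. */
int sum_optimized(const int *v, int n, int a, int b)
{
    int ab = a * b;                          /* computed once, outside the loop */
    int total = 0;
    for (int i = 0; i < n; i++)
        total += v[i] * ab + ab;
    return total;
}

/* A multiplication on every iteration. */
int multiples_naive(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += i * 4;
    return total;
}

/* After strength reduction: the multiplication becomes a running addition. */
int multiples_reduced(int n)
{
    int total = 0;
    int step = 0;                            /* always equal to i * 4 */
    for (int i = 0; i < n; i++) {
        total += step;
        step += 4;
    }
    return total;
}

/* Hypothetical self-check: each optimized version must give the same result. */
void check_optimizations(void)
{
    int v[] = {1, 2, 3};
    assert(sum_naive(v, 3, 2, 5) == sum_optimized(v, 3, 2, 5));
    assert(multiples_naive(5) == multiples_reduced(5));
}
```

The point of each pair is that the observable result stays the same while the amount of work per loop iteration drops.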

  (2) Assembly

Assembly refers to the process of translating assembly language code into target machine instructions. For each C source program processed by the translation system, this step finally produces the corresponding object file. The object file stores machine language code equivalent to the source program. An object file consists of segments and usually contains at least two:

Code segment: This segment mainly contains the program's instructions. It is generally readable and executable, but usually not writable.

Data segment: This segment mainly stores the global variables and static data used in the program. It is generally readable and writable; on modern systems it is usually not executable.
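The following sketch annotates where each part of a small C file typically ends up. The placement comments describe common behavior of toolchains such as GCC on Linux and are an assumption, not a guarantee of the C language; all names are hypothetical.

```c
#include <assert.h>

int initialized_global = 7;       /* initialized data: stored in the data segment */
static int call_count;            /* zero-initialized static data, often kept in a separate .bss section */
const char *greeting = "hello";   /* the pointer is data; the literal usually sits in read-only data */

/* The machine instructions of this function go into the code segment. */
int next_call(void)
{
    int local = 1;                /* automatic variable: lives on the stack at run time */
    call_count += local;
    return call_count;
}

/* Hypothetical self-check that records the expected behavior. */
void check_segments(void)
{
    int before = next_call();
    assert(next_call() == before + 1);   /* static data persists between calls */
    assert(initialized_global == 7);
    assert(greeting[0] == 'h');
}
```

Because call_count lives in the data area rather than on the stack, its value survives from one call of next_call to the next.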

There are three main types of object files on UNIX:

(1) Relocatable file

It contains code and data suitable for linking with other object files to create an executable file or a shared object file.

(2) Shared object file

This type of file contains code and data suitable for linking in two contexts. In the first, the linker processes it together with other relocatable files and shared object files to create another object file. In the second, the dynamic linker combines it with an executable file and other shared object files to create a process image.

(3) Executable file

It contains a program that the operating system can load to create a process and execute.

The assembler produces the first type of object file. The other two types require further processing, which is the job of the linker.

  

The above is the entire compilation process. The next step is the linking stage.

Linking process

The object file generated by the assembler cannot be executed immediately, because it may still contain many unresolved references.

For example, a function in one source file may reference a symbol (such as a variable or a function) defined in another source file, or the program may call a function that lives in a library file. All of these references must be resolved by the linker.

The main task of the linker is to connect the related object files to each other, that is, to connect each symbol referenced in one file with the definition of that symbol in another file, so that all of the object files become a unified whole that the operating system can load and execute.
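A minimal sketch of symbol references and definitions follows. To keep it self-contained, both sides are shown in one file; in a real project the definitions would typically live in a second source file (the name counter.c below is hypothetical), and the linker would connect the two object files. All identifiers are made up for illustration.

```c
#include <assert.h>

/* Declarations only: they give the compiler the names and types.
   When the definitions are in another object file, the linker fills in the addresses. */
extern int shared_counter;
int bump(int amount);

/* This function references both symbols before any definition is seen. */
int bump_twice(int amount)
{
    bump(amount);
    return bump(amount);
}

/* Definitions. In a real project these would usually be placed in a separate,
   hypothetical counter.c and compiled to their own object file. */
int shared_counter = 0;

int bump(int amount)
{
    shared_counter += amount;
    return shared_counter;
}

/* Hypothetical self-check that records the expected behavior. */
void check_symbols(void)
{
    shared_counter = 0;
    assert(bump_twice(5) == 10);
    assert(shared_counter == 10);
}
```

If the definitions were removed and no other object file supplied them, compilation of this file would still succeed and the failure would appear only at link time as an undefined reference.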

Depending on how the developer chooses to link a given library function, linking can be divided into two types:

(1) Static linking

In this mode, the code of the function is copied from the static library in which it resides into the final executable program. When the program is executed, this code is loaded into the virtual address space of the process. A static library is actually a collection of object files, each of which contains the code of one or more related functions in the library.

(2) Dynamic linking

In this mode, the function code is placed in an object file called a dynamic link library or shared object. The linker only records the name of the shared object and a small amount of other bookkeeping information in the final executable program. When the executable file is run, the entire content of the dynamic link library is mapped into the virtual address space of the corresponding process. The dynamic linker finds the corresponding function code based on the information recorded in the executable program.

Each function call in an executable file can use either dynamic or static linking. Dynamic linking makes the final executable file smaller and saves memory when a shared object is used by multiple processes, because only one copy of the shared object's code needs to be kept in memory. However, dynamic linking is not necessarily superior to static linking. In some cases, dynamic linking may cost some performance.

The GCC compiler we use on Linux bundles all of the above stages so that a single command completes the whole build. This is convenient, but it makes the compilation process harder for beginners to understand. The following figure shows the build process as driven by GCC:

We can see that:

Preprocessing

Converts the .c file into a .i file

The GCC command used is: gcc -E

The corresponding preprocessor command is cpp

Compilation

Converts the .c/.h files into a .s file

The GCC command used is: gcc -S

The corresponding compilation command is cc -S

Assembly

Converts the .s file into a .o file

The GCC command used is: gcc -c

The corresponding assembler command is as

Linking

Converts the .o files into an executable program

The GCC command used is: gcc

The corresponding linker command is ld
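The four stages can be tried on a small source file. In the sketch below, the file name hello.c, the output names, and the second object file main.o are hypothetical; the options -E, -S, -c, and -o are standard GCC options. The file deliberately has no main function, so the link step combines it with another object file that defines one.

```c
/*
 * hello.c - hypothetical file name used for this walk-through.
 *
 * One stage at a time (output names chosen for illustration):
 *   gcc -E hello.c -o hello.i        preprocess only
 *   gcc -S hello.i -o hello.s        compile to assembly
 *   gcc -c hello.s -o hello.o        assemble to a relocatable object file
 *   gcc hello.o main.o -o hello      link with a hypothetical main.o that defines main
 */
#include <assert.h>
#include <string.h>

#define GREETING "hello, world"   /* expanded during preprocessing */

/* Compiled to assembly, then assembled into machine code in hello.o. */
const char *greeting(void)
{
    return GREETING;
}

/* strlen() is only declared by string.h; the linker resolves it from the C library. */
int greeting_length(void)
{
    return (int)strlen(greeting());
}

/* Hypothetical self-check that records the expected behavior. */
void check_greeting(void)
{
    assert(greeting_length() == 12);
}
```

Inspecting hello.i, hello.s, and hello.o after each command is a simple way to see what each stage contributes.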

In summary, the build involves the four stages described above: preprocessing, compilation, assembly, and linking. Understanding the work done in each of these stages helps us understand how header files and libraries work. A clear understanding of the compilation and linking process also helps us locate errors while programming and make the compiler detect as many errors as possible.
