Detailed process from C++ source code to executable code

Source: Internet
Author: User

Compilation means that the compiler reads the source program (a character stream), performs lexical and syntactic analysis on it, and translates the high-level language statements into functionally equivalent assembly code. The assembler then translates the assembly code into machine language, and the linker combines the result into an executable program in the executable file format required by the operating system.

source --(preprocessing)--> preprocessed source --(compilation)--> assembly --(assembly)--> obj --(linking)--> PE/ELF



1. Preprocessing phase (Preprocessing)

The preprocessor reads the source program and processes the preprocessor directives (lines beginning with #) and special symbols.

The following command is typically used for preprocessing:

$ gcc -E hello.c -o hello.i

The -E option tells gcc to stop after preprocessing. The same result can be obtained with the following command:

$ cpp hello.c > hello.i    /* cpp: the C preprocessor */

Running cat hello.i shows the preprocessed code.

[Analysis] Preprocessor directives fall mainly into the following four categories:
(1) Macro definition directives, such as #define Name TokenString and #undef. For the former, the preprocessor replaces every occurrence of Name in the program with TokenString, except where Name appears inside a string constant. The latter cancels the definition of a macro, so that later occurrences of the name are no longer replaced.
(2) Conditional compilation directives, such as #ifdef, #ifndef, #else, #elif, and #endif. These directives let the programmer decide which code the compiler processes by defining different macros. The preprocessor filters out the code that is not needed according to those definitions.
(3) Header file inclusion directives, such as #include "FileName" or #include <FileName>. Header files typically use #define to define a large number of macros (most commonly constants) and also contain declarations of various external symbols. The main purpose of a header file is to make certain definitions available to many different C source programs: a source program that needs those definitions only has to add an #include directive instead of repeating the definitions itself. The preprocessor copies all the definitions in the header file into the output file it produces, for the compiler to process.
Header files included in a C source program can be supplied by the system; these are typically placed in the /usr/include directory and are included with angle brackets (<>). Developers can also define their own header files, which generally live in the same directory as the C source program and are included with double quotation marks ("").
(4) Special symbols. The preprocessor recognizes some special symbols. For example, __LINE__ in the source program is interpreted as the current line number (a decimal number), and __FILE__ is interpreted as the name of the C source file currently being compiled. The preprocessor replaces these symbols with the appropriate values wherever they appear in the source program.
What the preprocessor does is essentially a "substitution" over the source program. The result is an output file with no macro definitions, no conditional compilation directives, and no special symbols. This file has the same meaning as the unpreprocessed source file, but different content. Next, the output file is passed as input to the compiler, which translates it into machine instructions.
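The four categories above can be sketched in one small C file. This is only an illustration: BUFFER_SIZE, VERBOSE, LOG_PREFIX, and all function names are hypothetical names chosen for the example, not part of any library.

```c
#include <stdio.h>

#define BUFFER_SIZE 64            /* macro definition: later BUFFER_SIZE tokens become 64 */

#ifdef VERBOSE                    /* conditional compilation: this branch survives only with -DVERBOSE */
#define LOG_PREFIX "[verbose] "
#else
#define LOG_PREFIX ""
#endif

/* The preprocessor has already replaced the macro, so the compiler sees "return 64;". */
int buffer_size(void) {
    return BUFFER_SIZE;
}

/* A macro name inside a string constant is NOT replaced. */
const char *macro_name_as_text(void) {
    return "BUFFER_SIZE";
}

/* __LINE__ expands to the current line number, __FILE__ to the current file name. */
int line_of_this_statement(void) {
    return __LINE__;
}

const char *this_file(void) {
    return __FILE__;
}

#undef BUFFER_SIZE                /* from here on, BUFFER_SIZE is no longer a macro */

/* Returns 1 if BUFFER_SIZE is still defined at this point, 0 otherwise. */
int buffer_size_still_defined(void) {
#ifdef BUFFER_SIZE
    return 1;
#else
    return 0;
#endif
}

/* Prints a short summary using the values produced by the preprocessor. */
void print_summary(void) {
    printf("%ssize=%d name=%s file=%s line=%d\n",
           LOG_PREFIX, buffer_size(), macro_name_as_text(),
           this_file(), line_of_this_statement());
}
```

Running gcc -E on such a file should show the directives gone, 64 in place of BUFFER_SIZE in buffer_size, and the string "BUFFER_SIZE" left untouched.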


2. Compilation phase (Compilation)

In the compilation phase, the preprocessed file goes through lexical analysis, syntactic analysis, semantic analysis, and optimization, and is translated into the corresponding assembly code.

$ gcc -S hello.i -o hello.s

Or

$ /usr/lib/gcc/i486-linux-gnu/4.4/cc1 hello.c

Note: Current versions of GCC merge preprocessing and compilation into a single step, which is performed by the cc1 tool. gcc itself is really a wrapper around several back-end programs and invokes the actual processing programs according to its arguments, for example the preprocessor and compiler cc1, the assembler as, and the linker ld.

The compiler works on one C++ source file at a time, so a project that contains no C++ source file cannot be compiled. The preprocessed output file contains only constants (such as numbers and strings), variable definitions, and C keywords and symbols such as main, if, else, for, while, {, }, +, -, *, and so on. Through lexical analysis and parsing the compiler confirms that all statements conform to the grammar rules, and then translates them into an equivalent intermediate code representation or assembly code.


3. Optimization phase
Optimization is one of the more difficult techniques in a compilation system. It involves not only compiler technology itself but also depends heavily on the hardware environment of the machine. One kind of optimization works on the intermediate code and does not depend on the specific computer. The other kind mainly targets the generation of target code. Here we place the optimization phase after the compiler, which is a rather general way of presenting it.

For the former kind, the main work is common subexpression elimination, loop optimization (moving code out of loops, strength reduction, transforming loop control conditions, folding known quantities, and so on), copy propagation, and the removal of useless assignments.
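A few of these machine-independent transformations can be shown by hand. The sketch below is only an illustration written for this article: sum_naive is code as a programmer might write it, and sum_optimized applies common subexpression elimination, loop-invariant code motion, and strength reduction manually. Both function names are hypothetical, and a real compiler performs such rewrites on its intermediate code, not on the source.

```c
#include <stddef.h>

/* As written: a * b is computed twice per iteration, scale * 10 never changes
   inside the loop, and i * 4 costs one multiplication per iteration. */
long sum_naive(const int *v, size_t n, int a, int b, int scale) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        long offset = (long)i * 4;
        total += v[i] * (a * b) + (a * b) + scale * 10 + offset;
    }
    return total;
}

/* The same computation after three hand-applied optimizations. */
long sum_optimized(const int *v, size_t n, int a, int b, int scale) {
    const int ab = a * b;               /* common subexpression computed once */
    const int bias = ab + scale * 10;   /* loop-invariant code moved out of the loop */
    long total = 0;
    long offset = 0;                    /* strength reduction: i * 4 becomes a running sum */
    for (size_t i = 0; i < n; i++) {
        total += v[i] * ab + bias + offset;
        offset += 4;
    }
    return total;
}
```

For inputs small enough to avoid integer overflow, the two functions return the same value; only the amount of work per iteration differs.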

The latter kind of optimization is closely tied to the hardware structure of the machine. The main consideration is how to make full use of the variable values held in the machine's hardware registers in order to reduce the number of memory accesses. In addition, how to adjust instructions to the execution characteristics of the hardware (pipelining, RISC, CISC, VLIW, and so on) so that the target code is shorter and runs more efficiently is also an important research topic.

The optimized assembly code must then be translated by the assembler into the corresponding machine instructions before the machine can execute it.

4. Assembly phase (Assembly)

Assembly is the process of translating assembly language code into the machine instructions of the target machine. Every C source file handled by the translation system ends up, after this step, as a corresponding object file. The object file holds the machine-language code that is equivalent to the source program.

$ gcc -c hello.c -o hello.o

Or

$ as hello.s -o hello.o

An object file consists of segments. Typically there are at least two segments in an object file:


Code segment: contains mainly the program's instructions. This segment is generally readable and executable, but usually not writable.

Data segment: mainly stores the global variables and static data used by the program. Data segments are generally readable and writable; on most modern systems they are not executable.
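As a rough illustration of which parts of a C file end up in which segment, consider the sketch below. The section names in the comments (.text, .data, .bss, .rodata) are the usual ELF conventions with GCC and are an assumption here, since the article only speaks of a code segment and a data segment; exact placement depends on the compiler and platform. All identifiers are hypothetical.

```c
/* Initialized global variable: typically stored in the data segment (.data). */
int initialized_counter = 42;

/* Global without an initializer: zero-filled at startup, typically in .bss,
   which takes no space in the object file itself. */
int zeroed_buffer[256];

/* Read-only data: typically placed in .rodata, which is not writable. */
const char banner[] = "hello";

/* The machine instructions of a function go into the code segment (.text). */
int next_ticket(void) {
    /* A static local keeps its value between calls, so it also lives in the
       data segment, not on the stack. */
    static int ticket = 100;
    return ticket++;
}
```

After compiling with gcc -c, tools such as size or objdump -h can be used to list the sections of the resulting object file and their sizes.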

There are three main types of object files in the UNIX environment:

(1) Relocatable file: contains code and data suitable for linking with other object files to create an executable file or a shared object file.

(2) Shared object file: holds code and data suitable for linking in two contexts. In the first, the linker processes it together with other relocatable files and shared object files to create another object file. In the second, the dynamic linker combines it with an executable file and other shared object files to create a process image.

(3) Executable file: contains a program that the operating system can execute by creating a process.

What the assembler generates is actually the first type of object file. The other two types require some further processing, which is the job of the linker.

5. Linking phase (Linking)

The object files generated by the assembler cannot be executed immediately, because many issues may still be unresolved. For example, a function in one source file may refer to a symbol defined in another source file (such as a variable or a function), or the program may call a function from a library file. All of these issues have to be resolved by the linker.

The linker ld is invoked to link the large set of object files the program needs, together with the library files they depend on, and finally produces the executable file.

$ ld -static crt1.o crti.o crtbeginT.o hello.o --start-group -lgcc -lgcc_eh -lc --end-group crtend.o crtn.o    (the path names of the files are omitted)

The main task of the linker is to connect the related object files to one another: a symbol referenced in one file is connected to the definition of that symbol in another file, so that all the object files become a unified whole that the operating system can load and execute.
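The symbol resolution described above can be sketched as follows. The two halves would normally live in two separate source files (called main.c and util.c here purely for illustration); they are shown in one listing only so that the example compiles on its own. All names are hypothetical.

```c
/* ---- main.c: uses symbols it does not define ---- */

extern int scale_factor;     /* declaration only: the variable is defined elsewhere */
int scale(int value);        /* declaration only: the function is defined elsewhere */

/* When main.c is compiled separately, main.o lists scale as an undefined
   symbol. The linker later connects the reference to the definition found
   in util.o. */
int scaled_twice(int value) {
    return scale(scale(value));
}

/* ---- util.c: provides the definitions ---- */

int scale_factor = 3;

int scale(int value) {
    return value * scale_factor;
}
```

With two real files, the build would be gcc -c main.c, gcc -c util.c, and then a final gcc main.o util.o step that invokes the linker. The nm tool marks undefined symbols in an object file with U.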

Depending on how the developer chooses to link library functions, linking can be divided into two kinds:

(1) Static linking. In this mode, the code of the function is copied from the static link library where it resides into the final executable program, and it is loaded into the virtual address space of the process when the program runs. A static link library is actually a collection of object files, each containing the code for one function or a group of related functions in the library. (Personal note: static linking copies the code of the linked library into the executable program, which makes the executable larger.)

(2) Dynamic linking. In this mode, the code of the function is placed in an object file called a dynamic link library or shared object. At link time only some descriptive information is added, and the corresponding dynamic library is loaded from the system into memory when the program runs. What the linker does at this point is record the name of the shared object and a small amount of other bookkeeping information in the final executable program. When the executable runs, the entire contents of the dynamic link library are mapped into the virtual address space of the corresponding process, and the dynamic linker finds the appropriate function code based on the information recorded in the executable. (Personal note: with dynamic linking, the code to be linked stays in a shared object, which is mapped into the process's virtual address space; the linker records what the executable will need later, so that the corresponding code can be located quickly from that information.)

For the function calls in an executable file, either dynamic or static linking can be used. Dynamic linking makes the final executable smaller and saves memory when the shared object is used by several processes, because only one copy of the shared object's code is kept in memory. However, dynamic linking is not necessarily better than static linking. In some cases it can incur a performance penalty.


After these five phases, the C++ source program is finally converted into an executable file. By default, this executable file is named a.out.

Transferred from: http://blog.csdn.net/yxc135/article/details/7564060
