An analysis of how a C/C++ program goes from compilation to the final executable file

Source: Internet
Author: User

Reproduced http://apps.hi.baidu.com/share/detail/32660500

How the C/C++ compilation steps produce an executable file

********************************************************************************

An electronic computer works with binary numbers made up of "0" and "1"; binary is the basis of the computer's language. In the early days of computing, people had to humble themselves and command the computer in its own language: they wrote out strings of instructions consisting of "0" and "1" and handed them to the computer to execute. This language is machine language. Picture an old-timer counting holes in front of a card punch: a whisper that startles him could make him miss a hole, and the result might be a spacecraft drifting out of its orbit.

To ease the pain of programming in machine language, people made a useful improvement: the binary string of each instruction was replaced with a short string of English letters and symbols. For example, "ADD" stands for addition, "MOV" stands for moving data, and so on. This makes it much easier to read and understand what a program is doing, and to correct and maintain it. This kind of programming language is called assembly language, the second-generation computer language. The computer does not recognize these symbols, however, so a special program is needed to translate them into binary machine language; that program is called an assembler. Because assembly instructions correspond one-to-one with machine instructions, this translation is far easier than translating between English and Chinese.

High-level languages are designed for people and follow the way people think, while the machine still understands only its own language. It is the old dilemma of the fish and the bear's paw (you cannot have both) played out in computer languages, so there has to be a bridge connecting the two. Building that bridge is no simple task: the more convenience you want, the more complicated the bridge becomes. So how does a high-level language turn into machine language? Let me go through the process step by step.

Compilation: the process of converting source code into code the machine can execute. The compiler reads the source program (a character stream), performs lexical and syntax analysis on it, and converts the high-level language statements into functionally equivalent assembly code. The assembler then converts that into machine language, and an executable program is generated according to the operating system's requirements for the executable file format.

C source program -> preprocessing -> compilation -> optimization -> assembly -> linking -> executable file

1. Preprocessing: read the C source program and process the directives in it (lines beginning with #) and special symbols.

Preprocessing directives mainly fall into the following four categories:

(1) Macro definition directives, such as #define name token-string and #undef. For the former, the preprocessor replaces every occurrence of name in the program with token-string, except where name appears inside a string constant. The latter cancels the definition of a macro, so that later occurrences of the name are no longer replaced.

(2) Conditional compilation directives, such as #ifdef, #ifndef, #else, #elif, and #endif. These directives let programmers decide which code gets compiled by defining different macros. The preprocessor filters out the unneeded code according to these conditions.

(3) Header file inclusion directives, such as #include "FILENAME" or #include <FILENAME>. Header files typically define a large number of macros with #define (most commonly symbolic constants) and also contain declarations of various external symbols. The main purpose of header files is to make certain definitions available to several different C source programs: a source file that needs those definitions only has to add an #include directive instead of repeating them. The preprocessor copies all the definitions in the header file into its output file for the compiler to process.

The header files included in a C source program may be provided by the system; these are generally stored in the /usr/include directory and are included with angle brackets (<>). Developers can also define their own header files, which are generally placed in the same directory as the C source program and are included with double quotation marks ("").

(4) Special symbols. The preprocessor recognizes some special symbols. For example, __LINE__ appearing in the source program is interpreted as the current line number (a decimal number), and __FILE__ is interpreted as the name of the C source file currently being compiled. The preprocessor replaces these symbols in the source program with the appropriate values.

What the preprocessor does is essentially substitution on the source program. The result is an output file with no macro definitions, no conditional compilation directives, and no special symbols. This file has the same meaning as the unpreprocessed source file, but different content. Next, this output file serves as the input to the compiler, which translates it into machine instructions.

2. Compilation stage

The preprocessed output file contains only constants (such as numbers and strings), variable definitions, and C keywords, identifiers, and symbols, such as main, if, else, for, while, {, }, +, -, *, /, and so on. The compiler performs lexical analysis and syntax analysis, and after confirming that all statements conform to the syntax rules, translates them into an equivalent intermediate representation or assembly code.

3. Optimization stage

Optimization is one of the harder techniques in a compilation system. It involves not only compilation technology itself but also the hardware environment of the machine. One part of optimization works on the intermediate code and does not depend on a specific computer. The other part mainly targets generation of the target code. Placing the optimization stage after the compiler, as in the pipeline shown earlier, is only a general representation.

For the former kind, the main work consists of common subexpression elimination, loop optimization (code hoisting, strength reduction, changing loop control conditions, constant folding, and so on), copy propagation, and elimination of useless assignments.

The latter kind of optimization is closely tied to the machine's hardware structure. The main consideration is how to make full use of the variable values held in the machine's hardware registers so as to reduce the number of memory accesses. In addition, how to adjust instructions according to the way the hardware executes them (pipelining, RISC, CISC, VLIW, and so on), so that the target code is shorter and runs more efficiently, is also an important research topic.

The optimized assembly code must then be translated by the assembler into the corresponding machine instructions before the machine can execute it.

4. Assembly Process

Assembly is the process of translating assembly language code into instructions for the target machine. For each C source program handled by the translation system, this step produces the corresponding object file. What the object file stores is the target machine language code equivalent to the source program.

An object file consists of segments. An object file usually has at least two segments:

Code segment: this segment mainly contains the program's instructions. It is generally readable and executable, but usually not writable.

Data segment: this segment mainly stores the global variables and static data used in the program. Data segments are generally readable and writable; on older systems they were also executable, while modern systems usually mark them non-executable.

There are three main types of object files in UNIX:

(1) A relocatable file contains code and data suitable for linking with other object files to create an executable or shared object file.

(2) A shared object file holds code and data suitable for linking in two contexts. In the first, the linker processes it together with other relocatable files and shared object files to create another object file; in the second, the dynamic linker combines it with an executable file and other shared object files to create a process image.

(3) An executable file contains a program that can be run as a process created by the operating system.

The assembler actually generates the first type of object file. Producing the other two requires some further processing, which is the job of the linker.

5. Linking

The object file generated by the assembler cannot be executed right away, because it may still contain many unresolved references. For example, a function in one source file may reference a symbol (such as a variable or a function) defined in another source file, or the program may call a function from a library file. All of these have to be resolved by the linker.

The linker's main job is to connect the related object files to one another, that is, to connect each symbol referenced in one file with the definition of that symbol in another file, so that all the object files become a unified whole that the operating system can load and execute.

Depending on how the developer specifies that library functions are to be linked, linking falls into two kinds:

(1) Static linking. In this mode, the function's code is copied from the static library in which it resides into the final executable program, so that when the program runs, this code is loaded into the process's virtual address space. A static library is in fact a collection of object files, each containing the code of one or a group of related functions in the library.

(2) Dynamic linking. In this mode, the function's code is placed in an object file called a dynamic link library or shared object. All the linker does is record the shared object's name and a small amount of other bookkeeping information in the final executable program. When the executable runs, the entire content of the dynamic link library is mapped into the virtual address space of the corresponding process, and the dynamic linker locates the function code according to the information recorded in the executable.

Function calls in an executable can be linked either dynamically or statically. Dynamic linking makes the final executable file smaller and saves memory when a shared object is used by several processes, because only one copy of the shared object's code needs to be kept in memory. However, dynamic linking is not necessarily better than static linking: in some cases it can cost some performance.

After the five stages above, the C source program has finally been converted into an executable file.

The opening of this article described the kinds of programming languages: machine language, assembly language, and high-level languages.

This article is from: I love R & D network (52rd.com)-R & D Base Camp
Detailed Source: http://www.52rd.com/Blog/Archive_Thread.asp? SID = 1, 5196

*********************************** Article 2 ***********************************

C/C++ program compilation steps
Source: chinaunix blog
Many people are familiar with C/C++. It is practically a language every university student has to learn, it is usually the first programming language beginners are taught, and most such courses are scheduled in the first year. Students who have just entered university are still well-behaved and study hard, so they tend to have a good grasp of C/C++: compiling a program is no trouble at all, and even writing a program of a few hundred lines is easy. But do they really know the steps by which a C/C++ program is compiled?
I suspect many people are not very clear about it. Those who have taken a "Compiler Principles" course may be able to give a rough outline. The "comfortable" development environment of VC hides many compilation details, which certainly lowers the barrier to entry for beginners, but it also deprives them of the chance to understand why things work the way they do. As a result, many things can only be memorized by rote, and when a related problem comes up they are left at a loss. In fact, I myself only came to understand how C/C++ source code is converted into an executable step by step when I started programming on Linux.
Generally, C/C++ source code has to go through four steps before it becomes an executable file for the target platform: preprocessing, compilation, assembly, and linking. Most of the time the programmer can complete all four steps with a single command. Take the following C "Hello world!" code:
File: hw.c
#include <stdio.h>
int main(int argc, char *argv[])
{
    printf("Hello world!\n");
    return 0;
}
If GCC is used, a single command is enough to generate the executable file hw:
xiaosuo@gentux hw $ gcc -o hw hw.c
xiaosuo@gentux hw $ ./hw
Hello world!
We can use the -v option to see what GCC actually does behind the scenes:
Reading specs from /usr/lib/gcc/i686-pc-linux-gnu/3.4.6/specs
Configured with: /var/tmp/portage/sys-devel/gcc-3.4.6-r2/work/gcc-3.4.6/configure --prefix=/usr --bindir=/usr/i686-pc-linux-gnu/gcc-bin/3.4.6 --includedir=/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/include --datadir=/usr/share/gcc-data/i686-pc-linux-gnu/3.4.6 --mandir=/usr/share/gcc-data/i686-pc-linux-gnu/3.4.6/man --infodir=/usr/share/gcc-data/i686-pc-linux-gnu/3.4.6/info --with-gxx-include-dir=/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/include/g++-v3 --host=i686-pc-linux-gnu --build=i686-pc-linux-gnu --disable-altivec --enable-nls --without-included-gettext --with-system-zlib --disable-checking --disable-werror --enable-secureplt --disable-libunwind-exceptions --disable-multilib --disable-libgcj --enable-languages=c,c++,f77 --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
Thread model: posix
gcc version 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.10)
 /usr/libexec/gcc/i686-pc-linux-gnu/3.4.6/cc1 -quiet -v hw.c -quiet -dumpbase hw.c -mtune=pentiumpro -auxbase hw -version -o /tmp/ccyb6uwr.s
ignoring nonexistent directory "/usr/local/include"
ignoring nonexistent directory "/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/.../i686-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/i686-pc-linux-gnu/3.4.6/include
 /usr/include
End of search list.
GNU C version 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.10) (i686-pc-linux-gnu)
 compiled by GNU C version 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.9).
GGC heuristics: --param ggc-min-expand=81 --param ggc-min-heapsize=97004
 /usr/lib/gcc/i686-pc-linux-gnu/3.4.6/../../../../i686-pc-linux-gnu/bin/as -V -Qy -o /tmp/ccq8uged.o /tmp/ccyb6uwr.s
GNU assembler version 2.17 (i686-pc-linux-gnu) using BFD version 2.17
 /usr/libexec/gcc/i686-pc-linux-gnu/3.4.6/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 -o hw /usr/lib/gcc/i686-pc-linux-gnu/3.4.6/../../../crt1.o /usr/lib/gcc/i686-pc-linux-gnu/3.4.6/../../../crti.o /usr/lib/gcc/i686-pc-linux-gnu/3.4.6/crtbegin.o -L/usr/lib/gcc/i686-pc-linux-gnu/3.4.6 -L/usr/lib/gcc/i686-pc-linux-gnu/3.4.6 -L/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/../../../../i686-pc-linux-gnu/lib -L/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/../../.. /tmp/ccq8uged.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/i686-pc-linux-gnu/3.4.6/crtend.o /usr/lib/gcc/i686-pc-linux-gnu/3.4.6/../../../crtn.o
With the redundant information removed, this comes down to:
cc1 hw.c -o /tmp/ccyb6uwr.s
as -o /tmp/ccq8uged.o /tmp/ccyb6uwr.s
ld -o hw /tmp/ccq8uged.o
These three commands correspond to preprocessing plus compilation, assembly, and linking, respectively. Preprocessing and compilation are carried out by a single command (cc1), but they can be split into the following two steps:
cpp -o hw.i hw.c
cc1 hw.i -o /tmp/ccyb6uwr.s
A stripped-down makefile that builds the hw.c file above looks like this:
.PHONY: clean
all: hw
# link: combine hw.o with the C runtime startup files and libc
hw: hw.o
	ld -dynamic-linker /lib/ld-linux.so.2 -o hw /usr/lib/crt1.o \
	/usr/lib/crti.o \
	/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/crtbegin.o \
	hw.o -lc \
	/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/crtend.o \
	/usr/lib/crtn.o
# assemble: assembly code to object file
hw.o: hw.s
	as -o hw.o hw.s
# compile: preprocessed source to assembly code
hw.s: hw.i
	/usr/libexec/gcc/i686-pc-linux-gnu/3.4.6/cc1 -o hw.s hw.i
# preprocess: expand #include and macros
hw.i: hw.c
	cpp -o hw.i hw.c
clean:
	rm -rf hw.i hw.s hw.o
Of course, some of the paths in the makefile above are specific to my system and may differ on yours.
Next, let's follow the compilation order and see what the compiler does at each step.
First comes preprocessing. The preprocessed file hw.i:
# 1 "hw.c"
# 1 "<built-in>"
# 1 "<command line>"
...
__extension__ typedef __quad_t __off64_t;
__extension__ typedef int __pid_t;
__extension__ typedef struct { int __val[2]; } __fsid_t;
...
extern int remove (__const char *__filename) __attribute__ ((__nothrow__));
extern int rename (__const char *__old, __const char *__new) __attribute__ ((__nothrow__));
...
int main(int argc, char *argv[])
{
    printf("Hello world!\n");
    return 0;
}
Note: because the file is large, only a few representative parts are kept.
As we can see, the preprocessor inserts the contents of all included files (recursively) into the original C source file and writes the result to the output file. It also expands all macro definitions, so you will not find any macros in the preprocessor's output. This also gives us a simple way to check the result of a macro expansion.
The second step "compiles" the C/C++ code into assembly code:
    .file   "hw.c"
    .section    .rodata
.LC0:
    .string "Hello world!\n"
    .text
.globl main
    .type   main, @function
main:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $8, %esp
    andl    $-16, %esp
    movl    $0, %eax
    addl    $15, %eax
    addl    $15, %eax
    shrl    $4, %eax
    sall    $4, %eax
    subl    %eax, %esp
    subl    $12, %esp
    pushl   $.LC0
    call    printf
    addl    $16, %esp
    movl    $0, %eax
    leave
    ret
    .size   main, .-main
    .section    .note.GNU-stack,"",@progbits
    .ident  "GCC: (GNU) 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.10)"
This assembly file is much smaller than the preprocessed C/C++ file, because a lot of unnecessary material, such as unused type declarations and function declarations, has been dropped.
The third step is "assembly", which translates the assembly code produced by the second step into machine code in a particular format. On Linux this generally takes the form of an ELF object file.
xiaosuo@gentux hw $ file hw.o
hw.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
The last step is linking: the object file generated in the previous step is linked with the system's object files and library files to produce an executable that runs on a specific platform. Why do we need to link in some of the system's object files (crt1.o, crti.o, and so on)? These object files initialize and tear down the C runtime environment, for example setting up the context for heap memory allocation. CRT is in fact short for C runtime. This also implies that the program does not begin executing at the main function but at an entry point in the CRT; on Linux, that entry point is _start. The makefile above produces a dynamically linked executable. To produce a statically linked one, the corresponding part of the makefile has to be changed to:
hw: hw.o
	ld -m elf_i386 -static -o hw /usr/lib/crt1.o \
	/usr/lib/crti.o \
	/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/crtbeginT.o \
	-L/usr/lib/gcc/i686-pc-linux-gnu/3.4.6 \
	-L/usr/i686-pc-linux-gnu/lib \
	-L/usr/lib/ \
	hw.o --start-group -lgcc -lgcc_eh -lc --end-group \
	/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/crtend.o \
	/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/.../crtn.o
At this point an executable file has been produced. In everyday projects the build is not broken down this finely; the first three steps are usually combined, which looks like this in a makefile:
hw.o: hw.c
	gcc -o hw.o -c hw.c
In fact, whenever hw.c is modified, the first three steps are unavoidable in most cases, so writing them together does no harm. On the contrary, you can pass the -pipe option to tell the compiler to use pipes instead of temporary files, which improves compilation efficiency.
