To ease the pain of programming in machine language, a useful improvement was made: the binary string of each instruction was replaced with a short symbolic name made of English letters, for example "ADD" for addition and "MOV" for data transfer. This makes it easy to read and understand what a program is doing, and makes error correction and maintenance more convenient. This kind of programming language is called assembly language, the second generation of computer languages. Computers, however, do not recognize these symbols, so a specialized program is needed to translate them into binary machine language; this program is called an assembler. Because assembly instructions correspond one-to-one with machine instructions, this translation is far simpler than translating between English and Chinese.
High-level languages are designed for people and follow the way people think, but to the machine they are baffling and meaningless. The dilemma of the fish and the bear's paw (you cannot have both) plays out in computer languages too. So there must be a bridge connecting the two, and building that bridge is not simple: the more convenience you want, the more complex the bridge becomes. How does a high-level language become machine language? Let me go through the process step by step.
Compilation: the process of converting source code into machine-readable code. The compiler reads the source program (a character stream), performs lexical and syntactic analysis, and translates the high-level language statements into functionally equivalent assembly code. The assembler then translates that into machine language, and finally an executable program is linked together according to the operating system's requirements for the executable file format.
C source program -> preprocessing -> compilation and optimization -> assembler -> linker -> executable file
1. Preprocessing: the preprocessor reads the C source program and handles the pseudo-instructions in it (directives beginning with #) as well as special symbols.
The pseudo-instructions fall mainly into the following four categories:
(1) Macro definition directives, such as #define Name TokenString and #undef. For the former, the preprocessor replaces every occurrence of Name in the program with TokenString, except where Name appears inside a string constant. For the latter, the macro definition is cancelled, so that later occurrences of the name are no longer replaced.
(2) Conditional compilation directives, such as #ifdef, #ifndef, #else, #elif, #endif, and so on. These directives let programmers decide, by defining different macros, which code the compiler processes. The preprocessor filters out the unneeded code according to which macros are defined.
(3) Header file inclusion directives, such as #include "FileName" or #include <FileName>. A header file generally uses #define to define a large number of macros (most commonly character constants), and also contains declarations of various external symbols. The main purpose of a header file is to make certain definitions available to many different C source programs: a source program that needs these definitions simply adds an #include directive instead of repeating the definitions itself. The preprocessor copies all the definitions in the header file into the output file it produces, for the compiler to process.
Header files included in a C source program can be supplied by the system; these are typically placed in the /usr/include directory, and are included with angle brackets (<>). Developers can also define their own header files, which generally live in the same directory as the C source program; these are included with double quotation marks ("").
(4) Special symbols. The preprocessor recognizes some special symbols. For example, the __LINE__ identifier appearing in the source program is interpreted as the current line number (a decimal number), and __FILE__ is interpreted as the name of the C source file currently being compiled. The preprocessor replaces each occurrence of these symbols in the source program with the appropriate value.
What the preprocessor does is essentially a "substitution" pass over the source program. Through this substitution it generates an output file with no macro definitions, no conditional compilation directives, and no special symbols. This file has the same meaning as the unpreprocessed source file, but its content is different. Next, this output file is fed to the compiler to be translated into machine instructions.
2. Compilation stage
After preprocessing, the output file contains only constants, such as numbers and strings, variable definitions, and the keywords and symbols of the C language, such as main, if, else, for, while, {, }, +, -, *, \, and so on. The compiler performs lexical analysis and parsing and, after confirming that all statements conform to the grammar rules, translates them into an equivalent intermediate code representation or into assembly code.
3. Optimization phase
Optimization is one of the more difficult techniques in a compilation system. It involves not only compiler technology itself but also has a great deal to do with the machine's hardware environment. One part of optimization works on the intermediate code and does not depend on the specific computer. The other part mainly targets the generation of object code. Here we place the optimization stage after the compilation stage, which is a fairly loose representation.
For the former kind of optimization, the main work is common subexpression elimination, loop optimization (hoisting code out of loops, strength reduction, transforming loop control conditions, folding known quantities, and so on), copy propagation, elimination of useless assignments, and so on.
The latter kind of optimization is closely tied to the hardware structure of the machine. The most important consideration is how to make full use of the variable values held in the machine's hardware registers, so as to reduce the number of memory accesses. In addition, how to adjust the instructions to the characteristics of the machine's hardware (such as pipelining, RISC, CISC, VLIW, and so on) so that the target code is shorter and executes more efficiently is also an important research topic.
The optimized assembly code must then be translated by the assembler into the corresponding machine instructions before the machine can execute it.
4. Assembly process
The assembly process is the translation of assembly language code into target machine instructions. Each C source file handled by the translation system ends up, through this step, as a corresponding object file. What the object file stores is the machine language code equivalent to the source program.
An object file consists of segments. Typically an object file has at least two segments:
Code segment: this segment mainly contains the program's instructions. It is generally readable and executable, but usually not writable.
Data segment: this segment mainly stores the global variables and static data used by the program. Data segments are generally readable and writable; on systems without no-execute protection they may also be executable.
There are three main types of object files in a UNIX environment:
(1) Relocatable file: contains code and data suitable for linking with other object files to create an executable or shared object file.
(2) Shared object file: holds code and data suitable for linking in two contexts. In the first, the linker processes it together with other relocatable files and shared object files to create another object file; in the second, the dynamic linker combines it with an executable file and other shared object files to create a process image.
(3) Executable file: contains a program that the operating system can use to create a process and execute.
What the assembler produces is actually the first type of object file. The other two types require further processing, which is the job of the linker.
5. Linking
The object files generated by the assembler cannot be executed immediately, because they may contain many unresolved references. For example, a function in one source file might refer to a symbol (such as a variable or a function) defined in another source file, or a program might call a function from a library file. All of these references have to be resolved by the linker.
The main job of the linker is to connect the related object files to each other, that is, to connect the symbols referenced in one file with their definitions in another file, so that all of these object files become a unified whole that the operating system can load and execute.
Depending on how the developer specifies the linking of library functions, linking can be divided into two types:
(1) Static linking: in this mode, the code of the function is copied from the static link library in which it resides into the final executable. When the program is executed, this code is loaded into the virtual address space of the process. A static link library is actually a collection of object files, each of which contains the code for one function or a group of related functions in the library.
(2) Dynamic linking: in this mode, the code of the function is placed in an object file called a dynamic link library or shared object. All the linker does at this point is record the name of the shared object and a small amount of other registration information in the final executable. When the executable is run, the entire contents of the dynamic link library are mapped into the virtual address space of the corresponding process. The dynamic linker finds the appropriate function code based on the information recorded in the executable.
Function calls in an executable can use either dynamic or static linking. Dynamic linking makes the final executable smaller, and saves some memory when the shared object is used by several processes, because only one copy of the shared object's code is kept in memory. However, dynamic linking is not necessarily better than static linking; in some cases it can cause a performance penalty.
After the five steps above, the C source program is finally converted into an executable file.
In the previous section we introduced the types of programming languages, including machine languages, assembly language, and high-level languages.
Article Two ************************************ **********************************************************
A detailed description of C/C++ program compilation
Many people are familiar with C/C++. It is a programming language that almost every college student has to learn, usually as the introductory programming language, with the course mostly scheduled in the freshman year. Freshly arrived at college, students are well behaved and study seriously and attentively, so their grasp of C/C++ is good. Writing and compiling a program of a few hundred lines is no trouble at all. But do they really know the steps by which a C/C++ program is compiled?

I suspect many people are not very clear about it; someone who has studied "Principles of Compilers" may be able to give a rough outline. The "comfortable" development environment of VC hides many compilation details. This certainly lowers the entry barrier for beginners, but it also "deprives" them of the right to know why, so that many things can only be memorized, and when related problems come up they are completely at a loss. In fact, I too only gradually came to understand how C/C++ source code becomes an executable file step by step while learning to program under Linux.

Generally speaking, C/C++ source code goes through four steps, preprocessing, compiling, assembling and linking, to become an executable file on the target platform. Most of the time the programmer completes all four steps with a single command. For example, take the following C "Hello world!" code:

File: hw.c

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        printf("Hello world!\n");
        return 0;
    }

If you compile with GCC, a single command is enough to generate the executable hw:

    $ gcc -o hw hw.c
    $ ./hw
    Hello world!
We can use the -v option to see what GCC is doing behind the scenes:

    Reading specs from /usr/lib/gcc/i686-pc-linux-gnu/3.4.6/specs
    Configured with: /var/tmp/portage/sys-devel/gcc-3.4.6-r2/work/gcc-3.4.6/configure --prefix=/usr --bindir=/usr/i686-pc-linux-gnu/gcc-bin/3.4.6 --includedir=/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/include --datadir=/usr/share/gcc-data/i686-pc-linux-gnu/3.4.6 --mandir=/usr/share/gcc-data/i686-pc-linux-gnu/3.4.6/man --infodir=/usr/share/gcc-data/i686-pc-linux-gnu/3.4.6/info --with-gxx-include-dir=/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/include/g++-v3 --host=i686-pc-linux-gnu --build=i686-pc-linux-gnu --disable-altivec --enable-nls --without-included-gettext --with-system-zlib --disable-checking --disable-werror --enable-secureplt --disable-libunwind-exceptions --disable-multilib --disable-libgcj --enable-languages=c,c++,f77 --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
    Thread model: posix
    gcc version 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.10)
     /usr/libexec/gcc/i686-pc-linux-gnu/3.4.6/cc1 -quiet -v hw.c -quiet -dumpbase hw.c -mtune=pentiumpro -auxbase hw -version -o /tmp/ccyb6uwr.s
    ignoring nonexistent directory "/usr/local/include"
    ignoring nonexistent directory "/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/../../../../i686-pc-linux-gnu/include"
    #include "..." search starts here:
    #include <...> search starts here:
     /usr/lib/gcc/i686-pc-linux-gnu/3.4.6/include
     /usr/include
    End of search list.
    GNU C version 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.10) (i686-pc-linux-gnu)
            compiled by GNU C version 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.9).
    GGC heuristics: --param ggc-min-expand=81 --param ggc-min-heapsize=97004
     /usr/lib/gcc/i686-pc-linux-gnu/3.4.6/../../../../i686-pc-linux-gnu/bin/as -V -Qy -o /tmp/ccq8uged.o /tmp/ccyb6uwr.s
    GNU assembler version 2.17 (i686-pc-linux-gnu) using BFD version 2.17
     /usr/libexec/gcc/i686-pc-linux-gnu/3.4.6/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 -o hw /usr/lib/gcc/i686-pc-linux-gnu/3.4.6/../../../crt1.o /usr/lib/gcc/i686-pc-linux-gnu/3.4.6/../../../crti.o /usr/lib/gcc/i686-pc-linux-gnu/3.4.6/crtbegin.o -L/usr/lib/gcc/i686-pc-linux-gnu/3.4.6 -L/usr/lib/gcc/i686-pc-linux-gnu/3.4.6 -L/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/../../../../i686-pc-linux-gnu/lib -L/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/../../.. /tmp/ccq8uged.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/i686-pc-linux-gnu/3.4.6/crtend.o /usr/lib/gcc/i686-pc-linux-gnu/3.4.6/../../../crtn.o

After a bit of tidying up and removing some redundant information, this comes down to the following:

    cc1 hw.c -o /tmp/ccyb6uwr.s
    as -o /tmp/ccq8uged.o /tmp/ccyb6uwr.s
    ld -o hw /tmp/ccq8uged.o

The three commands above correspond to the compilation steps of preprocessing + compiling, assembling, and linking, respectively. Preprocessing and compiling are done by a single command (cc1), and can be split further into these two steps:

    cpp -o hw.i hw.c
    cc1 hw.i -o /tmp/ccyb6uwr.s

A minimal Makefile that builds hw.c this way looks like this:

    .PHONY: clean
    all: hw
    hw: hw.o
    	ld -dynamic-linker /lib/ld-linux.so.2 -o hw /usr/lib/crt1.o \
    		/usr/lib/crti.o \
    		/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/crtbegin.o \
    		hw.o -lc \
    		/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/crtend.o \
    		/usr/lib/crtn.o
    hw.o: hw.s
    	as -o hw.o hw.s
    hw.s: hw.i
    	/usr/libexec/gcc/i686-pc-linux-gnu/3.4.6/cc1 -o hw.s hw.i
    hw.i: hw.c
    	cpp -o hw.i hw.c
    clean:
    	rm -rf hw.i hw.s hw.o

Of course, some of the paths in the Makefile above are specific to my system and may differ on yours. Next we follow the compilation sequence to see what the compiler does at each step.

First comes preprocessing. The preprocessed file hw.i looks like this:

    # 1 "hw.c"
    # 1 "<built-in>"
    # 1 "<command line>"
    ...
    __extension__ typedef __quad_t __off64_t;
    __extension__ typedef int __pid_t;
    __extension__ typedef struct { int __val[2]; } __fsid_t;
    ...
    extern int remove (__const char *__filename) __attribute__ ((__nothrow__));
    extern int rename (__const char *__old, __const char *__new) __attribute__ ((__nothrow__));
    ...
    int main(int argc, char *argv[])
    {
        printf("Hello world!\n");
        return 0;
    }

Note: since the file is quite large, only a small amount of representative content is kept. You can see that the preprocessor inserts the contents of all included files (including recursively included ones) into the original C source file and writes the result to the output file. It also expands all macro definitions, so you will not find any macros in the preprocessor's output. This also gives an easy way to inspect the result of a macro expansion.

The second step, "compiling", "translates" the C/C++ code into assembly code:

        .file   "hw.c"
        .section        .rodata
    .LC0:
        .string "Hello world!\n"
        .text
    .globl main
        .type   main, @function
    main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        andl    $-16, %esp
        movl    $0, %eax
        addl    $15, %eax
        addl    $15, %eax
        shrl    $4, %eax
        sall    $4, %eax
        subl    %eax, %esp
        subl    $12, %esp
        pushl   $.LC0
        call    printf
        addl    $16, %esp
        movl    $0, %eax
        leave
        ret
        .size   main, .-main
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.10)"

This assembly file is much smaller than the preprocessed C/C++ file, because a lot of unneeded material, such as unused type declarations and function declarations, has been removed.

The third step, "assembling", translates the assembly code output by the second step into machine code in a particular format, which on Linux generally takes the form of an ELF object file:

    $ file hw.o
    hw.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped

The final step, linking, links the object file produced in the previous step with the object files and libraries of the system, producing an executable that can run on the specific platform. Why link in some object files from the system libraries (crt1.o, crti.o, and so on)? These object files initialize or tear down the C runtime environment, for example setting up the context for heap memory allocation; in fact, crt is short for C runtime. This also implies another point: the program does not start executing at the main function but at an entry point in the crt, which on Linux is _start.

The Makefile above produces a dynamically linked executable. To produce a statically linked one, modify the corresponding rule in the Makefile:

    hw: hw.o
    	ld -m elf_i386 -static -o hw /usr/lib/crt1.o \
    		/usr/lib/crti.o \
    		/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/crtbeginT.o \
    		-L/usr/lib/gcc/i686-pc-linux-gnu/3.4.6 \
    		-L/usr/i686-pc-linux-gnu/lib \
    		-L/usr/lib/ \
    		hw.o --start-group -lgcc -lgcc_eh -lc --end-group \
    		/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/crtend.o \
    		/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/../../../crtn.o

At this point an executable has finally been produced. Ordinary projects do not need to divide the compilation process so finely; the first three steps are generally combined, which in a Makefile looks like this:

    hw.o: hw.c
    	gcc -o hw.o -c hw.c

In fact, whenever hw.c is changed, the first three steps cannot be avoided in most cases, so there is no harm in writing them together. On the contrary, you can then use the -pipe option to tell the compiler to use pipes instead of temporary files, which improves compilation efficiency.
Compile and run Process Analysis (reprint)