Understanding the Linux compilation system through Hello World
This article follows a simple Hello World program from source code to execution in order to explain how the Linux compilation system works.
First, write a C program named hello.c.
The contents are as follows:

#include <stdio.h>

int main(void)
{
    printf("Hello world\n");
    return 0;
}

After saving, compile with GCC:
gcc -o hello hello.c
Then run the compiled hello program (if the current directory is on your PATH, typing hello alone also works):
./hello
The output is as follows:
Hello world
The life cycle of the hello program begins as a high-level C program, because in that form it can be read and understood by people. However, in order to run hello.c on the system, each C statement must be translated by other programs into a sequence of low-level machine-language instructions. These instructions are then packaged in a format known as an executable object program and stored as a binary file on disk. An object program is also called an executable object file.
On UNIX systems, the translation from source file to object file is performed by the compiler driver:
unix> gcc -o hello hello.c
Here, the gcc command invokes several programs in turn (the preprocessor, the compiler proper, the assembler, and the linker) and translates the source file hello.c into the executable object file hello. The translation is carried out in four phases, as shown in the following figure. The programs that perform these four phases (preprocessor, compiler, assembler, and linker) together make up the compilation system.
Note: the gcc command invokes multiple programs, not just the C compiler itself; the compiler is only one part of the compilation system. See the following figure for details:
Figure: The compilation system
• Preprocessing phase. The preprocessor (cpp) modifies the original C program according to directives that begin with the # character. For example, the #include <stdio.h> directive on line 1 of hello.c tells the preprocessor to read the contents of the system header file stdio.h and insert it directly into the program text. The result is another C program, usually with the .i file extension.
• Compilation phase. The compiler (cc1) translates the text file hello.i into the text file hello.s, which contains an assembly-language program. Each statement in an assembly-language program describes exactly one low-level machine-language instruction in a standard text format. Assembly language is useful because it provides a common output language for different compilers of different high-level languages. For example, a C compiler and a Fortran compiler both produce output files in the same assembly language.
• Assembly phase. Next, the assembler (as) translates hello.s into machine-language instructions, packages them in a format known as a relocatable object program, and stores the result in the object file hello.o. The hello.o file is a binary file whose bytes encode machine-language instructions rather than characters. If we open hello.o in a text editor, we see what looks like gibberish.
• Linking phase. Notice that the hello program calls the printf function, which is part of the standard C library provided with every C compiler. The printf function resides in a separate precompiled object file named printf.o, which must somehow be merged with our hello.o program. The linker (ld) handles this merging. The result is the hello file, an executable object file (or simply an executable) that can be loaded into memory and executed by the system.
By now you should have a preliminary understanding of the Linux compilation system. Why, then, is it worth understanding the compilation system in more depth? The main reasons are as follows:
• Optimizing program performance. Modern compilers are sophisticated tools that usually generate very good code. As programmers, we do not need to know the inner workings of the compiler in order to write efficient code. However, in order to make good coding decisions in our C programs, we do need a basic understanding of machine code and of how the compiler translates different C statements into machine code. For example, is a switch statement always more efficient than a sequence of if-else statements? How much overhead does a function call incur? Is a while loop more efficient than a for loop? Are pointer references more efficient than array indexes? Why does a loop run so much faster when we sum into a local variable instead of into an argument passed by reference? How can a function run faster when we simply rearrange the parentheses in an arithmetic expression?
• Understanding link-time errors. In our experience, some of the most perplexing programming errors are related to the operation of the linker, especially when you are trying to build a large software system. For example, what does it mean when the linker reports that it cannot resolve a reference? What is the difference between a static variable and a global variable? What happens if you define two global variables with the same name in different C files? What is the difference between a static library and a dynamic library? Why does the order in which we list libraries on the command line matter? And, most troubling of all, why do some linker-related errors not appear until run time?
• Avoiding security vulnerabilities. For many years, buffer overflow errors have been the main cause of security vulnerabilities in network and Internet servers. These errors exist because too few programmers understand the need to restrict the quantity and format of the data they accept from untrusted sources. A first step in learning secure programming is to understand the consequences of the way data and control information are stored on the program stack. All of this depends on an understanding of the compilation system.