From C source code to an executable program
Turning a C source file (.c) into an executable file takes five steps:

1. Preprocessing
2. Compilation
3. Optimization
4. Assembly
5. Linking

1. Preprocessing (.i) ------> expand the source into an output file that no longer contains macro definitions, conditional compilation directives, or special symbols.

The preprocessor reads the C source file and processes the preprocessor directives (lines starting with #) and the special symbols. The directives fall into four groups:

(1) Macro definitions. For example: #define M 9, #define SQUARE(x) ((x)*(x)), and #undef (removes a macro definition). By convention, macro names are written in all capitals to distinguish them from functions. Note: a macro name that appears inside a string literal is not replaced.
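#include <stdio.h>
#define M 9

int main(void)
{
    int a = M;
    printf("M=%d\n", a);   /* same as printf("M=%d\n", 9); the M in the string is kept */
    return 0;
}

A macro with a parameter (a separate program):

#include <stdio.h>
#define SQUARE(x) ((x) * (x))   /* not: x * x */

int main(void)
{
    int a = 4;
    printf("%d\n", SQUARE(a + 1));
    /* expands to ((a + 1) * (a + 1)), prints 25 */
    /* with x * x it would be a + 1 * a + 1, which is 9 */
    return 0;
}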
Note: the left parenthesis of the parameter list must immediately follow the macro name SQUARE, with no space in between. Otherwise the preprocessor treats SQUARE as a macro without parameters and the parenthesized list as part of its replacement text, so the expansion is wrong.

(2) Conditional compilation. For example: #ifdef, #ifndef, #else, #elif, #endif, and so on. These directives let the programmer decide, by defining different macros, which parts of the code are passed on to the compiler, which avoids keeping several copies of nearly identical code.

#ifdef identifier
    segment 1
#else
    segment 2
#endif

Segment 1 is compiled if the identifier has been defined with #define; otherwise segment 2 is compiled.

(3) File inclusion. For example: #include <stdio.h> and #include "stdio.h". #include copies the content of an existing file into the current file. This lets the same definitions be used by several C source files: a single #include line makes the preprocessor add everything in the header file to the output file that is handed to the compiler, so the programmer does not have to repeat the definitions.

Note the difference between <> and "". With <>, the preprocessor looks for the file in the directories specified by the system. With "", it first looks for stdio.h in the current directory and, if the file is not found there, searches the directories specified by the system.

(4) Special symbols. These are predefined macros such as __FILE__ (the name of the file being compiled), __LINE__ (the current line number in that file), __DATE__ (the compilation date as a string), __TIME__ (the compilation time as a string), and __STDC__ (defined as 1 when the compiler conforms to standard C). The preprocessor replaces these names in the source program with the appropriate values.

printf("file: %s\tline: %d\tdate: %s\ttime: %s\n", __FILE__, __LINE__, __DATE__, __TIME__);

The # operator and the ## operator

a. To include a macro argument in a string, use the # operator, which converts the tokens of the argument into a string literal.
#include <stdio.h>
#define SQR(x) printf("The square of " #x " is %d.\n", ((x) * (x)))

int main(void)
{
    SQR(8);
    return 0;
}
The running result is: The square of 8 is 64.

b. The ## operator is a token-pasting operator: it joins the token on its left and the token on its right into a single token.
#include <stdio.h>
#define XNAME(n) x ## n

int main(void)
{
    int XNAME(8) = 8;          /* expands to: int x8 = 8; */
    printf("x8 = %d\n", x8);
    return 0;
}
After preprocessing, XNAME(8) becomes the single identifier x8.

2. Compilation (.s) ------> translate the preprocessed code into an equivalent intermediate representation or into assembly code.

The preprocessed output file contains only constants (such as numbers and strings), variable definitions, and the names, keywords, and symbols of the C language (main, if, else, for, while, {, }, +, -, *, /, and so on). The compiler performs lexical analysis, syntax analysis, semantic analysis, and symbol collection. Once it has confirmed that every statement complies with the rules of the language, it translates the code into an equivalent intermediate representation or into assembly code.

3. Optimization (improves execution efficiency).
(1) Optimization of the intermediate code.
(2) Optimization tied to the hardware of the target machine, for example keeping the values of frequently used variables in hardware registers to reduce the number of memory accesses.

4. Assembly (.o) ------> generate the corresponding object file.

Assembly is comparatively simple. The assembler only needs to convert the assembly code into instructions that the machine can execute. Almost every assembly statement corresponds to one machine instruction, so there is no complex syntax or semantics and no need for instruction optimization; the assembler translates the statements one by one using the table that maps assembly mnemonics to machine instructions. The object file stores machine code equivalent to the source program.

An object file consists of segments and contains at least two of them:
Code segment: mainly contains the program's instructions. It is generally readable and executable, but not writable.
Data segment: mainly stores the program's global variables and static data. It is generally readable and writable; on modern systems it is normally not executable.

5. Linking ------> connect the object files to one another. The object file generated by the assembler cannot be executed immediately.
Many references may still be unresolved, such as calls to functions and accesses to variables that are defined in other modules. Linking consists of address and space allocation, symbol resolution, and relocation.

Relocation: suppose object file A defines a global variable named var and object file B uses it. When B is compiled, the compiler cannot know the address of var, so it puts a placeholder (for example 0) wherever the address is needed. When the linker combines A and B, the address of var becomes known and the linker patches the placeholder with it. This patching of addresses is called relocation.

There are two kinds of linking:
(1) Static linking: the code of each function used is copied from the static library into the executable program (a static library is really a collection of object files, each containing the code of one or more related functions in the library). This increases the size of the executable program.
(2) Dynamic linking: the code to be linked is placed in a shared object. At link time, only the information needed to find that code later is recorded in the executable program. When the executable runs, the contents of the dynamic library are mapped into the virtual address space of the process, and the dynamic linker locates the required function code from the information recorded in the executable. Because one copy of the library can be shared by several processes, this saves memory compared with static linking.

After these five steps, the C source program has been converted into an executable file.