The Basic Process of Program Compilation
Compiling a program means translating a source program written in a high-level language into an equivalent target program in assembly language or machine language.
1. Lexical analysis
The task of lexical analysis is to convert the character string of the source program into a sequence of word symbols (tokens), such as identifiers, constants, keywords, operators, and delimiters.
Lexical analysis is based on finite automata.
The language accepted by a finite automaton is a regular language, and a regular language can be described by a regular expression.
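As an illustration of the idea above, here is a minimal tokenizer sketch in Python. The token classes and their regular expressions are hypothetical choices for a tiny made-up language, not part of any real compiler. Python's `re` module does the matching that a hand-built finite automaton would otherwise do.

```python
import re

# Hypothetical token classes for a tiny illustrative language.
# Order matters: KEYWORD must be tried before IDENTIFIER.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+"),
    ("KEYWORD",    r"\b(?:if|else|while)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[+\-*/=<>]"),
    ("DELIMITER",  r"[();{}]"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Turn a source string into a list of (token_class, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":          # whitespace is discarded
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("while (x < 10) x = x + 1;"))
```

Each named group in the combined pattern corresponds to one token class, so `match.lastgroup` tells the tokenizer which class matched.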
2. Syntax analysis
The task of syntax analysis is to take the sequence of word symbols produced by lexical analysis and, according to the grammar rules of the language, group it into syntactic units such as "expressions", "statements", and "programs". The grammar rules describe how each kind of syntactic unit is composed. Syntax analysis determines whether the input string constitutes a syntactically correct program. If the source program has no errors, the analyzer constructs its syntax tree. Otherwise, it reports a syntax error and provides diagnostic information.
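The following is a minimal sketch of a syntax analyzer, assuming a hypothetical grammar for arithmetic expressions (`expr`, `term`, `factor`) and using recursive descent, which is only one of several parsing techniques. It takes a list of token strings, returns a syntax tree as nested tuples, and raises `SyntaxError` with a short diagnostic when the input is not well formed.

```python
def parse(tokens):
    """Parse a list of token strings into a nested-tuple syntax tree.

    Assumed grammar (illustrative only):
        expr   -> term   (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NAME | NUMBER | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        token = advance()
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError(f"expected ')' at token index {pos - 1}")
            return node
        if token is not None and token.isalnum():
            return token                      # a name or a number is a leaf
        raise SyntaxError(f"unexpected token {token!r} at token index {pos - 1}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r} at token index {pos}")
    return tree


print(parse(["a", "+", "b", "*", "2"]))   # ('+', 'a', ('*', 'b', '2'))
```

The tree shape reflects operator precedence: the multiplication is nested below the addition because `term` is parsed before `expr` combines its operands.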
Syntax analysis is based on pushdown automata.
A pushdown automaton has the finite-state control of a finite automaton plus a stack of unbounded length. In other words, it extends the finite automaton by giving it access to a stack.
Definition of a pushdown automaton:
A pushdown automaton M is a 7-tuple (Q, Σ, Γ, δ, q0, Z0, F), where:
* Q is a finite set of states.
* Σ is an alphabet called the input alphabet.
* Γ is an alphabet called the stack alphabet.
* q0 ∈ Q is the initial state.
* Z0 ∈ Γ is a special stack symbol, called the start symbol of the stack.
* F ⊆ Q is the set of final (accepting) states.
* δ: Q × (Σ ∪ {ε}) × Γ → Q × Γ* is the transition function of M. In the general, nondeterministic case, δ maps to finite subsets of Q × Γ*.
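To make the definition concrete, here is a small simulation sketch of a deterministic pushdown automaton. The automaton is a hypothetical example chosen for illustration: it accepts strings of n letters `a` followed by n letters `b`, with n ≥ 1. The state names, stack symbols, and the convention that the leftmost symbol of a pushed string ends up on top of the stack are all assumptions of this sketch.

```python
EPSILON = ""   # marks a transition that consumes no input

# Transition function: (state, input symbol or EPSILON, stack top) -> (new state, string to push)
DELTA = {
    ("q0", "a", "Z"): ("q0", "AZ"),      # first 'a': push A above the start symbol
    ("q0", "a", "A"): ("q0", "AA"),      # further 'a': push one more A
    ("q0", "b", "A"): ("q1", ""),        # first 'b': pop one A
    ("q1", "b", "A"): ("q1", ""),        # further 'b': pop one A
    ("q1", EPSILON, "Z"): ("qf", "Z"),   # all A's matched: move to the final state
}
START_STATE, START_SYMBOL, FINAL_STATES = "q0", "Z", {"qf"}


def accepts(word):
    """Return True if the automaton ends in a final state with all input consumed."""
    state = START_STATE
    stack = [START_SYMBOL]               # the top of the stack is the end of the list
    pos = 0
    while stack:
        top = stack[-1]
        symbol = word[pos] if pos < len(word) else None
        if symbol is not None and (state, symbol, top) in DELTA:
            state, push = DELTA[(state, symbol, top)]
            pos += 1
        elif (state, EPSILON, top) in DELTA:
            state, push = DELTA[(state, EPSILON, top)]
        else:
            break                        # no transition applies: the machine halts
        stack.pop()
        stack.extend(reversed(push))     # leftmost symbol of `push` becomes the new top
    return pos == len(word) and state in FINAL_STATES


print(accepts("aabb"), accepts("aab"))   # True False
```

A finite automaton alone cannot accept this language, because it has no way to count an unbounded number of `a` symbols. The stack provides exactly that memory.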
3. Semantic analysis
This stage checks whether the program contains semantic errors and collects type information for later use in the code generation phase. Its main task is type checking. For example, a language may require that the operands of certain operators, such as integer division, be of integer type.
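A type check of this kind can be sketched as a walk over the syntax tree. The example below assumes a toy language with only `int` and `float` types in which the remainder operator `%` accepts integer operands only. The tree format (nested tuples) and the `symbols` dictionary mapping names to declared types are hypothetical.

```python
def type_of(node, symbols):
    """Return the type of an expression tree, or raise TypeError on a semantic error."""
    if isinstance(node, int):
        return "int"
    if isinstance(node, float):
        return "float"
    if isinstance(node, str):                       # an identifier
        if node not in symbols:
            raise TypeError(f"undeclared identifier {node!r}")
        return symbols[node]
    op, left, right = node
    left_type = type_of(left, symbols)
    right_type = type_of(right, symbols)
    if op == "%":                                   # assumed rule: '%' needs two ints
        if left_type != "int" or right_type != "int":
            raise TypeError("operands of '%' must both be int")
        return "int"
    # Assumed rule for other arithmetic: float if either operand is float.
    return "float" if "float" in (left_type, right_type) else "int"


symbols = {"i": "int", "x": "float"}
print(type_of(("%", "i", 3), symbols))      # int
print(type_of(("+", "i", "x"), symbols))    # float
```

The types computed here are the information that a later code generation phase would use, for example to choose between integer and floating-point instructions.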
4. Intermediate code generation
Although a compiler can translate a source program directly into a target program, many compilers first produce an intermediate code that is independent of the target machine. This makes the compiler easier to build, port, and optimize. Common forms of intermediate code are syntax trees, postfix notation, and three-address code.
Postfix notation writes each operator after its operands. It is also called reverse Polish notation.
Three-address code is commonly represented as quadruples of the form (operator, operand 1, operand 2, result).
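The sketch below shows how two of these forms can be derived from a syntax tree. It assumes the tree is given as nested tuples `(operator, left, right)` with strings as leaves. The temporary names `t1`, `t2`, and so on are an illustrative convention.

```python
def to_postfix(node):
    """Return the postfix (reverse Polish) form of a tree as a list of symbols."""
    if isinstance(node, str):
        return [node]
    op, left, right = node
    return to_postfix(left) + to_postfix(right) + [op]


def to_quadruples(node):
    """Return a list of (operator, operand1, operand2, result) quadruples."""
    quads = []

    def visit(n):
        if isinstance(n, str):
            return n
        op, left, right = n
        arg1 = visit(left)
        arg2 = visit(right)
        result = f"t{len(quads) + 1}"     # a fresh temporary for this operation
        quads.append((op, arg1, arg2, result))
        return result

    visit(node)
    return quads


tree = ("+", "a", ("*", "b", "c"))        # the expression a + b * c
print(" ".join(to_postfix(tree)))         # a b c * +
print(to_quadruples(tree))                # [('*', 'b', 'c', 't1'), ('+', 'a', 't1', 't2')]
```

Both functions are post-order traversals of the tree. The difference is that the quadruple form names every intermediate result, which makes later optimization and code generation easier.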
5. Code optimization
Intermediate code is often wasteful in both time and space. When efficient target code is required, the intermediate code must be optimized. Optimization is usually performed on the intermediate code, by analyzing and transforming the control flow and data flow of the program.
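As one simple example of an optimization on intermediate code, the sketch below performs constant folding on a list of quadruples. It assumes the form (operator, operand 1, operand 2, result), that constants are Python `int` values, and that each result name is assigned exactly once. Real optimizers apply many more transformations than this.

```python
import operator

# Operators this sketch is willing to evaluate at compile time.
# Division is left out to avoid folding a division by zero.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(quads):
    """Evaluate operations on constants at compile time and drop their quadruples."""
    known = {}        # result names whose value is a known constant
    optimized = []
    for op, arg1, arg2, result in quads:
        arg1 = known.get(arg1, arg1)      # substitute constants computed earlier
        arg2 = known.get(arg2, arg2)
        if op in FOLDABLE and isinstance(arg1, int) and isinstance(arg2, int):
            known[result] = FOLDABLE[op](arg1, arg2)
        else:
            optimized.append((op, arg1, arg2, result))
    return optimized


quads = [("*", 2, 3, "t1"), ("+", "t1", 4, "t2"), ("-", "y", "t2", "t3")]
print(fold_constants(quads))              # [('-', 'y', 10, 't3')]
```

Three operations at run time are reduced to one, which saves both time and the space needed for the two temporaries.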
6. Target code generation
This stage translates the intermediate code into machine instructions or assembly instructions.
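A deliberately naive sketch of this translation is shown below. The target is a hypothetical single-register machine: the mnemonics `LOAD`, `STORE`, `ADD`, `SUB`, `MUL`, `DIV` and the register `R0` are invented for illustration and do not correspond to any real instruction set. The input is assumed to be quadruples of the form (operator, operand 1, operand 2, result).

```python
# Hypothetical mnemonics for an imaginary one-register machine.
MNEMONICS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate(quads):
    """Translate each quadruple into three assembly-like instructions."""
    lines = []
    for op, arg1, arg2, result in quads:
        lines.append(f"LOAD R0, {arg1}")             # R0 <- operand 1
        lines.append(f"{MNEMONICS[op]} R0, {arg2}")  # R0 <- R0 op operand 2
        lines.append(f"STORE R0, {result}")          # result <- R0
    return lines


for line in generate([("*", "b", "c", "t1"), ("+", "a", "t1", "t2")]):
    print(line)
```

A real code generator would also handle register allocation and instruction selection, for instance by keeping `t1` in a register instead of storing it and loading it again.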
7. Symbol table management
The role of the symbol table is to record the necessary information about the symbols in the source program, in order to assist semantic checking and code generation. During compilation, the symbol table must support fast lookup, insertion, deletion, and modification. Entries can be created as early as the lexical analysis stage, or during the syntax analysis and semantic analysis stages, and the table remains in use until the target code is generated.
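The four operations mentioned above can be sketched with a hash table, which in Python is simply a `dict`. The class name, method names, and the attributes stored per symbol (`type`, `offset`) are hypothetical choices for this illustration.

```python
class SymbolTable:
    """A minimal symbol table backed by a dict, giving fast average-case access."""

    def __init__(self):
        self._entries = {}

    def insert(self, name, **attributes):
        if name in self._entries:
            raise KeyError(f"symbol {name!r} is already declared")
        self._entries[name] = dict(attributes)

    def lookup(self, name):
        """Return the attribute dict for `name`, or None if it is not declared."""
        return self._entries.get(name)

    def modify(self, name, **attributes):
        if name not in self._entries:
            raise KeyError(f"symbol {name!r} is not declared")
        self._entries[name].update(attributes)

    def delete(self, name):
        del self._entries[name]


table = SymbolTable()
table.insert("count", type="int")         # e.g. when a declaration is seen
table.modify("count", offset=8)           # e.g. when storage is assigned later
print(table.lookup("count"))              # {'type': 'int', 'offset': 8}
```

Languages with nested scopes usually need a stack of such tables, one per scope, which this sketch leaves out.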
8. Error handling
Error types:
Static errors: program errors found during the compilation stage. They include syntax errors, such as spelling errors, punctuation errors, and missing operators in expressions, as well as static semantic errors.
Dynamic errors: also called dynamic semantic errors. They occur while the program is running, for example an array subscript out of bounds.
When the compiler discovers an error, it applies an appropriate recovery strategy so that analysis can continue and as many errors as possible are found in a single compilation.
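One common recovery strategy is to skip ahead to a synchronizing token and resume analysis from there, often called panic mode. The sketch below assumes a hypothetical mini-language whose statements all have the form `name = number ;` and uses `;` as the synchronizing token. It reports every faulty statement instead of stopping at the first one.

```python
# Expected shape of one statement: (description, test) for each token in turn.
EXPECTED = [
    ("identifier", str.isidentifier),
    ("'='", lambda token: token == "="),
    ("number", str.isdigit),
    ("';'", lambda token: token == ";"),
]


def check_assignments(tokens):
    """Check statements of the form  name = number ;  and collect all errors."""
    errors = []
    pos = 0
    statement = 1
    while pos < len(tokens):
        for label, matches in EXPECTED:
            if pos >= len(tokens) or not matches(tokens[pos]):
                found = tokens[pos] if pos < len(tokens) else "end of input"
                errors.append(f"statement {statement}: expected {label}, found {found!r}")
                # Recovery: skip to just past the next ';' and carry on.
                while pos < len(tokens) and tokens[pos] != ";":
                    pos += 1
                pos += 1
                break
            pos += 1
        statement += 1
    return errors


tokens = ["x", "=", "1", ";", "y", "2", ";", "z", "=", ";"]
for message in check_assignments(tokens):
    print(message)
```

On this input the checker reports one error in statement 2 and one in statement 3. Without the recovery step it would have stopped after the first.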