A summary of the lexical-analysis portion of compiler principles.
1 Brief introduction
For a compiled language, a program must be translated by the compiler into binary code before it can run on the computer. A program cannot be translated directly into a binary executable; several intermediate stages are required, which usually include a lexical analyzer, a parser, and a semantic analyzer. This post mainly summarizes the lexical analyzer.
The main job of the lexical analyzer is to split the program into words (tokens) and classify each word into the appropriate category.
int x;
x = 10;
Take these two simple statements, a declaration and an assignment, as an example. After lexical analysis they are cut into the tokens below; whitespace and the like are filtered out:
INT ID(x) SEMICOLON
ID(x) EQ INT(10) SEMICOLON
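The hand-written approach above can be sketched as a character-by-character scanner. This is a minimal illustrative sketch, not the author's implementation; the function and token names are assumptions:

```python
KEYWORDS = {"int"}

def tokenize(source):
    """Split source into (category, lexeme) pairs, filtering whitespace."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                      # whitespace is filtered out
            i += 1
        elif ch.isalpha():                    # keyword or identifier
            j = i
            while j < len(source) and source[j].isalnum():
                j += 1
            word = source[i:j]
            tokens.append(("INT", word) if word in KEYWORDS else ("ID", word))
            i = j
        elif ch.isdigit():                    # integer literal
            j = i
            while j < len(source) and source[j].isdigit():
                j += 1
            tokens.append(("INT", source[i:j]))
            i = j
        elif ch == ";":
            tokens.append(("SEMICOLON", ch)); i += 1
        elif ch == "=":
            tokens.append(("EQ", ch)); i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens
```

Running `tokenize("int x; x = 10;")` yields the token stream shown above, with the `int` keyword and the literal `10` both reported in the INT category and distinguished by their lexemes.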
There are usually two ways to implement a lexical analyzer: writing the code by hand, or generating it automatically (for example, with a tool such as lex/flex).
Automatic implementation requires a language for precisely defining the words of the program; regular expressions are the method commonly used here.
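To make the regular-expression approach concrete, here is a small sketch in which each word class is defined by a regex and a generic loop turns the specification into a scanner. The token names and the `scan` helper are illustrative assumptions, not part of any particular generator:

```python
import re

# Each word class is *defined* as a regular expression.
# Order matters: the keyword pattern must come before the identifier pattern.
TOKEN_SPEC = [
    ("WS",        r"\s+"),           # whitespace, filtered out below
    ("INT",       r"\bint\b"),       # keyword
    ("NUM",       r"\d+"),           # integer literal
    ("ID",        r"[A-Za-z_]\w*"),  # identifier
    ("EQ",        r"="),
    ("SEMICOLON", r";"),
]

# Combine the spec into one master regex with named groups.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})"
                             for name, pattern in TOKEN_SPEC))

def scan(source):
    """Return (category, lexeme) pairs; whitespace is dropped."""
    return [(m.lastgroup, m.group())
            for m in MASTER.finditer(source)
            if m.lastgroup != "WS"]
```

Note that `finditer` silently skips characters no pattern matches; a real generator would instead report a lexical error there.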
2 Related algorithms
Here I briefly sketch some of the algorithms and concepts involved in a lexical analyzer. It should be pointed out that the three algorithms used in the automatic implementation actually form a progressive chain. That is:
RE -> (Thompson algorithm) -> NFA -> (subset construction algorithm) -> DFA -> (Hopcroft algorithm) -> minimal DFA -> code
The output of each algorithm is the input of the next. The Thompson algorithm solves the RE -> NFA problem, the subset construction algorithm solves the NFA -> DFA problem, and the Hopcroft algorithm solves the problem of minimizing the DFA.
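The middle step of the chain, the subset construction, can be sketched as follows. The NFA encoding is an assumption made for this example: a dict mapping `(state, symbol)` to a set of successor states, with the symbol `None` standing for an epsilon move:

```python
from collections import deque

def epsilon_closure(states, nfa):
    """All states reachable from `states` via epsilon (None) moves."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, None), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(nfa, start, accepts, alphabet):
    """Build a DFA whose states are epsilon-closed sets of NFA states."""
    start_set = epsilon_closure({start}, nfa)
    dfa, seen, worklist = {}, {start_set}, deque([start_set])
    while worklist:
        current = worklist.popleft()
        for a in alphabet:
            # Union of moves on `a` from every NFA state in the set.
            move = set()
            for s in current:
                move |= nfa.get((s, a), set())
            if not move:
                continue
            target = epsilon_closure(move, nfa)
            dfa[(current, a)] = target
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    # A DFA state accepts if it contains any accepting NFA state.
    dfa_accepts = {S for S in seen if S & accepts}
    return dfa, start_set, dfa_accepts
```

For example, a Thompson-style NFA for `ab*` with an epsilon edge from state 1 to state 2 collapses into a three-state DFA, two of whose states are accepting.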