Introduction
The first thing a compiler does is read the source code (preprocessing is not considered here), discard whitespace characters, and pair each lexeme with its (optional) attribute to form a token (lexical unit). The tokens are then strung together into a token sequence. At that point the lexical analyzer's job is done. It's that simple. Clearly, recognizing a lexeme is the core work of the lexical analyzer, and how to recognize one is the main topic of this section.
3.1 Workflow
The workflow of the lexical analyzer has been briefly described above. The following figure shows the process more intuitively:
Take a simple block of C code as an example of what the lexical analyzer does:
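As a rough illustration of this process on a small C fragment, here is a minimal sketch in Python. The token names, the keyword set, and the tokenize function are assumptions made for this example; they cover only a tiny subset of C and are not taken from any real compiler.

```python
import re

# Hypothetical token categories for a tiny subset of C. Order matters:
# earlier alternatives win when several could match at the same position.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"==|<=|>=|!=|[+\-*/=<>]"),
    ("PUNCT",  r"[(){};,]"),
    ("SKIP",   r"\s+"),
    ("ERROR",  r"."),
]
KEYWORDS = {"int", "if", "else", "while", "return"}
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token_name, lexeme) pairs, skipping whitespace."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue                      # whitespace produces no token
        if kind == "ERROR":
            raise ValueError(f"unexpected character {lexeme!r}")
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"              # reserved words look like identifiers
        tokens.append((kind, lexeme))
    return tokens

if __name__ == "__main__":
    for token in tokenize("if (x >= 10) { y = x + 1; }"):
        print(token)
```

The regular expressions here play the role of the patterns discussed in the next subsection; Python's re module does the matching that a hand-written analyzer would do with a DFA.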
3.2 Internal Mechanism
Getting a well-formed lexeme that conforms to the language description is not as simple as one might imagine. "I just want to grab a string!" Well, that is where we go wrong. This is an arduous task.
The following figure shows the internal mechanism of the lexical analyzer:
1. First, to identify a lexeme, read a string and match it against a pattern.
2. The pattern is described by a DFA (deterministic finite automaton).
3. A DFA can also be drawn as a state transition diagram.
4. The state transition diagram can be generated from a regular expression.
5. Regular expressions are derived from the grammar productions.
6. The grammar productions are defined by the programming language itself.
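To make the relationship between pattern, DFA, and state transition diagram more concrete, here is a minimal sketch of a table-driven DFA in Python. It recognizes identifiers (regular expression [A-Za-z_][A-Za-z0-9_]*) and unsigned integers ([0-9]+). The state names and the char_class and longest_match helpers are hypothetical names chosen for this example.

```python
# Transition table of a small DFA: (state, character class) -> next state.
# Each entry corresponds to one edge of the state transition diagram.
TRANSITIONS = {
    ("start", "letter"): "in_id",
    ("start", "digit"):  "in_num",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"):  "in_id",
    ("in_num", "digit"): "in_num",
}
# Accepting states and the token name each one yields.
ACCEPTING = {"in_id": "ID", "in_num": "NUMBER"}

def char_class(ch):
    """Map a character to the input class used by the table."""
    if ch.isascii() and (ch.isalpha() or ch == "_"):
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"

def longest_match(text, pos=0):
    """Run the DFA from pos; return (token_name, lexeme) or None if no match."""
    state, i = "start", pos
    last_accept = None
    while i < len(text):
        nxt = TRANSITIONS.get((state, char_class(text[i])))
        if nxt is None:
            break                         # no edge for this input: stop here
        state, i = nxt, i + 1
        if state in ACCEPTING:
            last_accept = (ACCEPTING[state], text[pos:i])
    return last_accept

if __name__ == "__main__":
    print(longest_match("count1 = 0"))
    print(longest_match("42abc"))
```

Because the table is deterministic, each input character selects at most one next state, so a lexeme is recognized in a single left-to-right pass without backtracking.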
Admittedly, compiler theory is complicated. To understand DFAs, we need discrete mathematics. To understand state transition diagrams, we need to know graphs, which means studying data structures. Regular expressions, programming languages, and more are involved as well. That is why compiler theory is somewhat hard to learn: whenever you meet a concept, you have to learn the background knowledge behind it. And this is only the lexical analyzer; still to come are the syntax analyzer, code optimization algorithms, and computer architecture. But don't be discouraged. Just calm down and take it slowly. Even after learning all this, we may not be able to write a compiler on our own, but I think it will certainly help with code optimization and IC design.
3.3 Conclusion
The core of the lexical analyzer is finding the lexemes that match a pattern. To obtain the pattern description, we start from the specific programming language and work down: first its grammar, then the grammar productions, then the regular expressions, then the state transition diagram, and finally the DFA.
After a lexeme has been found, we also need to create a symbol table and combine the lexeme with its attribute to form a token (lexical unit). The token sequence is then passed to the syntax analyzer, which builds an abstract syntax tree. That is what we will talk about in the next section.
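The step of combining a lexeme with its attribute can be sketched in Python as follows. This is only an illustration: the SymbolTable class and the make_token helper are hypothetical, and a real compiler would store much more per entry (type, scope, and so on).

```python
class SymbolTable:
    """Maps each distinct identifier to a stable index (its attribute value)."""

    def __init__(self):
        self._index = {}    # identifier -> position in self.names
        self.names = []     # identifiers in order of first appearance

    def intern(self, name):
        """Return the entry index for name, creating the entry if needed."""
        if name not in self._index:
            self._index[name] = len(self.names)
            self.names.append(name)
        return self._index[name]

def make_token(kind, lexeme, table):
    """Build a token as a (token_name, attribute) pair."""
    if kind == "ID":
        return ("ID", table.intern(lexeme))   # attribute: symbol table index
    if kind == "NUMBER":
        return ("NUMBER", int(lexeme))        # attribute: numeric value
    return (lexeme, None)                     # operators need no attribute

if __name__ == "__main__":
    table = SymbolTable()
    lexemes = [("ID", "x"), ("OP", "="), ("ID", "x"), ("OP", "+"), ("NUMBER", "1")]
    print([make_token(kind, lexeme, table) for kind, lexeme in lexemes])
```

Both occurrences of x map to the same symbol table entry, so the syntax analyzer sees the same attribute for them even though it no longer needs the original spelling.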
Author: rill_zhen