The second approach to lexical analysis is to use an automatic lexical-analyzer generator, such as Lex or JLex. But how do we tell the generator which tokens we want it to recognize?
This is where a unified declarative specification comes in: we describe the tokens to the generator in a declarative form that it understands. Once the specification is written, our part of the lexical-analysis work is essentially done; the generator reads it and produces the corresponding automaton for us ...
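As a minimal sketch of what such a declarative specification might look like, here is a tiny flex-style specification. The token set and the action bodies are assumptions chosen for illustration, not anything prescribed by the book:

```lex
%{
/* Each rule pairs a regular expression with a C action.
   flex turns this file into a C scanner for us. */
#include <stdio.h>
%}
%option noyywrap
%%
[a-zA-Z_][a-zA-Z0-9_]*   { printf("ID(%s)\n", yytext); }
[0-9]+                   { printf("NUM(%s)\n", yytext); }
[ \t\r\n]+               ;  /* skip whitespace */
.                        { printf("UNKNOWN(%s)\n", yytext); }
%%
int main(void) { return yylex(); }
```

Running flex on this file produces a C source file containing a table-driven automaton; we never write *how* to scan, only *what* the tokens look like.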
So how do we write such a unified declarative specification? The first mathematical tool to understand is the regular expression. Here are the basic concepts of regular expressions ...
Then we can try to use a regular expression to describe identifiers (IDs) in the C language: [a-zA-Z_][a-zA-Z0-9_]* ...
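To see this pattern in action, here is a small sketch that checks strings against the identifier pattern using the POSIX regex API; the sample strings are made up for illustration:

```c
/* A minimal sketch: testing the C-identifier pattern with POSIX <regex.h>. */
#include <regex.h>
#include <stdio.h>

static int is_identifier(const char *s) {
    regex_t re;
    /* ^...$ anchors the pattern so the *whole* string must match */
    if (regcomp(&re, "^[a-zA-Z_][a-zA-Z0-9_]*$", REG_EXTENDED) != 0)
        return 0;
    int ok = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}

int main(void) {
    const char *samples[] = { "count", "_tmp1", "9lives", "x-y" };
    for (int i = 0; i < 4; i++)
        printf("%-7s -> %s\n", samples[i],
               is_identifier(samples[i]) ? "ID" : "not an ID");
    return 0;
}
```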
With such a specification in hand, we can look at exactly what the generated automaton (which is, in fact, a piece of code) looks like ... Theoretically speaking, an automaton is a mathematical concept ...
M = (Σ, S, q0, F, δ)

- Σ: the alphabet (the set of input characters the automaton can read)
- S: the set of states (all the states that exist)
- q0: the initial state
- F: the set of accepting (final) states
- δ: the transition function (for each state, which state it moves to on each input character)
This is an example of a simple automaton ...
For a string to be accepted, the automaton must be in an accepting state when the end of the string is reached (accepting states are drawn as two concentric circles; in the figure, only state 2 is accepting).
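The figure itself isn't reproduced here, but as a sketch of how such a five-tuple becomes code, the following C program simulates a small DFA. The particular machine is an assumption for illustration: it recognizes the C identifiers from earlier, with q0 = START, F = {IN_ID}, and δ written as a function rather than a full table:

```c
/* A minimal sketch of a table-driven DFA in C. The automaton here is
   illustrative: it accepts C-style identifiers.
   States: START (q0), IN_ID (accepting), DEAD (reject). */
#include <stdio.h>
#include <ctype.h>

enum { START = 0, IN_ID = 1, DEAD = 2 };

/* delta: the transition function, using character classes instead of
   a full 3 x 256 table to keep the sketch short */
static int delta(int state, unsigned char c) {
    int letter = isalpha(c) || c == '_';
    int digit  = isdigit(c);
    switch (state) {
    case START: return letter ? IN_ID : DEAD;
    case IN_ID: return (letter || digit) ? IN_ID : DEAD;
    default:    return DEAD;
    }
}

/* A string is accepted iff the run ends in an accepting state (F = {IN_ID}) */
static int accepts(const char *s) {
    int state = START;                 /* start in q0 */
    for (; *s; s++)
        state = delta(state, (unsigned char)*s);
    return state == IN_ID;             /* is the final state in F? */
}

int main(void) {
    const char *samples[] = { "foo", "_bar42", "2cool", "" };
    for (int i = 0; i < 4; i++)
        printf("%-7s -> %s\n", samples[i],
               accepts(samples[i]) ? "accept" : "reject");
    return 0;
}
```

Note that acceptance is only checked after the whole input has been consumed, which is exactly the rule stated above.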
The automaton in the figure is in fact a DFA (deterministic finite automaton); there is also another kind of state machine called an NFA (nondeterministic finite automaton).
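The key difference is that an NFA may be in several states at once, so a direct simulation tracks a *set* of current states. Here is a minimal sketch; the example machine, which accepts strings over {a, b} ending in "ab", is an assumption for illustration:

```c
/* A minimal sketch of NFA simulation in C: the current state *set* is
   kept as a bitmask. Illustrative machine: state 0 loops on a and b and
   nondeterministically guesses that an 'a' begins the final "ab";
   state 1 --b--> state 2; F = {2}. */
#include <stdio.h>

static unsigned step(unsigned states, char c) {
    unsigned next = 0;
    if (states & 1u) {                /* state 0 loops on any input ... */
        next |= 1u;
        if (c == 'a') next |= 2u;     /* ... and may also move to state 1 */
    }
    if ((states & 2u) && c == 'b')    /* state 1 --b--> state 2 */
        next |= 4u;
    return next;
}

static int accepts(const char *s) {
    unsigned states = 1u;             /* start in the set {0} */
    for (; *s; s++)
        states = step(states, *s);
    return (states & 4u) != 0;        /* accept iff state 2 is in the set */
}

int main(void) {
    const char *samples[] = { "ab", "aab", "abb", "ba" };
    for (int i = 0; i < 4; i++)
        printf("%-4s -> %s\n", samples[i],
               accepts(samples[i]) ? "accept" : "reject");
    return 0;
}
```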