Excerpt from http://www.cnblogs.com/hush/archive/2004/09/06/40361.html
Yesterday Sumtec and I were talking about automata and parsing, and I suddenly got a bit muddled and mixed up several concepts. I spent half a day with the Tsinghua compilers textbook and still didn't get it... This morning I got up and read the automata chapter of Discrete Mathematics and Its Applications. It's a book by a foreign author, and it explains things clearly.
What I mixed up yesterday was mainly NFAs and the LL(1) and LR(1) parsing methods. In fact, LL(1) and LR(1) parsing use the pushdown automaton as their computational model, not the finite automaton. A pushdown automaton is strictly more powerful than a finite automaton.
The second point is that NFAs and DFAs really are equivalent in computational power: for any NFA an equivalent DFA can be found (the subset construction can be used to build such a DFA).
To explain how finite automata relate to LL(1) and LR(1), I'll first outline the classification of grammars.
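To make the subset construction concrete, here is a minimal sketch in Python. It assumes an NFA without epsilon moves, stored as a dictionary from (state, symbol) to a set of next states. The names `nfa_to_dfa`, `subset_dfa_accepts` and the small example NFA (strings ending in "01") are made up for this illustration.

```python
from collections import deque

def nfa_to_dfa(nfa, start, accepting, alphabet):
    """Subset construction for an NFA without epsilon moves.

    nfa maps (state, symbol) -> set of next states.
    Each DFA state is a frozenset of NFA states.
    """
    dfa_start = frozenset([start])
    dfa = {}
    dfa_accepting = set()
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        # A subset is accepting if it contains any accepting NFA state.
        if current & accepting:
            dfa_accepting.add(current)
        for symbol in alphabet:
            target = frozenset(
                nxt for state in current for nxt in nfa.get((state, symbol), ())
            )
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa, dfa_start, dfa_accepting

def subset_dfa_accepts(dfa, dfa_start, dfa_accepting, word):
    """Run the constructed DFA on word."""
    state = dfa_start
    for symbol in word:
        state = dfa[(state, symbol)]
    return state in dfa_accepting

# Hypothetical NFA: accepts strings over {0, 1} that end in "01".
EXAMPLE_NFA = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
EXAMPLE_DFA = nfa_to_dfa(EXAMPLE_NFA, "q0", {"q2"}, "01")
```

Calling `subset_dfa_accepts(*EXAMPLE_DFA, "1101")` follows exactly one path through the DFA, whereas the NFA would have to guess where the final "01" begins.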
Grammars are divided into four classes:
(1) Phrase-structure grammars (Type 0 grammars)
(2) Context-sensitive grammars (Type 1 grammars)
(3) Context-free grammars (Type 2 grammars)
(4) Regular grammars (Type 3 grammars)
These four classes are nested: Type 1 grammars are a subset of Type 0 grammars, Type 2 grammars are a subset of Type 1 grammars, and Type 3 grammars are a subset of Type 2 grammars.
We mainly study Type 2 and Type 3 grammars.
Type 3 grammars (regular grammars) are equivalent to regular expressions: any regular grammar can be converted into an equivalent regular expression. Regular expressions are in turn equivalent to finite automata: any language that a finite automaton can recognize can be described by a regular expression, and any language described by a regular expression can be recognized by a finite automaton.
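As a small illustration of this equivalence, the sketch below describes one language twice: once as a regular expression and once as a hand-written DFA, and then checks that the two agree on all short strings. The language ("binary strings with an even number of 0s") and all names are my own choice for the example. Agreement on short strings is only a sanity check, not a proof.

```python
import re
from itertools import product

# Regular expression: each repetition consumes either a 1 or a pair of 0s
# with any number of 1s between them.
EVEN_ZEROS_REGEX = re.compile(r"(1|01*0)*")

# DFA for the same language: the state records the parity of 0s seen so far.
EVEN_ZEROS_DFA = {
    ("even", "0"): "odd",
    ("even", "1"): "even",
    ("odd", "0"): "even",
    ("odd", "1"): "odd",
}

def even_zeros_dfa_accepts(word):
    state = "even"
    for symbol in word:
        state = EVEN_ZEROS_DFA[(state, symbol)]
    return state == "even"

def even_zeros_regex_accepts(word):
    return EVEN_ZEROS_REGEX.fullmatch(word) is not None

def agree_up_to(max_len):
    """Compare both recognizers on every binary string up to max_len."""
    return all(
        even_zeros_dfa_accepts("".join(chars)) == even_zeros_regex_accepts("".join(chars))
        for n in range(max_len + 1)
        for chars in product("01", repeat=n)
    )
```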
But regular grammars are not enough to describe programming languages (for example, you cannot define mathematical expressions with nested parentheses using a regular expression). Today's popular programming languages, such as C# and Java, have their syntax defined by Type 2 grammars, that is, context-free grammars. A finite automaton therefore cannot recognize a programming language (I will give an example at the end). This is why the pushdown automaton model was proposed. A pushdown automaton has all the parts of a finite automaton, such as states and a state transition table, plus one thing a finite automaton lacks: a stack, often called the computation stack. As it runs, the pushdown automaton can push terminals or non-terminals onto the stack, or pop them off, as the situation requires.
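To see what the extra stack buys, here is a minimal sketch of a stack-based check for balanced parentheses. It is not a full pushdown automaton with a transition table, just the push and pop discipline; the function name `balanced` is made up for this example.

```python
def balanced(text):
    """Return True if the parentheses in text are properly nested."""
    stack = []
    for ch in text:
        if ch == "(":
            stack.append(ch)   # push on every opening parenthesis
        elif ch == ")":
            if not stack:
                return False   # a ')' with nothing left to match
            stack.pop()        # pop the '(' this ')' closes
        # any other character is ignored in this sketch
    return not stack           # leftover '(' means something was never closed
```

The stack can grow without bound as the nesting deepens, which is exactly the memory a machine with a fixed, finite set of states does not have.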
LL(1), LR(1) and similar parsing methods are designed for context-free grammars and are based on the pushdown automaton model. That is why, when they introduce parsing, all the books say that a parser based on LR(1) analysis consists of three parts: a state transition table (the parsing table), a controller (the driver program), and a stack. The so-called shift and reduce actions are just pushes onto and pops off that stack.
Finally, an example (it looks like everyone has fallen asleep -_-b)
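The shift and reduce actions can be sketched with a plain list as the stack. The code below handles only the tiny grammar S -> 0S1 | 01 (the example used at the end of this post), and it is a hand-written loop, not a table-driven LR(1) parser: it reduces greedily whenever a right-hand side sits on top of the stack, which happens to be safe for this particular grammar. The function name and the action strings are invented for the illustration.

```python
def shift_reduce_accepts(text):
    """Recognize the language of S -> 0S1 | 01.

    Returns (accepted, actions), where actions lists the shifts and reductions.
    """
    stack = []
    actions = []
    for symbol in text:
        if symbol not in ("0", "1"):
            return False, actions
        stack.append(symbol)                 # shift: push the input symbol
        actions.append(f"shift {symbol}")
        while True:                          # reduce: pop a right-hand side, push S
            if stack[-3:] == ["0", "S", "1"]:
                del stack[-3:]
                stack.append("S")
                actions.append("reduce S -> 0S1")
            elif stack[-2:] == ["0", "1"]:
                del stack[-2:]
                stack.append("S")
                actions.append("reduce S -> 01")
            else:
                break
    # Accept only if the whole input was reduced to the start symbol.
    return stack == ["S"], actions
```

For the input "0011" the recorded actions are: shift 0, shift 0, shift 1, reduce S -> 01, shift 1, reduce S -> 0S1.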
First, consider this grammar:
S -> 0S1 | 01
where 0 and 1 are terminals.
The language this grammar describes is n 0s followed by the same number n of 1s, where n is a positive integer.
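A short sketch of how the grammar produces these strings, writing the non-terminal as S: apply S -> 0S1 a total of n - 1 times, then finish with S -> 01. The helper name `derive` is hypothetical.

```python
def derive(n):
    """List the sentential forms in the derivation of 0^n 1^n, for n >= 1."""
    if n < 1:
        raise ValueError("the grammar generates no string for n < 1")
    forms = ["S"]
    for _ in range(n - 1):
        forms.append(forms[-1].replace("S", "0S1"))   # apply S -> 0S1
    forms.append(forms[-1].replace("S", "01"))        # finish with S -> 01
    return forms
```

For example, `derive(3)` gives the steps S, 0S1, 00S11, 000111.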
This grammar is context-free but not regular, and the language it generates is not regular either, so we cannot write a regular expression that describes it. Suppose we had an NFA that accepts every sentence of this grammar: we could always find some sentence that the grammar does not generate but that the NFA also accepts. In other words, no NFA can decide whether a sentence is generated by the grammar above. To put it in practical terms, suppose you want to write a syntax-coloring program whose input file is a sequence of 0s and 1s, and which must color every run of n 0s followed by n 1s in red and everything else in black. Regular expression matching cannot do this job.
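The "we can always find such a sentence" argument is a pigeonhole argument, and it can be turned into a small program. The sketch below works on a DFA rather than an NFA, which loses nothing because the two are equivalent. It assumes a total transition table over the alphabet {0, 1}. It feeds the DFA 0, 00, 000, ... until two different counts of 0s land in the same state; from then on the machine cannot tell those two counts apart. The function names and the example DFA are made up for this illustration.

```python
def in_language(word):
    """Membership in { 0^n 1^n : n >= 1 }, checked directly."""
    n = len(word) // 2
    return n >= 1 and word == "0" * n + "1" * n

def find_counterexample(transitions, start, accepting):
    """Return a string on which the DFA and the language 0^n 1^n disagree."""
    def run(word):
        state = start
        for symbol in word:
            state = transitions[(state, symbol)]
        return state

    states = {s for (s, _) in transitions} | set(transitions.values()) | {start}
    seen = {}  # state reached after reading 0^i  ->  i
    for i in range(1, len(states) + 2):
        state = run("0" * i)
        if state in seen:
            j = seen[state]              # j < i, yet 0^j and 0^i reach the same state
            good = "0" * j + "1" * j     # in the language
            bad = "0" * i + "1" * j      # not in the language, since i != j
            # Both strings end in the same state, so they get the same verdict.
            return bad if run(good) in accepting else good
        seen[state] = i
    raise AssertionError("unreachable: more strings were tried than there are states")

# Hypothetical DFA for "one or more 0s followed by one or more 1s".
ZEROS_THEN_ONES = {
    ("a", "0"): "b", ("a", "1"): "d",
    ("b", "0"): "b", ("b", "1"): "c",
    ("c", "0"): "d", ("c", "1"): "c",
    ("d", "0"): "d", ("d", "1"): "d",
}
```

For this example DFA (start state "a", accepting state "c"), `find_counterexample` returns "001": the DFA accepts it, but it is not of the form n 0s followed by n 1s.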
If the example above is too abstract, consider a language such as "mathematical expressions with parentheses": for the same reason, no regular expression can match it. The grammar that describes "mathematical expressions with parentheses" is not a regular grammar, because it contains a production like:
F -> (E)
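Here is a minimal recursive-descent sketch for such expressions. Only the production F -> (E) comes from the text above; the rest of the grammar (E -> T (('+' | '-') T)*, T -> F ('*' F)*, F -> (E) | number) is assumed for the example, as are all the function names. The recursion from `factor` back into `expr` plays the role of the pushdown automaton's stack, with Python's call stack doing the work.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*()])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():                      # E -> T (('+' | '-') T)*
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():                      # T -> F ('*' F)*
        value = factor()
        while peek() == "*":
            advance()
            value *= factor()
        return value

    def factor():                    # F -> '(' E ')' | number
        token = peek()
        if token == "(":
            advance()
            value = expr()           # nested expression: recurse
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if token is not None and token.isdigit():
            return int(advance())
        raise SyntaxError(f"unexpected token {token!r}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result
```

With this sketch, `evaluate("2*(3+4)")` returns 14, and an unclosed input such as "(1+2" raises SyntaxError.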
A regular grammar requires every production to have the form A -> aB or A -> a (where A and B are non-terminals and a is a terminal).
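This restriction is easy to check mechanically. The sketch below assumes productions are given as (left-hand side, list of right-hand-side symbols) pairs together with the set of non-terminals; this representation and the name `is_right_linear` are my own. Note that failing the check only shows that the grammar as written is not in regular form; showing that no regular grammar exists for the language takes a separate argument.

```python
def is_right_linear(productions, nonterminals):
    """Check that every production has the form A -> aB or A -> a."""
    for lhs, rhs in productions:
        if lhs not in nonterminals:
            return False
        if len(rhs) == 1:
            ok = rhs[0] not in nonterminals                            # A -> a
        elif len(rhs) == 2:
            ok = rhs[0] not in nonterminals and rhs[1] in nonterminals  # A -> aB
        else:
            ok = False
        if not ok:
            return False
    return True

# The two grammars discussed in this post.
ZERO_ONE_GRAMMAR = [("S", ["0", "S", "1"]), ("S", ["0", "1"])]
PAREN_PRODUCTION = [("F", ["(", "E", ")"])]

# A hypothetical regular grammar for "one or more 0s followed by one or more 1s".
REGULAR_EXAMPLE = [("A", ["0", "A"]), ("A", ["0", "B"]), ("B", ["1", "B"]), ("B", ["1"])]
```

Here `is_right_linear(REGULAR_EXAMPLE, {"A", "B"})` is True, while both `ZERO_ONE_GRAMMAR` and `PAREN_PRODUCTION` fail the check.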