I. Introduction
A compiler translates programs written in a high-level language into programs that can be executed on a computer, and is an important piece of system software. Within the compilation process, syntax analysis (parsing) is the second phase, following lexical analysis. Its task is to determine, on the basis of the word symbols (tokens) identified by lexical analysis, whether the grammatical structure of the program conforms to the rules of the grammar. According to how the syntax analysis tree is built, parsing methods fall into two categories: top-down analysis and bottom-up analysis. The LR analysis method discussed in this article belongs to the bottom-up category.
II. Representation of grammar rules
Grammar rules are defined by context-free grammars. For example, a language can be defined by a series of productions:
<program> -> begin <statement list> end
<statement list> -> <statement> <; <statement list>>
and so on. To simplify the discussion, symbols that carry a concrete meaning are abstracted into single letters, and the examples in this article use this abstract symbolic form. The LR(0) method also requires that the grammar be unambiguous. The grammar used in this article is:
E -> aA | bB
A -> cA | d
B -> cB | d
III. The basic idea of the LR analysis method
The LR(0) method discussed in this article scans the input symbol string from left to right. Based on the symbols already in the analysis stack (usually represented by states) and on the next k (k ≥ 0) symbols of the remaining input, it consults the LR analysis table to determine uniquely whether the parser should shift or reduce and, for a reduction, which production supplies the handle. This continues until the input has been reduced to the start symbol. The process is the reverse of a canonical (rightmost) derivation.
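The shift/reduce loop described above can be sketched in Python as follows. This is a minimal sketch, not the article's own implementation. The names lr0_parse, SHIFT, REDUCE and GOTO are hypothetical, the production numbering 0 to 6 is an assumption, and the table entries were worked out by hand for this article's example grammar (augmented with S' -> E), using the state numbers of the item sets I0 to I11 given later in this article. Every grammar symbol is assumed to be a single character, and # marks the end of the input.

```python
# Productions of the augmented example grammar: (left side, right side).
# The numbering 0..6 is an assumption made for this sketch.
PRODUCTIONS = [
    ("S'", "E"),   # 0
    ("E", "aA"),   # 1
    ("E", "bB"),   # 2
    ("A", "cA"),   # 3
    ("A", "d"),    # 4
    ("B", "cB"),   # 5
    ("B", "d"),    # 6
]

# Hand-derived table entries; state k corresponds to item set Ik.
SHIFT = {(0, "a"): 2, (0, "b"): 3,
         (2, "c"): 4, (2, "d"): 10,
         (3, "c"): 5, (3, "d"): 11,
         (4, "c"): 4, (4, "d"): 10,
         (5, "c"): 5, (5, "d"): 11}
# In LR(0) a reduce state reduces whatever the next input symbol is.
REDUCE = {6: 1, 7: 2, 8: 3, 9: 5, 10: 4, 11: 6}
GOTO = {(0, "E"): 1, (2, "A"): 6, (3, "B"): 7, (4, "A"): 8, (5, "B"): 9}
ACCEPT_STATE = 1


def lr0_parse(text):
    """Return the production numbers used for reductions, or None on error."""
    tokens = list(text) + ["#"]
    stack = [0]            # stack of states; state 0 is the initial state
    position = 0
    used = []
    while True:
        state = stack[-1]
        symbol = tokens[position]
        if state == ACCEPT_STATE and symbol == "#":
            return used                          # accept
        if state in REDUCE:
            number = REDUCE[state]
            left, right = PRODUCTIONS[number]
            del stack[len(stack) - len(right):]  # pop one state per symbol
            stack.append(GOTO[(stack[-1], left)])
            used.append(number)
        elif (state, symbol) in SHIFT:
            stack.append(SHIFT[(state, symbol)])  # shift
            position += 1
        else:
            return None                          # blank entry: error
```

Under these assumptions, lr0_parse("accd") returns [4, 3, 3, 1], which is the rightmost derivation E => aA => acA => accA => accd read backwards, and lr0_parse("ab") returns None.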
IV. Construction of the LR(0) analysis table
The key part of the whole analysis process is the construction of the analysis table. This article mainly discusses the construction of the LR(0) analysis table, that is, the case in which no input symbol to the right needs to be inspected during the analysis.
(i) Augmenting the grammar. The final reduction must reduce to the start symbol, and for that reduction to be unique the start symbol must have exactly one candidate. When the start symbol has two or more candidates (as in this example), the grammar must be augmented: add a production S' -> E and take S' as the new start symbol.
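As an illustration, the augmentation step might be written like this in Python. The representation of a production as a (left side, right side) pair and the function name augment are assumptions made for this sketch, not something taken from the article.

```python
# The example grammar: each production is (left side, right side), and every
# symbol on a right side is assumed to be a single character.
GRAMMAR = [("E", "aA"), ("E", "bB"),
           ("A", "cA"), ("A", "d"),
           ("B", "cB"), ("B", "d")]


def augment(productions, start, new_start="S'"):
    """Return (augmented productions, new start symbol)."""
    if any(left == new_start for left, _ in productions):
        raise ValueError(f"{new_start} is already used as a nonterminal")
    # The new start symbol has exactly one candidate: new_start -> start.
    return [(new_start, start)] + list(productions), new_start


AUGMENTED, START = augment(GRAMMAR, "E")
for number, (left, right) in enumerate(AUGMENTED):
    print(number, left, "->", right)
```

Putting the new production first gives it number 0, which is convenient when the accept action is later tied to that production.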
(ii) Calculation of the LR item set family. Since LR analysis simulates canonical reduction, the symbol string after each reduction is a canonical sentential form, and the part already scanned is a prefix of that canonical sentential form. A prefix that contains no symbol after the handle is called a viable prefix. The basic idea is to construct a finite automaton that recognizes all viable prefixes and then convert it into the analysis table. The states of this automaton are built from items, where an item is a production with a dot added at some position in its right side. The collection of all item sets of the DFA that recognizes the viable prefixes of the grammar is called the canonical collection of LR item sets. It is computed as follows:
1. Starting from the item of the start symbol's production, compute its closure. For any item set I, CLOSURE(I) is defined as follows:
① every item of I belongs to CLOSURE(I); ② if A -> α·Bβ is in CLOSURE(I), then for every production B -> γ of B the item B -> ·γ also belongs to CLOSURE(I); ③ repeat the steps above until CLOSURE(I) no longer grows.
2. Calculate the GO function for the item sets:
GO(I, X) = CLOSURE(J), where J = {every item of the form A -> αX·β | A -> α·Xβ belongs to I}.
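The two definitions above translate quite directly into code. The following Python sketch assumes that an item is stored as a pair (production number, dot position) and that every grammar symbol is one character; the names closure and go, and the production numbering, are choices made for this illustration only.

```python
# Augmented example grammar; production numbers are the list indexes.
PRODUCTIONS = [("S'", "E"), ("E", "aA"), ("E", "bB"),
               ("A", "cA"), ("A", "d"), ("B", "cB"), ("B", "d")]
NONTERMINALS = {left for left, _ in PRODUCTIONS}


def closure(items):
    """CLOSURE(I): add B -> .gamma for every nonterminal B right after a dot."""
    result = set(items)
    pending = list(items)
    while pending:                      # step 3: repeat until nothing is added
        production, dot = pending.pop()
        right = PRODUCTIONS[production][1]
        if dot < len(right) and right[dot] in NONTERMINALS:
            for index, (left, _) in enumerate(PRODUCTIONS):
                if left == right[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    pending.append((index, 0))
    return frozenset(result)


def go(items, symbol):
    """GO(I, X): move the dot over X in every item that allows it, then close."""
    moved = {(production, dot + 1)
             for production, dot in items
             if dot < len(PRODUCTIONS[production][1])
             and PRODUCTIONS[production][1][dot] == symbol}
    return closure(moved)


I0 = closure({(0, 0)})      # closure of S' -> .E
I2 = go(I0, "a")            # E -> a.A, A -> .cA, A -> .d
```

Returning a frozenset makes item sets hashable and comparable, so two GO results that contain the same items are recognized as the same state.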
Applying the rules above to this example gives the item set family I0 to I11:
I0 = {S' -> ·E, E -> ·aA, E -> ·bB}
I1 = GO(I0, E) = {S' -> E·}
I2 = GO(I0, a) = {E -> a·A, A -> ·cA, A -> ·d}
I3 = GO(I0, b) = {E -> b·B, B -> ·cB, B -> ·d}
I4 = GO(I2, c) = GO(I4, c) = {A -> c·A, A -> ·cA, A -> ·d}
I5 = GO(I3, c) = GO(I5, c) = {B -> c·B, B -> ·cB, B -> ·d}
I6 = GO(I2, A) = {E -> aA·}
I7 = GO(I3, B) = {E -> bB·}
I8 = GO(I4, A) = {A -> cA·}
I9 = GO(I5, B) = {B -> cB·}
I10 = GO(I4, d) = GO(I2, d) = {A -> d·}
I11 = GO(I3, d) = GO(I5, d) = {B -> d·}
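The item set family can also be generated mechanically by applying GO repeatedly, starting from the closure of S' -> ·E, until no new set appears. The Python sketch below does this for the example grammar. It is only an illustration under assumptions: items are (production number, dot position) pairs, symbols are single characters, all function names are hypothetical, and the sets are numbered in the order they are discovered, which need not match the numbering I0 to I11 used in this article.

```python
PRODUCTIONS = [("S'", "E"), ("E", "aA"), ("E", "bB"),
               ("A", "cA"), ("A", "d"), ("B", "cB"), ("B", "d")]
NONTERMINALS = {left for left, _ in PRODUCTIONS}
SYMBOLS = sorted({symbol for _, right in PRODUCTIONS for symbol in right})


def closure(items):
    result, pending = set(items), list(items)
    while pending:
        production, dot = pending.pop()
        right = PRODUCTIONS[production][1]
        if dot < len(right) and right[dot] in NONTERMINALS:
            for index, (left, _) in enumerate(PRODUCTIONS):
                if left == right[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    pending.append((index, 0))
    return frozenset(result)


def go(items, symbol):
    return closure({(p, d + 1) for p, d in items
                    if d < len(PRODUCTIONS[p][1])
                    and PRODUCTIONS[p][1][d] == symbol})


def canonical_collection():
    """Return (list of item sets, transitions {(state, symbol): state})."""
    states = [closure({(0, 0)})]
    transitions = {}
    for index, state in enumerate(states):   # the list grows while we iterate
        for symbol in SYMBOLS:
            target = go(state, symbol)
            if not target:
                continue                     # GO is undefined for this symbol
            if target not in states:
                states.append(target)
            transitions[(index, symbol)] = states.index(target)
    return states, transitions


def show(item):
    """Render an item, using '.' for the dot."""
    production, dot = item
    left, right = PRODUCTIONS[production]
    return f"{left} -> {right[:dot]}.{right[dot:]}"


STATES, TRANSITIONS = canonical_collection()
for index, state in enumerate(STATES):
    print(index, sorted(show(item) for item in state))
```

For this grammar the loop finds twelve item sets, the same sets as those listed above up to renumbering.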
(iii) Constructing the LR analysis table. For an LR(0) grammar, the LR analysis table can be constructed directly from the canonical collection C of item sets and the state transition function GO of the automaton that recognizes viable prefixes. The analysis table consists of an action table ACTION and a state transition table GOTO. The construction method is as follows:
1. If the item A -> α·aβ belongs to Ik and GO(Ik, a) = Ij, where a is a terminal, set ACTION[k, a] to "shift: push (j, a) onto the stack", written sj.
2. If the item A -> α· belongs to Ik, then for every terminal a (and the end marker #) set ACTION[k, a] to "reduce by production A -> α", written rj, where j is the number of that production.
3. If the item S' -> E· belongs to Ik, set ACTION[k, #] to "accept", written acc.
4. If GO(Ik, A) = Ij, where A is a nonterminal, set GOTO[k, A] = j.
5. Every entry of the analysis table that is not filled in by the rules above is left blank and means "error".
In summary, applying the rules above gives the LR(0) analysis table of this example:
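The five rules can be applied mechanically once the transitions and the completed items are known. The Python sketch below takes both as plain dictionaries transcribed from the item sets I0 to I11 of this example. The function name build_tables, the dictionary layout and the production numbering (0: S' -> E, 1: E -> aA, 2: E -> bB, 3: A -> cA, 4: A -> d, 5: B -> cB, 6: B -> d) are assumptions of this sketch.

```python
TERMINALS = ["a", "b", "c", "d"]
END = "#"

# GO(Ik, X) = Ij, written as (k, X): j.
TRANSITIONS = {(0, "E"): 1, (0, "a"): 2, (0, "b"): 3,
               (2, "c"): 4, (2, "A"): 6, (2, "d"): 10,
               (3, "c"): 5, (3, "B"): 7, (3, "d"): 11,
               (4, "c"): 4, (4, "A"): 8, (4, "d"): 10,
               (5, "c"): 5, (5, "B"): 9, (5, "d"): 11}
# State k -> number of the production whose item in Ik has the dot at the end.
COMPLETE = {1: 0, 6: 1, 7: 2, 8: 3, 9: 5, 10: 4, 11: 6}


def build_tables(transitions, complete):
    """Fill ACTION and GOTO; raise ValueError if the grammar is not LR(0)."""
    action, goto = {}, {}

    def put(table, key, value):
        if key in table and table[key] != value:
            raise ValueError(f"conflict at {key}: {table[key]} / {value}")
        table[key] = value

    for (state, symbol), target in transitions.items():
        if symbol in TERMINALS:
            put(action, (state, symbol), f"s{target}")     # rule 1: shift
        else:
            put(goto, (state, symbol), target)             # rule 4: goto
    for state, production in complete.items():
        if production == 0:
            put(action, (state, END), "acc")               # rule 3: accept
        else:
            for symbol in TERMINALS + [END]:
                put(action, (state, symbol), f"r{production}")  # rule 2
    return action, goto          # rule 5: a missing key means "error"


ACTION, GOTO = build_tables(TRANSITIONS, COMPLETE)
```

The put helper is a design choice of the sketch: a cell that would receive two different entries signals a shift/reduce or reduce/reduce conflict, meaning the grammar is not LR(0). The example grammar produces no such conflict.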
State | Action table: a | b | c | d | # | Goto table: E | A | B |