1. The role of syntax analysis: it is the core part of a compiler
2. The task of parsing
--Determine whether the token sequence produced by lexical analysis is a sentence of the given grammar
3. Theoretical basis of syntax analysis
--context-free grammars and pushdown automata
4. Ways of parsing
1) Top-down syntax analysis
* Starting from the start symbol, repeatedly apply different productions to derive a string that matches the input symbol string
2) Bottom-up syntax analysis
* Starting from the input symbol string, repeatedly reduce it using different productions until the start symbol of the grammar is reached
Note: the input symbol string referred to above is the sentence being analyzed.
5. Pushdown automata (PDA)
Model:
Input tape ---> finite-state controller (with a pushdown stack) ---> output tape (records the numbers of the productions used)
1) Each move of the PDA is determined by three factors: the current state, the symbol under the read head, and the symbol on top of the pushdown stack
2) The PDA accepts an input string if, once the whole string has been read, the pushdown stack is empty; alternatively, it accepts if, once the whole string has been read, the controller is in a final state
3) Regular grammars and finite automata are suitable only for describing and recognizing the words (tokens) of a high-level language. Statements can be described by context-free grammars, and the languages described by context-free grammars are recognized by pushdown automata. Therefore context-free grammars and their corresponding pushdown automata form the theoretical basis of syntax analysis in compiler construction.
Definition:
A PDA is a seven-tuple M = (Q, Σ, Γ, δ, q0, Z0, F), where:
1) Q: the finite set of states
2) Σ: the input alphabet
3) Γ: the pushdown-stack alphabet
4) δ: the transition (mapping) function
5) q0 ∈ Q: the start state
6) Z0 ∈ Γ: the initial symbol of the pushdown stack
7) F ⊆ Q: the set of final states
When a nonterminal is on top of the stack, the PDA uses the current input symbol to choose what replaces the stack-top symbol, and changes its current state accordingly.
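To make the seven-tuple and the two acceptance conditions concrete, here is a minimal Python sketch of a deterministic PDA for the language {a^n b^n | n >= 1}. The machine, its state names, and the dictionary encoding of δ are hypothetical examples chosen for illustration; the empty string "" stands for an ε-move (a move that reads no input).

```python
# Hypothetical example: a deterministic PDA for {a^n b^n | n >= 1},
# written out as the seven-tuple M = (Q, Σ, Γ, δ, q0, Z0, F).
Q = {"q0", "q1", "q2"}        # states
SIGMA = {"a", "b"}            # input alphabet
GAMMA = {"Z", "A"}            # stack alphabet
# δ: (state, input symbol or "" for an ε-move, stack top) -> (next state, replacement for the top)
DELTA = {
    ("q0", "a", "Z"): ("q0", "AZ"),  # first a: push A on top of Z
    ("q0", "a", "A"): ("q0", "AA"),  # every further a: push another A
    ("q0", "b", "A"): ("q1", ""),    # first b: pop one A
    ("q1", "b", "A"): ("q1", ""),    # every further b: pop one A
    ("q1", "", "Z"): ("q2", ""),     # ε-move: all a's matched, pop Z and finish
}
START, Z0, FINAL = "q0", "Z", {"q2"}


def accepts(word):
    """Run the PDA; accept by empty stack or by final state once all input is read."""
    state, stack, pos = START, [Z0], 0  # the top of the stack is stack[-1]
    while stack:
        top = stack[-1]
        if pos < len(word) and (state, word[pos], top) in DELTA:
            state, push = DELTA[(state, word[pos], top)]
            pos += 1                     # the read head moves right
        elif (state, "", top) in DELTA:
            state, push = DELTA[(state, "", top)]  # ε-move: the read head stays put
        else:
            break                        # no move possible: halt
        stack.pop()
        stack.extend(reversed(push))     # the first character of push becomes the new top
    return pos == len(word) and (not stack or state in FINAL)


print(accepts("aabb"), accepts("aab"))  # True False
```

For this particular machine the two acceptance conditions coincide: it empties its stack exactly when it enters the final state q2.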
Algorithm (top-down analysis with backtracking):
1) If the stack-top symbol X is a nonterminal, look up the grammar for a production with X on its left-hand side; pop X, push the right-hand side onto the stack (rightmost symbol first, so that its leftmost symbol ends up on top), and record the production number on the output tape. (derivation)
2) If the stack-top symbol X is a terminal and the symbol under the read head is also X, pop X and advance the read head to the next symbol. (match)
3) If the stack-top symbol X is a terminal but the symbol under the read head is not X, the match fails. Return to the most recent derivation point, restoring its state (stack contents, read-head position, and output tape). (backtracking)
4) After backtracking, choose another candidate for that derivation; if no candidate remains, backtrack further. If backtracking reaches the start symbol and no candidate remains, recognition fails.
5) If the stack is empty and the read head has reached the end of the input, recognition succeeds.
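The backtracking algorithm above can be sketched in Python as follows. This is a minimal sketch, assuming a hypothetical grammar S -> aSb | ab whose productions are numbered 1 and 2 in listing order. Rather than saving and restoring the "scene" explicitly, each recursive call receives its own stack, read-head position, and output, so returning from a failed call is the backtrack.

```python
# Hypothetical example grammar: S -> aSb (production 1) | ab (production 2).
GRAMMAR = {"S": [["a", "S", "b"], ["a", "b"]]}


def parse(grammar, start, tokens):
    """Backtracking top-down recognizer.

    Returns the production numbers recorded on the "output tape" on success,
    or None if the input is not a sentence of the grammar.
    """
    # Number the productions in the order they are listed.
    numbers, n = {}, 1
    for lhs, alts in grammar.items():
        for i in range(len(alts)):
            numbers[(lhs, i)] = n
            n += 1

    def step(stack, pos, output):
        # stack is a tuple whose last element is the top; together the three
        # arguments form the "scene" that a failed branch falls back to.
        if not stack:
            # Success only if the read head has also reached the end.
            return output if pos == len(tokens) else None
        x = stack[-1]
        if x in grammar:
            # Derivation: try each candidate for X in turn.
            for i, rhs in enumerate(grammar[x]):
                pushed = stack[:-1] + tuple(reversed(rhs))  # leftmost symbol on top
                result = step(pushed, pos, output + [numbers[(x, i)]])
                if result is not None:
                    return result
            return None  # no candidate left: backtrack further
        if pos < len(tokens) and tokens[pos] == x:
            return step(stack[:-1], pos + 1, output)  # match
        return None  # mismatch: backtrack

    return step((start,), 0, [])


print(parse(GRAMMAR, "S", "aabb"))  # [1, 2]
```

Because every candidate is tried in turn, the running time can grow exponentially with the input length, and a left-recursive grammar would make step expand the same nonterminal forever.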
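Problems: the grammar must not be left-recursive (a left-recursive nonterminal makes the parser expand it forever without consuming any input); candidates are tried blindly, with no heuristic for choosing among them; and when recognition fails, the parser cannot report the exact location of the error.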
Eliminate left recursion:
1. Eliminate direct left recursion
Original grammar: E -> Eα1 | Eα2 | ... | Eαn | β1 | β2 | ... | βm   (no βi begins with E)
After elimination: E -> β1E' | β2E' | ... | βmE'
                   E' -> α1E' | α2E' | ... | αnE' | ε
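A minimal Python sketch of the direct-elimination rule, assuming alternatives are stored as lists of symbols, the empty list stands for ε, and the new nonterminal is named by appending a prime (a hypothetical naming convention):

```python
def eliminate_direct(lhs, alts):
    """Remove direct left recursion from lhs -> alts.

    alts is a list of alternatives; each alternative is a list of symbols,
    and the empty list stands for ε. Returns a dict of new productions.
    """
    new = lhs + "'"  # hypothetical naming convention for the new nonterminal E'
    alphas = [alt[1:] for alt in alts if alt and alt[0] == lhs]  # E -> E alpha
    betas = [alt for alt in alts if not alt or alt[0] != lhs]     # E -> beta
    if not alphas:
        return {lhs: alts}  # no direct left recursion: nothing to do
    return {
        lhs: [beta + [new] for beta in betas],            # E  -> beta E'
        new: [alpha + [new] for alpha in alphas] + [[]],  # E' -> alpha E' | ε
    }


print(eliminate_direct("E", [["E", "+", "T"], ["T"]]))
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}
```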
2. Eliminate indirect left recursion (assuming the grammar has no cycles A =>+ A and no ε-productions)
a) Order all nonterminals as E1, E2, ..., En;
b) for i = 1 to n do  /* process each nonterminal in turn */
     for j = 1 to i-1 do  /* substitute for E1 ... E(i-1) */
       replace each production of the form Ei -> Ej R
       with Ei -> S1 R | S2 R | ... | Sk R,
       where Ej -> S1 | S2 | ... | Sk are all the current Ej productions;
c) (still inside the loop over i) eliminate the direct left recursion among the Ei productions.
Note: different orderings of the nonterminals may produce different resulting grammars, although they generate the same language.
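The indirect-elimination algorithm can be sketched in Python under the same assumptions (lists of symbols, [] for ε, primed names for new nonterminals); eliminate_direct repeats the direct rule so that the block stands alone. The grammar in the usage example (S -> Qc | c, Q -> Rb | b, R -> Sa | a) is a hypothetical illustration, not taken from these notes.

```python
def eliminate_direct(lhs, alts):
    """E -> E a1 | ... | b1 | ...  ==>  E -> b1 E' | ...,  E' -> a1 E' | ... | ε"""
    new = lhs + "'"
    alphas = [a[1:] for a in alts if a and a[0] == lhs]
    betas = [a for a in alts if not a or a[0] != lhs]
    if not alphas:
        return {lhs: alts}
    return {lhs: [b + [new] for b in betas], new: [a + [new] for a in alphas] + [[]]}


def eliminate_left_recursion(grammar, order):
    """Eliminate all left recursion, processing nonterminals in the given order.

    Assumes no cycles (A =>+ A) and no ε-productions in the input grammar.
    """
    g = {nt: [list(alt) for alt in alts] for nt, alts in grammar.items()}
    for i, ei in enumerate(order):
        for ej in order[:i]:
            # Replace every Ei -> Ej R by Ei -> S1 R | ... | Sk R.
            expanded = []
            for alt in g[ei]:
                if alt and alt[0] == ej:
                    expanded.extend(s + alt[1:] for s in g[ej])
                else:
                    expanded.append(alt)
            g[ei] = expanded
        g.update(eliminate_direct(ei, g[ei]))  # step c), inside the loop over i
    return g


g = {"S": [["Q", "c"], ["c"]], "Q": [["R", "b"], ["b"]], "R": [["S", "a"], ["a"]]}
result = eliminate_left_recursion(g, ["R", "Q", "S"])
print(result["S"], result["S'"])
# [['a', 'b', 'c', "S'"], ['b', 'c', "S'"], ['c', "S'"]] [['a', 'b', 'c', "S'"], []]
```

With the order R, Q, S the result is S -> abcS' | bcS' | cS', S' -> abcS' | ε, and Q and R are no longer reachable from S. With the order S, Q, R the recursion is instead removed from R (R -> bcaR' | caR' | aR', R' -> bcaR' | ε), which illustrates the note above.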
3. Remove useless symbols and useless productions
1) P -> Px | b  ===>  P -> bP', P' -> xP' | ε   (only the left-recursive occurrence of P, at the start of the right-hand side, is replaced)
Example: grammar G: E -> E+T | T, T -> T*F | F, F -> (E) | i
After eliminating left recursion:
E -> TE', E' -> +TE' | ε
T -> FT', T' -> *FT' | ε
F -> (E) | i
Prediction:
1. FIRST set of a candidate: FIRST(α) is the set of terminals that can begin a string derived from the candidate α (it also contains ε if α can derive ε).
2. Lookahead symbol: if the FIRST sets of the candidates of a nonterminal are pairwise disjoint, the lookahead symbol determines exactly which production to use.
3. Extracting common left factors (left factoring): repeatedly factoring out common prefixes can make the candidates' FIRST sets pairwise disjoint, but it introduces many new productions and ε-productions.
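A minimal Python sketch of left factoring, using the same hypothetical representation (lists of symbols, [] for ε, primed names for new nonterminals). It groups the alternatives of each nonterminal by their first symbol, factors the longest common prefix out of every group, and repeats until no two alternatives share a first symbol. The dangling-else grammar in the example is a standard textbook illustration, not from these notes.

```python
def common_prefix(seqs):
    """Longest common prefix of several symbol lists."""
    prefix = []
    for symbols in zip(*seqs):
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    return prefix


def left_factor(grammar):
    """A -> p b1 | p b2 | c  ==>  A -> c | p A',  A' -> b1 | b2 (repeated until stable)."""
    g = {a: [list(alt) for alt in alts] for a, alts in grammar.items()}
    changed = True
    while changed:
        changed = False
        for a in list(g):
            groups = {}
            for alt in g[a]:
                if alt:  # ε-alternatives share no first symbol with anything
                    groups.setdefault(alt[0], []).append(alt)
            for group in groups.values():
                if len(group) < 2:
                    continue
                prefix = common_prefix(group)
                new = a + "'"
                while new in g:  # pick an unused primed name
                    new += "'"
                g[new] = [alt[len(prefix):] for alt in group]  # [] means ε
                g[a] = [alt for alt in g[a] if alt not in group] + [prefix + [new]]
                changed = True
                break  # g[a] changed: regroup on the next pass
    return g


g = left_factor({"S": [["i", "E", "t", "S"], ["i", "E", "t", "S", "e", "S"], ["a"]],
                 "E": [["b"]]})
print(g["S"], g["S'"])  # [['a'], ['i', 'E', 't', 'S', "S'"]] [[], ['e', 'S']]
```

Left factoring alone does not guarantee disjointness with respect to FOLLOW: this grammar is ambiguous, and S' -> eS | ε still conflicts on the lookahead e.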
LL(1) grammars: the predictive parsing table (a two-dimensional array; rows are nonterminals, columns are terminals) specifies which candidate to use when replacing a nonterminal. An LL(1) grammar must be unambiguous, and LL(1) grammars are only a subset of the context-free grammars.
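The following Python sketch shows how a predictive parsing table drives the parse, assuming the left-recursion-free expression grammar from the example above. The table entries were filled in by hand; the end marker #, the dictionary encoding of M[A, a], and the function name ll1_parse are assumptions for illustration. A missing table entry is reported as a syntax error at the exact input position, which the backtracking method cannot do.

```python
# Predictive parsing table M[A, a] for E -> TE', E' -> +TE' | ε, T -> FT',
# T' -> *FT' | ε, F -> (E) | i.  [] stands for ε; "#" is the end marker.
TABLE = {
    ("E", "i"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "#"): [],
    ("T", "i"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "#"): [],
    ("F", "i"): ["i"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Return the productions applied, as (A, rhs) pairs; raise SyntaxError on error."""
    tokens = list(tokens) + ["#"]
    stack = ["#", start]  # the top is stack[-1]
    pos, output = 0, []
    while True:
        x, a = stack[-1], tokens[pos]
        if x == "#" and a == "#":
            return output  # stack and input both exhausted: accept
        if x in NONTERMINALS:
            rhs = TABLE.get((x, a))
            if rhs is None:  # empty table entry: no candidate fits the lookahead
                raise SyntaxError(f"unexpected {a!r} at position {pos} while expanding {x}")
            stack.pop()
            stack.extend(reversed(rhs))  # leftmost symbol of rhs on top
            output.append((x, rhs))
        elif x == a:
            stack.pop()  # terminal matches the read head
            pos += 1
        else:
            raise SyntaxError(f"expected {x!r} but found {a!r} at position {pos}")


print(ll1_parse("i+i*i")[:3])
# [('E', ['T', "E'"]), ('T', ['F', "T'"]), ('F', ['i'])]
```

Unlike the backtracking parser, each step consults exactly one table entry, so no candidate is ever retried.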
Checking whether a grammar is LL(1):
1) Eliminate left recursion
2) Extract common left factors
3) Compute the FIRST and FOLLOW sets
4) For every pair of candidates A -> α | β, check that FIRST(α) ∩ FIRST(β) = ∅, and, if β can derive ε, that FIRST(α) ∩ FOLLOW(A) = ∅
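Steps 3) and 4) can be sketched in Python as follows, assuming the same list-of-symbols representation, "ε" for the empty string, and "#" as the end marker. first_follow computes the sets by iterating until nothing changes, and is_ll1 checks the two disjointness conditions; all names here are hypothetical.

```python
EPS = "ε"  # marks that a string can derive the empty string
END = "#"  # end-of-input marker


def first_of_seq(seq, first, nonterminals):
    """FIRST set of a symbol string, given the current FIRST sets of nonterminals."""
    result = set()
    for sym in seq:
        if sym not in nonterminals:  # a terminal begins every string derived here
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:    # sym cannot vanish, so stop here
            return result
    result.add(EPS)                  # every symbol can vanish (or seq is empty)
    return result


def first_follow(grammar, start):
    """Compute FIRST and FOLLOW for every nonterminal by fixed-point iteration."""
    nts = set(grammar)
    first = {a: set() for a in nts}
    follow = {a: set() for a in nts}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                f = first_of_seq(alt, first, nts)
                if not f <= first[a]:
                    first[a] |= f
                    changed = True
                for i, sym in enumerate(alt):
                    if sym not in nts:
                        continue
                    # FOLLOW(sym) gets FIRST(rest) minus ε, plus FOLLOW(a) if rest can vanish.
                    rest = first_of_seq(alt[i + 1:], first, nts)
                    add = rest - {EPS}
                    if EPS in rest:
                        add |= follow[a]
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return first, follow


def is_ll1(grammar, start):
    """Check the LL(1) conditions for every pair of candidates A -> α | β."""
    first, follow = first_follow(grammar, start)
    nts = set(grammar)
    for a, alts in grammar.items():
        firsts = [first_of_seq(alt, first, nts) for alt in alts]
        for i, fa in enumerate(firsts):
            for j, fb in enumerate(firsts):
                if i == j:
                    continue
                if fa & fb:                       # FIRST(α) ∩ FIRST(β) must be empty
                    return False
                if EPS in fb and fa & follow[a]:  # if β =>* ε, FIRST(α) ∩ FOLLOW(A) must be empty
                    return False
    return True


EXPR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["i"]],
}
print(is_ll1(EXPR, "E"))  # True
```

For the expression grammar this gives FOLLOW(E) = {), #} and FOLLOW(F) = {*, +, ), #}; the original left-recursive E -> E+T | T fails the check because both candidates begin with the same terminals.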
Recursive descent parsing
1) First transform the grammar into an LL(1) grammar
2) Write a recursive procedure for each nonterminal
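A minimal recursive-descent recognizer for the expression grammar E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | i, written as a Python sketch. Each nonterminal becomes one method named after it (E_prime standing for E'), single characters serve as tokens, and # marks the end of input; these names are illustrative assumptions. An ε-candidate is chosen simply by returning without matching anything when the lookahead does not start the other candidate.

```python
class Parser:
    """Recursive-descent recognizer: one method per nonterminal of
    E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | i."""

    def __init__(self, text):
        self.tokens = list(text) + ["#"]  # single-character tokens, "#" marks the end
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]  # the lookahead symbol

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.peek()!r} at {self.pos}")
        self.pos += 1

    def E(self):        # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):  # E' -> + T E' | ε
        if self.peek() == "+":
            self.match("+")
            self.T()
            self.E_prime()
        # otherwise take ε; any error shows up when the caller matches its next symbol

    def T(self):        # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):  # T' -> * F T' | ε
        if self.peek() == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):        # F -> ( E ) | i
        if self.peek() == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("i")


def recognize(text):
    p = Parser(text)
    try:
        p.E()
        p.match("#")  # the whole input must be consumed
        return True
    except SyntaxError:
        return False


print(recognize("(i+i)*i"), recognize("i+"))  # True False
```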
Disadvantages: the requirements on the grammar are strict (it must be LL(1)); and deep recursion hurts parsing efficiency, making the parser slower and more memory-hungry.
Because LL(1) grammars are so restrictive, compilers generally use reduction-based (bottom-up) parsing methods instead.