For the basic concepts of compiler principles, see http://www.cnblogs.com/bitzhuwei/archive/2012/10/22/SmileWei_Compiler.html
For the basic data structures used by the code that follows, see http://www.cnblogs.com/bitzhuwei/archive/2012/03/09/compiler_basic_data_structure.html
1. Eliminate direct left recursion
Suppose P -> P α1 | P α2 | ... | P αn | β1 | β2 | ... | βm
where no αi is ε (ε is the empty string, similar to null) and no βj starts with P.
Then the nonterminal P can be rewritten as
P -> β1 P' | β2 P' | ... | βm P'
P' -> α1 P' | α2 P' | ... | αn P' | ε
Explanation: the original P expands into strings of the form βx αi ... αj ... αt, that is, some β followed by zero or more α's. The rewritten rules generate exactly the same strings: P produces the leading β, and P' produces the trailing α's until it ends with ε.
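The rewrite above can be sketched in a few lines of Python. This is only an illustration under assumed conventions that are not part of the original article: a production body is a tuple of symbol names, ε is the empty tuple, and the helper nonterminal is named by appending an apostrophe. The function name `eliminate_direct_left_recursion` is hypothetical.

```python
def eliminate_direct_left_recursion(head, bodies):
    """Rewrite the rules of `head` so that none of them starts with `head`.

    Assumed representation: each body is a tuple of symbols, () stands for ε.
    """
    # Split the bodies into P -> P α (left-recursive) and P -> β (the rest).
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:
        return {head: list(bodies)}  # no direct left recursion, nothing to do
    new_head = head + "'"  # hypothetical naming scheme for P'
    return {
        # P -> β1 P' | ... | βm P'
        head: [beta + (new_head,) for beta in betas],
        # P' -> α1 P' | ... | αn P' | ε
        new_head: [alpha + (new_head,) for alpha in alphas] + [()],
    }


# Example: E -> E + T | T
rules = eliminate_direct_left_recursion("E", [("E", "+", "T"), ("T",)])
for head, bodies in rules.items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```

For the example rule E -> E + T | T, the α part is "+ T" and the β part is "T", so the result is E -> T E' and E' -> + T E' | ε.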
2. Eliminate indirect left recursion
Suppose the grammar G contains no cycles (no nonterminal P derives P itself in one or more steps) and no productions whose right-hand side is ε.
Then left recursion can be removed with the following algorithm:
- Arrange the nonterminals of G in some order A1, A2, ..., An.
- for (i = 1; i <= n; i++)
{
    for (j = 1; j <= i - 1; j++)
    {
        Rewrite each production of the form Ai -> Aj γ as Ai -> δ1 γ | δ2 γ | ... | δk γ, where Aj -> δ1 | δ2 | ... | δk are all the current rules for Aj
    }
    Eliminate direct left recursion in the Ai rules
}
- Simplify the grammar obtained in the previous step, that is, remove rules that can never be used (for example, rules of nonterminals that are unreachable from the start symbol).
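The loop above can be sketched in Python as follows. This is a minimal sketch, not the article's own code: the grammar is assumed to be a dict from nonterminal to a list of bodies (tuples of symbols, with () for ε), the helper names are hypothetical, and the final simplification step is left out.

```python
def eliminate_direct(head, bodies):
    """Remove direct left recursion from the rules of one nonterminal."""
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        return {head: list(bodies)}
    new_head = head + "'"  # hypothetical name for the helper nonterminal
    return {
        head: [b + (new_head,) for b in betas],
        new_head: [a + (new_head,) for a in alphas] + [()],
    }


def eliminate_left_recursion(grammar, order):
    """Apply the ordering algorithm; `order` lists the nonterminals A1..An."""
    g = {head: list(bodies) for head, bodies in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for body in g[ai]:
                if body and body[0] == aj:
                    # Ai -> Aj γ  becomes  Ai -> δ1 γ | ... | δk γ
                    expanded.extend(delta + body[1:] for delta in g[aj])
                else:
                    expanded.append(body)
            g[ai] = expanded
        # After the substitutions, only direct left recursion can remain in Ai.
        g.update(eliminate_direct(ai, g[ai]))
    return g


# Example: S -> A a | b ;  A -> S d | c   (A is indirectly left-recursive)
grammar = {"S": [("A", "a"), ("b",)], "A": [("S", "d"), ("c",)]}
result = eliminate_left_recursion(grammar, ["S", "A"])
for head, bodies in result.items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```

With the order S, A, the rule A -> S d is first expanded to A -> A a d | b d, and the resulting direct left recursion is then removed, giving A -> b d A' | c A' and A' -> a d A' | ε. A different ordering of the nonterminals generally yields a different but equivalent grammar.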
3. FIRST set
Let G be a type-2 (context-free) grammar without left recursion. For each candidate α of a nonterminal of G, the set of leading terminals FIRST(α) is defined as FIRST(α) = { a | α ⇒* a..., a ∈ VT }, that is, α derives a string beginning with a in zero or more steps. In addition, if α ⇒* ε, then ε ∈ FIRST(α).
Interpretation: fully deriving a candidate yields strings of terminals. Different derivations yield different strings (possibly infinitely many), and the set of first symbols of all these strings is the FIRST set of the candidate. With the FIRST set, the parser can tell whether a candidate can match the next token of the input stream.
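One common way to compute FIRST sets is a fixed-point iteration: keep applying the definition until no set grows. The sketch below assumes a grammar stored as a dict from nonterminal to a list of bodies (tuples of symbols, () for ε); any symbol that is not a key of the dict is treated as a terminal. The names `compute_first` and `first_of_string` are hypothetical, not taken from the article.

```python
EPSILON = "ε"


def first_of_string(symbols, first):
    """FIRST of a sequence of symbols, given the FIRST sets of nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # FIRST of a terminal is itself
        result |= sym_first - {EPSILON}
        if EPSILON not in sym_first:
            return result  # this symbol cannot vanish, so stop here
    result.add(EPSILON)  # every symbol can derive ε (or the sequence is empty)
    return result


def compute_first(grammar):
    """Iterate until no FIRST set changes."""
    first = {head: set() for head in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


# Example grammar (left recursion already removed):
# E -> T E'    E' -> + T E' | ε    T -> F T'    T' -> * F T' | ε    F -> ( E ) | id
grammar = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
first = compute_first(grammar)
```

For this grammar, FIRST(E) = FIRST(T) = FIRST(F) = { (, id }, FIRST(E') = { +, ε } and FIRST(T') = { *, ε }.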
4. FOLLOW set
Let G be a context-free (type-2) grammar with start symbol S. For any nonterminal A of G, FOLLOW(A) = { a | S ⇒* ...Aa..., a ∈ VT }; in addition, if S ⇒* ...A, then # ∈ FOLLOW(A).
Interpretation: FOLLOW(A) contains every terminal, or the end marker #, that can appear immediately after the nonterminal A in some sentential form of G. # is the end-of-input marker, which applies when A can be the last symbol of a sentential form (S ⇒* ...A).
5. Algorithm for constructing the FOLLOW sets
- Put # into FOLLOW(S), that is, # ∈ FOLLOW(S)
- If G contains a rule of the form A -> α B β with β ≠ ε, add all non-ε elements of FIRST(β) to FOLLOW(B)
- If G contains a rule of the form A -> α B, or A -> α B β with ε ∈ FIRST(β), add all elements of FOLLOW(A) to FOLLOW(B)
- Apply the previous two rules repeatedly until no FOLLOW set changes.
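The rules above translate fairly directly into a fixed-point loop. The following is a sketch under assumptions that are not in the original article: the grammar is a dict from nonterminal to a list of bodies (tuples of symbols, () for ε), the FIRST sets are supplied as input (hard-coded here for the example grammar), and the names `compute_follow` and `first_of_string` are hypothetical.

```python
EPSILON, END = "ε", "#"


def first_of_string(symbols, first):
    """FIRST of a sequence of symbols, given the FIRST sets of nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # FIRST of a terminal is itself
        result |= sym_first - {EPSILON}
        if EPSILON not in sym_first:
            return result
    result.add(EPSILON)
    return result


def compute_follow(grammar, start, first):
    follow = {head: set() for head in grammar}
    follow[start].add(END)  # rule 1: # ∈ FOLLOW(S)
    changed = True
    while changed:  # rule 4: repeat until nothing changes
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue  # only nonterminals have FOLLOW sets
                    beta_first = first_of_string(body[i + 1:], first)
                    new = beta_first - {EPSILON}  # rule 2
                    if EPSILON in beta_first:  # rule 3: β is empty or nullable
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


# E -> T E'    E' -> + T E' | ε    T -> F T'    T' -> * F T' | ε    F -> ( E ) | id
grammar = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
# FIRST sets of the nonterminals, assumed to have been computed beforehand.
first = {
    "E": {"(", "id"}, "E'": {"+", EPSILON},
    "T": {"(", "id"}, "T'": {"*", EPSILON},
    "F": {"(", "id"},
}
follow = compute_follow(grammar, "E", first)
```

For this grammar the result is FOLLOW(E) = FOLLOW(E') = { ), # }, FOLLOW(T) = FOLLOW(T') = { +, ), # } and FOLLOW(F) = { *, +, ), # }.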
6. Algorithm for constructing the LL(1) parsing table
Input: grammar G
Output: the LL(1) parsing table M[A, a] of G, where A is a nonterminal and a is a terminal or #.
Algorithm:
- Compute the FIRST and FOLLOW sets of G.
- for (each production of G, A -> γ1 | γ2 | ... | γm)
{
    for (each terminal a ∈ FIRST(γi))
        Set M[A, a] to "A -> γi"
    if (ε ∈ FIRST(γi))
        for (each a ∈ FOLLOW(A))
            Set M[A, a] to "A -> γi" (here γi is ε or can derive ε)
}
- Set every M[A, a] that is still undefined to error.
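The table construction can be sketched in Python as below. The assumptions here are illustrative and not from the original article: the table is a dict keyed by (nonterminal, terminal), a missing key plays the role of the error entry, and the FIRST and FOLLOW sets are hard-coded for the example grammar. The conflict check is an addition: it only reports that the grammar is not LL(1) when two productions land in the same cell.

```python
EPSILON = "ε"


def first_of_string(symbols, first):
    """FIRST of a sequence of symbols, given the FIRST sets of nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # FIRST of a terminal is itself
        result |= sym_first - {EPSILON}
        if EPSILON not in sym_first:
            return result
    result.add(EPSILON)
    return result


def build_ll1_table(grammar, first, follow):
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            body_first = first_of_string(body, first)
            targets = body_first - {EPSILON}  # every a ∈ FIRST(γi)
            if EPSILON in body_first:
                targets |= follow[head]  # every a ∈ FOLLOW(A)
            for a in targets:
                if (head, a) in table:  # added check, not in the algorithm text
                    raise ValueError(f"not LL(1): conflict at M[{head}, {a}]")
                table[(head, a)] = body
    return table  # keys that are absent correspond to error entries


# E -> T E'    E' -> + T E' | ε    T -> F T'    T' -> * F T' | ε    F -> ( E ) | id
grammar = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
# FIRST and FOLLOW sets, assumed to have been computed beforehand.
first = {
    "E": {"(", "id"}, "E'": {"+", EPSILON},
    "T": {"(", "id"}, "T'": {"*", EPSILON},
    "F": {"(", "id"},
}
follow = {
    "E": {")", "#"}, "E'": {")", "#"},
    "T": {"+", ")", "#"}, "T'": {"+", ")", "#"},
    "F": {"*", "+", ")", "#"},
}
table = build_ll1_table(grammar, first, follow)
```

A parser would then look up `table.get((A, a))` for the nonterminal on top of its stack and the current input token, and report a syntax error when the lookup returns `None`.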