4.7.4 Constructing LALR Parsing Tables
We now introduce our last parser construction method, the LALR (lookahead-LR) technique. This method is often used in practice, because the tables obtained by it are considerably smaller than the canonical LR tables, yet most common syntactic constructs of programming languages can be expressed conveniently by an LALR grammar. The same is almost true for SLR grammars, but there are a few constructs that cannot be conveniently handled by SLR techniques (see Example 4.48, for example).
For a comparison of parser size, the SLR and LALR tables for a grammar always have the same number of states, and this number is typically several hundred states for a language like C. The canonical LR table would typically have several thousand states for the same-size language. Thus, it is much easier and more economical to construct SLR and LALR tables than the canonical LR tables.
By way of introduction, let us again consider grammar (4.55), whose sets of LR(1) items were shown in Fig. 4.41. Take a pair of similar-looking states, such as I4 and I7. Each of these states has only items with first component C → d·. In I4, the lookaheads are c or d; in I7, $ is the only lookahead.
To see the difference between the roles of I4 and I7 in the parser, note that the grammar generates the regular language c*dc*d. When reading an input cc...cdcc...cd, the parser shifts the first group of c's and their following d onto the stack, entering state 4 after reading the d. The parser then calls for a reduction by C → d, provided the next input symbol is c or d. The requirement that c or d follow makes sense, since these are the symbols that could begin strings in c*d. If $ follows the first d, we have an input like ccd, which is not in the language, and state 4 correctly declares an error if $ is the next input.
The parser enters state 7 after reading the second d. Then, the parser must see $ on the input, or it started with a string not of the form c*dc*d. It thus makes sense that state 7 should reduce by C → d on input $ and declare error on inputs c or d.
Let us now replace I4 and I7 by I47, the union of I4 and I7, consisting of the set of three items represented by [C → d·, c/d/$]. The GOTOs on d to I4 or I7 from I0, I2, I3, and I6 now enter I47. The action of state 47 is to reduce on any input. The revised parser behaves essentially like the original, although it might reduce d to C in circumstances where the original would declare error, for example, on input like ccd or cdcdc. The error will eventually be caught; in fact, it will be caught before any more input symbols are shifted.
More generally, we can look for sets of LR(1) items having the same core, that is, the same set of first components, and we may merge these sets with common cores into one set of items. For example, in Fig. 4.41, I4 and I7 form such a pair, with core {C → d·}. Similarly, I3 and I6 form another pair, with core {C → c·C, C → ·cC, C → ·d}. There is one more pair, I8 and I9, with common core {C → cC·}. Note that, in general, a core is a set of LR(0) items for the grammar at hand, and that an LR(1) grammar may produce more than two sets of items with the same core.
Since the core of GOTO(I, X) depends only on the core of I, the GOTOs of merged sets can themselves be merged. Thus, there is no problem revising the GOTO function as we merge sets of items. The ACTION functions are modified to reflect the non-error actions of all sets of items in the merger.
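The merging step can be sketched in a few lines of Python. This is only an illustration: the encoding of an LR(1) item as a tuple (head, body, dot position, lookahead) and the helper names core and merge_by_core are assumptions made here, not notation from the text. The example merges I4 and I7 into I47.

```python
from collections import defaultdict

# Assumed encoding: an LR(1) item is (head, body, dot, lookahead).
def core(item_set):
    """Return the core: the set of first components (LR(0) items)."""
    return frozenset((h, b, d) for (h, b, d, _) in item_set)

def merge_by_core(item_sets):
    """Union all item sets that share a core; returns {core: merged set}."""
    merged = defaultdict(set)
    for s in item_sets:
        merged[core(s)] |= s
    return {k: frozenset(v) for k, v in merged.items()}

# I4 and I7 of Fig. 4.41: C -> d. with lookaheads c/d and $ respectively.
I4 = frozenset({("C", ("d",), 1, "c"), ("C", ("d",), 1, "d")})
I7 = frozenset({("C", ("d",), 1, "$")})

I47 = merge_by_core([I4, I7])[core(I4)]
lookaheads = sorted(la for (_, _, _, la) in I47)
print(lookaheads)  # ['$', 'c', 'd']
```

Keying the merged sets by their core makes it direct to redirect a GOTO: the target of a transition is looked up by the core of the set it originally led to.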
Suppose we have an LR(1) grammar, that is, one whose sets of LR(1) items produce no parsing-action conflicts. If we replace all states having the same core with their union, it is possible that the resulting union will have a conflict, but it is unlikely for the following reason. Suppose in the union there is a conflict on lookahead a because there is an item [A → α·, a] calling for a reduction by A → α, and there is another item [B → β·aγ, b] calling for a shift. Then some set of items from which the union was formed has item [A → α·, a], and since the cores of all these states are the same, it must have an item [B → β·aγ, c] for some c. But then this state has the same shift/reduce conflict on a, and the grammar was not LR(1) as we assumed. Thus, the merging of states with common cores can never produce a shift/reduce conflict that was not present in one of the original states, because shift actions depend only on the core, not the lookahead.
It is possible, however, that a merger would produce a reduce/reduce conflict, as the following example shows.
Example 4.58: Consider the grammar
S' → S
S → a A d | b B d | a B e | b A e
A → c
B → c
which generates the four strings acd, ace, bcd, and bce. The reader can check that the grammar is LR(1) by constructing the sets of items. Upon doing so, we find the set of items {[A → c·, d], [B → c·, e]} valid for viable prefix ac and {[A → c·, e], [B → c·, d]} valid for bc. Neither of these sets has a conflict, and their cores are the same. However, their union, which is
A → c·, d/e
B → c·, d/e
generates a reduce/reduce conflict, since reductions by both A → c and B → c are called for on inputs d and e.
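The conflict in Example 4.58 can be checked mechanically. The sketch below assumes an illustrative encoding of an LR(1) item as (head, body, dot position, lookahead); the function names are hypothetical. It reports the lookaheads on which a set of items calls for two different reductions.

```python
def reduce_actions(item_set):
    """Map each lookahead to the set of productions it calls to reduce by."""
    actions = {}
    for head, body, dot, la in item_set:
        if dot == len(body):  # dot at the right end: a reduce item
            actions.setdefault(la, set()).add((head, body))
    return actions

def reduce_reduce_conflicts(item_set):
    """Lookaheads on which two or more different reductions are called for."""
    return sorted(la for la, prods in reduce_actions(item_set).items()
                  if len(prods) > 1)

# Sets of items valid for viable prefixes ac and bc in Example 4.58.
ac = {("A", ("c",), 1, "d"), ("B", ("c",), 1, "e")}
bc = {("A", ("c",), 1, "e"), ("B", ("c",), 1, "d")}

print(reduce_reduce_conflicts(ac))       # []
print(reduce_reduce_conflicts(bc))       # []
print(reduce_reduce_conflicts(ac | bc))  # ['d', 'e']
```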
We are now prepared to give the first of two LALR table-construction algorithms. The general idea is to construct the sets of LR(1) items, and if no conflicts arise, merge sets with common cores. We then construct the parsing table from the collection of merged sets of items. The method we are about to describe serves primarily as a definition of LALR(1) grammars. Constructing the entire collection of LR(1) sets of items requires too much space and time to be useful in practice.
Algorithm 4.59: An easy, but space-consuming LALR table construction.
INPUT: An augmented grammar G'.
OUTPUT: The LALR parsing-table functions ACTION and GOTO for G'.
METHOD:
1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items.
2. For each core present among the sets of LR(1) items, find all sets having that core, and replace these sets by their union.
3. Let C' = {J0, J1, ..., Jm} be the resulting sets of LR(1) items. The parsing actions for state i are constructed from Ji in the same manner as in Algorithm 4.56. If there is a parsing-action conflict, the algorithm fails to produce a parser, and the grammar is said not to be LALR(1).
4. The GOTO table is constructed as follows. If J is the union of one or more sets of LR(1) items, that is, J = I1 ∪ I2 ∪ ... ∪ Ik, then the cores of GOTO(I1, X), GOTO(I2, X), ..., GOTO(Ik, X) are the same, since I1, I2, ..., Ik all have the same core. Let K be the union of all sets of items having the same core as GOTO(I1, X). Then GOTO(J, X) = K.
The table produced by Algorithm 4.59 is called the LALR parsing table for G. If there are no parsing-action conflicts, then the given grammar is said to be an LALR(1) grammar. The collection of sets of items constructed in step (3) is called the LALR(1) collection.
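The steps of Algorithm 4.59 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a general implementation: it assumes grammar (4.55) is S' → S, S → C C, C → c C | d (consistent with the reductions named in this section), it encodes an LR(1) item as (head, body, dot position, lookahead), and its FIRST computation is valid only for grammars without ε-productions. All function names are hypothetical.

```python
from collections import defaultdict

# Assumed form of grammar (4.55): S' -> S, S -> C C, C -> c C | d.
GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}
NONTERMS = set(GRAMMAR)
SYMBOLS = ("S", "C", "c", "d")

def first(symbols):
    """FIRST of a nonempty symbol string (no epsilon-productions assumed)."""
    out, seen, todo = set(), set(), [symbols[0]]
    while todo:
        x = todo.pop()
        if x not in NONTERMS:
            out.add(x)
        elif x not in seen:
            seen.add(x)
            todo.extend(body[0] for body in GRAMMAR[x])
    return out

def closure(items):
    items, work = set(items), list(items)
    while work:
        head, body, dot, la = work.pop()
        if dot < len(body) and body[dot] in NONTERMS:
            for b in first(body[dot + 1:] + (la,)):
                for prod in GRAMMAR[body[dot]]:
                    new = (body[dot], prod, 0, b)
                    if new not in items:
                        items.add(new)
                        work.append(new)
    return frozenset(items)

def goto(items, X):
    moved = {(h, b, d + 1, la) for (h, b, d, la) in items
             if d < len(b) and b[d] == X}
    return closure(moved) if moved else None

def lr1_collection():
    """Step 1: the canonical collection of sets of LR(1) items."""
    start = closure({("S'", ("S",), 0, "$")})
    states, work = [start], [start]
    while work:
        I = work.pop()
        for X in SYMBOLS:
            J = goto(I, X)
            if J is not None and J not in states:
                states.append(J)
                work.append(J)
    return states

def core(I):
    return frozenset((h, b, d) for (h, b, d, _) in I)

def lalr_collection(states):
    """Step 2: replace all sets sharing a core by their union."""
    merged = defaultdict(set)
    for I in states:
        merged[core(I)] |= I
    return [frozenset(J) for J in merged.values()]

def has_conflict(J):
    """Step 3 check: two different actions called for on one terminal."""
    actions = defaultdict(set)
    for h, b, d, la in J:
        if d == len(b):
            actions[la].add(("reduce", h, b))
        elif b[d] not in NONTERMS:
            actions[b[d]].add(("shift",))
    return any(len(a) > 1 for a in actions.values())

def lalr_goto(J, X, lalr_states):
    """Step 4: GOTO(J, X) is the merged set with the core of goto(J, X)."""
    target = goto(J, X)
    if target is None:
        return None
    return next(K for K in lalr_states if core(K) == core(target))

states = lr1_collection()
lalr = lalr_collection(states)
print(len(states), len(lalr))              # 10 7
print(any(has_conflict(J) for J in lalr))  # False

I36 = next(J for J in lalr if ("C", ("c", "C"), 1) in core(J))
I89 = lalr_goto(I36, "C", lalr)
print(sorted(la for (_, _, _, la) in I89))  # ['$', 'c', 'd']
```

The sketch builds the full LR(1) collection before merging, so it inherits the space cost that the text warns about; it serves to make the definition concrete rather than to be efficient.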
Example 4.60: Again consider grammar (4.55), whose GOTO graph was shown in Fig. 4.41. As we mentioned, there are three pairs of sets of items that can be merged. I3 and I6 are replaced by their union:
I36: C → c·C, c/d/$
     C → ·cC, c/d/$
     C → ·d, c/d/$
I4 and I7 are replaced by their union:
I47: C → d·, c/d/$
and I8 and I9 are replaced by their union:
I89: C → cC·, c/d/$
The LALR ACTION and GOTO functions for the condensed sets of items are shown in Fig. 4.43.
STATE |      ACTION       |   GOTO
      |  c     d     $    |  S     C
------+-------------------+----------
  0   | s36   s47         |  1     2
  1   |             acc   |
  2   | s36   s47         |        5
  36  | s36   s47         |        89
  47  | r3    r3    r3    |
  5   |             r1    |
  89  | r2    r2    r2    |
Figure 4.43: LALR parsing table for the grammar of Example 4.54
To see how the GOTOs are computed, consider GOTO(I36, C). In the original set of LR(1) items, GOTO(I3, C) = I8, and I8 is now part of I89, so we make GOTO(I36, C) be I89. We could have arrived at the same conclusion if we considered I6, the other part of I36. That is, GOTO(I6, C) = I9, and I9 is now part of I89. For another example, consider GOTO(I2, c), an entry that is exercised after the shift action of I2 on input c. In the original sets of LR(1) items, GOTO(I2, c) = I6. Since I6 is now part of I36, GOTO(I2, c) becomes I36. Thus, the entry in Fig. 4.43 for state 2 and input c is made s36, meaning shift and push state 36 onto the stack.
When presented with a string from the language c*dc*d, both the LR parser of Fig. 4.42 and the LALR parser of Fig. 4.43 make exactly the same sequence of shifts and reductions, although the names of the states on the stack may differ. For instance, if the LR parser puts I3 or I6 on the stack, the LALR parser will put I36 on the stack. This relationship holds in general for an LALR grammar. The LR and LALR parsers will mimic one another on correct inputs.
When presented with erroneous input, the LALR parser may proceed to do some reductions after the LR parser has declared an error. However, the LALR parser will never shift another symbol after the LR parser declares an error.
For example, on input ccd followed by $, the LR parser of Fig. 4.42 will put
0 3 3 4
on the stack, and in state 4 will discover an error, because $ is the next input symbol and state 4 has action error on $. In contrast, the LALR parser of Fig. 4.43 will make the corresponding moves, putting
0 36 36 47
on the stack. But state 47 on input $ has action reduce C → d. The LALR parser will thus change its stack to
0 36 36 89
Now the action of state 89 on input $ is reduce C → cC. The stack becomes
0 36 89
whereupon a similar reduction is called for, obtaining stack
0 2
Finally, state 2 has action error on input $, so the error is now discovered.
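The sequence of stacks above can be reproduced with a small table-driven LR driver. The sketch below transcribes Fig. 4.43 by hand into Python dictionaries; the production numbering (1: S → C C, 2: C → c C, 3: C → d) is assumed from the reductions named in the text, and the function parse and its trace format are illustrative choices.

```python
# Production number -> (head, length of body); numbering assumed from the text.
PRODUCTIONS = {1: ("S", 2), 2: ("C", 2), 3: ("C", 1)}

# ACTION and GOTO of Fig. 4.43; a missing entry means error.
ACTION = {
    (0, "c"): "s36", (0, "d"): "s47",
    (1, "$"): "acc",
    (2, "c"): "s36", (2, "d"): "s47",
    (36, "c"): "s36", (36, "d"): "s47",
    (47, "c"): "r3", (47, "d"): "r3", (47, "$"): "r3",
    (5, "$"): "r1",
    (89, "c"): "r2", (89, "d"): "r2", (89, "$"): "r2",
}
GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (36, "C"): 89}

def parse(tokens):
    """Run the LR driver; return (accepted, list of stack snapshots)."""
    tokens = list(tokens) + ["$"]
    stack, pos = [0], 0
    trace = [list(stack)]
    while True:
        act = ACTION.get((stack[-1], tokens[pos]))
        if act is None:          # blank table entry: error
            return False, trace
        if act == "acc":
            return True, trace
        if act[0] == "s":        # shift: push the state, consume the symbol
            stack.append(int(act[1:]))
            pos += 1
        else:                    # reduce: pop the body, then take the GOTO
            head, n = PRODUCTIONS[int(act[1:])]
            del stack[-n:]
            stack.append(GOTO[(stack[-1], head)])
        trace.append(list(stack))

accepted, trace = parse("ccd")
print(accepted)  # False
for snapshot in trace[-4:]:
    print(*snapshot)
# 0 36 36 47
# 0 36 36 89
# 0 36 89
# 0 2
```

On the erroneous input ccd the driver performs three reductions after the point where the canonical LR parser would have stopped, but it shifts nothing further before reporting the error in state 2.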