Developing a Compiler in Java: Defects and Improvements of the LR State Machine

Source: Internet
Author: User
Tags closure

Readers of this blog can visit my NetEase Cloud Classroom course, where videos walk through debugging and running the code:

http://study.163.com/course/courseMain.htm?courseId=1002830012

The state machine we constructed in the previous two sections has some defects. When we enter a state node, we need to take an action determined by the characteristics of that node. In the finite state machine diagram from the previous two sections, when we enter node 5 we find that the dot "." is at the far right of the expression, with no terminals or non-terminals after it. When we enter such a node, we perform a reduce operation based on the expression. In node 5, for example, the expression and the dot are as follows:

T -> F .

At this point we reduce according to the production: F is popped off the parsing stack, T is pushed onto it, and after the reduce the state machine goes from node 5 back to node 0.

Similarly, if the state machine enters node 3, the dot is at the far right of the expression and no terminal or non-terminal follows it, so we reduce according to the expression of node 3:

F -> NUM .

We pop NUM off the stack, push the non-terminal F onto the stack, and return from node 3 to node 0.
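The stack side of a reduce can be sketched in a few lines of Java. This is only an illustration: the `Production` class and the `reduce` helper are hypothetical names, not taken from the author's code, and the sketch tracks grammar symbols only, not state nodes.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class ReduceSketch {
    // Hypothetical production type: a left-hand side and its right-hand-side symbols.
    static final class Production {
        final String left;
        final List<String> right;

        Production(String left, String... right) {
            this.left = left;
            this.right = Arrays.asList(right);
        }
    }

    // Pop one stack entry per right-hand-side symbol (rightmost first),
    // then push the left-hand side.
    static void reduce(Deque<String> parseStack, Production p) {
        for (int i = p.right.size() - 1; i >= 0; i--) {
            String top = parseStack.pop();
            if (!top.equals(p.right.get(i))) {
                throw new IllegalStateException("expected " + p.right.get(i) + ", found " + top);
            }
        }
        parseStack.push(p.left);
    }

    public static void main(String[] args) {
        Deque<String> parseStack = new ArrayDeque<>();
        parseStack.push("NUM");
        reduce(parseStack, new Production("F", "NUM")); // node 3: F -> NUM .
        reduce(parseStack, new Production("T", "F"));   // node 5: T -> F .
        System.out.println(parseStack.peek());
    }
}
```

A real parser would also keep a parallel stack of state nodes, so that popping the right-hand side brings it back to the node it came from (node 0 in the two examples above).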

The problem is that not every node tells us clearly and unambiguously what to do during parsing. An obvious example is node 1, which contains two expressions. In one, the dot is at the very end:

E -> T .

In the other, the dot is in the middle of the expression:

T -> T . * F

In a node like this, should we reduce or shift? We call this situation a shift/reduce conflict.

There is also node 11, which contains two expressions, both with the dot at the end:

E -> E + T .
T -> T * F .

When we are in that node, which expression should we reduce by? We call this situation a reduce/reduce conflict.
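The two kinds of conflict can be detected mechanically from the dot positions in a node. The following Java sketch is illustrative only: the `Item` class and `classify` method are hypothetical names, and the check is simplified to counting expressions with the dot at the end and looking for a dot placed before a terminal.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ConflictSketch {
    static final Set<String> TERMINALS = new HashSet<>(Arrays.asList("+", "*", "(", ")", "NUM"));

    // Hypothetical item: the dot sits before right.get(dot),
    // or at the very end when dot == right.size().
    static final class Item {
        final String left;
        final int dot;
        final List<String> right;

        Item(String left, int dot, String... right) {
            this.left = left;
            this.dot = dot;
            this.right = Arrays.asList(right);
        }

        boolean dotAtEnd() {
            return dot == right.size();
        }

        boolean dotBeforeTerminal() {
            return !dotAtEnd() && TERMINALS.contains(right.get(dot));
        }
    }

    // Simplified classification: more than one reducible expression is a
    // reduce/reduce conflict; one reducible expression plus a possible shift
    // is a shift/reduce conflict.
    static String classify(List<Item> node) {
        int reducible = 0;
        boolean shiftable = false;
        for (Item item : node) {
            if (item.dotAtEnd()) {
                reducible++;
            } else if (item.dotBeforeTerminal()) {
                shiftable = true;
            }
        }
        if (reducible > 1) return "reduce/reduce";
        if (reducible == 1 && shiftable) return "shift/reduce";
        return "none";
    }

    public static void main(String[] args) {
        List<Item> node1 = Arrays.asList(
                new Item("E", 1, "T"),             // E -> T .
                new Item("T", 1, "T", "*", "F"));  // T -> T . * F
        System.out.println(classify(node1));
    }
}
```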

In this section, we look at the algorithms that deal with these conflicts.

SLR(1) grammar:

There is a simple way to deal with shift/reduce conflicts. Recall the FOLLOW set algorithm we studied for LL(1) grammars. The FOLLOW set of a non-terminal is the set of all terminals that, according to the grammar's productions, can appear immediately after that non-terminal.

Consider the two expressions of node 4:

E -> T .
T -> T . * F

After entering the node, should we reduce by the first expression or shift based on the second? If the current input character belongs to the FOLLOW set of E, we do the reduce operation.
For the grammar in the previous section's example, the FOLLOW set of each non-terminal is as follows:

FOLLOW(S) = {EOI}
FOLLOW(E) = {EOI, ), +}
FOLLOW(T) = {EOI, ), +, *}
FOLLOW(F) = {EOI, ), +, *}
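These FOLLOW sets can be recomputed with a small fixed-point iteration. The sketch below assumes the grammar used in this series is S -> E, E -> E + T | T, T -> T * F | F, F -> ( E ) | NUM (inferred from the expressions shown in this section), and it relies on that grammar having no empty productions. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class FollowSketch {
    // Each row is a production: row[0] is the left-hand side, the rest is the right-hand side.
    static final String[][] GRAMMAR = {
        {"S", "E"},
        {"E", "E", "+", "T"},
        {"E", "T"},
        {"T", "T", "*", "F"},
        {"T", "F"},
        {"F", "(", "E", ")"},
        {"F", "NUM"},
    };
    static final Set<String> NON_TERMINALS = new HashSet<>(Arrays.asList("S", "E", "T", "F"));

    // No production is empty, so FIRST of a right-hand side is FIRST of its first symbol.
    static Map<String, Set<String>> firstSets() {
        Map<String, Set<String>> first = new HashMap<>();
        for (String nt : NON_TERMINALS) first.put(nt, new TreeSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] p : GRAMMAR) {
                String head = p[1];
                Set<String> add = NON_TERMINALS.contains(head)
                        ? first.get(head) : Collections.singleton(head);
                changed |= first.get(p[0]).addAll(new ArrayList<>(add));
            }
        }
        return first;
    }

    static Map<String, Set<String>> followSets() {
        Map<String, Set<String>> first = firstSets();
        Map<String, Set<String>> follow = new HashMap<>();
        for (String nt : NON_TERMINALS) follow.put(nt, new TreeSet<>());
        follow.get("S").add("EOI"); // end of input follows the start symbol
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] p : GRAMMAR) {
                for (int i = 1; i < p.length; i++) {
                    if (!NON_TERMINALS.contains(p[i])) continue;
                    Set<String> target = follow.get(p[i]);
                    if (i + 1 < p.length) {
                        // Something follows p[i]: add FIRST of the next symbol.
                        String next = p[i + 1];
                        Set<String> add = NON_TERMINALS.contains(next)
                                ? first.get(next) : Collections.singleton(next);
                        changed |= target.addAll(add);
                    } else {
                        // p[i] ends the production: inherit FOLLOW of the left-hand side.
                        changed |= target.addAll(new ArrayList<>(follow.get(p[0])));
                    }
                }
            }
        }
        return follow;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> follow = followSets();
        for (String nt : new String[] {"S", "E", "T", "F"}) {
            System.out.println("FOLLOW(" + nt + ") = " + follow.get(nt));
        }
    }
}
```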

For node 4, if the current input character belongs to the FOLLOW set of the non-terminal E, we reduce by the expression:

E -> T .

Otherwise, we shift based on the other expression.

Other similar nodes can be handled by the same principle. If every node with a shift/reduce conflict in the state machine built from a grammar can be handled this way, we call the grammar an SLR(1) grammar. SLR stands for simple LR.
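As a rough illustration of the SLR(1) rule, the Java sketch below decides between reduce and shift for one node. The FOLLOW sets are the ones listed in this section; the `decide` method, the `Action` enum, and the extra `ERROR` outcome for input that permits neither action are assumptions added for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class SlrDecisionSketch {
    enum Action { SHIFT, REDUCE, ERROR }

    // FOLLOW sets as listed in the text.
    static final Map<String, Set<String>> FOLLOW = new HashMap<>();
    static {
        FOLLOW.put("S", new HashSet<>(Arrays.asList("EOI")));
        FOLLOW.put("E", new HashSet<>(Arrays.asList("EOI", ")", "+")));
        FOLLOW.put("T", new HashSet<>(Arrays.asList("EOI", ")", "+", "*")));
        FOLLOW.put("F", new HashSet<>(Arrays.asList("EOI", ")", "+", "*")));
    }

    // reduceLeft: left-hand side of the expression whose dot is at the end.
    // shiftTerminals: terminals that appear right after the dot in the other expressions.
    // If an input symbol were in both sets, the grammar would not be SLR(1);
    // this sketch does not check for that case and simply prefers the reduce.
    static Action decide(String reduceLeft, Set<String> shiftTerminals, String input) {
        if (FOLLOW.get(reduceLeft).contains(input)) return Action.REDUCE;
        if (shiftTerminals.contains(input)) return Action.SHIFT;
        return Action.ERROR;
    }

    public static void main(String[] args) {
        // Node with E -> T . and T -> T . * F
        Set<String> shiftTerminals = Collections.singleton("*");
        System.out.println(decide("E", shiftTerminals, "+"));
        System.out.println(decide("E", shiftTerminals, "*"));
    }
}
```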

LR(1) grammar

For the state machines built from some grammars, the shift/reduce conflicts of certain nodes cannot be resolved by the principle above. The current input character may appear both in the FOLLOW set of the non-terminal on the left of the first expression and as the terminal to the right of the dot in the second expression. Take node 4 above: if * belonged to the FOLLOW set of E, the shift/reduce conflict would be hard to resolve.

In fact, we do not need to ask whether the current input character is in the FOLLOW set of the non-terminal. We only need to ask whether, in the state machine's current context, the current input character can legally follow the non-terminal produced by the reduce.

The set of symbols that can legally follow a non-terminal in this way is called the look ahead set. It is a subset of the FOLLOW set.

Let's use a concrete example to see what a look ahead set is. State node 11 is a node with a shift/reduce conflict. There are two paths from node 0 to node 11:

0 -> 2 -> 9 -> 11
0 -> 4 -> 8 -> 9 -> 11

To collect the look ahead set for E, we only need to collect the terminals that follow E along these two paths. When we reduce, the corresponding non-terminal is pushed onto the stack, and once it is on the parsing stack, the terminal that follows it should match the next input character. Look at node 0. Suppose that during parsing a series of reduces that produces E brings us back to node 0, for example:
(0, NUM) -> 3 -> (reduce by F -> NUM .) -> 0
(0, F) -> 5 -> (reduce by T -> F .) -> 0
(0, T) -> (reduce by E -> T .) -> 0

At this point, node 0 with the current input E jumps to node 2.

Based on the expression:

E -> E . + T

Because the dot is followed by +, when we enter node 2 we expect the current input character to be +. Based on the other expression:

S -> E .

we can also expect the next input character to be EOI, so our look ahead set contains EOI and +.

In other words, if the state machine follows the path:
0 -> 2 -> 9 -> 11

into node 11 and the current input is EOI or +, then we can do the reduce operation and go back to node 0.

Now look at the expression in node 4:

F -> ( . E )

The terminal following E is ). This means that if we enter node 11 and the input character is ), we can do the reduce operation, return along the path 0 -> 4 -> 8 -> 9 -> 11 to node 4, and push E onto the parsing stack. According to (4, E) we enter node 8, and then according to (8, )) we enter node 10.

So the look ahead set of E contains EOI, + and ). At node 11, to decide whether to reduce, we only need to check whether the next input character is in the look ahead set of E.

In this way, each expression in each node needs to carry its own look ahead set. If two expressions are identical but their look ahead sets differ, we treat them as different expressions. Suppose an expression and its look ahead set are as follows:

S -> α . X β, C
X -> . R

C is the look ahead set, β and R are sequences of zero or more terminals or non-terminals, and X is a non-terminal. Then the look ahead set of X -> . R is:

FIRST(β C), that is, FIRST(β) united with C.
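The rule can be tried out with a few lines of Java. The sketch follows the rule exactly as stated in this article, FIRST(β) united with C; note that the usual textbook LR(1) definition adds C only when β is empty or can derive the empty string. The FIRST sets are hard-coded for the expression grammar of this series, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class LookAheadSketch {
    // FIRST sets of the non-terminals E, T and F, hard-coded for the example grammar.
    static final Map<String, Set<String>> FIRST = new HashMap<>();
    static {
        Set<String> start = new TreeSet<>(Arrays.asList("(", "NUM"));
        for (String nt : new String[] {"E", "T", "F"}) {
            FIRST.put(nt, start);
        }
    }

    // FIRST of a symbol sequence; the grammar has no empty productions,
    // so only the first symbol of beta matters.
    static Set<String> firstOf(List<String> beta) {
        if (beta.isEmpty()) return new TreeSet<>();
        String head = beta.get(0);
        Set<String> known = FIRST.get(head);
        return known != null
                ? new TreeSet<>(known)
                : new TreeSet<>(Collections.singleton(head)); // a terminal is its own FIRST set
    }

    // Look ahead set for X -> . R derived from S -> alpha . X beta, C,
    // using the article's rule: FIRST(beta) united with C.
    static Set<String> lookAhead(List<String> beta, Set<String> c) {
        Set<String> result = firstOf(beta);
        result.addAll(c);
        return result;
    }

    public static void main(String[] args) {
        Set<String> c = Collections.singleton("EOI");
        // From S -> . E, {EOI}: beta is empty.
        System.out.println(lookAhead(Collections.<String>emptyList(), c));
        // From E -> . E + T, {EOI}: beta is "+ T".
        System.out.println(lookAhead(Arrays.asList("+", "T"), c));
    }
}
```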

Expressions with look ahead sets are created mainly during the closure phase. For example, take the expression in node 0:

S -> . E

We first give it the initial look ahead set {EOI}. So, matching against:

S -> α . X β, C
S -> . E, {EOI}
and:
X -> . R, FIRST(β C)

we have:
E -> . E + T, FIRST(β C)
E -> . T, FIRST(β C)
Because β is empty, FIRST(β C) = C = {EOI}, so we get:

E -> . E + T, {EOI}
E -> . T, {EOI}

Having obtained the expressions above, we continue the derivation:
S -> α . X β, C
E -> . E + T, {EOI}

and:
X -> . R, FIRST(β C)
E -> . E + T
Because β = + T,
FIRST(β) = {+}
FIRST(β C) = {+, EOI}

So there is a new expression:
E -> . E + T, {+, EOI}
This process is repeated until no new expression can be generated.

Taking node 0 as the example, let's walk through an implementation of this algorithm. To implement it we need two data structures: a stack, productionStack, and a set, closureSet.

First, we initialize the expression for node 0 and give it the initial look ahead set {EOI}:

[S -> . E, {EOI}]

Push this expression onto the stack and add it to the set:
productionStack:

[S -> . E, {EOI}]

closureSet:

[S -> . E, {EOI}]

While the stack is not empty, pop the expression at the top of the stack and construct new expressions using the algorithm described above. At this point the expression at the top of the stack is:

[S -> . E, {EOI}]

Matching against:
S -> α . X β, C
S -> . E, {EOI}
X -> . R, FIRST(β C)

we can generate two new expressions:
[E -> . E + T, {EOI}]
[E -> . T, {EOI}]

Push them onto the stack and add them to the set:
productionStack:
[E -> . T, {EOI}]
[E -> . E + T, {EOI}]

closureSet:
[S -> . E, {EOI}]
[E -> . E + T, {EOI}]
[E -> . T, {EOI}]

Pop the expression at the top of the stack:
[E -> . T, {EOI}]

and construct new expressions:
[T -> . T * F, {EOI}]
[T -> . F, {EOI}]

Push them onto the stack and add them to the set:
productionStack:
[T -> . F, {EOI}]
[T -> . T * F, {EOI}]
[E -> . E + T, {EOI}]

closureSet:
[S -> . E, {EOI}]
[E -> . E + T, {EOI}]
[E -> . T, {EOI}]
[T -> . T * F, {EOI}]
[T -> . F, {EOI}]

Pop the expression at the top of the stack:
[T -> . F, {EOI}]
and similarly construct new expressions:
[F -> . ( E ), {EOI}]
[F -> . NUM, {EOI}]

Push the newly generated expressions onto the stack and add them to the set:
productionStack:
[F -> . NUM, {EOI}]
[F -> . ( E ), {EOI}]
[T -> . T * F, {EOI}]
[E -> . E + T, {EOI}]

closureSet:
[S -> . E, {EOI}]
[E -> . E + T, {EOI}]
[E -> . T, {EOI}]
[T -> . T * F, {EOI}]
[T -> . F, {EOI}]
[F -> . ( E ), {EOI}]
[F -> . NUM, {EOI}]

Pop the expression at the top of the stack:
[F -> . NUM, {EOI}]
Because the symbol after the dot is not a non-terminal, this expression needs no processing. Continue by popping the top of the stack:

[F -> . ( E ), {EOI}]

Again, the symbol after the dot is not a non-terminal, so the expression needs no processing. Continue by popping the top of the stack:

[T -> . T * F, {EOI}]

Here β corresponds to * F, so FIRST(β C) = FIRST(* F) united with {EOI}, which is {*, EOI}, and we generate the new expressions:

[T -> . T * F, {*, EOI}]
[T -> . F, {*, EOI}]

Push them onto the stack and add them to the set:

productionStack:
[T -> . F, {*, EOI}]
[T -> . T * F, {*, EOI}]
[E -> . E + T, {EOI}]

closureSet:
[S -> . E, {EOI}]
[E -> . E + T, {EOI}]
[E -> . T, {EOI}]
[T -> . T * F, {EOI}]
[T -> . F, {EOI}]
[F -> . ( E ), {EOI}]
[F -> . NUM, {EOI}]
[T -> . T * F, {*, EOI}]
[T -> . F, {*, EOI}]

As you can see, the expressions
1. [T -> . T * F, {*, EOI}] and
2. [T -> . T * F, {EOI}]
differ only in their look ahead sets. The look ahead set of expression 1 is larger than that of expression 2, so we say that expression 1 covers expression 2. Once we have expression 1, expression 2 is unnecessary, so we remove expression 2 from the set. For the same reason the expression
[T -> . F, {EOI}]
is also deleted, so
closureSet is:
[S -> . E, {EOI}]
[E -> . E + T, {EOI}]
[E -> . T, {EOI}]
[F -> . ( E ), {EOI}]
[F -> . NUM, {EOI}]
[T -> . T * F, {*, EOI}]
[T -> . F, {*, EOI}]
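The covering test used here can be sketched as a simple predicate: same production, same dot position, and a strictly larger look ahead set that contains the other one. The `Item` class and `covers` method below are hypothetical names chosen for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class CoverSketch {
    // Hypothetical item: production, dot position and look ahead set.
    static final class Item {
        final String left;
        final List<String> right;
        final int dot;
        final Set<String> lookAhead;

        Item(String left, List<String> right, int dot, String... lookAhead) {
            this.left = left;
            this.right = right;
            this.dot = dot;
            this.lookAhead = new TreeSet<>(Arrays.asList(lookAhead));
        }
    }

    // a covers b when both describe the same production and dot position
    // and a's look ahead set strictly contains b's.
    static boolean covers(Item a, Item b) {
        boolean sameCore = a.left.equals(b.left) && a.right.equals(b.right) && a.dot == b.dot;
        return sameCore
                && a.lookAhead.containsAll(b.lookAhead)
                && a.lookAhead.size() > b.lookAhead.size();
    }

    public static void main(String[] args) {
        List<String> rhs = Arrays.asList("T", "*", "F");
        Item one = new Item("T", rhs, 0, "*", "EOI"); // [T -> . T * F, {*, EOI}]
        Item two = new Item("T", rhs, 0, "EOI");      // [T -> . T * F, {EOI}]
        System.out.println(covers(one, two));
        System.out.println(covers(two, one));
    }
}
```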

Next, continue by popping the expression at the top of the stack:
[T -> . F, {*, EOI}]

and generate new expressions:

[F -> . ( E ), {*, EOI}]
[F -> . NUM, {*, EOI}]

After pushing them, the stack is:
productionStack:
[F -> . NUM, {*, EOI}]
[F -> . ( E ), {*, EOI}]
[T -> . T * F, {*, EOI}]
[E -> . E + T, {EOI}]

The generated expressions cover these expressions in closureSet:

[F -> . ( E ), {EOI}]
[F -> . NUM, {EOI}]

so we remove those two expressions from the set and add the newly generated ones:

closureSet:
[S -> . E, {EOI}]
[E -> . E + T, {EOI}]
[E -> . T, {EOI}]
[T -> . T * F, {*, EOI}]
[T -> . F, {*, EOI}]
[F -> . ( E ), {*, EOI}]
[F -> . NUM, {*, EOI}]

At this point the two expressions at the top of the stack are:
[F -> . NUM, {*, EOI}]
[F -> . ( E ), {*, EOI}]

In both of them the symbol to the right of the dot is not a non-terminal, so we pop them off the stack directly. Continue by popping the top of the stack:

[T -> . T * F, {*, EOI}]

Because β corresponds to * F, FIRST(β) = {*}, so the resulting look ahead set is still {*, EOI}, and the expressions generated are:

[T -> . T * F, {*, EOI}]
[T -> . F, {*, EOI}]

These two expressions are already present in the set, so they are not added again.

Continue by popping the top of the stack:
[E -> . E + T, {EOI}]

β corresponds to + T, and FIRST(β) = FIRST(+ T) = {+}, so the look ahead set for the new expressions is {+, EOI}, which gives:

[E -> . E + T, {+, EOI}]
[E -> . T, {+, EOI}]

Push them onto the stack:
productionStack:
[E -> . T, {+, EOI}]
[E -> . E + T, {+, EOI}]

These two expressions cover the following expressions in the set:

[E -> . T, {EOI}]
[E -> . E + T, {EOI}]

so those two are removed from the set and the new expressions are added:

closureSet:
[S -> . E, {EOI}]
[T -> . T * F, {*, EOI}]
[T -> . F, {*, EOI}]
[F -> . ( E ), {*, EOI}]
[F -> . NUM, {*, EOI}]
[E -> . E + T, {+, EOI}]
[E -> . T, {+, EOI}]
Pop the expression at the top of the stack:
[E -> . T, {+, EOI}]

and generate new expressions:

[T -> . T * F, {+, EOI}]
[T -> . F, {+, EOI}]

Push the newly generated expressions onto the stack and add them to the set:
productionStack:
[T -> . F, {+, EOI}]
[T -> . T * F, {+, EOI}]
[E -> . E + T, {+, EOI}]

closureSet:
[S -> . E, {EOI}]
[T -> . T * F, {*, EOI}]
[T -> . F, {*, EOI}]
[F -> . ( E ), {*, EOI}]
[F -> . NUM, {*, EOI}]
[E -> . E + T, {+, EOI}]
[E -> . T, {+, EOI}]
[T -> . T * F, {+, EOI}]
[T -> . F, {+, EOI}]

Pop the expression at the top of the stack:
[T -> . F, {+, EOI}]

and generate new expressions:

[F -> . ( E ), {+, EOI}]
[F -> . NUM, {+, EOI}]
Push the new expressions onto the stack and add them to the set:
productionStack:
[F -> . NUM, {+, EOI}]
[F -> . ( E ), {+, EOI}]
[T -> . T * F, {+, EOI}]
[E -> . E + T, {+, EOI}]

closureSet:
[S -> . E, {EOI}]
[T -> . T * F, {*, EOI}]
[T -> . F, {*, EOI}]
[F -> . ( E ), {*, EOI}]
[F -> . NUM, {*, EOI}]
[E -> . E + T, {+, EOI}]
[E -> . T, {+, EOI}]
[T -> . T * F, {+, EOI}]
[T -> . F, {+, EOI}]
[F -> . ( E ), {+, EOI}]
[F -> . NUM, {+, EOI}]

In the two expressions at the top of the stack, the symbol to the right of the dot is not a non-terminal, so we pop them off the stack directly.

Continue by popping the top of the stack:

[T -> . T * F, {+, EOI}]

Here β corresponds to * F, and FIRST(β) = FIRST(* F) = {*}, so the look ahead set of the newly generated expressions is {*, +, EOI}:

[T -> . T * F, {*, +, EOI}]
[T -> . F, {*, +, EOI}]

Push the expressions onto the stack:
productionStack:
[T -> . F, {*, +, EOI}]
[T -> . T * F, {*, +, EOI}]
[E -> . E + T, {+, EOI}]

Notice that the newly generated expression:
[T -> . T * F, {*, +, EOI}]
covers these expressions already in the set:
[T -> . T * F, {*, EOI}]
[T -> . T * F, {+, EOI}]

and that
[T -> . F, {*, +, EOI}]
covers:
[T -> . F, {*, EOI}]
[T -> . F, {+, EOI}]

Delete the covered expressions and add the new ones:
closureSet:
[S -> . E, {EOI}]
[F -> . ( E ), {*, EOI}]
[F -> . NUM, {*, EOI}]
[E -> . E + T, {+, EOI}]
[E -> . T, {+, EOI}]
[F -> . ( E ), {+, EOI}]
[F -> . NUM, {+, EOI}]
[T -> . T * F, {*, +, EOI}]
[T -> . F, {*, +, EOI}]

Pop the expression at the top of the stack:
[T -> . F, {*, +, EOI}]

and construct new expressions:

[F -> . ( E ), {*, +, EOI}]
[F -> . NUM, {*, +, EOI}]

Push them onto the stack:
productionStack:
[F -> . NUM, {*, +, EOI}]
[F -> . ( E ), {*, +, EOI}]
[T -> . T * F, {*, +, EOI}]
[E -> . E + T, {+, EOI}]

The expression
[F -> . ( E ), {*, +, EOI}]
covers these expressions in the set:
[F -> . ( E ), {*, EOI}]
[F -> . ( E ), {+, EOI}]
and
[F -> . NUM, {*, +, EOI}]
covers these existing expressions in the set:
[F -> . NUM, {*, EOI}]
[F -> . NUM, {+, EOI}]

After deleting the covered expressions and adding the new ones:
closureSet:
[S -> . E, {EOI}]
[E -> . E + T, {+, EOI}]
[E -> . T, {+, EOI}]
[T -> . T * F, {*, +, EOI}]
[T -> . F, {*, +, EOI}]
[F -> . ( E ), {*, +, EOI}]
[F -> . NUM, {*, +, EOI}]

At this point, in the two expressions at the top of the stack the symbol to the right of the dot is not a non-terminal, so both are popped off the stack.

Then continue by popping the top of the stack:
[T -> . T * F, {*, +, EOI}]

which builds the expressions:
[T -> . T * F, {*, +, EOI}]
[T -> . F, {*, +, EOI}]
Both expressions already exist in the set, so nothing is done.

Continue by popping the top of the stack:

[E -> . E + T, {+, EOI}]

It generates two expressions:

[E -> . E + T, {+, EOI}]
[E -> . T, {+, EOI}]

Because these expressions already exist in the set, nothing is done. The stack is now empty and the closure process ends. The complete closure set is:

[S -> . E, {EOI}]
[E -> . E + T, {+, EOI}]
[E -> . T, {+, EOI}]
[T -> . T * F, {*, +, EOI}]
[T -> . F, {*, +, EOI}]
[F -> . ( E ), {*, +, EOI}]
[F -> . NUM, {*, +, EOI}]
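The whole walk-through can be reproduced with a short Java sketch. It is not the author's implementation: the class, field and method names are hypothetical, the grammar is the one inferred from the expressions in this section, FIRST is hard-coded for that grammar, and the look ahead rule is the one stated in this article (FIRST(β) united with C). Expressions that are covered by a newer one are removed from the set, as in the walk-through.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class ClosureSketch {
    // Each row is a production: row[0] is the left-hand side, the rest is the right-hand side.
    static final String[][] GRAMMAR = {
        {"S", "E"}, {"E", "E", "+", "T"}, {"E", "T"},
        {"T", "T", "*", "F"}, {"T", "F"}, {"F", "(", "E", ")"}, {"F", "NUM"},
    };
    static final Set<String> NON_TERMINALS = new HashSet<>(Arrays.asList("S", "E", "T", "F"));

    static final class Item {
        final String[] production; // always one of the rows of GRAMMAR
        final int dot;             // the dot sits before production[1 + dot]
        final Set<String> lookAhead;

        Item(String[] production, int dot, Set<String> lookAhead) {
            this.production = production;
            this.dot = dot;
            this.lookAhead = lookAhead;
        }

        String afterDot() {
            return 1 + dot < production.length ? production[1 + dot] : null;
        }

        // True when both items share production and dot, and this look ahead set contains the other.
        boolean coversOrEquals(Item o) {
            return production == o.production && dot == o.dot && lookAhead.containsAll(o.lookAhead);
        }

        @Override public String toString() {
            StringBuilder sb = new StringBuilder("[" + production[0] + " ->");
            for (int i = 1; i < production.length; i++) {
                if (i == 1 + dot) sb.append(" .");
                sb.append(' ').append(production[i]);
            }
            if (1 + dot == production.length) sb.append(" .");
            return sb.append(", ").append(lookAhead).append("]").toString();
        }
    }

    // FIRST of a single symbol, hard-coded: every non-terminal here starts with ( or NUM.
    static Set<String> first(String symbol) {
        return NON_TERMINALS.contains(symbol)
                ? new TreeSet<>(Arrays.asList("(", "NUM"))
                : Collections.singleton(symbol);
    }

    static List<Item> closure(Item start) {
        Deque<Item> productionStack = new ArrayDeque<>();
        List<Item> closureSet = new ArrayList<>();
        productionStack.push(start);
        closureSet.add(start);
        while (!productionStack.isEmpty()) {
            Item item = productionStack.pop();
            String x = item.afterDot();
            if (x == null || !NON_TERMINALS.contains(x)) continue; // nothing to expand
            // Look ahead for the new expressions: FIRST(beta) united with C.
            Set<String> lookAhead = new TreeSet<>(item.lookAhead);
            int betaStart = 2 + item.dot;
            if (betaStart < item.production.length) {
                lookAhead.addAll(first(item.production[betaStart]));
            }
            for (String[] p : GRAMMAR) {
                if (!p[0].equals(x)) continue;
                Item candidate = new Item(p, 0, lookAhead);
                boolean redundant = false;
                for (Item existing : closureSet) {
                    if (existing.coversOrEquals(candidate)) { redundant = true; break; }
                }
                if (redundant) continue;                        // already present or covered
                closureSet.removeIf(candidate::coversOrEquals); // drop expressions it covers
                closureSet.add(candidate);
                productionStack.push(candidate);
            }
        }
        return closureSet;
    }

    public static void main(String[] args) {
        Item start = new Item(GRAMMAR[0], 0, new TreeSet<>(Collections.singleton("EOI")));
        for (Item item : closure(start)) System.out.println(item);
    }
}
```

Running `main` prints the expressions of node 0 in the order they were last added, so the order may differ from the listing above even though the set is the same.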

Apart from this change to the closure step, the other steps for generating nodes remain unchanged.
