Configurable Parser Development Chronicle (IV): Constructing a Truly Usable State Machine (Part 1)

Source: Internet
Author: User

I originally intended this article to finish both the construction of the deterministic state machine and the look-ahead, but when I sat down to write it there turned out to be too much material, so it had to be split into two parts. The previous article described how a basic state machine is constructed. According to the first article, however, the grammar is designed to serve the direct construction of a syntax tree, so executing the state machine must yield all the information needed to build that tree. If you have developed something similar, you will know that with a method such as LALR it is easy to run an entire string through the state machine and decide whether it belongs to the set the machine describes, but you do not get the path the parse took, which makes it hard to obtain the parse tree directly. And without a parse tree there can be no syntax tree. So we have to insert some information into the state machine so that a parse tree can eventually be constructed (it need not really be a tree; the analysis path from the previous article, for example, is one possible representation of a parse tree).

As in "Construct regular expression engine", the general way to add information to a state machine is to attach extra data to the transition arrows between states. To make this concrete, I will use the arithmetic-expression grammar from the first article as an example. For convenience, here is the grammar again (excluding the syntax tree declarations):

    token NAME = "[a-zA-Z_]/w*";
    token NUMBER = "/d+(./d+)";
    token ADD = "/+";
    token SUB = "-";
    token MUL = "*";
    token DIV = "//";
    token LEFT = "/(";
    token RIGHT = "/)";
    token COMMA = ",";

    rule NumberExpression Number
            = NUMBER : value;

    rule FunctionExpression Call
            = NAME : functionName "(" [ Exp : arguments { "," Exp : arguments } ] ")";

    rule Expression Factor
            = !Number | !Call;

    rule Expression Term
            = !Factor;
            = Term : firstOperand "*" Factor : secondOperand as BinaryExpression with { binaryOperator = "Mul" };
            = Term : firstOperand "/" Factor : secondOperand as BinaryExpression with { binaryOperator = "Div" };

    rule Expression Exp
            = !Term;
            = Exp : firstOperand "+" Term : secondOperand as BinaryExpression with { binaryOperator = "Add" };
            = Exp : firstOperand "-" Term : secondOperand as BinaryExpression with { binaryOperator = "Sub" };

Now, when we turn this text into a state machine, what should be attached to the transitions? Intuitively, there are six things a transition may need to do:

1. Create: the syntax tree node created by this rule is of a given type (this is a statement about the rule as a whole, not an instruction to create a node of that type at this particular moment).

2. Set: assigns a specified value to a member variable of the created syntax tree node.

3. Assign: stores the syntax tree node produced by this transition's symbol in a member variable of the created node (for example, in Exp = Exp:firstOperand "+" Term:secondOperand, after the transition over Term, the node it produced is assigned to the member variable named secondOperand).

4. Using: uses the syntax tree node produced by this transition's symbol as the return value of the rule (as in Factor = !Number | !Call).

5. Shift: omitted here.

6. Reduce: omitted here.
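As a rough illustration of "extra data on the transition arrows", the sketch below models a transition that carries a list of such actions. It is only one possible representation, not the author's implementation: the class names `ActionKind`, `Action` and `Transition`, the function `add_rule_transitions`, and the state names `s0` to `s4` are all hypothetical, and placing Create and Set on a final epsilon edge is an arbitrary choice made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class ActionKind(Enum):
    CREATE = auto()   # the rule's node is of this type
    SET = auto()      # constant value for a member of the node
    ASSIGN = auto()   # store the symbol's node in a member of the node
    USING = auto()    # return the symbol's node as the rule's result
    SHIFT = auto()
    REDUCE = auto()


@dataclass(frozen=True)
class Action:
    kind: ActionKind
    name: str = ""    # type name (CREATE) or member name (SET, ASSIGN)
    value: str = ""   # constant value (SET only)


@dataclass
class Transition:
    source: str
    target: str
    symbol: Optional[str]                       # None marks an epsilon edge
    actions: List[Action] = field(default_factory=list)


def add_rule_transitions() -> List[Transition]:
    """Transitions for the "+" alternative of Exp (state names are made up)."""
    return [
        Transition("s0", "s1", "Exp", [Action(ActionKind.ASSIGN, "firstOperand")]),
        Transition("s1", "s2", "+"),
        Transition("s2", "s3", "Term", [Action(ActionKind.ASSIGN, "secondOperand")]),
        Transition("s3", "s4", None, [
            Action(ActionKind.CREATE, "BinaryExpression"),
            Action(ActionKind.SET, "binaryOperator", "Add"),
        ]),
    ]
```

Keeping the actions as plain data on each edge means later passes can copy or move them between edges without touching the states themselves.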

We do not mark which non-terminal the whole grammar starts from, because in practice parsing may start from any rule. When writing an IDE, for example, there are cases where we only need to parse an expression. Since every non-terminal may therefore serve as the entry point, the "token stream start" and "token stream end" transitions appear in the state machine of every non-terminal. They can thus be generated right away in the first step, when the epsilon PDA (pushdown automaton) is created. Here we take Exp as an example:


The double dashed lines represent the start and the end of the token stream, which are not our concern here. Among the remaining transitions, a solid line is a transition that consumes input, and a dashed line is a transition without input (commonly called an epsilon edge).

Here we need to define a concept: the analysis path. An analysis path describes how the state changes as the tokens "flow" through the state machine. For an actual parse, the analysis path is therefore one representation of the parse tree. Within a single state machine, an analysis path is one possible route from the start to the end. In this example there are three analysis paths:

$e -> e1 -> e2 -> e$

$e -> e3 -> e8 -> e7 -> e6 -> e5 -> e4 -> e$

$e -> e9 -> e14 -> e13 -> e12 -> e11 -> e10 -> e$
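To make "one possible route from the start to the end" concrete, here is a small sketch that enumerates such routes by depth-first search. The edge table is read off the three paths listed above, which is an assumption about what the figure contained. `EDGES` and `analysis_paths` are hypothetical names.

```python
from typing import Dict, List

# Edges inferred from the three listed paths; the figure itself may have
# contained more detail, so treat this table as an assumption.
EDGES: Dict[str, List[str]] = {
    "$e": ["e1", "e3", "e9"],
    "e1": ["e2"], "e2": ["e$"],
    "e3": ["e8"], "e8": ["e7"], "e7": ["e6"],
    "e6": ["e5"], "e5": ["e4"], "e4": ["e$"],
    "e9": ["e14"], "e14": ["e13"], "e13": ["e12"],
    "e12": ["e11"], "e11": ["e10"], "e10": ["e$"],
}


def analysis_paths(start: str, end: str) -> List[List[str]]:
    """Enumerate every path from start to end (the graph is assumed acyclic)."""
    if start == end:
        return [[end]]
    paths: List[List[str]] = []
    for nxt in EDGES.get(start, []):
        for tail in analysis_paths(nxt, end):
            paths.append([start] + tail)
    return paths


for path in analysis_paths("$e", "e$"):
    print(" -> ".join(path))
```

Run on this table, the search yields exactly the three paths listed above, one per alternative of the Exp rule.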

It follows that a single path must not contain more than one Create, otherwise it would be unclear which type to create. Create and Using must not appear on the same path, and a path must not contain more than one Using. Since Create and Set describe the type and attributes of the syntax tree node built by this non-terminal (here Exp), they are independent of when they are executed, so it does not matter where on an analysis path they are placed. In the second analysis path above, for example, Create is marked on e6 -> e5; moving it to e3 -> e8 would have exactly the same effect. As long as a path is marked with a Create, then once that path has been chosen, a syntax tree node of the specified type is certain to be created. This matters a great deal, because later stages of the analysis will very likely need to move Create and Set to a specific position.
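These constraints are easy to check mechanically. The sketch below is an illustration under assumptions, not the author's code: an action is modelled as a `(kind, name, value)` tuple, and `check_path` and `describe_node` are hypothetical helpers. It also shows that moving Create and Set along a path does not change the node they describe.

```python
from typing import Dict, List, Optional, Tuple

# (kind, name, value); this tuple layout is an assumption of the sketch.
Action = Tuple[str, str, str]


def check_path(actions: List[Action]) -> List[str]:
    """Return the constraint violations found on one analysis path."""
    creates = sum(1 for kind, _, _ in actions if kind == "Create")
    usings = sum(1 for kind, _, _ in actions if kind == "Using")
    errors: List[str] = []
    if creates > 1:
        errors.append("more than one Create")
    if usings > 1:
        errors.append("more than one Using")
    if creates and usings:
        errors.append("Create and Using on the same path")
    return errors


def describe_node(actions: List[Action]) -> Tuple[Optional[str], Dict[str, str]]:
    """Collect the node type and constant members, ignoring their position."""
    node_type: Optional[str] = None
    constants: Dict[str, str] = {}
    for kind, name, value in actions:
        if kind == "Create":
            node_type = name
        elif kind == "Set":
            constants[name] = value
    return node_type, constants


# The same path with Create and Set placed late and placed early.
late = [
    ("Assign", "firstOperand", ""),
    ("Assign", "secondOperand", ""),
    ("Create", "BinaryExpression", ""),
    ("Set", "binaryOperator", "Add"),
]
early = [
    ("Create", "BinaryExpression", ""),
    ("Set", "binaryOperator", "Add"),
    ("Assign", "firstOperand", ""),
    ("Assign", "secondOperand", ""),
]
assert check_path(late) == [] and check_path(early) == []
assert describe_node(late) == describe_node(early)
```

The order of the Assign actions, by contrast, is tied to the symbols that are actually consumed, which is why only Create and Set are free to move.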

As the previous article described, the next step is to remove the epsilon edges. The result is as follows:
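The removal step itself can be illustrated with a short sketch. This is one common way to do it and is not necessarily how the author's implementation works: every chain of epsilon edges followed by one input edge is collapsed into a single input edge, and the actions met along the chain are concatenated in order. The name `remove_epsilon`, the edge tuple layout, and the action strings are hypothetical. The sketch assumes there are no epsilon cycles and that every complete path ends with an input edge, such as the token-stream end.

```python
from typing import Iterator, List, Optional, Tuple

# (source, target, symbol, actions); symbol None marks an epsilon edge.
Edge = Tuple[str, str, Optional[str], Tuple[str, ...]]


def remove_epsilon(edges: List[Edge], start: str) -> List[Edge]:
    """Collapse each "epsilon* then one input edge" chain into one input edge."""

    def chains(state: str, carried: Tuple[str, ...]) -> Iterator[Tuple[str, str, Tuple[str, ...]]]:
        for src, dst, sym, acts in edges:
            if src != state:
                continue
            if sym is None:
                # Follow the epsilon edge and carry its actions along.
                yield from chains(dst, carried + acts)
            else:
                yield dst, sym, carried + acts

    # Only the start state and the targets of input edges survive.
    kept = {start} | {dst for _, dst, sym, _ in edges if sym is not None}
    result: List[Edge] = []
    for state in sorted(kept):
        for dst, sym, acts in chains(state, ()):
            result.append((state, dst, sym, acts))
    return result


edges: List[Edge] = [
    ("a", "b", None, ("Create BinaryExpression",)),
    ("b", "c", "Term", ("Assign secondOperand",)),
    ("c", "d", None, ()),
    ("d", "e", "$end", ()),
]
collapsed = remove_epsilon(edges, "a")
```

In this example the Create that sat on an epsilon edge ends up on the input edge over Term. Merging actions this way is acceptable only because Create and Set may be placed anywhere on their path.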
