I once wrote a tutorial on Parser Combinator. To handle ManagedX, the newly designed managed language of Vczh Library++, I added three new combinators to Parser Combinator.
The first is def and the second is let; they are meant to be used together. def(pattern, defaultValue) means that if pattern succeeds, the parse result of pattern is returned; otherwise defaultValue is returned. let(pattern, value) means that if pattern succeeds, value is returned; otherwise it fails. For example, ManagedX and C# both have five accessors: public, protected, protected internal, private, and internal. The syntax type of four of them is token, while the remaining one, protected internal, is tuple<token, token>. We therefore cannot easily write one function that converts a token into a syntax tree node for all of them. It is also hard to express directly in EBNF + handler that the accessor defaults to private when it is omitted. Without def and let, we have to write:
Accessor = (PUBLIC [ToAccessor] | PROTECTED [ToAccessor] | PRIVATE [ToAccessor] | INTERNAL [ToAccessor] | (PROTECTED + INTERNAL) [ToProtectedInternal]) [ToAccessorWithDefault];
In this case we need to create three functions: ToAccessor, ToProtectedInternal, and ToAccessorWithDefault. An expression needs location information so that an error message can point to where the error occurred in the source code. An accessor, however, is never an important syntax element, so we do not need to record source location information for it. When no location information has to be saved, a ToXXX function is unnecessary, and def and let can be used to simplify the rule:
Accessor = def(let(PUBLIC, acc::Public) | let(PROTECTED, acc::Protected) | let(PRIVATE, acc::Private) | let(INTERNAL, acc::Internal) | let(PROTECTED + INTERNAL, acc::ProtectedInternal), acc::Private);
It looks similar, but we have actually eliminated three unnecessary functions.
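To make the semantics of def and let concrete, here is a minimal sketch in standard C++. It is not the implementation in Vczh Library++: the Parser alias, Tokens, tok, seq, alt, and the Acc enum are all hypothetical names invented for this illustration. It also assumes an ordered choice, so the two-token protected internal alternative is tried before protected.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model: a parser reads tokens starting at `pos` and
// returns a value on success or std::nullopt on failure.
using Tokens = std::vector<std::string>;
template <typename T>
using Parser = std::function<std::optional<T>(const Tokens&, std::size_t&)>;

enum class Acc { Public, Protected, Private, Internal, ProtectedInternal };

// Matches one exact token and returns it.
Parser<std::string> tok(std::string text) {
    return [text](const Tokens& in, std::size_t& pos) -> std::optional<std::string> {
        if (pos < in.size() && in[pos] == text) return in[pos++];
        return std::nullopt;
    };
}

// Sequence: both parsers must succeed; the result is a pair.
template <typename A, typename B>
Parser<std::pair<A, B>> seq(Parser<A> a, Parser<B> b) {
    return [=](const Tokens& in, std::size_t& pos) -> std::optional<std::pair<A, B>> {
        std::size_t start = pos;
        if (auto x = a(in, pos)) {
            if (auto y = b(in, pos)) return std::make_pair(*x, *y);
        }
        pos = start;  // restore the position on failure
        return std::nullopt;
    };
}

// Ordered choice: try a, then b.
template <typename T>
Parser<T> alt(Parser<T> a, Parser<T> b) {
    return [=](const Tokens& in, std::size_t& pos) -> std::optional<T> {
        if (auto r = a(in, pos)) return r;
        return b(in, pos);
    };
}

// let(pattern, value): if pattern succeeds, drop its result and return value.
template <typename T, typename V>
Parser<V> let(Parser<T> pattern, V value) {
    return [=](const Tokens& in, std::size_t& pos) -> std::optional<V> {
        if (pattern(in, pos)) return value;
        return std::nullopt;
    };
}

// def(pattern, defaultValue): never fails; falls back to defaultValue.
template <typename T>
Parser<T> def(Parser<T> pattern, T defaultValue) {
    return [=](const Tokens& in, std::size_t& pos) -> std::optional<T> {
        if (auto r = pattern(in, pos)) return r;
        return defaultValue;
    };
}

// The Accessor rule from the text, rebuilt with the toy combinators.
Parser<Acc> accessor() {
    Parser<Acc> p =
        alt(let(seq(tok("protected"), tok("internal")), Acc::ProtectedInternal),
        alt(let(tok("public"), Acc::Public),
        alt(let(tok("protected"), Acc::Protected),
        alt(let(tok("private"), Acc::Private),
            let(tok("internal"), Acc::Internal)))));
    return def(p, Acc::Private);
}

// Usage: an omitted accessor yields Acc::Private and consumes no input.
void checkAccessorExamples() {
    Tokens a{"protected", "internal", "class"};
    std::size_t pos = 0;
    assert(accessor()(a, pos) == Acc::ProtectedInternal && pos == 2);
    Tokens b{"class"};
    pos = 0;
    assert(accessor()(b, pos) == Acc::Private && pos == 0);
}
```

No conversion function appears anywhere in this sketch: every alternative carries its own constant, which is exactly what removes the three ToXXX functions.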
======================================================= Shameless line ======================================
The third is binop. It exists mainly because the general lrec (left-recursive combinator) performs poorly on expressions with many parentheses. Let me explain why. Suppose our language has four operators: >, +, *, and (). The grammar is usually written as follows:
exp0 = NUMBER | ( exp3 )
exp1 = exp1 * exp0 | exp0
exp2 = exp2 + exp1 | exp1
exp3 = exp3 > exp2 | exp2
So when we parse 1*2*3, we follow this path:
exp3
= exp2
= exp1
= exp1 * exp0
= exp1 * exp0 * exp0
= exp0 * exp0 * exp0
= 1*2*3
Now let's make a simple transformation and change 1*2*3 to ((1*2)*3). The meaning is unchanged, but the parsing path is completely different:
exp3
= exp2
= exp1
= exp0
= (exp3)
= (exp2)
= (exp1)
= (exp1 * exp0)
= (exp0 * exp0)
= ((exp3) * exp0)
= ((exp2) * exp0)
= ((exp1) * exp0)
= ((exp1 * exp0) * exp0)
= ((exp0 * exp0) * exp0)
= ((1*2)*3)
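The difference between the two paths can be measured by counting nested rule calls. The sketch below is a hypothetical hand-rolled recursive-descent parser for the four-rule grammar above, not code from the library. DepthProbe and maxRuleDepth are names made up here, left recursion is rewritten as a loop, and the input must not contain spaces.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical depth-counting parser for
//   exp0 = NUMBER | ( exp3 ),   exp1 = exp1 * exp0 | exp0,
//   exp2 = exp2 + exp1 | exp1,  exp3 = exp3 > exp2 | exp2
struct DepthProbe {
    std::string text;
    std::size_t pos = 0;
    int depth = 0;     // rule calls currently on the stack
    int maxDepth = 0;  // deepest chain seen so far

    // Parses rule exp<level>; level 0 is NUMBER or a parenthesized exp3.
    bool parseLevel(int level) {
        ++depth;
        maxDepth = std::max(maxDepth, depth);
        bool ok = (level == 0) ? parsePrimary() : parseBinary(level);
        --depth;
        return ok;
    }

    bool parsePrimary() {
        if (pos < text.size() && text[pos] == '(') {
            ++pos;
            if (!parseLevel(3)) return false;  // re-enter the loosest rule
            if (pos >= text.size() || text[pos] != ')') return false;
            ++pos;
            return true;
        }
        std::size_t start = pos;
        while (pos < text.size() &&
               std::isdigit(static_cast<unsigned char>(text[pos]))) ++pos;
        return pos > start;
    }

    bool parseBinary(int level) {
        static const char ops[] = {0, '*', '+', '>'};
        if (!parseLevel(level - 1)) return false;
        while (pos < text.size() && text[pos] == ops[level]) {
            ++pos;
            if (!parseLevel(level - 1)) return false;
        }
        return true;
    }
};

// Returns the deepest chain of nested rule calls, or -1 for invalid input.
int maxRuleDepth(const std::string& input) {
    DepthProbe probe;
    probe.text = input;
    bool ok = probe.parseLevel(3) && probe.pos == input.size();
    return ok ? probe.maxDepth : -1;
}

// Usage: two extra pairs of parentheses triple the depth (4 rules each time).
void checkDepthExamples() {
    assert(maxRuleDepth("1*2*3") == 4);
    assert(maxRuleDepth("((1*2)*3)") == 12);
}
```

In this model every pair of parentheses costs one full pass through all precedence rules, so the depth grows with the number of rules multiplied by the nesting of parentheses.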
At first glance the two paths look much alike, but ManagedX has more than a dozen operator precedence levels, so if every node of a complex expression is wrapped in parentheses, parsing it amounts to recursing through thousands of layers of grammar. Parser Combinator is a recursive descent parser, so the longer the path, the deeper the recursion. To avoid the extremely slow compilation of boost::spirit, we sacrificed a little run-time performance and made the Parse function of each combinator a virtual function, which greatly improves compilation speed. In general, a boost::spirit parser that takes an hour and a half to compile can be compiled with my library in just a few seconds. But now there is a problem: with many parentheses, the performance drop is obvious. We obviously cannot give up eating for fear of choking, so I decided to give Parser Combinator a hand-written parser for operators with precedence. To plug this hand-written parser into the framework and keep it general, I adopted the following structure. The code below is taken from the parser of ManagedX:
expression = binop(exp0)
    .pre(ADD_SUB, ToPreUnary).pre(NOT_BITNOT, ToPreUnary).pre(INC_DEC, ToPreUnary).precedence()
    .lbin(MUL_DIV_MOD, ToBinary).precedence()
    .lbin(ADD_SUB, ToBinary).precedence()
    .lbin(LT << LT, ToBinaryShift).lbin(GT >> GT, ToBinaryShift).precedence()
    .lbin(LT, ToBinary).lbin(LE, ToBinary).lbin(GT, ToBinary).lbin(GE, ToBinary).precedence()
    .post(AS + type, ToCasting).post(IS + type, ToIsType).precedence()
    .lbin(EE, ToBinary).lbin(NE, ToBinary).precedence()
    .lbin(BITAND, ToBinary).precedence()
    .lbin(XOR, ToBinary).precedence()
    .lbin(BITOR, ToBinary).precedence()
    .lbin(AND, ToBinary).precedence()
    .lbin(OR, ToBinary).precedence()
    .lbin(QQ, ToNullChoice).precedence()
    .lbin(QT + (expression << COLON(NeedColon)), ToChoice).precedence()
    .rbin(OPEQ, ToBinaryEq).rbin(EQ, ToAssignment).precedence()
    ;
The argument of the binop combinator is the combinator for the highest-precedence expressions (compare with the > + * () grammar above to see what exp0 means here). binop provides four child combinators: pre (prefix unary operator), post (postfix unary operator), lbin (left-associative binary operator), and rbin (right-associative binary operator). precedence marks the end of the definitions of all operators at one precedence level. I imposed one small restriction: each precedence level can contain only one kind among pre, post, lbin, and rbin. Practice shows that this restriction causes no problems. This gives us a table relating operators to precedence levels, from which a hand-written parser can be built inside the Parser Combinator framework (download the source code and open LibraryCombinator\_Binop.h). As for how to write a parser by hand, I have written an article about it, which you can refer to when reading _Binop.h.
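One common way to turn such an operator table into a hand-written parser is precedence climbing. The sketch below is only an illustration of that technique and should not be read as the contents of _Binop.h. MiniBinop, ipow, and makeCalculator are hypothetical names, operators are single characters, values are plain integers, post is left out, and the toy does not enforce the one-kind-per-level restriction.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical miniature of a binop-style operator table.
class MiniBinop {
public:
    using Unary = std::function<long(long)>;
    using Binary = std::function<long(long, long)>;

    // Operators registered before the next precedence() share one level.
    MiniBinop& pre(char op, Unary f)   { prefix_[op] = Prefix{level_, f}; return *this; }
    MiniBinop& lbin(char op, Binary f) { infix_[op] = Infix{level_, false, f}; return *this; }
    MiniBinop& rbin(char op, Binary f) { infix_[op] = Infix{level_, true, f}; return *this; }
    // Closes the current level; levels defined later bind more loosely.
    MiniBinop& precedence() { ++level_; return *this; }

    long parse(const std::string& text) {
        text_ = text;
        pos_ = 0;
        long value = parseExpr(level_);
        if (pos_ != text_.size()) throw std::runtime_error("unexpected input");
        return value;
    }

private:
    struct Prefix { int level; Unary apply; };
    struct Infix { int level; bool rightAssoc; Binary apply; };

    // Parses an expression whose binary operators all have level <= maxLevel.
    long parseExpr(int maxLevel) {
        long left = parseOperand();
        while (pos_ < text_.size()) {
            auto it = infix_.find(text_[pos_]);
            if (it == infix_.end() || it->second.level > maxLevel) break;
            ++pos_;
            const Infix& op = it->second;
            // Left-associative: the right operand may only hold tighter operators.
            long right = parseExpr(op.rightAssoc ? op.level : op.level - 1);
            left = op.apply(left, right);
        }
        return left;
    }

    // Parses a prefix operator application, a parenthesized expression, or a number.
    long parseOperand() {
        if (pos_ >= text_.size()) throw std::runtime_error("unexpected end");
        char c = text_[pos_];
        auto it = prefix_.find(c);
        if (it != prefix_.end()) {
            ++pos_;
            return it->second.apply(parseExpr(it->second.level));
        }
        if (c == '(') {
            ++pos_;
            long value = parseExpr(level_);
            if (pos_ >= text_.size() || text_[pos_] != ')') throw std::runtime_error("missing )");
            ++pos_;
            return value;
        }
        if (!std::isdigit(static_cast<unsigned char>(c))) throw std::runtime_error("number expected");
        long value = 0;
        while (pos_ < text_.size() && std::isdigit(static_cast<unsigned char>(text_[pos_])))
            value = value * 10 + (text_[pos_++] - '0');
        return value;
    }

    std::map<char, Prefix> prefix_;
    std::map<char, Infix> infix_;
    std::string text_;
    std::size_t pos_ = 0;
    int level_ = 0;
};

// Integer power for non-negative exponents, used by the right-associative '^'.
long ipow(long base, long exp) { long r = 1; while (exp-- > 0) r *= base; return r; }

// Usage: tightest level first, mirroring the shape of the ManagedX table.
MiniBinop makeCalculator() {
    MiniBinop table;
    table.pre('-', [](long a) { return -a; }).precedence()
         .rbin('^', ipow).precedence()
         .lbin('*', [](long a, long b) { return a * b; }).precedence()
         .lbin('+', [](long a, long b) { return a + b; })
         .lbin('-', [](long a, long b) { return a - b; }).precedence();
    return table;
}
```

In this sketch a pair of parentheses adds only two stack frames (parseOperand and parseExpr) no matter how many precedence levels the table holds, which is the property that makes a table-driven operator parser attractive for heavily parenthesized input.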
Compared with plain lrec, binop improves performance by more than 100 times in debug builds, and by less in release builds. With this, Parser Combinator meets the performance requirements again, and we can safely trade a negligible amount of run-time performance for compile times that are more than a thousand times shorter. Here is how the operator grammar is implemented with lrec when binop is not available:
exp1 = exp0
     | ((ADD_SUB | NOT_BITNOT | INC_DEC) + exp1)[ToUnary]
     ;

exp2 = lrec(exp1 + *((MUL_DIV_MOD + exp1)[ToBinaryLrec]), ToLrecExpression);
exp3 = lrec(exp2 + *((ADD_SUB + exp2)[ToBinaryLrec]), ToLrecExpression);
exp4 = lrec(