Brief introduction
In this article, I'm going to show you how to parse and evaluate an arithmetic expression, just like a general-purpose calculator does. When we're done, we'll have a calculator that can handle expressions such as 1+2*-(-3+2)/5.6+3. Of course, you can extend it to be more powerful.
My intention is to provide a simple and fun lesson on parsing and formal grammars (the stuff of compiler theory), and at the same time to introduce Plyplus, a parsing interface that I have been improving on and off for several years. As a by-product of this lesson, we end up with a safe arithmetic evaluator that can completely replace eval().
If you want to try the examples in this article on your own computer, first install Plyplus with the command pip install plyplus. (Translator's note: pip is a package management system for installing software packages written in Python. Its usage is easy to find with Baidu or Google, so it is not repeated here.)
This article assumes some familiarity with Python's class inheritance.
Grammar
For those who do not know how parsing and formal grammars work, here is a quick overview: a formal grammar is a set of rules, at several levels, for parsing text. Each rule describes how the corresponding part of the input text is composed.
Here is an example that shows how to parse 1+2+3+4:
Rule #1 - add is made of add + number
                  OR number + number
or in EBNF:
    add: add '+' number
       | number '+' number
       ;
Each time the parser finds add+number or number+number, it converts it into an add. Basically, the goal of every parser is to reach the highest level of abstraction it can for the expression.
Here are the steps the parser takes:
    number + number + number + number
The first pass turns every number in the input into the "number" rule
    [number + number] + number + number
The parser found its first matching pattern!
    [add + number] + number
After converting a pattern, it starts looking for the next one
    [add + number]
    add
This turns a sequence of symbols into two simple rules at one level: number+number and add+number. You only have to tell the computer how to solve these two problems, and it can parse the entire expression. In fact, it works no matter how long the addition sequence is! This is the power of formal grammars.
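The reduction steps above can be imitated in a few lines of Python. This is only a rough sketch of the idea, not how Plyplus works internally: the function name reduce_sum and the list-of-strings representation are assumptions made for illustration.

```python
def reduce_sum(symbols):
    """Apply the two rules from the text until nothing matches.

    add -> add '+' number | number '+' number
    Returns every intermediate form, starting with the input.
    """
    rules = (["add", "+", "number"], ["number", "+", "number"])
    steps = [list(symbols)]
    # Like the walkthrough, always reduce at the left edge of the sequence.
    while symbols[:3] in rules:
        symbols = ["add"] + symbols[3:]
        steps.append(list(symbols))
    return steps

steps = reduce_sum(["number", "+", "number", "+", "number", "+", "number"])
for step in steps:
    print(" ".join(step))
```

However long the input chain is, the loop ends with the single symbol add, which is the point the text makes about the two rules being enough.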
Operator Precedence
Arithmetic expressions are not just a linear sequence of symbols. Operators create an implicit hierarchy, which is a very good fit for formal grammars:
    1 + 2 * 3 / 4 - 5 + 6
This is equivalent to:
    1 + (2 * 3 / 4) - 5 + 6
We can represent this structure in the grammar with nested rules:
    add: add '+' mul
       | mul '+' mul
       ;
    mul: mul '*' number
       | number '*' number
       ;
By making add operate on mul instead of on number, we get multiplication precedence.
Let's simulate in our heads how this magical parser would analyze 1+2*3*4:
    number + number * number * number
    number + [number * number] * number
The parser has no rule for number+number, so this is its next option
    number + [mul * number]
    number + mul
Now we're in a bit of trouble! The parser does not know how to handle number+mul. We could treat this as a special case, but if we keep looking, we'll find many other possibilities that we have not accounted for, such as mul+number, add+number, add+add, and so on.
So what should we do?
Fortunately, we can do a little "trick": we can think of a number itself as a product, and a product itself is a sum!
This idea looks a little odd at first, but it does make sense:
    add: add '+' mul
       | mul '+' mul
       | mul
       ;
    mul: mul '*' number
       | number '*' number
       | number
       ;
But if a mul can become an add, and a number can become a mul, some of these lines become redundant. Discard them and we get:
    add: add '+' mul
       | mul
       ;
    mul: mul '*' number
       | number
       ;
Let's use this new syntax to simulate running 1+2*3*4:
    number + number * number * number
No rule matches number*number any more, but the parser can be "creative"
    number + [number] * number * number
    number + [mul * number] * number
    number + [mul * number]
    [number] + mul
    [mul] + mul
    [add + mul]
    add
It worked!!!
If you find this fascinating, try simulating a run with another arithmetic expression and see how the grammar resolves it in the right way. Or read on to the next section and watch the computer do it!
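To see the final two-rule grammar produce the right nesting, here is a small hand-written sketch. It is not Plyplus and not the method used in the rest of this article: the left-recursive rules are written as loops, and the names parse_add and parse_mul and the tuple-based tree are assumptions chosen for illustration.

```python
def parse_mul(tokens, pos):
    # mul: mul '*' number | number   (left recursion written as a loop)
    node = ("number", tokens[pos])
    pos += 1
    while pos < len(tokens) and tokens[pos] == "*":
        node = ("mul", node, ("number", tokens[pos + 1]))
        pos += 2
    return node, pos

def parse_add(tokens, pos=0):
    # add: add '+' mul | mul
    node, pos = parse_mul(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_mul(tokens, pos + 1)
        node = ("add", node, right)
    return node, pos

# The expression 1+2*3*4, already split into tokens.
tree, _ = parse_add(["1", "+", "2", "*", "3", "*", "4"])
print(tree)
```

The multiplication chain ends up nested inside a single add node, which mirrors the last steps of the simulation: [mul] + mul becomes [add + mul], then add.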
Run the parser
Now that we have a pretty good idea of how we want our grammar to work, let's write a real grammar to apply it:
    start: add;             // This is the highest level
    add: add add_symbol mul | mul;
    mul: mul mul_symbol number | number;
    number: '[\d.]+';       // Regular expression for a decimal number
    mul_symbol: '\*' | '/'; // Match * or /
    add_symbol: '\+' | '-'; // Match + or -
You may want to brush up on regular expressions, but otherwise the grammar is straightforward. Let's test it with an expression:
    >>> from plyplus import Grammar
    >>> g = Grammar("""...""")
    >>> print(g.parse('1+2*3-5').pretty())
    start
      add
        add
          add
            mul
              number
                1
          add_symbol
            +
          mul
            mul
              number
                2
            mul_symbol
              *
            number
              3
        add_symbol
          -
        mul
          number
            5
Nice work!
Study the tree carefully and see which levels the parser chose.
If you want to run this parser yourself with your own expressions, all you need is Python. After installing pip and Plyplus, paste the commands above into Python (remember to replace '...' with the actual grammar!).
Shaping the tree
Plyplus creates the tree automatically, but it is not necessarily optimal. Putting number inside mul and mul inside add was very helpful for building the hierarchy, but now that we have it, those wrappers are just a burden. We can tell Plyplus to "expand" (i.e. remove) rules by prefixing them.
An @ always expands a rule, a # flattens it, and a ? expands it only when it has a single child. In this case, ? is what we need.
    start: add;
    ?add: add add_symbol mul | mul;       // Expand add if it's just a mul
    ?mul: mul mul_symbol number | number; // Expand mul if it's just a number
    number: '[\d.]+';
    mul_symbol: '\*' | '/';
    add_symbol: '\+' | '-';
With the new grammar, the tree looks like this:
    >>> g = Grammar("""...""")
    >>> print(g.parse('1+2*3-5').pretty())
    start
      add
        add
          number
            1
          add_symbol
            +
          mul
            number
              2
            mul_symbol
              *
            number
              3
        add_symbol
          -
        number
          5
Oh, this is so much more concise, and I dare say, very good.
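The effect of the ? prefix can be illustrated on a plain Python structure. This is a sketch under assumptions: trees are represented as nested tuples of the form (rule_name, child, ...), and expand_single is a hypothetical helper, not part of Plyplus.

```python
def expand_single(tree, rules=("add", "mul")):
    """Replace any node of the given rules that has exactly one child
    with that child, which is what the ? prefix asks for."""
    if not isinstance(tree, tuple):
        return tree                        # a leaf token such as '1' or '+'
    head = tree[0]
    children = [expand_single(child, rules) for child in tree[1:]]
    if head in rules and len(children) == 1:
        return children[0]                 # the wrapper node disappears
    return (head,) + tuple(children)

# The verbose tree for "1+2": each number is wrapped in mul, and the
# left one is wrapped in an extra add as well.
verbose = ("add",
           ("add", ("mul", ("number", "1"))),
           ("add_symbol", "+"),
           ("mul", ("number", "2")))
print(expand_single(verbose))
```

Only the wrappers with a single child are removed. The outer add keeps its three children, just as in the concise tree above.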
Parentheses and other features
So far, we are obviously missing some necessary features: parentheses, unary operators (as in -(1+2)), and whitespace in the middle of an expression. These features are actually easy to add, so let's try.
First, we need to introduce an important concept: atoms. Everything that happens inside an atom (inside parentheses and unary operations) takes precedence over all addition or multiplication operations (including bitwise operations). Since the atom is only a precedence construct and has no syntactic meaning, we prefix it with "@" to make sure it is always expanded.
The naive way to allow spaces inside an expression is to write the rule as add SPACE add_symbol SPACE mul | mul. However, the result is verbose and hard to read. Instead, we will make Plyplus always ignore whitespace.
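What "always ignore whitespace" means can be sketched with Python's re module: whitespace is matched like any other token and then simply dropped. This is an illustration only. The names TOKEN_RE and tokenize are assumptions, and Plyplus's own lexer is not implemented this way as far as this article shows.

```python
import re

# One named group per token type. Whitespace is matched but never emitted.
TOKEN_RE = re.compile(r"""
    (?P<number>[\d.]+)
  | (?P<symbol>[-+*/()])
  | (?P<whitespace>[ \t]+)
""", re.VERBOSE)

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError("unexpected character %r at position %d"
                             % (text[pos], pos))
        if match.lastgroup != "whitespace":
            tokens.append(match.group())
        pos = match.end()
    return tokens

print(tokenize("1 + 2 * -(-3+2) / 5.6"))
```

Because the spaces never reach the rules, the grammar itself does not need to mention them.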
The following is the complete syntax, which contains the features described above:
    start: add;
    ?add: (add add_symbol)? mul;
    ?mul: (mul mul_symbol)? atom;
    @atom: neg | number | '\(' add '\)';
    neg: '-' atom;
    number: '[\d.]+';
    mul_symbol: '\*' | '/';
    add_symbol: '\+' | '-';
    WHITESPACE: '[ \t]+' (%ignore);
Make sure you understand this grammar before moving on to the next step: calculating!
Evaluating
Now that we can turn an expression into a hierarchical tree, we only need to walk the tree branch by branch to get the final result.
Since we are about to start writing code, I need to explain two things about this tree:
1. Each branch is an instance with the following two attributes:
   head: the name of the rule (for example, add or number);
   tail: a list of all the sub-rules that it matched.
2. Plyplus removes unnecessary tokens by default. In this example, '(', ')' and '-' are removed. The operator tokens of add and mul, however, have rules of their own (add_symbol and mul_symbol), so Plyplus knows they are necessary and does not remove them. If you need to keep such tokens, you can turn this feature off manually, but in my experience it is better not to and to adjust the relevant grammar instead.
Now, let's start writing code. We will use a very simple transformer to walk the tree. It starts at the outermost branches and works its way in until it reaches the root node, and our job is to tell it what to do with each branch. If all goes well, it will always start from the outermost layer! Let's take a look at the concrete implementation.
    >>> import operator as op
    >>> from plyplus import STransformer

    class Calc(STransformer):

        def _bin_operator(self, exp):
            # exp.tail holds the already-transformed children
            arg1, operator_symbol, arg2 = exp.tail

            operator_func = {'+': op.add,
                             '-': op.sub,
                             '*': op.mul,
                             '/': op.truediv}[operator_symbol]  # op.div on Python 2

            return operator_func(arg1, arg2)

        number      = lambda self, exp: float(exp.tail[0])
        neg         = lambda self, exp: -exp.tail[0]
        __default__ = lambda self, exp: exp.tail[0]

        add = _bin_operator
        mul = _bin_operator
Each method corresponds to a rule. If a method does not exist, the __default__ method is called. We left out start, add_symbol and mul_symbol, since all they need to do is return their own branch.
I used float() to parse the numbers, which is the lazy way, but I could also have done it with the parser.
To keep things neat, I used the operator module. For example, op.add is basically 'lambda x, y: x + y'.
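As a standalone illustration of that dispatch idea, here is a minimal sketch using only the standard operator module. The names OPERATORS and apply_operator are made up for this example, and op.truediv is used for division so that the snippet runs on Python 3.

```python
import operator as op

# Map each symbol to the function that implements it.
OPERATORS = {"+": op.add, "-": op.sub, "*": op.mul, "/": op.truediv}

def apply_operator(symbol, left, right):
    # Look the function up by symbol, then call it on the two operands.
    return OPERATORS[symbol](left, right)

print(apply_operator("+", 1.0, 2.0))   # same as (lambda x, y: x + y)(1.0, 2.0)
print(apply_operator("/", 1.0, 4.0))
```

A dictionary lookup replaces a chain of if/elif branches, and an unknown symbol fails loudly with a KeyError.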
OK, now let's run this code and check the result.
    >>> Calc().transform(g.parse('1 + 2 * -(-3+2) / 5.6 + 30'))
    31.357142857142858
What about eval()?
    >>> eval('1 + 2 * -(-3+2) / 5.6 + 30')
    31.357142857142858
Success :)
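For readers who want to see the bottom-up traversal without installing anything, here is a rough sketch of the idea. It is not Plyplus's actual implementation: the Node and Transformer classes are stand-ins invented for this example, keeping only the head/tail attributes and the __default__ fallback described above.

```python
class Node:
    def __init__(self, head, tail):
        self.head = head      # rule name, e.g. 'add' or 'number'
        self.tail = tail      # list of child Nodes or token strings

class Transformer:
    def transform(self, node):
        if not isinstance(node, Node):
            return node                       # a token: leave it as is
        # Transform the children first, so each handler sees final values.
        done = Node(node.head, [self.transform(child) for child in node.tail])
        handler = getattr(self, node.head, self.__default__)
        return handler(done)

    def __default__(self, exp):
        return exp.tail[0]

class Calc(Transformer):
    def number(self, exp):
        return float(exp.tail[0])

    def neg(self, exp):
        return -exp.tail[0]

    def add(self, exp):
        left, symbol, right = exp.tail
        return left + right if symbol == "+" else left - right

# Hand-built tree for "1 + -2"
tree = Node("start", [Node("add", [Node("number", ["1"]),
                                   Node("add_symbol", ["+"]),
                                   Node("neg", [Node("number", ["2"])])])])
print(Calc().transform(tree))
```

The rules without a method of their own (start and add_symbol here) fall through to __default__, which simply passes their single child upward.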
Last step: REPL
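To round things off, let's wrap it all up in a nice calculator REPL:
    def main():
        calc = Calc()
        while True:
            try:
                s = input('> ')  # raw_input on Python 2
            except EOFError:
                break
            if s == '':
                break
            # calc_grammar is the Grammar object built from the complete grammar above
            tree = calc_grammar.parse(s)
            print(calc.transform(tree))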
The complete code can be obtained from here:
https://github.com/erezsh/plyplus/blob/master/examples/calc.py