Brief introduction
In this article, I'll show you how to parse and evaluate an arithmetic expression the way a general-purpose calculator does. When we're done, we'll have a calculator that can handle expressions such as 1+2*-(-3+2)/5.6+3. Of course, you can extend it to be more powerful.
My intention is to provide a simple and fun lesson on parsing and formal grammars (compiler theory material). Along the way, I'll introduce Plyplus, a parsing interface that I have been improving on and off for several years. As a bonus, we end up with a safe arithmetic evaluator that can replace eval().
If you want to try the examples in this article on your own computer, first install Plyplus with the command pip install plyplus. (Translator's note: pip is the package manager used to install packages written in Python; instructions for using it are easy to find online and are not repeated here.)
This article requires some understanding of the use of Python inheritance.
Grammar
For those of you who don't know how parsing and formal grammars work, here's a quick overview: a formal grammar is a set of rules, arranged in layers, for parsing text. Each rule describes how the corresponding part of the input text is composed.
Here is an example to show how to parse 1+2+3+4:
Rule #1: add is made of add + number
         or number + number
or, in EBNF:
add: add '+' number
   | number '+' number
   ;
The parser looks for add+number or number+number each time, and converts whatever it finds into an add. Basically, the goal of every parser is to reach the highest possible level of abstraction for the expression.
The parser goes through the following steps:
number + number + number + number
The first conversion turns every number into the "number" rule
[number + number] + number + number
The parser found its first matching pattern!
[add + number] + number
After converting the pattern, it starts looking for the next one
[add + number]
add
These sequences of symbols boil down to two simple rules at a single level: number+number and add+number. You only need to tell the computer how to solve these two problems, and it can parse the entire expression. In fact, it can handle an addition sequence of any length! This is the power of formal grammars.
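The reduction steps above can be imitated in a few lines of plain Python. This is only a sketch of the idea, not how Plyplus works internally: the function name reduce_additions and the list-of-strings representation are assumptions made for illustration.

```python
def reduce_additions(tokens):
    """Repeatedly apply the two addition rules until neither matches.

    Rules (leftmost match first):
        add -> add '+' number
        add -> number '+' number
    Returns the final symbols and a log of every intermediate step.
    """
    symbols = list(tokens)
    steps = [" ".join(symbols)]
    while True:
        for i in range(len(symbols) - 2):
            left, plus, right = symbols[i:i + 3]
            if left in ("add", "number") and plus == "+" and right == "number":
                symbols[i:i + 3] = ["add"]      # replace the match with 'add'
                steps.append(" ".join(symbols))
                break
        else:
            # No rule matched anywhere: we are done.
            return symbols, steps


symbols, steps = reduce_additions(
    ["number", "+", "number", "+", "number", "+", "number"])
for step in steps:
    print(step)
```

However long the chain of additions is, the same two rules keep collapsing it until a single add remains.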
Operator Precedence
Arithmetic expressions are not just linear sequences of symbols. Operators create an implicit hierarchy, which formal grammars are well suited to represent:
This is equivalent to:
We can represent this structure in the grammar through nested rules:
add: add '+' mul
   | mul '+' mul
   ;
mul: mul '*' number
   | number '*' number
   ;
By having add operate on mul instead of number, we give multiplication precedence over addition.
Let's simulate the process of using this magical parser to analyze 1+2*3*4 in our mind:
number + number * number * number
number + [number * number] * number
The parser does not know what number+number is, so this is its next option
number + [mul * number]
number + mul
???
Now we have a little trouble! The parser does not know how to handle number+mul. We could add a rule for it, but if we keep exploring we will find many other combinations that were not considered, such as mul+number, add+number, add+add, and so on.
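The dead end can be reproduced with a naive, greedy reducer. This is an illustrative sketch only: the rule table NAIVE_RULES and the function reduce_all are hypothetical names, and a real parser is smarter than this loop. It applies the first precedence grammar (add built from mul, mul built from number) to 1+2*3*4 and shows where it stops.

```python
# Hypothetical rule table: (rule name, sequence of symbols it replaces).
NAIVE_RULES = [
    ("mul", ["mul", "*", "number"]),
    ("mul", ["number", "*", "number"]),
    ("add", ["add", "+", "mul"]),
    ("add", ["mul", "+", "mul"]),
]


def reduce_all(symbols, rules):
    """Greedily replace the first matching pattern until nothing matches."""
    symbols = list(symbols)
    changed = True
    while changed:
        changed = False
        for name, pattern in rules:
            n = len(pattern)
            for i in range(len(symbols) - n + 1):
                if symbols[i:i + n] == pattern:
                    symbols[i:i + n] = [name]
                    changed = True
                    break
            if changed:
                break
    return symbols


stuck = reduce_all("number + number * number * number".split(), NAIVE_RULES)
print(" ".join(stuck))
```

The loop ends with number + mul left over, and no rule in the table can reduce it any further.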
So what should we do?
Fortunately, we can do a little "trick": we can think of a number itself as a product, and a product itself is a sum!
The idea looks a little odd at first, but it does make sense:
add: add '+' mul
   | mul '+' mul
   | mul
   ;
mul: mul '*' number
   | number '*' number
   | number
   ;
But if a mul can become an add, and a number can become a mul, some of these lines are redundant. Throw them away and we get:
add: add '+' mul
   | mul
   ;
mul: mul '*' number
   | number
   ;
Let's use this new grammar to simulate a run on 1+2*3*4:
number + number * number * number
None of the rules matches number*number any more, but the parser can be "creative"
number + [number] * number * number
number + [mul * number] * number
number + [mul * number]
[number] + mul
[mul] + mul
[add + mul]
add
It worked!!!
If you find this fascinating, try simulating a run with another arithmetic expression and see how the grammar resolves it in the right order. Or read on to the next section and watch the computer work through the steps!
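For readers who want to check their mental simulation, here is a small hand-written parser for the same two rules. It is a sketch under stated assumptions: tokens arrive as a list of strings, trees are nested tuples, the left-recursive rules are written as loops, and the names parse_add, parse_mul and evaluate are invented for this example (Plyplus generates its parser from the grammar instead).

```python
def parse_mul(tokens, pos):
    # mul: mul '*' number | number   (left recursion written as a loop)
    node = ("number", tokens[pos])
    pos += 1
    while pos < len(tokens) and tokens[pos] == "*":
        node = ("mul", node, ("number", tokens[pos + 1]))
        pos += 2
    return node, pos


def parse_add(tokens, pos=0):
    # add: add '+' mul | mul
    node, pos = parse_mul(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_mul(tokens, pos + 1)
        node = ("add", node, right)
    return node, pos


def evaluate(node):
    kind = node[0]
    if kind == "number":
        return float(node[1])
    left, right = evaluate(node[1]), evaluate(node[2])
    return left + right if kind == "add" else left * right


tree, _ = parse_add(["1", "+", "2", "*", "3", "*", "4"])
print(tree)
print(evaluate(tree))
```

The multiplications end up nested inside the addition, which is exactly the precedence the grammar was designed to express.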
Running the parser
Now that we have a good idea of how our grammar should work, let's write a practical grammar that applies it:
start: add;                          // This is the top level
add: add add_symbol mul | mul;
mul: mul mul_symbol number | number;
number: '[\d.]+';                    // Regular expression for a decimal number
mul_symbol: '\*' | '/';              // Match * or /
add_symbol: '\+' | '-';              // Match + or -
You may want to brush up on regular expressions, but either way this grammar is very straightforward. Let's test it with an expression:
>>> from plyplus import Grammar
>>> g = Grammar("""...""")
>>> print(g.parse('1+2*3-5').pretty())
start
  add
    add
      add
        mul
          number
            1
      add_symbol
        +
      mul
        mul
          number
            2
        mul_symbol
          *
        number
          3
    add_symbol
      -
    mul
      number
        5
Good job!
Take a closer look at the tree and see what hierarchy the parser chose.
If you want to run the parser yourself on your own expressions, all you need is Python. After installing pip and Plyplus, paste the commands above into Python (remember to replace '...' with the actual grammar).
Shaping the tree
Plyplus creates a tree automatically, but it is not necessarily the optimal one. Putting number inside mul and mul inside add was very helpful for creating a hierarchy, but now that we have that hierarchy they have become a burden. We tell Plyplus to "expand" (i.e. remove) rules by adding a prefix to them.
An @ always expands a rule, a # flattens it, and a ? expands it only when it has a single child. In this case, ? is what we need.
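To make the effect of "expanding" concrete, here is a minimal sketch in plain Python. It assumes that expanding a node means replacing it with its only child, and it uses a made-up tree representation of (head, children) tuples. The function expand_single_child is hypothetical and is not part of Plyplus.

```python
def expand_single_child(node, expandable):
    """Return a copy of `node` in which every rule named in `expandable`
    that has exactly one child is replaced by that child."""
    if isinstance(node, str):            # a leaf token such as '1' or '+'
        return node
    head, children = node
    children = [expand_single_child(child, expandable) for child in children]
    if head in expandable and len(children) == 1:
        return children[0]
    return (head, children)


# The tree for 1+2 as the unprefixed grammar would build it.
verbose = ("add", [("add", [("mul", [("number", ["1"])])]),
                   ("add_symbol", ["+"]),
                   ("mul", [("number", ["2"])])])

compact = expand_single_child(verbose, {"add", "mul"})
print(compact)
```

Only the wrappers that carry no information disappear. The outer add keeps its three children, so the structure of the addition is preserved.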
start: add;
?add: add add_symbol mul | mul;        // Expand add if it's just a mul
?mul: mul mul_symbol number | number;  // Expand mul if it's just a number
number: '[\d.]+';
mul_symbol: '\*' | '/';
add_symbol: '\+' | '-';
With the new grammar, the tree looks like this:
>>> g = Grammar("""...""")
>>> print(g.parse('1+2*3-5').pretty())
start
  add
    add
      number
        1
      add_symbol
        +
      mul
        number
          2
        mul_symbol
          *
        number
          3
    add_symbol
      -
    number
      5
Oh, that's so much simpler, and I dare say it's very good.
Handling parentheses and other features
So far we are obviously missing some necessary features: parentheses, unary operators (-(1+2)), and whitespace in the middle of an expression. These features are actually very easy to add, so let's try.
First, we need to introduce an important concept: atoms. Everything that happens inside an atom (inside parentheses and in unary operations) takes precedence over any addition or multiplication (including bitwise operations). Since the atom is just a precedence construct with no grammatical meaning of its own, we add the "@" symbol to make sure it is always expanded.
The easiest way to allow spaces in an expression is to write something like add: add SPACE add_symbol SPACE mul | mul; but that makes the grammar verbose and hard to read. So instead, we make Plyplus always ignore whitespace.
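What "always ignore whitespace" amounts to can be sketched with a small regex tokenizer in plain Python. This is an illustration of the idea rather than Plyplus's lexer: the names TOKEN and tokenize are assumptions, and the token patterns are simplified.

```python
import re

# One alternative per token kind; whitespace is matched but never emitted.
TOKEN = re.compile(
    r"(?P<number>[\d.]+)|(?P<symbol>[-+*/()])|(?P<whitespace>[ \t]+)")


def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(
                "unexpected character %r at position %d" % (text[pos], pos))
        if match.lastgroup != "whitespace":   # skip, like an ignored token
            tokens.append(match.group())
        pos = match.end()
    return tokens


print(tokenize("1 + 2 *\t3"))
```

Because the whitespace is dropped before the grammar rules ever see it, none of the rules needs to mention spaces.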
The following is the complete grammar, incorporating the features described above:
start: add;
?add: (add add_symbol)? mul;
?mul: (mul mul_symbol)? atom;
@atom: neg | number | '\(' add '\)';
neg: '-' atom;
number: '[\d.]+';
mul_symbol: '\*' | '/';
add_symbol: '\+' | '-';
WHITESPACE: '[ \t]+' (%ignore);
Make sure you understand this grammar before moving on to the next step: calculating!
Calculating
Now that we can turn an expression into a hierarchical tree, we can get the final result by walking the tree branch by branch.
We're about to start coding, but first I need to explain two things about this tree:
1. Each branch is an instance with the following two attributes:
- head: the name of the rule (e.g. add or number);
- tail: a list of all the sub-rules it matched.
2. Plyplus removes unnecessary tokens by default. In this case, '(', ')' and '-' are removed. The add and mul symbols, however, have rules of their own (add_symbol and mul_symbol), so Plyplus knows they are necessary and does not remove them. If you need to keep such tokens you can turn this feature off manually, but in my experience it is better not to, and to adjust the grammar instead.
Now let's write some code. We'll use a very simple transformer to walk the tree. It starts at the outermost branches and works its way in until it reaches the root, and our job is to tell it what to do with each branch. If all goes well, it always starts from the outermost layer! Let's look at the concrete implementation.
>>> import operator as op
>>> from plyplus import STransformer

class Calc(STransformer):

    def _bin_operator(self, exp):
        # A binary branch holds: left operand, operator symbol, right operand
        arg1, operator_symbol, arg2 = exp.tail

        operator_func = {'+': op.add,
                         '-': op.sub,
                         '*': op.mul,
                         '/': op.truediv}[operator_symbol]  # op.div no longer exists in Python 3

        return operator_func(arg1, arg2)

    number      = lambda self, exp: float(exp.tail[0])
    neg         = lambda self, exp: -exp.tail[0]
    __default__ = lambda self, exp: exp.tail[0]

    add = _bin_operator
    mul = _bin_operator
Each method corresponds to a rule. If no method exists for a rule, the __default__ method is invoked. We left out start, add_symbol and mul_symbol, because all they do is return their own branch.
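The dispatch described here can be sketched without Plyplus. The classes below are a simplified stand-in written for illustration: Node, MiniTransformer and Adder are hypothetical names, and the fallback method is called default rather than __default__. The real STransformer may differ in its details.

```python
class Node(object):
    """A tree branch: `head` is the rule name, `tail` the list of children."""
    def __init__(self, head, tail):
        self.head = head
        self.tail = tail


class MiniTransformer(object):
    def transform(self, node):
        if not isinstance(node, Node):       # a plain token, e.g. '+'
            return node
        # Children first, so every handler sees already-computed values.
        new_tail = [self.transform(child) for child in node.tail]
        handler = getattr(self, node.head, self.default)
        return handler(Node(node.head, new_tail))

    def default(self, node):
        # Used when no method is named after the rule.
        return node.tail[0]


class Adder(MiniTransformer):
    def number(self, node):
        return float(node.tail[0])

    def add(self, node):
        left, _symbol, right = node.tail
        return left + right


tree = Node("start", [Node("add", [Node("number", ["1"]),
                                   Node("add_symbol", ["+"]),
                                   Node("number", ["2.5"])])])
result = Adder().transform(tree)
print(result)
```

Here start and add_symbol have no method of their own, so they fall through to default, which simply passes their single child upward.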
I used float() to parse the numbers, which is the lazy way; I could have done it with the parser instead.
To keep the code clean, I used the operator module. For example, op.add is basically 'lambda x, y: x + y'.
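As a quick illustration of why the operator module keeps things tidy, the lookup table can be tried on its own. The names OPERATORS and apply_operator are made up for this sketch, and op.truediv is used because op.div is not available in Python 3.

```python
import operator as op

# Map each symbol to the function that implements it.
OPERATORS = {"+": op.add, "-": op.sub, "*": op.mul, "/": op.truediv}


def apply_operator(symbol, left, right):
    """Look up the function for `symbol` and call it on the two operands."""
    return OPERATORS[symbol](left, right)


print(apply_operator("+", 1, 2))
print(apply_operator("/", 1.0, 4.0))
```

Each entry behaves like the equivalent lambda, so op.add(1, 2) and (lambda x, y: x + y)(1, 2) give the same result.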
OK, now let's run this code to check the results.
>>> Calc().transform(g.parse('1 + 2 * -(-3+2) / 5.6 + 30'))
31.357142857142858
What about eval()?
>>> eval('1 + 2 * -(-3+2) / 5.6 + 30')
31.357142857142858
Success :)
Last step: REPL
To round things off, let's wrap it in a nice calculator REPL:
def main():
    calc = Calc()
    while True:
        try:
            s = input('> ')  # raw_input('> ') on Python 2
        except EOFError:
            break
        if s == '':
            break
        # calc_grammar is the Grammar object built from the complete grammar
        tree = calc_grammar.parse(s)
        print(calc.transform(tree))
The complete code can be obtained from here:
https://github.com/erezsh/plyplus/blob/master/examples/calc.py