This article shows how to implement a calculator in only 50 lines of Python. It relies on the PlyPlus library to keep the core code very simple.

**Introduction**
In this article, I will show you how to parse and evaluate an arithmetic expression that uses the four basic operations, just like an ordinary calculator. By the end, we will have a calculator that can handle expressions such as 1 + 2 * -(-3 + 2) / 5.6 + 3. Of course, you can extend it to make it more powerful.

My goal is to provide a simple and fun tutorial on parsing and formal grammars (compiler theory), and at the same time to introduce PlyPlus, a parser interface that I have been improving on and off for several years. As a bonus of this tutorial, we end up with a safe arithmetic evaluator that can completely replace eval.

If you want to try the examples in this article on your own computer, first install PlyPlus with the command pip install plyplus. (Note: pip is a package manager for installing software written in Python; a quick web search will show you how to get it.)

To follow this article, you need to understand how class inheritance works in Python.

**Grammars**

For those who are not familiar with parsing and formal grammars, here is a quick overview: a formal grammar is a set of rules, arranged in levels, for parsing text. Each rule describes how the corresponding part of the input text is made up.

Here is an example of how to parse 1 + 2 + 3 + 4:

Rule #1 - add IS MADE OF add + number OR number + number

Or, in EBNF:

add: add '+' number | number '+' number ;

The parser repeatedly looks for add + number or number + number, and whenever it finds one, it converts it into an add. Basically, the goal of every parser is to climb as high up the hierarchy of rules as it can.

Each step of the parser is as follows:

number + number + number + number

The first transformation turns every number into the "number" rule.

[number + number] + number + number

The parser finds its first matching pattern!

[add + number] + number

After converting that pattern, it starts looking for the next one.

[add + number]

add

The sequence of symbols has turned into a hierarchy built from two simple rules: number + number and add + number. You only need to tell the computer how to handle each of these two cases, and it can then parse the entire expression. In fact, it can handle a sequence of additions of any length! This is the power of formal grammars.
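The reduction steps above can be mimicked in a few lines of Python. This is only a toy sketch of the idea, not how PlyPlus or any real parser is implemented; the function name `reduce_additions` and the list-of-symbols representation are assumptions made for illustration.

```python
def reduce_additions(symbols):
    """Repeatedly apply  add: add '+' number | number '+' number  to a symbol list.

    Returns every intermediate state; on success the last one is ['add'].
    """
    patterns = (['add', '+', 'number'], ['number', '+', 'number'])
    steps = [list(symbols)]
    while True:
        current = steps[-1]
        for i in range(len(current) - 2):
            if current[i:i + 3] in patterns:
                # Replace the three matched symbols with a single 'add'.
                steps.append(current[:i] + ['add'] + current[i + 3:])
                break
        else:
            return steps  # no pattern matched anywhere: nothing left to reduce


# 1 + 2 + 3 + 4, after every number token has been labelled 'number'
for state in reduce_additions(['number', '+', 'number', '+', 'number', '+', 'number']):
    print(' '.join(state))
```

Each printed line is one state of the input, and the last line is the single symbol `add`, however many additions the input contains.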

Operator precedence

Arithmetic expressions are not just a linear sequence of symbols. Operators create an implicit hierarchy, which makes them a very good fit for formal grammars:

1 + 2 * 3 / 4 - 5 + 6

This is equivalent to:

1 + (2 * 3 / 4) - 5 + 6

We can express this structure in the grammar with nested rules:

    add: add '+' mul | mul '+' mul ;
    mul: mul '*' number | number '*' number ;

By making add operate on mul rather than on number, we give multiplication precedence.

Let's simulate how this magic parser would analyze 1 + 2*3*4:

    number + number * number * number
    number + [number * number] * number

The parser has no rule for number + number, so number * number is its next choice.

    number + [mul * number]
    number + mul
    ???

Now we are in a bit of trouble! The parser does not know how to handle number + mul. We could special-case it, but if we keep exploring, we will find many other possibilities that we have not covered, such as mul + number, add + number, add + add, and so on.

So what should we do?

Fortunately, there is a little trick: we can say that a number is itself a product, and a product is itself a sum!

This idea looks a bit odd at first, but it does make sense:

    add: add '+' mul | mul '+' mul | mul ;
    mul: mul '*' number | number '*' number | number ;

But if a mul can become an add and a number can become a mul, some of the alternatives become redundant. Dropping them, we get:

    add: add '+' mul | mul ;
    mul: mul '*' number | number ;

Let's use this new grammar to simulate 1 + 2*3*4:

    number + number * number * number

There is no longer a rule for number * number, but the parser can "get creative":

    number + [number] * number * number
    number + [mul * number] * number
    number + [mul * number]
    [number] + mul
    [mul] + mul
    [add + mul]
    add

SUCCESS !!!

If this seems like magic to you, try simulating another arithmetic expression and watch how it resolves itself step by step in the right order. Or wait for the next section and see how the computer does it!
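To see the same grouping fall out of code, here is a small hand-written parser for the final two-rule grammar. It is a sketch under assumptions: the names `tokenize`, `parse_add` and `parse_mul` are made up for this example, the left-recursive rules are written as loops (a common hand-coding technique, not necessarily what a parser generator does), the tree is plain nested tuples, and there is no error handling.

```python
import re


def tokenize(text):
    # Numbers (digits and dots) or one of the two operators; spaces are skipped.
    return re.findall(r'[\d.]+|[+*]', text)


def parse_mul(tokens, pos):
    """mul: mul '*' number | number   (left recursion written as a loop)"""
    node = ('number', tokens[pos])
    pos += 1
    while pos < len(tokens) and tokens[pos] == '*':
        node = ('mul', node, ('number', tokens[pos + 1]))
        pos += 2
    return node, pos


def parse_add(tokens, pos=0):
    """add: add '+' mul | mul"""
    node, pos = parse_mul(tokens, pos)
    while pos < len(tokens) and tokens[pos] == '+':
        right, pos = parse_mul(tokens, pos + 1)
        node = ('add', node, right)
    return node, pos


tree, _ = parse_add(tokenize('1 + 2*3*4'))
print(tree)
```

In the printed tree, both multiplications sit inside the right-hand operand of the single `add`, which is exactly the grouping 1 + ((2*3)*4).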

**Run the parser**

Now that we have a good idea of how our grammar should work, let's write an actual grammar and put it to use:


    start: add;             // This is the top of the hierarchy
    add: add add_symbol mul | mul;
    mul: mul mul_symbol number | number;
    number: '[\d.]+';       // Regular expression for a decimal number
    mul_symbol: '*' | '/';  // Match * or /
    add_symbol: '+' | '-';  // Match + or -

You may want to brush up on regular expressions, but otherwise this grammar is quite straightforward. Let's test it on an expression:

    >>> from plyplus import Grammar
    >>> g = Grammar("""...""")
    >>> print(g.parse('1+2*3-5').pretty())
    start
      add
        add
          add
            mul
              number
                1
          add_symbol
            +
          mul
            mul
              number
                2
            mul_symbol
              *
            number
              3
        add_symbol
          -
        mul
          number
            5

Pretty good!

Take a closer look at this tree to see the hierarchy the parser chose.

If you want to run the parser yourself and try your own expressions, all you need is Python. After installing pip and PlyPlus, paste the commands above into the Python interpreter (replacing '...' with the actual grammar).
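The `number` rule in the grammar is an ordinary regular expression, so it can be tried out on its own with Python's `re` module. The snippet below is a sketch for experimentation only; it does not use PlyPlus, and the names `NUMBER` and `SYMBOL` are assumptions.

```python
import re

NUMBER = re.compile(r'[\d.]+')      # same pattern as the `number` rule
SYMBOL = re.compile(r'[-+*/]')      # add_symbol and mul_symbol combined

print(NUMBER.findall('1+2*3-5'))    # every run of digits and dots
print(SYMBOL.findall('1+2*3-5'))    # every operator character

# The pattern is deliberately loose: it also matches strings such as '1.2.3',
# which are not valid numbers.
print(NUMBER.fullmatch('1.2.3') is not None)
```

A looser pattern keeps the grammar short; a malformed number like `1.2.3` would then only be rejected later, when something tries to convert it with `float()`.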

**Build trees**

Plyplus creates a tree automatically, but it is not necessarily optimal. Putting number inside mul and mul inside add was very useful for creating a hierarchy, but now that we have the hierarchy, those wrappers are just a burden. We can tell Plyplus to "expand" (i.e. remove) rules by adding a prefix to them.

A @ always expands a rule, a # flattens it, and a ? expands it only when it has a single child. In this case, ? is what we need.

    start: add;
    ?add: add add_symbol mul | mul;        // Expand add if it's just a mul
    ?mul: mul mul_symbol number | number;  // Expand mul if it's just a number
    number: '[\d.]+';
    mul_symbol: '*' | '/';
    add_symbol: '+' | '-';

With the new grammar, the tree looks like this:

    >>> g = Grammar("""...""")
    >>> print(g.parse('1+2*3-5').pretty())
    start
      add
        add
          number
            1
          add_symbol
            +
          mul
            number
              2
            mul_symbol
              *
            number
              3
        add_symbol
          -
        number
          5

Oh, this is much cleaner, and I dare say, quite good.
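The effect of the `?` prefix can be imitated on a plain Python tree. This is a sketch of the idea only: the `(head, children)` tuple layout and the function name `expand_single_child` are assumptions for illustration and say nothing about how PlyPlus implements expansion.

```python
def expand_single_child(tree, rules=('add', 'mul')):
    """Replace every node whose head is in `rules` and that has exactly one
    child by that child, mimicking the '?' prefix. Leaves are plain strings."""
    if isinstance(tree, str):
        return tree
    head, children = tree
    children = [expand_single_child(child, rules) for child in children]
    if head in rules and len(children) == 1:
        return children[0]
    return (head, children)


# Unexpanded tree for '1+2', shaped like the output of the first grammar.
verbose = ('start', [
    ('add', [
        ('add', [('mul', [('number', ['1'])])]),
        ('add_symbol', ['+']),
        ('mul', [('number', ['2'])]),
    ]),
])

print(expand_single_child(verbose))
```

The outer `add` survives because it has three children, while the single-child `add` and `mul` wrappers around each number disappear.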

Parentheses and other features

So far, we are clearly missing some necessary features: parentheses, unary operators (such as -(1 + 2)), and the ability to put spaces inside an expression. These features are actually easy to implement, so let's try.

An important concept should be introduced first: the atom. Everything inside an atom (parenthesized expressions and unary operations) takes precedence over any addition or multiplication (including bitwise operations). Since the atom is only a precedence construct and has no syntactic significance of its own, we prefix it with "@" to make sure it is always expanded.

The simplest way to allow spaces in an expression is to write rules like add: add SPACE add_symbol SPACE mul | mul; but the resulting grammar is too verbose and hard to read. So instead, we will tell Plyplus to always ignore whitespace.

The following is a complete syntax that covers the features described above:

    start: add;
    ?add: (add add_symbol)? mul;
    ?mul: (mul mul_symbol)? atom;
    @atom: neg | number | '(' add ')';
    neg: '-' atom;
    number: '[\d.]+';
    mul_symbol: '*' | '/';
    add_symbol: '+' | '-';
    WHITESPACE: '[ \t]+' (%ignore);

Make sure you understand this grammar before moving on to the next step: calculating!
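One way to check your understanding of this grammar is to write the same rules by hand. The sketch below is a hand-coded recursive-descent parser that follows the grammar rule by rule; it is not PlyPlus's algorithm, all function names are assumptions, the tree is plain nested tuples, and error handling is minimal.

```python
import re

# Optional leading whitespace is skipped, in the spirit of the %ignore rule.
TOKEN = re.compile(r'\s*([\d.]+|[-+*/()])')


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError('unexpected character at position %d' % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_atom(tokens, pos):
    """atom: neg | number | '(' add ')'   with   neg: '-' atom"""
    token = tokens[pos]
    if token == '-':
        operand, pos = parse_atom(tokens, pos + 1)
        return ('neg', operand), pos
    if token == '(':
        inner, pos = parse_add(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ')':
            raise ValueError('expected )')
        return inner, pos + 1        # the parentheses leave no node behind
    if not re.fullmatch(r'[\d.]+', token):
        raise ValueError('expected a number, got %r' % token)
    return ('number', token), pos + 1


def parse_binary(tokens, pos, operand, symbols, head):
    """Shared loop for  add: (add add_symbol)? mul  and  mul: (mul mul_symbol)? atom"""
    node, pos = operand(tokens, pos)
    while pos < len(tokens) and tokens[pos] in symbols:
        symbol = tokens[pos]
        right, pos = operand(tokens, pos + 1)
        node = (head, node, symbol, right)
    return node, pos


def parse_mul(tokens, pos):
    return parse_binary(tokens, pos, parse_atom, ('*', '/'), 'mul')


def parse_add(tokens, pos=0):
    return parse_binary(tokens, pos, parse_mul, ('+', '-'), 'add')


tree, _ = parse_add(tokenize('2 * -(1 + 3)'))
print(tree)
```

Because `parse_atom` is the innermost function, anything in parentheses or behind a unary minus is fully parsed before the surrounding multiplication or addition sees it, which is the precedence the atom rule is meant to express.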

Calculating

We can now turn an expression into a hierarchical tree, so all we need to do is walk the tree branch by branch to get the final result.

We are about to start writing code. Before that, I need to explain two things about the tree:

1. Each branch is an instance with the following attributes:

- head: the rule name (such as add or number);
- tail: a list of all the sub-rules it matched.

2. Plyplus removes unnecessary tokens by default. In this example, '(', ')' and '-' are removed. But the addition and multiplication symbols have their own rules (add_symbol and mul_symbol), so Plyplus knows they are needed and keeps them. If you need to keep such tokens, you can turn this feature off manually, but in my experience it is better to leave it on and adjust the grammar instead.

Now let's write the code. We will use a very simple transformer to traverse the tree. It starts at the outermost branches and works its way in until it reaches the root node, and our job is to tell it how to transform each branch. If everything goes well, it will always start from the outermost branches! Let's look at the actual implementation.

    import operator as op
    from plyplus import STransformer

    class Calc(STransformer):

        def _bin_operator(self, exp):
            # tail holds: left operand, operator symbol, right operand
            arg1, operator_symbol, arg2 = exp.tail

            # op.truediv works on both Python 2 and 3 (the operands are floats)
            operator_func = {'+': op.add, '-': op.sub,
                             '*': op.mul, '/': op.truediv}[operator_symbol]

            return operator_func(arg1, arg2)

        number      = lambda self, exp: float(exp.tail[0])
        neg         = lambda self, exp: -exp.tail[0]
        __default__ = lambda self, exp: exp.tail[0]

        add = _bin_operator
        mul = _bin_operator

Each method corresponds to a rule. If no method exists for a rule, the __default__ method is called. We left out start, add_symbol, and mul_symbol, because all they need to do is return their only child, which is what __default__ does.
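The dispatch described above (one method per rule, with `__default__` as the fallback) can be sketched without PlyPlus. Everything here is an assumption made for illustration: `Branch` is a stand-in for the library's tree nodes, and `MiniTransformer` only imitates the behaviour described in this article, not the real `STransformer`.

```python
import operator as op
from collections import namedtuple

Branch = namedtuple('Branch', ['head', 'tail'])   # stand-in for a parse-tree branch


class MiniTransformer(object):
    def transform(self, node):
        if not isinstance(node, Branch):
            return node                            # a token: nothing to transform
        # Children first, so each method sees already-computed values in tail.
        done = Branch(node.head, [self.transform(child) for child in node.tail])
        method = getattr(self, done.head, None) or self.__default__
        return method(done)

    def __default__(self, exp):
        return exp.tail[0]


class Calc(MiniTransformer):
    def _bin_operator(self, exp):
        arg1, symbol, arg2 = exp.tail
        funcs = {'+': op.add, '-': op.sub, '*': op.mul, '/': op.truediv}
        return funcs[symbol](arg1, arg2)

    def number(self, exp):
        return float(exp.tail[0])

    def neg(self, exp):
        return -exp.tail[0]

    add = _bin_operator
    mul = _bin_operator


# Hand-built tree for '1 + 2 * -3'
tree = Branch('start', [
    Branch('add', [
        Branch('number', ['1']),
        Branch('add_symbol', ['+']),
        Branch('mul', [
            Branch('number', ['2']),
            Branch('mul_symbol', ['*']),
            Branch('neg', [Branch('number', ['3'])]),
        ]),
    ]),
])

print(Calc().transform(tree))   # -5.0
```

Here `start`, `add_symbol` and `mul_symbol` have no method of their own, so they fall through to `__default__` and simply pass their only child upward.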

I use float() to parse the numbers. That is the lazy way, but I could have done it with the parser as well.

To keep the code clean and tidy, I use the operator module. For example, operator.add is basically lambda x, y: x + y.
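A quick check of that equivalence, using only the standard library. The table name `OPERATORS` and the helper `add_by_hand` are assumptions for this sketch; `operator.truediv` is chosen for division because it exists on both Python 2 and 3.

```python
import operator as op

# Map each symbol to the function that implements it.
OPERATORS = {'+': op.add, '-': op.sub, '*': op.mul, '/': op.truediv}

add_by_hand = lambda x, y: x + y
print(op.add(1.5, 2) == add_by_hand(1.5, 2))   # both compute 1.5 + 2
print(OPERATORS['/'](7.0, 2.0))                # 3.5
```

Looking the function up in a dictionary keeps the calculator free of if/elif chains: supporting another binary operator is one more entry in the table.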

OK. Now run this code to check the result.

    >>> Calc().transform(g.parse('1 + 2 * -(-3+2) / 5.6 + 30'))
    31.357142857142858

What about eval?

    >>> eval('1 + 2 * -(-3+2) / 5.6 + 30')
    31.357142857142858

Success :)

Last step: REPL

For a bit of polish, let's wrap it in a nice calculator REPL:

    def main():
        calc = Calc()
        while True:
            try:
                s = raw_input('> ')    # Python 2; use input() on Python 3
            except EOFError:
                break
            if s == '':
                break
            # calc_grammar is the Grammar object built from the full grammar
            tree = calc_grammar.parse(s)
            print(calc.transform(tree))

The complete code is available here:

https://github.com/erezsh/plyplus/blob/master/examples/calc.py