background:
Recent projects need to be finished by themselves Excel formulas are parsed and evaluated, and in Java you can use POI to parse an Excel formula and then evaluate it. However, the project needs to support the analysis of the formula JS and Java back-end, so it is necessary to write a set of their own. In fact, the formula parser is not complex on the whole. The principle is to use inverse Polish expressions.
Difficulty:
1. For complex user input environment analytic formula, need to pay attention to the formula writing nonstandard, uppercase and lowercase, space and so on, even the formula error inference.
2. Need to solve the function extension, function operation and other issues.
3. Address, address range fetch number, evaluation problem need to be resolved.
4. Handle the precedence raised by the parentheses.
5. Solve the problem of nesting knowledge.
6. Financial decimal precision solves the problem of 0.3–0.2! = 0.1 Appearing with the IEEE 754 standard.
7. Solve the circular reference problem, that is, the formula chain loop problem.
theory and principle:
1. Based on inverse Polish expression. Resolves user input to a suffix expression.
2. Abstract operands and operators, operand (operand) operators (operator) are put into two stacks respectively.
3. Abstract the function into operator. The address and address range are abstracted to operand.
on the inverse of Polish expressions, in the data structure and compiling principles have been rotten. Under Introduction:
The inverse Polish expression was an expression expressed by Polish logics in 1929.
Our traditional expression operators are typically between two numbers (the first is limited to the two-dollar operator). Instead, the operator of the inverse Polish expression is behind the number. Here's a simple example to see:
Before conversion: 1 + 2–3 * 4 after conversion: 1, 2, +, 3, 4, *,-
Advantage: Inverse Polish expression is good for machine operation. The elevation of the operator's precedence can be masked by blocking out parentheses.
General algorithm:
The general analytic algorithm of inverse Polish expression is based on simple arithmetic expression, which is the basis of our formula parsing and running:
1. Build two stacks of operand (operand stacks) and operator (Operation Fu Yi).
2. Scan the given string, assuming that a number is obtained, then extract (scan is a one-to-one, be sure to extract a full number) number (replaced with operand), and then push the operand into the operand stack.
3. Assuming that an operator (for example, + or *, the following is substituted with B), it needs to be compared to the top element of the operator stack (substituted with a):
1) If a does not exist, the B is pressed into the operator stack.
2) Assuming that B is an opening parenthesis, ignore the precedence of A and B and press B into the operator stack.
3) Suppose B is a closing parenthesis. Then put the operator stack sequence out of the stack, and then put the popup element sequence into the operand stack, until the top of the stack is the opening parenthesis, brackets are not in the operand stack.
4) Suppose A is an opening parenthesis. Press B directly into the operator stack.
5) Assume that B priority is higher than a. Press B directly into the operator stack.
6) Assume that the priority of B is less than or equal to a priority level. Then push a out of the stack and press it into the operand stack, repeating this step until the top priority of the stack is higher than the priority of B or the top of the stack is a parenthesis.
4. After the scan is complete. The elements of the operator stack are sequentially stacked and then pressed into the operand stack.
algorithm Features:
1. Using two stacks to construct the suffix expression, the operator stack ignores parentheses, always high-priority operator at the top of the stack.
2. The parentheses have the lowest priority, and the precedence is better customized.
3. At the end of the day we only have a operator stack, which can be operated from a low stack.
4. Priority is the key, priority is related to the order of entry and exit stack and finally result.
Complexity of formula parsing:
compared to traditional arithmetic expressions, the parsing of Excel class formulas is more complex:
1. The number of operands and operators that need to be supported: not only the number operand and the math operator, but also the comparison operator, function, logical operand. String operands, addresses (such as a1,a2 for Excel), and so on.
2. The composition of the formula is more complex. Scan Difficulty: The scan need to extract the number (may be scientific notation), words (such as functions, addresses, address range), there may also be a unary operator (for example, take a non!). The need to prevent changes in demand is the most pit of the ape's program.
3. The same operator represents a different meaning: operator overloading is more than the case. For example "-". Can be a "minus", or "symbol", or, for example, when running function functions. The number of operands required by the function. Address range (A1:A10) How the evaluation is run.
4. Calculation accuracy, this is the most troublesome (Java and JS that the pit father's 0.1 problem), both to ensure efficiency, but also to ensure the accuracy (the world is unfair AH).
Summary:
Using inverse Polish expression to parse the formula, we need to make minor changes to the rectification algorithm. Increases the range of operands and operators, not only within the range of arithmetic expressions.
This requires us to write the exact form (or lexical analysis) to extract the complete operand, the operator, and even to overload the operator based on the context (for example, whether it is a subtraction or a symbol). Come here today, tomorrow. Improved algorithms and code.
PS: Reprint please indicate the source.
Formula parser based on inverse Polish expression-algorithm and ideas (i)