background:
Recent projects need to do their own parsing and evaluation of Excel formulas, and in Java you can use POI to parse Excel formulas and then evaluate them. However, the project needs to support the parsing of the formula in both the JS and Java backend, so it is necessary to write a set. In fact, the formula parser is generally not complex, the principle of using inverse Polish expression can be.
Difficulty:
1. For complex user input environment analytic formula, need to pay attention to the formula writing nonstandard, case, space and so on, even the formula error judgment.
2. Need to solve the function extension, function execution and other issues.
3. Address, address range fetch, evaluation problem need to be resolved.
4. Handle the precedence raised by the parentheses.
5. Solve the problem of nesting knowledge.
6. Financial decimal Precision to resolve the 0.3–0.2! = 0.1 issue that occurs with the IEEE 754 standard.
7. Solve the circular reference problem, that is, the formula chain loop problem.
theory and principle:
1. The user input is parsed as a suffix expression based on the inverse Polish expression.
2. Abstract operands and operators, operand (operand) operators (operator) are put into two stacks respectively.
3. Abstract the function into operator, and the address and address range are abstracted to operand.
on the inverse of Polish expression, in the data structure and compiling principles have been rotten, briefly introduced:
The inverse Polish expression was an expression expressed by Polish logics in 1929. Our traditional expression operators are typically between two numbers (the first is limited to the two-dollar operator). Instead, the operator of the inverse Polish expression is behind the number. Let's look at a simple example to find out:
Before conversion: 1 + 2–3 * 4 after conversion: 1, 2, +, 3, 4, *,-
Pros: Inverse Polish expressions are ideal for machine execution and can mask the elevation of the operator's precedence by blocking out parentheses.
General algorithm:
The general analytic algorithm of inverse Polish expressions is based on simple arithmetic expressions, which is the basis for our formula parsing and execution:
1. Build two stacks of operand (operand stacks) and operator (Operation Fu Yi).
2. Scan the given string, if you get a number, extract (scan is a one, be sure to extract a full number) number (the following is replaced with operand), and then put the operand into the operand stack.
3. If you get an operator (such as + or *, with B instead), you need to compare it to the top element of the operator stack (in a substitution):
1) If a does not exist, the B is pressed into the operator stack;
2) If B is an opening parenthesis, then the priority comparison of A and B is ignored, and B is pressed into the operator stack.
3) If B is a closing parenthesis, then stack the operator stack sequentially, then press the popup element order into the operand stack until the top of the stack pops up with an opening parenthesis, and the brackets are not in the operand stack.
4) If A is an opening parenthesis, press B directly into the operator stack.
5) If B priority is compared to a high, press B directly into the operator stack.
6) If B priority is less than or equal to the priority of a, then put a out stack and then press into the operand stack, repeat this step until the top of the stack priority is higher than the priority of B or the top of the stack is a parenthesis.
4. After the scan is complete, put the elements of the operator stack out of the stack sequentially, then press into the operand stack.
algorithm Features:
1. Using two stacks to construct the suffix expression, the operator stack ignores parentheses, always high-priority operator at the top of the stack.
2. The parentheses have the lowest priority, and the precedence is defined.
3. In the end we only have a operator stack, from the stack low sequence.
4. Priority comparisons are key, and priorities relate to the order of entry and exit stacks and the final result.
Complexity of formula parsing:
compared to traditional arithmetic expressions, the parsing of Excel class formulas is more complex:
1. There are many operands and operators to support: not just numeric operands and mathematical operators, but also comparison operators, functions, logical operands, string operands, addresses (such as a1,a2 in Excel), and so on.
2. The composition of the formula is more complex, the scanning difficulty: the scanning need to extract numbers (may be scientific notation), words (such as functions, addresses, address range), there may be a unary operator (such as take non!), the need to prevent changes in demand is the most pit programmer.
3. The same operator represents different meanings: operator overloading is compared to, for example, "-", which may be a "minus" or "symbol", or, for example, how many operands the function requires when executing function functions, and how the address range (A1:A10) performs the evaluation.
4. Calculation accuracy, this is the most troublesome (Java and JS that the pit father's 0.1 problem), both to ensure efficiency, but also to ensure the accuracy (the world is unfair AH).
Summary:
Using the inverse Polish expression to parse the formula, we need to make minor changes to the rectification algorithm, increasing the range of operands and operators, not just within the range of arithmetic expressions. This requires us to write a regular expression (or lexical analysis) to extract the full operand, the operator, or even to overload the operator based on the context (for example, whether it is a subtraction or a symbol). Come here today, tomorrow. Improved algorithms and code.
PS: Reprint please indicate the source.
Formula parser based on inverse Polish expression-algorithm and ideas (i)