1. BNF Definition
2. Expression Parsing
3. Suffix Expression
4. Suffix Expression to Intermediate Code
5. Intermediate Code Representation
1. BNF Definition
I would rather not dwell on theory, but some of it cannot be avoided. To parse an expression we need to know its BNF definition, which makes the parser straightforward to write. A BNF definition is easy to understand at a glance:
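exp_additive -> exp_multiplicative { ("+" | "-") exp_multiplicative }
exp_multiplicative -> exp_cast { ("*" | "/" | "%") exp_cast }
exp_cast -> ...
Meaning (braces { } mark a part that may repeat zero or more times):
An addition expression is a multiplication expression, optionally followed by any number of "+" or "-" operators, each followed by another multiplication expression.
A multiplication expression is a type conversion (cast) expression, optionally followed by any number of "*", "/" or "%" operators, each followed by another type conversion expression.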
...
Once we know the BNF definition of the entire C language, we can write the parser by following it directly. A BNF definition of C can be found at the following link:
http://lists.canonical.org/pipermail/kragen-hacks/1999-October/000201.html
2. Expression Parsing
With the BNF definition above, the parsing code can be written like this:
void exp_additive() {
    char op;
    exp_multiplicative();
    /* keep going while the next token is + or - */
    while ((op = operator('+')) ||
           (op = operator('-'))) {
        get_token();
        exp_multiplicative();
        ...
    }
}
void exp_multiplicative() {
    char op;
    exp_cast();
    /* keep going while the next token is *, / or % */
    while ((op = operator('*')) ||
           (op = operator('/')) ||
           (op = operator('%'))) {
        get_token();
        exp_cast();
        ...
    }
}
The process is as follows:
A. exp_additive first calls exp_multiplicative.
B. It then checks whether the next token is + or -. If so, it consumes the operator and calls exp_multiplicative again, repeating until no + or - follows.
This is how an addition expression is parsed. If you wonder why an expression can be parsed this way, consider an example:
a = a * b + c * d;
Its syntax tree should look like this:
(Fig. 4.2 syntax tree)
The chain of recursive calls is in effect the process of constructing the syntax tree. However, we do not actually build the tree; we save an equivalent form instead, the suffix expression. In fact, the suffix expression is the post-order traversal of the syntax tree.
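The claim that the suffix expression is the post-order traversal of the syntax tree can be checked with a small sketch. Everything named here (struct node, post_order, example_suffix) is hypothetical and only for illustration; the tree for a = a * b + c * d is built by hand rather than by the parser.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type: a leaf has no children, an operator has two. */
struct node {
    char sym;
    const struct node *left, *right;
};

/* Append the post-order traversal of t to buf and return the new length. */
size_t post_order(const struct node *t, char *buf, size_t len, size_t cap)
{
    if (t == NULL)
        return len;
    len = post_order(t->left, buf, len, cap);   /* left subtree first   */
    len = post_order(t->right, buf, len, cap);  /* then right subtree   */
    assert(len + 1 < cap);
    buf[len++] = t->sym;                        /* the node itself last */
    buf[len] = '\0';
    return len;
}

/* Build the tree for a = a * b + c * d by hand and traverse it. */
size_t example_suffix(char *buf, size_t cap)
{
    struct node a1 = {'a', NULL, NULL}, a2 = {'a', NULL, NULL};
    struct node b = {'b', NULL, NULL}, c = {'c', NULL, NULL}, d = {'d', NULL, NULL};
    struct node mul1 = {'*', &a2, &b}, mul2 = {'*', &c, &d};
    struct node add = {'+', &mul1, &mul2};
    struct node assign = {'=', &a1, &add};
    return post_order(&assign, buf, 0, cap);
}
```

Calling example_suffix with a 16-byte buffer fills it with "aab*cd*+=": each operator appears right after its two operands.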
3. Suffix Expression
What is a suffix (postfix) expression? Starting from the example, the expression above is converted into the following suffix expression:
a a b * c d * + =
Why write it in such a strange form? It is not done on a whim: the reason becomes clear once you read the expression from left to right.
a
a
b
*    On reading *, multiply the two preceding operands, a and b.
c
d
*    On reading *, multiply the two preceding operands, c and d.
+    On reading +, add the two preceding results, a * b and c * d.
=    On reading =, assign the preceding result to a.
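The left-to-right scan above behaves like a small stack machine: operands are pushed, and each operator pops the two values before it and pushes the result. The following is a minimal sketch of that idea, not code from this project. eval_postfix is a hypothetical helper, operands are limited to single decimal digits, and assignment and division by zero are not handled.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper: evaluate a suffix expression whose operands are
 * single decimal digits, e.g. "2 3 * 4 5 * +". Spaces are ignored. */
int eval_postfix(const char *s)
{
    int stack[64];
    int top = 0;                          /* number of values on the stack */

    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s))
            continue;
        if (isdigit((unsigned char)*s)) {
            assert(top < 64);
            stack[top++] = *s - '0';      /* operand: push it */
            continue;
        }
        /* operator: pop the two preceding operands, push the result */
        assert(top >= 2);
        int rhs = stack[--top];
        int lhs = stack[--top];
        switch (*s) {
        case '+': stack[top++] = lhs + rhs; break;
        case '-': stack[top++] = lhs - rhs; break;
        case '*': stack[top++] = lhs * rhs; break;
        case '/': stack[top++] = lhs / rhs; break;
        case '%': stack[top++] = lhs % rhs; break;
        default:  assert(!"unknown operator");
        }
    }
    assert(top == 1);                     /* exactly one result remains */
    return stack[0];
}
```

For instance, eval_postfix("2 3 * 4 5 * +") returns 26, that is 2 * 3 + 4 * 5, without any need for parentheses or precedence rules.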
To generate a suffix expression, we need to modify the parsing functions above:
void exp_additive() {
    char op;
    exp_multiplicative();
    while ((op = operator('+')) ||
           (op = operator('-'))) {
        get_token();
        exp_multiplicative();
        exp_opr(op);    /* push the operator onto the stack */
    }
}
void exp_multiplicative() {
    char op;
    exp_cast();
    while ((op = operator('*')) ||
           (op = operator('/')) ||
           (op = operator('%'))) {
        get_token();
        exp_cast();
        exp_opr(op);    /* push the operator onto the stack */
    }
}
After parsing is complete, a suffix expression will have been built up in our stack. With the suffix form of the expression in hand, we can easily generate the intermediate code.
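To see the two functions produce a suffix expression end to end, here is a self-contained sketch under simplifying assumptions. The names to_postfix, accept_op, operand and emit are hypothetical and not part of this project. Operands are single letters or digits standing in for exp_cast, the output is a string rather than a stack, and accept_op both tests for and consumes the operator, so no separate get_token call is needed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *src;     /* remaining input */
static char out[128];       /* suffix expression being built */
static size_t out_len;

static void emit(char c)
{
    assert(out_len + 1 < sizeof out);
    out[out_len++] = c;
    out[out_len] = '\0';
}

static void skip_spaces(void)
{
    while (isspace((unsigned char)*src))
        src++;
}

/* Consume c and return it if it is the next token, otherwise return 0. */
static char accept_op(char c)
{
    skip_spaces();
    if (*src != c)
        return 0;
    src++;
    return c;
}

/* Stand-in for exp_cast(): a single letter or digit operand. */
static void operand(void)
{
    skip_spaces();
    assert(isalnum((unsigned char)*src));
    emit(*src++);
}

static void multiplicative(void)
{
    char op;
    operand();
    while ((op = accept_op('*')) || (op = accept_op('/')) || (op = accept_op('%'))) {
        operand();
        emit(op);           /* the operator goes after both of its operands */
    }
}

static void additive(void)
{
    char op;
    multiplicative();
    while ((op = accept_op('+')) || (op = accept_op('-'))) {
        multiplicative();
        emit(op);
    }
}

/* Hypothetical entry point: convert an infix expression to suffix form. */
const char *to_postfix(const char *infix)
{
    src = infix;
    out_len = 0;
    out[0] = '\0';
    additive();
    return out;
}
```

With this sketch, to_postfix("a * b + c * d") returns "ab*cd*+". Emitting the operator only after the second operand has been parsed is what turns the recursive descent into a post-order walk.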
4. Suffix Expression to Intermediate Code
First, let's explain what form our intermediate code takes. Here we call it a ternary expression, because every line has the same fixed shape (it is close to what compiler texts call three-address code). For the expression a = a * b + c * d; from the example above, the intermediate code should end up looking like this:
@1 = a * b;
@2 = c * d;
@3 = @1 + @2;
@4 = a = @3;
All the variables starting with @ are generated by us. Once generated, the intermediate code greatly simplifies the later stages. Because it has a fixed structure, we no longer need to parse the source program; instead, the final execution code is generated from the intermediate code. Note that the execution code mentioned here is not real machine-executable code but a sequence of commands that my software can interpret. It is in fact very close to assembly code. However, our goal is to parse and execute, not to generate assembly, so a simple command sequence is enough.
We first parse the expression and generate its suffix form, then produce the intermediate code from it. For the expression "a = a * b + c * d;" the suffix form is "a a b * c d * + =". The process of generating intermediate code based on this suffix is as follows:
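One way to carry out this step is sketched below. It is an illustration under stated assumptions rather than this project's code: postfix_to_code is a hypothetical helper, operands are single characters, and the result is written as text instead of being stored in a data structure. Operands are pushed on a stack; each operator pops two of them, emits one line with a fresh @n temporary, and pushes that temporary back.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: translate a suffix expression made of single-character
 * operands and the operators + - * / % = into lines of the form
 * "@n = x op y;" written to out. Returns the number of temporaries created. */
int postfix_to_code(const char *postfix, char *out, size_t out_size)
{
    char stack[32][8];      /* operand names: a variable or a temporary "@n" */
    int top = 0;
    int temps = 0;
    size_t used = 0;

    assert(out_size > 0);
    out[0] = '\0';
    for (const char *p = postfix; *p != '\0'; p++) {
        if (*p == ' ')
            continue;
        if (strchr("+-*/%=", *p) == NULL) {
            /* operand: push its name */
            assert(top < 32);
            snprintf(stack[top++], sizeof stack[0], "%c", *p);
            continue;
        }
        /* operator: combine the two operands on top of the stack */
        assert(top >= 2);
        temps++;
        int n = snprintf(out + used, out_size - used, "@%d = %s %c %s;\n",
                         temps, stack[top - 2], *p, stack[top - 1]);
        assert(n > 0 && (size_t)n < out_size - used);
        used += (size_t)n;
        top -= 2;
        /* the new temporary replaces the two operands it consumed */
        snprintf(stack[top++], sizeof stack[0], "@%d", temps);
    }
    return temps;
}
```

Given "a a b * c d * + =", the sketch creates four temporaries and writes the same four lines as the listing above, from @1 = a * b; through @4 = a = @3;.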
5. Intermediate Code Representation
typedef struct _code code_t;
typedef struct _code *pcode_t;
struct _code {
    char opr;           /* operator */
    struct {
        int i, n, t;
    } lab;              /* label information */
    v_t var[4];         /* operands; v_t is defined elsewhere */
    code_t *next;       /* next node in the list */
};
It is a linked list in which each node stores one line such as "@1 = a * b;". opr is the operator, here "*"; lab marks the node as a label, which is left for later sections; var holds the operands, here "@1", "a" and "b".
In this way, once an expression has been parsed, a linked list has been built that represents its intermediate code.
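Building such a list could look like the sketch below. The definition of v_t is not shown in this section, so it is assumed here to be a plain string pointer. code_append and code_free are hypothetical helpers, the order of operands within var is a guess, and the lab field is left zeroed.

```c
#include <assert.h>
#include <stdlib.h>

typedef const char *v_t;    /* assumption: the real v_t is not shown here */

typedef struct _code code_t;
struct _code {
    char opr;
    struct { int i, n, t; } lab;
    v_t var[4];
    code_t *next;
};

/* Hypothetical helper: append a node for "dst = lhs opr rhs" to the end of
 * the list and return the new node. */
code_t *code_append(code_t **head, char opr, v_t dst, v_t lhs, v_t rhs)
{
    code_t *node = calloc(1, sizeof *node);
    assert(node != NULL);
    node->opr = opr;
    node->var[0] = dst;
    node->var[1] = lhs;
    node->var[2] = rhs;
    node->var[3] = NULL;
    node->next = NULL;

    while (*head != NULL)           /* walk to the end of the list */
        head = &(*head)->next;
    *head = node;
    return node;
}

/* Release every node in the list. */
void code_free(code_t *head)
{
    while (head != NULL) {
        code_t *next = head->next;
        free(head);
        head = next;
    }
}
```

Starting from an empty list (code_t *head = NULL;), three calls such as code_append(&head, '*', "@1", "a", "b") would build the nodes for @1, @2 and @3 in order. Walking to the tail on every append keeps the sketch short; a real implementation would more likely keep a tail pointer.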