Atitti. Abstract syntax tree (AST), postfix (suffix) expression, DAG, three-address code
The idea of an abstract syntax tree (AST) is that statements nested to any depth can be described as a tree. Abstract syntax trees admittedly make statement translation relatively easy, and they describe the relationships between statements and expressions well. However, because Neo Pascal does not explicitly construct an abstract syntax tree, the same job has to be done with other data structures. Based on previous experience, the stack is the obvious choice.
DAG (directed acyclic graph)
Postfix (suffix) expressions: Also known as reverse Polish notation, this form is simple, clear, and easy to store. For translating expressions it has clear advantages over other forms. However, because its range of application is narrow, it rarely serves on its own as the IR of a real compiler.
1. Postfix expression
A postfix expression contains no parentheses. Each operator follows its two operands, and all calculations are performed strictly from left to right, in the order in which the operators appear, so operator precedence rules no longer need to be considered. For example, (2 + 1) * 3 becomes 2 1 + 3 *.
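The left-to-right rule above can be sketched with a stack. This is a minimal illustration, not code from Neo Pascal; the function name eval_postfix and the space-separated token format are assumptions made for the example.

```python
def eval_postfix(expr):
    """Evaluate a space-separated postfix expression using a stack."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    stack = []
    for token in expr.split():
        if token in ops:
            # The operator applies to the two most recent operands.
            right = stack.pop()
            left = stack.pop()
            stack.append(ops[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


print(eval_postfix("2 1 + 3 *"))  # 9.0
```

No precedence table is needed: the position of each operator already fixes the order of evaluation.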
Author: ★ (Attilax) >>> nickname: old-Wow paw (full name: ATTILAX AKBAR AL RAPANUI, Attilax Al Rapa Nui; Kanji name: Ayron), Email: [email protected]
When reprinting, please cite the source: http://www.cnblogs.com/attilax/
1.1. Prefix, infix, and postfix notation
All three are notations for writing expressions. They differ in where the operator sits relative to its operands: in a prefix expression the operator precedes its operands, in an infix expression it sits between them, and in a postfix expression it follows them.
Example:
(3 + 4) × 5 - 6 is an infix expression.
- × + 3 4 5 6 is the prefix expression.
3 4 + 5 × 6 - is the postfix expression.
Infix expression (infix notation)
Infix notation is the usual way of writing arithmetic and logical formulas: the operator sits between its operands.
Infix expressions are easy for people to read and analyze, but they are awkward for a computer to process. To compute the value of an expression, it is therefore usual to convert the infix expression to prefix or postfix form first and then evaluate that. Evaluating a prefix or postfix expression is straightforward for a computer.
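The text does not say which conversion method to use. One common choice is Dijkstra's shunting-yard algorithm, sketched below under a few assumptions: tokens are already split, the input is well formed, only the four left-associative binary operators are supported, and the names PRECEDENCE and infix_to_postfix are made up for this example.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def infix_to_postfix(tokens):
    """Convert a list of infix tokens to postfix order (shunting-yard)."""
    output = []
    operators = []  # stack of pending operators and "("
    for token in tokens:
        if token in PRECEDENCE:
            # Pop operators that bind at least as tightly (left-associative).
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[token]):
                output.append(operators.pop())
            operators.append(token)
        elif token == "(":
            operators.append(token)
        elif token == ")":
            while operators[-1] != "(":
                output.append(operators.pop())
            operators.pop()  # discard the matching "("
        else:
            output.append(token)  # operand
    while operators:
        output.append(operators.pop())
    return output


print(" ".join(infix_to_postfix("( 3 + 4 ) * 5 - 6".split())))  # 3 4 + 5 * 6 -
```

The parentheses and the precedence table are consumed during conversion, which is why the resulting postfix form needs neither.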
Three-address code: Also known as "quadruples", each consisting of an operator and three operand addresses. This is the most common type of IR. Some books even treat IR (intermediate code) as synonymous with three-address code.
The "three addresses" are the three pieces of address information that each line of code usually contains: operand 1, operand 2, and the result operand. For example, the three-address line (add a,1,c) means a+1→c. At first glance this form looks somewhat like assembly language.
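As a rough illustration of the quadruple form, the sketch below turns a postfix token list into (operator, operand 1, operand 2, result) tuples. Only the add opcode appears in the text; the opcodes sub, mul, and div, the temporary names t1, t2, ..., and the function name postfix_to_quads are assumptions made for this example.

```python
def postfix_to_quads(tokens, target):
    """Emit (op, operand1, operand2, result) quadruples for a postfix expression."""
    # Hypothetical opcode names; only "add" is taken from the text.
    opcodes = {"+": "add", "-": "sub", "*": "mul", "/": "div"}
    quads = []
    stack = []
    for token in tokens:
        if token in opcodes:
            right = stack.pop()
            left = stack.pop()
            result = f"t{len(quads) + 1}"  # fresh temporary for this result
            quads.append((opcodes[token], left, right, result))
            stack.append(result)
        else:
            stack.append(token)
    if not quads:
        raise ValueError("expected at least one operator")
    # Store the final result directly in the target instead of a temporary.
    op, left, right, _ = quads[-1]
    quads[-1] = (op, left, right, target)
    return quads


print(postfix_to_quads("a 1 +".split(), "c"))  # [('add', 'a', '1', 'c')]
```

Each intermediate result gets its own temporary, so no quadruple overwrites one of its source operands.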
Of course, three-address code is not perfect: because it is relatively fragmented, it is less convenient than a syntax tree for analyzing the structure of the source program.
Three-address code is popular for the following reasons. It is a linear IR, and since both the input source program and the output target program are linear, a linear IR has a clear advantage over other forms. In addition, programmers tend to feel more at home with linear representations than with other representations, and compiler designers are no exception. Early compiler designers were often masters of assembly language programming and could read linear three-address code naturally and fluently. A linear representation also makes input and output easier to implement. With the advent of concepts such as compiler "ends" and "passes", an IR is more than a data structure held in memory; sometimes it also has to be dumped to a file that serves as an interface for other systems to read and use.
Why is it designed as a "three-address" form? This is the consensus reached after years of practical exploration by computer scientists. Three-address code is not the only linear IR, only the most common one. Two-address code and single-address code (that is, stack machine code) have both appeared in compiler technology and have been used in some applications, single-address code in particular.
Single-address code is still a widely used IR in modern compiler design. In recent years, with the growth of mixed languages, it has come back into view. Designers of mixed languages like single-address code because programs that process it are relatively simple to implement, which makes it easy to build the related interpreters or virtual machines. Java bytecode and .NET IL, which readers will be familiar with, are both single-address codes.
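To show why an interpreter for single-address (stack machine) code is simple to build, here is a minimal sketch. The instruction names PUSH, LOAD, STORE, ADD, and MUL are hypothetical and are not real Java bytecode or .NET IL opcodes; run_stack_code is likewise an illustrative name.

```python
def run_stack_code(program, variables):
    """Interpret a list of stack machine instructions; returns the variables."""
    stack = []
    for instruction in program:
        op = instruction[0]
        if op == "PUSH":        # push a constant
            stack.append(instruction[1])
        elif op == "LOAD":      # push the value of a variable
            stack.append(variables[instruction[1]])
        elif op == "STORE":     # pop the top of the stack into a variable
            variables[instruction[1]] = stack.pop()
        elif op in ("ADD", "MUL"):
            # Operands are implicit: both come from the stack.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return variables


# a + 1 -> c, the same computation as the quadruple (add a,1,c)
program = [("LOAD", "a"), ("PUSH", 1), ("ADD",), ("STORE", "c")]
print(run_stack_code(program, {"a": 41}))  # {'a': 41, 'c': 42}
```

Each instruction names at most one address, and the whole interpreter is a single loop over a stack.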
Three-address code was developed from two-address code. The drawback of two-address code is that an operation usually has a side effect on one of its source operands. The design was originally inspired by the x86 instruction set, but one important difference was overlooked: x86 instructions often use registers as scratch space, whereas scratch space is a tricky issue for two-address code. To solve this problem, a form was proposed that has no side effect on the source operands, namely three-address code. In a line of three-address code, no operation changes either of the two source operands. This is the main difference between three-address code and two-address code. The property is important because it lets the compiler reuse names and values more freely without worrying about side effects of the code.
Finally, let's talk about the level of an IR, that is, the degree to which the IR depends on the target machine. By level, IRs fall into three categories: high-level form (HIR), mid-level form (MIR), and low-level form (LIR), also known as high-level, mid-level, and low-level intermediate languages.
The high-level form (HIR) is an IR that preserves the structure of the source program as far as possible, so it retains the original semantic information of the source program well. Because the high-level form is so close to the structure of the source program, few compilers pass it to the back end as their only IR.
The mid-level form (MIR) is an IR that reflects the characteristics of the source language to some extent in a language-neutral way while also suiting many architectures. It is the more commonly used IR: it takes the characteristics of both the source language and the target machine into account and works with most optimization algorithms. When a compiler designs only one IR, the mid-level form is the preferred choice.
The low-level form (LIR) is an IR that incorporates some features of the target machine and sits only slightly above the target language; it is often used as the input to machine-dependent optimization algorithms. In practice, however, low-level forms are uncommon outside some larger compilers, because most compiler designers prefer to optimize directly on the target language.
Table 5-6 Translation scheme for the if statement
if statement:
  if <expression> then <statement 1> else <statement 2>
Translation scheme:
  <expression translation>
  (JNT, <expression result>, NULL, __L1)
  <statement 1>
  (JMP, __L2, NULL, NULL)
  (LABEL, __L1, NULL, NULL)
  <statement 2>
  (LABEL, __L2, NULL, NULL)
If the else part is omitted, simply omit lines 4 to 6 of the translation scheme and replace "__L2" in line 7 with "__L1". The main function of semantic068, semantic069, and semantic070 is to translate the input if statement according to this scheme. That is, these three subroutines together generate the statements marked in bold in the translation scheme. In the scheme above, "__L1" can be called the "false branch label" and "__L2" the "exit label". Note that when the input statement has the if-then form, the label in line 7 should be the false branch label rather than the exit label: because no real false branch exists, the false branch label can double as the exit label.
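The scheme in Table 5-6 can be sketched as a function that assembles the quadruples. This is an illustration only and not the semantic068 to semantic070 routines of Neo Pascal: the names make_labels and translate_if, the gt opcode, and the use of None for NULL are assumptions made for the example.

```python
import itertools


def make_labels():
    """Yield fresh label names __L1, __L2, ... (naming follows Table 5-6)."""
    return (f"__L{i}" for i in itertools.count(1))


def translate_if(labels, cond_code, cond_result, then_code, else_code=None):
    """Assemble quadruples for if-then or if-then-else; None stands for NULL."""
    false_label = next(labels)
    code = list(cond_code)
    # JNT: jump to the false branch label when the condition is not true.
    code.append(("JNT", cond_result, None, false_label))
    code.extend(then_code)
    if else_code is None:
        # if-then: the false branch label doubles as the exit label.
        code.append(("LABEL", false_label, None, None))
    else:
        exit_label = next(labels)
        code.append(("JMP", exit_label, None, None))
        code.append(("LABEL", false_label, None, None))
        code.extend(else_code)
        code.append(("LABEL", exit_label, None, None))
    return code


# if a > b then c := a + 1 else c := b + 1   ("gt" is a hypothetical opcode)
quads = translate_if(
    make_labels(),
    cond_code=[("gt", "a", "b", "t1")],
    cond_result="t1",
    then_code=[("add", "a", "1", "c")],
    else_code=[("add", "b", "1", "c")],
)
for quad in quads:
    print(quad)
```

Calling translate_if without else_code produces the shortened if-then form, ending with the false branch label.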
Translation scheme for the while statement
while statement:
  while <expression> do <statement>
Translation scheme:
  (LABEL, __L1, NULL, NULL)
  <expression translation>
  (JNT, <expression result>, NULL, __L0)
  <statement>
  (JMP, __L1, NULL, NULL)
  (LABEL, __L0, NULL, NULL)
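To check that the while scheme really loops, the sketch below builds the quadruples and runs them with a tiny interpreter. Everything beyond the JNT, JMP, LABEL, and add forms shown in the text is an assumption for this example: the lt opcode, the names translate_while and run_quads, integer literals as operands, and None for NULL.

```python
def translate_while(cond_code, cond_result, body_code,
                    loop_label="__L1", exit_label="__L0"):
    """Assemble quadruples for a while loop; None stands for NULL."""
    code = [("LABEL", loop_label, None, None)]
    code.extend(cond_code)
    code.append(("JNT", cond_result, None, exit_label))
    code.extend(body_code)
    code.append(("JMP", loop_label, None, None))
    code.append(("LABEL", exit_label, None, None))
    return code


def run_quads(quads, env):
    """Execute quadruples; supports LABEL, JMP, JNT, add and a hypothetical lt."""
    labels = {q[1]: i for i, q in enumerate(quads) if q[0] == "LABEL"}

    def value(operand):
        # Strings are variable names, anything else is a literal.
        return env[operand] if isinstance(operand, str) else operand

    pc = 0
    while pc < len(quads):
        op, a, b, result = quads[pc]
        if op == "JMP":
            pc = labels[a]
            continue
        if op == "JNT" and not env[a]:
            pc = labels[result]
            continue
        if op == "lt":
            env[result] = value(a) < value(b)
        elif op == "add":
            env[result] = value(a) + value(b)
        pc += 1  # LABEL and a JNT that does not jump fall through
    return env


# while i < 3 do i := i + 1
quads = translate_while([("lt", "i", 3, "t1")], "t1", [("add", "i", 1, "i")])
print(run_quads(quads, {"i": 0})["i"])  # 3
```

The loop label marks where the condition is re-evaluated, and the exit label is the target of the JNT once the condition fails.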
5.1.2 IR design and its level-51cto.com.html