Introduction to abstract syntax trees
(i) Introduction
An abstract syntax tree (AST) is a tree representation of the abstract syntactic structure of source code, in which each node denotes a construct occurring in the source. It is "abstract" because it does not represent every detail of the concrete syntax. For example, nested parentheses are implied by the shape of the tree and do not appear as nodes. An abstract syntax tree also does not depend on the grammar of the source language, that is, on the context-free grammar used during the parsing stage. When a grammar is written for a parser, it often has to undergo equivalence-preserving transformations (eliminating left recursion, backtracking, ambiguity, and so on). These transformations introduce superfluous components into the parse result, which adversely affect later stages and can even make their processing confusing. For this reason, many compilers build the abstract syntax tree independently of the parsing grammar, giving a clean interface between the front end and the back end.
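Python's standard `ast` module offers a quick way to observe the point about parentheses. The minimal sketch below (Python is chosen here only for illustration; `tree_of` is a hypothetical helper) compares the dumped trees of expressions that differ only in their parentheses:

```python
import ast

def tree_of(expr: str) -> str:
    """Dump the AST of a Python expression (source positions are omitted by default)."""
    return ast.dump(ast.parse(expr, mode="eval"))

# Redundant parentheses leave no trace in the tree...
print(tree_of("1 + 2") == tree_of("((1 + 2))"))        # True
# ...while parentheses that change the grouping change the tree's shape.
print(tree_of("(1 + 2) * 3") == tree_of("1 + 2 * 3"))  # False
```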
Abstract syntax trees are widely used in many fields, such as browsers, smart editors, and compilers.
(ii) Examples of abstract syntax trees
(1) Arithmetic expression
Expression: 1 + 3 * (4 - 1) + 2
The abstract syntax tree is:
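The same tree can also be written out in code. The minimal Python sketch below builds it by hand from two hypothetical node classes, `Num` and `BinOp`, assuming `+` is left-associative and `*` binds tighter than `+`, and prints it with each child indented under its parent:

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"

Node = Union[Num, BinOp]

# 1 + 3 * (4 - 1) + 2, grouped as ((1 + (3 * (4 - 1))) + 2).
# The parentheses survive only as the position of the `-` subtree.
tree = BinOp("+",
             BinOp("+", Num(1), BinOp("*", Num(3), BinOp("-", Num(4), Num(1)))),
             Num(2))

def show(node: Node, depth: int = 0) -> str:
    """Render one node per line, children indented two spaces under their parent."""
    pad = "  " * depth
    if isinstance(node, Num):
        return f"{pad}{node.value}\n"
    return f"{pad}{node.op}\n" + show(node.left, depth + 1) + show(node.right, depth + 1)

print(show(tree), end="")
# +
#   +
#     1
#     *
#       3
#       -
#         4
#         1
#   2
```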
(2) XML
Code Listing 2.1:
<letter>
  <address>
    <city>ShiChuang</city>
  </address>
  <people>
    <id>12478</id>
    <name>Nosic</name>
  </people>
</letter>
Abstract syntax tree:
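For a textual view of this tree, a minimal Python sketch can parse Code Listing 2.1 with the standard `xml.etree.ElementTree` module and print one line per element. The helper `outline` is hypothetical, and the sketch assumes that whitespace-only text between tags is layout and can be ignored. Angle brackets and closing tags never show up as nodes:

```python
import xml.etree.ElementTree as ET
from typing import List

LETTER = """<letter>
  <address>
    <city>ShiChuang</city>
  </address>
  <people>
    <id>12478</id>
    <name>Nosic</name>
  </people>
</letter>"""

def outline(elem: ET.Element, depth: int = 0) -> List[str]:
    """One line per element: its tag, plus its text when the text is not just whitespace."""
    text = (elem.text or "").strip()
    lines = ["  " * depth + elem.tag + (f": {text}" if text else "")]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(LETTER)
print("\n".join(outline(root)))
# letter
#   address
#     city: ShiChuang
#   people
#     id: 12478
#     name: Nosic
```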
(3) Program 1
Code Listing 2.2:
while b != 0
{
    if a > b
        a = a - b
    else
        b = b - a
}
return a
Abstract syntax tree:
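A real parser produces the same kind of tree. The sketch below assumes Code Listing 2.2 transcribed into Python and wrapped in a function named `gcd` (a name chosen for this sketch) so that `return` is legal. It uses the standard `ast` module to parse it and prints the `While` subtree:

```python
import ast

# Assumption: Code Listing 2.2 transcribed into Python, wrapped in a hypothetical
# function `gcd` so the final `return` is legal.
SOURCE = """
def gcd(a, b):
    while b != 0:
        if a > b:
            a = a - b
        else:
            b = b - a
    return a
"""

module = ast.parse(SOURCE)
func = module.body[0]      # ast.FunctionDef for gcd
loop, ret = func.body      # ast.While and ast.Return; braces, colons and indentation are gone

print(ast.dump(loop, indent=2))   # the `indent` argument needs Python 3.9+

# The tree is complete enough to compile and run directly.
namespace = {}
exec(compile(module, "<gcd>", "exec"), namespace)
print(namespace["gcd"](12, 18))   # 6
```

Like the original pseudocode, this subtraction-based loop assumes positive inputs. For example, `gcd(0, 5)` never terminates.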
(4) Program 2
Code Listing 2.3:
sum = 0
for i in range(0, 100)
    sum = sum + i
end
Abstract syntax tree:
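Once a program is in tree form, later stages only walk nodes and never look at keywords. The minimal Python sketch below illustrates this with a handful of hypothetical node classes (`Num`, `Var`, `Add`, `Assign`, `ForRange`), builds Code Listing 2.3 as such a tree, and runs it with a small tree-walking interpreter. `range(0, 100)` is assumed to exclude its upper bound, as in Python:

```python
from dataclasses import dataclass
from typing import Dict, List, Union

@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class Add:
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Var, Add]

@dataclass
class Assign:
    target: str
    value: Expr

@dataclass
class ForRange:
    var: str
    start: Expr
    stop: Expr              # exclusive upper bound, as in range(0, 100)
    body: List["Stmt"]

Stmt = Union[Assign, ForRange]

def eval_expr(e: Expr, env: Dict[str, int]) -> int:
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    return eval_expr(e.left, env) + eval_expr(e.right, env)

def run(stmts: List[Stmt], env: Dict[str, int]) -> None:
    """Execute statements by walking the tree; `env` maps variable names to values."""
    for s in stmts:
        if isinstance(s, Assign):
            env[s.target] = eval_expr(s.value, env)
        else:  # ForRange
            for i in range(eval_expr(s.start, env), eval_expr(s.stop, env)):
                env[s.var] = i
                run(s.body, env)

# Code Listing 2.3 as a tree: `for`, `in`, `end` and the parentheses are not nodes.
program = [
    Assign("sum", Num(0)),
    ForRange("i", Num(0), Num(100), [Assign("sum", Add(Var("sum"), Var("i")))]),
]
env: Dict[str, int] = {}
run(program, env)
print(env["sum"])  # 4950
```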
(iii) Why abstract syntax trees are needed
A source program is parsed under the guidance of the grammar rules of its programming language. Grammar rules describe how the language's syntactic constructs are composed, and the grammar of a programming language can usually be described precisely by a context-free grammar or the equivalent Backus-Naur Form (BNF). Context-free grammars used for parsing fall into several classes, such as LL(1), LR(0), LR(1), LR(k), and LALR(1). Each class imposes its own requirements. For example, an LL(1) grammar must be unambiguous and free of left recursion. Converting a grammar into LL(1) form therefore requires introducing additional grammar symbols and productions.
For example, a grammar for arithmetic expressions is:
Grammar 1.1
E -> T | E A T
T -> F | T M F
F -> (E) | i
A -> + | -
M -> * | /
After conversion to LL(1):
Grammar 1.2
E -> T E'
E' -> A T E' | ε
T -> F T'
T' -> M F T' | ε
F -> (E) | i
A -> + | -
M -> * | /
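To see how a parser can follow Grammar 1.2 yet still produce an abstract syntax tree, consider the minimal recursive-descent sketch below, written in Python. It has one method per nonterminal. It assumes that the terminal `i` stands for an unsigned integer literal, and all class and function names are hypothetical. The helper nonterminals E' and T' exist only as methods: the operand parsed so far is passed into them, so they never become tree nodes, and `-` and `/` stay left-associative despite the left recursion having been removed:

```python
import operator
import re
from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"

Node = Union[Num, BinOp]

def tokenize(text: str) -> List[str]:
    """Split into integers (the terminal i) and single-character symbols."""
    tokens = re.findall(r"\d+|[-+*/()]", text)
    if "".join(tokens) != "".join(text.split()):
        raise SyntaxError(f"unexpected character in {text!r}")
    return tokens

class Parser:
    """Recursive descent over Grammar 1.2: one method per nonterminal."""

    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def parse(self) -> Node:
        node = self.E()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return node

    def E(self) -> Node:                     # E  -> T E'
        return self.E_prime(self.T())

    def E_prime(self, left: Node) -> Node:   # E' -> A T E' | ε
        if self.peek() in ("+", "-"):
            op = self.take()
            return self.E_prime(BinOp(op, left, self.T()))
        return left                          # ε: no node is created

    def T(self) -> Node:                     # T  -> F T'
        return self.T_prime(self.F())

    def T_prime(self, left: Node) -> Node:   # T' -> M F T' | ε
        if self.peek() in ("*", "/"):
            op = self.take()
            return self.T_prime(BinOp(op, left, self.F()))
        return left

    def F(self) -> Node:                     # F  -> (E) | i
        tok = self.take()
        if tok == "(":
            node = self.E()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node                      # the parentheses produce no node
        if tok.isdigit():
            return Num(int(tok))
        raise SyntaxError(f"unexpected token {tok!r}")

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def evaluate(node: Node):
    if isinstance(node, Num):
        return node.value
    return OPS[node.op](evaluate(node.left), evaluate(node.right))

tree = Parser("1 + 3 * (4 - 1) + 2").parse()
print(evaluate(tree))  # 12
```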
Suppose, for example, that a new language is first described with an LL(1) grammar. The compiler front end builds an LL(1) syntax tree, and the back end processes that tree to generate bytecode or assembly code. As the project grows and more features are added to the language, the LL(1) grammar proves very limiting and becomes hard to write. So the grammar rules are rewritten as an LR(1) grammar, and the front end now builds an LR(1) syntax tree. At this point things go badly: the existing back end was written to process LL(1) trees, so its code must be modified as well.
The first property of an abstract syntax tree is that it does not depend on a specific grammar. Whether the parser uses an LL(1) grammar, an LR(1) grammar, or some other method, it constructs the same syntax tree. This gives the compiler back end a clear, uniform interface. Even if the front end switches to a different grammar, only the front-end code needs to change and the back end is unaffected. This reduces the workload and also improves the maintainability of the compiler.
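The following minimal Python sketch illustrates this architecture. It uses two unrelated front ends: a hand-written precedence-climbing parser, and an adapter over Python's own `ast.parse`, which stands in for "some other parsing method". Both produce the same hypothetical `Num`/`BinOp` tree. A single back end consumes that tree and emits instructions for a made-up stack machine (`PUSH`, `ADD`, and the other opcodes are illustrative, not a real instruction set). The precedence-climbing front end assumes well-formed input:

```python
import ast
import re
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"

Node = Union[Num, BinOp]

# ---- Front end 1: hand-written precedence climbing (assumes well-formed input) ----
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse_climbing(text: str) -> Node:
    toks = re.findall(r"\d+|[-+*/()]", text)
    pos = 0

    def primary() -> Node:
        nonlocal pos
        tok = toks[pos]
        pos += 1
        if tok == "(":
            node = climb(1)
            if toks[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return node
        return Num(int(tok))

    def climb(min_prec: int) -> Node:
        nonlocal pos
        left = primary()
        while pos < len(toks) and toks[pos] in PREC and PREC[toks[pos]] >= min_prec:
            op = toks[pos]
            pos += 1
            # PREC[op] + 1 makes operators of equal precedence left-associative.
            left = BinOp(op, left, climb(PREC[op] + 1))
        return left

    return climb(1)

# ---- Front end 2: Python's own parser, translated into the shared tree ----
PY_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def parse_with_python(text: str) -> Node:
    def convert(n: ast.AST) -> Node:
        if isinstance(n, ast.Constant) and isinstance(n.value, int):
            return Num(n.value)
        if isinstance(n, ast.BinOp) and type(n.op) in PY_OPS:
            return BinOp(PY_OPS[type(n.op)], convert(n.left), convert(n.right))
        raise SyntaxError(f"unsupported construct {type(n).__name__}")
    return convert(ast.parse(text, mode="eval").body)

# ---- Back end: sees only the shared tree, never the grammar ----
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def compile_to_stack(node: Node) -> List[str]:
    if isinstance(node, Num):
        return [f"PUSH {node.value}"]
    return compile_to_stack(node.left) + compile_to_stack(node.right) + [OPCODES[node.op]]

source = "1 + 3 * (4 - 1) + 2"
tree = parse_climbing(source)
assert tree == parse_with_python(source)   # same AST from two different front ends
print(compile_to_stack(tree))
# ['PUSH 1', 'PUSH 3', 'PUSH 4', 'PUSH 1', 'SUB', 'MUL', 'ADD', 'PUSH 2', 'ADD']
```

Swapping one front end for the other requires no change to `compile_to_stack`, which is the maintainability argument made above.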
The second property of an abstract syntax tree is that it does not depend on the details of a particular language. Among compilers, the famous GCC is a heavyweight: it can compile many languages, such as C, C++, Java, Ada, Objective-C, Fortran, Pascal, and COBOL. In GCC, the front end for each language performs lexical, syntactic, and semantic analysis and produces an abstract syntax tree. Intermediate code is then generated from this tree as the output handed to the back end. For this to work, the syntax tree must be constructed without relying on the details of any one language. For example, a statement of the form if-condition-then is written differently in different languages.
In C:
if (condition)
{
    do_something();
}
In Fortran:
if (condition) then
    call do_something()
end if
When constructing an abstract syntax tree for an if-condition-then statement, it is enough to use a node with two branches, one for the condition and one for the if body. Any parentheses or keywords that appear in the source program are discarded.
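As a rough illustration, the Python sketch below uses two toy "front ends" that map the C and Fortran snippets above onto the same language-neutral `If` node. Each has one condition branch and one body branch. These are not real parsers: each uses a regular expression that accepts only this exact statement shape, a bare identifier as the condition and a single argument-free call as the body. The `If` and `Call` classes are hypothetical. The Fortran side lower-cases names because Fortran keywords and identifiers are case-insensitive:

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class Call:
    name: str

@dataclass
class If:
    condition: str          # a bare identifier is enough for this sketch
    body: List[Call]

# Toy C front end: accepts only `if (cond) { f(); }`, with any whitespace.
C_IF = re.compile(r"if\s*\(\s*(\w+)\s*\)\s*\{\s*(\w+)\s*\(\s*\)\s*;\s*\}")

# Toy Fortran front end: accepts only `if (cond) then / call f() / end if`, any case.
FORTRAN_IF = re.compile(
    r"if\s*\(\s*(\w+)\s*\)\s*then\s+call\s+(\w+)\s*\(\s*\)\s*end\s*if",
    re.IGNORECASE,
)

def parse_c(src: str) -> If:
    m = C_IF.fullmatch(src.strip())
    if m is None:
        raise SyntaxError("unsupported C statement")
    return If(m.group(1), [Call(m.group(2))])

def parse_fortran(src: str) -> If:
    m = FORTRAN_IF.fullmatch(src.strip())
    if m is None:
        raise SyntaxError("unsupported Fortran statement")
    # Fortran names are case-insensitive, so normalize them to lower case.
    return If(m.group(1).lower(), [Call(m.group(2).lower())])

c_src = """
if (condition)
{
    do_something();
}
"""

fortran_src = """
if (condition) then
    call do_something()
end if
"""

# Braces, parentheses, `then` and `end if` are all gone; only the two branches remain.
print(parse_c(c_src) == parse_fortran(fortran_src))  # True
```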
Http://blog.chinaunix.net/uid-26750235-id-3139100.html