This note summarizes my understanding of Sections 2.3-2.5 of the Dragon Book, which introduce many of the basic parsing concepts used in a compiler front end. The next note will build on this one and implement a demo syntax-directed translator that converts simple expressions to postfix notation (the original book gives a Java version; I will give a logically equivalent Python implementation).
1. Parsing (Syntax analysis)
In simple terms, the task of parsing is to check whether an input string of symbols (usually the tokens produced by lexical analysis) follows the rules given by the context-free grammar of a language, where the rules are given as a series of productions.
If the analysis succeeds, it constructs an abstract syntax tree for the program (Abstract Syntax Tree, http://en.wikipedia.org/wiki/Abstract_syntax_tree); otherwise it reports an error such as "Syntax error".
The program that performs parsing is called a parser. Its job is to decide whether the input token stream can be derived from the start symbol of the language's grammar and, if so, how it is derived.
In principle, a parser must be able to construct a parse tree for the input string (although in practice a compiler does not build the full parse tree while parsing; it builds a more concise abstract syntax tree instead).
Parsers have two common implementations:
1) Top-down parsing
A top-down parser constructs the parse tree starting from the root node and working gradually toward the leaf nodes. This method makes it easier to write an efficient parser by hand.
Common terms associated with this type of parser are inherited attributes and LL parser.
2) Bottom-up parsing
A bottom-up parser starts from the leaf nodes and works gradually toward the root node. This approach can handle a wider range of grammars and translation schemes, so software tools that generate a parser directly from a grammar usually employ bottom-up methods.
Common terms associated with this type of parser are synthesized attributes and LR parser.
For a summary of the parsers currently used in industry (such as LL parsers and LR parsers), refer to Wikipedia's parser entry.
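The two strategies can be contrasted on one small grammar. The sketch below is only an illustration, assuming the Dragon Book style grammar "list -> list + digit | list - digit | digit"; the function names parse_top_down and parse_bottom_up are made up for this note. The top-down version first removes the left recursion and then predicts from the root downward; the bottom-up version shifts tokens onto a stack and reduces toward the root.

```python
DIGITS = set("0123456789")
OPS = {"+", "-"}


def parse_top_down(tokens):
    """Recursive descent: start at the root (list) and work toward the leaves.

    Left recursion is removed first (assumed rewrite of the grammar):
        list -> digit rest
        rest -> + digit rest | - digit rest | (empty)
    """
    pos = 0

    def digit():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in DIGITS:
            pos += 1
        else:
            raise SyntaxError(f"digit expected at token {pos}")

    def rest():
        nonlocal pos
        # The tail-recursive call "rest" is written as a loop.
        while pos < len(tokens) and tokens[pos] in OPS:
            pos += 1
            digit()

    digit()
    rest()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at {pos}")
    return True


def parse_bottom_up(tokens):
    """Shift-reduce: start at the leaves and reduce toward the root (list).

    Returns the productions used, in the order the reductions happen.
    """
    stack, reductions = [], []
    for tok in tokens:
        if tok not in DIGITS and tok not in OPS:
            raise SyntaxError(f"unknown token {tok!r}")
        stack.append(tok)  # shift
        if tok in DIGITS:
            if len(stack) >= 3 and stack[-3] == "list" and stack[-2] in OPS:
                op = stack[-2]
                stack[-3:] = ["list"]  # reduce by list -> list op digit
                reductions.append(f"list -> list {op} digit")
            elif len(stack) == 1:
                stack[:] = ["list"]  # reduce by list -> digit
                reductions.append("list -> digit")
    if stack != ["list"]:
        raise SyntaxError("input cannot be derived from 'list'")
    return reductions
```

Calling parse_bottom_up(list("9-5+2")) returns the reductions in the order they were applied, which is the reverse of a rightmost derivation; parse_top_down(list("9-5+2")) simply returns True, and both raise SyntaxError on input such as "9-".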
2. Syntax-directed translation (SDT)
Syntax-directed translation is a translation process driven by the grammar: rules or program fragments are attached to the productions of the grammar, and the input string is translated by carrying out the resulting series of semantic actions. SDT thus provides an easy way to attach semantic actions to a grammar.
SDT has two important related concepts: the syntax-directed definition (SDD) and the syntax-directed translation scheme (SDTS).
2.1 Syntax-directed definition (SDD)
An SDD is a context-free grammar with two additions. First, attributes (such as a symbol's data type, the number of instructions, or the location of instructions) are attached to each grammar symbol. Second, semantic rules are attached to the productions; these rules define how to compute the attribute values of the symbols that appear in each production. An SDD is conventionally written with each production on the left and the semantic rules attached to it on the right.
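As a rough illustration of how an SDD is evaluated, the sketch below attaches a synthesized attribute t (the postfix form) to every node of the parse tree for "9-5+2". It assumes the grammar "list -> list + digit | list - digit | digit" and semantic rules in the style of the Dragon Book's infix-to-postfix example (list.t = list1.t || digit.t || op); the Node class and the helper names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str                                    # grammar symbol or terminal lexeme
    children: list = field(default_factory=list)
    t: str = ""                                    # synthesized attribute: postfix string


def evaluate(node):
    """Compute t bottom-up: children first, then the rule for this production."""
    for child in node.children:
        evaluate(child)
    kids = node.children
    if node.symbol == "digit":
        # digit -> 0 | 1 | ... | 9        rule: digit.t = the digit itself
        node.t = kids[0].symbol
    elif node.symbol == "list" and len(kids) == 1:
        # list -> digit                   rule: list.t = digit.t
        node.t = kids[0].t
    elif node.symbol == "list":
        # list -> list1 op digit          rule: list.t = list1.t || digit.t || op
        node.t = kids[0].t + kids[2].t + kids[1].symbol
    return node.t


def digit_node(ch):
    """Helper (hypothetical): the subtree for digit -> ch."""
    return Node("digit", [Node(ch)])


# Parse tree for 9-5+2
tree = Node("list", [
    Node("list", [
        Node("list", [digit_node("9")]),
        Node("-"),
        digit_node("5"),
    ]),
    Node("+"),
    digit_node("2"),
])

postfix = evaluate(tree)  # "95-2+"
```

Note that the rules only say what each attribute equals; the evaluation order (here, children before parents) is chosen by the evaluator, not written into the definition.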
2.2 Syntax-directed translation scheme (SDTS)
An SDTS is also a context-free grammar with an addition: program fragments called semantic actions are embedded in the production bodies. A syntax-directed translation scheme is similar to a syntax-directed definition, but it explicitly specifies the order in which the semantic rules are evaluated. The semantic actions are conventionally written in braces at the position in the production body where they are to be executed.
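A translation scheme maps naturally onto a hand-written parser: each embedded action becomes a statement at the matching position in the parsing code. The sketch below assumes a scheme of the form "rest -> + digit {print('+')} rest | - digit {print('-')} rest | (empty)" and "digit -> 0 {print('0')} | ... | 9 {print('9')}"; the function name translate is made up for this note, and output is collected in a list instead of being printed so that the result can be inspected.

```python
def translate(text):
    """Translate an infix expression over single digits, + and - into postfix."""
    out = []
    pos = 0

    def digit():
        nonlocal pos
        if pos < len(text) and text[pos] in "0123456789":
            out.append(text[pos])  # action {print(digit)}
            pos += 1
        else:
            raise SyntaxError(f"digit expected at position {pos}")

    digit()
    # rest -> op digit {print(op)} rest, with the tail call written as a loop
    while pos < len(text) and text[pos] in ("+", "-"):
        op = text[pos]
        pos += 1
        digit()
        out.append(op)  # the action sits after digit in the production body
    if pos != len(text):
        raise SyntaxError(f"unexpected character at position {pos}")
    return "".join(out)
```

Here the position of out.append(op), after the call to digit(), is exactly what fixes the evaluation order; moving it before digit() would produce a different translation.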
2.3 Differences and connections between SDD and SDTS
The lecture notes on the Dragon Book from a Stanford CS course describe the difference between the two concepts as follows:
SDDs are useful for specifying translations.
SDTSs are useful for implementing translations.
That is, a syntax-directed definition (SDD) specifies the semantic rules attached to the productions, while a syntax-directed translation scheme (SDTS) carries out the semantic actions at the right moments, usually while the input token stream is being derived from the productions.
According to Wikipedia's article on syntax-directed translation, an SDD is easiest to implement when parsing is bottom-up and the SDD uses an S-attributed grammar. In that case, a syntax-directed translation scheme can be constructed by placing each semantic action at the far right end of its production body, so that the action is executed when the parser reduces by that production. An SDT in which every semantic action sits at the right end of a production body is called a postfix translation scheme.
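To make the idea of "actions at the right end, executed on reduction" concrete, here is a minimal sketch of a shift-reduce evaluator. It assumes an S-attributed scheme such as "list -> list1 + digit { list.val = list1.val + digit.val }" with a single synthesized attribute val; the function name and the (symbol, value) stack layout are assumptions for illustration, not taken from the book.

```python
def evaluate_bottom_up(text):
    """Shift-reduce over single digits, + and -, computing val at each reduction."""
    stack = []  # entries are (symbol, val) pairs
    for ch in text:
        # shift
        if ch in "0123456789":
            stack.append(("digit", int(ch)))
        elif ch in ("+", "-"):
            stack.append((ch, None))
        else:
            raise SyntaxError(f"unknown character {ch!r}")
        # reduce as soon as a complete production body is on top of the stack
        if stack[-1][0] == "digit":
            if (len(stack) >= 3 and stack[-3][0] == "list"
                    and stack[-2][0] in ("+", "-")):
                (_, left), (op, _), (_, right) = stack[-3:]
                # action at the right end of list -> list1 op digit
                val = left + right if op == "+" else left - right
                stack[-3:] = [("list", val)]
            elif len(stack) == 1:
                # action at the right end of list -> digit: list.val = digit.val
                stack[0] = ("list", stack[0][1])
    if len(stack) != 1 or stack[0][0] != "list":
        raise SyntaxError("input cannot be derived from 'list'")
    return stack[0][1]
```

Because every action only reads the attributes of symbols already on the stack, no separate tree walk is needed; this is why S-attributed definitions pair so well with bottom-up parsing.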
3. Concrete syntax tree (CST) and abstract syntax tree (AST)
A concrete syntax tree (CST) is simply a parse tree. For example, Chapter 2 of the Dragon Book gives the parse tree for the expression "9-5+2" under productions of the form "list -> list + digit".
In a concrete syntax tree, the interior nodes represent nonterminals and the leaf nodes are all terminals; read from left to right, the leaves form the input string derived from the corresponding productions.
An abstract syntax tree (AST) is a data structure in which, for an expression, each interior node represents an operator and the node's children represent the operands of that operator. Section 2.5.1 of the Dragon Book gives the abstract syntax tree for the same expression "9-5+2".
Compared with the CST, the AST omits many of the auxiliary symbols that appear in the CST. This makes the AST much more concise, and it is clearly more efficient for the compiler to process.
It is worth noting that the CST is only a conceptual syntax tree. In principle it guarantees that the compiler parses the source file unambiguously, but in an actual parser it is too redundant, so the AST, a simplified version of the CST, is the data structure that most compilers actually build when parsing.
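The difference in size is easy to see on "9-5+2". The sketch below writes both trees as nested tuples, assuming the grammar "list -> list + digit | list - digit | digit"; the tuple layout and the helper names count_nodes and to_ast are choices made for this note only.

```python
# CST: (nonterminal, child, child, ...); leaves are terminal strings.
cst = ("list",
       ("list",
        ("list", ("digit", "9")),
        "-",
        ("digit", "5")),
       "+",
       ("digit", "2"))

# AST: (operator, left operand, right operand); leaves are operand strings.
ast_tree = ("+", ("-", "9", "5"), "2")


def count_nodes(tree):
    """Count all nodes of a tree in either representation."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(count_nodes(child) for child in tree[1:])


def to_ast(node):
    """Collapse a CST into an AST by dropping the auxiliary nonterminals."""
    if isinstance(node, str):
        return node
    _symbol, *kids = node
    if len(kids) == 1:             # list -> digit, or digit -> '9'
        return to_ast(kids[0])
    left, op, right = kids         # list -> list op digit
    return (op, to_ast(left), to_ast(right))


cst_size = count_nodes(cst)        # 11 nodes
ast_size = count_nodes(ast_tree)   # 5 nodes
```

The six nodes that disappear are exactly the "list" and "digit" nonterminals, which carry no information that the operators and operands do not already express.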
For more discussion of CST and AST, see the StackOverflow question "What's the difference between an Abstract Syntax Tree and a Concrete Syntax Tree?" or the article "Abstract vs. Concrete Syntax Trees".
"references"
1. Dragon Book, Sections 2.3-2.5
2. Stanford lecture notes for the Dragon Book: Syntax-Directed Definitions
3. Wikipedia: Types of parsers
4. Wikipedia: Attribute grammar (inherited attributes and synthesized attributes)
5. Wikipedia: Syntax-directed translation
6. StackOverflow: What's the difference between an Abstract Syntax Tree and a Concrete Syntax Tree?
7. Abstract vs. Concrete Syntax Trees