[Dragon Book Notes] Introduction to the basic concepts of syntax analysis

Source: Internet
Author: User

These notes record my understanding of Sections 2.3-2.5 of the Dragon Book (longshu). They introduce the basic concepts of syntax analysis in a compiler front end. The next note will build on this one to implement a demo syntax-directed translator that converts simple expressions to postfix form (the book's example is in Java; I will give a Python implementation with the same logic).

1. Syntax Analysis
Put simply, the task of syntax analysis is to decide whether an input string of symbols (usually the tokens produced by lexical analysis) follows the rules of a language, where the rules are given as a context-free grammar made up of a series of productions.
If the analysis succeeds, the parser constructs the abstract syntax tree of the program; otherwise, it reports an error such as "syntax error".
The program that performs syntax analysis is called a parser. Its task is to determine whether the input token stream can be derived from the start symbol of the grammar and, if so, how.
In principle, a parser must be able to construct a parse tree for the input string (in practice, a compiler usually does not build the complete parse tree; it builds the more concise abstract syntax tree instead).
Parsers are commonly implemented in one of two ways:
1) Top-down parsing
A top-down parser builds the parse tree starting from the root and works toward the leaves. Efficient parsers of this kind are relatively easy to write by hand.
Terms commonly associated with this kind of parser include inherited attributes and LL parsers.
2) Bottom-up parsing
A bottom-up parser starts from the leaves and works up toward the root. It can handle a larger class of grammars and translation schemes, so software tools that generate parsers directly from a grammar usually use the bottom-up method.
Terms commonly associated with this kind of parser include synthesized attributes and LR parsers.
For an overview of the parsers in common use (such as LL and LR parsers), see the parser entries on Wikipedia.
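
As a minimal sketch of the top-down approach (my own illustration, not code from the book), the following hand-written recursive-descent recognizer checks strings against the grammar list -> list + digit | list - digit | digit. A recursive-descent parser cannot use a left-recursive production directly, so the sketch assumes the equivalent form list -> digit rest, rest -> + digit rest | - digit rest | ε. The function name parse_list and the restriction to single-character tokens with no whitespace are assumptions made for the example.

```python
def parse_list(s):
    """Return True if s can be derived from `list`; raise SyntaxError otherwise.

    Assumed grammar (left recursion removed):
        list -> digit rest
        rest -> + digit rest | - digit rest | (empty)
    """
    pos = 0  # index of the lookahead symbol

    def peek():
        return s[pos] if pos < len(s) else None

    def digit():
        nonlocal pos
        c = peek()
        if c is not None and c in "0123456789":
            pos += 1  # match the digit and advance
        else:
            raise SyntaxError(f"syntax error at position {pos}: expected a digit")

    def rest():
        nonlocal pos
        # The tail-recursive production `rest` is written as a loop.
        while peek() in ("+", "-"):
            pos += 1
            digit()

    digit()
    rest()
    if pos != len(s):
        raise SyntaxError(f"syntax error at position {pos}: unexpected {s[pos]!r}")
    return True
```

Calling parse_list("9-5+2") succeeds, while an input such as "9-+2" raises SyntaxError, which matches the accept-or-report-error behaviour described above.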

2. Syntax-Directed Translation (SDT)
Syntax-directed translation is a process in which an input string is translated by carrying out a series of semantic actions; this is done by attaching rules or program fragments to the productions of a grammar. SDT thus provides a simple way to attach semantic actions to a grammar.
SDT involves two important concepts: the syntax-directed definition (SDD) and the syntax-directed translation scheme (SDTS).

2.1 Syntax-Directed Definition (SDD)
An SDD is a context-free grammar with two additions: attributes are attached to its grammar symbols (for example, a symbol's data type, its value, the number of instructions, or the location of an instruction), and semantic rules are attached to its productions. The semantic rules define how the attribute values of the symbols in a production are computed. An SDD is typically written with each production on the left and its semantic rules on the right.
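
To make this concrete, here is a small sketch in the spirit of the Dragon Book's infix-to-postfix example. It assumes a string-valued synthesized attribute t and a hand-built parse tree; the tuple encoding of the tree and the function name postfix_attr are hypothetical choices of mine, not something the book prescribes.

```python
# Assumed SDD (production on the left, semantic rule on the right):
#   expr -> expr1 + term    expr.t = expr1.t || term.t || '+'
#   expr -> expr1 - term    expr.t = expr1.t || term.t || '-'
#   expr -> term            expr.t = term.t
#   term -> digit           term.t = the digit itself
#
# A parse-tree node is a tuple (nonterminal, child, ...); a leaf is a string.

def postfix_attr(node):
    """Evaluate the synthesized attribute t of a parse-tree node."""
    if isinstance(node, str):            # terminal: a single digit
        return node
    label, *children = node
    if label == "expr" and len(children) == 3:
        left, op, right = children       # expr -> expr1 op term
        return postfix_attr(left) + postfix_attr(right) + op
    if len(children) == 1:               # expr -> term, term -> digit
        return postfix_attr(children[0])
    raise ValueError(f"no semantic rule for node {node!r}")

# Parse tree for "9-5+2", written out by hand.
tree = ("expr",
        ("expr",
         ("expr", ("term", "9")),
         "-",
         ("term", "5")),
        "+",
        ("term", "2"))

postfix = postfix_attr(tree)  # "95-2+"
```

Note that the SDD only says what each attribute equals; the traversal order used to evaluate it is left to the implementation (here, a simple recursive walk).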

2.2 Syntax-Directed Translation Scheme (SDTS)
An SDTS is also a context-free grammar with additions: program fragments called semantic actions are embedded in the production bodies. A translation scheme is similar to a syntax-directed definition, except that it explicitly specifies the order in which the semantic rules are evaluated. By convention, each semantic action is written in braces at the position in the production body where it is to be executed.
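
The sketch below shows one way a translation scheme can be turned into code: each embedded action becomes a statement placed at the matching point of a recursive-descent parser. The scheme in the comment follows the general shape of the book's infix-to-postfix translator, but the function name translate, the use of an output list instead of printing, and the single-character tokens are my assumptions.

```python
# Assumed translation scheme (actions in braces):
#   expr -> term rest
#   rest -> + term { emit('+') } rest
#         | - term { emit('-') } rest
#         | (empty)
#   term -> digit { emit(digit) }

def translate(s):
    """Translate an infix string of digits, '+' and '-' to postfix."""
    out = []   # collected output of the semantic actions
    pos = 0    # index of the lookahead symbol

    def term():
        nonlocal pos
        if pos < len(s) and s[pos] in "0123456789":
            out.append(s[pos])           # action { emit(digit) }
            pos += 1
        else:
            raise SyntaxError(f"syntax error at position {pos}")

    term()
    while pos < len(s) and s[pos] in ("+", "-"):   # production `rest` as a loop
        op = s[pos]
        pos += 1
        term()
        out.append(op)                   # action { emit(op) } runs after term
    if pos != len(s):
        raise SyntaxError(f"syntax error at position {pos}")
    return "".join(out)
```

For example, translate("9-5+2") returns "95-2+". The position of each out.append call mirrors the position of the action in the production body, which is exactly the ordering information that a translation scheme adds to an SDD.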

2.3 Differences and connections between SDD and SDTS
The lecture notes on the Dragon Book from a Stanford CS course describe the difference between the two concepts as follows:
SDDs are useful for specifying translations.
SDTSs are useful for implementing translations.

That is, a syntax-directed definition (SDD) specifies the semantic rules attached to the productions, while a syntax-directed translation scheme (SDTS) carries out the corresponding semantic actions at the appropriate points (usually while the input token stream is being parsed according to the productions).
According to Wikipedia's article on syntax-directed translation, an SDD is easiest to implement when the parser is bottom-up and the SDD is S-attributed. In this case, a translation scheme can be constructed by placing each semantic action at the right end of its production body; the action is executed when the production is reduced. An SDT in which every semantic action sits at the right end of a production body is called a postfix translation scheme.
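
A rough sketch of this idea follows. It is a hand-written shift-reduce loop specialized to the single grammar list -> list + digit | list - digit | digit, not a general LR parser, and it assumes a numeric synthesized attribute val. The name shift_reduce_eval and the (symbol, value) stack layout are hypothetical.

```python
# Assumed postfix translation scheme (every action at the right end):
#   list -> list1 + digit   { list.val = list1.val + digit.val }
#   list -> list1 - digit   { list.val = list1.val - digit.val }
#   list -> digit           { list.val = digit.val }

def shift_reduce_eval(s):
    """Parse s bottom-up and return list.val for the whole input."""
    stack = []  # entries are (grammar symbol, attribute value) pairs
    for ch in s:
        # Shift the next input symbol onto the stack.
        if ch in "0123456789":
            stack.append(("digit", int(ch)))
        elif ch in ("+", "-"):
            stack.append((ch, None))
        else:
            raise SyntaxError(f"syntax error: unexpected {ch!r}")

        symbols = [sym for sym, _ in stack]
        if symbols == ["digit"]:
            # Reduce by list -> digit; its action runs now.
            stack = [("list", stack[0][1])]
        elif (len(symbols) == 3 and symbols[0] == "list"
              and symbols[1] in ("+", "-") and symbols[2] == "digit"):
            # Reduce by list -> list1 op digit; its action runs now.
            (_, left), (op, _), (_, right) = stack
            stack = [("list", left + right if op == "+" else left - right)]
        elif not (len(symbols) == 2 and symbols[0] == "list"
                  and symbols[1] in ("+", "-")):
            raise SyntaxError("syntax error")

    if [sym for sym, _ in stack] != ["list"]:
        raise SyntaxError("syntax error: incomplete input")
    return stack[0][1]
```

With this sketch, shift_reduce_eval("9-5+2") evaluates to 6. Because every attribute is synthesized, each value needed by an action is already on the stack at the moment of the reduction, which is why the S-attributed case fits bottom-up parsing so naturally.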

3. Concrete Syntax Tree (CST) and Abstract Syntax Tree (AST)
The concrete syntax tree (CST) is simply the parse tree. An example is the parse tree for the expression "9-5+2" in Section 2.2.3 of the Dragon Book, built with productions such as "list -> list + digit".

In a concrete syntax tree, the interior nodes represent nonterminals and the leaves are terminals. Read from left to right, the leaves form the input string derived from the corresponding productions.
The abstract syntax tree (AST) is a data structure. In the AST of an expression, each interior node represents an operator, and the children of an interior node represent the operands of that operator. An example is the abstract syntax tree for the expression "9-5+2" in Section 2.5.1 of the Dragon Book.

Compared with the CST, the AST omits many auxiliary symbols, which makes it much more compact and more efficient for the compiler to process.
It is worth noting that the CST is largely a conceptual tree: in principle, it shows how the compiler parses the source file unambiguously. In practice, however, the CST is too redundant, and its simplified form, the AST, is the data structure that most compilers actually build during syntax analysis.
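
The difference in size can be seen in a small sketch that writes out both trees for "9-5+2" and converts one into the other. The tuple encoding and the helper names cst_to_ast and count_nodes are my own assumptions; the grammar is list -> list + digit | list - digit | digit with digit -> 0 | ... | 9.

```python
# CST node: (nonterminal, child, ...); CST leaf: a string (terminal).
# AST node: (operator, left, right);   AST leaf: an int (operand).

cst = ("list",
       ("list",
        ("list", ("digit", "9")),
        "-",
        ("digit", "5")),
       "+",
       ("digit", "2"))

def cst_to_ast(node):
    """Drop the nonterminal nodes and keep only operators and operands."""
    label, *children = node
    if label == "digit":                 # digit -> 0 | ... | 9
        return int(children[0])
    if len(children) == 1:               # list -> digit
        return cst_to_ast(children[0])
    left, op, right = children           # list -> list op digit
    return (op, cst_to_ast(left), cst_to_ast(right))

def count_nodes(tree):
    """Count every node, interior or leaf, in either kind of tree."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(count_nodes(child) for child in tree[1:])

ast = cst_to_ast(cst)                    # ('+', ('-', 9, 5), 2)
sizes = (count_nodes(cst), count_nodes(ast))   # (11, 5)
```

Even for this three-operand expression, the CST has 11 nodes and the AST only 5, because the list and digit nodes carry no information that the operators and operands do not already express.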
For more about CSTs and ASTs, see the StackOverflow question "What is the difference between an Abstract Syntax Tree and a Concrete Syntax Tree?" or the article "Abstract vs. Concrete Syntax Trees".

[References]
1. Dragon Book, Sections 2.3-2.5
2. Stanford lecture notes for the Dragon Book: Syntax-Directed Definitions
3. Wikipedia: Types of parsers
4. Wikipedia: Attribute grammar (inherited attributes and synthesized attributes)
5. Wikipedia: Syntax-directed translation
6. StackOverflow: What is the difference between an Abstract Syntax Tree and a Concrete Syntax Tree?
7. Abstract vs. Concrete Syntax Trees
