[Compiler Principles] Chapter 2: A Simple Syntax-Directed Translator

Source: Internet
Author: User

1. Syntax definition

1) Grammar: defines and describes the structure of a language. A grammar (or syntax) is the formal description and specification of how the language is structured; it does not deal with semantics.

For example, "I am a college student" is a sentence that is correct in both syntax and semantics. The structure of a sentence (its syntactic structure) is determined by the grammar. In this example it is a subject-predicate structure.

2) Definition of a grammar

Grammar G = (VN, VT, P, Z)
VN: the set of nonterminals (syntactic variables)
VT: the set of terminals (lexical units, i.e. tokens)
P: the set of productions (rules)
Z: the start symbol (recognition symbol), Z ∈ VN

Example: if (expression) statement else statement

The keywords if and else and the parentheses: terminals (lexical units)

expression, statement: nonterminals

3) Derivation

A derivation starts from the start symbol (the recognition symbol). At each step the left side of a rule is replaced by its right side, and only one rule is applied per step. In other words, starting from the start symbol, a nonterminal is repeatedly replaced by the body of one of its productions.

<sentence> ::= <subject> <predicate>
<subject> ::= <pronoun> | <noun>
<pronoun> ::= you | I | he
<noun> ::= Wang Min | a college student | a worker | English
<predicate> ::= <verb> <direct object>
<verb> ::= am | study
<direct object> ::= <pronoun> | <noun>
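
The derivation process can be illustrated with a small sketch. The class `DerivationDemo` and its methods are hypothetical names introduced here, and the terminals are assumed to be rendered in English as "I", "am" and "a college student". Each step replaces the leftmost occurrence of one nonterminal with the body of one production, which is a leftmost derivation of the example sentence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: a leftmost derivation of "I am a college student".
class DerivationDemo {
    // Replace the leftmost occurrence of a nonterminal with a production body.
    static String step(String form, String nonterminal, String body) {
        int i = form.indexOf(nonterminal);
        if (i < 0) {
            throw new IllegalStateException(nonterminal + " not found in: " + form);
        }
        return form.substring(0, i) + body + form.substring(i + nonterminal.length());
    }

    // Returns every sentential form, beginning with the start symbol.
    static List<String> derive() {
        String[][] steps = {
            {"<sentence>", "<subject> <predicate>"},
            {"<subject>", "<pronoun>"},
            {"<pronoun>", "I"},
            {"<predicate>", "<verb> <direct object>"},
            {"<verb>", "am"},
            {"<direct object>", "<noun>"},
            {"<noun>", "a college student"},
        };
        List<String> forms = new ArrayList<>();
        String form = "<sentence>";
        forms.add(form);
        for (String[] s : steps) {
            form = step(form, s[0], s[1]); // exactly one rule per step
            forms.add(form);
        }
        return forms;
    }

    public static void main(String[] args) {
        for (String form : derive()) {
            System.out.println("=> " + form);
        }
    }
}
```

The last line printed is the sentence itself, which contains only terminals, so the derivation is complete.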

4) Syntax analysis (parsing)

Takes a string of terminals as input and works out how that string can be derived from the start symbol of the grammar. If the string cannot be derived from the start symbol, an error is reported.

5) Parse tree (syntax analysis tree)

A parse tree is a tree with the following properties:
1) the root is labeled by the start symbol;
2) each leaf is labeled by a terminal or by ε;
3) each interior node is labeled by a nonterminal;
4) if A is the label of an interior node and X1, X2, ..., Xn are the labels of its children from left to right, then A → X1 X2 ... Xn is a production. As a special case, if A → ε is a production, a node labeled A may have a single child labeled ε.

Example: 9-5+2

Grammar:

list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Nonterminals: list and digit; list is the start symbol of the grammar.

String of terminals: a sequence of zero or more terminals. The string of zero terminals is called the empty string.

Parse tree:

                  list
                /   |   \
            list    +    digit
          /   |   \        |
      list    -    digit   2
        |            |
      digit          5
        |
        9
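
As a rough sketch, the parse tree for 9-5+2 can be built as a data structure and checked against the properties listed above. `ParseTreeDemo`, `Node`, `build` and `leaves` are hypothetical names used only for illustration; the builder simply applies the left-recursive productions once per operator instead of using a real parsing algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: the parse tree of 9-5+2 for the list/digit grammar.
class ParseTreeDemo {
    // Interior nodes are labeled by a nonterminal, leaves by a terminal.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) children.add(k);
        }

        boolean isLeaf() { return children.isEmpty(); }
    }

    // digit -> 0 | 1 | ... | 9
    static Node digit(char c) {
        if (!Character.isDigit(c)) {
            throw new IllegalArgumentException("digit expected: " + c);
        }
        return new Node("digit", new Node(String.valueOf(c)));
    }

    // Builds the tree for inputs of the form digit (op digit)*, without spaces.
    static Node build(String input) {
        if (input.length() % 2 == 0) {
            throw new IllegalArgumentException("malformed input: " + input);
        }
        Node list = new Node("list", digit(input.charAt(0)));        // list -> digit
        for (int i = 1; i < input.length(); i += 2) {
            char op = input.charAt(i);
            if (op != '+' && op != '-') {
                throw new IllegalArgumentException("operator expected: " + op);
            }
            // list -> list + digit   or   list -> list - digit
            list = new Node("list", list, new Node(String.valueOf(op)),
                            digit(input.charAt(i + 1)));
        }
        return list;
    }

    // Concatenates the leaf labels from left to right.
    static String leaves(Node n) {
        if (n.isLeaf()) return n.label;
        StringBuilder sb = new StringBuilder();
        for (Node c : n.children) sb.append(leaves(c));
        return sb.toString();
    }

    public static void main(String[] args) {
        Node root = build("9-5+2");
        System.out.println(root.label + " derives " + leaves(root));
    }
}
```

Reading the leaves from left to right gives back the input string, which is what it means for the tree to generate that string.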

6) Ambiguity

A grammar is ambiguous if some string of terminals has more than one parse tree.

Example: the sentence id + id * id has two possible parse trees, corresponding to the groupings

(id + id) * id    and    id + (id * id)

Eliminating ambiguity:

① Rewrite the ambiguous grammar as an unambiguous grammar;
② Specify the precedence and associativity of the operators in the ambiguous grammar, so that only one parse tree is produced.

Advantages of an ambiguous grammar:
① It is easier to understand than the unambiguous grammar;
② Parsing is more efficient (shallower parse trees, fewer direct derivation steps).
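
A minimal sketch of approach ①, assuming the usual layered grammar expr → term { + term }, term → factor { * factor }, factor → digit | ( expr ). This grammar is not given in the notes above, and single digits stand in for id. Because * sits one level below +, an input such as 2 + 3 * 4 has only one possible grouping. `PrecedenceDemo` and `eval` are hypothetical names.

```java
// Hypothetical demo: an unambiguous grammar fixes precedence and associativity.
class PrecedenceDemo {
    private final String src;
    private int pos = 0;

    PrecedenceDemo(String src) {
        this.src = src.replace(" ", ""); // ignore spaces for simplicity
    }

    // Evaluates the whole input and rejects trailing characters.
    static int eval(String s) {
        PrecedenceDemo p = new PrecedenceDemo(s);
        int value = p.expr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return value;
    }

    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    // expr -> term { + term }      lowest precedence, left-associative
    private int expr() {
        int value = term();
        while (peek() == '+') {
            pos++;
            value += term();
        }
        return value;
    }

    // term -> factor { * factor }  binds tighter than +
    private int term() {
        int value = factor();
        while (peek() == '*') {
            pos++;
            value *= factor();
        }
        return value;
    }

    // factor -> digit | ( expr )
    private int factor() {
        char c = peek();
        if (Character.isDigit(c)) {
            pos++;
            return c - '0';
        }
        if (c == '(') {
            pos++;
            int value = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("')' expected at " + pos);
            }
            pos++;
            return value;
        }
        throw new IllegalArgumentException("digit or '(' expected at " + pos);
    }

    public static void main(String[] args) {
        System.out.println(eval("2 + 3 * 4"));   // prints 14
        System.out.println(eval("(2 + 3) * 4")); // prints 20
    }
}
```

The two results differ, which shows why the choice between the two parse trees of id + id * id matters for translation.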

3. Syntax-directed translation

1) Attribute: any quantity associated with a program construct. Attributes can take many forms, for example the data type of an expression, the number of instructions in the generated code, or the position of the first instruction in the code generated for a construct.

2) Translation scheme: a notation for attaching program fragments to the productions of a grammar. When a production is used during syntax analysis, the corresponding program fragment is executed.

3) Syntax-directed definition: 1. associates a set of attributes with each grammar symbol, and 2. associates a set of semantic rules with each production; the rules compute the values of the attributes associated with the symbols in that production.
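
To make the idea of an attribute and its semantic rules concrete, here is a small sketch in which every node carries one synthesized attribute `t`, the postfix form of the expression below it. The rule assumed for a binary node is t = left.t followed by right.t followed by the operator. The names `SddDemo`, `Digit` and `BinOp` are hypothetical.

```java
// Hypothetical demo: a synthesized attribute t holding the postfix notation.
class SddDemo {
    interface Node {
        String t(); // attribute value, computed from the children
    }

    static final class Digit implements Node {
        final char value;

        Digit(char value) { this.value = value; }

        // Semantic rule for a digit: t is the digit itself.
        public String t() { return String.valueOf(value); }
    }

    static final class BinOp implements Node {
        final Node left;
        final char op;
        final Node right;

        BinOp(Node left, char op, Node right) {
            this.left = left;
            this.op = op;
            this.right = right;
        }

        // Semantic rule for left op right: t = left.t || right.t || op
        public String t() { return left.t() + right.t() + op; }
    }

    public static void main(String[] args) {
        // (9 - 5) + 2
        Node tree = new BinOp(new BinOp(new Digit('9'), '-', new Digit('5')),
                              '+', new Digit('2'));
        System.out.println(tree.t()); // prints 95-2+
    }
}
```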


4. Syntax analysis

1) Syntax analysis: determines how a grammar can generate a given string of terminals. In principle, the parser must be able to construct a parse tree; otherwise the correctness of the translation cannot be guaranteed.

2) Syntax analysis methods fall into two classes: top-down analysis and bottom-up analysis.

3) Top-down analysis: construction of the tree starts at the root and proceeds toward the leaves.

4) Predictive analysis (a simple form of recursive-descent analysis): a top-down method that uses a set of recursive procedures to process the input, with the lookahead symbol deciding which procedure to apply next.


5. A translator for simple expressions

1) Abstract syntax tree: each interior node represents an operator (in a parse tree, by contrast, interior nodes represent nonterminals)

2) Translating an infix expression into a postfix expression:

    package demo_parser;

    import java.io.*;

    public class demo_parser {
        // Next input byte, held as an int (the character's ASCII code).
        static int lookahead;

        public demo_parser() throws IOException {
            // read() returns the next byte from standard input.
            lookahead = System.in.read();
        }

        // Emits the operand if it is a digit; letters are not recognized.
        void term() throws IOException {
            if (Character.isDigit((char) lookahead)) {
                System.out.write((char) lookahead);
                match(lookahead);
            } else throw new Error("syntax error");
        }

        // Consumes the expected symbol and reads the next one.
        void match(int t) throws IOException {
            if (lookahead == t) lookahead = System.in.read();
            else throw new Error("syntax error");
        }

        // expr -> term { (+ | -) term }, with the operator written after its operands.
        void expr() throws IOException {
            term();
            while (true) {
                if (lookahead == '+') {
                    match('+'); term(); System.out.write('+');
                } else if (lookahead == '-') {
                    match('-'); term(); System.out.write('-');
                } else return;
            }
        }

        public static void main(String[] args) throws IOException {
            demo_parser parser = new demo_parser();
            parser.expr();
            System.out.write('\n');
            System.out.flush();
        }
    }


6. Lexical analysis

1) The lexical analyzer reads characters from the input and groups them into "token objects". The sequence of input characters that makes up a single token is called a lexeme.

2) Removing white space and comments: far from trivial to implement

3) Reading ahead: for example, after reading "then" the analyzer has to read one more character. If that character is a space or some other character that cannot appear in an identifier, "then" is a keyword. Otherwise it is the beginning of an identifier (for example thenother).

Operators built from <, = and > (for example <= or <>) need the same one-character lookahead.

4) Recognizing keywords and identifiers: the lexical analyzer uses a table to hold strings
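
The points above can be sketched in a few lines. This is only an illustration under assumptions: the reserved-word set (if, then, else), the token spellings such as `KEYWORD(then)`, and the names `LexerDemo` and `scan` are all hypothetical. The string table is preloaded with the keywords, so a lexeme found there is a keyword and any other word is entered as an identifier.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo: lookahead plus a string table for keywords and identifiers.
class LexerDemo {
    // String table, preloaded with an assumed set of reserved words.
    static final Map<String, String> words = new HashMap<>();
    static {
        words.put("if", "KEYWORD");
        words.put("then", "KEYWORD");
        words.put("else", "KEYWORD");
    }

    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {          // skip white space
                i++;
            } else if (Character.isLetter(c)) {
                int start = i;
                // Keep reading until a character that cannot continue a word.
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                String lexeme = input.substring(start, i);
                // Unknown words are stored in the table as identifiers.
                String kind = words.computeIfAbsent(lexeme, k -> "ID");
                tokens.add(kind + "(" + lexeme + ")");
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add("NUM(" + input.substring(start, i) + ")");
            } else if (c == '<' || c == '>' || c == '=') {
                // One character of lookahead separates "<" from "<=" and "<>".
                char next = i + 1 < input.length() ? input.charAt(i + 1) : '\0';
                boolean twoChars = next == '=' || (c == '<' && next == '>');
                String lexeme = twoChars ? input.substring(i, i + 2) : String.valueOf(c);
                tokens.add("OP(" + lexeme + ")");
                i += lexeme.length();
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("if x <= 10 then thenother <> y"));
    }
}
```

In this input "then" is followed by a space and is reported as a keyword, while "thenother" is read to its end and reported as an identifier.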


7. Symbol tables

1) Symbol table: a data structure used by the compiler to hold information about the constructs of the source program. The information is collected incrementally during the analysis phases of the compiler and entered into the symbol table.

2) Symbol table entries: created and used during the analysis phase by the lexical analyzer, the syntax analyzer, and the semantic analyzer. Here the entries are created by the syntax analyzer.

3) A separate symbol table is set up for each scope. Its role is to carry information from the place where a name is declared to the places where the name is actually used.
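
One common way to realize a table per scope is to chain the tables, so that a lookup starts in the innermost scope and walks outward. The sketch below assumes that design; the class name `Env`, the field names, and the use of plain strings for types are illustrative choices, not something specified in these notes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: one symbol table per scope, linked to the enclosing scope.
class Env {
    private final Map<String, String> table = new HashMap<>(); // name -> declared type
    private final Env prev; // enclosing scope, or null for the outermost scope

    Env(Env prev) {
        this.prev = prev;
    }

    // Records a declaration in the current scope.
    void put(String name, String type) {
        table.put(name, type);
    }

    // Looks a name up in this scope first, then in the enclosing scopes.
    String get(String name) {
        for (Env e = this; e != null; e = e.prev) {
            String found = e.table.get(name);
            if (found != null) return found;
        }
        return null; // not declared in any visible scope
    }

    public static void main(String[] args) {
        Env outer = new Env(null);
        outer.put("x", "int");
        outer.put("y", "bool");

        Env inner = new Env(outer); // a nested block
        inner.put("x", "char");     // hides the outer x

        System.out.println(inner.get("x")); // prints char
        System.out.println(inner.get("y")); // prints bool
        System.out.println(outer.get("x")); // prints int
    }
}
```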


8. Intermediate code generation

1) Two kinds of intermediate representation: tree structures and linear representations (in particular, "three-address code")
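
The relationship between the two representations can be sketched by walking an expression tree and emitting one three-address instruction per operator. The temporary names t1, t2, ... and the classes `ThreeAddressDemo` and `Expr` are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: translating an expression tree into three-address code.
class ThreeAddressDemo {
    static final class Expr {
        final String op;   // operator, or null for a leaf
        final String name; // operand name, used only by leaves
        final Expr left;
        final Expr right;

        Expr(String name) {
            this.op = null;
            this.name = name;
            this.left = null;
            this.right = null;
        }

        Expr(Expr left, String op, Expr right) {
            this.op = op;
            this.name = null;
            this.left = left;
            this.right = right;
        }
    }

    private final List<String> code = new ArrayList<>();
    private int temps = 0;

    // Emits code for e and returns the name that holds its value.
    private String gen(Expr e) {
        if (e.op == null) return e.name;
        String l = gen(e.left);
        String r = gen(e.right);
        String t = "t" + (++temps); // fresh temporary for this operator
        code.add(t + " = " + l + " " + e.op + " " + r);
        return t;
    }

    static List<String> translate(Expr e) {
        ThreeAddressDemo g = new ThreeAddressDemo();
        g.gen(e);
        return g.code;
    }

    public static void main(String[] args) {
        // Tree for a + b * c
        Expr tree = new Expr(new Expr("a"), "+",
                             new Expr(new Expr("b"), "*", new Expr("c")));
        for (String line : translate(tree)) {
            System.out.println(line);
        }
    }
}
```

For the tree of a + b * c this prints `t1 = b * c` followed by `t2 = a + t1`: each instruction has at most one operator on its right side, and the nesting of the tree becomes a sequence.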










