Using Lex and Yacc (III)

Source: Internet
Author: User
Section 2.3.8 already touched on ambiguity and conflicts. They are treated in more detail here because they come up often when writing Yacc source programs. Ambiguity gives rise to conflicts. Section 2.3.8 showed that Yacc lets the user declare the precedence and associativity of operators to resolve conflicts caused by ambiguity, but some conflicts caused by ambiguity are not easy to resolve with precedence, as in the famous example:

stat : IF bexp THEN stat
     | IF bexp THEN stat ELSE stat
     ;

For conflicts caused by this kind of ambiguity, and for some conflicts that are not caused by ambiguity, Yacc provides the following two disambiguating rules:

A1. In a shift/reduce conflict, shift.

A2. In a reduce/reduce conflict, reduce by the production that appears first in the Yacc source program.

Resolving the ambiguity of the IF statement above with these two rules gives exactly what we want: each ELSE is attached to the nearest preceding IF that has no ELSE yet. So the user does not have to transform the grammar above into an unambiguous one. When Yacc uses these two rules to resolve a conflict, it reports that it has done so.
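The effect of rule A1 on the IF statement can be imitated by hand. The following is only a sketch, assuming a hypothetical one-character token encoding ('i' for IF bexp THEN, 'e' for ELSE, 's' for a simple statement) and a hand-written recursive-descent parser rather than the tables Yacc generates. It consumes an ELSE as soon as one is visible, which plays the role of "shift", and returns the parse as a bracketed string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tokens: 'i' = IF bexp THEN, 'e' = ELSE, 's' = simple statement. */
static const char *cursor;   /* next unread token */
static char out[128];        /* bracketed form of the parse */

static void emit(const char *text) {
    assert(strlen(out) + strlen(text) < sizeof out);
    strcat(out, text);
}

/* stat : 'i' stat | 'i' stat 'e' stat | 's' */
static int parse_stat(void) {
    if (*cursor == 's') {
        cursor++;
        emit("s");
        return 1;
    }
    if (*cursor != 'i')
        return 0;                 /* syntax error */
    cursor++;
    emit("(if ");
    if (!parse_stat())
        return 0;
    /* If an ELSE is visible, take it now (the analogue of "shift"),
     * so that it attaches to this innermost IF. */
    if (*cursor == 'e') {
        cursor++;
        emit(" else ");
        if (!parse_stat())
            return 0;
    }
    emit(")");
    return 1;
}

/* Parse a whole token string; returns the bracketed form, or NULL on error. */
const char *parse(const char *tokens) {
    cursor = tokens;
    out[0] = '\0';
    if (!parse_stat() || *cursor != '\0')
        return NULL;
    return out;
}
```

For the token string "iises" (two nested IFs followed by one ELSE), parse returns "(if (if s else s))": the ELSE belongs to the inner IF.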

Here is a somewhat more rigorous description of how Yacc uses precedence and associativity to resolve conflicts.

Each production in a Yacc source program also has a precedence and an associativity. These are the precedence and associativity of the last terminal or literal character on the right-hand side of the production; when a %prec clause is used, they are determined by the %prec clause instead. Of course, if the last terminal or literal character on the right-hand side has no precedence or associativity, the production has none either.

Based on the precedence and associativity of terminals (or literal characters) and of productions, Yacc has two more rules for resolving conflicts:

P1. When there is a shift/reduce conflict or a reduce/reduce conflict, and either the input symbol or the grammar rule (production) has no precedence and associativity, the conflict is resolved with A1 and A2.

P2. When there is a shift/reduce conflict, and both the input symbol and the grammar rule (production) have a precedence and associativity, shift if the precedence of the input symbol is higher than that of the production, and reduce if it is lower. If the two precedences are equal, the action is determined by associativity: left associative means reduce, right associative means shift, and nonassociative means error.

Yacc does not report conflicts that it resolves by precedence and associativity.
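Rules P1 and P2 amount to a small decision procedure for shift/reduce conflicts. The sketch below restates them in C; it is not taken from any Yacc implementation, and the type and function names (struct prec, resolve_shift_reduce, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum assoc  { ASSOC_NONE, ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC };
enum action { ACT_SHIFT, ACT_REDUCE, ACT_ERROR };

/* Precedence level (higher binds tighter) and associativity.
 * ASSOC_NONE means that no precedence was declared at all. */
struct prec {
    int level;
    enum assoc assoc;
};

/* Resolve a shift/reduce conflict between the input symbol and a rule.
 * *reported is set to 1 when the default rule A1 had to be used (P1)
 * and to 0 when precedence settled the conflict silently (P2). */
enum action resolve_shift_reduce(struct prec token, struct prec rule,
                                 int *reported) {
    assert(reported != NULL);
    if (token.assoc == ASSOC_NONE || rule.assoc == ASSOC_NONE) {
        *reported = 1;
        return ACT_SHIFT;              /* P1 falls back to A1: shift */
    }
    *reported = 0;
    if (token.level > rule.level)
        return ACT_SHIFT;
    if (token.level < rule.level)
        return ACT_REDUCE;
    switch (token.assoc) {             /* equal precedence: use associativity */
    case ASSOC_LEFT:  return ACT_REDUCE;
    case ASSOC_RIGHT: return ACT_SHIFT;
    default:          return ACT_ERROR;    /* nonassociative */
    }
}
```

With '+' declared left associative at level 1 and '*' left associative at level 2, an input '*' after the rule for '+' gives a shift, an input '+' after the rule for '*' gives a reduce, and '+' after '+' gives a reduce.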

2.4.4 Error handling during parsing

When parsing an input string that contains syntax errors, it is best to continue parsing after reporting an error, so that more errors can be found.

Yacc handles errors as follows. When a syntax error is detected, Yacc discards the symbols that caused the error and adjusts the state stack accordingly. Parsing then resumes either at the symbol that follows the erroneous one, or after skipping symbols until a user-specified symbol is encountered.

Yacc has a reserved terminal named error. Writing it on the right-hand side of a production tells Yacc that an error may occur at that place. When parsing really does fail there, Yacc handles the error in the way just described. If no production that uses error applies, Yacc prints "syntax error" and terminates parsing.

Let's look at two simple examples that use error:

1. The production

stat : error
     ;

lets Yacc, while parsing a sentential form derived from stat, skip the erroneous part when it encounters a syntax error and continue parsing (a syntax error message is still printed).

2. The production

stat : error ';'
     ;

causes Yacc, on a syntax error, to skip the input until it reaches the next semicolon and then continue parsing.
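The skipping described in the second example can be pictured with a few lines of C. This is only a sketch of the idea of discarding input up to a synchronizing symbol; the function name skip_past is hypothetical, and the adjustment of the parser's state stack that Yacc also performs is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index just past the next occurrence of sync in input,
 * searching from pos (which must not lie beyond the end of the string).
 * If sync does not occur, return the index of the terminating '\0'. */
size_t skip_past(const char *input, size_t pos, char sync) {
    assert(input != NULL);
    while (input[pos] != '\0') {
        if (input[pos++] == sync)
            break;
    }
    return pos;
}
```

For the input "a+*b;c=1;" with the error detected at index 2, skip_past(input, 2, ';') returns 5, the position of c, where parsing could resume.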

If the input string is typed at the keyboard (that is, interactively), the user may want to re-enter a line that was wrong and have Yacc resume its analysis immediately. This only requires the statement yyerrok in the semantic action, as in the following example:

input : error '\n'
            { yyerrok;
              printf("Reenter last line: "); }
        input
            { $$ = $4; }
      ;


For more on error handling, see [2] and the examples in Section 2.6.

2.5 The programs section

The programs section mainly contains the following: the main program main(), the error reporting routine yyerror(s), the lexical analyzer yylex(), and the subroutines that the user calls in semantic actions. Each is described below.

2.5.1 Main program

The main job of the main program is to call the parser yyparse(), which Yacc generates automatically from the Yacc source program written by the user. Users often need to do some other processing before or after calling yyparse(), and this is also done in main(). If all the user needs is to call yyparse() in main(), the main() provided in the UNIX Yacc library (-ly) can be used instead of writing one. The main() in the library is as follows:


main() {
    return (yyparse());
}


2.5.2 Error reporting routine

The Yacc library also provides an error reporting routine, whose source is as follows:


#include <stdio.h>

yyerror(s) char *s; {
    fprintf(stderr, "%s\n", s);
}


If the user finds this yyerror(s) too simple, they can supply their own, for example one that keeps track of the line number of the input so that the line number can be reported when yyerror(s) is called.
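A user-supplied yyerror of that kind might look like the following sketch. The names input_line, error_count, and track_line are hypothetical; the assumption is that the lexical analyzer passes every character it reads through track_line.

```c
#include <assert.h>
#include <stdio.h>

int input_line = 1;     /* hypothetical: current line of the input */
int error_count = 0;    /* hypothetical: number of errors reported so far */

/* To be called by the lexical analyzer for every character it reads. */
int track_line(int c) {
    if (c == '\n')
        input_line++;
    return c;
}

/* Replacement for the library yyerror: prefix the message with the line. */
int yyerror(const char *s) {
    assert(s != NULL);
    error_count++;
    fprintf(stderr, "line %d: %s\n", input_line, s);
    return 0;
}
```

Because this yyerror is defined in the programs section, the one in the Yacc library is no longer needed.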

2.5.3 Lexical analyzer

The lexical analyzer must be supplied by the user and must be named yylex. It supplies the parser with the current input token. What yylex passes to yyparse is not the terminal itself but the number of the terminal, that is, the token number. If the current terminal has a semantic value, yylex must assign it to yylval.

The following is part of an example lexical analyzer.


yylex() {
    extern int yylval;
    int c;
    ...
    c = getchar();
    ...
    switch (c) {
    ...
    case '0':
    case '1':
    ...
    case '9':
        yylval = c - '0';
        return (DIGIT);
    ...
    }
    ...
}


When the lexical analyzer encounters a digit, it assigns the corresponding value to yylval and returns the token number DIGIT. Note that DIGIT stands for a token number (it can be defined with a macro, for example).
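For comparison, here is a small complete lexical analyzer in the same spirit. It is only a sketch under a few assumptions: it reads from a string set with the hypothetical helper set_input instead of from standard input, and it defines DIGIT itself as 257, whereas in a real Yacc program the token numbers come from the parser that Yacc generates.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define DIGIT 257                 /* hypothetical token number */

int yylval;                       /* semantic value shared with the parser */
static const char *input = "";    /* hypothetical input buffer */

void set_input(const char *text) {
    assert(text != NULL);
    input = text;
}

/* Return the next token number: DIGIT for a decimal digit (its value goes
 * into yylval), the character itself for any other non-blank character,
 * and 0 at the end of the input. */
int yylex(void) {
    int c;
    while (*input == ' ' || *input == '\t')
        input++;                  /* skip blanks */
    if (*input == '\0')
        return 0;                 /* end marker */
    c = (unsigned char)*input++;
    if (isdigit(c)) {
        yylval = c - '0';
        return DIGIT;
    }
    return c;
}
```

After set_input("7 +"), successive calls to yylex return DIGIT (with yylval set to 7), then '+', then 0.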

Users can also use the tool Lex to generate the lexical analyzer. In that case, the programs section of the Yacc source program only needs the following line in place of a hand-written lexical analyzer:

#include "lex.yy.c"

To make the relationship between Lex and Yacc clear, the following figure shows how the two work together.

On UNIX systems, assuming that the Lex source program is named plo.l and the Yacc source program is named plo.y, a working lexical analyzer and parser are obtained from these source programs by running the following three commands in order:

lex plo.l

yacc plo.y

cc y.tab.c -ly -ll


The first command generates the lexical analyzer from the Lex source program plo.l; the output file is named lex.yy.c. The second command generates the parser from the Yacc source program plo.y; the output file is named y.tab.c. The third command compiles y.tab.c, which is a C program, into an executable program.

In the third command, -ll names the Lex library and -ly names the Yacc library. If the user supplies both main() and yyerror(s) in the programs section of the Yacc source program, -ly is not needed. In addition, if the option -v is used in the second command, for example:

yacc -v plo.y

then in addition to y.tab.c, Yacc also produces a file named y.output. It contains the LR state transition table of the language being processed, which is useful for checking how the parser works.

See lex(1) and yacc(1) in [4].

2.5.4 Other program segments

The semantic actions may need some subroutines. These must obey the syntax of the C language and are not discussed further here.

2.6 Examples of Yacc source programs

Example 1. Use Yacc to describe an interactive calculator. It has 26 registers, denoted by the lowercase letters a to z, and accepts expressions built from the operators +, -, *, /, % (modulo), & (bitwise AND), and | (bitwise OR). The value of an expression can be assigned to a register. If the calculator accepts an assignment statement, it does not print the result; in all other cases it prints the result. Operands are integers, and an integer that begins with 0 (zero) is treated as an octal number.

The Yacc source program for Example 1 is given in Appendix F.

From Example 1 the reader can see that precedence declarations and an ambiguous grammar make the source program concise, and can also see how errors are handled. The weakness of Example 1 is that its lexical analyzer is too simple; the distinction between octal and decimal numbers, for instance, is best handled during lexical analysis.
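One way to move the octal/decimal distinction into the lexical analyzer is to convert a whole run of digits there and hand the parser a single number. The function below is a minimal sketch of that idea, not the code of Example 1; its name scan_number is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Convert a run of digits to an integer: base 8 if it begins with '0',
 * base 10 otherwise. Conversion stops at the first non-digit. Overflow
 * and the digits 8 and 9 inside an octal number are not checked. */
int scan_number(const char *text) {
    int base;
    int value = 0;
    assert(text != NULL);
    base = (text[0] == '0') ? 8 : 10;
    while (isdigit((unsigned char)*text))
        value = base * value + (*text++ - '0');
    return value;
}
```

For example, scan_number("017") yields 15 and scan_number("17") yields 17.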

Example 2. This example improves on Example 1. It shows how a union type for semantic values is defined and used, and how syntax errors are simulated and handled. It also describes an interactive calculator, but a more powerful one than in Example 1: it can handle floating-point numbers and interval arithmetic on floating-point numbers. It accepts floating-point constants and expressions built from operators such as +, -, unary -, and = (assignment), and it has 26 floating-point variables, denoted by the lowercase letters a to z. An interval is written as a pair of floating-point numbers (x, y), where x is less than or equal to y. The calculator also has 26 interval variables, denoted by the uppercase letters A to Z. As in Example 1, an assignment statement does not print a result, any other expression prints its result, and an appropriate message is given when an error occurs.

Here is a brief summary of some features of Example 2.

1. Definition of a Semantic value union type

An interval is represented by a structure whose members give its left and right endpoints. The structure is defined with a C typedef and given the type name INTERVAL. Once the semantic value of Yacc is defined with %union, it can hold integer, floating-point, and interval values. Some functions (such as hilo, vmul, and vdiv) return values of the structure type.
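The declarations this paragraph describes might look like the following sketch. It is written as plain C, so the %union declaration is represented by an ordinary union under the hypothetical name SEMVAL; the member names and the bodies of hilo and vmul are assumptions for illustration, not the source of Example 2.

```c
#include <assert.h>

typedef struct interval {
    double lo, hi;           /* left and right endpoints, lo <= hi */
} INTERVAL;

/* Plain C union mirroring what a %union declaration might contain. */
typedef union {
    int ival;                /* index of a variable */
    double dval;             /* floating-point value */
    INTERVAL vval;           /* interval value */
} SEMVAL;

/* Smallest interval that contains a, b, c and d. */
INTERVAL hilo(double a, double b, double c, double d) {
    INTERVAL v;
    double x[4];
    int i;
    x[0] = a; x[1] = b; x[2] = c; x[3] = d;
    v.lo = v.hi = x[0];
    for (i = 1; i < 4; i++) {
        if (x[i] < v.lo) v.lo = x[i];
        if (x[i] > v.hi) v.hi = x[i];
    }
    return v;
}

/* Product of the interval (a, b) and the interval v. */
INTERVAL vmul(double a, double b, INTERVAL v) {
    assert(a <= b && v.lo <= v.hi);
    return hilo(a * v.lo, a * v.hi, b * v.lo, b * v.hi);
}
```

Multiplying (1, 2) by (3, 4) this way gives (3, 8); multiplying (-3, 1) by (-1, 2) gives (-6, 3), because the extreme products need not come from matching endpoints.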

2. YACC's error handling

The source program uses YYERROR to handle two error conditions: a divisor interval that contains 0, and an interval whose endpoints are given in the wrong order. When one of these errors is encountered, YYERROR makes Yacc invoke its error handling, discard the erroneous input line, and continue processing.
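The two conditions themselves are simple tests on endpoints. The sketch below shows them as C functions with hypothetical names; in a Yacc action, a failed check would be followed by the error handling described above.

```c
#include <assert.h>

typedef struct interval {
    double lo, hi;
} INTERVAL;

/* Nonzero if the endpoints are given in the right order. */
int endpoints_ordered(double lo, double hi) {
    return lo <= hi;
}

/* Nonzero if 0 lies in v, so that division by v must be rejected. */
int contains_zero(INTERVAL v) {
    assert(v.lo <= v.hi);
    return v.lo <= 0.0 && v.hi >= 0.0;
}
```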

3. Use of a grammar with conflicts

A reader who tries this example on a machine will find that it contains 18 shift/reduce conflicts and 26 reduce/reduce conflicts. Consider the following two input lines:

2.5+ (3.5-4.0)

2.5+ (3.5,4.0)

In the second line, 2.5 is used in an interval expression, so it should be treated as an interval; that is, its type must be converted from scalar to interval. But Yacc only learns whether the conversion is needed when it reads the ',' that comes later, and by then it is too late to change its mind. One could look ahead several symbols past 2.5 to decide its type, but this is hard to implement because Yacc itself does not support it. The example instead solves the problem by adding grammar rules and making full use of Yacc's built-in disambiguation mechanism. In the grammar, each binary interval operation has two rules, one whose left operand is an interval and one whose left operand is a scalar, so Yacc can perform the type conversion automatically according to the context. There are other situations that also require a type conversion. The example places the grammar rules for scalar expressions before those for interval expressions, so that an operand is treated as a scalar until it has to be converted to an interval, and this is what causes the conflicts. Interested readers may wish to study the source program and the y.output file that Yacc generates for it, and analyze exactly how Yacc resolves the conflicts. Note that this way of solving the type problem relies on a trick and is hard to apply to more complicated problems.

(1) Interpreters
Any high-level language, such as C, precisely defines its data structures and the order in which programs execute; in other words, it defines a computer (call it A). A's storage structure is the data structures of C, A's control unit controls the execution of C programs, A's arithmetic unit carries out the operations of C statements, and A's machine language is C. Each C program defines the rules by which computer A moves from its initial state to its final state. We use a general-purpose computer B to simulate the execution of machine A. To do so, a set of programs must be written in the machine language of B to support the execution of programs written in C, the machine language of A. In other words, we build the high-level language computer A in software on the general-purpose computer B. This process is called software simulation (or software interpretation).

(2) The main difference between compilation and interpretation
An interpreter consists of a master control program and routines that interpret the individual instructions or statements. It directly interprets the source program, or an internal form of the source program, following the logical flow of the program. A statement inside a loop goes through a fairly complex analysis each time the loop executes, so interpretation is inefficient. A compiler (see 2.1.4) analyzes the source program only once, generates target code, and it is the target code that is finally executed. This is the main difference between the two.
(1) Basic principle of interpreters
The interpretation algorithm of the simulating program is shown in Figure 2.6. The input program is interpreted in the order in which it executes, producing the output that the program defines.

Figure 2.6 Structure and workflow of an interpreter
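The workflow of an interpreter, and the cost of re-analyzing statements inside a loop, can be seen in a very small example. The following is a sketch of an interpreter for a hypothetical four-instruction machine; the instruction set and all names are invented for illustration. The function run counts how many instructions it has to decode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction set of a tiny machine with an accumulator
 * and a loop counter. */
enum op { OP_SET_COUNT, OP_ADD, OP_LOOP, OP_HALT };

struct instr {
    enum op op;
    int arg;     /* constant for OP_SET_COUNT and OP_ADD, target for OP_LOOP */
};

/* Interpret prog and return the accumulator. *decoded receives the number
 * of instructions that were fetched and decoded along the way. */
int run(const struct instr *prog, size_t len, long *decoded) {
    int acc = 0, cnt = 0;
    size_t pc = 0;
    assert(prog != NULL && decoded != NULL);
    *decoded = 0;
    while (pc < len) {
        struct instr in = prog[pc++];          /* fetch */
        (*decoded)++;
        switch (in.op) {                       /* decode and execute */
        case OP_SET_COUNT: cnt = in.arg; break;
        case OP_ADD:       acc += in.arg; break;
        case OP_LOOP:      if (--cnt > 0) pc = (size_t)in.arg; break;
        case OP_HALT:      return acc;
        }
    }
    return acc;
}
```

For the four-instruction program SET_COUNT 3, ADD 5, LOOP 1, HALT, run returns 15 and reports 8 decoded instructions: the two instructions inside the loop are decoded again on every iteration, which is the inefficiency that compilation avoids.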

(2) The usual way of implementing programming languages
Pure interpretation and pure compilation are the two extremes, and both are rarely used. For example, pure compilation is used only for assembly language, and pure interpretation is used for the control languages and interactive languages of operating systems. The common way of implementing programming languages, shown in Figure 2.7, combines compilation and interpretation.

Figure 2.7 General language implementation diagram

Which method is used depends on the language being implemented and on the implementation environment (the computer system it runs on); the data structures and control structures of the language play the more important role. Programming languages are classified as compiled or interpreted. To improve efficiency and detect errors in the source program early, compilation should be used as far as possible; even when interpretation is used, the intermediate code should be as close to target code as possible.
C, C++, FORTRAN, Pascal, and Ada are usually implemented by compilation and are called compiled languages. The compiler translates the source program into a program in the language of the target computer, and the interpreter is reduced to a runtime library that supports the operations in the target program that the machine language does not provide. In general the compiler is relatively complex, and its focus is on producing a target program that runs as efficiently as possible.
LISP, ML, Prolog, and Smalltalk are usually implemented by interpretation and are called interpreted languages. In such an implementation the compiler produces only intermediate code that is easy to interpret. The intermediate code cannot run directly on the hardware and is executed by the interpreter. The compiler in this kind of implementation is relatively simple; the complex part of the work is constructing the interpreter.
Java is less like Lisp than like C++, but because Java runs in a networked environment it is often treated as an interpreted language: the compiler produces an intermediate language called bytecode, which is interpreted on each terminal by an interpreter provided by the browser.
