Contents
- (1) YACC Overview
- (2) YSP file structure
- (3) Declarations section of the YSP file
- (4) Program section of the YSP file
- (5) YACC Installation
The YACC material in this article is excerpted from Compilation Principles (3rd edition), edited by Jiang Liyuan and Kang Muning, Northwestern Polytechnical University Press.
(1) YACC Overview
YACC stands for "Yet Another Compiler-Compiler". It is a tool that automatically generates LALR(1) parsers. Its first version was released in the early 1970s at Bell Labs in the United States; the author is S. C. Johnson.
YACC makes it very convenient to build a parser. You write a grammar specification file (a YSP file) according to certain rules; the file extension is ".y". Given the YSP file as input, YACC automatically constructs a parser written in C. The parser consists mainly of the standard driver program supplied by YACC and an LALR(1) parse table.
(2) YSP file structure
A complete YSP file consists of three parts: a declarations section, a rules section, and a program section. The parts are separated by double percent signs (%%):
[declarations section]
%%
rules section
[%%
program section]
The declarations section and the program section, shown in square brackets, may be omitted, but the rules section is required. The simplest form of a YSP file is therefore
%%
rules section
(3) Declarations section of the YSP file
The declarations section of the YSP file defines the variables and grammar symbols used in the rules section. It can contain the following kinds of information:
1. Variable definitions
Variable definitions must be enclosed in the special bracket pair "%{" and "%}". They cover the files (such as C header files) needed by the semantic actions in the rules section and by the program section, together with data structure definitions, global and external variable definitions, and function prototypes. All of them must follow the rules of the C language.
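As a rough illustration, the text between the two brackets is ordinary C such as the following. This is only a sketch: the variable line_number is a hypothetical example, and the yyerror() signature shown is one common modern form rather than something the source prescribes.

```c
#include <stdio.h>              /* header needed by the semantic actions */

/* Prototypes for functions defined later in the program section. */
int yylex(void);
void yyerror(const char *msg);  /* assumed signature; older YACC versions differ */

/* Hypothetical global variable shared by the actions and the lexer. */
int line_number = 1;
```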
2. Start symbol definition
The start symbol of the grammar is specified with the declaration keyword %start (all declaration keywords in YACC begin with "%"). For example:
%start startsymbol
This declares startsymbol as the start symbol of the grammar. If no start symbol is declared, the system automatically uses the left-hand side symbol of the first grammar rule as the start symbol.
3. Vocabulary definition
In this part you can give the list of terminal symbols, the union (%union), and the type declarations (%type). YACC requires every terminal symbol in use to be declared explicitly; any symbol that is not declared is treated as a nonterminal. Terminals are declared with the keyword %token or %term. There are two forms. The first form is
%token tname1 [tname2 ...]
The terminals are separated by spaces. When they do not fit on one line, start another line with %token. The second form lets you choose the internal code value of a terminal yourself. The form is
%token tname <integer>
Here <integer> stands for an integer literal, which must be greater than 256. If you do not specify the internal code of a terminal, the system assigns code values in sequence starting from 257: 257, 258, .... By YACC convention, when the lexical analyzer recognizes a terminal in the input string, it returns the internal code of that terminal. Note also that, besides named terminals, single-character operators, delimiters, and other single-character terminals of a programming language can be written directly in the grammar in single quotation marks, with no %token declaration needed. The internal code of such a terminal is its ASCII code, which is always below 256; this is why the internal code of a user-defined terminal cannot be less than 257.
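The numbering scheme can be pictured in C roughly as follows. This is a sketch under assumptions: the token names NUMBER, IDENT, and ASSIGN and the helper is_char_token() are hypothetical, and the real values come from the generated parser and may differ between YACC implementations.

```c
/* Hypothetical named terminals, numbered as the text describes. */
enum token_code {
    NUMBER = 257,   /* first code assigned when none is specified */
    IDENT,          /* next code in sequence: 258 */
    ASSIGN = 300    /* code chosen explicitly, as in "%token ASSIGN 300" */
};

/* Returns 1 if code belongs to a single-character terminal such as '+',
   whose code is simply its character code (below 256). */
int is_char_token(int code)
{
    return code > 0 && code < 256;
}
```

Because the two ranges do not overlap, a lexical analyzer can return either a named code or a plain character and the parser can always tell them apart.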
In the vocabulary definition you can also use the union (%union) and type declarations (%type) to define the data type of the semantic attributes of each grammar symbol. For example, assume that nonterminal A has one attribute, attr_a, whose value is an integer, and that nonterminal B has two attributes, b1 and b2, where b1 is an integer and b2 is a floating-point number. You can describe these two kinds of attribute values to YACC with a union:
%union {
    int attr_a;
    struct { int b1; float b2; } attr_b;
}
Then use %type to declare the types of the nonterminals A and B:
%type <attr_a> A
%type <attr_b> B
The name in the angle brackets <> is a member of the union. Generally, once a union is declared, %type must be used to declare the data type of each grammar symbol (each grammar symbol has exactly one type); otherwise YACC reports an error. Several grammar symbols can be declared on one line, separated by spaces. If they do not fit on one line, start a new line and continue with another %type. If no union is declared, YACC uses the default type, which for every grammar symbol is an integer.
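In C terms, the union declaration above corresponds roughly to the type sketched below. YYSTYPE is the conventional name for the semantic value type; the two helper functions are hypothetical and only show which member an action for A or for B would touch.

```c
/* Approximate C counterpart of the %union declaration above. */
typedef union {
    int attr_a;                             /* value carried by A */
    struct { int b1; float b2; } attr_b;    /* value carried by B */
} YYSTYPE;

/* Hypothetical action body for a symbol declared with <attr_a>. */
int double_a(YYSTYPE v)
{
    return 2 * v.attr_a;
}

/* Hypothetical action body for a symbol declared with <attr_b>. */
float sum_b(YYSTYPE v)
{
    return (float)v.attr_b.b1 + v.attr_b.b2;
}
```

Since a union stores only one member at a time, the %type declaration is what tells YACC which member to read for a given grammar symbol.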
(4) Program section of the YSP file
The program section of the YSP file is optional. It consists of ordinary C functions, including the main program main(), the lexical analyzer yylex(), the error handling routine yyerror(), user-defined functions called from the semantic actions of the grammar rules, and other auxiliary functions. Their formats, functions, and usage are described below.
(1) After processing the YSP file, YACC outputs a C source file named y.tab.c, which contains a parser function named yyparse(). The main job of the main program main() is to call yyparse() to parse the source program. When parsing ends successfully, yyparse() returns 0. If the source program contains a syntax error, yyparse() returns 1 and also calls yyerror() to output an error message. Both main() and yyerror() can be supplied by the YACC library (just add the -ly option to the compile command). If these library versions do not meet your needs, for example because the main program must do extra processing or yyerror() should print more detailed error messages, you can write the functions yourself.
(2) YACC stipulates that the parser yyparse() needs a lexical analyzer named yylex() to support it at run time. Each time yyparse() calls yylex(), yylex() recognizes one token from the input character stream and sends back to yyparse() the token's internal code through its return statement and the token's semantic value (if any) through the global variable yylval. In other words, the input of yyparse() is the information returned by yylex().
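The cooperation between these functions can be sketched in plain C as follows. Everything here is illustrative: parse_sum() is a hand-written stand-in for the generated yyparse() that accepts only sums such as "12 + 34", the token code NUMBER, the in-memory input buffer, and the variable sum are hypothetical, and yylval is assumed to have the default integer type. Returning 0 from yylex() at the end of input follows the usual YACC convention.

```c
#include <ctype.h>
#include <stdio.h>

#define NUMBER 257                 /* hypothetical code, as "%token NUMBER" might assign */

int yylval;                        /* semantic value of the most recent token */
static const char *input = "";     /* hypothetical in-memory input buffer */
static int sum;                    /* hypothetical result of the last successful parse */

/* Error routine that the parser calls when it finds a syntax error. */
void yyerror(const char *msg)
{
    fprintf(stderr, "error: %s\n", msg);
}

/* Lexical analyzer: returns the token code, stores the value in yylval. */
int yylex(void)
{
    while (*input == ' ' || *input == '\t')
        ++input;                               /* skip blanks */
    if (*input == '\0')
        return 0;                              /* 0 signals end of input */
    if (isdigit((unsigned char)*input)) {
        yylval = 0;
        while (isdigit((unsigned char)*input))
            yylval = yylval * 10 + (*input++ - '0');
        return NUMBER;
    }
    return (unsigned char)*input++;            /* single character: its own code */
}

/* Stand-in for yyparse(): accepts NUMBER ('+' NUMBER)*, returns 0 or 1. */
int parse_sum(const char *text)
{
    int tok;

    input = text;
    sum = 0;
    if (yylex() != NUMBER) {
        yyerror("syntax error");
        return 1;
    }
    sum = yylval;
    while ((tok = yylex()) == '+') {
        if (yylex() != NUMBER) {
            yyerror("syntax error");
            return 1;
        }
        sum += yylval;
    }
    if (tok != 0) {                            /* leftover input is an error */
        yyerror("syntax error");
        return 1;
    }
    return 0;
}
```

In a real YSP file the parser itself is generated, so the program section would contain only yylex(), yyerror(), and a main() that calls yyparse() and checks its return value.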
(5) YACC Installation
On Linux, the GNU version of YACC is Bison.
On Ubuntu, you can install it with the following command:
sudo apt-get install bison