Index:
- Concept
- YACC File Format
- Definition
- Rule Section
- Part 3
- Recursive Processing
- If-else conflict
- Error Handling
- YACC source program Style
1. Concept
YACC takes a grammar written in BNF (Backus-Naur Form) and generates a parser for the context-free language it describes.
The symbol on the left-hand side (LHS) of each production is a nonterminal. The right-hand side (RHS) may contain both nonterminals and terminals; terminals appear only on the right-hand side.
Conflicts may occur during reduction. YACC resolves them with default rules: a shift/reduce conflict is resolved in favor of the shift, and a reduce/reduce conflict in favor of the rule that appears first in the grammar.
2. YACC File Format
The YACC file is divided into three parts:
... definitions ...
%%
... rules ...
%%
... subroutines ...
3. Definition
The first part contains token definitions and C code (enclosed in "%{" and "%}").
For example, declare a token in the definition section:
%token INTEGER
When YACC is run with the -d option, it generates a header file containing the token definitions, for example:
#ifndef YYSTYPE
#define YYSTYPE int
#endif
#define INTEGER 258
extern YYSTYPE yylval;
The LEX file includes this header and uses its token definitions. The parser calls the lex function yylex() to obtain the next token, and the value associated with the token is placed in the variable yylval. The type of yylval is determined by YYSTYPE, which defaults to int. For example:
[0-9]+ {yylval = atoi(yytext);return INTEGER;}
Token values 0-255 are reserved for character values, so generated token numbers typically start at 258. A single character can therefore be returned as its own token. For example:
[-+] return *yytext; /* return operator */
This rule returns the plus or minus sign as the token. The minus sign is placed first in the character class so that it is not interpreted as a range operator.
Operator associativity is declared with %left and %right: %left makes an operator left-associative and %right makes it right-associative. Several %left or %right groups may be declared; a group declared later has higher precedence. For example:
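The scanner behavior described above can be approximated in plain C without lex. The following is only a sketch: scan_token is a hypothetical stand-in for yylex() that reads from a string instead of an input stream, and INTEGER is assumed to be 258 as in the header example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define INTEGER 258          /* assumed token number, as in the header example */

typedef int YYSTYPE;         /* default value type */
YYSTYPE yylval;              /* value of the most recent token */

/* Hypothetical stand-in for yylex(): scans one token from *input and
   advances the pointer past it. Returns 0 at end of input. */
int scan_token(const char **input)
{
    const char *p;

    assert(input != NULL && *input != NULL);
    p = *input;

    while (*p == ' ' || *p == '\t')          /* skip blanks */
        p++;

    if (*p == '\0') {                        /* end of input */
        *input = p;
        return 0;
    }

    if (isdigit((unsigned char)*p)) {        /* [0-9]+ */
        char *end;
        yylval = (int)strtol(p, &end, 10);   /* plays the role of atoi(yytext) */
        *input = end;
        return INTEGER;
    }

    *input = p + 1;                          /* any other character ...      */
    return (unsigned char)*p;                /* ... is returned as its own token */
}
```

Calling scan_token repeatedly on "12 + 7" yields INTEGER (with yylval set to 12), then '+', then INTEGER (with yylval set to 7), then 0.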
%left '+' '-'
%left '*' '/'
In the declarations above, multiplication and division have higher precedence than addition and subtraction.
YACC maintains two stacks, the symbol stack and the value stack, which are always kept synchronized.
To change the type of YYSTYPE, define it with %union as follows:
%union {
    int iValue;        /* integer value */
    char sIndex;       /* symbol table index */
    nodeType *nPtr;    /* node pointer */
};
The content in the generated header file is:
typedef union {
    int iValue;        /* integer value */
    char sIndex;       /* symbol table index */
    nodeType *nPtr;    /* node pointer */
} YYSTYPE;
extern YYSTYPE yylval;
You can bind a token or a nonterminal to a member of YYSTYPE. For example:
%token <iValue> INTEGER
%type <nPtr> expr
This binds expr to nPtr and INTEGER to iValue. YACC uses these bindings to generate the correct member references when it processes an action. For example:
expr: INTEGER { $$ = con($1); }
The conversion result is:
yyval.nPtr = con(yyvsp[0].iValue);
Here, yyvsp[0] is the top of the value stack, and yyval holds the value assigned to $$.
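To make the member selection concrete, the sketch below mimics the generated code for the action of expr: INTEGER in ordinary C. It rests on several assumptions: the nodeType struct and the con() helper shown here are hypothetical placeholders (the real ones belong to the application), and reduce_integer is an illustrative name, not something YACC emits.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type; the real nodeType is defined by the application. */
typedef struct nodeType {
    int value;
} nodeType;

/* Same members as the %union example. */
typedef union {
    int iValue;        /* integer value */
    char sIndex;       /* symbol table index */
    nodeType *nPtr;    /* node pointer */
} YYSTYPE;

/* Hypothetical con(): wraps an integer constant in a freshly allocated node. */
nodeType *con(int value)
{
    nodeType *n = malloc(sizeof *n);
    assert(n != NULL);
    n->value = value;
    return n;
}

/* Mimics the action of  expr: INTEGER { $$ = con($1); }
   yyvsp points at the top of the value stack. */
YYSTYPE reduce_integer(const YYSTYPE *yyvsp)
{
    YYSTYPE yyval;
    yyval.nPtr = con(yyvsp[0].iValue);   /* $1 read as iValue, $$ written as nPtr */
    return yyval;
}
```

The point of the sketch is that $1 and $$ refer to the same union type but to different members, chosen from the %token and %type declarations.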
The unary minus operator can be given a higher precedence as follows:
%left GE LE EQ NE '>' '<'
%left '+' '-'
%left '*'
%nonassoc UMINUS
%nonassoc declares a token as non-associative. It is often used together with %prec, which gives a rule the precedence of the named token. For example:
expr: '-' expr %prec UMINUS { $$ = node(UMINUS, 1, $2); }
This gives the rule the same precedence as UMINUS. Because UMINUS is declared last in the definitions above, it has higher precedence than the other operators, so unary minus binds more tightly than any of them.
4. Rule Section
The rules are similar to the BNF syntax.
In a rule, the nonterminal being defined is placed on the left, followed by a colon (:), then the right-hand side of the production, then the corresponding action (enclosed in braces {}). For example:
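%{
#include <stdio.h>
int yylex(void);
int yyerror(char *s);
%}

%token INTEGER

%%

program:
        program expr '\n'   { printf("%d\n", $2); }   /* print each result */
        |                                             /* empty */
        ;

expr:
        INTEGER             { $$ = $1; }
        | expr '+' expr     { $$ = $1 + $3; }
        | expr '-' expr     { $$ = $1 - $3; }
        ;

%%

/* called by the parser when a syntax error is found */
int yyerror(char *s)
{
    fprintf(stderr, "%s\n", s);
    return 0;
}

int main(void)
{
    yyparse();
    return 0;
}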
$1 denotes the value of the first symbol on the right-hand side, $2 the value of the second, and so on. $$ denotes the value of the left-hand side after the reduction.
5. Part 3
This part is the subroutine section. When a parsing error occurs, the yyerror() function is called; you can supply your own implementation of it. The main function calls yyparse(), the entry point of the generated parser.
6. Recursive Processing
Recursive processing includes left recursion and right recursion.
Left recursive form:
list:
        item
        | list ',' item
        ;
Right recursion:
list:
        item
        | item ',' list
        ;
With right recursion, all items are pushed onto the stack before the first reduction takes place. With left recursion, there are never more than three terms on the stack at the same time.
Left recursion is therefore preferable, because stack usage stays constant regardless of the length of the list.
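The difference in stack usage can be checked with a small simulation. The sketch below does not run a real parser; it only counts how many symbols a shift-reduce parser would hold on its stack while parsing a comma-separated list with each of the two grammars above. The function names are made up for illustration.

```c
#include <assert.h>

/* Maximum stack depth for   list: item | list ',' item ;   */
int max_depth_left(int n_items)
{
    int depth = 0, max = 0, i;

    assert(n_items >= 1);

    depth++;                          /* shift the first item */
    if (depth > max) max = depth;
    /* reduce item -> list: pop 1, push 1, depth unchanged */

    for (i = 1; i < n_items; i++) {
        depth += 2;                   /* shift ',' and the next item */
        if (depth > max) max = depth;
        depth -= 2;                   /* reduce list ',' item -> list: pop 3, push 1 */
    }
    return max;
}

/* Maximum stack depth for   list: item | item ',' list ;   */
int max_depth_right(int n_items)
{
    int depth = 0, max = 0, i;

    assert(n_items >= 1);

    for (i = 0; i < n_items; i++) {
        depth++;                      /* shift an item */
        if (i + 1 < n_items)
            depth++;                  /* shift the ',' that follows it */
        if (depth > max) max = depth;
    }
    /* reduce the last item -> list: depth unchanged */
    for (i = 1; i < n_items; i++)
        depth -= 2;                   /* reduce item ',' list -> list: pop 3, push 1 */

    assert(depth == 1);               /* only the final list remains */
    return max;
}
```

For a list of 1000 items, max_depth_left returns 3 while max_depth_right returns 1999, since every item and every comma must be shifted before the first reduction.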
7. If-else conflict
When there are two ifs and one else, it is ambiguous which if the else belongs to. There are two possibilities: the else matches the first if, or it matches the second (nearest) if. Modern programming languages match the else with the nearest if, which is also the default behavior of YACC.
Although YACC does the right thing, it still reports a shift/reduce conflict. To avoid the warning, give the if-else rule a higher precedence than the plain if rule:
%nonassoc IFX
%nonassoc ELSE

stmt:
        IF expr stmt %prec IFX
        | IF expr stmt ELSE stmt
8. Error Handling
When an error occurs during parsing, the default action is to call the yyerror() function and then return a nonzero value from yyparse(). A friendlier approach is to skip the erroneous input and continue scanning. It can be implemented as follows:
stmt:
        ';'
        | expr ';'
        | PRINT expr ';'
        | VARIABLE '=' expr ';'
        | WHILE '(' expr ')' stmt
        | IF '(' expr ')' stmt %prec IFX
        | IF '(' expr ')' stmt ELSE stmt
        | '{' stmt_list '}'
        | error ';'
        | error '}'
        ;
error is a special token reserved by YACC. With these rules, when YACC detects an error it calls yyerror(), discards input up to the next ';' or '}', and then resumes scanning.
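The skipping step can be pictured with a few lines of C. This is a deliberate simplification and not what the generated parser actually executes: a real YACC parser also unwinds its state stack before it resumes. The helper skip_to_sync is a hypothetical name, and it works on characters of a string rather than on tokens.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: starting at index pos (which must not exceed the
   string length), discard input up to and including the next ';' or '}'.
   Returns the index where scanning would resume, or the string length
   if no synchronizing character is found. */
size_t skip_to_sync(const char *input, size_t pos)
{
    assert(input != NULL);

    while (input[pos] != '\0') {
        char c = input[pos++];
        if (c == ';' || c == '}')    /* synchronizing tokens from the error rules */
            break;
    }
    return pos;
}
```

Given the input "x = = 3; print x;" and an error detected at the second '=', the helper skips the rest of the faulty statement so that scanning resumes just after the first ';'.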
9. YACC source program Style
It is recommended to write in the following style:
- Use uppercase letters for all terminal symbols and lowercase letters for all nonterminal symbols;
- Place grammar rules and semantic actions on separate lines;
- Group rules with the same left-hand side together, writing the left-hand side only once and introducing each subsequent alternative with a vertical bar "|";
- Put the semicolon ";" that ends a rule on a line by itself;
- Use tabs to align rules and actions.