Brief introduction
This chapter continues to use YACC to implement the calculator. The main new feature is support for variables in arithmetic operations.
Module splitting
The program consists of three modules:
1. Lex lexical analyzer
2. YACC syntax analyzer
3. Symbol table
Function description
1. Lex lexical analyzer
The regular definitions are as follows:
delim   [ \t]
ws      {delim}+
letter  [a-zA-Z]
digit   [0-9]
id      {letter}({letter}|{digit})*
/* e.g. 12.34 */
number  {digit}+(\.{digit}+)?
These definitions give names to frequently used regular expressions, and the names are then used to build more complex expressions. For example, id is built from letter and digit: an identifier must start with a letter and may be followed by any combination of letters and digits.
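To see what the id and number definitions accept, the same patterns can be tried outside Lex. The following is a minimal sketch using std::regex from the C++ standard library. The helper names isIdentifier and isNumber are made up for this illustration and are not part of the calculator, and number is assumed to be the name of the last definition.

```cpp
#include <regex>
#include <string>

// Hypothetical helpers that mirror the Lex definitions
//   id      {letter}({letter}|{digit})*
//   number  {digit}+(\.{digit}+)?
// with letter = [a-zA-Z] and digit = [0-9].
bool isIdentifier(const std::string& s) {
    static const std::regex id("[a-zA-Z]([a-zA-Z]|[0-9])*");
    return std::regex_match(s, id);      // the whole string must match
}

bool isNumber(const std::string& s) {
    static const std::regex number("[0-9]+(\\.[0-9]+)?");
    return std::regex_match(s, number);  // the whole string must match
}
```

With these helpers, "a1" is an identifier but "1a" is not, and "12.34" and "7" are numbers but "12." and ".5" are not, because the fractional part is optional only as a whole.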
Action definitions for the lexical analyzer (translation rules)
{ws}      { /* do nothing */ }
"int"     { print_token(INT, yytext); return INT; }
"double"  { print_token(DOUBLE, yytext); }
"char"    { print_token(CHAR, yytext); }
"+"       { print_token(PLUS, yytext); return PLUS; }
"-"       { print_token(MINUS, yytext); return MINUS; }
"*"       { print_token(TIMES, yytext); return TIMES; }
"/"       { print_token(OVER, yytext); return OVER; }
"("       { return LP; }
")"       { return RP; }
"\n"      { return EOL; }
"="       { return ASSIGN; }
{id}      {
              int p = sym_table.lookup(yytext);
              if (p == -1) {  // not found
                  p = sym_table.insert(yytext);  // insert with the default value 0.0
              }
              yylval = p;  // return the position
              return ID;
          }
{number}  { yylval = atof(yytext); return NUMBER; }  // yylval holds the value of the number
"//".*    { return COMMENT; }
.         { printf("Mystery character %s\n", yytext); }  // any other character
Each line is a translation rule: the left side is the pattern to match, and the right side is the action to execute when it matches. For example, the rule "+" { print_token(PLUS, yytext); return PLUS; } means that once "+" is recognized, print_token is executed and the PLUS token is returned (PLUS is defined in YACC).
Operator recognition:
As the rules above show, the lexical analyzer supports the basic arithmetic operators +, -, * and /, as well as the parentheses ( and ).
Recognition of NUMBER:
{number} { yylval = atof(yytext); return NUMBER; }  // yylval holds the value of the number
Unlike part 3 of this series ("Small White on Compiler Principles 3"), where numbers were recognized inside YACC by reading them with cin, numbers are now recognized by a regular expression in the lexical analyzer. The result is assigned to yylval (yylval = atof(yytext)), and then the NUMBER token is returned.
Recognition of identifiers:
The action of this rule is more involved because it uses the symbol table module (described later). When an identifier is recognized, it is looked up in the symbol table; if it is not found, a new entry is inserted. In both cases the position of the entry is returned.
{id} {
    int p = sym_table.lookup(yytext);
    if (p == -1) {  // not found
        p = sym_table.insert(yytext);  // insert with the default value 0.0
    }
    yylval = p;  // return the position
    return ID;
}
yytext is the text of the identifier and p holds its position in the symbol table. The position is assigned to yylval, and the token type ID is returned. Both yylval and ID are used in the YACC implementation.
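The effect of the identifier action over several tokens can be sketched in plain C++. This is only an illustration under simplifying assumptions: the global vector names and the function positionOf are hypothetical stand-ins that keep names only, not the symbol table used by the calculator.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, stripped-down stand-in for the symbol table: names only.
std::vector<std::string> names;

// What the identifier action does with yytext: reuse the existing position,
// or append the name and return the new position.
int positionOf(const std::string& text) {
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (names[i] == text) {
            return static_cast<int>(i);        // already known
        }
    }
    names.push_back(text);                     // first occurrence
    return static_cast<int>(names.size()) - 1;
}
```

Scanning the identifiers a, b, a in that order yields the positions 0, 1, 0: the second a reuses the entry created by the first one, which is what lets a later a+4 see the value stored by an earlier a=2.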
2. YACC syntax analyzer
Tokens are defined as follows:
%token NUMBER ID
%token PLUS MINUS TIMES OVER
%token LP RP EOL COMMENT
%token INT DOUBLE CHAR
%token ASSIGN
%left PLUS MINUS
%left TIMES OVER
%right UMINUS
The tokens defined above are the ones returned by the Lex lexical analyzer.
The YACC syntax translation rules are defined as follows:
lines : lines expr EOL         { printf("%g\n", $2); }
      | lines EOL
      | lines COMMENT
      | /* empty */
      ;
expr  : expr PLUS expr         { $$ = $1 + $3; }
      | expr MINUS expr        { $$ = $1 - $3; }
      | expr TIMES expr        { $$ = $1 * $3; }
      | expr OVER expr         { $$ = $1 / $3; }
      | LP expr RP             { $$ = $2; }
      | MINUS expr %prec UMINUS { $$ = -$2; }  // the lexer returns MINUS for "-"
      | NUMBER                 { $$ = $1; }  // $$ = $1 can be omitted
      | ID                     { $$ = sym_table.getValue($1); }  // get value from sym_table
      | ID ASSIGN expr         { sym_table.setValue($1, $3); $$ = $3; }  // modify the value
      ;
As in the lexical analyzer, when a rule matches, the corresponding semantic action is executed. See the description of YACC in "Small White on Compiler Principles 3" (http://blog.csdn.net/lpstudy/article/details/51225953).
Action for NUMBER:
NUMBER { $$ = $1; }  // $$ = $1 can be omitted
After a number is recognized, its value (the yylval set by the lexical analyzer) is assigned to $$, which represents the result of the current rule.
Note that $1 is the value of the first symbol on the right-hand side of the grammar rule; likewise, $2 and $3 are the values of the second and third symbols. For this rule, $1 is the value of NUMBER, which is the yylval assigned during lexical analysis (yylval = atof(yytext)).
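One way to picture $$, $1 and $3 is as slots on a value stack. The sketch below is a simplified model written for this explanation, not YACC's actual internals; ValueStack, dollar, reduce and reduceAdd are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a parser's value stack (not YACC's real internals).
// When a rule with n right-hand-side symbols is reduced, the top n slots
// hold $1..$n; they are replaced by a single slot holding $$.
struct ValueStack {
    std::vector<double> slots;

    void shift(double v) { slots.push_back(v); }

    // $k of a rule with n symbols, 1 <= k <= n.
    double dollar(std::size_t k, std::size_t n) const {
        assert(k >= 1 && k <= n && n <= slots.size());
        return slots[slots.size() - n + (k - 1)];
    }

    // Replace the n slots of the reduced rule with the result $$.
    void reduce(std::size_t n, double result) {
        assert(n <= slots.size());
        slots.resize(slots.size() - n);
        slots.push_back(result);
    }
};

// Reduction of "expr : expr PLUS expr { $$ = $1 + $3; }".
// The PLUS token also occupies a slot ($2); its value is unused.
void reduceAdd(ValueStack& st) {
    double result = st.dollar(1, 3) + st.dollar(3, 3);
    st.reduce(3, result);
}
```

In this model, shifting 1.5, a placeholder for PLUS, and 2.5, then calling reduceAdd, leaves a single slot containing 4. That slot is the $$ of the reduced expr, and it becomes $1 or $3 of whatever rule uses that expr next.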
Actions for identifiers:
There are two kinds of identifier actions: the assignment action and the value action. For example, a=2 assigns the value 2 to the variable a, and a+4 then adds 4 to a, so the result is 6. For simplicity, this program does not handle variable definitions (such as int a in C). Every variable has the default value 0.0 and can be used directly; an assignment changes its value.
Assignment action
ID ASSIGN expr { sym_table.setValue($1, $3); $$ = $3; }  // modify the value
The rule above says that when input such as a=2 is encountered, the setValue method of the symbol table is executed. $1 is the yylval value (the position of the identifier) set by the lexical analyzer when it returned ID, and $3 is the result of expr from parsing, so setValue sets the identifier at position $1 to $3.
Value action
ID { $$ = sym_table.getValue($1); }  // get value from sym_table
The rule above works as follows: the lexical analyzer returns the ID token, and $1 holds the position of the identifier in the symbol table. The value stored at that position is fetched and assigned to $$, the result of the current rule.
Lex and YACC cooperation strategy
Lex passes two important pieces of information to YACC: the token type and the token value. The type is passed through the return value, and the value is stored in yylval.
The tokens that appear in the YACC grammar rules are obtained from the return values of Lex, and the value fetched for a token by $n (where n is a number) is the yylval value set in Lex.
The 1 and 3 in $1 and $3 are determined by the positions of the symbols on the right-hand side of the rule. Their values are either provided by the lexical analyzer through yylval or produced by an assignment to $$. For example, in ID ASSIGN expr the value of expr is $3, which is not provided by the lexical analyzer; it is the $$ value computed when expr was parsed.
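The two channels, the return value for the type and yylval for the value, can be imitated with a small hand-written scanner. This is a sketch under stated assumptions: the token codes, nextToken and evalSum are invented for the illustration, the global yylval only mimics the variable of the same name, and the real scanner and parser are generated by Lex and YACC.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical stand-ins; the token codes are arbitrary.
enum Token { END = 0, NUMBER = 258, PLUS = 259 };

double yylval = 0.0;  // value channel: written by the scanner, read by the caller

static bool isDigit(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Type channel: the return value is the token type.
// For NUMBER, the token's value is left in yylval.
int nextToken(const std::string& input, std::size_t& pos) {
    while (pos < input.size()) {
        char c = input[pos];
        if (isDigit(c)) {
            std::size_t start = pos;
            while (pos < input.size() && isDigit(input[pos])) ++pos;
            // optional fraction: a dot followed by at least one digit
            if (pos + 1 < input.size() && input[pos] == '.' && isDigit(input[pos + 1])) {
                ++pos;
                while (pos < input.size() && isDigit(input[pos])) ++pos;
            }
            yylval = std::atof(input.substr(start, pos - start).c_str());
            return NUMBER;
        }
        ++pos;
        if (c == '+') return PLUS;
        // spaces, tabs and anything else are skipped in this sketch
    }
    return END;
}

// Caller side: the token type drives the control flow and yylval supplies
// the operand. This toy adds up every NUMBER; PLUS carries no value.
double evalSum(const std::string& input) {
    std::size_t pos = 0;
    double total = 0.0;
    for (int tok = nextToken(input, pos); tok != END; tok = nextToken(input, pos)) {
        if (tok == NUMBER) total += yylval;
    }
    return total;
}
```

Calling evalSum("1.5 + 2.5") returns 4. The caller has to read yylval right after the NUMBER token is returned, because the next NUMBER overwrites it; in YACC the value is copied onto the parser's value stack at that moment, which is why $1 and $3 can both be available in one action.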
3. Symbol table
The symbol table is a data storage area that supports both lexical and syntax analysis. When an identifier (ID) is encountered during lexical analysis and is not yet in the table, it is inserted with the default value 0.0. When an ID is used as a value during parsing, its value is obtained through the accessor provided by the symbol table; when an ID is assigned to, the value of the corresponding symbol in the table is updated. The sample code is a simple implementation using an array.
#include <iostream>
#include <map>
#include <string>
#include <vector>
#include "yacc.h"
#include "lex.h"
using namespace std;

struct Node {
    string name;
    double value;
};

class SymTable {
public:
    SymTable() {}

public:
    // Returns the position of name, or -1 if it is not in the table.
    int lookup(const string& name) {
        for (int i = 0; i < idValueTable.size(); ++i) {
            if (idValueTable[i].name.compare(name) == 0) {
                return i;
            }
        }
        return -1;  // not found
    }
    int insert(const string& name) {  // when parsing x=2 (currently we have x)
        Node node;
        node.name = name;
        node.value = 0.0;
        idValueTable.push_back(node);
        return idValueTable.size() - 1;
    }
    void setValue(int pos, double value) {  // when parsing x=2 (currently we have x)
        idValueTable[pos].value = value;
    }
    double getValue(int pos) {
        return idValueTable[pos].value;
    }

private:
    vector<Node> idValueTable;
};

extern SymTable sym_table;
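The linear search in the lookup is fine for a handful of variables. As a possible variation, and not the implementation used in this article, the same four operations can be backed by a hash map from name to position. IndexedSymTable and its member names are hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical variant of the symbol table (not the class used in the article).
// An unordered_map maps each name to its position, so lookup does not scan
// the whole vector.
class IndexedSymTable {
public:
    // Returns the position of name, or -1 if it is unknown.
    int lookup(const std::string& name) const {
        auto it = index_.find(name);
        return it == index_.end() ? -1 : it->second;
    }

    // Appends name with the default value 0.0 and returns its position.
    // Callers are expected to call lookup first, as the identifier action does.
    int insert(const std::string& name) {
        int pos = static_cast<int>(values_.size());
        values_.push_back(0.0);
        index_[name] = pos;
        return pos;
    }

    // at() throws std::out_of_range for an invalid position.
    void setValue(int pos, double value) { values_.at(pos) = value; }
    double getValue(int pos) const { return values_.at(pos); }

private:
    std::unordered_map<std::string, int> index_;  // name -> position
    std::vector<double> values_;                  // position -> value
};
```

For the input a=2 followed by a+4, the sequence is: lookup("a") returns -1, insert("a") returns position 0, setValue(0, 2) stores the value, and the later getValue(0) + 4 gives 6. Positions stay stable because entries are only ever appended, which matters since the position is what travels through yylval.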
Running result
I am lpstudy. If you reproduce this article, please credit the source: http://blog.csdn.net/lpstudy/article/details/51328851
Small White on Compiler Principles: a calculator with variable support