Author: Hu Yan 2013-4-28
Code Download Address: http://pan.baidu.com/share/link?shareid=579088&uk=253544182
This framework is a complete Lex/Yacc example with detailed comments, intended for learning the basic structure of a Lex/Yacc program. It compiles and runs under Linux or Cygwin by typing make. Most of the scaffolding is already in place, so with a small extension it can become a calculator or a similar tool, serve as a course project for a compiler principles class, or help in reading the code of other Lex/Yacc projects.
The example is small, but it demonstrates the most important and most commonly used features of Lex/Yacc programs:
* The structure and file format of Lex/Yacc programs.
* How to use C++ and the STL in Lex/Yacc: the C functions that Lex/Yacc generate or need to link against, such as yylex(), yywrap(), and yyerror(), are declared with extern "C".
* Redefining YYSTYPE/yylval as a complex type.
* Defining and using multiple states in Lex, with the BEGIN macro switching between the INITIAL state and other states.
* Defining and matching regular expressions in Lex.
* Returning data from Lex to Yacc through yylval.
* Declaring Yacc tokens with %token<>.
* Declaring the types of Yacc non-terminals with %type<>.
* The correct way to reference token attributes ($1, $2, etc.) and the non-terminal attribute ($$) in the C code actions embedded in Yacc rules.
* Assigning to yyin/yyout to change the default input/output target.
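The extern "C" item in the list above can be tried outside Lex/Yacc. The following is only a minimal sketch, not code from the example: demo_wrap and demo_error are hypothetical stand-ins for yywrap() and yyerror(). It shows that a function declared with C linkage can still be implemented with C++ features such as std::string.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// extern "C" gives these functions C linkage: their symbol names are not
// mangled, so C-style generated code can link against them.
// demo_wrap and demo_error are hypothetical names, not part of Lex/Yacc.
extern "C" {
    int demo_wrap(void);
    int demo_error(const char *s);
}

// The definitions keep the C linkage declared above.
int demo_wrap(void)
{
    return 1; // same convention as yywrap(): 1 means "no more input"
}

int demo_error(const char *s)
{
    assert(s != nullptr);
    // C++ can be used freely inside a function that has C linkage.
    std::string msg = std::string("error: ") + s;
    std::fprintf(stderr, "%s\n", msg.c_str());
    return static_cast<int>(msg.size()); // length of the printed message
}
```

Only the interface is restricted to C (no overloading, no C++ types in the signature); the body is ordinary C++.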
This example reads the file file.txt in the current directory, recognizes the identifiers, numbers, and other symbols in it, and prints them to the screen. The Linux test environment was Ubuntu 10.04.
File list:
lex.l: Lex program file.
yacc.y: Yacc program file.
main.h: header file shared by lex.l and yacc.y.
Makefile: makefile.
lex.yy.c: C file generated by compiling lex.l with Lex.
yacc.tab.c: C file generated by compiling yacc.y with Yacc.
yacc.tab.h: C header file generated by compiling yacc.y with Yacc; it contains the definitions of the %token values, YYSTYPE, yylval, etc., for use by lex.yy.c and yacc.tab.c.
file.txt: sample text to be parsed.
README.txt: this document.
The main code files are listed below:
main.h: header file shared by lex.l and yacc.y
#ifndef MAIN_HPP
#define MAIN_HPP

#include <iostream> // use the C++ library
#include <string>
#include <stdio.h>  // printf and FILE
using namespace std;

/* When Lex recognizes a token, it passes the data to Yacc through the variable yylval. By default yylval is of type int, so only integer data can be passed.
yylval is defined through the YYSTYPE macro, so redefining YYSTYPE changes the type of yylval (see the header yacc.tab.h that Yacc generates).
In this example the string of an identifier has to be passed to Yacc when it is recognized, and an integer yylval is inconvenient for that (the pointer would have to be cast to an integer and then cast back to char* in Yacc).
Here YYSTYPE is redefined as struct Type, which can hold several kinds of information. */
struct Type // Usually only one member is used at a time, so this would normally be a union to save space; that is not done here because a complex type such as string is used
{
    string m_sid;
    int m_nint;
    char m_cop;
};

#define YYSTYPE Type // Set the type of YYSTYPE (that is, of yylval) to struct Type so that Lex can return more data to Yacc

#endif
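As a rough illustration of why a struct is used here, the sketch below repeats the layout of struct Type and fills one member per token kind, the way the Lex actions assign to yylval. The helper names make_identifier, make_integer, and make_operator are hypothetical and do not appear in the example. A union would not work as a drop-in replacement: before C++11 a union could not contain a member with a non-trivial constructor such as std::string, and since C++11 such a union needs hand-written construction and destruction.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Same layout as struct Type in main.h, repeated so the sketch compiles alone.
struct Type {
    std::string m_sid; // identifier text
    int m_nint;        // integer value
    char m_cop;        // single-character operator
};

// Hypothetical helpers: each one fills only the member its token kind uses.
Type make_identifier(const char *text)
{
    assert(text != nullptr);
    Type v = Type();   // value-initialize: empty string, 0, '\0'
    v.m_sid = text;
    return v;
}

Type make_integer(const char *text)
{
    assert(text != nullptr);
    Type v = Type();
    v.m_nint = std::atoi(text); // same conversion as the {integer} rule
    return v;
}

Type make_operator(const char *text)
{
    assert(text != nullptr && text[0] != '\0');
    Type v = Type();
    v.m_cop = text[0]; // a one-character match, like the "." rule
    return v;
}
```

The cost of the struct is that every value carries all three members even though only one is meaningful at a time.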
lex.l: Lex program file
%{
/* The file lex generates is lex.yy.c.
A lex file consists of 3 sections, separated by 2 lines containing %%.
Section 1 is the declaration section, which contains:
1 - C code: included headers, functions, types, and so on, copied verbatim into the generated .c file.
2 - State declarations, such as %x COMMENT.
3 - Regular definitions, such as digit ([0-9]).
Section 2 is the rules section, the body of the lex file: how each rule (such as identifier) is matched, and the C code action to run after a match.
Section 3 is the C function definition section, such as yywrap(); it is copied verbatim into the generated .c file. This section may be empty. */

/* Section 1: declarations */
#include "main.h"     // header shared by lex and yacc; it includes some headers and redefines YYSTYPE
#include "yacc.tab.h" // C header generated by compiling yacc.y with yacc; contains the %token, YYSTYPE, yylval definitions (all C macros) for lex.yy.c and yacc.tab.c

extern "C" // To call C functions from a C++ program, every C function used must be declared inside an extern "C" {} block so that the C++ linker can link to it. extern "C" sets C linkage in a C++ environment.
{   // yacc.y has a similar extern "C" block; the two can be merged and put in the shared header main.h
    int yywrap(void);
    int yylex(void); // The lexical analysis function generated by lex; yyparse() in yacc calls it. Without this declaration, the generated yacc.tab.c cannot find the function at compile time
}
%}

/* Each regular expression in lex can be prefixed with "<state>", for example "<COMMENT>\n" below. Each state must be declared with %x before it can be used.
When lex starts running, the default state is INITIAL; C code can later switch to another state with "BEGIN state_name;" (BEGIN is a macro built into lex).
Once lex has switched to the COMMENT state, it matches only the regular expressions that begin with <COMMENT> and not those of other states.
That is, lex considers only the regular expressions that begin with the state it is currently in and ignores the rest.
One application: in a piece of C code, the same string "abc" is recognized as an identifier when it appears in code, but not when it appears in a comment. The recognition result for "abc" therefore has to depend on the state.
This example needs to ignore end-of-line comments in the text, where a comment runs from "//" to the end of the line. It is implemented as follows:
1 - lex starts in the default INITIAL state, in which the string "abc" is recognized as an identifier and the string "123" as an integer.
2 - Once "//" is recognized, the BEGIN macro switches to the COMMENT state, in which abc and all other characters are ignored. When the newline \n is recognized, the BEGIN macro switches back to the INITIAL state and the other tokens are recognized again. */
%x COMMENT

/* A non-digit: an upper- or lower-case letter or an underscore */
nondigit ([_A-Za-z])

/* A digit, 0 to 9 */
digit ([0-9])

/* An integer: one or more digits */
integer ({digit}+)

/* An identifier: starts with a non-digit, followed by zero or more digits or non-digits */
identifier ({nondigit}({nondigit}|{digit})*)

/* One or more consecutive blank characters */
blank_chars ([ \f\r\t\v]+)

/* The %% below starts section 2: rules */
%%

{identifier} { // Matched an identifier string; its text is held in yytext
    yylval.m_sid = yytext; // Pass the value of the recognized token to yacc through yylval. Because yylval is defined as struct Type, yytext can be assigned to its m_sid member, and yacc can reference it as $n
    return IDENTIFIER; // Tell yacc that the recognized token type is IDENTIFIER
}

{integer} { // Matched an integer string
    yylval.m_nint = atoi(yytext); // Convert the recognized string to an integer value and store it in the integer member of yylval; yacc references it as $n
    return INTEGER; // Tell yacc that the recognized token type is INTEGER
}

{blank_chars} { // Blank characters: do nothing, ignore them
}

\n { // Newline: ignore it
}

"//" { // The string "//" starts a comment that runs to the end of the line
    cout << "(comment)" << endl; // Report that a comment was encountered
    BEGIN COMMENT; // Switch to the COMMENT state with the BEGIN macro to filter out the comment; from now on lex matches only the rules prefixed with <COMMENT>
}

. { // . matches any single character except \n. This rule must come last, because once it matches, the rules after it are never reached (except rules that begin with another <state>)
    yylval.m_cop = yytext[0]; // Only one character was matched, so it is yytext[0]; store it in the m_cop member of yylval; yacc references it as $n
    return OPERATOR; // Tell yacc that the recognized token type is OPERATOR
}

<COMMENT>\n { // A rule of the COMMENT state; it is tried only while lex is in the COMMENT state
    BEGIN INITIAL; // A newline in the COMMENT state means the comment has ended; return to the INITIAL state
}

<COMMENT>. { // In the COMMENT state every other character is ignored, so the comment is filtered out in lex (the lexical analysis layer) and never reaches yacc
}

%%

// Section 3: C function definitions
int yywrap(void)
{
    puts("-----the file is end");
    return 1; // Returning 1 means all input has been read. To continue with another file, fopen it here, assign the file pointer to yyin, and return 0
}
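To make the INITIAL/COMMENT switching easier to follow, here is a hand-written approximation of the rules above in plain C++. It is only a sketch for study, not what flex generates: the names State, INITIAL_STATE, COMMENT_STATE, and scan are hypothetical, and the function returns one string per event instead of returning token codes to a parser.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

enum State { INITIAL_STATE, COMMENT_STATE }; // stand-ins for INITIAL and COMMENT

static bool is_nondigit(char c) { return c == '_' || std::isalpha(static_cast<unsigned char>(c)) != 0; }
static bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
static bool is_blank(char c) { return c == ' ' || c == '\f' || c == '\r' || c == '\t' || c == '\v'; }

std::vector<std::string> scan(const std::string &in)
{
    std::vector<std::string> out;
    State state = INITIAL_STATE;
    std::string::size_type i = 0;
    while (i < in.size()) {
        char c = in[i];
        if (state == COMMENT_STATE) {
            // Only the <COMMENT> rules apply: a newline ends the comment,
            // every other character is dropped.
            if (c == '\n') state = INITIAL_STATE;
            ++i;
        } else if (is_nondigit(c)) {
            // {identifier}: a non-digit followed by digits or non-digits
            std::string::size_type j = i + 1;
            while (j < in.size() && (is_nondigit(in[j]) || is_digit(in[j]))) ++j;
            assert(j > i); // a match always consumes at least one character
            out.push_back("id:" + in.substr(i, j - i));
            i = j;
        } else if (is_digit(c)) {
            // {integer}: one or more digits
            std::string::size_type j = i + 1;
            while (j < in.size() && is_digit(in[j])) ++j;
            out.push_back("int:" + in.substr(i, j - i));
            i = j;
        } else if (is_blank(c) || c == '\n') {
            ++i; // {blank_chars} and \n are ignored
        } else if (c == '/' && i + 1 < in.size() && in[i + 1] == '/') {
            out.push_back("(comment)"); // "//" wins over "." because it is longer
            state = COMMENT_STATE;      // corresponds to BEGIN COMMENT
            i += 2;
        } else {
            out.push_back(std::string("op:") + c); // the catch-all "." rule
            ++i;
        }
    }
    return out;
}
```

The single state variable plays the role of the current start condition: while it is COMMENT_STATE, none of the INITIAL branches are even considered, which is the behavior %x gives the real scanner.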
yacc.y: Yacc program file
%{
/* The files yacc generates are yacc.tab.c and yacc.tab.h.
A yacc file consists of 3 sections, separated by 2 lines containing %%.
Section 1 is the declaration section, which contains:
1 - C code: included headers, functions, types, and so on, copied verbatim into the generated .c file.
2 - Token declarations, such as %token
3 - Type declarations, such as %type
Section 2 is the rules section, the body of the yacc file: how each production is matched, and the C code action to run after a match.
Section 3 is the C function definition section, such as yyerror(); it is copied verbatim into the generated .c file. This section may be empty. */

/* Section 1: declarations */
#include "main.h" // header shared by lex and yacc; it includes some headers and redefines YYSTYPE

extern "C" // To call C functions from a C++ program, every C function used must be declared inside an extern "C" {} block so that the C++ linker can link to it. extern "C" sets C linkage in a C++ environment.
{   // lex.l has a similar extern "C" block; the two can be merged and put in the shared header main.h
    void yyerror(const char *s);
    extern int yylex(void); // Defined in lex.yy.c and called by yyparse(); it must be declared extern so that the program compiles and links
}
%}

/* Declarations of the tokens that lex returns.
Defining a token with <member> simplifies how it is referenced later.
Suppose the 1st symbol on the right side of a production is the token OPERATOR. Its attribute is referenced as follows:
1 - If OPERATOR is defined in the ordinary way, as %token OPERATOR, the action must write $1.m_cop to say which member of YYSTYPE is used.
2 - If it is defined as %token<m_cop>OPERATOR, the action just writes $1 and yacc replaces it with $1.m_cop automatically.
Besides the tokens, non-terminals such as file and tokenlist must be defined with %type<member> (otherwise yacc reports an error) to state which member of YYSTYPE their attribute corresponds to; a reference to the non-terminal, such as $$, is then automatically replaced with $$.member. */
%token<m_nint>INTEGER
%token<m_sid>IDENTIFIER
%token<m_cop>OPERATOR
%type<m_sid>file
%type<m_sid>tokenlist

%%

file: // The file, made up of the token stream
    tokenlist // Only the identifiers in the token stream are shown here
    {
        cout << "all id:" << $1 << endl; // $1 is the attribute of the non-terminal tokenlist. It was defined with %type<m_sid>, so it uses the m_sid member of YYSTYPE and $1 is equivalent to $1.m_sid; its value was assigned in the lower-level production (tokenlist IDENTIFIER)
    };

tokenlist: // The token stream: either empty, or made up of numbers, identifiers, and other symbols
    {
    }
    | tokenlist INTEGER
    {
        cout << "int:" << $2 << endl; // $2 is the attribute of the token INTEGER. It was defined with %token<m_nint>, so $2 refers to the m_nint member of the token value, which lex assigned through yylval
    }
    | tokenlist IDENTIFIER
    {
        $$ += " " + $2; // $$ is the attribute of the non-terminal tokenlist. It was defined with %type<m_sid>, so $$ is equivalent to $$.m_sid. The recognized identifier string is stored in the tokenlist attribute so that the upper-level production can use it
        cout << "id:" << $2 << endl; // $2 is the attribute of the token IDENTIFIER. It was defined with %token<m_sid>, so $2 refers to the m_sid member of the token value, which lex assigned through yylval
    }
    | tokenlist OPERATOR
    {
        cout << "op:" << $2 << endl; // $2 is the attribute of the token OPERATOR. It was defined with %token<m_cop>, so $2 refers to the m_cop member of the token value, which lex assigned through yylval
    };

%%

void yyerror(const char *s) // When yacc encounters a syntax error it calls back yyerror, with the error message in the parameter s
{
    cerr << s << endl; // Print the error message directly
}

int main() // Program entry point; this function could also be placed in another .c or .cpp file
{
    const char* sfile = "file.txt"; // The text file to read
    FILE* fp = fopen(sfile, "r");
    if (fp == NULL)
    {
        printf("cannot open %s\n", sfile);
        return -1;
    }
    extern FILE* yyin; // yyin and yyout are both of type FILE*
    yyin = fp; // The scanner reads its input from yyin, which defaults to standard input; here it is changed to a disk file. Output goes to yyout by default, and assigning to yyout changes the destination

    printf("-----begin parsing %s\n", sfile);
    yyparse(); // Start reading input and parsing; yyparse() calls yylex() from lex to read tokens
    puts("-----end parsing");

    fclose(fp);
    return 0;
}
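The way the left-recursive tokenlist rule builds up its m_sid attribute can be pictured as a simple loop. The sketch below is not a generated parser: Kind, Token, and reduce_tokenlist are hypothetical names, and it assumes that $$ starts out holding the value of $1 when the action runs, which is what the `$$ += " " + $2` action above relies on.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum Kind { INTEGER_TOKEN, IDENTIFIER_TOKEN, OPERATOR_TOKEN }; // hypothetical token kinds

struct Token {
    Kind kind;
    std::string text; // the matched text, like yytext
};

// Mimics the repeated reduction of "tokenlist: tokenlist IDENTIFIER":
// acc plays the role of $$ (starting from $1) and tok.text the role of $2.
std::string reduce_tokenlist(const std::vector<Token> &tokens)
{
    std::string acc; // the empty production: tokenlist starts with an empty attribute
    for (std::vector<Token>::size_type i = 0; i < tokens.size(); ++i) {
        const Token &tok = tokens[i];
        assert(!tok.text.empty()); // a matched token always has at least one character
        if (tok.kind == IDENTIFIER_TOKEN) {
            acc += " " + tok.text; // same expression as $$ += " " + $2
        }
        // The INTEGER and OPERATOR reductions leave the attribute unchanged.
    }
    return acc;
}
```

Because a separator is added before every identifier, including the first, the accumulated string begins with a space; that is where the space after "all id:" in the program output comes from.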
Makefile: makefile
LEX=flex
YACC=bison
CC=g++
# Name of the generated executable
OBJECT=main

$(OBJECT): lex.yy.o yacc.tab.o
	$(CC) lex.yy.o yacc.tab.o -o $(OBJECT)
# Run immediately after building
	@./$(OBJECT)

lex.yy.o: lex.yy.c yacc.tab.h main.h
	$(CC) -c lex.yy.c

yacc.tab.o: yacc.tab.c main.h
	$(CC) -c yacc.tab.c

yacc.tab.c yacc.tab.h: yacc.y
# bison compiles .y files with the -d option
	$(YACC) -d yacc.y

lex.yy.c: lex.l
	$(LEX) lex.l

clean:
	@rm -f $(OBJECT) *.o
file.txt: sample text to be parsed
abc defghi
//this line is comment, abc 123 !@#$
123 45678 //comment line end
! @ # $
How to use:
1 - Extract lex_yacc_example.rar on Linux/Cygwin.
2 - On the command line, change into the lex_yacc_example directory.
3 - Type make; the following steps then run automatically:
(1) flex is invoked to compile the .l file and generate lex.yy.c.
(2) bison is invoked to compile the .y file and generate yacc.tab.c and yacc.tab.h.
(3) g++ is invoked to compile and link the executable main.
(4) main is executed.
The results of the run are as follows:
bison -d yacc.y
g++ -c lex.yy.c
g++ -c yacc.tab.c
g++ lex.yy.o yacc.tab.o -o main
-----begin parsing file.txt
id:abc
id:defghi
(comment)
int:123
int:45678
(comment)
op:!
op:@
op:#
op:$
-----the file is end
all id: abc defghi
-----end parsing
References: Lex and Yacc from Beginner to Expert (6): Parsing C/C++ Include Files, http://blog.csdn.net/pandaxcl/article/details/1321552
For other articles and code, see my blog: http://blog.csdn.net/huyansoft