YACC/lex Linux User Guide


For how to use Parser Generator, see /html/linuxshijie/20070917/84.html.

The following article is copyrighted by Forest 100. Do not reprint it.

Lex Introduction

Lex generates a C source file from the rules defined in a .lex or .l file. Compiling that source file produces the lexical analyzer described by the .lex or .l file. A .lex or .l file is divided into three parts:
1. Global declarations
2. Lexical rules
3. Function definitions
The following is a simple example, the file lex_example.l:

%{
/* Global declarations section */
/* Forest 100 Linux
   www.linmu100.com
*/
#include <stdio.h>

extern char *yytext;
extern FILE *yyin;
int sem_count = 0;

%}

%%
[a-zA-Z][a-zA-Z0-9]*    { printf("word [%s] ", yytext); }
[a-zA-Z0-9\/.-]+        printf("FILENAME ");
\"                      printf("quote ");
\{                      printf("obrace ");
\}                      printf("ebrace ");
;                       { sem_count++; printf("semicolon "); }
\n                      printf("\n");
[ \t]+                  /* ignore whitespace */;
%%
/* The rules section is above; the function definitions section is below */
int main(int argc, char *argv[])
{
    /* Open the file named on the command line as the lexer input */
    yyin = fopen(argv[1], "r");
    if (!yyin)
    {
        return 0;
    }
    yylex();
    printf("sem_count: %d\n", sem_count);
    fclose(yyin);

    return 1;
}

The common pattern characters of lex are as follows:

Character        Description

A-Z, 0-9, a-z    Characters and digits that form part of a pattern.
.                Matches any character except \n.
-                Specifies a range. For example, A-Z means all characters from A to Z.
[]               A character set. Matches any one character inside the brackets. If the first character is ^, the set is negated. For example, [abC] matches any one of a, b, or C.
*                Matches zero or more occurrences of the preceding pattern.
+                Matches one or more occurrences of the preceding pattern.
?                Matches zero or one occurrence of the preceding pattern.
$                Matches the end of a line when it is the last character of the pattern.
{}               Specifies how many times a pattern may occur. For example, A{1,3} means that A may occur one to three times.
\                Escapes metacharacters. It overrides the special meaning defined in this table so that the character is taken literally.
^                Negation inside []. At the start of a pattern, it matches the beginning of a line.
|                Logical OR between expressions.
"<symbols>"      The literal meaning of the characters. Metacharacters inside the quotes lose their special meaning.
/                Lookahead match. If "/" in a pattern is followed by an expression, only the part before "/" is matched. For example, with the input A01, the pattern A0/1 matches A0.
()               Groups a series of regular expressions.


Examples of regular expressions

Regular expression    Description

joke[rs]              Matches jokes or joker.
A{1,2}shis+           Matches Ashis, AAshis, Ashiss, and so on: one or two A characters, then shi, then one or more s characters.
(A[b-e])+             Matches one or more occurrences of A followed by any single character from b to e.
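
The three example patterns can be tried out without Lex. The following is a minimal sketch that uses the POSIX regcomp/regexec functions with extended regular expressions, whose syntax overlaps with Lex patterns for the operators used here ([], {}, +, and ()). It is not how Lex works internally, and Lex-only features such as trailing context (/) are not covered. The helper name matches_whole is hypothetical.

```c
#include <regex.h>
#include <stdio.h>

/* Returns 1 if the whole of text matches the POSIX extended regular
 * expression pattern, 0 if it does not, and -1 if the pattern is invalid
 * or too long. matches_whole is a hypothetical helper, not part of Lex. */
int matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n;
    int rc;

    /* Anchor the pattern so that it has to cover the entire string. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For example, matches_whole("A{1,2}shis+", "AAshi") returns 0 because the trailing s+ needs at least one s, while matches_whole("A{1,2}shis+", "Ashis") returns 1.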

Use Lex to process the example file lex_example.l:
lex lex_example.l
By default, the file lex.yy.c is generated. Compile it with gcc, and note that the -ll option is required:
gcc lex.yy.c -o analyze -ll
This produces a simple lexical analyzer. Assume there is a file named demo with the following content:
firstword;
secondword;
thirdword

fourthword {
fifthword
}
Enter the following command:
./analyze demo
The following information is displayed:
word [firstword] semicolon
word [secondword] semicolon
word [thirdword]

word [fourthword] obrace
word [fifthword]
ebrace
sem_count: 2

In fact, the function definitions part of lex_example.l can be omitted entirely, because the lex library linked with -ll supplies a default main function that calls yylex(). In this case, build analyze in the same way and enter the command:
./analyze < demo
The result is as follows (sem_count is no longer printed, because that was done by our own main function):
word [firstword] semicolon
word [secondword] semicolon
word [thirdword]

word [fourthword] obrace
word [fifthword]
ebrace


In the lex_example.l file above, we also used two variables:
extern char *yytext;
extern FILE *yyin;
These two variables are external interfaces provided by Lex. You can change them as needed. Lex provides the following interfaces:

Lex variables

yyin        FILE * type. Points to the file currently being parsed by the lexer.
yyout       FILE * type. Points to where the lexer writes its output. By default, yyin and yyout point to standard input and standard output.
yytext      The text of the matched pattern is stored in this variable (char *).
yyleng      The length of the matched text.
yylineno    The current line number. (Not necessarily supported by every lexer.)

Lex functions

yylex()         Starts the analysis. It is generated automatically by Lex.
yywrap()        Called at the end of a file (or of the input). If it returns 1, scanning stops. It can therefore be used to parse multiple files: code written in the third section can point the yyin file pointer (see the table above) at a different file each time and return 0, until all files have been parsed. Finally, yywrap() returns 1 to indicate the end of scanning.
yyless(int n)   Pushes back all of the token that has been read except its first n characters.
yymore()        Tells the lexer to append the next token to the current token.
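
The multi-file use of yywrap() described above can be sketched as follows. This is only an illustration under stated assumptions: the yyin variable is declared here as a stand-in so that the fragment compiles on its own (inside a real .l file, Lex provides yyin), and set_input_files and remaining_files are hypothetical names.

```c
#include <stdio.h>

/* Stand-in for the variable that Lex normally defines in lex.yy.c.
 * Inside a real .l file, drop this line and use the yyin Lex provides. */
FILE *yyin = NULL;

/* Hypothetical helper state: the file names still to be scanned,
 * terminated by a NULL pointer. */
static char **remaining_files = NULL;

/* Hypothetical helper: remembers the list of files to scan. */
void set_input_files(char **names)
{
    remaining_files = names;
}

/* Called by the lexer at end of input. Returns 0 after switching yyin to
 * the next readable file, or 1 when no files are left, which stops the scan. */
int yywrap(void)
{
    /* Close the file that has just been finished. */
    if (yyin != NULL && yyin != stdin)
        fclose(yyin);
    yyin = NULL;

    /* Try the remaining names in order, skipping files that cannot be opened. */
    while (remaining_files != NULL && *remaining_files != NULL) {
        yyin = fopen(*remaining_files++, "r");
        if (yyin != NULL)
            return 0;   /* more input: the lexer keeps scanning */
    }
    return 1;           /* no more input: the lexer stops */
}
```

In a main function, one possible use is to call set_input_files(argv + 1), then call yywrap() once to open the first file, and call yylex() only if that first call returned 0.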

The following is a .l file that counts the number of words. If you are interested, compile it and try it.
%{
/*
   Forest 100 Linux
   www.linmu100.com
*/
#include <stdio.h>
int wc = 0; /* word count */
%}

%%
[a-zA-Z]+    { wc++; }
\n|.         { /* gobble up */ }
%%
int main(void)
{
    int n = yylex();
    return n;
}

/* Called at end of input: print the total and stop scanning */
int yywrap(void)
{
    printf("Word Count: %d\n", wc);
    return 1;
}
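
For comparison, the counting that the two rules perform can be written by hand in plain C. This is a sketch of the equivalent logic, not the code that Lex generates, and count_words is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts maximal runs of letters in text, which is what the rule
 * [a-zA-Z]+ { wc++; } counts. Every other character is skipped, like the
 * \n|. rule. count_words is a hypothetical name used only in this sketch. */
int count_words(const char *text)
{
    int count = 0;
    size_t i = 0;

    while (text[i] != '\0') {
        if (isalpha((unsigned char)text[i])) {
            count++;                       /* start of a new word */
            while (isalpha((unsigned char)text[i]))
                i++;                       /* consume the rest of the word */
        } else {
            i++;                           /* gobble up one other character */
        }
    }
    return count;
}
```

Note that digits split words here just as they do in the .l file, so "abc123def" counts as two words.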

YACC Introduction

YACC is short for Yet Another Compiler-Compiler. The GNU version of YACC is called Bison. It is a parsing tool whose input grammar is written in BNF (Backus-Naur Form). By convention, a YACC file has a .y suffix.
In fact, YACC is the real core of syntax analysis. The layout of a .y file is the same as that of a .l file, but each section has a different meaning:
1. Global variable declarations and terminal symbol (token) declarations
2. Grammar definition
3. Function definitions

The following is a simple yacc_example.y file that defines a simple calculator:
%{
/* Global variable declarations */
#include <ctype.h>
#include <stdio.h>
#define YYSTYPE double /* double type for YACC stack; for yylval */

/* Forest 100 www.linmu100.com */

int yylex(void);

void yyerror(const char *str)
{
    fprintf(stderr, "error: %s\n", str);
}
%}
/* Terminal symbol declaration */
%token NUMBER

%%
lines   : lines expr '\n'   { printf("%g\n", $2); }
        | lines '\n'
        | /* empty */
        | error '\n'        { yyerror("reenter last line:"); /* yyerrok(); */ }
        ;

expr    : expr '+' term     { $$ = $1 + $3; }
        | expr '-' term     { $$ = $1 - $3; }
        | term
        ;

term    : term '*' factor   { $$ = $1 * $3; }
        | term '/' factor   { $$ = $1 / $3; }
        | factor
        ;

factor  : '(' expr ')'      { $$ = $2; }
        | '(' expr error    { $$ = $2; yyerror("missing ')'"); /* yyerrok(); */ }
        | '-' factor        { $$ = -$2; }
        | NUMBER
        ;

%%
/* The grammar definition is above; the function definitions are below */
int main(void)
{
    return yyparse();
}

/* Hand-written lexer: skips blanks, returns NUMBER for numeric input,
   and returns any other character as its own token */
int yylex(void)
{
    int c;
    while ((c = getchar()) == ' ')
        ;
    if (c == '.' || isdigit(c)) {
        ungetc(c, stdin);
        scanf("%lf", &yylval);
        return NUMBER;
    }
    return c;
}

Use YACC to process this file:
yacc yacc_example.y
By default, the file y.tab.c is generated. Compile it with gcc, and note the -ll or -ly option, which links the lex or yacc support library:
gcc y.tab.c -o analyze -ll
Run ./analyze, type an arithmetic expression followed by Enter, and the program prints its value.
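
To see what the expr, term, and factor rules compute, the same grammar can be written by hand as a recursive-descent evaluator. This is a sketch for comparison only: YACC generates a table-driven parser, not code like this, the left-recursive rules are turned into loops, error recovery is left out, and all function names are hypothetical.

```c
#include <stdlib.h>

/* Hand-written sketch of the calculator grammar. Left-recursive rules such
 * as expr: expr '+' term become loops. All names here are hypothetical. */
static const char *cursor;      /* current position in the input text */

static double parse_expr(void);

static void skip_spaces(void)
{
    while (*cursor == ' ')
        cursor++;
}

/* factor: '(' expr ')' | '-' factor | NUMBER */
static double parse_factor(void)
{
    char *end;
    double value;

    skip_spaces();
    if (*cursor == '(') {
        cursor++;
        value = parse_expr();
        skip_spaces();
        if (*cursor == ')')
            cursor++;           /* a missing ')' is silently tolerated here */
        return value;
    }
    if (*cursor == '-') {
        cursor++;
        return -parse_factor();
    }
    value = strtod(cursor, &end);
    cursor = end;               /* unchanged if no number was found */
    return value;
}

/* term: term '*' factor | term '/' factor | factor */
static double parse_term(void)
{
    double value = parse_factor();

    for (;;) {
        skip_spaces();
        if (*cursor == '*') {
            cursor++;
            value *= parse_factor();
        } else if (*cursor == '/') {
            cursor++;
            value /= parse_factor();
        } else {
            return value;
        }
    }
}

/* expr: expr '+' term | expr '-' term | term */
static double parse_expr(void)
{
    double value = parse_term();

    for (;;) {
        skip_spaces();
        if (*cursor == '+') {
            cursor++;
            value += parse_term();
        } else if (*cursor == '-') {
            cursor++;
            value -= parse_term();
        } else {
            return value;
        }
    }
}

/* Evaluates one expression held in a string. */
double evaluate(const char *text)
{
    cursor = text;
    return parse_expr();
}
```

Because term is parsed inside expr, and factor inside term, multiplication and division bind more tightly than addition and subtraction, which is the same precedence that the layering of the grammar rules gives the YACC version.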

The following explains the rules of a .y file, using yacc_example.y as the example:
1. The global declarations section defines the interface function yyerror, which is called when an error occurs. This section mainly declares variables, data structures, and functions.
2. %token NUMBER declares a terminal symbol (token), which is returned by the lexer and used in the YACC grammar rules.
3. The grammar rules section defines the grammar:
3.1 A grammar has exactly one entry point (the start symbol). Beginners often make the mistake of writing a grammar with several entry points.
3.2 In both the lex file and the YACC file, when rules overlap, place the longer, more specific rule before the rules it conflicts with, so that it is tried first. (Lex always prefers the longest match and uses rule order only to break ties between matches of equal length.) For example, in the lex file:
temperator return T1;
temp return T2;
In the YACC file, the example is as follows:
command:
number char
| number
;
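
How a lexer chooses between overlapping patterns such as temperator and temp can be sketched in a few lines of C. Lex prefers the longest match and, when two matches have the same length, the rule listed first. The sketch below assumes literal keywords in place of full regular expressions, and pick_rule is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Returns the index of the keyword that Lex-style matching would select at
 * the start of input: the longest keyword that is a prefix of input, with
 * ties going to the earlier entry. Returns -1 if no keyword matches.
 * Literal keywords stand in for full regular expressions in this sketch. */
int pick_rule(const char *const *keywords, size_t count, const char *input)
{
    int best = -1;
    size_t best_len = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        size_t len = strlen(keywords[i]);

        /* Strictly longer wins, so an earlier rule keeps a tie. */
        if (len > best_len && strncmp(input, keywords[i], len) == 0) {
            best = (int)i;
            best_len = len;
        }
    }
    return best;
}
```

With the keywords listed as temp then temperator, the input "temperator 5" still selects temperator, because the longer match wins regardless of position in the list.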

In a .y file, pay attention to the overall grammar and to recursive rules.

Beginners may find the rules of YACC files unfamiliar. The key is to practice more.

Combining Lex and YACC

When using Lex and YACC together, note that
the header of the lex file must include the header file generated by YACC: "y.tab.h"

The following is an example that combines Lex and YACC:
lex_yacc_exp.l file:
%{
/* Forest 100
   www.linmu100.com
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
extern char *yytext;
%}
%%
[0-9]+          yylval.number = atoi(yytext); return NUMBER;
heater          return TOKHEATER;
heat            return TOKHEAT;
on|off          yylval.number = !strcmp(yytext, "on"); return STATE;
target          return TOKTARGET;
temperature     return TOKTEMPERATURE;
[a-z0-9]+       yylval.string = strdup(yytext); return WORD;
\n              /* ignore end of line */;
[ \t]+          /* ignore whitespace */;
%%

lex_yacc_exp.y file:
%{
/* Forest 100
   www.linmu100.com
*/
#include <stdio.h>
#include <string.h>

/* Declared here because main() and the generated parser use them */
int yylex(void);
int yyparse(void);

void yyerror(const char *str)
{
    fprintf(stderr, "error: %s\n", str);
}

int yywrap(void)
{
    return 1;
}

int main(void)
{
    return yyparse();
}

char *heater = "XL's test";

%}

%token TOKHEATER TOKHEAT TOKTARGET TOKTEMPERATURE

%union
{
    int number;
    char *string;
}

%token <number> STATE
%token <number> NUMBER
%token <string> WORD

%%

commands:
    | commands command
    ;

command:
    heat_switch | target_set | heater_select
    ;

heat_switch:
    TOKHEAT STATE
    {
        if ($2)
            printf("\tHeater '%s' turned on\n", heater);
        else
            printf("\tHeat '%s' turned off\n", heater);
    }
    ;

target_set:
    TOKTARGET TOKTEMPERATURE NUMBER
    {
        printf("\tHeater '%s' temperature set to %d\n", heater, $3);
    }
    ;

heater_select:
    TOKHEATER WORD
    {
        printf("\tSelected heater '%s'\n", $2);
        heater = $2;
    }
    ;

Run the following commands to generate the three files lex.yy.c, y.tab.c, and y.tab.h, and to build the analyze program:
lex lex_yacc_exp.l
yacc -d lex_yacc_exp.y
gcc lex.yy.c y.tab.c -o analyze -ll

Create a file named demo with the following content:
heat on
target temperature 99
heater asdfsieiwef99adsf

Enter ./analyze < demo to analyze the demo file. The program prints one line for each command it recognizes.
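
The way the lexer hands both a token code and a value to the parser can be illustrated with a hand-written counterpart of the .l rules. This is a sketch under stated assumptions: the enum and union below stand in for what yacc -d writes to y.tab.h (the numeric values are arbitrary), next_token and is_keyword are hypothetical names, and the UNKNOWN token is an addition of this sketch that the .l file does not have.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for what yacc -d would write to y.tab.h. The token names follow
 * the .y file; the numeric values are arbitrary in this sketch. */
enum token { END = 0, NUMBER = 258, TOKHEATER, TOKHEAT, STATE,
             TOKTARGET, TOKTEMPERATURE, WORD, UNKNOWN };
union value { int number; char *string; };

/* Returns 1 if the len characters at text spell exactly keyword. */
static int is_keyword(const char *text, size_t len, const char *keyword)
{
    return strlen(keyword) == len && strncmp(text, keyword, len) == 0;
}

/* Hypothetical hand-written counterpart of yylex(): reads one token starting
 * at *cursor, advances *cursor past it, and stores any value in *out. */
enum token next_token(const char **cursor, union value *out)
{
    const char *p = *cursor;
    const char *start;
    size_t len, i;
    int all_digits = 1;

    /* Skip blanks, tabs and newlines, like the two "ignore" rules. */
    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;
    if (*p == '\0') {
        *cursor = p;
        return END;
    }

    /* Take the longest run of [a-z0-9]; every rule matches within it. */
    start = p;
    while (islower((unsigned char)*p) || isdigit((unsigned char)*p))
        p++;
    len = (size_t)(p - start);
    if (len == 0) {             /* a character that no rule covers */
        *cursor = p + 1;
        return UNKNOWN;
    }
    *cursor = p;

    for (i = 0; i < len; i++)
        if (!isdigit((unsigned char)start[i]))
            all_digits = 0;
    if (all_digits) {           /* [0-9]+ is listed before [a-z0-9]+ */
        out->number = atoi(start);
        return NUMBER;
    }
    if (is_keyword(start, len, "heater")) return TOKHEATER;
    if (is_keyword(start, len, "heat")) return TOKHEAT;
    if (is_keyword(start, len, "on") || is_keyword(start, len, "off")) {
        out->number = is_keyword(start, len, "on");
        return STATE;
    }
    if (is_keyword(start, len, "target")) return TOKTARGET;
    if (is_keyword(start, len, "temperature")) return TOKTEMPERATURE;

    /* Anything else is a WORD; the caller owns the copied string. */
    out->string = malloc(len + 1);
    if (out->string == NULL)
        return UNKNOWN;
    memcpy(out->string, start, len);
    out->string[len] = '\0';
    return WORD;
}
```

The union plays the role of yylval: NUMBER and STATE fill the number member and WORD fills the string member, which is why the .y file tags those tokens with <number> and <string>.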

Conclusion:
Lex and YACC are very powerful tools. This article only briefly introduces some of the basics.
The Lex & YACC page has many interesting historical references and excellent Lex and YACC documentation.

References:
http://www.ibm.com/developerworks/cn/linux/l-lexyac.html
http://blog.csdn.net/ThinkinginLinux/archive/2005/03/19/323379.aspx
