Lex Yacc Getting Started tutorial

Source: Internet
Author: User
Disclaimer: this is an original work; when reproducing it, please credit the source: http://www.cnblogs.com/vestinfo/

I. Introduction

Recommended book: "flex & bison".

On Unix, the corresponding tools are flex and bison. There are many introductions online, but most are written for readers who already understand the subject, and they leave beginners confused. Lex and Yacc may be easier to understand through a concrete question: Linux has many system configuration files, and many Linux programs have configuration files of their own, so how does a program read the information in its configuration file?

First, Lex, the lexical analyzer, reads the keywords in the configuration file (the tokens discussed later can be thought of as keywords). It then hands those keywords to Yacc. Yacc matches the keywords against grammar rules, and when they conform to a rule it performs the corresponding action.

The example above analyzes the contents of a configuration file; the same approach can of course be used to analyze other files.

II. A simple Lex file example

1. The book flex & bison opens with this example: read several lines of text and print the number of lines, words, and characters.

For yylex and the related Lex variables, see part 3 of this series.

/* Just like Unix wc */
%{
int chars = 0;	/* characters seen so far */
int words = 0;	/* runs of letters seen so far */
int lines = 0;	/* newlines seen so far */
%}
%%
[a-zA-Z]+  { words++; chars += strlen(yytext); }
\n         { chars++; lines++; }
.          { chars++; }
%%
int main(int argc, char **argv)
{
  yylex();
  printf("%8d%8d%8d\n", lines, words, chars);
  return 0;
}
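To make the three rules concrete, here is a minimal hand-written sketch in plain C of what the generated scanner does with them. The names struct counts and count_text are made up for this illustration and are not part of Lex; the sketch also assumes the default C locale, where isalpha matches the same characters as [a-zA-Z].

```c
#include <ctype.h>
#include <stddef.h>

/* Counts of lines, words, and characters, like the three globals in the Lex file. */
struct counts {
    int lines;
    int words;
    int chars;
};

/* Hand-written equivalent of the three rules:
 *   [a-zA-Z]+  -> one word, plus its length in characters
 *   \n         -> one character and one line
 *   .          -> one character
 */
struct counts count_text(const char *text)
{
    struct counts c = {0, 0, 0};
    size_t i = 0;

    while (text[i] != '\0') {
        if (isalpha((unsigned char)text[i])) {
            /* Consume the longest run of letters, as [a-zA-Z]+ would. */
            while (isalpha((unsigned char)text[i])) {
                c.chars++;
                i++;
            }
            c.words++;
        } else {
            if (text[i] == '\n')
                c.lines++;
            c.chars++;
            i++;
        }
    }
    return c;
}
```

Calling count_text on a two-line string returns the same three numbers the Lex program would print for that input.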

2. Compile with the following steps.

# flex test.l

# gcc lex.yy.c -lfl

# ./a.out

3. Analyze this simple Lex file:

(1) The %% lines divide the file into three sections: the first holds the global declarations for C and Lex, the second is the rules section, and the third is C code.

(2) C code in the first section must be enclosed in %{ and %}; C code in the third section needs no delimiters.

(3) In the rules section, [a-zA-Z]+, \n, and . are regular expressions, and the {} after each one is an action written in C.

Regular expressions are introduced in part 3 of this series.

4. If you do not use the -lfl option, the code can be written as follows (for the reasons, see the analysis of Lex's library and functions):

%{
int chars = 0;
int words = 0;
int lines = 0;
int yywrap();	/* defined below instead of coming from the library */
%}
%%
[a-zA-Z]+  { words++; chars += strlen(yytext); }
\n         { chars++; lines++; }
.          { chars++; }
%%
int main(int argc, char **argv)
{
  yylex();
  printf("%8d%8d%8d\n", lines, words, chars);
  return 0;
}
int yywrap()
{
	return 1;	/* 1 means there is no further input */
}
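The role of yywrap can be sketched in plain C. When the scanner reaches the end of its input it calls yywrap: a return value of 1 means scanning is finished, while 0 means another input has been set up and scanning should continue. The code below is only an illustration under that assumption; struct input_set, wrap, and scan_all are hypothetical names and do not reflect flex's internal structures.

```c
#include <stddef.h>

/* Hypothetical stand-in for a scanner's input state (not flex's real structures). */
struct input_set {
    const char **texts;   /* inputs to scan, in order */
    size_t count;         /* number of inputs */
    size_t current;       /* index of the input being scanned */
};

/* Plays the role of yywrap(): return 1 when there is no more input,
 * or switch to the next input and return 0 to keep scanning. */
int wrap(struct input_set *in)
{
    if (in->current + 1 >= in->count)
        return 1;
    in->current++;
    return 0;
}

/* Counts characters across all inputs, consulting wrap() at each end of input. */
long scan_all(struct input_set *in)
{
    long chars = 0;

    if (in->count == 0)
        return 0;
    in->current = 0;
    for (;;) {
        const char *p = in->texts[in->current];
        while (*p != '\0') {
            chars++;
            p++;
        }
        if (wrap(in))      /* 1 means "really finished" */
            break;
    }
    return chars;
}
```

The yywrap in the tutorial always returns 1, which corresponds to the single-input case here.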
III. Modify the first example: put the regular expressions in the declarations section

%{
int chars = 0;
int words = 0;
int lines = 0;
%}
mywords	[a-zA-Z]+
mylines	\n
mychars	.
%%
{mywords}  { words++; chars += strlen(yytext); }
{mylines}  { chars++; lines++; }
{mychars}  { chars++; }
%%
int main(int argc, char **argv)
{
  yylex();
  printf("%8d%8d%8d\n", lines, words, chars);
  return 0;
}

Compile as above.

IV. The scanner as coroutine

This section shows how to hand the scanned tokens to other code. In the following example we want a special message printed whenever + or - is scanned.

When yylex is called, it returns as soon as the action of a matched pattern executes return, and its value is the value given to return; if the action of the matched pattern has no return, yylex keeps scanning and does not return.

The next call automatically resumes at the position where the previous scan stopped.

%{
enum yytokentype {
	ADD = 259,
	SUB = 260
};
%}
myadd	"+"
mysub	"-"
myother	.
%%
{myadd}    { return ADD; }
{mysub}    { return SUB; }
{myother}  { printf("Mystery character\n"); }
%%
int main(int argc, char **argv)
{
	int tok;
	while ((tok = yylex())) {	/* a nonzero return value from yylex can only be ADD or SUB */
		if (tok == ADD || tok == SUB) { printf("Meet + or -\n"); }
		else { printf("This else branch is never reached, "
			"because a nonzero return value from yylex must be ADD or SUB.\n"); }
	}
	return 0;
}
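The "return, then resume where you left off" behavior can be imitated by hand. The following is a rough sketch, not flex's implementation: struct scanner and next_token are invented names, TOK_EOF is an assumed name for the 0 that marks end of input, and the scan position is kept in an explicit struct rather than inside the scanner.

```c
enum token { TOK_EOF = 0, ADD = 259, SUB = 260 };

/* Scanner state that survives between calls. */
struct scanner {
    const char *pos;   /* next character to read */
    int mystery;       /* how many other characters were skipped */
};

/* Returns ADD or SUB when one is found, or TOK_EOF (0) at end of input.
 * Any other character is counted and scanning continues without returning. */
int next_token(struct scanner *s)
{
    while (*s->pos != '\0') {
        char c = *s->pos++;
        if (c == '+')
            return ADD;
        if (c == '-')
            return SUB;
        s->mystery++;      /* action without return: keep scanning */
    }
    return TOK_EOF;
}
```

Each call picks up at s->pos, so a caller can loop on next_token exactly as main loops on yylex above.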

V. Yacc (bison on Unix)

1. The grammar rules section of a Yacc file resembles BNF, so first look at BNF (Backus-Naur Form).

(1) Content enclosed in <> is mandatory;

(2) Content enclosed in [] is optional;

(3) Content enclosed in {} can be repeated zero or more times;

(4) | means "or": choose one of the items on its left and right;

(5) ::= means "is defined as";

(6) Content inside double quotation marks "" stands for the characters themselves, while double_quote is used to denote a double quotation mark.

(7) A BNF example. The following defines the for statement in Java:

for_statement ::=
"for" "(" ( variable_declaration |
( expression ";" ) | ";" )
[ expression ] ";"
[ expression ]
")" statement

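The BNF notations for optional and repeated items map directly onto code. As a small illustration, take a made-up rule that is not from the original text, integer ::= ["-"] digit {digit}. A minimal recognizer in C, with the hypothetical name match_integer, could look like this:

```c
#include <ctype.h>

/* Recognizes the hypothetical BNF rule
 *     integer ::= ["-"] digit {digit}
 * Returns 1 if the whole string matches, 0 otherwise. */
int match_integer(const char *s)
{
    /* ["-"] : optional, so take it if present and carry on either way. */
    if (*s == '-')
        s++;

    /* digit : mandatory, so fail if it is missing. */
    if (!isdigit((unsigned char)*s))
        return 0;
    s++;

    /* {digit} : zero or more repetitions. */
    while (isdigit((unsigned char)*s))
        s++;

    /* The rule must account for the entire input. */
    return *s == '\0';
}
```

An optional item becomes an if, a repeated item becomes a while, and a mandatory item becomes a check that fails the match.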
2. Yacc grammar.

result: components { /* action to be taken in C */ }
        ;

(1) components are the terminal and nonterminal symbols, arranged according to the rule; the {} that follows holds the action to execute.

3. Grammar examples.

param: NAME EQ NAME {
	printf("\tName:%s\tValue(name):%s\n", $1, $3); }
	| NAME EQ VALUE {
	printf("\tName:%s\tValue(value):%s\n", $1, $3); }
	;
simple_sentence: subject verb object
      |     subject verb object prep_phrase;
subject:    NOUN
      |     PRONOUN
      |     ADJECTIVE subject;
verb:       VERB
      |     ADVERB VERB
      |     verb VERB;
object:     NOUN
      |     ADJECTIVE object;
prep_phrase:     PREPOSITION NOUN;
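The two alternatives of simple_sentence in the grammar above can be mimicked by hand. This is only a sketch: it assumes the phrases have already been recognized and are given as codes, and enum phrase and match_simple_sentence are names invented for the illustration, not something Yacc generates.

```c
#include <stddef.h>

/* Hypothetical codes for already-recognized phrases (not produced by yacc). */
enum phrase { SUBJECT, VERB, OBJECT, PREP_PHRASE };

/* Mirrors:
 *   simple_sentence: subject verb object
 *                  | subject verb object prep_phrase;
 * Returns 1 for the first alternative, 2 for the second, 0 for no match. */
int match_simple_sentence(const enum phrase *p, size_t n)
{
    /* Both alternatives begin with subject verb object. */
    if (n < 3 || p[0] != SUBJECT || p[1] != VERB || p[2] != OBJECT)
        return 0;
    if (n == 3)
        return 1;
    if (n == 4 && p[3] == PREP_PHRASE)
        return 2;
    return 0;
}
```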

(1) Understanding |: it selects one of the alternatives on its left and right. In the line | subject verb object prep_phrase, the left side of | looks empty, but the line continues the rule that starts on the line above, whose first alternative is subject verb object.

So

simple_sentence: subject verb object

| subject verb object prep_phrase;

means: match subject verb object, or subject verb object prep_phrase.

VI. Combining flex and bison

test.l

%{
#include "test.tab.h"
#include <stdio.h>
#include <stdlib.h>
%}
%%
a   { return A_STATE; }
b   { return B_STATE; }
c   { return C_STATE; }
not   { return NOT; }
%%

test.y

%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
void yyerror(const char *s);
%}
%token  A_STATE B_STATE C_STATE NOT
%%
program
:
    A_STATE B_STATE {
		printf("1");	/* mid-rule action: runs once A_STATE B_STATE has been seen */
    }
    c_state_not_token  {
		printf("2");
	}
    |    NOT {
		printf("3");
    }
    ;
c_state_not_token: C_STATE {}
    ;
%%
void yyerror(const char *s)
{
	fprintf(stderr, "error: %s\n", s);
}
int main()
{
	yyparse();
	return 0;
}

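Compile (a typical command sequence, assuming the files are named test.l and test.y as above):

# bison -d test.y

# flex test.l

# gcc test.tab.c lex.yy.c -lfl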

VII. Parsing the contents of a file

test.l scans the file test.txt for keywords (that is, the tokens declared in test.y) and returns each token it finds to test.y. test.y checks whether the tokens conform to its grammar and performs the corresponding action.

test.l

%{
/* YYSTYPE must be defined before test.tab.h so that yylval is a char* here too */
typedef char* string;
#define YYSTYPE string
#include "test.tab.h"

#include <stdio.h>
#include <string.h>
%}
char [A-Za-z]
num [0-9]
eq [=]
name {char}+
age {num}+
%%
{name}		{ yylval = strdup(yytext); return NAME; }
{eq} 		{ return EQ; }
{age} 		{ yylval = strdup(yytext); return AGE; }
%%
int yywrap()
{
	return 1;
}

test.y

%{
#include <stdio.h>
#include <stdlib.h>
typedef char* string;
#define YYSTYPE string
int yylex(void);
int yyerror(const char *msg);
%}
%token NAME EQ AGE
%%
file: record file
    | record
;
record: NAME EQ AGE {
                printf("%s is %s years old!!!\n", $1, $3); }
;
%%
int main()
{
    extern FILE* yyin;
    if (!(yyin = fopen("test.txt", "r")))
    {
        perror("cannot open parsefile:");
        return -1;
    }
    yyparse();
    fclose(yyin);
    return 0;
}
int yyerror(const char *msg)
{
    printf("Error encountered: %s \n", msg);
    return 0;
}

test.txt

zhangsan=23
lisi=34
wangwu=43
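
For comparison, the check that the scanner and parser perform together on each line (letters, then =, then digits) can be written by hand for this one fixed format. The sketch below is an illustration only; format_record is a made-up name, and it writes the sentence into a buffer instead of printing it so that the result is easy to inspect.

```c
#include <ctype.h>
#include <stdio.h>

/* Parses one line of the form name=age, where name is letters and age is digits,
 * and writes the sentence the grammar action prints (without the newline) into out.
 * Returns 0 on success, -1 if the line does not match. */
int format_record(const char *line, char *out, size_t out_size)
{
    size_t name_len = 0, age_len = 0;
    const char *age;

    while (isalpha((unsigned char)line[name_len]))     /* NAME: {char}+ */
        name_len++;
    if (name_len == 0 || line[name_len] != '=')        /* EQ: [=] */
        return -1;

    age = line + name_len + 1;
    while (isdigit((unsigned char)age[age_len]))       /* AGE: {num}+ */
        age_len++;
    if (age_len == 0 || (age[age_len] != '\0' && age[age_len] != '\n'))
        return -1;

    snprintf(out, out_size, "%.*s is %.*s years old!!!",
             (int)name_len, line, (int)age_len, age);
    return 0;
}
```

The hand-written version works, but every change to the file format means rewriting the loops; with Lex and Yacc only the patterns and grammar rules change.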

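Compile (a typical command sequence, assuming the files are named test.l and test.y as above; -lfl is not needed because yywrap and main are defined in the sources):

# bison -d test.y

# flex test.l

# gcc test.tab.c lex.yy.c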

VIII. The type of tokens defined with %token, and the use of %union

Tokens defined with %token have type int by default, and their numbering starts at 258 by default. For the example above, the generated header file test.tab.h contains the following definitions:

/* Tokens.  */
#ifndef YYTOKENTYPE
# define YYTOKENTYPE
   /* Put the tokens into the symbol table, so that GDB and other debuggers
      know about them.  */
   enum yytokentype {
     NAME = 258,
     EQ = 259,
     AGE = 260
   };
#endif
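
One way to picture why the numbering starts at 258: character literals such as '=' can themselves be used as token codes, so named tokens are numbered above the range of a single character, and as far as bison is concerned 256 and 257 are reserved for its own error and undefined-token codes. The sketch below only illustrates that split; token_kind is a hypothetical helper, and the enum is copied by hand from the header shown above.

```c
#include <limits.h>

/* Hand copy of the generated numbering, for illustration only. */
enum yytokentype { NAME = 258, EQ = 259, AGE = 260 };

/* Returns 1 if tok is a literal character code (such as '='),
 * 2 if it is a named token declared with %token, and 0 otherwise. */
int token_kind(int tok)
{
    if (tok > 0 && tok <= UCHAR_MAX)
        return 1;          /* single characters stand for themselves */
    if (tok >= 258)
        return 2;          /* named tokens are numbered from 258 upward */
    return 0;              /* 0 is end of input; 256 and 257 are reserved */
}
```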

To give a token a different type, first define the types in %union:

%union {
   char *str;
   int  num;
   struct {int num1; int num2;} dnum;
}

Then declare the tokens as follows:

%token <str> K_HOST K_ERROR
%token <str> WORD PATH STRING
%token <num> NUM
%token <dnum> DNUM
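
In C terms, the %union becomes an ordinary union, and the tag in angle brackets selects which member holds a token's value. The following sketch shows that idea by hand. The union name semantic_value and the make_* helpers are invented for the illustration; in generated code the union is named YYSTYPE and the scanner fills the members of yylval directly.

```c
/* The same members as the %union in the text. */
union semantic_value {
    char *str;
    int num;
    struct { int num1; int num2; } dnum;
};

/* A <num> token stores its value in the num member. */
union semantic_value make_num(int n)
{
    union semantic_value v;
    v.num = n;
    return v;
}

/* A <dnum> token stores a pair of integers in the dnum member. */
union semantic_value make_dnum(int a, int b)
{
    union semantic_value v;
    v.dnum.num1 = a;
    v.dnum.num2 = b;
    return v;
}

/* A <str> token stores a pointer to its text in the str member. */
union semantic_value make_str(char *s)
{
    union semantic_value v;
    v.str = s;
    return v;
}
```

Because all members share the same storage, only the member that matches the token's declared tag should be read back.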
IX. $$, $1, $2, ...

Each symbol in a bison rule has a value; the value of the target symbol (the one to the
left of the colon) is called $$ in the action code, and the values on the right are numbered
$1, $2, and so forth, up to the number of symbols in the rule.

$$ stands for the symbol to the left of the colon; $1 is the first symbol to the right of the colon, $2 is the second, and so on.

For example: record: NAME EQ AGE { printf("%s is %s years old!!!\n", $1, $3); } ;

After NAME EQ AGE is matched, $1 is the content represented by NAME, and $3 is the content represented by AGE.
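
The numbering can be pictured as positions on a stack of values. The sketch below is a simplified model, not bison's actual implementation: struct value_stack, push, and reduce_sum are hypothetical names, and the rule sum: NUM '+' NUM { $$ = $1 + $3; } is invented here because integer values make the mechanics easy to follow.

```c
/* Hypothetical value stack of a parser; the real one is internal to yyparse().
 * There is no bounds checking, to keep the sketch short. */
struct value_stack {
    int values[16];
    int top;           /* number of values currently on the stack */
};

void push(struct value_stack *s, int v)
{
    s->values[s->top++] = v;
}

/* Reduces the hypothetical rule  sum: NUM '+' NUM { $$ = $1 + $3; }
 * The three right-hand values sit on top of the stack and are replaced by $$. */
void reduce_sum(struct value_stack *s)
{
    int *rhs = &s->values[s->top - 3];   /* rhs[0] is $1, rhs[1] is $2, rhs[2] is $3 */
    int result = rhs[0] + rhs[2];        /* $$ = $1 + $3 */

    s->top -= 3;                         /* pop the right-hand side */
    push(s, result);                     /* push $$ for the left-hand symbol */
}
```

In the record rule above the same positions apply: $1 is the value pushed for NAME and $3 the value pushed for AGE.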

Lex Yacc Getting Started Tutorials (3) Regular expressions and Lex variables and functions

Reference: http://www.ibm.com/developerworks/cn/linux/sdk/lex/#resources













