Lexical analyzer generation tool flex

Source: Internet
Author: User
1. Introduction to flex
The description of a word (token) is called a lexical pattern, and it is usually written precisely as a regular expression. Flex reads a text file in a prescribed format and outputs a C source program, as follows:

+----------------+        +-----------+        +----------------------+
| input file *.l | -----> | flex tool | -----> | output file lex.yy.c |
+----------------+        +-----------+        +----------------------+

The flex input file is called the lex source file. It contains regular expressions together with the C code that handles each corresponding pattern. Lex source files conventionally use the extension .l. From the source file, flex automatically generates the lexical analysis function int yylex() and writes it to a file named lex.yy.c. In practice you may rename this file, for example to lexyy.c. This file is the lex output file, i.e. the generated lexical analyzer; you can add it to your own project and call int yylex() from your code.

2. Lex source file format
Lex is strict about the format of source files. For example, if a statement that must start at the beginning of a line is indented, a fatal error results. Lex is also poor at diagnosing errors, so take care when writing the source file. A lex source file consists of three parts, separated by lines consisting of %% at the beginning of the line. The format is as follows:

definitions
%%
rules
%%
user C code

The following lex source file, count.l, counts how many integers and identifiers appear in its input. The content of count.l is as follows:

%{
#include "stdio.h"
#include "stdlib.h"
int num_num = 0, num_id = 0;
%}
INTEGER [-+]?[1-9][0-9]*
ID      [a-zA-Z][a-zA-Z_0-9]*
SPACE   [ \n\t]
%%
{INTEGER} { num_num++;  /* count one more integer */
            printf("(num=%d)\n", atoi(yytext)); /* print the numeric value */
          }

{ID}      { num_id++;
            printf("(id=%s)\n", yytext);
          }

{SPACE} |
.         {
            /* do nothing: filter out whitespace and all other characters */
          }
%%

int main()
{
   yylex();
   printf("num=%d,id=%d\n", num_num, num_id);
   return 0;
}

int yywrap()  /* this function must be provided by the user */
{
   return 1;
}
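To make the behaviour of count.l concrete, the following is a hand-written C sketch of what the generated scanner does with these rules on an in-memory string. It is only an illustration under simplifying assumptions: the function count_tokens is hypothetical, it scans a string instead of yyin, and it prints nothing. Flex itself generates a table-driven scanner, not code like this.

```c
#include <ctype.h>

/* Hypothetical hand-written equivalent of the rules in count.l.
 *   INTEGER [-+]?[1-9][0-9]*      ID [a-zA-Z][a-zA-Z_0-9]*
 * Any other character (whitespace included) is skipped, like the
 * "{SPACE} | ." rule. The two token patterns never start with the same
 * character, so trying them in turn gives the same result as longest match. */
void count_tokens(const char *s, int *num_num, int *num_id)
{
    *num_num = 0;
    *num_id = 0;
    while (*s != '\0') {
        const char *p = s;

        /* Try INTEGER: optional sign, a nonzero digit, then any digits. */
        if (*p == '+' || *p == '-')
            p++;
        if (*p >= '1' && *p <= '9') {
            p++;
            while (isdigit((unsigned char)*p))
                p++;
            (*num_num)++;
            s = p;
            continue;
        }

        /* Try ID: a letter followed by letters, digits or underscores. */
        if (isalpha((unsigned char)*s)) {
            p = s + 1;
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            (*num_id)++;
            s = p;
            continue;
        }

        /* Neither pattern matched: consume one character (the "." rule). */
        s++;
    }
}
```

For the input used in the run shown in section 3, "aaa bbb ccc 999\n", the counters come out as 1 and 3, the same as the program's final line num=1,id=3.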

 

2.1 Definition:

%{
   #include "stdio.h"
   #include "stdlib.h"
   int num_num=0,num_id=0;
%}
INTEGER [-+]?[1-9][0-9]*
ID [a-zA-Z][a-zA-Z_0-9]*
SPACE [ \n\t]

The definitions section contains C code, pattern macro definitions, and start-condition declarations for conditional patterns. C code is enclosed between %{ and %}, each written at the beginning of a line; when lex processes the source file, it copies everything between %{ and %} unchanged to the output file lex.yy.c. The definitions above contain no start-condition declarations, only C code and pattern macros. A pattern macro definition gives a name to a regular expression, such as INTEGER [-+]?[1-9][0-9]* above. Regular expressions match as follows:

Pattern     Meaning
x           matches the single character x
.           matches any character except newline '\n'
[xyz]       matches x, y, or z
[abj-oz]    matches a, b, z, or any letter from j to o
[^A-Z]      matches any character except the uppercase letters A to Z
[^A-Z\n]    matches any character except the uppercase letters A to Z and newline
r*          matches zero or more r
r+          matches one or more r
r?          matches zero or one r
2.2 Rules section
The rules section is the core of a lex source file. It consists of a set of patterns, each paired with a C action that the generated analyzer executes when it recognizes that pattern. The format is as follows:

%{
C code
%}
pattern1    action1
pattern2    |
pattern3    action3

As in the definitions section, any C code must appear before the first pattern and be enclosed in %{ and %}, with %{ written at the beginning of a line. Code placed there can declare local variables used by yylex(). Each pattern must start at the beginning of a line; it can be a regular expression, or a macro name from the definitions section enclosed in braces, such as {INTEGER}. An action is C code enclosed in braces; its opening brace { is separated from the pattern by whitespace and must be on the same line as the pattern. Note: the "|" after pattern 2 means that patterns 2 and 3 share the same action (action 3); the "|" is separated from pattern 2 by whitespace.

2.3 User C code
Lex does not process this section; it simply copies it to the end of the output file lex.yy.c. This is where you can define C functions used by the actions, the main function, and the function yywrap() called by yylex(). If you provide these functions in other C modules, this section can be omitted.

3. Generating, compiling, and running
flex count.l
gcc -g lex.yy.c -o count

Run:
osdba@osdba-laptop:~/tmp$ ./count <<EOF
aaa bbb ccc 999
EOF
(id=aaa)
(id=bbb)
(id=ccc)
(num=999)
num=1,id=3
osdba@osdba-laptop:~/tmp$

4. Pattern matching
When yylex() is called, it first checks whether the global file pointer yyin has been set. If it has, yylex() scans that file; if not, yyin is set to the standard input stdin. Likewise, if the global file pointer yyout has not been set, it is set to the standard output stdout.
If several patterns match the input, yylex() uses the pattern that matches the longest string; this is called the longest-match principle. If several patterns match strings of the same length, yylex() uses the pattern that appears first in the rules section; this is called the first-match principle. To apply these principles, yylex() reads ahead in the input; any characters read ahead beyond the end of the chosen match are pushed back into the input before the next match begins. Consider the following example:

%%
program         { printf("Keyword: %s!\n", yytext);    /* pattern 1 */ }
procedure       { printf("Keyword: %s!\n", yytext);    /* pattern 2 */ }
[a-z][a-z0-9]*  { printf("Identifier: %s!\n", yytext); /* pattern 3 */ }
%%

If the input string is "programming", then when yylex() has read the substring "program", both pattern 1 and pattern 3 match. By the longest-match principle, however, it keeps reading and finds that pattern 3 also matches the whole string "programming", so it outputs "Identifier: programming!". If the input string is "program", patterns 1 and 3 match strings of the same length, so by the first-match principle pattern 1 is used and the output is "Keyword: program!". Note: if you swap patterns 1 and 3 in the source file, pattern 1 can never be matched. If no pattern matches the input, the default rule is used: the unmatched input is copied, one character at a time, to the output file yyout.
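The selection policy can be sketched in C. In this hypothetical example, each pattern from the example above is a small matcher function that returns the length of the longest prefix it matches (0 for no match), and pick_rule tries them in source order, replacing the current choice only when a later pattern matches strictly more characters. That yields longest match first and, on a tie, the earliest rule. This only illustrates the policy; flex implements it with a DFA rather than by calling matchers one after another.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of lit if s starts with it, otherwise 0. */
static size_t match_literal(const char *s, const char *lit)
{
    size_t n = strlen(lit);
    return strncmp(s, lit, n) == 0 ? n : 0;
}

static size_t match_program(const char *s)   { return match_literal(s, "program"); }
static size_t match_procedure(const char *s) { return match_literal(s, "procedure"); }

/* [a-z][a-z0-9]* : longest prefix of s that is an identifier. */
static size_t match_ident(const char *s)
{
    size_t n = 1;
    if (!islower((unsigned char)s[0]))
        return 0;
    while (islower((unsigned char)s[n]) || isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Returns the 1-based number of the rule that wins at the start of s,
 * or 0 if no rule matches; *len receives the length of the match.
 * A later rule wins only with a strictly longer match, so ties go to
 * the rule listed first. */
int pick_rule(const char *s, size_t *len)
{
    size_t (*rules[])(const char *) = { match_program, match_procedure, match_ident };
    int best = 0;
    size_t best_len = 0;

    for (int i = 0; i < 3; i++) {
        size_t n = rules[i](s);
        if (n > best_len) {
            best = i + 1;
            best_len = n;
        }
    }
    if (len != NULL)
        *len = best_len;
    return best;
}
```

Here pick_rule("programming", &n) selects rule 3 with n == 11, and pick_rule("program", &n) selects rule 1 with n == 7. Moving match_ident to the front of the rules array reproduces the warning above: on "program" it ties with the keyword and wins, so the keyword rule never matches.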
5. Common global variables and macros
lex.yy.c contains many global variables, functions, and macros. Only the most commonly used ones are listed here; for details, read the generated source file.
(1) FILE *yyin, *yyout: the input file and the output file. If the user does not set them, they default to stdin and stdout.
(2) int yylex(): the lexical analysis function, which reads its input from yyin. In the action of a pattern, the user can end yylex() with a return statement, which must return an integer. Because yylex() keeps its running state in global variables, the next call to yylex() resumes scanning from where the previous call stopped; this is how it is used together with a parser. If no action executes a return statement, yylex() keeps scanning the input until it reaches the end of the file. On reaching EOF, yylex() calls int yywrap() (which the user must provide). If yywrap() returns a nonzero value, yylex() returns 0 and stops; otherwise, yylex() continues by scanning the file that yyin now points to.
(3) char *yytext: the text of the token just recognized.
(4) int yyleng: the length of the string yytext.
(5) int yywrap(): see (2).
(6) yymore(): keeps the currently recognized text in yytext; the text matched on the next scan is appended to it. For example, with the rules
    hello  { printf("%s!", yytext); yymore(); }
    world  { printf("%s!", yytext); }
the input string "helloworld" produces the output "hello!helloworld!".
(7) yyless(int n): keeps the first n characters of the current match and returns the rest to the input.
(8) unput(char c): pushes the character c back into the input; it becomes the first character of the next scan.
(9) input(): reads the next character from the input and advances past it.
(10) yyterminate(): stops scanning; yylex() returns 0 to its caller, as at the end of the input.
(11) yyrestart(FILE *file): makes the scanner continue scanning from file.
(12) ECHO: copies the currently recognized text to yyout.
(13) BEGIN: activates a start condition (see section 6).
(14) REJECT: discards the current match and its pattern and directs the scanner to the next-best pattern that matches the same input.
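The calling convention described in (2) can be sketched without flex. The following hypothetical C scanner keeps its position in a global variable, so each call to next_token resumes where the previous call stopped. It returns a token code when it recognizes a token, silently skips other characters (like an action with no return), and returns 0 at end of input, as yylex() does when yywrap() returns nonzero. The names next_token, scanner_start, token_text, token_len and the token code values are assumptions made for this example; token_text and token_len play the roles of yytext and yyleng, and numbers are simplified to [0-9]+.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes; 0 means end of input, as with yylex(). */
enum { TOK_END = 0, TOK_NUM = 258, TOK_ID = 259 };

/* Scanner state lives in globals, so each call resumes where the
 * previous one stopped. */
static const char *scan_pos = "";
static char token_text[64];   /* role of yytext (long tokens are truncated) */
static int token_len;         /* role of yyleng */

void scanner_start(const char *input) { scan_pos = input; }

int next_token(void)
{
    for (;;) {
        if (*scan_pos == '\0')
            return TOK_END;                   /* end of input: report 0 */

        const char *start = scan_pos;
        int code;
        if (isdigit((unsigned char)*scan_pos)) {          /* [0-9]+ */
            while (isdigit((unsigned char)*scan_pos))
                scan_pos++;
            code = TOK_NUM;
        } else if (isalpha((unsigned char)*scan_pos)) {   /* [a-zA-Z][a-zA-Z_0-9]* */
            while (isalnum((unsigned char)*scan_pos) || *scan_pos == '_')
                scan_pos++;
            code = TOK_ID;
        } else {
            scan_pos++;                       /* skip: action without return */
            continue;
        }

        token_len = (int)(scan_pos - start);
        if (token_len >= (int)sizeof token_text)
            token_len = (int)sizeof token_text - 1;
        memcpy(token_text, start, (size_t)token_len);
        token_text[token_len] = '\0';
        return code;
    }
}
```

A parser would call next_token() repeatedly until it returns 0, reading token_text after each call, in the same way a yacc or bison parser calls yylex() and reads yytext.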
6. Start conditions (conditional patterns)
Lex lets you make patterns active only in a given state; these are called start conditions. Start conditions are declared in the definitions section with %start. In the rules section, BEGIN name activates a condition, and BEGIN INITIAL or BEGIN 0 deactivates all conditions and returns the analyzer to its initial state. As an example, suppose that whenever the word "magic" appears in the input, it should be printed as "first" if its line begins with the character 'a', as "second" if the line begins with 'b', and as "magic" otherwise. Without start conditions, the lex source file can be written as follows:

%{
int flag;
%}
%%
^a     { flag = 'a'; ECHO; }
^b     { flag = 'b'; ECHO; }
\n     { flag = 0; ECHO; }
magic  {
         switch (flag)
         {
         case 'a': printf("first"); break;
         case 'b': printf("second"); break;
         default:  ECHO; break;
         }
       }
%%

With start conditions, the source file above can be simplified to:

%start AA BB CC
%%
^a          { ECHO; BEGIN AA; }
^b          { ECHO; BEGIN BB; }
\n          { ECHO; BEGIN 0; }
<AA>magic   { printf("first"); }
<BB>magic   { printf("second"); }
%%
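The effect of these rules can be mimicked by a small state machine in C. In this hypothetical sketch, rewrite_magic copies its input to a buffer, enters state AA or BB when a line starts with 'a' or 'b', returns to the initial state at each newline, and replaces "magic" only while in AA or BB; everything else is echoed unchanged, as ECHO and the default rule would do. The function name and its buffer-based interface are assumptions for illustration, not part of lex.

```c
#include <stddef.h>
#include <string.h>

/* States corresponding to INITIAL (0) and the start conditions AA, BB. */
enum state { ST_INITIAL, ST_AA, ST_BB };

/* Writes the rewritten text to out (capacity cap, which must be >= 1),
 * always NUL-terminated, and returns the number of characters written. */
size_t rewrite_magic(const char *in, char *out, size_t cap)
{
    enum state st = ST_INITIAL;
    int at_line_start = 1;
    size_t n = 0;

    while (*in != '\0' && n + 1 < cap) {
        const char *repl = NULL;
        size_t consumed = 1;

        if (at_line_start && *in == 'a') {
            st = ST_AA;                        /* ^a { ECHO; BEGIN AA; } */
        } else if (at_line_start && *in == 'b') {
            st = ST_BB;                        /* ^b { ECHO; BEGIN BB; } */
        } else if (*in == '\n') {
            st = ST_INITIAL;                   /* \n { ECHO; BEGIN 0; }  */
        } else if (st != ST_INITIAL && strncmp(in, "magic", 5) == 0) {
            repl = (st == ST_AA) ? "first" : "second";   /* <AA>/<BB>magic */
            consumed = 5;
        }

        if (repl != NULL) {
            size_t len = strlen(repl);
            if (n + len >= cap)
                break;                         /* no room for the replacement */
            memcpy(out + n, repl, len);
            n += len;
        } else {
            out[n++] = *in;                    /* ECHO or default rule */
        }
        at_line_start = (*in == '\n');
        in += consumed;
    }
    out[n] = '\0';
    return n;
}
```

For example, the input "a magic\nb magic\nc magic\n" becomes "a first\nb second\nc magic\n".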
