Learn how to use flex


I have learned quite a lot about lexical analysis with flex.

The following example shows how to use flex.

Building on the material on lexical analysis, we use Flex to construct a lexical analyzer for the PL/0 language.

Since we are building a PL/0 lexical analyzer, we first need a short introduction to the PL/0 language and its grammar:

1 PL/0 language

I. PL/0 language overview.

PL/0 is a subset of Pascal and has the typical features of a general-purpose high-level programming language. The PL/0 compiler has a clear structure and is highly readable. It fully reflects the basic organization, techniques, and steps of a high-level-language compiler, which makes it a very suitable teaching model for a small compiler.

 

II. PL/0 grammar description.

Because we perform only lexical analysis of PL/0, only the lexical parts of the grammar are listed here; the rest is omitted.

The grammar is written below in extended BNF (EBNF):

Symbol description:

'<>': Encloses a syntactic component (a unit of the grammar), i.e. a non-terminal of PL/0.

'::=': The left side of the symbol is defined by the right side.

'|': Means "or": the left side may be defined by any one of several alternative right sides.

'{}': The component in curly braces may be repeated. Without explicit limits it may occur zero or more times; upper and lower limits, when given, bound the number of repetitions.

'[]': The component in square brackets is optional.

'()': Groups the components in parentheses so that they take precedence.

EBNF representation of the PL/0 grammar (lexical part)

<constant definition> ::= <identifier> = <unsigned integer>

<unsigned integer> ::= <digit>{<digit>}

<identifier> ::= <letter>{<letter>|<digit>}

<addition operator> ::= + | -

<multiplication operator> ::= * | /

<relational operator> ::= = | # | < | <= | > | >=

<delimiter> ::= ( | ) | , | ; | .
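Each EBNF rule above translates almost mechanically into code: a {...} repetition becomes a loop and | becomes a choice. As a minimal sketch, assuming the C locale so that isalpha/isdigit stand for <letter>/<digit> (the function names are hypothetical and not part of flex), the identifier and unsigned-integer rules can be checked against a whole string like this:

```c
#include <ctype.h>
#include <stdbool.h>

/* <identifier> ::= <letter>{<letter>|<digit>} */
bool is_identifier(const char *s)
{
    if (!isalpha((unsigned char)*s))      /* must start with one <letter> */
        return false;
    for (s++; *s != '\0'; s++)            /* {<letter>|<digit>}: zero or more */
        if (!isalnum((unsigned char)*s))
            return false;
    return true;
}

/* <unsigned integer> ::= <digit>{<digit>} */
bool is_unsigned_integer(const char *s)
{
    if (!isdigit((unsigned char)*s))      /* at least one <digit> */
        return false;
    for (s++; *s != '\0'; s++)            /* {<digit>}: zero or more */
        if (!isdigit((unsigned char)*s))
            return false;
    return true;
}
```

A flex-generated scanner does the same job with a finite automaton instead of hand-written loops.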

 

Next, let's take a look at what Flex is.

 

2 The Flex lexical analyzer generator

I. Flex overview.

Flex is a tool for generating scanners, programs that recognize lexical patterns in text. Flex reads a description of the scanner to be generated from the given input files, or from standard input if no file is named. The description consists of pairs of regular expressions and C code, called rules. Flex outputs a C source file, lex.yy.c, which defines a function named yylex(). lex.yy.c can be compiled and linked with the flex library using the -lfl link option to produce an executable. When the executable runs, it analyzes its input for occurrences of the regular expressions, and whenever it finds one, it executes the corresponding C code.

 

II. How Flex works and how it is used.

Flex and Lex have almost identical functionality and structure. The Lex compiler takes a Lex source program, which describes the lexical analyzer to be generated, processes it, and outputs a lexical analyzer. Typically the Lex compiler turns the Lex source program into a C program, lex.yy.c, which is the lexical analyzer itself. Compiling lex.yy.c with a C compiler produces an executable, and with this executable we can perform lexical analysis on programs in the corresponding language.

 

III. The Flex language and program structure.

A Flex source file has the following structure:

Definitions section /* pattern macro definitions and C declarations */

%%

Rules section /* translation rules, each usually with a corresponding action */

%%

User code section /* C code for the auxiliary routines needed by the rule actions */

A Flex program is a formal description of the set of words (tokens) of a language; it is how the regular-expression rules are given to the generator. The Flex language is a special-purpose language from which a lexical analyzer is constructed automatically. A Flex program consists of three parts. The first part, the definitions section, contains C code and pattern macro definitions. A pattern macro definition is a named helper for the regular expressions that appear in the rules. For example, the letters of a language can be defined as:

letter [A-Za-z]

and digits as:

digit [0-9]

Apart from the macro definitions, all code in the definitions section must be enclosed between the symbols %{ and %}. The C header includes, external variables, and function declarations used by the Flex program also go between %{ and %}. For example, the following is the definitions section of a Flex program:

%{
#include <stdio.h>
#include <stdlib.h>
int flag;
void function(void);
#define err -1
%}
digit [0-9]
letter [A-Za-z]
newline [\n]
%%

Note: the delimiters %%, %{ and %} must be written at the very beginning of a line. C comments may also be added.

The second part, the rules section, is the main part of a Flex program. Its general form is:

pattern1 action1

pattern2 action2

......

patternN actionN

Each pattern describes a word of the language being analyzed and is written as a regular expression. Each action belongs to its pattern and is usually written in C (depending on the target platform); it specifies the processing to perform for that pattern. When the lexical analyzer generated by Flex recognizes a word matching a pattern, it executes the corresponding action. The pattern syntax is not covered in detail here; the flex manual defines it fully.
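When more than one pattern matches at the current position, flex chooses the rule that matches the most text, and among rules that match the same amount of text, the one listed first. That is why <= is read as a single operator rather than < followed by =, and why reserved words must be listed before {identifier}. A minimal C sketch of that selection, over a hypothetical table of literal patterns (PL/0's relational operators plus :=):

```c
#include <string.h>

/* Hypothetical rule table of literal patterns, in the order they are listed. */
static const char *const rules[] = { "<>", ">=", "<=", ":=", "=", "#", "<", ">" };
#define NRULES (sizeof rules / sizeof rules[0])

/* Returns the index of the rule flex would pick at the start of input
   (longest match first, then the earliest rule), or -1 if none matches.
   The length of the matched text is stored in *len. */
int choose_rule(const char *input, size_t *len)
{
    int best = -1;
    size_t best_len = 0;
    for (size_t i = 0; i < NRULES; i++) {
        size_t n = strlen(rules[i]);
        /* only a strictly longer match replaces the current choice,
           so on a tie the earlier rule is kept */
        if (strncmp(input, rules[i], n) == 0 && n > best_len) {
            best = (int)i;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

For example, choose_rule("<=5", &n) selects the "<=" rule with n == 2, while choose_rule("<5", &n) falls back to "<" with n == 1.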

The third part is the user code section. It contains the C functions, including main(), that support the rule actions. These auxiliary routines supplement the actions in the rules. Routines that are not C library functions must be defined, either here or in separately compiled files, and linked together with the generated lexical analyzer.

Note that the definitions section and the user code section are optional, while the rules section is required.

IV. Global variables and functions in lex.yy.c.

The lex.yy.c file generated by flex defines some variables and functions of its own. They are a great help in carrying out lexical analysis; here we introduce the ones used in this experiment.

FILE *yyin /* Points to the input the lexical analyzer reads. If it is not set, it defaults to standard input (the keyboard). If the program to be analyzed is in a file, we can point yyin at that opened file. */

FILE *yyout /* Same as above, except that it points to the output. By default it is standard output (the screen); the output can be redirected by reassigning this pointer. */

char *yytext /* Points to the text of the most recently matched word, i.e. the string that was scanned and matched. */

int yyleng /* Number of characters in the matched string. */
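Because yytext points into flex's internal buffer, its contents change as scanning continues, so an action that wants to keep a token must copy it, using yyleng for the length. A minimal sketch of such a copy; save_lexeme is a hypothetical helper that takes the text and length as parameters so it can be shown outside a generated scanner:

```c
#include <stdlib.h>
#include <string.h>

/* Copies the len characters at text into a new NUL-terminated string.
   Inside a flex action it would be called as save_lexeme(yytext, yyleng).
   Returns NULL if memory runs out; the caller frees the result. */
char *save_lexeme(const char *text, int len)
{
    char *copy = malloc((size_t)len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, (size_t)len);
    copy[len] = '\0';
    return copy;
}
```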

 

Functions

ECHO /* A macro that copies yytext to yyout; flex's default rule uses it to echo unmatched input. */

yywrap() /* Called when the scanner reaches the end of its input. If it returns 1, flex stops scanning; if it returns 0, flex assumes yyin now points to new input and continues. */

yyrestart(FILE *new_file) /* Restarts scanning from a new input file. */

There are of course many other functions and features, such as yymore(), start conditions, and macros. They are not used in this experiment, so they are not detailed here; see the flex manual.

 

V. Overview of how a Flex lexical analyzer is implemented.

The core of an automatic lexical analyzer generator is the Lex compiler. It takes a Lex source program describing the word set of a language and converts it into a lexical analyzer that recognizes the words of that language. The generated lexical analyzer recognizes and processes words as a finite automaton does.

Given the Lex source program, the Lex compiler proceeds as follows:

(1) Construct an NFA Ni for each pattern Pi in the recognition rules of the Lex source program.

(2) Introduce a unique new start state S and connect it by ε-arcs to the start states of all the NFAs Ni (i = 1, ..., n), forming a combined NFA N'. Steps (1) and (2) complete the construction from regular expressions to a nondeterministic finite automaton.

(3) Determinize the NFA N' to obtain a DFA N.

(4) Minimize the DFA N.
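Steps (2) and (3) can be illustrated on a tiny scale. The sketch below hard-codes a hypothetical combined NFA N' for the two patterns {letter}({letter}|{digit})* and {digit}+, with the input symbols reduced to two classes, L (letter) and D (digit), and determinizes it by the standard subset construction. Each DFA state is represented as a bitmask of NFA states:

```c
#include <stdint.h>

#define NSTATES 5                  /* NFA states 0..4 */
enum { SYM_L, SYM_D, NSYMS };      /* symbol classes: letter, digit */

/* Hypothetical combined NFA N' (bit i set = NFA state i).
   State 0 is the new start S, with epsilon arcs to 1 (identifier NFA)
   and 3 (number NFA). */
static const uint32_t eps[NSTATES] = { (1u << 1) | (1u << 3), 0, 0, 0, 0 };
static const uint32_t move_on[NSTATES][NSYMS] = {
    /* 0 */ { 0, 0 },
    /* 1 */ { 1u << 2, 0 },        /* letter -> 2 */
    /* 2 */ { 1u << 2, 1u << 2 },  /* letter/digit -> 2 (accepts identifier) */
    /* 3 */ { 0, 1u << 4 },        /* digit -> 4 */
    /* 4 */ { 0, 1u << 4 },        /* digit -> 4 (accepts number) */
};

/* Epsilon-closure of a set of NFA states. */
static uint32_t closure(uint32_t set)
{
    uint32_t prev;
    do {
        prev = set;
        for (int s = 0; s < NSTATES; s++)
            if (set & (1u << s))
                set |= eps[s];
    } while (set != prev);
    return set;
}

/* Subset construction: fills dstates[] with the reachable non-empty DFA
   states (as sets of NFA states) and dtran[][] with the DFA transitions
   (-1 = no transition). Returns the number of DFA states. */
int subset_construct(uint32_t dstates[32], int dtran[32][NSYMS])
{
    int n = 0;
    dstates[n++] = closure(1u << 0);
    for (int i = 0; i < n; i++) {            /* states i..n-1 are unprocessed */
        for (int a = 0; a < NSYMS; a++) {
            uint32_t next = 0;
            for (int s = 0; s < NSTATES; s++)
                if (dstates[i] & (1u << s))
                    next |= move_on[s][a];
            next = closure(next);
            dtran[i][a] = -1;
            if (next == 0)
                continue;                    /* no move on this symbol */
            int j = 0;
            while (j < n && dstates[j] != next)
                j++;
            if (j == n)
                dstates[n++] = next;         /* a new DFA state */
            dtran[i][a] = j;
        }
    }
    return n;
}
```

For this NFA the construction yields three DFA states: the start state {0, 1, 3}, the identifier state {2}, and the number state {4}. Because the two accepting states belong to different rules, step (4) has nothing to merge here.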

A scheduler supplies the control program, whose role is to drive the finite automaton, i.e. to run the input string through it. Whenever a final (accepting) state is reached, a word described by one of the patterns in the Lex source program has been recognized, and the corresponding action is called.
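The control program's loop can be sketched as a table-driven DFA. This is a hand-written automaton for the {identifier}, {number} and {wrongid} patterns used later in this article, not flex's generated tables; state 3 means "digits followed by a letter". The driver keeps stepping while a transition exists and remembers the last accepting position it passed, so it returns the longest match:

```c
#include <ctype.h>
#include <stddef.h>

enum tok { TOK_NONE, TOK_IDENT, TOK_NUMBER, TOK_WRONGID };

/* DFA: 0 = start, 1 = identifier, 2 = number, 3 = digits then a letter */
static const int delta[4][3] = {
    /*        letter digit other */
    /* 0 */ {  1,     2,    -1 },
    /* 1 */ {  1,     1,    -1 },
    /* 2 */ {  3,     2,    -1 },
    /* 3 */ {  3,     3,    -1 },
};
static const enum tok accepts[4] = { TOK_NONE, TOK_IDENT, TOK_NUMBER, TOK_WRONGID };

static int char_class(char c)
{
    if (isalpha((unsigned char)c)) return 0;
    if (isdigit((unsigned char)c)) return 1;
    return 2;
}

/* Runs the DFA from the start of s; stores the length of the longest
   accepted prefix in *len and returns its token kind (TOK_NONE if none). */
enum tok longest_match(const char *s, size_t *len)
{
    int state = 0;
    enum tok last = TOK_NONE;
    *len = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        state = delta[state][char_class(s[i])];
        if (state < 0)
            break;                       /* no transition: stop scanning */
        if (accepts[state] != TOK_NONE) {
            last = accepts[state];       /* remember the last accepting position */
            *len = i + 1;
        }
    }
    return last;
}
```

For example, on "123a;" it reports TOK_WRONGID with length 4, and on "count:=1" it reports TOK_IDENT with length 5, stopping at the colon.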

 

With this basic understanding of Flex, let's write lex.l, the Flex input file for analyzing PL/0. This file describes the lexical rules of PL/0, i.e. it is the specification from which the PL/0 lexical analyzer is generated. As we saw above, such a file has three parts, each described in detail below:

(I) Definitions section.

The final result of lexical analysis is a sequence of pairs (tokens). This experiment only displays them, but to make the result more concrete I used a struct to represent a pair, and each token is saved into an array of these structs after it is recognized. A print() function is declared to output the recognized tokens. Auxiliary variables are also defined, such as the file pointers for reading and writing and counters for the current line number, the number of tokens, and the number of errors.

Next comes the PL/0 lexical description, written as regular expressions:

digit [0-9] /* the digits 0-9 */

letter [A-Za-z] /* letters a-z and A-Z; in this program PL/0 is case-sensitive */

number {digit}+ /* unsigned integer */

identifier {letter}({letter}|{digit})* /* identifier */

wrongid {digit}+{letter}({letter}|{digit})* /* invalid identifier, such as 123a */

newline [\n] /* newline */

whitespace [ \t]+ /* spaces and tabs */

(II) Rules section.

Since the regular expressions for unsigned integers and identifiers are already defined and named, and there are only a few reserved words, operators, and delimiters, the rules section simply gives the action to execute for each pattern. Here each action assigns the word a type, giving all words of the same type the same integer code, and print() then outputs the corresponding attribute according to that code. The main job of each action is therefore to assign value and call print().

Example: {identifier} { value = 1; print(); }
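The lex file in this article lists each reserved word as its own pattern ahead of {identifier}. A common alternative, shown here only as a sketch and not what this article's file does, is to match everything with {identifier} and then look the lexeme up in a table of reserved words, choosing code 0 or 1 accordingly. The helper reserved_or_identifier is hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* PL/0 reserved words, kept sorted so that bsearch() can be used */
static const char *const reserved[] = {
    "begin", "call", "const", "do", "end", "if", "odd",
    "procedure", "read", "then", "var", "while", "write",
};

/* key is the lexeme itself; elem points to one entry of reserved[] */
static int cmp_word(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Returns 0 ("basickey") if lexeme is a reserved word, else 1 ("identifier"),
   matching the value codes used in the rules section. */
int reserved_or_identifier(const char *lexeme)
{
    const void *hit = bsearch(lexeme, reserved,
                              sizeof reserved / sizeof reserved[0],
                              sizeof reserved[0], cmp_word);
    return hit != NULL ? 0 : 1;
}
```

In a flex action this would read {identifier} { value = reserved_or_identifier(yytext); print(); }, which keeps the rules section short at the cost of one table lookup per identifier.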

For the complete list of PL/0 reserved words, delimiters, and operators, see the full file below.

(III) User code section.

The user code consists mainly of three functions:

main() // sets the input and output of the source file and calls the scanner;

print() // outputs one token pair;

yywrap() // called by the scanner at end of input; returning 1 ends scanning;
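The formatting done by print() can be separated from the file handling. In this sketch, the function format_token and its signature are hypothetical. It maps the value codes 0-7 used in the rules section to the property names from the article's print() and writes one line, either in the "count <name, property>" form or in the error form with a line number:

```c
#include <stdio.h>

/* Property names indexed by the value codes assigned in the rules section */
static const char *const property_names[] = {
    "basickey", "identifier", "Number", "arithmetic-op",
    "relation-op", "Boundary-op",
    "mixed number and letter:", "unknown operator:",
};

/* Writes the result line for token number count into buf.
   Codes 0-5 are tokens; 6 and 7 are errors reported with the line number.
   Returns the snprintf result (number of characters that would be written). */
int format_token(char *buf, size_t size, int count, int line,
                 int value, const char *text)
{
    if (value < 0 || value > 7)
        return snprintf(buf, size, "%d [Line: %d]: invalid code %d\n",
                        count, line, value);
    if (value <= 5)
        return snprintf(buf, size, "%d <%s, %s>\n",
                        count, text, property_names[value]);
    return snprintf(buf, size, "%d [Line: %d]: %s \"%s\"\n",
                    count, line, property_names[value], text);
}
```

Keeping the formatting in its own function makes the output easy to check without running the scanner.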

 

Notes:

Mind the spaces and tabs. Source programs inevitably contain spaces and tabs for layout, but they have no meaning in lexical analysis, so when the scanner reads a space or tab it must "eat" (discard) it.

Mind the newlines. A newline is discarded just like spaces and tabs, but we also need to count lines, so a variable is incremented by 1 each time a newline is read.

In addition, Lex executes the action of a pattern after it matches a word, but by default it copies any input that matches no pattern to the output. We therefore need extra handling (such as reporting an error) for symbols that do not belong to the language being analyzed.
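The first two notes can be sketched in plain C. skip_layout is a hypothetical helper, not part of flex, that mirrors what the {whitespace} and {newline} rules do together: spaces and tabs are discarded, and each newline is discarded while the line counter (linenum in the lex file) is incremented:

```c
/* Skips spaces, tabs and newlines at the start of s, adding one to
   *linenum for every newline, and returns a pointer to the next
   significant character (or to the terminating NUL). */
const char *skip_layout(const char *s, int *linenum)
{
    for (;; s++) {
        if (*s == '\n')
            (*linenum)++;            /* newline: count the line, then discard */
        else if (*s != ' ' && *s != '\t')
            return s;                /* not layout: a token (or the end) starts here */
    }
}
```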

 

 

The following is the file I wrote to generate a PL/0 lexical analyzer with Flex:

%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXTOKENS 1000             /* capacity of the token table; can be changed */
void print(void);                  /* outputs one token */
struct token {                     /* one token pair */
    const char *idproperty;        /* the token's property (type) name */
    char *idname;                  /* the recognized text, copied from yytext */
} entity[MAXTOKENS];
int errnum = 0;                    /* number of invalid tokens */
int value;                         /* integer code of the token type */
int linenum = 1;                   /* current line number */
int count = 0;                     /* number of tokens */
FILE *fpin;                        /* input (test) file */
FILE *fpout;                       /* result file */
%}
digit       [0-9]
letter      [A-Za-z]
number      {digit}+
identifier  {letter}({letter}|{digit})*
wrongid     {digit}+{letter}({letter}|{digit})*
newline     [\n]
whitespace  [ \t]+
%%
"procedure" |
"call"      |
"begin"     |
"end"       |
"var"       |
"const"     |
"if"        |
"then"      |
"while"     |
"do"        |
"read"      |
"write"     |
"odd"              { value = 0; print(); }
{identifier}       { value = 1; print(); }
{wrongid}          { value = 6; print(); }
{number}           { value = 2; print(); }
"+"|"-"|"*"|"/"    { value = 3; print(); }
"<>" |
">=" |
"<=" |
":=" |
"="|"#"|"<"|">"    { value = 4; print(); }
"("|")"|","|";" |
"."                { value = 5; print(); }
{newline}          { linenum += 1; }
{whitespace}       { /* discard spaces and tabs */ }
.                  { value = 7; print(); }
%%
int yywrap(void)
{
    return 1;                      /* no further input: stop scanning */
}

void print(void)
{
    static const char *const names[] = {
        "basickey", "identifier", "Number", "arithmetic-op", "relation-op",
        "Boundary-op", "mixed number and letter:", "unknown operator:"
    };
    struct token *t;

    if (count >= MAXTOKENS) {
        fprintf(stderr, "too many tokens\n");
        exit(1);
    }
    t = &entity[count];
    count += 1;
    t->idproperty = names[value];
    /* yytext is overwritten by the next match, so keep a copy */
    t->idname = malloc((size_t)yyleng + 1);
    if (t->idname == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    memcpy(t->idname, yytext, (size_t)yyleng);
    t->idname[yyleng] = '\0';

    if (value <= 5) {
        fprintf(fpout, "%d <%s, %s>\n", count, t->idname, t->idproperty);
    } else {
        errnum += 1;
        fprintf(fpout, "%d [Line: %d]: %s \"%s\"\n",
                count, linenum, t->idproperty, t->idname);
    }
}

int main(int argc, char *argv[])
{
    /* result file: the second argument, or defresult.txt by default */
    const char *filename = (argc > 2) ? argv[2] : "defresult.txt";

    if (argc == 1) {               /* no arguments: read stdin, write stdout */
        printf("Please input the PL/0 program (Ctrl+Z on Windows, Ctrl+D on Unix, to end)\n");
        fpin = stdin;
        fpout = stdout;
    } else {
        if ((fpin = fopen(argv[1], "r")) == NULL) {
            printf("cannot open the file\n");
            exit(1);
        }
        if ((fpout = fopen(filename, "a")) == NULL) {
            printf("cannot write the file\n");
            exit(1);
        }
    }
    yyin = fpin;
    yylex();
    fprintf(fpout, "\n%d symbol(s) found.\n%d error(s) found.\n", count, errnum);
    fprintf(fpout, "=====================================================================\n");
    if (fpin != stdin)
        fclose(fpin);
    if (fpout != stdout)
        fclose(fpout);
    return 0;
}

 

This article is from the CSDN blog; please cite the source when reposting: http://blog.csdn.net/litchh/archive/2004/07/14/40983.aspx
