(i) Foreword
Lex is a tool for building lexical analyzers. A lexical analyzer takes an arbitrary input stream and splits it into lexical tokens. The tokenized output can then be processed further, usually by yacc, or it can be the "final product".
When writing a Lex specification, you create a set of patterns for Lex to match against the input. Each time a pattern matches, the Lex program invokes the C code we provide to process the matched text. In this way, the lexical analyzer splits the input into strings called tokens. Lex itself does not produce a complete executable program; instead, it translates the Lex specification into a file containing a C routine named yylex(). The program calls yylex() to run the lexical analyzer.
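For instance, here is a minimal sketch of such a specification (an illustrative example of our own, not one of the programs discussed below); each rule pairs a pattern with the C code to run when it matches:

%%
[0-9]+     { printf("integer: %s\n", yytext);   /* yytext holds the matched text */ }
.|\n       { /* silently discard everything else */ }
%%
int main(void)
{
    yylex();           /* run the generated lexical analyzer on standard input */
    return 0;
}

int yywrap(void)
{
    return 1;          /* 1 means there is no further input after end of file */
}

Running lex on this file produces lex.yy.c, which is then compiled with a C compiler, just as in the examples below.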
(ii) Regular expressions
1: Regular expression characters
Regular expressions are widely used in the UNIX environment, and Lex supports a rich regular expression language.
A regular expression is a pattern description expressed in a meta-language, a language used to describe particular patterns. The characters that make up regular expressions are:
Symbol | Meaning
.      | Matches any single character except a newline ("\n")
*      | Matches zero or more copies of the preceding expression
[]     | A character class; matches any one of the characters inside the brackets
^      | As the first character of a regular expression, matches the beginning of a line; inside square brackets it negates the class
$      | As the last character of a regular expression, matches the end of a line
{}     | When the braces contain one or two numbers, indicates how many times the preceding pattern may be repeated
\      | Escapes metacharacters
+      | Matches one or more occurrences of the preceding regular expression
?      | Matches zero or one occurrence of the preceding regular expression
|      | Matches either the preceding or the following regular expression (alternation)
"..."  | Every character inside the quotation marks is interpreted literally
/      | Matches the preceding regular expression only when it is followed by the given regular expression (trailing context)
()     | Groups a series of regular expressions into a single new regular expression
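As an illustration (these sample patterns are ours, not part of the original table):

[0-9]+                   one or more digits, such as 42
[A-Za-z_][A-Za-z0-9_]*   a letter or underscore followed by letters, digits, or underscores (a C identifier)
-?[0-9]+                 an optional minus sign followed by digits
"++"                     the literal two-character string ++
a/b                      the character a, but only when it is immediately followed by b
(ab)+                    one or more repetitions of the pair ab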
2: A regular expression example in Lex
Below we write a Lex specification that recognizes decimal numbers.
First, let's look at the regular expression that describes a decimal number:
-?(([0-9]+)|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?)
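Reading it piece by piece (an informal breakdown added here for clarity):

-?                      an optional leading minus sign
([0-9]+)                either a plain integer such as 42,
([0-9]*\.[0-9]+)        or digits containing a decimal point, such as 3.14 or .5,
([eE][-+]?[0-9]+)?      optionally followed by an exponent such as e10 or E-3

Note that, as the expression is written, the optional exponent is grouped with the second alternative, so it applies only to numbers that contain a decimal point.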
Now let's look at the Lex implementation, in a file named xs.lex:
%%
[\n\t ]                                           ;
-?(([0-9]+)|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?)   { printf("number\n"); }
.                                                 ECHO;
%%
int main()
{
    yylex();
    return 0;
}

int yywrap()
{
    return 1;
}
We build the program with the following commands:
lex xs.lex
gcc lex.yy.c -o xs
./xs        # run the program
Let's look at the result of running it:
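For instance, a hypothetical session (the input line is just an example) might look roughly like this:

$ echo "3.14 42 hello" | ./xs
number
number
hello

The two numeric tokens each trigger the printf() in the number rule, while hello is echoed character by character by the catch-all "." rule; the whitespace rule silently discards the blanks and the trailing newline, so the shell prompt may reappear immediately after hello.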
(iii) Example: a word-counting program
Let's take a closer look at Lex with an example.
A Lex specification consists of three parts: the definition section, the rules section, and the user subroutine section. The first part deals with options for Lex and the lexical analyzer, and it sets up the execution environment in which the lexical analyzer runs.
The definition section for the word count example is as follows:
%{
unsigned int charCount = 0, wordCount = 0, lineCount = 0;
%}
word [^ \t\n]+
eol  \n
The part enclosed between "%{" and "%}" is C code; it is copied verbatim into the generated lexical analyzer and placed near the beginning of the output code.
The last two lines are definitions. Lex provides a simple substitution mechanism that makes it easy to name long or complicated patterns. We add two definitions here. The first describes a word: a non-empty sequence of characters other than spaces, tabs, and newlines. The second describes the end-of-line character, a newline.
The rules section contains the patterns and actions that specify the lexical analyzer. Here is the rules section for the word-count example:
%%
{word}  { wordCount++; charCount += yyleng; }
{eol}   { charCount++; lineCount++; }
.       { charCount++; }
The rules section begins with "%%". Within a pattern, Lex substitutes the definition for the name enclosed in curly braces {}. After the lexical analyzer recognizes a complete word, our example increments both the word count and the character count.
An action consisting of several statements must be enclosed in curly braces, which makes it a C compound statement.
It is worth repeating that Lex always tries to match the longest possible string. Therefore, the lexical analyzer will treat the string "well-being" as a single word.
The example also uses Lex's internal variable yyleng, which holds the length of the string the lexical analyzer has just matched. If "well-being" is matched, yyleng is 10.
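As a quick illustration of both points (a one-rule sketch, not part of the word-count program), a rule can report the text and the length of each match:

[a-zA-Z-]+   { printf("matched \"%s\" (length %d)\n", yytext, yyleng); }

Given the input well-being, this rule matches the entire string and reports a length of 10, rather than stopping at well.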
The third and final part of the Lex specification is the user subroutine section. It is separated from the preceding section by "%%" and may contain any valid C code, which is copied verbatim into the generated lexical analyzer.
%%
int main()
{
    yylex();
    printf("%d %d %d\n", lineCount, wordCount, charCount);
    return 0;
}

int yywrap()
{
    return 1;
}
main() first calls the lexical analyzer's entry point, yylex(), and then calls printf() to print the results of the run.
Let's take a look at the overall program for the example:
%{
unsigned int charCount = 0, wordCount = 0, lineCount = 0;
%}
word [^ \t\n]+
eol  \n
%%
{word}  { wordCount++; charCount += yyleng; }
{eol}   { charCount++; lineCount++; }
.       { charCount++; }
%%
int main()
{
    yylex();
    printf("%d %d %d\n", lineCount, wordCount, charCount);
    return 0;
}

int yywrap()
{
    return 1;
}
Note that our example does nothing fancy: it neither accepts command-line arguments nor opens any files, because Lex reads from standard input by default. Of course, we can redirect Lex's input stream.
Let's take a look:
%%
#include <stdlib.h>   /* for exit() */

int main(int argc, char **argv)
{
    if (argc > 1) {
        FILE *file;

        file = fopen(argv[1], "r");
        if (!file) {
            fprintf(stderr, "Could not open %s\n", argv[1]);
            exit(1);
        }
        yyin = file;
    }
    yylex();
    printf("%d %d %d\n", lineCount, wordCount, charCount);
    return 0;
}

int yywrap()
{
    /* 0 - input is not finished, 1 - input is finished */
    return 1;
}
The Lex-generated lexical analyzer reads its input from the standard I/O file pointer yyin, so you only need to reassign yyin when you want a different input source. The default value of yyin is stdin.
Compile and run the program, and let's look at the result:
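For example, assuming the specification is saved as wc1.l (a file name chosen here for illustration) and run on a hypothetical two-line file sample.txt containing "hello world" and "foo bar baz", a session might look like this:

$ lex wc1.l
$ gcc lex.yy.c -o wc1
$ ./wc1 sample.txt
2 5 24

That is 2 lines, 5 words, and 24 characters (newlines included), printed in the order lineCount, wordCount, charCount.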
When yylex() reaches the end of its input file, it calls yywrap(), which returns 0 or 1. If the value is 1, the program is done and there is no more input. Conversely, if the value is 0, the lexical analyzer assumes that yywrap() has opened another file for it to read and continues reading from yyin. The default yywrap() always returns 1; to process more than one input source, you supply your own yywrap() function.
Let's implement a lex program that handles multiple files:
%{
/* multi-file word-counting program */
#include <stdlib.h>   /* for exit() */

unsigned long charCount = 0, wordCount = 0, lineCount = 0;

#undef yywrap          /* sometimes a macro by default */
%}
word [^ \t\n]+
eol  \n
%%
{word}  { wordCount++; charCount += yyleng; }
{eol}   { charCount++; lineCount++; }
.       charCount++;
%%
char     **fileList;
unsigned currentFile = 0;
unsigned nFiles;
unsigned long totalCC = 0;
unsigned long totalWC = 0;
unsigned long totalLC = 0;

int main(int argc, char *argv[])
{
    FILE *file;

    fileList = argv + 1;
    nFiles = argc - 1;
    if (argc == 2) {
        /*
         * Handling a single file differs from handling several files,
         * because no summary line needs to be printed.
         */
        currentFile = 1;
        file = fopen(argv[1], "r");
        if (!file) {
            fprintf(stderr, "Could not open %s\n", argv[1]);
            exit(1);
        }
        yyin = file;
    }
    if (argc > 2)
        yywrap();   /* open the first file */

    yylex();

    /*
     * Handling zero or one file differs from handling several files.
     */
    if (argc > 2) {
        printf("%8lu %8lu %8lu %s\n", lineCount, wordCount, charCount,
               fileList[currentFile - 1]);
        totalCC += charCount;
        totalWC += wordCount;
        totalLC += lineCount;
        printf("%8lu %8lu %8lu total\n", totalLC, totalWC, totalCC);
    } else {
        printf("%8lu %8lu %8lu\n", lineCount, wordCount, charCount);
    }
    return 0;
}

/*
 * The lexical analyzer calls yywrap to handle EOF
 * (here, for example, we connect it to a new file).
 */
int yywrap()
{
    FILE *file = NULL;

    if ((currentFile != 0) && (nFiles > 1) && (currentFile < nFiles)) {
        /*
         * Print the statistics for the previous file.
         */
        printf("%8lu %8lu %8lu %s\n", lineCount, wordCount, charCount,
               fileList[currentFile - 1]);
        totalCC += charCount;
        totalWC += wordCount;
        totalLC += lineCount;
        charCount = wordCount = lineCount = 0;
        fclose(yyin);   /* finished with this file */
    }

    while (fileList[currentFile] != (char *)0) {
        file = fopen(fileList[currentFile++], "r");
        if (file != NULL) {
            yyin = file;
            break;
        }
        fprintf(stderr, "Could not open %s\n", fileList[currentFile - 1]);
    }
    return (file ? 0 : 1);   /* 0 means there is more input */
}
The example uses yywrap() to process files one after another. Each time the lexical analyzer calls yywrap(), the function tries to open the next file named on the command line and assigns it to yyin; it returns 0 if there is another file and 1 if there is not.
Running it on several files prints a line of counts for each file, followed by a total line.
(iv) Afterword
We will continue studying Lex in later articles, with more example programs to deepen our understanding and use of Lex.