Lex Quick Start

Source: Internet
Author: User
Tags: lexer

Regular expression syntax in Lex:

. matches any character except \n (newline).

- specifies a range inside a character class. For example: a-z refers to all characters from a to z;
A-Z refers to all characters from A to Z;
0-9 refers to all characters from 0 to 9.

* matches zero or more occurrences of the preceding pattern. For example: ab* matches a, ab, abb, abbb ...

+ matches one or more occurrences of the preceding pattern. For example: ab+ matches ab, abb, abbb ...

? matches zero or one occurrence of the preceding pattern. For example: ab? matches a or ab.

\ escapes meta-characters. It also cancels the special meaning of any character in this table, so that the character is taken literally.
Common escape sequences:
\a bell (BEL)
\b backspace (BS), moves the current position back one column
\f form feed (FF), moves the current position to the beginning of the next page
\n newline (LF), moves the current position to the beginning of the next line
\r carriage return (CR), moves the current position to the beginning of the current line
\t horizontal tab (HT), jumps to the next tab stop
\\ a backslash character '\'
\' a single quotation mark (apostrophe)
\" a double quotation mark
\0 null character (NUL)

/ trailing context (lookahead). In a pattern r/s, r is matched only when it is followed by s, and only the part before "/" becomes the matched text. For example: for the input a01, the pattern a0/1 matches a0.

^ denotes negation when it is the first character inside a character class. For example: [^ab] matches any character except a and b;
[^0-9a-zA-Z] matches any character except digits and letters.
As the first character of a pattern, ^ instead anchors the match to the beginning of a line.

| is the logical OR between expressions. For example: ("a"|"b") matches the character a or the character b;
(r|s) matches everything matched by the regular expression r or by the regular expression s.

$ matches the end of a line when it is the last character of the pattern.

() groups a series of regular expressions; a parenthesized group binds more tightly than any operator outside the parentheses.

[] a character class. Matches any single character within the brackets. If the first character is ^, the class is negated. For example: [abc] matches any one of a, b, and c.

{} indicates the number of times a pattern may occur. For example: a{1,3} indicates that a occurs 1 to 3 times (that is: a, aa, or aaa);
a{2} indicates that a repeats exactly two times (that is: aa);
a{10,} indicates that a repeats 10 or more times.
If the braces contain a name, they are replaced with the definition of that name. For example: given the definition r [a-z] (that is, r stands for a lowercase letter), {r}+ matches all non-empty lowercase strings.
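
The repetition operators and character classes above can be tried out without running Lex. The following is a minimal sketch that uses the POSIX regex library rather than Lex itself; POSIX extended regular expressions share the operators ., *, +, ?, {m,n}, [] and |, but do not support Lex-only syntax such as quoted strings, {name} substitution, or the trailing context operator /. The helper name matches_whole is hypothetical and is not part of Lex.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Returns true when the whole of `text` matches the POSIX extended
   regular expression `pattern`. Hypothetical helper, not part of Lex. */
bool matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    bool ok;

    /* Wrap the pattern as ^(pattern)$ so that only whole-string matches count. */
    int n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return false;                 /* pattern too long for this sketch */

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                 /* pattern did not compile */

    ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

With this helper, matches_whole("ab*", "abbb") is true while matches_whole("ab+", "a") is false, which mirrors the difference between * and + described above.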


Some example expressions:
joke[rs] matches jokes or joker. Here r and s are literal letters, not expressions; if r and s were named expressions, the pattern would have to be written joke({r}|{s}), because names are not expanded inside brackets.

a{1,2}shis+ matches ashis, ashiss, ashisss ... aashis, aashiss, aashisss ...

digit [0-9] the definition digit represents a single digit
number {digit}+ the definition number represents all natural numbers

digit [0-9]
letter [a-zA-Z]
identifier {letter}({letter}|{digit})* represents all identifiers that begin with a letter and consist of letters and digits

whitespace [ \t\n\r]+ represents a run of blanks (spaces, tabs, newlines, carriage returns)

compareoperator [<=>]"="? represents a comparison operator (includes: <, =, >, <=, ==, >=)

[a-z]{1,8} represents any string of lowercase letters with a length from 1 to 8

("+"|"-")?[0-9]+ represents a signed integer

("+"|"-")?[0-9]+("."[0-9]+)? represents a signed number with an optional decimal part

("+"|"-")?[0-9]+("."[0-9]+)?(E("+"|"-")?[0-9]+)? represents a signed number with an optional decimal part and an optional exponent
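
To make the last pattern concrete, here is a hand-written C sketch of a recognizer for it. It assumes the pattern is read as ("+"|"-")?[0-9]+("."[0-9]+)?(E("+"|"-")?[0-9]+)?, that is, with at least one digit after the decimal point. The function names scan_number and skip_digits are hypothetical; a scanner generated by Lex works from tables rather than from code like this.

```c
#include <ctype.h>
#include <stddef.h>

/* Skips a run of decimal digits starting at s[i]; returns the index just past it. */
static size_t skip_digits(const char *s, size_t i)
{
    while (isdigit((unsigned char)s[i]))
        i++;
    return i;
}

/* Returns the length of the longest prefix of s that matches
   ("+"|"-")?[0-9]+("."[0-9]+)?(E("+"|"-")?[0-9]+)?
   or 0 when no prefix matches. Hypothetical helper, not a Lex function. */
size_t scan_number(const char *s)
{
    size_t i = 0, end, j;

    if (s[i] == '+' || s[i] == '-')
        i++;                          /* optional sign */

    end = skip_digits(s, i);
    if (end == i)
        return 0;                     /* at least one digit is required */

    /* optional fraction: "." followed by at least one digit */
    if (s[end] == '.') {
        j = skip_digits(s, end + 1);
        if (j > end + 1)
            end = j;
    }

    /* optional exponent: E, optional sign, at least one digit */
    if (s[end] == 'E') {
        size_t k = end + 1;
        if (s[k] == '+' || s[k] == '-')
            k++;
        j = skip_digits(s, k);
        if (j > k)
            end = j;
    }
    return end;
}
```

The sketch returns the longest matching prefix, which is also how Lex chooses among possible matches: for the input 12. it accepts only 12, because the fraction part needs a digit after the point.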


Lex source program structure: (divided into three sections, separated from one another by a line containing only %%)


(1) Definitions section
The definitions section declares what the recognition rules will use, including: variable declarations, named constants, regular definitions (names for regular expressions), and C declarations.

The C declarations must be enclosed between "%{" and "%}". They mainly consist of the header files that the generated lexical analyzer needs and the declarations of global variables. Everything between %{ and %} is copied verbatim to the beginning of the lexical analyzer that Lex generates.

Here is an example of a definitions section:
%{
#include <stdio.h>
int wordCount = 0;
int numCount = 0;
%}
chars [A-Za-z]
numbers ([0-9])+
delim [ \n\t]
whitespace {delim}+
words {chars}+
%%

(2) Rules section
Each recognition rule pairs a regular expression that defines a token with the program fragment to execute once that expression has been recognized, in the following form:
p1 {action 1}
p2 {action 2}
.
.
.
pn {action n}
Each pi (i = 1, 2, ..., n) here is a regular expression,
and {action i} is the C code that the lexical analyzer executes when it recognizes a string matching pi (typically returning the token type and the token value).

For example:
{words} { wordCount++; }
{whitespace} { /* do nothing */ }
{numbers} { numCount++; }
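
To show what these three rules accomplish, the following is a hand-written C sketch that produces the same counts. It is only an illustration: count_tokens and struct counts are hypothetical names, and the code is a stand-in for the yylex() function that Lex would generate, not a copy of it.

```c
#include <ctype.h>

/* Counters mirroring the ones incremented in the rule actions. */
struct counts {
    int words;
    int numbers;
};

/* Scans `input` the way the three rules would: at each position it takes
   the longest run of letters, digits, or blanks and runs the matching
   action. Hypothetical stand-in for the generated yylex(). */
struct counts count_tokens(const char *input)
{
    struct counts c = {0, 0};
    const char *p = input;

    while (*p != '\0') {
        if (isalpha((unsigned char)*p)) {                     /* {words} */
            while (isalpha((unsigned char)*p))
                p++;
            c.words++;
        } else if (isdigit((unsigned char)*p)) {              /* {numbers} */
            while (isdigit((unsigned char)*p))
                p++;
            c.numbers++;
        } else if (*p == ' ' || *p == '\n' || *p == '\t') {   /* {whitespace} */
            while (*p == ' ' || *p == '\n' || *p == '\t')
                p++;
        } else {
            /* No rule matches. Lex's default action would echo the
               character to the output; this sketch simply skips it. */
            p++;
        }
    }
    return c;
}
```

For the input "lex 42 yacc" the sketch counts two words and one number, the totals the rule actions above would accumulate.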

(3) Auxiliary procedures
The auxiliary procedures section contains the helper functions that the rule actions call. Any such function that is not a C library function must be defined here. These functions can also be placed in a separate source file, compiled separately, and then linked together with the lexical analyzer.

For example:
int main(void)
{
    printf("Start analysis\n");
    yylex(); /* start the analysis */
    printf("Analysis results\n");
    printf("No. of words: %d\nNo. of numbers: %d\n", wordCount, numCount);
    return 0;
}

int yywrap(void)
{
    return 1; /* no more input */
}


Lex variables and functions:
Lex provides several variables and functions that expose information about the scan and make it possible to write more complex programs. Some of them are listed below, along with their use.


Lex variables:
yyin Of type FILE*. It points to the file that the lexer is currently scanning.

yyout Of type FILE*. It points to the location where the lexer output is written. By default, yyin and yyout point to standard input and standard output, respectively.

yytext The text that matched the pattern is stored in this variable (char*).

yyleng Gives the length of the matched text.

yylineno Provides the current line number. (Not every lexer implementation supports it.)

Lex functions:
yylex() This function starts the analysis. It is generated automatically by Lex.

yywrap() This function is called at the end of the file (or input). If it returns 1, scanning stops; if it returns 0, scanning continues. It can therefore be used to scan multiple files. The code is written in the third section: each time yywrap() is called, it points the yyin file pointer (see the list above) at the next file and returns 0, until all the files have been scanned. Finally, yywrap() returns 1 to indicate the end of scanning.

yyless(int n) This function returns all but the first n characters of the current token to the input, where they will be scanned again.

yymore() This function tells the lexer to append the text of the next token to the current token instead of replacing it.
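
The multi-file use of yywrap() described above can be sketched as follows. This is a minimal illustration under stated assumptions: in a real scanner yyin is defined by the code that Lex generates, so it is defined here only to keep the sketch self-contained, and set_input_files together with the pending-file variables are hypothetical names, not part of Lex.

```c
#include <stdio.h>

/* In a real scanner, yyin is defined by the code Lex generates.
   It is defined here only so that the sketch stands alone. */
FILE *yyin = NULL;

/* Hypothetical list of files that are still to be scanned. */
static const char **pending_files = NULL;
static int pending_count = 0;
static int next_file = 0;

/* Hypothetical helper: registers the files to scan, in order. */
void set_input_files(const char **files, int count)
{
    pending_files = files;
    pending_count = count;
    next_file = 0;
}

/* Called at end of input: returns 0 after pointing yyin at the next
   file that can be opened, or 1 when no files remain. */
int yywrap(void)
{
    while (next_file < pending_count) {
        FILE *next = fopen(pending_files[next_file++], "r");
        if (next == NULL)
            continue;                 /* skip files that cannot be opened */
        if (yyin != NULL && yyin != stdin)
            fclose(yyin);             /* finished with the previous file */
        yyin = next;
        return 0;                     /* tell the lexer to keep scanning */
    }
    return 1;                         /* tell the lexer to stop */
}
```

In this arrangement, main would call set_input_files and then yywrap() once to open the first file before calling yylex(). Skipping unreadable files silently is a design choice of the sketch; a real program would probably report them.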
