Implement Regular Expression Processing

Source: Internet
Author: User
I had planned to write one article a week, but I got a little lazy, and since MSN Spaces cannot insert images into a post, it handed me an excellent excuse for being lazy. It is like sleeping in on a rainy day: there is always an excuse such as "the sun hasn't gotten up yet either". Another excuse is that I want to write original articles, or at least articles about things I understand myself. I can't just Google a topic and perfunctorily quote a few articles. Worse still, the article has to be reasonably long, and whatever the content, I have to be satisfied with it once it is written. It reminds me of a person I once read about in Qian Zhongshu's writing, who wrote an article, came to a passage he thought was wonderful, and then bowed to himself in the mirror: "Mr. XX, I really admire you for writing such a good article."

Well, enough chatter. What we want to talk about today is implementing a basic regular expression analyzer that can process operators such as *, +, ?, and (). Yes, such a regular expression engine is fairly bare-bones, but at least I will understand the principle, so I don't mind......

Before introducing regular expressions, let's first talk about the concept of a finite automaton. Well, let's start with an example. Please look at the code:

#include <iostream>
#include <string>

using namespace std;

enum tokentype
{
    BOOM_ERROR = -1, // aha, error
    NUMBER = 1,
    IDENTIFIER = 2,
    IF = 4
};

// One row per state, one column per input character class.
// The transitions are assumed from the enum values and the row labels:
// digits lead to S1, letters to S2, a leading 'i' to S3, and "if" to S4.
int dfa_table[][37] = {
    // 0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z  !
    {  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,-1 }, // S0 -- start state
    {  1, 1, 1, 1, 1, 1, 1, 1, 1, 1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1 }, // S1 -- here it is a number
    {  2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,-1 }, // S2 -- variable
    {  2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,-1 }, // S3 -- "i" so far
    {  2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,-1 }  // S4 -- this is if
};

//
// match:
// Given a string 'str', determine what kind of token it is.
//
// Examples:
// "if" returns IF
// a number returns NUMBER
// a variable name returns IDENTIFIER
//
tokentype match(const string& str)
{
    int state = 0;

    for (string::const_iterator iter = str.begin();
         iter != str.end();
         ++iter) // Some readers may wonder why this is ++iter rather than the more familiar iter++.
                 // iter++ has to return the old value of iter and then increment it, which involves a copy.
                 // If the iterator is a large object this can hurt performance, whereas ++iter only increments.
    {
        char c = *iter;
        int index = 0;
        if (c >= '0' && c <= '9')
        {
            index = c - '0';
        }
        else if (c >= 'a' && c <= 'z')
        {
            index = c - 'a' + 10; // index of column a in dfa_table
        }
        else
        {
            index = 36; // index of column ! in dfa_table, which means "no match"
        }

        state = dfa_table[state][index];

        if (state == BOOM_ERROR)
            break;
    }

    // S3 has no entry in tokentype: it is the lone "i", an ordinary identifier.
    if (state == 3)
        state = IDENTIFIER;

    return static_cast<tokentype>(state);
}

int g_line = 0;
void print(tokentype type)
{
    switch (type)
    {
    case BOOM_ERROR:
        cout << ++g_line << ": boom_error" << endl;
        break;

    case IF:
        cout << ++g_line << ": if" << endl;
        break;

    case NUMBER:
        cout << ++g_line << ": number" << endl;
        break;

    case IDENTIFIER:
        cout << ++g_line << ": identifier" << endl;
        break;

    default:
        cout << ++g_line << ": error" << endl;
        break;
    }
}

int main()
{
    print(match("if"));
    print(match("iff"));
    print(match("if0"));
    print(match("0if"));
    print(match("i0f"));
    print(match("ia"));
    print(match("01"));
    print(match("123"));
    print(match("1f"));
    print(match("abcd"));
    print(match("ab"));
    print(match(""));
    print(match("0"));
    print(match("i"));
    print(match("_"));

    return 0;
}

Example 1: A simple table-driven DFA matching program

In the example above, the matching (or classification) of strings is done by a finite automaton. In the code, the automaton is represented by the two-dimensional array dfa_table. Each row of dfa_table (dfa_table[i]) represents one state of the automaton, and each column represents a state transition that can be taken from that state. When matching, the program starts from dfa_table[0], that is, the start state. If the first character is i, the rule stored in dfa_table[0] at the column for i names the next state, and the row with that number (state 2, for instance, is the third row of dfa_table) decides which transition to take for the next character of str. This process repeats until the whole string has been processed or a transition leads to the error state. At that point the program checks whether the current state is an acceptable state, that is, whether it is one of the states defined in tokentype. If it is, the string has been matched successfully. Otherwise...... boom.
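
To watch the walk through the table step by step, here is a minimal sketch that records every state visited for a given input. It is not the listing from Example 1: the names (column_of, build_table, trace, kStates, kColumns, kError) are made up for illustration, and the transitions are an assumption (digits lead to state 1, letters to state 2, a leading i to state 3, and if to state 4). The table is filled with loops instead of being typed out by hand.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical names throughout. Assumed layout: 5 states by 37 columns,
// '0'-'9' -> columns 0..9, 'a'-'z' -> columns 10..35, anything else -> 36.
const int kStates = 5;
const int kColumns = 37;
const int kError = -1;

// Map a character to its column in the transition table.
int column_of(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'z') return c - 'a' + 10;
    return 36;
}

// Build the table; every entry starts out as the error state.
std::vector<std::vector<int> > build_table()
{
    std::vector<std::vector<int> > t(kStates, std::vector<int>(kColumns, kError));
    for (int col = 0; col < 10; ++col)
    {
        t[0][col] = 1;  // start: a digit begins a number
        t[1][col] = 1;  // number: more digits keep it a number
    }
    for (int col = 10; col < 36; ++col)
        t[0][col] = 2;  // start: a letter begins an identifier
    t[0][column_of('i')] = 3;  // except 'i', which might begin "if"
    for (int s = 2; s <= 4; ++s)
        for (int col = 0; col < 36; ++col)
            t[s][col] = 2;  // letters and digits continue an identifier
    t[3][column_of('f')] = 4;  // "i" followed by 'f' is the keyword
    return t;
}

// Return every state visited while reading str, starting with state 0.
std::vector<int> trace(const std::string& str)
{
    static const std::vector<std::vector<int> > table = build_table();
    std::vector<int> visited(1, 0);
    int state = 0;
    for (std::string::const_iterator it = str.begin(); it != str.end(); ++it)
    {
        assert(state >= 0 && state < kStates);
        state = table[state][column_of(*it)];
        visited.push_back(state);
        if (state == kError)
            break;  // no transition: stop reading
    }
    return visited;
}
```

With these assumed transitions, trace("if0") visits states 0, 3, 4, 2 and so ends as an identifier, while trace("0if") visits 0, 1, -1 and stops at the error state.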

In the for loop of the match function, I used an if chain to pick the correct column index for the current character. In fact, if you don't mind the extra work, the for loop in match can be simplified as follows:

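for (string::const_iterator iter = str.begin();
     iter != str.end();
     ++iter)
{
    // The character code itself is the column index.
    state = dfa_table[state][static_cast<unsigned char>(*iter)];

    if (state == BOOM_ERROR)
        break;
}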


The premise is that you are willing to expand dfa_table into a two-dimensional table of 5 rows by 128 columns, one column for every ASCII character code.
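
Here is a rough sketch of what such a wider table could look like. It assumes one column per 7-bit ASCII code (128 columns), lowercase letters only, and these transitions: digits lead to state 1, letters to state 2, a leading i to state 3, and if to state 4. The names build_wide_table, match_direct, kStates, kAscii and kError are hypothetical. Filling the table by character range keeps the cost of the extra columns down to a few loops.

```cpp
#include <cassert>
#include <string>
#include <vector>

const int kStates = 5;
const int kAscii = 128;  // one column per 7-bit ASCII code (assumed size)
const int kError = -1;

// Build the wide table: every column defaults to the error state, then
// the digit and lowercase-letter columns are filled in by range.
std::vector<std::vector<int> > build_wide_table()
{
    std::vector<std::vector<int> > t(kStates, std::vector<int>(kAscii, kError));
    for (int c = '0'; c <= '9'; ++c)
    {
        t[0][c] = 1;  // start: a digit begins a number
        t[1][c] = 1;  // number: more digits keep it a number
    }
    for (int c = 'a'; c <= 'z'; ++c)
        t[0][c] = 2;  // start: a letter begins an identifier
    t[0]['i'] = 3;    // except 'i', which might begin "if"
    for (int s = 2; s <= 4; ++s)
    {
        for (int c = '0'; c <= '9'; ++c) t[s][c] = 2;
        for (int c = 'a'; c <= 'z'; ++c) t[s][c] = 2;
    }
    t[3]['f'] = 4;    // "i" followed by 'f' is the keyword
    return t;
}

// Return the final state, or kError as soon as a transition fails.
int match_direct(const std::string& str)
{
    static const std::vector<std::vector<int> > table = build_wide_table();
    int state = 0;
    for (std::string::const_iterator it = str.begin(); it != str.end(); ++it)
    {
        unsigned char c = static_cast<unsigned char>(*it);
        if (c >= kAscii)
            return kError;  // byte outside the table
        assert(state >= 0 && state < kStates);
        state = table[state][c];  // the character code is the column index
        if (state == kError)
            return kError;
    }
    return state;
}
```

The price is memory: 5 x 128 entries instead of 5 x 37, in exchange for a loop body with no if chain. Under these assumptions match_direct("if") ends in state 4 and match_direct("1f") returns -1.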

To be continued ......
