Regular expressions provide a powerful, flexible, and efficient way to process text. Their extensive pattern-matching notation lets you quickly parse large amounts of text to find specific character patterns; extract, edit, replace, or delete substrings; or add the extracted strings to a collection to generate a report. Regular expressions are an indispensable tool for many applications that process strings (such as HTML processing, log file analysis, and HTTP header parsing).
.NET Framework regular expressions incorporate the most popular features of other regular expression implementations and are designed to be compatible with Perl 5 regular expressions. They also include some features that other implementations do not provide. The .NET Framework regular expression classes are part of the base class library and can be used with any language or tool that targets the common language runtime.
2. String Search
The regular expression language consists of two basic character types: literal (normal) text characters and metacharacters. It is the metacharacters that give regular expressions their processing power. Almost every text editor has some kind of search function: you open a dialog box, type the string to locate into a text box and, if you also want to replace it, type a replacement string. Notepad in Windows and the document editors in the Office suite both work this way. This is the simplest kind of search, and it is easily handled with the String.Replace() method of the String class. But what if you need to identify repeated words in a document? Writing a routine that picks out repeated words using only the String class is complicated. This is where the regular expression language is a good fit.
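To make the repeated-word problem concrete, here is a minimal sketch of how a regular expression might handle it. The class name RepeatedWords, the helper Find, and the sample sentence are hypothetical, and the pattern relies on a backreference (\1), a feature this article does not otherwise cover; treat it as one possible approach rather than the only one.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RepeatedWords
{
    // Hypothetical helper: returns each word that is immediately followed by itself.
    public static List<string> Find(string text)
    {
        var found = new List<string>();
        // \b(\w+) captures a whole word, \s+ skips the whitespace after it,
        // and \1\b requires the same word to appear again as a whole word.
        foreach (Match m in Regex.Matches(text, @"\b(\w+)\s+\1\b", RegexOptions.IgnoreCase))
        {
            found.Add(m.Groups[1].Value);
        }
        return found;
    }

    public static void Main()
    {
        foreach (string word in Find("This is is a test of the the parser."))
        {
            Console.WriteLine("Repeated word: " + word);
        }
    }
}
```

For the sample sentence, the sketch reports "is" and "the".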
The regular expression language is a language for writing search expressions. In it, you can combine the text to search for with escape sequences and other characters that have special meanings. For example, the sequence \b marks the beginning or end of a word (a word boundary). To find words that begin with the characters th, you can write the regular expression \bth (that is, the sequence word boundary, t, h). To find words that end with th, you can write th\b (the sequence t, h, word boundary). Regular expressions can be much more sophisticated than this, however. For example, they provide a facility for storing portions of the text found in a search operation.
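The two word-boundary patterns can be tried out with a short sketch like the following. The class WordBoundaryDemo, the helper FindAll, and the sample text are assumptions made for illustration; \w* is added to each pattern so that the whole word is returned rather than just the two letters th.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // Hypothetical helper: returns the text of every match of pattern in text.
    public static List<string> FindAll(string text, string pattern)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            words.Add(m.Value);
        }
        return words;
    }

    public static void Main()
    {
        string text = "this month they bathe with both";

        // \bth followed by \w* : whole words that start with "th".
        Console.WriteLine(string.Join(", ", FindAll(text, @"\bth\w*")));

        // \w* followed by th\b : whole words that end with "th".
        Console.WriteLine(string.Join(", ", FindAll(text, @"\w*th\b")));
    }
}
```

With this sample text, the first pattern finds "this" and "they", and the second finds "month", "with", and "both"; "bathe" matches neither because its th is not next to a word boundary.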
3. Regular Expression Classes of the .NET Framework
This section introduces the regular expression classes of the .NET Framework and shows how each one is used.
3.1 The Regex class represents an immutable (read-only) regular expression
The Regex class also contains various static methods that let you use the other regular expression classes without explicitly instantiating them. The following code example creates an instance of the Regex class and defines a simple regular expression when the object is initialized. Note the additional backslash used as an escape character: it tells the C# compiler that the backslash in the \s character class is a literal backslash, so that the regular expression engine receives \s.
    Regex r;                   // declare a variable of the Regex class
    r = new Regex("\\s2000");  // create the object with the pattern \s2000
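As a complement to the instance-based code, the sketch below uses one of the static methods mentioned above, Regex.IsMatch, with the same pattern. The class StaticRegexDemo, the method ContainsYear, and the sample inputs are hypothetical. It also uses a C# verbatim string (@"..."), which is an alternative way to avoid doubling the backslash.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StaticRegexDemo
{
    // Hypothetical helper: true if the input contains whitespace followed by "2000".
    public static bool ContainsYear(string input)
    {
        // @"\s2000" (verbatim string) is the same pattern as "\\s2000".
        return Regex.IsMatch(input, @"\s2000");
    }

    public static void Main()
    {
        Console.WriteLine(ContainsYear("Windows 2000")); // whitespace before 2000
        Console.WriteLine(ContainsYear("Windows2000"));  // no whitespace before 2000
        Console.WriteLine("\\s2000" == @"\s2000");       // both literals are the same string
    }
}
```

The static form is convenient for a one-off test; creating a Regex instance, as in the original example, is the natural choice when the same pattern is applied repeatedly.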
3.2 The Match class represents the result of a regular expression matching operation
The following example uses the Match method of the Regex class to return a Match object for the first match in the input string. It then uses the Match.Success property of the Match class to check whether a match was found.
    Regex r = new Regex("abc");      // define a Regex object instance
    Match m = r.Match("123abc456");  // find the first match in the input string
    if (m.Success)
    {
        // print the position of the match
        Console.WriteLine("Found match at position " + m.Index);
    }
3.3 The MatchCollection class represents a sequence of non-overlapping matches
This collection is read-only and has no public constructor. Instances of MatchCollection are returned by the Regex.Matches method. The following example uses the Matches method of the Regex class to fill a MatchCollection with all the matches found in the input string. It then copies the collection into a string array (holding each match) and an integer array (holding the position of each match).
    MatchCollection mc;
    string[] results = new string[20];
    int[] matchPosition = new int[20];
    Regex r = new Regex("abc");          // define a Regex object instance
    mc = r.Matches("123abc4abcd");       // find all matches in the input string
    for (int i = 0; i < mc.Count; i++)   // loop through the matches
    {
        results[i] = mc[i].Value;        // add the matched string to the string array
        matchPosition[i] = mc[i].Index;  // record the position of the match
    }
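The fixed-size arrays in the example hold at most 20 matches. As a possible variation, the sketch below collects the match positions in a List<int> that grows as needed. The class MatchListDemo and the method Positions are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchListDemo
{
    // Hypothetical helper: returns the start index of every match of pattern in input.
    public static List<int> Positions(string input, string pattern)
    {
        var positions = new List<int>();
        // A MatchCollection can be enumerated directly with foreach.
        foreach (Match m in Regex.Matches(input, pattern))
        {
            positions.Add(m.Index);
        }
        return positions;
    }

    public static void Main()
    {
        foreach (int position in Positions("123abc4abcd", "abc"))
        {
            Console.WriteLine("Match at position " + position);
        }
    }
}
```

For the input used above, the matches start at positions 3 and 7.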
3.4 The GroupCollection class represents a collection of captured groups
This collection is read-only and has no public constructor. Instances of GroupCollection are returned by the Match.Groups property. The following console application finds and prints the number of groups captured by a regular expression.
    using System;
    using System.Text.RegularExpressions;

    public class RegexTest
    {
        public static void RunTest()
        {
            Regex r = new Regex("(a(b))c");  // define two nested groups
            Match m = r.Match("abdabc");
            Console.WriteLine("Number of groups found = " + m.Groups.Count);
        }

        public static void Main()
        {
            RunTest();
        }
    }
This example generates the following output:
    Number of groups found = 3
3.5 The CaptureCollection class represents a sequence of captured substrings
Because of quantifiers, a capturing group can capture more than one string in a single match. The Captures property, an object of the CaptureCollection class, is provided as a member of the Match and Group classes to make the set of captured substrings easy to access. For example, if you use the regular expression ((a(b))c)+ (where the + quantifier specifies one or more matches) to capture matches from the string "abcabcabc", the CaptureCollection of each parenthesized group contains three members.
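The claim about three captures per group can be checked with a short sketch such as the one below. The class CaptureCountDemo and the method CaptureCounts are hypothetical names; the pattern and input string are the ones quoted above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureCountDemo
{
    // Hypothetical helper: number of captures recorded for each group of the first match.
    public static int[] CaptureCounts(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        int[] counts = new int[m.Groups.Count];
        for (int i = 0; i < m.Groups.Count; i++)
        {
            counts[i] = m.Groups[i].Captures.Count;
        }
        return counts;
    }

    public static void Main()
    {
        int[] counts = CaptureCounts("abcabcabc", "((a(b))c)+");
        for (int i = 0; i < counts.Length; i++)
        {
            Console.WriteLine("Group " + i + ": " + counts[i] + " capture(s)");
        }
    }
}
```

Group 0 is the match as a whole and has a single capture; groups 1 to 3, the three parenthesized groups, each record three captures.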
The following program uses the regular expression (Abc)+ to find one or more matches in the string "XYZAbcAbcAbcXYZAbcAb". It shows how to use the Captures property to return multiple groups of captured substrings.
    using System;
    using System.Text.RegularExpressions;

    public class RegexTest
    {
        public static void RunTest()
        {
            int counter;
            Match m;
            CaptureCollection cc;
            GroupCollection gc;

            Regex r = new Regex("(Abc)+");        // look for one or more "Abc"
            m = r.Match("XYZAbcAbcAbcXYZAbcAb");  // the string to search
            gc = m.Groups;

            // Print the number of groups.
            Console.WriteLine("Captured groups = " + gc.Count.ToString());

            // Loop through each group.
            for (int i = 0; i < gc.Count; i++)
            {
                cc = gc[i].Captures;
                counter = cc.Count;
                Console.WriteLine("Captures count = " + counter.ToString());
                for (int ii = 0; ii < counter; ii++)
                {
                    // Print the capture and its position.
                    Console.WriteLine(cc[ii] + " starts at character " + cc[ii].Index);
                }
            }
        }

        public static void Main()
        {
            RunTest();
        }
    }
This example produces the following output:
    Captured groups = 2
    Captures count = 1
    AbcAbcAbc starts at character 3
    Captures count = 3
    Abc starts at character 3
    Abc starts at character 6
    Abc starts at character 9
3.6 The Capture class contains the results of a single subexpression capture
The following example loops through the group collection, extracts the capture collection from each group, and assigns to the variables posn and length, respectively, the character position in the original string at which each captured string was found and the length of that string.
    Regex r;
    Match m;
    CaptureCollection cc;
    int posn, length;

    r = new Regex("(abc)*");
    m = r.Match("bcabcabc");
    // Loop until a group with an empty value is reached.
    for (int i = 0; m.Groups[i].Value != ""; i++)
    {
        cc = m.Groups[i].Captures;       // capture collection for group i
        for (int j = 0; j < cc.Count; j++)
        {
            posn = cc[j].Index;          // position of the Capture object
            length = cc[j].Length;       // length of the Capture object
        }
    }
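The following sketch is a variation on the loop above that returns the positions and lengths so they can be inspected. It makes two changes, both assumptions of this sketch rather than part of the original example: it iterates up to Groups.Count instead of stopping at the first empty value, and it uses (abc)+ instead of (abc)*, because with * the first match in "bcabcabc" is an empty match at position 0. The names CapturePositions and Describe are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CapturePositions
{
    // Hypothetical helper: one line per capture, giving its group, index, and length.
    public static List<string> Describe(string input, string pattern)
    {
        var lines = new List<string>();
        Match m = Regex.Match(input, pattern);
        if (!m.Success)
        {
            return lines; // nothing matched, so there is nothing to report
        }
        for (int i = 0; i < m.Groups.Count; i++)
        {
            foreach (Capture c in m.Groups[i].Captures)
            {
                lines.Add("group " + i + ": index " + c.Index + ", length " + c.Length);
            }
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in Describe("bcabcabc", "(abc)+"))
        {
            Console.WriteLine(line);
        }
    }
}
```

For this input, group 0 (the whole match) starts at index 2 with length 6, and group 1 records two captures of length 3, at indexes 2 and 5.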
Getting a Group object back every time you merely want to group some characters together may not be what you want. Instantiating the object carries some overhead, which is wasted if all you need is to group characters as part of the search pattern. You can turn capturing off for an individual group by starting the group with the character sequence ?:, as in the URI example, or for all unnamed groups by specifying the RegexOptions.ExplicitCapture flag on the Regex.Matches() method.
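The effect of both techniques can be seen by counting the groups in a match, as in this minimal sketch. The class NonCapturingDemo, the method GroupCount, and the sample pattern are hypothetical; Regex.Match is used here instead of Regex.Matches only to keep the example short, and the options are passed the same way.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonCapturingDemo
{
    // Hypothetical helper: number of groups (including group 0) in the first match.
    public static int GroupCount(string input, string pattern, RegexOptions options)
    {
        return Regex.Match(input, pattern, options).Groups.Count;
    }

    public static void Main()
    {
        // Ordinary parentheses create a capturing group: group 0 plus group 1.
        Console.WriteLine(GroupCount("abcabc", "(abc)+", RegexOptions.None));

        // (?: ... ) groups without capturing: only group 0 remains.
        Console.WriteLine(GroupCount("abcabc", "(?:abc)+", RegexOptions.None));

        // ExplicitCapture turns unnamed parentheses into non-capturing groups.
        Console.WriteLine(GroupCount("abcabc", "(abc)+", RegexOptions.ExplicitCapture));
    }
}
```

The three calls report 2, 1, and 1 groups respectively. Named groups, written (?<name>...), are still captured when ExplicitCapture is set.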