Author: Liu Yanqing
For years, programming languages and tools have supported regular expressions. The .NET base class library contains a namespace and a set of classes that make full use of the power of regular expressions, and they are compatible with Perl 5 regular expressions.
In addition, the regexp classes provide some extra functionality, such as right-to-left matching and expression compilation.
This article describes the classes and methods in System.Text.RegularExpressions, gives examples of string matching and replacement, and covers the group structure in detail. Finally, it introduces some common expressions that you may find useful.
Basic knowledge to be mastered
Regular expressions are one of those things that many programmers learn and then forget. In this article, we assume that you already know how to use regular expressions, especially Perl 5 expressions. The .NET regexp classes are a superset of Perl 5 expressions, so in theory Perl 5 is a good starting point. We also assume that you have a basic knowledge of C# syntax and the .NET architecture.
If you have no background in regular expressions, I suggest you start with the Perl 5 syntax. The authoritative book on regular expressions is Mastering Regular Expressions by Jeffrey Friedl, which we strongly recommend to readers who want a deep understanding of expressions.
The RegularExpressions assembly
The regexp classes are contained in the System.Text.RegularExpressions.dll file. You must reference this file when compiling your application. For example:
csc /r:System.Text.RegularExpressions.dll foo.cs
This command creates the foo.exe file, which references the System.Text.RegularExpressions assembly.
Namespace Introduction
The namespace contains only six classes and one delegate. They are:
Capture: contains the result of a single match;
CaptureCollection: a sequence of Capture objects;
Group: the result of a group capture, derived from Capture;
Match: the result of matching an expression, derived from Group;
MatchCollection: a sequence of Match objects;
MatchEvaluator: the delegate used to perform replacement operations;
Regex: an instance of a compiled expression.
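The relationships in the list above can be checked with a short sketch. The class name ClassRelations and the method MatchIsGroupIsCapture are hypothetical names chosen for illustration, and the example assumes a current .NET runtime rather than the version the article was written against.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ClassRelations
{
    // Hypothetical helper: true when Match derives from Group
    // and Group derives from Capture.
    public static bool MatchIsGroupIsCapture()
    {
        return typeof(Match).BaseType == typeof(Group)
            && typeof(Group).BaseType == typeof(Capture);
    }

    // Hypothetical helper: uses a MatchEvaluator delegate to compute
    // the replacement text from each Match.
    public static string UppercaseMatches(string input, string pattern)
    {
        MatchEvaluator upper = m => m.Value.ToUpperInvariant();
        return Regex.Replace(input, pattern, upper);
    }

    public static void Main()
    {
        Console.WriteLine(MatchIsGroupIsCapture());
        Console.WriteLine(UppercaseMatches("abracadabra", "cad"));
    }
}
```

Because a Match is also a Group and a Capture, properties such as Index, Length and Value are available at every level of the hierarchy.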
The Regex class also contains some static methods:
Escape: escapes the regular expression metacharacters in a string;
IsMatch: returns a Boolean value indicating whether the expression matches a string;
Match: returns a Match instance;
Matches: returns a collection of Match objects;
Replace: replaces the text matched by the expression with the replacement string;
Split: returns an array of strings determined by the expression;
Unescape: unescapes the escaped characters in a string.
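As a rough illustration of the static methods listed above, the following sketch calls several of them on small inputs. The class name StaticMethodsDemo, the helper SplitOnDigits and the sample strings are assumptions made for this example, and it targets a current .NET runtime.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StaticMethodsDemo
{
    // Hypothetical helper: splits the input wherever a run of digits occurs.
    public static string[] SplitOnDigits(string input)
    {
        return Regex.Split(input, "[0-9]+");
    }

    public static void Main()
    {
        // Escape makes metacharacters such as + literal; Unescape reverses it.
        string escaped = Regex.Escape("1+1=2");
        Console.WriteLine(escaped);
        Console.WriteLine(Regex.Unescape(escaped));

        // IsMatch only reports whether a match exists.
        Console.WriteLine(Regex.IsMatch("abracadabra", "cad"));

        // Matches returns every non-overlapping match.
        Console.WriteLine(Regex.Matches("abracadabra", "abra").Count);

        // Split uses the expression as the delimiter.
        Console.WriteLine(string.Join(",", SplitOnDigits("a1b22c333d")));
    }
}
```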
Simple Matching
First, we start with a simple expression using the Regex and Match classes.
Match m = Regex.Match("abracadabra", "(a|b|r)+");
We now have an instance of the Match class that can be tested, for example: if (m.Success) ...
If you want to use the matched string, you can convert it to a string:
Console.WriteLine("match=" + m.ToString());
This example prints match=abra, which is the matched string.
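The statements above can be combined into a small complete program. This is only a sketch: the class name SimpleMatchDemo and the helper TryFirstMatch are hypothetical, not part of the library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimpleMatchDemo
{
    // Hypothetical helper: returns true and the matched text when the
    // pattern matches anywhere in the input, false otherwise.
    public static bool TryFirstMatch(string input, string pattern, out string value)
    {
        Match m = Regex.Match(input, pattern);
        value = m.Success ? m.Value : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        string value;
        if (TryFirstMatch("abracadabra", "(a|b|r)+", out value))
        {
            // The match stops at the first "c", which the pattern does not allow.
            Console.WriteLine("match=" + value);
        }
    }
}
```

The expression stops at the letter c because c is not one of the alternatives, which is why only the first four characters are returned.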
String replacement
Simple string replacement is very intuitive. For example, take the following statement:
string s = Regex.Replace("abracadabra", "abra", "zzzz");
It returns the string zzzzcadzzzz, in which every matched string has been replaced with zzzz.
Now let's look at a more complicated string replacement example:
string s = Regex.Replace("  abra  ", @"^\s*(.*?)\s*$", "$1");
This statement returns the string abra, with the leading and trailing spaces removed.
The preceding pattern is useful for removing leading and trailing spaces from any string. In C#, we also often use verbatim string literals. In a verbatim string, the compiler does not treat the character "\" as an escape character, so @"..." is very useful whenever "\" is used to write escapes in a pattern. It is also worth mentioning the use of $1 in the replacement string: it is a substitution that refers to the text captured by the first group in the pattern.
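The trimming pattern and the verbatim string form can be tried out with a sketch like the following. The class name TrimDemo and the method Trim are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrimDemo
{
    // Hypothetical helper: removes leading and trailing whitespace by
    // replacing the whole string with the text captured by group 1.
    public static string Trim(string input)
    {
        return Regex.Replace(input, @"^\s*(.*?)\s*$", "$1");
    }

    public static void Main()
    {
        Console.WriteLine("[" + Trim("   abra  ") + "]");

        // The verbatim form and the regular form denote the same pattern;
        // the regular form needs every backslash doubled.
        Console.WriteLine(@"^\s*(.*?)\s*$" == "^\\s*(.*?)\\s*$");
    }
}
```

The lazy quantifier in (.*?) matters here: it lets the trailing \s* take the whitespace at the end instead of leaving it inside the captured group.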
Matching engine details
Now we use a slightly more complex example to understand the group structure. See the following example:
string text = "abracadabra1abracadabra2abracadabra3";
string pat = @"
(       # start of the first group
  abra  # match the string abra
  (     # start of the second group
    cad # match the string cad
  )?    # end of the second group (optional)
)       # end of the first group
+       # match one or more times
";
// RegexOptions.IgnorePatternWhitespace (the x modifier) ignores whitespace and comments in the pattern
Regex r = new Regex(pat, RegexOptions.IgnorePatternWhitespace);
// Get the list of group numbers
int[] gnums = r.GetGroupNumbers();
// First match
Match m = r.Match(text);
while (m.Success)
{
    // Start with group 1 (group 0 is the whole match)
    for (int i = 1; i < gnums.Length; i++)
    {
        Group g = m.Groups[gnums[i]];
        // Print the text matched by the group
        Console.WriteLine("Group" + gnums[i] + "=[" + g.ToString() + "]");
        // Print the start position and length of each capture in the group
        CaptureCollection cc = g.Captures;
        for (int j = 0; j < cc.Count; j++)
        {
            Capture c = cc[j];
            Console.WriteLine("    Capture" + j + "=[" + c.ToString()
                + "] Index=" + c.Index + " Length=" + c.Length);
        }
    }
    // Next match
    m = m.NextMatch();
}
The output of this example is as follows:
Group1=[abra]
    Capture0=[abracad] Index=0 Length=7
    Capture1=[abra] Index=7 Length=4
Group2=[cad]
    Capture0=[cad] Index=4 Length=3
Group1=[abra]
    Capture0=[abracad] Index=12 Length=7
    Capture1=[abra] Index=19 Length=4
Group2=[cad]
    Capture0=[cad] Index=16 Length=3
Group1=[abra]
    Capture0=[abracad] Index=24 Length=7
    Capture1=[abra] Index=31 Length=4
Group2=[cad]
    Capture0=[cad] Index=28 Length=3
We start by examining the string pat, which contains the expression. The first capture group starts at the first parenthesis, and then the expression matches an abra. The second capture group starts at the second parenthesis, but the first capture group has not ended yet. This means that the first group matches abracad, while the second group matches only cad. Therefore, if you use ? to make cad an optional match, the matching result is