A Full Explanation of Regular Expressions in C# .NET

Source: Internet
Author: User

For many years, many programming languages and tools have supported regular expressions. The .NET base class library contains a namespace and a set of classes that bring out the full power of regular expressions, and they are compatible with Perl 5 regular expressions.
   
In addition, the regular expression classes provide other features, such as right-to-left matching and expression compilation.
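As a rough illustration of these two features, the sketch below uses RegexOptions.RightToLeft and RegexOptions.Compiled. It assumes a .NET SDK recent enough for C# 9 top-level statements; the sample inputs and variable names are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// RightToLeft scans from the end of the input toward the start,
// so the first match reported is the last occurrence of "abra".
Match last = Regex.Match("abracadabra", "abra", RegexOptions.RightToLeft);
Console.WriteLine(last.Index); // 7

// Compiled generates code for the pattern up front: construction is slower,
// but matching is faster when the same Regex object is reused many times.
Regex digits = new Regex(@"\d+", RegexOptions.Compiled);
Console.WriteLine(digits.Match("abra123cad").Value); // 123
```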
  
In this article, I briefly describe the classes and methods in the System.Text.RegularExpressions namespace, give examples of string matching and substitution, explain the group structure in detail, and finally list some common expressions you might use.
  
Background you should already have
Regular expressions are one of those topics many programmers "often forget." In this article, we assume that you already know how to use regular expressions, especially Perl 5 expressions. The .NET regular expression syntax is a superset of Perl 5's, so that knowledge is a good starting point. We also assume basic familiarity with C# syntax and the .NET Framework.
  
If you have no background in regular expressions, I suggest starting with the Perl 5 syntax. The authoritative book on regular expressions is Mastering Regular Expressions by Jeffrey Friedl, and we strongly recommend it to readers who want a deep understanding of the subject.
  
The RegularExpressions Assembly
The regular expression classes are contained in the System.Text.RegularExpressions.dll file, and you must reference this file when compiling your application. For example, the command csc /r:System.Text.RegularExpressions.dll Foo.cs creates the file Foo.exe, which references the System.Text.RegularExpressions assembly. (In the released .NET Framework, these classes live in System.dll, which the C# compiler references by default.)
  
Namespace Overview
The namespace contains only six classes and one delegate:

Capture: contains the result of a single capture;

CaptureCollection: a sequence of Capture objects;

Group: the result of a single capturing group; inherits from Capture;

Match: the result of a single match of the whole expression; inherits from Group;

MatchCollection: a sequence of Match objects;

MatchEvaluator: a delegate used when performing replacement operations;

Regex: an instance of a compiled regular expression.
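To make the inheritance chain and the MatchEvaluator delegate concrete, here is a minimal sketch (C# 9 top-level statements assumed; the upper-casing evaluator is a hypothetical example, not part of the original article):

```csharp
using System;
using System.Text.RegularExpressions;

// Match derives from Group, and Group derives from Capture,
// so a Match can be used wherever a Group or a Capture is expected.
Match m = Regex.Match("abracadabra", "(a|b|r)+");
Group g = m;
Capture c = g;
Console.WriteLine(c.Value); // abra

// MatchEvaluator is a delegate: Replace calls it once per match
// and substitutes whatever string it returns.
MatchEvaluator toUpper = match => match.Value.ToUpperInvariant();
string result = Regex.Replace("abracadabra", "abra", toUpper);
Console.WriteLine(result); // ABRAcadABRA
```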
  
The Regex class also contains several static methods:

Escape: escapes regular expression metacharacters in a string so that they match literally;

IsMatch: returns a Boolean value indicating whether the expression finds a match in the string;

Match: returns an instance of Match for the first match;

Matches: returns a MatchCollection containing all matches;

Replace: replaces matches of the expression with a replacement string;

Split: returns an array of strings split at the positions where the expression matches;

Unescape: converts escaped characters in a string back to their literal form.
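A short sketch exercising several of these static methods on made-up inputs (C# 9 top-level statements assumed):

```csharp
using System;
using System.Text.RegularExpressions;

// Escape turns metacharacters into literals, so "1+1=2" can be matched exactly.
string escaped = Regex.Escape("1+1=2");
Console.WriteLine(escaped);                  // 1\+1=2
Console.WriteLine(Regex.Unescape(escaped));  // 1+1=2

// IsMatch only reports whether a match exists.
bool hasDigit = Regex.IsMatch("abra1", @"\d");
Console.WriteLine(hasDigit);                 // True

// Matches returns every match as a MatchCollection.
MatchCollection all = Regex.Matches("abracadabra", "a");
Console.WriteLine(all.Count);                // 5

// Split breaks the input wherever the pattern matches.
string[] parts = Regex.Split("abra1cad2abra", @"\d");
Console.WriteLine(string.Join(",", parts));  // abra,cad,abra
```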
  
Simple match
Let's start with a simple expression that uses the Regex and Match classes.
  
Match m = Regex.Match("abracadabra", "(a|b|r)+");
  
We now have an instance of the Match class that we can test, for example: if (m.Success) ...

To use the matched text, convert the match to a string:

Console.WriteLine("Match=" + m.ToString());

This produces the output Match=abra, which is the matched substring.
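A self-contained version of this example, assuming C# 9 top-level statements, that also prints the position of the match and the value of the capturing group (a|b|r):

```csharp
using System;
using System.Text.RegularExpressions;

Match m = Regex.Match("abracadabra", "(a|b|r)+");
if (m.Success)
{
    // The Match object also reports where the match was found.
    Console.WriteLine("Match=" + m.ToString());                    // Match=abra
    Console.WriteLine("Index=" + m.Index + " Length=" + m.Length); // Index=0 Length=4
    // Group 1 keeps only its last iteration, the final "a".
    Console.WriteLine("Group1=" + m.Groups[1].Value);              // Group1=a
}
```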
  
String substitution
Simple string substitution is very intuitive. For example, consider the following statement:

string s = Regex.Replace("abracadabra", "abra", "zzzz");

It returns the string zzzzcadzzzz: every occurrence of abra is replaced with zzzz.
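A runnable sketch of the same substitution, assuming C# 9 top-level statements; the second call uses the instance overload Replace(input, replacement, count) to show how the number of replacements can be limited:

```csharp
using System;
using System.Text.RegularExpressions;

// Static form: every occurrence of "abra" is replaced.
string all = Regex.Replace("abracadabra", "abra", "zzzz");
Console.WriteLine(all);   // zzzzcadzzzz

// Instance form with a count: only the first occurrence is replaced.
Regex abra = new Regex("abra");
string first = abra.Replace("abracadabra", "zzzz", 1);
Console.WriteLine(first); // zzzzcadabra
```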
   
Now let's look at a more complex example of string substitution:
   
string s = Regex.Replace("  abra  ", @"^\s*(.*?)\s*$", "$1");

This statement returns the string abra, with the leading and trailing whitespace removed.

This pattern is useful for trimming leading and trailing whitespace from any string. In C#, we often use verbatim string literals (@"..."), in which the compiler does not treat the character "\" as an escape character. Verbatim literals are very convenient for regular expressions, because the patterns themselves use "\" heavily (as in \s). Also worth mentioning is the $1 in the replacement string, which means the replacement consists only of the text captured by the first group, (.*?).
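The sketch below, assuming C# 9 top-level statements, checks that the verbatim and regular literal forms of this pattern are the same string, then applies the $1 substitution to a padded input:

```csharp
using System;
using System.Text.RegularExpressions;

// In a verbatim literal the backslashes reach the regex engine unchanged...
string verbatim = @"^\s*(.*?)\s*$";
// ...while a regular literal needs each backslash doubled for the same pattern.
string regular = "^\\s*(.*?)\\s*$";
Console.WriteLine(verbatim == regular); // True

// "$1" in the replacement inserts the text captured by group 1.
string trimmed = Regex.Replace("  abra  ", verbatim, "$1");
Console.WriteLine("[" + trimmed + "]"); // [abra]
```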
   
Match engine details
Now let's work through a slightly more complex example that uses the group structure:
   
string text = "abracadabra1abracadabra2abracadabra3";

string pat = @"
    (       # start of the first group
     abra   # match the string abra
     (      # start of the second group
      cad   # match the string cad
     )?     # end of the second group (optional)
    )       # end of the first group
    +       # match one or more times
    ";

// The x option (IgnorePatternWhitespace) ignores the comments and unescaped whitespace
Regex r = new Regex(pat, RegexOptions.IgnorePatternWhitespace);

// Get the list of group numbers
int[] gnums = r.GetGroupNumbers();

// First match
Match m = r.Match(text);
while (m.Success)
{
    // Start from group 1 (group 0 is the whole match)
    for (int i = 1; i < gnums.Length; i++)
    {
        // Get this matching group
        Group g = m.Groups[gnums[i]];
        Console.WriteLine("Group" + gnums[i] + "=[" + g.ToString() + "]");

        // Print the starting position and length of each capture in this group
        CaptureCollection cc = g.Captures;
        for (int j = 0; j < cc.Count; j++)
        {
            Capture c = cc[j];
            Console.WriteLine("Capture" + j + "=[" + c.ToString()
                + "] Index=" + c.Index + " Length=" + c.Length);
        }
    }
    // Next match
    m = m.NextMatch();
}
   
The output of this example is shown below:
   
Group1=[abra]
Capture0=[abracad] Index=0 Length=7
Capture1=[abra] Index=7 Length=4
Group2=[cad]
Capture0=[cad] Index=4 Length=3
Group1=[abra]
Capture0=[abracad] Index=12 Length=7
Capture1=[abra] Index=19 Length=4
Group2=[cad]
Capture0=[cad] Index=16 Length=3
Group1=[abra]
Capture0=[abracad] Index=24 Length=7
Capture1=[abra] Index=31 Length=4
Group2=[cad]
Capture0=[cad] Index=28 Length=3
   
We start with the pattern string pat, which contains the expression. The first capturing group begins at the first parenthesis, and the expression then matches abra. The second capturing group begins at the second parenthesis while the first group is still open, so the first group matches abracad and the second group matches only cad. Because the ? quantifier makes cad optional, the first group can match either abra or abracad. The first group then ends, and the + quantifier requires it to match one or more times.
   
Now let's look at what happens during matching. First, we create an instance of the expression by calling the Regex constructor and specifying options in it. Because this expression contains comments and extra whitespace, we select the x option (RegexOptions.IgnorePatternWhitespace). With this option on, the engine ignores the comments and any unescaped whitespace in the pattern.
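A minimal sketch of the x option, assuming C# 9 top-level statements: the same commented pattern is built with RegexOptions.IgnorePatternWhitespace, with the inline (?x) modifier, and with no option at all, in which case the spaces become literal:

```csharp
using System;
using System.Text.RegularExpressions;

string text = "abracadabra1abracadabra2abracadabra3";

// The option passed to the constructor...
Regex viaFlag = new Regex(@"( abra (cad)? )+  # abra, optionally followed by cad",
                          RegexOptions.IgnorePatternWhitespace);
// ...or switched on inline with (?x) at the start of the pattern.
Regex viaInline = new Regex(@"(?x) ( abra (cad)? )+  # same pattern");
// Without the option, the spaces are literal and nothing matches.
Regex noOption = new Regex(@"( abra (cad)? )+");

Console.WriteLine(viaFlag.Matches(text).Count);   // 3
Console.WriteLine(viaInline.Matches(text).Count); // 3
Console.WriteLine(noOption.Matches(text).Count);  // 0
```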
   
Next, we get the list of group numbers defined in the expression. You could use these numbers explicitly, but here we obtain them programmatically. This approach is also useful for building a quick index when you use named groups.
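The groups in the article's pattern are identified only by number. As a sketch of the named-group case, the hypothetical pattern below names its groups word and tail (C# 9 top-level statements assumed) and builds the name-to-number index with GetGroupNames and GroupNumberFromName:

```csharp
using System;
using System.Text.RegularExpressions;

// (?<name>...) gives a group a name in addition to its number.
Regex r = new Regex(@"(?<word>abra)(?<tail>cad)?");

// Build a quick name -> number index from the expression itself.
foreach (string name in r.GetGroupNames())
{
    Console.WriteLine(name + " -> " + r.GroupNumberFromName(name));
}

// Look a group up by name instead of by number.
Match m = r.Match("abracadabra");
Console.WriteLine(m.Groups["tail"].Value); // cad
```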
   
The next step is to perform the first match. A loop tests whether the current match succeeded and then processes the list of groups, starting from group 1. We skip group 0 in this example because group 0 is the entire matched text; use group 0 when you want the whole match as a single string.
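As an alternative to the Match/NextMatch loop, Regex.Matches returns all matches at once. A minimal sketch, assuming C# 9 top-level statements, that collects group 0 of each match:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

Regex r = new Regex(@"(abra(cad)?)+");
string text = "abracadabra1abracadabra2abracadabra3";

// Matches yields the same sequence as calling Match and then NextMatch in a loop.
var whole = new List<string>();
foreach (Match m in r.Matches(text))
{
    // Group 0 is always the entire matched text, identical to m.Value.
    whole.Add(m.Groups[0].Value);
}
Console.WriteLine(string.Join(",", whole)); // abracadabra,abracadabra,abracadabra
```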
   
We then walk through the CaptureCollection of each group. Normally a group has only one capture per match, but here Group1 has two: Capture0 and Capture1. If you used only Group1's ToString value, you would get only abra, even though the group also matched abracad. A group's ToString value is the value of the last capture in its CaptureCollection, so to see every capture you must iterate over the collection, as this example does. If you want each match to contain only one occurrence of the first group, remove the + quantifier from the expression so that the Regex engine matches the group just once.
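A minimal sketch, assuming C# 9 top-level statements, that contrasts a group's value with its full CaptureCollection and shows what changes when the + quantifier is removed:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string text = "abracadabra1abracadabra2abracadabra3";

// With +, group 1 is matched twice within the first match.
Group g1 = Regex.Match(text, "(abra(cad)?)+").Groups[1];
Console.WriteLine(g1.Value); // abra (only the last capture)

var captures = new List<string>();
foreach (Capture c in g1.Captures)
{
    captures.Add(c.Value);
}
Console.WriteLine(string.Join(",", captures)); // abracad,abra

// Without +, the group is matched once per match, so it has a single capture.
Group once = Regex.Match(text, "(abra(cad)?)").Groups[1];
Console.WriteLine(once.Value + " " + once.Captures.Count); // abracad 1
```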
   
Procedural versus expression-based approaches
In general, users of regular expressions fall into two categories: the first uses procedural code to perform repetitive operations instead of regular expressions, while the second relies as much as possible on the power of the regular expression engine and writes as little procedural code as possible.
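To illustrate the difference, the sketch below extracts the digits from the article's test string both ways (C# 9 top-level statements assumed; the task itself is a made-up example):

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

string text = "abracadabra1abracadabra2abracadabra3";

// Procedural: walk the string character by character.
var procedural = new StringBuilder();
foreach (char ch in text)
{
    if (char.IsDigit(ch)) procedural.Append(ch);
}

// Expression-based: let the regex engine remove every non-digit.
string viaRegex = Regex.Replace(text, @"\D", "");

Console.WriteLine(procedural.ToString()); // 123
Console.WriteLine(viaRegex);              // 123
```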
   
For most users, the best solution is to combine the two. I hope this article will explain RegExp in the .NET languages




The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

