For many years now, many programming languages and tools have included support for regular expressions. The .NET base class library contains a namespace and a set of classes that make full use of the power of regular expressions, and they are all compatible with Perl 5 regular expressions.
In addition, the regexp classes provide some extra functionality, such as right-to-left pattern matching and expression compilation.
In this article, I'll briefly describe the classes and methods in System.Text.RegularExpressions, show examples of string matching and substitution, go into the details of group structure, and finally list some common expressions you might use.
Prerequisites
Knowledge of regular expressions is probably one of those things many programmers learn and then forget. In this article, we assume that you already know how to use regular expressions, especially Perl 5 expressions. The .NET regexp classes are a superset of Perl 5 expressions, so in theory that is a good starting point. We also assume a basic knowledge of C# syntax and the .NET architecture.
If you have no knowledge of regular expressions, I suggest you start with the Perl 5 syntax. The authoritative book on the subject is Mastering Regular Expressions by Jeffrey Friedl, and we strongly recommend it to readers who want to understand expressions in depth.
The RegularExpressions assembly
The regexp classes are contained in the System.Text.RegularExpressions.dll assembly, and you must reference that assembly when compiling your application, for example:
csc /r:System.Text.RegularExpressions.dll foo.cs
This command creates the foo.exe file, which references the System.Text.RegularExpressions assembly.
Namespace overview
The namespace contains only six classes and one delegate definition, which are:
Capture: Contains the result of a single match;
CaptureCollection: A sequence of Capture objects;
Group: The result of a single group capture, inherits from Capture;
Match: The result of a single expression match, inherits from Group;
MatchCollection: A sequence of Match objects;
MatchEvaluator: The delegate used when performing a replace operation;
Regex: An instance of a compiled expression.
The Regex class also contains some static methods:
Escape: Escapes the regex metacharacters in a string;
IsMatch: Returns a Boolean value indicating whether the expression finds a match in the string;
Match: Returns a Match instance;
Matches: Returns a collection of Match objects;
Replace: Replaces the matched text with a replacement string;
Split: Returns an array of strings, split at the positions determined by the expression;
Unescape: Unescapes any escaped characters in a string.
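A few of the static methods listed above can be tried in a handful of lines. The following is a minimal sketch rather than a complete tour; the class name StaticRegexDemo, the helper names, and the sample inputs are mine and are used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StaticRegexDemo
{
    // IsMatch: true when the input contains at least one run of digits.
    public static bool HasDigits(string input) => Regex.IsMatch(input, @"\d+");

    // Split: break a comma-separated list apart, ignoring spaces around the commas.
    public static string[] SplitList(string input) => Regex.Split(input, @"\s*,\s*");

    public static void Main()
    {
        Console.WriteLine(HasDigits("room 101"));                   // True
        Console.WriteLine(string.Join("|", SplitList("a , b,c")));  // a|b|c

        // Escape protects metacharacters such as '+'; Unescape reverses it.
        string escaped = Regex.Escape("1+1=2");
        Console.WriteLine(escaped);                                 // 1\+1=2
        Console.WriteLine(Regex.Unescape(escaped));                 // 1+1=2
    }
}
```

Because these methods are static, no Regex instance has to be created for one-off operations like these.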
Simple match
Let's start with a simple expression that uses the Regex and Match classes.
Match m = Regex.Match("abracadabra", "(a|b|r)+");
We now have an instance of the Match class that can be tested, for example: if (m.Success) ...
If you want to use the matched string, you can convert the match to a string:
Console.WriteLine("Match=" + m.ToString());
This example produces the following output: Match=abra. This is the matched string.
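Put together as a complete program, the simple match might look like the sketch below. The class name SimpleMatchDemo and the helper TryFirstRun are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimpleMatchDemo
{
    // Finds the first run of the letters a, b, or r in the input.
    // Returns false (and an empty string) when there is no such run.
    public static bool TryFirstRun(string input, out string run)
    {
        Match m = Regex.Match(input, "(a|b|r)+");
        run = m.Value;      // same text as m.ToString(); empty when the match failed
        return m.Success;
    }

    public static void Main()
    {
        if (TryFirstRun("abracadabra", out string run))
            Console.WriteLine("Match=" + run);   // Match=abra
    }
}
```

The match stops at the first c, because c is not one of the three alternatives in the group.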
Substitution of strings
Simple string substitution is very intuitive. For example, take the following statement:
string s = Regex.Replace("abracadabra", "abra", "zzzz");
It returns the string zzzzcadzzzz, in which every matching substring has been replaced with zzzz.
Now let's look at a more complex string substitution, one that trims a string. Applied to the string abra padded with spaces, it returns the string abra, with the leading and trailing spaces removed.
A pattern of this kind is useful for removing leading and trailing spaces from any string. In C#, we also often use verbatim string literals; in a verbatim string, the compiler does not treat the character "\" as an escape character. The @"..." form is therefore convenient whenever a pattern uses the character "\" to introduce an escape. Also worth mentioning is the replacement string in such a substitution: it can refer back to the text captured by the pattern, so that the result contains only the captured part.
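As a rough sketch of both substitutions: the simple replacement is the statement from the text, while the trimming pattern ^\s*(.*?)\s*$ and the replacement string $1 are my assumptions about a statement that fits the description. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    // Simple substitution: every occurrence of "abra" becomes "zzzz".
    public static string ReplaceAbra(string input) =>
        Regex.Replace(input, "abra", "zzzz");

    // Trimming substitution (assumed pattern): group 1 lazily captures the text
    // between optional leading and trailing whitespace, and "$1" in the
    // replacement string puts back only that captured text.
    public static string Trim(string input) =>
        Regex.Replace(input, @"^\s*(.*?)\s*$", "$1");

    public static void Main()
    {
        Console.WriteLine(ReplaceAbra("abracadabra"));       // zzzzcadzzzz
        Console.WriteLine("[" + Trim("   abra   ") + "]");   // [abra]
    }
}
```

The verbatim literal @"..." lets the pattern contain \s without doubling every backslash.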
Match engine details
Now let's work through a slightly more complex example that uses a group structure. Look at the following example:
string text = "abracadabra1abracadabra2abracadabra3";
Alongside the test string we define the pattern string pat, which contains the expression. The first capture group begins at the first parenthesis, and the expression then matches a literal abra. The second capture group begins at the second parenthesis, before the first group has been closed, which means that the first group can match abracad while the second group matches only cad. Because the ? symbol makes cad an optional match, the first group may match either abra or abracad. The first group is then closed, and the + symbol requires the expression to match one or more occurrences.
Now let's take a look at what happens during the match. First, you create an instance of the expression by calling the Regex constructor, specifying any options there. In this example, because the expression contains comments and some extra whitespace, the x option is used. With the x option on, the engine ignores the comments and any whitespace that is not escaped.
Next, get the list of numbers of the groups defined in the expression. You could of course use these numbers explicitly; here they are obtained programmatically. This is also useful as a way to build a quick index if you use named groups.
The next step is to perform the first match. A loop tests whether the current match succeeded, and then iterates over the group list starting from group 1. Group 0 is not used in this example because group 0 is the entire matched string; use group 0 if you want to collect the whole match as a single string.
We then walk the CaptureCollection of each group. Typically there is only one capture per group per match, but in this case Group1 has two captures: Capture0 and Capture1. If you ask only for the ToString of Group1, you get just abra, even though the group also matched abracad. The ToString value of a group is the value of the last capture in its CaptureCollection, which is exactly what we need. If you want the match to end after the first abra, remove the + symbol from the expression so that the regex engine knows we need to match the expression only once.
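The walk-through above can be sketched as follows. The pattern is reconstructed from the description and should be read as an assumption, not as the exact original; I also assume RegexOptions.IgnorePatternWhitespace as the way to switch on the x option, and the class name GroupDemo and method Describe are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Assumed pattern, written with comments and whitespace for the x option.
    const string Pat = @"
        (         # start the first group
          abra    # match the literal 'abra'
          (       # start the second (inner) group
            cad   # match the literal 'cad'
          )?      # end the second group and make it optional
        )+        # end the first group; match one or more occurrences
    ";

    // Lists every group (from group 1 on) and every capture of every match.
    public static List<string> Describe(string text)
    {
        var lines = new List<string>();
        var r = new Regex(Pat, RegexOptions.IgnorePatternWhitespace);
        int[] gnums = r.GetGroupNumbers();   // group 0 is the whole match

        for (Match m = r.Match(text); m.Success; m = m.NextMatch())
        {
            for (int i = 1; i < gnums.Length; i++)
            {
                Group g = m.Groups[gnums[i]];
                lines.Add("Group" + gnums[i] + "=[" + g.Value + "]");
                for (int j = 0; j < g.Captures.Count; j++)
                {
                    Capture c = g.Captures[j];
                    lines.Add("  Capture" + j + "=[" + c.Value + "] Index="
                              + c.Index + " Length=" + c.Length);
                }
            }
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in Describe("abracadabra1abracadabra2abracadabra3"))
            Console.WriteLine(line);
    }
}
```

For the first match this lists Group1=[abra] with two captures, abracad at index 0 and abra at index 7, which is the behavior described above: the group's value is its last capture.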
Procedural versus expression-based approaches
In general, users of regular expressions fall into two groups. The first group uses minimal regular expressions and writes procedural code to perform the repetitive work, while the second group uses as little procedural code as possible and relies on the functionality and power of the regular expression engine.
For most of us, the best solution is to use some of both. I hope this article explains what the regexp classes in .NET can do and the trade-offs they involve between performance and complexity.
Procedural-based patterns
A common need in programming is to match part of a string or to do some other string processing. Here is an example that matches the words in a string:
string text = "the quick red fox jumped over the lazy brown dog.";
System.Console.WriteLine("text=[" + text + "]");
string result = "";
string pattern = @"\w+|\W+";
foreach (Match m in Regex.Matches(text, pattern))
{
    // Get the matched string
    string x = m.ToString();
    // If the first character is lowercase
    if (char.IsLower(x[0]))
        // convert it to uppercase
        x = char.ToUpper(x[0]) + x.Substring(1, x.Length - 1);
    // Collect all the text
    result += x;
}
System.Console.WriteLine("result=[" + result + "]");
As the example above shows, we used the C# foreach statement to process each match and apply the corresponding transformation, building a new result string along the way. The output of this example is as follows:
text=[the quick red fox jumped over the lazy brown dog.]
result=[The Quick Red Fox Jumped Over The Lazy Brown Dog.]
Expression-based patterns
Another way to achieve the functionality of the previous example is through a MatchEvaluator, and the new code looks like this:
System.Console.WriteLine("result=[" + result + "]");
}
Note also that the pattern is simpler here, because only the words need to be modified and the non-words can be left untouched.
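A minimal sketch of the MatchEvaluator approach follows. It assumes the simplified pattern \w+ suggested by the remark about modifying only the words; the names EvaluatorDemo, CapText, and Capitalize are hypothetical and chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EvaluatorDemo
{
    // Called once per match; the returned string replaces the matched text.
    static string CapText(Match m)
    {
        string x = m.Value;
        // Upper-case the first character only when it is lowercase.
        return char.IsLower(x[0]) ? char.ToUpper(x[0]) + x.Substring(1) : x;
    }

    // Only words are matched, so spaces and punctuation pass through unchanged.
    public static string Capitalize(string text) =>
        Regex.Replace(text, @"\w+", new MatchEvaluator(CapText));

    public static void Main()
    {
        string text = "the quick red fox jumped over the lazy brown dog.";
        Console.WriteLine("text=[" + text + "]");
        Console.WriteLine("result=[" + Capitalize(text) + "]");
        // result=[The Quick Red Fox Jumped Over The Lazy Brown Dog.]
    }
}
```

Here the loop and the string concatenation of the procedural version are left to Regex.Replace, and only the per-match transformation remains in user code.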
Common expressions
To help you better understand how to use regular expressions in a C# environment, I have written some expressions that may be useful to you. These expressions have also been used in other environments, and I hope they help.