Interpreting Regular Expressions in C#

For many years now, many programming languages and tools have included support for regular expressions. The .NET base class library contains a namespace and a set of classes that make full use of the power of regular expressions, and they are all compatible with Perl 5 regular expressions.

In addition, the regular expression classes provide other features, such as right-to-left pattern matching and expression compilation.

In this article, I'll briefly describe the classes and methods in System.Text.RegularExpressions, show examples of string matching and replacement, go into the details of group structure, and finally list some common expressions you might use.

Prerequisite knowledge

Regular expressions may be one of those topics that many programmers "keep forgetting". In this article, we assume that you already know how to use regular expressions, especially Perl 5 expressions. The .NET regular expression classes are a superset of Perl 5 expressions, so in theory that knowledge is a good starting point. We also assume that you know C# syntax and the basics of the .NET architecture.

If you have no knowledge of regular expressions, I suggest you start with the Perl 5 syntax. The authoritative book on regular expressions is Mastering Regular Expressions by Jeffrey Friedl, and we strongly recommend it to readers who want to understand expressions in depth.

The RegularExpressions assembly

The regular expression classes are contained in the System.Text.RegularExpressions.dll file, and you must reference that file when compiling the application, for example:

csc /r:System.Text.RegularExpressions.dll foo.cs

This command creates the foo.exe file, which references the System.Text.RegularExpressions assembly. In released versions of the .NET Framework, these classes ship in System.dll, which the compiler references by default.

Namespace overview

The namespace contains only six classes and one delegate definition, which are:

Capture: Contains the result of a single match;
CaptureCollection: A sequence of Capture objects;
Group: The result of a group capture, derived from Capture;
Match: The result of a single expression match, derived from Group;
MatchCollection: A sequence of Match objects;
MatchEvaluator: A delegate used when performing a replace operation;
Regex: An instance of a compiled expression.

The Regex class also contains some static methods:

Escape: Escapes the regex metacharacters in a string;
IsMatch: Returns a Boolean value indicating whether the expression matches in a string;
Match: Returns a Match instance;
Matches: Returns a collection of Match objects;
Replace: Replaces the matched text with a replacement string;
Split: Returns an array of strings, split at the positions determined by the expression;
Unescape: Unescapes the escaped characters in a string.
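As a rough illustration of the static methods listed above, the following sketch calls several of them directly on the Regex class. The class name StaticMethodsDemo and the helpers ContainsLiteral and SplitFields are hypothetical names chosen for this example; only the Regex calls themselves come from the library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StaticMethodsDemo
{
    // Hypothetical helper: true if input contains the literal text,
    // even when that text includes regex metacharacters such as '+'.
    public static bool ContainsLiteral(string input, string literal)
    {
        return Regex.IsMatch(input, Regex.Escape(literal));
    }

    // Hypothetical helper: split on runs of commas and/or whitespace.
    public static string[] SplitFields(string input)
    {
        return Regex.Split(input, @"[,\s]+");
    }

    public static void Main()
    {
        Console.WriteLine(Regex.Escape("1+1=2"));                       // 1\+1=2
        Console.WriteLine(Regex.Unescape(@"1\+1=2"));                   // 1+1=2
        Console.WriteLine(ContainsLiteral("so 1+1=2", "1+1"));          // True
        Console.WriteLine(Regex.Matches("a1b22c333", @"\d+").Count);    // 3
        Console.WriteLine(string.Join("|", SplitFields("a, b,c  d")));  // a|b|c|d
    }
}
```

Escaping the literal first matters here: without Regex.Escape, the pattern 1+1 would mean "one or more 1 characters followed by 1" rather than the text 1+1.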

Simple match

Let's start with a simple expression that uses the Regex and Match classes.

Match m = Regex.Match("abracadabra", "(a|b|r)+");

We now have an instance of the Match class that can be tested, for example: if (m.Success) ...
If you want to use the matched text, you can convert it to a string:

Console.WriteLine("Match=" + m.ToString());

This example produces the following output: Match=abra. This is the matched string.
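Besides converting the match to a string, a Match also exposes where the text was found. The sketch below is one way to inspect that; the class name SimpleMatchDemo and the helper Describe are hypothetical, while Success, Value, Index and Length are properties of Match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimpleMatchDemo
{
    // Hypothetical helper: summarize a match as "value@index+length",
    // or "no match" when the pattern is not found.
    public static string Describe(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        if (!m.Success)
            return "no match";
        return m.Value + "@" + m.Index + "+" + m.Length;
    }

    public static void Main()
    {
        Console.WriteLine(Describe("abracadabra", "(a|b|r)+")); // abra@0+4
        Console.WriteLine(Describe("cad", "(a|b|r)+"));         // a@1+1
        Console.WriteLine(Describe("xyz", "(a|b|r)+"));         // no match
    }
}
```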

Substitution of strings

The substitution of simple strings is very intuitive. For example, the following statement:

string s = Regex.Replace("abracadabra", "abra", "zzzz");

It returns the string zzzzcadzzzz: every occurrence of the matched string is replaced with zzzz.

Now let's look at a more complex example of string substitution:

string s = Regex.Replace("  abra  ", @"^\s*(.*?)\s*$", "$1");

This statement returns the string abra, with the leading and trailing spaces removed.

The pattern above is useful for removing leading and trailing whitespace from any string. In C#, we often use verbatim strings: in a verbatim string, written @"...", the compiler does not treat the character "\" as an escape character, which is convenient when the pattern itself uses "\" for its own escapes. Also worth noting is the use of $1 in the replacement string, which substitutes the text captured by the first group.
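The points about verbatim strings and group substitutions can be tried out with a small sketch like the one below. TrimDemo and TrimWithRegex are hypothetical names introduced for illustration; the name-swapping line is an extra example of $1 and $2 that is not taken from the article.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrimDemo
{
    // Hypothetical helper wrapping the trimming pattern from the text.
    public static string TrimWithRegex(string input)
    {
        return Regex.Replace(input, @"^\s*(.*?)\s*$", "$1");
    }

    public static void Main()
    {
        Console.WriteLine("[" + TrimWithRegex("  abra  ") + "]"); // [abra]

        // A verbatim string and a regular string with doubled
        // backslashes describe the same pattern text.
        Console.WriteLine(@"^\s*" == "^\\s*"); // True

        // $1 and $2 refer to the first and second captured groups.
        Console.WriteLine(
            Regex.Replace("John Smith", @"(\w+) (\w+)", "$2, $1")); // Smith, John
    }
}
```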

Match engine details

Now let's work through a slightly more complex example that uses groups. Look at the following code:

string text = "abracadabra1abracadabra2abracadabra3";
string pat = @"
    (       # start of the first group
      abra  # match the string abra
      (     # start of the second group
        cad # match the string cad
      )?    # end of the second group (optional)
    )       # end of the first group
    +       # match one or more times
";
// Ignore comments and whitespace in the pattern (the x modifier)
Regex r = new Regex(pat, RegexOptions.IgnorePatternWhitespace);
// Get the list of group numbers
int[] gnums = r.GetGroupNumbers();
// First match
Match m = r.Match(text);
while (m.Success)
{
    // Start from group 1
    for (int i = 1; i < gnums.Length; i++)
    {
        Group g = m.Groups[gnums[i]];
        // Print the value of this group
        Console.WriteLine("Group" + gnums[i] + "=[" + g.ToString() + "]");
        // Print the start position and length of each capture in this group
        CaptureCollection cc = g.Captures;
        for (int j = 0; j < cc.Count; j++)
        {
            Capture c = cc[j];
            Console.WriteLine("  Capture" + j + "=[" + c.ToString()
                + "] Index=" + c.Index + " Length=" + c.Length);
        }
    }
    // Next match
    m = m.NextMatch();
}

The output of this example is shown below:
     
Group1=[abra]
  Capture0=[abracad] Index=0 Length=7
  Capture1=[abra] Index=7 Length=4
Group2=[cad]
  Capture0=[cad] Index=4 Length=3
Group1=[abra]
  Capture0=[abracad] Index=12 Length=7
  Capture1=[abra] Index=19 Length=4
Group2=[cad]
  Capture0=[cad] Index=16 Length=3
Group1=[abra]
  Capture0=[abracad] Index=24 Length=7
  Capture1=[abra] Index=31 Length=4
Group2=[cad]
  Capture0=[cad] Index=28 Length=3

We start by examining the string pat, which contains the expression. The first capture group begins with the first parenthesis, and the expression then matches abra. The second capture group begins with the second parenthesis, while the first capture group is still open, which means that the first group matches abracad while the second group matches only cad. Because the ? symbol makes cad optional, the first group can match either abra or abracad. The first group then ends, and the + symbol requires the expression to match one or more occurrences of that group.

Now let's take a look at what happens during matching. First, you create an instance of the expression by calling the Regex constructor and specifying any options. In this example, because the expression contains comments and some extra whitespace, the x option is used (RegexOptions.IgnorePatternWhitespace in the released .NET API). With the x option enabled, the expression ignores comments and any whitespace that is not escaped.
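To see the effect of the x option in isolation, here is a minimal sketch with a commented pattern. The class VerbosePatternDemo, the constant Pattern, the helper YearOf and the date-like sample input are all assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbosePatternDemo
{
    // Hypothetical commented pattern: a four-digit year, a dash, a two-digit month.
    public const string Pattern = @"
        (\d{4})  # year
        -        # literal separator
        (\d{2})  # month
    ";

    // Hypothetical helper: return the year, or an empty string if nothing matches.
    public static string YearOf(string input)
    {
        Match m = Regex.Match(input, Pattern, RegexOptions.IgnorePatternWhitespace);
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        Console.WriteLine(YearOf("released 2002-02")); // 2002

        // Without the option, the spaces, line breaks and comment text are
        // treated as literal characters, so the same pattern does not match.
        Console.WriteLine(Regex.IsMatch("2002-02", Pattern)); // False
    }
}
```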

Then, get the list of group numbers defined in the expression. You could of course use these numbers explicitly; here they are obtained programmatically. This is also useful as a way of building a quick index if you use named groups.
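For named groups, the lookup between names and numbers might be sketched as follows. NamedGroupDemo, KeyValue and Lookup are hypothetical names, and the key=value pattern is only an illustration; GetGroupNames and GroupNumberFromName are methods of Regex.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Hypothetical pattern with two named groups.
    public static readonly Regex KeyValue =
        new Regex(@"(?<key>\w+)=(?<value>\w+)");

    // Hypothetical helper: value of the named group, or "" if there is no match.
    public static string Lookup(string input, string groupName)
    {
        Match m = KeyValue.Match(input);
        return m.Success ? m.Groups[groupName].Value : "";
    }

    public static void Main()
    {
        // Group 0 is always the whole match; named groups follow.
        Console.WriteLine(string.Join(",", KeyValue.GetGroupNames())); // 0,key,value
        Console.WriteLine(KeyValue.GroupNumberFromName("value"));      // 2
        Console.WriteLine(Lookup("color=red", "key"));                 // color
        Console.WriteLine(Lookup("color=red", "value"));               // red
    }
}
```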

The next step is to perform the first match. A loop tests whether the current match succeeded, and then iterates over the group list starting from group 1. Group 0 is not used in this example because group 0 is the entire matched string; use group 0 if you want to collect the whole match as a single string.

We walk the CaptureCollection in each group. Typically there is only one capture per group in each match, but in this case Group1 has two captures: Capture0 and Capture1. If you only look at the ToString of Group1, you get only abra, even though the group also matched abracad. The value of a group's ToString is the value of the last capture in its CaptureCollection, which is exactly what we need here. If you want the match to end after the first abra, remove the + symbol from the expression so that the Regex engine knows we only need to match the group once.
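The relationship between a group's value and its captures can be checked with a short sketch such as this one. CaptureDemo and CapturesOfGroup1 are hypothetical names; the patterns are the article's expression with and without the + symbol.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Hypothetical helper: all captures recorded for group 1 in the first match.
    public static string[] CapturesOfGroup1(string input, string pattern)
    {
        Group g = Regex.Match(input, pattern).Groups[1];
        string[] values = new string[g.Captures.Count];
        for (int i = 0; i < values.Length; i++)
            values[i] = g.Captures[i].Value;
        return values;
    }

    public static void Main()
    {
        string text = "abracadabra1";

        // With +, group 1 is captured twice; its value is the last capture.
        Console.WriteLine(string.Join(",", CapturesOfGroup1(text, "(abra(cad)?)+"))); // abracad,abra
        Console.WriteLine(Regex.Match(text, "(abra(cad)?)+").Groups[1].Value);        // abra

        // Without +, the group is matched once and has a single capture.
        Console.WriteLine(string.Join(",", CapturesOfGroup1(text, "(abra(cad)?)")));  // abracad
    }
}
```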

Comparing procedural and expression-based approaches

In general, users of regular expressions fall into two categories: the first relies on procedural code to perform repetitive actions and uses only minimal expressions, while the second relies on the functionality and power of the regular expression engine and uses as little procedural code as possible.

For most of us, the best solution is to use both. I hope this article sheds some light on the role of the regular expression classes in the .NET languages and on the trade-offs between performance and complexity.

Procedural approach

A feature we often need in programming is matching part of a string or doing some other string processing. Here is an example that matches the words in a string:

string text = "The quick red fox jumped over the lazy brown dog.";
System.Console.WriteLine("text=[" + text + "]");
string result = "";
// Match either a run of word characters or a run of non-word characters
string pattern = @"\w+|\W+";
foreach (Match m in Regex.Matches(text, pattern))
{
    // Get the matched string
    string x = m.ToString();
    // If the first character is lowercase
    if (char.IsLower(x[0]))
        // convert it to uppercase
        x = char.ToUpper(x[0]) + x.Substring(1, x.Length - 1);
    // Append the piece to the result
    result += x;
}
System.Console.WriteLine("result=[" + result + "]");

As shown in the example above, we use the C# foreach statement to process each match in turn and build a new result string as we go. The output of this example is as follows:

text=[The quick red fox jumped over the lazy brown dog.]

result=[The Quick Red Fox Jumped Over The Lazy Brown Dog.]

Expression-based approach

Another way to implement the functionality of the previous example is through a MatchEvaluator. The new code looks like this:

static string CapText(Match m)
{
    // Get the matched string
    string x = m.ToString();
    // If the first character is lowercase
    if (char.IsLower(x[0]))
        // convert it to uppercase
        return char.ToUpper(x[0]) + x.Substring(1, x.Length - 1);
    return x;
}

static void Main()
{
    string text = "The quick red fox jumped over the lazy brown dog.";
    System.Console.WriteLine("text=[" + text + "]");
    string pattern = @"\w+";
    // Test is the class that contains CapText and Main
    string result = Regex.Replace(text, pattern,
        new MatchEvaluator(Test.CapText));
    System.Console.WriteLine("result=[" + result + "]");
}

Note also that the pattern is simpler here: only the words need to be matched and modified, so the text between them does not have to be matched at all.
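In later versions of C#, the same evaluator can be passed as a lambda expression instead of a named method. The sketch below assumes a compiler that supports lambdas; EvaluatorDemo and Capitalize are hypothetical names, and this is a variation on the article's code rather than the author's own version.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EvaluatorDemo
{
    // Hypothetical helper: capitalize the first letter of every word.
    // The lambda is converted to a MatchEvaluator by the compiler.
    public static string Capitalize(string text)
    {
        return Regex.Replace(text, @"\w+",
            m => char.ToUpper(m.Value[0]) + m.Value.Substring(1));
    }

    public static void Main()
    {
        string text = "the quick red fox jumped over the lazy brown dog.";
        Console.WriteLine(Capitalize(text));
        // The Quick Red Fox Jumped Over The Lazy Brown Dog.
    }
}
```

Text that the pattern does not match, such as spaces and the final period, is copied to the result unchanged, which is why the evaluator only has to deal with words.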



