C#: Using regular expressions for matching and replacement

Source: Internet
Author: User

Many programming languages and tools have supported regular expressions for years. The .NET base class library contains a namespace and a set of classes that exploit the full power of regular expressions, and they are designed to be compatible with Perl 5 regular expressions.

In addition, the regexp classes provide some extra functionality, such as right-to-left matching and expression compilation.

In this article, I'll briefly describe the classes and methods in System.Text.RegularExpressions, give examples of string matching and replacement, walk through the details of group structure, and finally list some common expressions you might use.

Prerequisites

Regular expressions are one of those things that many programmers "often forget". In this article, we assume that you already know how to use regular expressions, especially those of Perl 5. The .NET regexp classes are largely a superset of Perl 5 expressions, so in theory Perl 5 is a good starting point. We also assume that you know the basics of C# syntax and the .NET architecture.

If you don't know regular expressions, I suggest you start with the Perl 5 syntax. The authoritative book on regular expressions is "Mastering Regular Expressions" by Jeffrey Friedl, and we strongly recommend it to readers who want to understand regular expressions in depth.

The RegularExpressions assembly

The regexp classes are contained in the System.Text.RegularExpressions.dll assembly, and you must reference that assembly when compiling the application, for example:

csc /r:System.Text.RegularExpressions.dll Foo.cs

The command creates the Foo.exe file, which references the System.Text.RegularExpressions assembly.

Namespace overview

The namespace contains only six classes and one delegate:

Capture: Contains the result of a single match;
CaptureCollection: A sequence of Capture objects;
Group: The result of a single group capture; inherits from Capture;
Match: The result of a single expression match; inherits from Group;
MatchCollection: A sequence of Match objects;
MatchEvaluator: A delegate used when performing a replace operation;
Regex: An instance of a compiled expression.

The Regex class also contains some static methods:

Escape: Escapes the regex metacharacters in a string;
IsMatch: Returns a Boolean value indicating whether the expression matches anywhere in the string;
Match: Returns a Match instance;
Matches: Returns a collection of Match objects (a MatchCollection);
Replace: Replaces the matched text with a replacement string;
Split: Returns an array of strings, split at the positions determined by the expression;
Unescape: Converts escaped characters in a string back to their unescaped form.
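
As a rough illustration of the static methods listed above, here is a minimal sketch that exercises Escape, Unescape, IsMatch and Split. The class name StaticHelpersDemo, the helper names ContainsLiteral and SplitList, and the sample strings are made up for this example and are not part of the library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StaticHelpersDemo
{
    // Hypothetical helper: true if 'input' contains 'literal' exactly.
    // Regex.Escape makes metacharacters such as + and ? match themselves.
    public static bool ContainsLiteral(string input, string literal)
    {
        return Regex.IsMatch(input, Regex.Escape(literal));
    }

    // Hypothetical helper: splits a comma-separated list,
    // discarding any whitespace around the commas.
    public static string[] SplitList(string input)
    {
        return Regex.Split(input, @"\s*,\s*");
    }

    public static void Main()
    {
        Console.WriteLine(Regex.Escape("1+1=2?"));
        Console.WriteLine(Regex.Unescape(@"1\+1=2\?"));
        Console.WriteLine(ContainsLiteral("is 1+1=2? yes", "1+1=2?"));
        Console.WriteLine(string.Join("|", SplitList("a , b,c")));
    }
}
```

Without the call to Escape, the + and ? in the literal would be treated as quantifiers, so escaping is worth the extra call whenever part of a pattern comes from user input.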

Simple matching

Let's start with a simple expression that uses the Regex and Match classes.

Match m = Regex.Match("abracadabra", "(a|b|r)+");

We now have an instance of the Match class that can be tested, for example: if (m.Success) ...
If you want to use the matched text, you can convert the match to a string:

Console.WriteLine("Match=" + m.ToString());

This example gives the following output: Match=abra. This is the matched string; the match stops at the first character (c) that the expression does not allow.

String replacement

Simple string replacement is straightforward. For example, the following statement:

string s = Regex.Replace("abracadabra", "abra", "zzzz");

returns the string zzzzcadzzzz: every match of abra is replaced with zzzz.

Now let's look at a more complex example of string replacement:

string s = Regex.Replace("   abra  ", @"^\s*(.*?)\s*$", "$1");

This statement returns the string abra, with the leading and trailing spaces removed.

The pattern above is useful for deleting leading and trailing whitespace from any string. In C#, we also often use verbatim string literals, written @"...", in which the compiler does not treat the character "\" as an escape character. This is very useful when the pattern itself uses "\" to introduce regex escapes such as \s. Also worth mentioning is the use of $1 in the replacement string: the replacement string may contain substitutions, which are references to capture groups in the expression.
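
To make the idea of substitutions a little more concrete, here is a small sketch that uses both a numbered reference ($1) and named references (${name}). The class name SubstitutionDemo, the methods Trim and SwapPair, and the group names key and value are assumptions made for this example only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstitutionDemo
{
    // Removes leading and trailing whitespace by keeping only group 1.
    public static string Trim(string input)
    {
        return Regex.Replace(input, @"^\s*(.*?)\s*$", "$1");
    }

    // Swaps each "key=value" pair into "value=key" using named groups.
    public static string SwapPair(string input)
    {
        return Regex.Replace(input, @"(?<key>\w+)=(?<value>\w+)", "${value}=${key}");
    }

    public static void Main()
    {
        Console.WriteLine("[" + Trim("   abra  ") + "]");
        Console.WriteLine(SwapPair("a=1 b=2"));
    }
}
```

Named references tend to be easier to maintain than numbered ones, because adding a new group to the pattern does not shift their meaning.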

Details of the matching engine

Now let's work through a slightly more complex example that uses nested groups. Look at the following example:

string text = "abracadabra1abracadabra2abracadabra3";
string pat = @"
    (        # start of the first group
      abra   # match the string abra
      (      # start of the second group
        cad  # match the string cad
      )?     # end of the second group (optional)
    )        # end of the first group
    +        # match one or more times
    ";
// Ignore comments and whitespace in the pattern (the x option)
Regex r = new Regex(pat, RegexOptions.IgnorePatternWhitespace);
// Get the list of group numbers
int[] gnums = r.GetGroupNumbers();
// First match
Match m = r.Match(text);
while (m.Success)
{
    // Start at group 1
    for (int i = 1; i < gnums.Length; i++)
    {
        // Get this group of the current match
        Group g = m.Groups[gnums[i]];
        Console.WriteLine("Group" + gnums[i] + "=[" + g.ToString() + "]");
        // Print the starting position and length of each capture in this group
        CaptureCollection cc = g.Captures;
        for (int j = 0; j < cc.Count; j++)
        {
            Capture c = cc[j];
            Console.WriteLine("Capture" + j + "=[" + c.ToString()
                + "] Index=" + c.Index + " Length=" + c.Length);
        }
    }
    // Next match
    m = m.NextMatch();
}

The output of this example is as follows:

Group1=[abra]
Capture0=[abracad] Index=0 Length=7
Capture1=[abra] Index=7 Length=4
Group2=[cad]
Capture0=[cad] Index=4 Length=3
Group1=[abra]
Capture0=[abracad] Index=12 Length=7
Capture1=[abra] Index=19 Length=4
Group2=[cad]
Capture0=[cad] Index=16 Length=3
Group1=[abra]
Capture0=[abracad] Index=24 Length=7
Capture1=[abra] Index=31 Length=4
Group2=[cad]
Capture0=[cad] Index=28 Length=3

We begin by examining the string pat, which contains the expression. The first capture group starts at the first parenthesis, and then the expression matches abra. The second capture group starts at the second parenthesis, but the first capture group has not ended yet, which means that the first group can match abracad while the second group matches only cad. Because the ? symbol makes cad optional, the first group can match either abra or abracad. Then the first group ends, and the + symbol requires the expression to match one or more times.

Now let's take a look at what happens during matching. First, an instance of the expression is created by calling the Regex constructor and passing the options to it. In this example, the x option (RegexOptions.IgnorePatternWhitespace in current .NET versions) is used because the expression contains comments and some whitespace for readability. When the x option is on, the expression ignores comments and any whitespace that is not escaped.

Then we get the list of the numbers of the groups defined in the expression. You could certainly use these numbers explicitly; the programmatic way is shown here. If named groups are used, this approach is also an effective way to build a quick index.

The next step is to perform the first match. A loop tests whether the current match succeeded and then iterates over the list of groups, starting at group 1. Group 0 is not used in this example because group 0 is the entire matched string; if you want to collect everything that matched as a single string, use group 0.

We track the CaptureCollection in each group. Typically there is only one capture per group, but Group1 in this example has two: Capture0 and Capture1. If you only ask for the ToString of Group1, you get only abra, even though the group also matched abracad. The value of ToString on a group is the value of the last capture in its CaptureCollection, which is exactly what we need here. If you want the match to end after the first abra, remove the + symbol from the expression so that the regex engine matches the expression only once.
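
The relationship between group 0, a group's value, and its captures can be checked with a short sketch like the one below. The class name CaptureDemo and the helper CapturesOf are hypothetical; Groups, Captures and Value are the members found in current .NET versions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Hypothetical helper: returns every capture recorded for the
    // given group number in the first match of 'pattern' in 'text'.
    public static List<string> CapturesOf(string text, string pattern, int group)
    {
        var result = new List<string>();
        Match m = Regex.Match(text, pattern);
        if (m.Success)
        {
            foreach (Capture c in m.Groups[group].Captures)
                result.Add(c.Value);
        }
        return result;
    }

    public static void Main()
    {
        string text = "abracadabra1";
        string pattern = "(abra(cad)?)+";
        Match m = Regex.Match(text, pattern);
        // Group 0 is the whole match; group 1 reports only its last capture.
        Console.WriteLine("Group0=[" + m.Groups[0].Value + "]");
        Console.WriteLine("Group1=[" + m.Groups[1].Value + "]");
        Console.WriteLine("Captures=[" + string.Join(",", CapturesOf(text, pattern, 1)) + "]");
    }
}
```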

Comparing procedural and expression-based approaches

In general, people who use regular expressions fall into two categories. The first kind use regular expressions as little as possible and rely on procedural code for any repeated operations, while the second kind take full advantage of the features and power of the regular expression engine and use as little procedural code as possible.

For most of us, the best solution is a mix of both. I hope this article illustrates the role of the regexp classes in the .NET languages and the trade-offs between performance and complexity.

Procedural patterns

One thing we often need to do in programming is to match part of a string or perform some other string processing. Here is an example that matches the words in a string:

string text = "the quick red fox jumped over the lazy brown dog.";
System.Console.WriteLine("text=[" + text + "]");
string result = "";
// Match either a run of word characters or a run of non-word characters
string pattern = @"\w+|\W+";
foreach (Match m in Regex.Matches(text, pattern))
{
    // Get the matched string
    string x = m.ToString();
    // If the first character is lowercase
    if (char.IsLower(x[0]))
        // convert it to uppercase
        x = char.ToUpper(x[0]) + x.Substring(1, x.Length - 1);
    // Collect all the text
    result += x;
}
System.Console.WriteLine("result=[" + result + "]");

As the example shows, we use the C# foreach statement to process each match in turn and do the appropriate work, which in this case is building a new result string. The output of this example is shown below:

text=[the quick red fox jumped over the lazy brown dog.]
result=[The Quick Red Fox Jumped Over The Lazy Brown Dog.]

Expression-based patterns

Another way to get the same result as the previous example is through a MatchEvaluator, and the new code looks like this:

using System.Text.RegularExpressions;

class Test
{
    static string CapText(Match m)
    {
        // Get the matched string
        string x = m.ToString();
        // If the first character is lowercase
        if (char.IsLower(x[0]))
            // convert it to uppercase
            return char.ToUpper(x[0]) + x.Substring(1, x.Length - 1);
        return x;
    }

    static void Main()
    {
        string text = "the quick red fox jumped over the lazy brown dog.";
        System.Console.WriteLine("text=[" + text + "]");
        string pattern = @"\w+";
        string result = Regex.Replace(text, pattern,
            new MatchEvaluator(Test.CapText));
        System.Console.WriteLine("result=[" + result + "]");
    }
}

Note also that the pattern is very simple here, because only the words need to be modified and the non-words can be left alone.
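
To check that the two approaches really produce the same string, a sketch along the following lines can help. The class name CapitalizeDemo and the methods Capitalize, WithLoop and WithEvaluator are hypothetical, and the evaluator is written as a lambda, which assumes a C# version that supports lambdas.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class CapitalizeDemo
{
    // Upper-cases the first character if it is a lowercase letter.
    static string Capitalize(string s)
    {
        return char.IsLower(s[0]) ? char.ToUpper(s[0]) + s.Substring(1) : s;
    }

    // Procedural: visit every word and non-word run and rebuild the string.
    public static string WithLoop(string text)
    {
        var sb = new StringBuilder();
        foreach (Match m in Regex.Matches(text, @"\w+|\W+"))
            sb.Append(Capitalize(m.Value));
        return sb.ToString();
    }

    // Expression-based: Regex.Replace calls the evaluator once per word
    // and copies everything between the words unchanged.
    public static string WithEvaluator(string text)
    {
        return Regex.Replace(text, @"\w+", m => Capitalize(m.Value));
    }

    public static void Main()
    {
        string text = "the quick red fox jumped over the lazy brown dog.";
        Console.WriteLine(WithLoop(text));
        Console.WriteLine(WithEvaluator(text));
    }
}
```

The loop version has to match the non-word runs as well so that nothing is lost when the string is rebuilt, which is why its pattern is longer than the one passed to Replace.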

Common expressions

To help you get a better feel for regular expressions in C#, here are some expressions that might be useful to you. I have used them in other environments, and I hope they help.

Roman numerals

string p1 = "^M*(D?C{0,3}|C[DM])" + "(L?X{0,3}|X[LC])(V?I{0,3}|I[VX])$";
string t1 = "VII";
Match m1 = Regex.Match(t1, p1);

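Swap the first two words

string t2 = "the quick brown fox";
// \S+ is a run of non-whitespace, \s+ is the whitespace between the words
string p2 = @"(\S+)(\s+)(\S+)";
Regex x2 = new Regex(p2);
// The final argument limits the replacement to the first match
string r2 = x2.Replace(t2, "$3$2$1", 1);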

Keyword = value

string t3 = "myval = 3";
string p3 = @"(\w+)\s*=\s*(.*)\s*$";
Match m3 = Regex.Match(t3, p3);
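
Once the match succeeds, the keyword and the value are available as groups 1 and 2. The sketch below shows one way to read them; the class name KeyValueDemo and the method TryParse are hypothetical, and the value group is made lazy here (.*? instead of .*) so that trailing whitespace is not included in the value, which is a small change from the pattern above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeyValueDemo
{
    // Hypothetical helper: extracts the keyword and the value from a
    // "keyword = value" line; returns false if the line does not match.
    public static bool TryParse(string line, out string key, out string value)
    {
        Match m = Regex.Match(line, @"(\w+)\s*=\s*(.*?)\s*$");
        key = m.Success ? m.Groups[1].Value : "";
        value = m.Success ? m.Groups[2].Value : "";
        return m.Success;
    }

    public static void Main()
    {
        string key, value;
        if (TryParse("myval = 3  ", out key, out value))
            Console.WriteLine("key=[" + key + "] value=[" + value + "]");
    }
}
```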

A line of at least 80 characters

string t4 = "********************"
    + "******************************"
    + "******************************";
string p4 = ".{80,}";
Match m4 = Regex.Match(t4, p4);

Month/day/year hour:minute:second time format

string t5 = "01/01/01 16:10:01";
string p5 = @"(\d+)/(\d+)/(\d+) (\d+):(\d+):(\d+)";
Match m5 = Regex.Match(t5, p5);

Change directory (for Windows platforms only)

string t6 = @"C:\Documents and Settings\user1\Desktop\";
// Backslashes must be escaped in the pattern, but they are literal in the replacement string
string r6 = Regex.Replace(t6, @"\\user1\\", @"\user2\");

Expand hexadecimal escapes

string t7 = "%41"; // Capital A
string p7 = "%([0-9A-Fa-f][0-9A-Fa-f])";
// HexConvert is a MatchEvaluator method that you supply
string r7 = Regex.Replace(t7, p7, HexConvert);
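
The article does not show HexConvert, so the following is only a guess at what such an evaluator might look like: a minimal sketch that parses the two hex digits in group 1 and returns the corresponding character. The class name HexEscapeDemo and the method Expand are made up for this example.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class HexEscapeDemo
{
    // Assumed implementation of the evaluator: converts the two hex
    // digits captured in group 1 into the character with that code.
    public static string HexConvert(Match m)
    {
        int code = int.Parse(m.Groups[1].Value,
            NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        return ((char)code).ToString();
    }

    // Replaces every %XX escape in the input with its character.
    public static string Expand(string input)
    {
        return Regex.Replace(input, "%([0-9A-Fa-f][0-9A-Fa-f])", HexConvert);
    }

    public static void Main()
    {
        Console.WriteLine(Expand("%41%42C"));
    }
}
```

This treats each escape as a single character code, so it does not decode multi-byte sequences such as UTF-8 encoded URLs.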

Delete comments in the C language (imperfect)

string t8 = @"
/*
 * Traditional style comment
 */
";
string p8 = @"
    /\*   # match the comment start delimiter
    .*?   # match the comment text, as little as possible
    \*/   # match the comment end delimiter
";
// x: ignore whitespace and comments in the pattern; s: let . match newlines
string r8 = Regex.Replace(t8, p8, "",
    RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

   
Delete spaces at the beginning and end of a string

string t9a = "     leading";
string p9a = @"^\s+";
string r9a = Regex.Replace(t9a, p9a, "");

string t9b = "trailing     ";
string p9b = @"\s+$";
string r9b = Regex.Replace(t9b, p9b, "");

Replace a \ followed by the character n with a real newline

string t10 = @"\ntest\n";
string r10 = Regex.Replace(t10, @"\\n", "\n");

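Match an IP address

string t11 = "55.54.53.52";
// Each octet is 0-199 ([01]?\d\d?), 200-249 (2[0-4]\d) or 250-255 (25[0-5]);
// the trailing ? on \d\d? lets single-digit octets match as well
string p11 = "^" +
    @"([01]?\d\d?|2[0-4]\d|25[0-5])\." +
    @"([01]?\d\d?|2[0-4]\d|25[0-5])\." +
    @"([01]?\d\d?|2[0-4]\d|25[0-5])\." +
    @"([01]?\d\d?|2[0-4]\d|25[0-5])" +
    "$";
Match m11 = Regex.Match(t11, p11);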
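
Because each octet is captured in its own group, the numeric values can be read straight from the match. The sketch below shows one possible way to do that; the class name IpDemo, the Octet constant and the method ParseOctets are hypothetical, and the octet pattern assumes that single-digit octets should be accepted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IpDemo
{
    // One octet: 0-199, 200-249 or 250-255.
    const string Octet = @"([01]?\d\d?|2[0-4]\d|25[0-5])";

    static readonly Regex Ip = new Regex(
        "^" + Octet + @"\." + Octet + @"\." + Octet + @"\." + Octet + "$");

    // Hypothetical helper: returns the four octets as integers,
    // or null if the input is not a dotted-quad address.
    public static int[] ParseOctets(string s)
    {
        Match m = Ip.Match(s);
        if (!m.Success)
            return null;
        var octets = new int[4];
        for (int i = 0; i < 4; i++)
            octets[i] = int.Parse(m.Groups[i + 1].Value);
        return octets;
    }

    public static void Main()
    {
        int[] parts = ParseOctets("55.54.53.52");
        Console.WriteLine(parts == null ? "invalid" : string.Join(".", parts));
    }
}
```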

Remove the path from a file name

string t12 = @"C:\file.txt";
string p12 = @"^.*\\";
string r12 = Regex.Replace(t12, p12, "");

Join the lines of a multi-line string

string t13 = @"this is
a split line";
string p13 = @"\s*\r?\n\s*";
// Replace each line break, and the whitespace around it, with a single space
string r13 = Regex.Replace(t13, p13, " ");

Extract all the numbers in a string

string t14 = @"
test 1
test 2.3
test 47
";
string p14 = @"(\d+\.?\d*|\.\d+)";
MatchCollection mc14 = Regex.Matches(t14, p14);

Find the words that are all capitals

string t15 = "This IS a Test OF ALL Caps";
// [^\Wa-z0-9_] is a word character that is not lowercase, a digit or an underscore
string p15 = @"(\b[^\Wa-z0-9_]+\b)";
MatchCollection mc15 = Regex.Matches(t15, p15);

Find the lowercase words

string t16 = "This is A Test of lowercase";
string p16 = @"(\b[^\WA-Z0-9_]+\b)";
MatchCollection mc16 = Regex.Matches(t16, p16);

Find the words with an initial capital letter

string t17 = "This is A Test of Initial Caps";
string p17 = @"(\b[^\Wa-z0-9_][^\WA-Z0-9_]*\b)";
MatchCollection mc17 = Regex.Matches(t17, p17);

Find the links in simple HTML

string t18 = @"
<a href=""first.htm"">first tag text</a>
<a href=""next.htm"">next tag text</a>
";
string p18 = @"<a[^>]*?href\s*=\s*[""']?" + @"([^'"" >]+?)[ '""]?>";
// s: let . match newlines; i: ignore case
MatchCollection mc18 = Regex.Matches(t18, p18,
    RegexOptions.Singleline | RegexOptions.IgnoreCase);

Reprinted from: http://www.aspnetjia.com
