Common Regular Expression Operations in C#

Source: Internet
Author: User
Regular expressions are one of those topics that many programmers learn and then forget. In this article, we assume that you already know how to use regular expressions, especially the expressions in Perl 5. The .NET Regex class supports a superset of Perl 5 expressions, so in theory Perl 5 is a good starting point. We also assume that you know basic C# syntax and the .NET architecture.

If you have no knowledge of regular expressions, I suggest you start with the Perl 5 syntax. The authoritative book on regular expressions is Mastering Regular Expressions by Jeffrey Friedl. We strongly recommend it to readers who want to understand regular expressions.

The RegularExpressions assembly

The Regex class is included in the System.Text.RegularExpressions.dll file. You must reference this file when compiling your application. For example:

csc /r:System.Text.RegularExpressions.dll foo.cs

This command creates the foo.exe file, which references the System.Text.RegularExpressions assembly.

Namespace Introduction

The namespace contains only six classes and one delegate definition. They are:

Capture: contains the result of a single match;
CaptureCollection: a sequence of Capture objects;
Group: the result of a group capture, derived from Capture;
Match: the result of matching an expression, derived from Group;
MatchCollection: a sequence of Match objects;
MatchEvaluator: the delegate used to perform replacement operations;
Regex: an instance of a compiled expression.

The Regex class also contains some static methods:

Escape: escapes the regular expression metacharacters in a string;
IsMatch: returns a Boolean value that indicates whether the expression matches a string;
Match: returns a Match instance;
Matches: returns a collection of Match objects;
Replace: replaces the text matched by the expression with a replacement string;
Split: returns an array of strings delimited by the expression;
Unescape: converts the escaped characters in a string back to their literal form.
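The static methods listed above can be tried out in a few lines. The following is a minimal sketch, not part of the original article; the class name StaticMethodsDemo, its method names, and the sample inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StaticMethodsDemo
{
    // Escape metacharacters so the text is matched literally.
    public static string EscapeDemo() => Regex.Escape("1+1=2");

    // True when the pattern matches anywhere in the input.
    public static bool IsMatchDemo() => Regex.IsMatch("abracadabra", "cad");

    // Split the input wherever the pattern matches.
    public static string[] SplitDemo() => Regex.Split("a, b,c", @"\s*,\s*");

    // Turn escape sequences back into the characters they stand for.
    public static string UnescapeDemo() => Regex.Unescape(@"1\+1=2");

    public static void Main()
    {
        Console.WriteLine(EscapeDemo());                  // 1\+1=2
        Console.WriteLine(IsMatchDemo());                 // True
        Console.WriteLine(string.Join("|", SplitDemo())); // a|b|c
        Console.WriteLine(UnescapeDemo());                // 1+1=2
    }
}
```

Escape is mainly useful when part of a pattern comes from user input and must not be interpreted as regular expression syntax.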

Simple Matching

First, let's start with a simple expression that uses the Regex and Match classes.

Match m = Regex.Match("abracadabra", "(a|b|r)+");

We now have an instance of the Match class that can be tested, for example: if (m.Success) ...
If you want to use the matched text, you can convert the match to a string:

Console.WriteLine("Match=" + m.ToString());

This example produces the output Match=abra, which is the matched string.

String replacement

Simple string replacement is very intuitive. For example, the following statement:

string s = Regex.Replace("abracadabra", "abra", "zzzz");

returns the string zzzzcadzzzz, in which every matched string has been replaced with zzzz.

Now let's look at a more complicated string replacement:

string s = Regex.Replace("  abra  ", @"^\s*(.*?)\s*$", "$1");

This statement returns the string abra, with the leading and trailing spaces removed.

The pattern above is useful for removing leading and trailing whitespace from any string. In C# we also often use verbatim strings. In a verbatim string, the compiler does not treat the character "\" as an escape character, so the @"..." form is very handy when a pattern uses "\" to write escape sequences. The $1 in the replacement string is also worth mentioning: it stands for the text captured by the first group, so here the replacement consists only of the captured text.
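To make the point about verbatim strings and $1 concrete, here is a small sketch. The class TrimDemo and its members are illustrative names that do not appear in the original article.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrimDemo
{
    // The same pattern written as a verbatim string and as a regular string.
    public const string VerbatimPattern = @"^\s*(.*?)\s*$";
    public const string RegularPattern = "^\\s*(.*?)\\s*$";

    // $1 inserts the text captured by group 1,
    // that is, the input without its surrounding whitespace.
    public static string Trim(string input) =>
        Regex.Replace(input, VerbatimPattern, "$1");

    public static void Main()
    {
        // Both literals produce the same pattern text.
        Console.WriteLine(VerbatimPattern == RegularPattern); // True
        Console.WriteLine("[" + Trim("   abra   ") + "]");    // [abra]
    }
}
```

The verbatim form needs half as many backslashes, which is why it is the usual choice for patterns.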

Matching engine details

Now let's use groups to work through a slightly more complex example:

string text = "abracadabra1abracadabra2abracadabra3";

string pat = @"
(        # start of the first group
  abra   # match the string abra
  (      # start of the second group
    cad  # match the string cad
  )?     # end of the second group (optional)
)        # end of the first group
+        # match one or more times
";

// Ignore pattern whitespace and comments (the x modifier)
Regex r = new Regex(pat, RegexOptions.IgnorePatternWhitespace);

// Get the list of group numbers
int[] gnums = r.GetGroupNumbers();

// First match
Match m = r.Match(text);

while (m.Success)
{
    // Start with group 1
    for (int i = 1; i < gnums.Length; i++)
    {
        Group g = m.Groups[gnums[i]];

        // Print the value of the group
        Console.WriteLine("Group" + gnums[i] + "=[" + g.ToString() + "]");

        // Print the position and length of each capture in the group
        CaptureCollection cc = g.Captures;
        for (int j = 0; j < cc.Count; j++)
        {
            Capture c = cc[j];
            Console.WriteLine("Capture" + j + "=[" + c.ToString()
                + "] Index=" + c.Index + " Length=" + c.Length);
        }
    }

    // Next match
    m = m.NextMatch();
}

The output of this example is as follows:

Group1=[abra]
Capture0=[abracad] Index=0 Length=7
Capture1=[abra] Index=7 Length=4
Group2=[cad]
Capture0=[cad] Index=4 Length=3
Group1=[abra]
Capture0=[abracad] Index=12 Length=7
Capture1=[abra] Index=19 Length=4
Group2=[cad]
Capture0=[cad] Index=16 Length=3
Group1=[abra]
Capture0=[abracad] Index=24 Length=7
Capture1=[abra] Index=31 Length=4
Group2=[cad]
Capture0=[cad] Index=28 Length=3

Let's begin by examining the string pat, which contains the expression. The first capture group starts at the first parenthesis, and the expression then matches abra. The second capture group starts at the second parenthesis, but the first capture group has not ended yet. This means that the first group can match abracad, while the second group matches only cad. Because the second group is made optional with the ? symbol, the first group may match either abra or abracad. Then the first group ends, and the + symbol requires the group to be matched one or more times.

Now let's look at what happens during matching. First, the Regex constructor is called to create an instance of the expression and to specify its options. In this example the x option (RegexOptions.IgnorePatternWhitespace in current versions of .NET) is used, because the expression contains comments and some whitespace. With the x option enabled, the expression ignores comments and unescaped whitespace.

Then the list of group numbers defined in the expression is retrieved. You could write these numbers explicitly, but here they are obtained programmatically. If named groups are used, this approach is also an effective way to build a fast index.
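The remark about named groups can be illustrated with a short sketch. The pattern and the group names key and value are hypothetical and chosen only for this example; GetGroupNumbers, GroupNameFromNumber and GroupNumberFromName are methods of the Regex class.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Hypothetical pattern: "key" and "value" are illustrative group names.
    static readonly Regex Pair = new Regex(@"(?<key>\w+)\s*=\s*(?<value>\w+)");

    // Look up the number that the engine assigned to the named group "key".
    public static int KeyNumber() => Pair.GroupNumberFromName("key");

    // Read a named group directly from a match.
    public static string ValueOf(string input) =>
        Pair.Match(input).Groups["value"].Value;

    public static void Main()
    {
        // List every group number together with its name.
        foreach (int n in Pair.GetGroupNumbers())
            Console.WriteLine(n + " -> " + Pair.GroupNameFromNumber(n));

        Console.WriteLine(KeyNumber());          // 1
        Console.WriteLine(ValueOf("myval = 3")); // 3
    }
}
```

Resolving a name to its number once and then indexing by number avoids repeating the name lookup inside a loop over many matches.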

The next step is to perform the first match. A loop tests whether the current match succeeded, and then walks through the group list starting from group 1. Group 0 is not used in this example because group 0 is the complete match. If you want the whole matched text as a single string, use group 0.

We walk through the CaptureCollection of each group. Usually each group has only one capture, but Group1 in this example has two: Capture0 and Capture1. If you only ask for the ToString value of Group1, you get only abra, although the group also matched abracad. The ToString value of a group is the value of the last capture in its CaptureCollection, which is exactly what we need here. If you want matching to end after the first abra, remove the + symbol from the expression so that the Regex engine knows the group needs to be matched only once.
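The difference between group 0, the value of a group, and its individual captures can be checked directly. This is a small sketch on a shortened input; the class name CaptureDemo and its method names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    static readonly Match M = Regex.Match("abracadabra1", "(abra(cad)?)+");

    // Group 0 is the complete match (the same text as M.Value).
    public static string WholeMatch() => M.Groups[0].Value;

    // The value of a group is the value of its last capture.
    public static string Group1Value() => M.Groups[1].Value;

    // Every repetition of the group is kept in its CaptureCollection.
    public static int Group1CaptureCount() => M.Groups[1].Captures.Count;
    public static string Group1FirstCapture() => M.Groups[1].Captures[0].Value;

    public static void Main()
    {
        Console.WriteLine(WholeMatch());         // abracadabra
        Console.WriteLine(Group1Value());        // abra
        Console.WriteLine(Group1CaptureCount()); // 2
        Console.WriteLine(Group1FirstCapture()); // abracad
    }
}
```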

Procedure-based versus expression-based approaches

Generally, users of regular expressions fall into two categories. The first kind avoids regular expressions as much as possible and uses procedural code for the operations that need to be repeated. The second kind makes full use of the regular expression engine and uses procedural code as little as possible.

For most of us, the best solution is to use both. I hope this article illustrates the role of the Regex class in .NET languages and the trade-offs it offers between performance and complexity.

Procedure-based approach

We often need a function that matches part of a string or processes a string in some other way. Below is an example that matches the words in a string:

string text = "the quick red fox jumped over the lazy brown dog.";
System.Console.WriteLine("text=[" + text + "]");

string result = "";
string pattern = @"\w+|\W+";

foreach (Match m in Regex.Matches(text, pattern))
{
    // Get the matched string
    string x = m.ToString();

    // If the first character is lowercase, convert it to uppercase
    if (char.IsLower(x[0]))
        x = char.ToUpper(x[0]) + x.Substring(1, x.Length - 1);

    // Collect all the pieces
    result += x;
}

System.Console.WriteLine("result=[" + result + "]");

As the example shows, we use the C# foreach statement to process each match and do the corresponding work. In this example, a new result string is built. The output is as follows:

text=[the quick red fox jumped over the lazy brown dog.]
result=[The Quick Red Fox Jumped Over The Lazy Brown Dog.]

Expression-based approach

Another way to implement the functionality of the example above is to use a MatchEvaluator. The new code is as follows:

static string CapText(Match m)
{
    // Get the matched string
    string x = m.ToString();

    // If the first character is lowercase, convert it to uppercase
    if (char.IsLower(x[0]))
        return char.ToUpper(x[0]) + x.Substring(1, x.Length - 1);

    return x;
}

static void Main()
{
    string text = "the quick red fox jumped over the lazy brown dog.";
    System.Console.WriteLine("text=[" + text + "]");

    string pattern = @"\w+";

    // Test is the class that contains CapText
    string result = Regex.Replace(text, pattern,
        new MatchEvaluator(Test.CapText));

    System.Console.WriteLine("result=[" + result + "]");
}

Note also that this pattern is very simple, because only the words need to be modified and the non-word characters are left untouched.

Common expressions

To help you better understand how to use regular expressions in C#, I have written down some regular expressions that may be useful to you. These expressions have been used in other environments, and I hope they help.

Roman numerals

string p1 = "^m*(d?c{0,3}|c[dm])" + "(l?x{0,3}|x[lc])(v?i{0,3}|i[vx])$";
string t1 = "vii";
Match m1 = Regex.Match(t1, p1);

Swap the first two words

string t2 = "the quick brown fox";
string p2 = @"(\S+)(\s+)(\S+)";
Regex x2 = new Regex(p2);
// Replace only the first match; $3$2$1 puts the second word before the first
string r2 = x2.Replace(t2, "$3$2$1", 1);

Keyword = value

string t3 = "myval = 3";
string p3 = @"(\w+)\s*=\s*(.*)\s*$";
Match m3 = Regex.Match(t3, p3);

Match lines of 80 or more characters

string t4 = "********************"
    + "******************************"
    + "******************************";
string p4 = ".{80,}";
Match m4 = Regex.Match(t4, p4);

Month/day/year hour:minute:second time format

string t5 = "01/01/01 16:10:01";
string p5 = @"(\d+)/(\d+)/(\d+) (\d+):(\d+):(\d+)";
Match m5 = Regex.Match(t5, p5);

Change directory (applies only to Windows)

string t6 = @"C:\Documents and Settings\user1\Desktop\";
// The pattern escapes each backslash; the replacement string needs no escaping
string r6 = Regex.Replace(t6, @"\\user1\\", @"\user2\");

Expand hexadecimal escape characters

string t7 = "%41"; // capital A
string p7 = "%([0-9A-Fa-f][0-9A-Fa-f])";
// HexConvert is a MatchEvaluator that converts the two hex digits to a character
string r7 = Regex.Replace(t7, p7, HexConvert);
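The article does not show the evaluator used in the hexadecimal example. The following sketch shows one way such an evaluator could be written; the HexConvert body, the HexDemo class and the Expand helper are assumptions made for illustration, not the author's code.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class HexDemo
{
    // Hypothetical evaluator: parse the two hex digits captured by group 1
    // and return the character that has that code.
    public static string HexConvert(Match m)
    {
        int code = int.Parse(m.Groups[1].Value,
            NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        return ((char)code).ToString();
    }

    // Replace every %XX sequence in the input with the decoded character.
    public static string Expand(string input) =>
        Regex.Replace(input, "%([0-9A-Fa-f][0-9A-Fa-f])",
            new MatchEvaluator(HexConvert));

    public static void Main()
    {
        Console.WriteLine(Expand("%41"));     // A
        Console.WriteLine(Expand("%41%42c")); // ABc
    }
}
```

This sketch treats each %XX sequence as a single character code, so it does not decode multi-byte sequences such as percent-encoded UTF-8.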

Delete C-style comments (to be improved)

string t8 = @"
/*
 * traditional-style comment
 */
";
string p8 = @"
/\*   # match the opening delimiter of the comment
.*?   # match the comment text
\*/   # match the closing delimiter of the comment
";
// x: ignore pattern whitespace and comments; s: let . match newlines
string r8 = Regex.Replace(t8, p8, "",
    RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

Delete spaces at the start and end of a string

string t9a = "     leading";
string p9a = @"^\s+";
string r9a = Regex.Replace(t9a, p9a, "");

string t9b = "trailing     ";
string p9b = @"\s+$";
string r9b = Regex.Replace(t9b, p9b, "");

Turn a backslash followed by the character n into a real newline

string t10 = @"\ntest\n";
// The pattern matches a literal backslash followed by n; the replacement is a real newline
string r10 = Regex.Replace(t10, @"\\n", "\n");

Match an IP address

string t11 = "55.54.53.52";
// \d\d? is used so that one-digit octets such as 8 also match
string p11 = "^" +
    @"([01]?\d\d?|2[0-4]\d|25[0-5])\." +
    @"([01]?\d\d?|2[0-4]\d|25[0-5])\." +
    @"([01]?\d\d?|2[0-4]\d|25[0-5])\." +
    @"([01]?\d\d?|2[0-4]\d|25[0-5])" +
    "$";
Match m11 = Regex.Match(t11, p11);

Delete the path from a file name

string t12 = @"c:\file.txt";
string p12 = @"^.*\\";
string r12 = Regex.Replace(t12, p12, "");

Join the lines of a multi-line string

string t13 = @"this is
a split line";
string p13 = @"\s*\r?\n\s*";
// Replace each line break and the whitespace around it with a single space
string r13 = Regex.Replace(t13, p13, " ");

Extract all numbers in a string

string t14 = @"
test 1
test 2.3
test 47
";
string p14 = @"(\d+\.?\d*|\.\d+)";
MatchCollection mc14 = Regex.Matches(t14, p14);

Find all-uppercase words

string t15 = "This IS a Test OF ALL Caps";
string p15 = @"(\b[^\Wa-z0-9_]+\b)";
MatchCollection mc15 = Regex.Matches(t15, p15);

Find lowercase words

string t16 = "This is A Test of lowercase";
string p16 = @"(\b[^\WA-Z0-9_]+\b)";
MatchCollection mc16 = Regex.Matches(t16, p16);

Find words whose first letter is uppercase

string t17 = "This is A Test of Initial Caps";
string p17 = @"(\b[^\Wa-z0-9_][^\WA-Z0-9_]*\b)";
MatchCollection mc17 = Regex.Matches(t17, p17);

Find links in simple HTML

string t18 = @"
<a href=""first.htm"">first tag text</a>
<a href=""next.htm"">next tag text</a>
";
string p18 = @"<A[^>]*?HREF\s*=\s*[""']?" + @"([^'"" >]+?)[ '""]?>";
// s: let . match newlines; i: ignore case
MatchCollection mc18 = Regex.Matches(t18, p18,
    RegexOptions.Singleline | RegexOptions.IgnoreCase);
