Understanding C# regular expressions

Over the years, many programming languages and tools have supported regular expressions. The .NET base class library contains a namespace and a set of classes that expose the full power of regular expressions, and they are also compatible with Perl 5 regular expressions.


In addition, the Regex class offers some other features, such as right-to-left matching and compiled expressions.
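Neither feature is shown in code in this article. The following minimal sketch, assuming .NET 6 or later so it can be written as a top-level program, passes RegexOptions.RightToLeft and RegexOptions.Compiled to the standard Regex API; the input string is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "abc123def456";

// RightToLeft scans from the end of the input, so the last run of digits is found first.
string last = Regex.Match(input, @"\d+", RegexOptions.RightToLeft).Value;
Console.WriteLine(last); // 456

// Compiled asks the runtime to turn the pattern into IL instead of interpreting it;
// it pays off when one Regex instance is constructed once and reused many times.
var digits = new Regex(@"\d+", RegexOptions.Compiled);
Console.WriteLine(digits.Match(input).Value); // 123
```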

In this article, I briefly introduce the classes and methods in System.Text.RegularExpressions, give examples of string matching and replacement, and describe group structures in detail. Finally, I list some common expressions that you may find useful.


Prerequisites

Regular expressions may be one of those things many programmers learn and then forget. In this article we assume that you already know how to use regular expressions, especially Perl 5 expressions. The .NET Regex class supports a superset of Perl 5 expressions, so in theory Perl 5 is a good starting point. We also assume that you have a basic knowledge of C# syntax and the .NET Framework.


If you know nothing about regular expressions, I suggest starting with the Perl 5 syntax. The authoritative book on regular expressions is Mastering Regular Expressions by Jeffrey Friedl, and we strongly recommend it to readers who want a deep understanding of regular expressions.


The RegularExpressions assembly

The Regex classes are contained in the System.Text.RegularExpressions.dll assembly, which you must reference when compiling your application. For example:

csc /r:System.Text.RegularExpressions.dll foo.cs

This command creates foo.exe, which references the System.Text.RegularExpressions assembly. In released versions of the .NET Framework the classes live in System.dll, which csc references by default, so the /r option is usually unnecessary there.

Namespace overview

The namespace is built around six classes and one delegate. They are:

Capture: the result of a single capture;
CaptureCollection: a sequence of Capture objects;
Group: the result of a single group, which inherits from Capture;
Match: the result of a single expression match, which inherits from Group;
MatchCollection: a sequence of Match objects;
MatchEvaluator: a delegate used during replacement operations;
Regex: an instance of a compiled expression.

The Regex class also provides several static methods:

Escape: escapes regex metacharacters in a string so that they match literally;
IsMatch: returns a Boolean indicating whether the expression matches the input string;
Match: returns a Match instance;
Matches: returns a MatchCollection containing all the matches;
Replace: replaces matches of the expression with a replacement string;
Split: returns an array of strings separated at the matches of the expression;
Unescape: unescapes escaped characters in a string.
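Match, Matches and Replace appear in the examples that follow. For the other static methods, here is a small sketch (a .NET 6+ top-level program; the sample strings are arbitrary) showing what each returns:

```csharp
using System;
using System.Text.RegularExpressions;

// Escape: metacharacters such as + and . become literal escapes.
string escaped = Regex.Escape("1+1=2.");
Console.WriteLine(escaped);                 // 1\+1=2\.

// IsMatch: a quick yes/no test.
bool found = Regex.IsMatch("abracadabra", "cad");
Console.WriteLine(found);                   // True

// Split: break the input wherever the pattern matches.
string[] parts = Regex.Split("one, two,three", @",\s*");
Console.WriteLine(string.Join("|", parts)); // one|two|three

// Unescape: undoes the escaping for this input.
Console.WriteLine(Regex.Unescape(escaped)); // 1+1=2.
```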

Simple Matching

Let's start with simple expressions using the Regex and Match classes.

Match m = Regex.Match("abracadabra", "(a|b|r)+");

We now have a Match instance that can be tested, for example with if (m.Success) ...
To use the matched string, convert it to a string:

Console.WriteLine("Match=" + m.ToString());

This example produces the output Match=abra, which is the matched string.
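The Success test combines naturally with Match.NextMatch to visit every match, not just the first. A minimal sketch, assuming a .NET 6+ top-level program:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var found = new List<string>();
Match m = Regex.Match("abracadabra", "(a|b|r)+");
while (m.Success)
{
    // Value is the matched text, the same string ToString() returns.
    found.Add(m.Value + "@" + m.Index);
    // NextMatch continues scanning where the previous match ended.
    m = m.NextMatch();
}
Console.WriteLine(string.Join(", ", found)); // abra@0, a@5, abra@7
```

The middle match is the single a at index 5: the engine skips c, matches a, and stops at d.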

String replacement

Simple string replacement is very intuitive. For example, the statement

string s = Regex.Replace("abracadabra", "abra", "zzzz");

returns the string zzzzcadzzzz: every occurrence of abra is replaced with zzzz.

Now let's look at a slightly more complex replacement:

string s = Regex.Replace("  abra  ", @"^\s*(.*?)\s*$", "$1");

This statement returns the string abra, with the leading and trailing spaces removed.

The preceding pattern is generally useful for removing leading and trailing spaces from any string. It is written as a C# verbatim string literal: in a verbatim string (@"..."), the compiler does not treat the backslash (\) as an escape character, which makes @"..." very convenient for patterns that use \ to introduce regex escapes. Also worth noting is the $1 in the replacement string, which means the replacement consists only of the text captured by group 1.
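To make both points concrete, the sketch below (a .NET 6+ top-level program; the "Smith, John" input is invented for illustration) shows that a verbatim literal and a regular literal can spell the same pattern, and that numbered groups can be referenced in the replacement:

```csharp
using System;
using System.Text.RegularExpressions;

// A verbatim literal and a regular literal can spell the same pattern.
string verbatim = @"^\s*(.*?)\s*$";
string regular = "^\\s*(.*?)\\s*$";
Console.WriteLine(verbatim == regular); // True

// $1 in the replacement stands for the text captured by group 1.
string trimmed = Regex.Replace("  abra  ", verbatim, "$1");
Console.WriteLine("[" + trimmed + "]"); // [abra]

// Several groups can be reordered the same way.
string swapped = Regex.Replace("Smith, John", @"(\w+),\s*(\w+)", "$2 $1");
Console.WriteLine(swapped); // John Smith
```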


Matching engine details

Now let's use group structures to work through a slightly more complex example:

using System;
using System.Text.RegularExpressions;

string text = "abracadabra1abracadabra2abracadabra3";
string pat = @"
    (       # start of the first group
      abra  # match the literal abra
      (     # start of the second group
      cad   # match the literal cad
      )?    # end of the second group (optional)
    )       # end of the first group
    +       # match one or more times
    ";
// IgnorePatternWhitespace (the x option) makes the engine skip the comments and whitespace
Regex r = new Regex(pat, RegexOptions.IgnorePatternWhitespace);
// get the list of group numbers
int[] gnums = r.GetGroupNumbers();
// first match
Match m = r.Match(text);
while (m.Success)
{
    // start at group 1
    for (int i = 1; i < gnums.Length; i++)
    {
        // get the group for this match
        Group g = m.Groups[gnums[i]];
        Console.WriteLine("Group" + gnums[i] + "=[" + g.ToString() + "]");
        // print the start position and length of each capture in the group
        CaptureCollection cc = g.Captures;
        for (int j = 0; j < cc.Count; j++)
        {
            Capture c = cc[j];
            Console.WriteLine("    Capture" + j + "=[" + c.ToString()
                + "] Index=" + c.Index + " Length=" + c.Length);
        }
    }
    // next match
    m = m.NextMatch();
}

The output of this example is:

Group1=[abra]
    Capture0=[abracad] Index=0 Length=7
    Capture1=[abra] Index=7 Length=4
Group2=[cad]
    Capture0=[cad] Index=4 Length=3
Group1=[abra]
    Capture0=[abracad] Index=12 Length=7
    Capture1=[abra] Index=19 Length=4
Group2=[cad]
    Capture0=[cad] Index=16 Length=3
Group1=[abra]
    Capture0=[abracad] Index=24 Length=7
    Capture1=[abra] Index=31 Length=4
Group2=[cad]
    Capture0=[cad] Index=28 Length=3

Let's start by examining the string pat, which contains the expression. The first group starts at the first parenthesis, and the expression then matches the literal abra. The second group starts at the second parenthesis while the first group is still open. This means the first group can match abracad while the second group matches only cad. Because the ? makes cad optional, the first group matches either abra or abracad. The first group then ends, and the + requires it to match one or more times.


Now let's look at what happens during matching. First, the Regex constructor creates an instance of the expression with the specified options. Because this expression contains comments and whitespace, the x option (RegexOptions.IgnorePatternWhitespace) is selected; with it, the engine ignores comments and unescaped whitespace in the pattern.
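The sketch below, assuming a .NET 6+ top-level program and a shortened version of the example pattern, checks that the compact form, the laid-out form with RegexOptions.IgnorePatternWhitespace, and the laid-out form prefixed with the inline (?x) modifier all find the same match:

```csharp
using System;
using System.Text.RegularExpressions;

string text = "abracadabra1";
string compact = "(abra(cad)?)+";
string spaced = @"
    (abra      # the literal abra
      (cad)?   # an optional cad
    )+         # one or more times";

string a = Regex.Match(text, compact).Value;
string b = Regex.Match(text, spaced, RegexOptions.IgnorePatternWhitespace).Value;
// The inline (?x) modifier switches the same option on from inside the pattern.
string c = Regex.Match(text, "(?x)" + spaced).Value;
Console.WriteLine(a + " " + b + " " + c); // abracadabra abracadabra abracadabra

// Whitespace that is escaped is still matched literally in x mode.
Console.WriteLine(Regex.IsMatch("a b", @"a\ b", RegexOptions.IgnorePatternWhitespace)); // True
```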


Next, we retrieve the list of group numbers defined in the expression. You could of course use these numbers explicitly; here they are obtained programmatically. When named groups are used, this approach is also an effective way to build a quick index of the groups.
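As an illustration of such an index, here is a sketch with a hypothetical date pattern that uses named groups (.NET 6+ top-level program). GetGroupNumbers, GroupNameFromNumber and GroupNumberFromName are standard Regex members:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical pattern with two named groups.
var r = new Regex(@"(?<month>\d+)/(?<day>\d+)");

// Map every group number to its name.
foreach (int n in r.GetGroupNumbers())
    Console.WriteLine(n + " -> " + r.GroupNameFromNumber(n));
// 0 -> 0
// 1 -> month
// 2 -> day

Match m = r.Match("12/31");
// Read a group by name, or by the number the engine assigned to that name.
Console.WriteLine(m.Groups["day"].Value);                         // 31
Console.WriteLine(m.Groups[r.GroupNumberFromName("day")].Value); // 31
```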


The next step performs the first match. A loop tests whether the current match succeeded and, for each match, iterates over the group list starting at group 1. Group 0 is not used in this example because it is the entire matched string; if you wanted the complete match as a single string, you would use group 0.


For each group we walk its CaptureCollection. Normally a group has only one capture, but Group1 in this example has two: Capture0 and Capture1. If you asked only for Group1's ToString, you would get just abra, even though the group also matched abracad. A group's ToString value is the value of the last capture in its CaptureCollection, which is exactly what we need here. If you want the first group to match only once per match, remove the + from the expression so the Regex engine matches the group a single time.
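The difference between a group's value and its captures can be checked directly. A minimal sketch, assuming a .NET 6+ top-level program and the shorter input abracadabra1:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

Match m = Regex.Match("abracadabra1", "(abra(cad)?)+");
Group g1 = m.Groups[1];

// Value (like ToString) reports only the last capture of a repeated group...
Console.WriteLine(g1.Value); // abra

// ...while Captures keeps every iteration, in the order they occurred.
string all = string.Join(",", g1.Captures.Cast<Capture>().Select(c => c.Value));
Console.WriteLine(all); // abracad,abra

// Without the +, the group matches once, so there is a single capture.
Match once = Regex.Match("abracadabra1", "(abra(cad)?)");
Console.WriteLine(once.Groups[1].Captures.Count + " " + once.Groups[1].Value); // 1 abracad
```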


Procedural versus expression-based approaches

Users of regular expressions generally fall into two camps. Users in the first camp avoid regular expressions where they can and write procedural code for repetitive operations; users in the second camp make full use of the power of the regular expression engine and write as little procedural code as possible.


For most of us, the best solution is a mix of both. I hope this section illustrates the role of the Regex classes in .NET languages and the trade-offs they offer between performance and complexity.

The procedural approach

We often need to run some code on each part of a string that matches a pattern. The example below matches the words in a string and capitalizes each one:

using System.Text.RegularExpressions;

string text = "the quick red fox jumped over the lazy brown dog.";
System.Console.WriteLine("text=[" + text + "]");
string result = "";
string pattern = @"\w+|\W+";
foreach (Match m in Regex.Matches(text, pattern))
{
    // get the matched string
    string x = m.ToString();
    // if the first character is lowercase
    if (char.IsLower(x[0]))
        // capitalize it
        x = char.ToUpper(x[0]) + x.Substring(1, x.Length - 1);
    // collect all the text
    result += x;
}
System.Console.WriteLine("result=[" + result + "]");

As the preceding example shows, we use the C# foreach statement to process each match in turn and build up a new result string. The output of this example is:

text=[the quick red fox jumped over the lazy brown dog.]
result=[The Quick Red Fox Jumped Over The Lazy Brown Dog.]

The expression-based approach

Another way to accomplish the same task is with a MatchEvaluator. The new code is as follows:

using System.Text.RegularExpressions;

class Test
{
    static string CapText(Match m)
    {
        // get the matched string
        string x = m.ToString();
        // if the first character is lowercase
        if (char.IsLower(x[0]))
            // capitalize it
            return char.ToUpper(x[0]) + x.Substring(1, x.Length - 1);
        return x;
    }

    static void Main()
    {
        string text = "the quick red fox jumped over the lazy brown dog.";
        System.Console.WriteLine("text=[" + text + "]");
        string pattern = @"\w+";
        string result = Regex.Replace(text, pattern,
            new MatchEvaluator(Test.CapText));
        System.Console.WriteLine("result=[" + result + "]");
    }
}

Note also that this pattern is simpler, because only the words need to be modified and the non-words can be left alone.
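Because MatchEvaluator is a delegate type, newer C# versions also accept a lambda in its place. A sketch of the same capitalization written that way, assuming C# 3.0 or later for the lambda and .NET 6+ for the top-level program:

```csharp
using System;
using System.Text.RegularExpressions;

string text = "the quick red fox jumped over the lazy brown dog.";

// The lambda converts implicitly to MatchEvaluator; it capitalizes each matched word.
string result = Regex.Replace(text, @"\w+",
    m => char.ToUpper(m.Value[0]) + m.Value.Substring(1));

Console.WriteLine(result); // The Quick Red Fox Jumped Over The Lazy Brown Dog.
```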

Common expressions

To give you a better feel for using regular expressions in C#, I have collected some expressions that may be useful to you. They have been used in other environments, and I hope they help.

Roman numerals

string p1 = "^M*(D?C{0,3}|C[DM])" + "(L?X{0,3}|X[LC])(V?I{0,3}|I[VX])$";
string t1 = "VII";
Match m1 = Regex.Match(t1, p1);

Swap the first two words

string t2 = "The quick brown fox";
string p2 = @"(\S+)(\s+)(\S+)";
Regex x2 = new Regex(p2);
string r2 = x2.Replace(t2, "$3$2$1", 1);

Keyword = value

string t3 = "myval = 3";
string p3 = @"(\w+)\s*=\s*(.*)\s*$";
Match m3 = Regex.Match(t3, p3);

Find lines of at least 80 characters

string t4 = "********************"
    + "******************************"
    + "******************************";
string p4 = ".{80,}";
Match m4 = Regex.Match(t4, p4);

Month/day/year hour:minute:second time format

string t5 = "01/01/01 16:10:01";
string p5 = @"(\d+)/(\d+)/(\d+) (\d+):(\d+):(\d+)";
Match m5 = Regex.Match(t5, p5);

Change a directory (Windows only)

string t6 = @"C:\Documents and Settings\user1\Desktop\";
string r6 = Regex.Replace(t6, @"\\user1\\", @"\user2\");
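In this kind of replacement the pattern needs doubled backslashes while the replacement string does not, because in a .NET replacement string only $ substitutions are special. The sketch below (a .NET 6+ top-level program) lets Regex.Escape build the pattern from the literal text instead of escaping it by hand:

```csharp
using System;
using System.Text.RegularExpressions;

string t6 = @"C:\Documents and Settings\user1\Desktop\";

// Regex.Escape turns the literal \user1\ into the pattern \\user1\\.
string pattern = Regex.Escape(@"\user1\");

// Backslashes in the replacement are copied as-is, so they are written once.
string r6 = Regex.Replace(t6, pattern, @"\user2\");
Console.WriteLine(r6); // C:\Documents and Settings\user2\Desktop\
```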


Expand %XX hexadecimal escapes

string t7 = "%41"; // capital A
string p7 = "%([0-9A-Fa-f][0-9A-Fa-f])";
string r7 = Regex.Replace(t7, p7, HexConvert); // HexConvert: a MatchEvaluator method, not shown
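HexConvert itself is not shown in the article. One possible way to write it, purely as a hypothetical sketch in a .NET 6+ top-level program, is a static local function that parses group 1 as hexadecimal and returns the corresponding character:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical evaluator: group 1 holds the two hex digits after the percent sign.
static string HexConvert(Match m)
{
    int code = int.Parse(m.Groups[1].Value, NumberStyles.HexNumber);
    return ((char)code).ToString();
}

string t7 = "%41"; // capital A
string p7 = "%([0-9A-Fa-f][0-9A-Fa-f])";
// The method group converts to the MatchEvaluator delegate expected by Replace.
string r7 = Regex.Replace(t7, p7, HexConvert);
Console.WriteLine(r7); // A
```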

Remove C comments (to be improved)

string t8 = @"
/*
 * a traditional-style comment
 */
";
string p8 = @"
    /\*   # match the opening comment delimiter
    .*?   # match the comment text
    \*/   # match the closing comment delimiter
";
string r8 = Regex.Replace(t8, p8, "",
    RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);
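The s option (RegexOptions.Singleline) matters here because the comment spans several lines. A small sketch, assuming a .NET 6+ top-level program and a made-up code sample, compares the result with and without it:

```csharp
using System;
using System.Text.RegularExpressions;

string code = "int x; /* first\n   second */ int y;";
string pattern = @"/\*.*?\*/";

// Without Singleline, '.' stops at '\n', so the two-line comment is left alone.
string plain = Regex.Replace(code, pattern, "");

// With Singleline, '.' also matches '\n' and the whole comment is removed.
string dotAll = Regex.Replace(code, pattern, "", RegexOptions.Singleline);

Console.WriteLine(plain == code); // True
Console.WriteLine(dotAll);        // int x;  int y;
```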

Remove spaces at the start and end of a string

string t9a = "   leading";
string p9a = @"^\s+";
string r9a = Regex.Replace(t9a, p9a, "");

string t9b = "trailing   ";
string p9b = @"\s+$";
string r9b = Regex.Replace(t9b, p9b, "");

Turn a backslash followed by n into a real newline

string t10 = @"\ntest\n";
string r10 = Regex.Replace(t10, @"\\n", "\n");

Match an IP address

string t11 = "55.54.53.52";
string p11 = "^" +
    @"([01]?\d\d?|2[0-4]\d|25[0-5])\." +
    @"([01]?\d\d?|2[0-4]\d|25[0-5])\." +
    @"([01]?\d\d?|2[0-4]\d|25[0-5])\." +
    @"([01]?\d\d?|2[0-4]\d|25[0-5])" +
    "$";
Match m11 = Regex.Match(t11, p11);
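The four groups also make it easy to convert the address to a number. The sketch below is an illustration under stated assumptions rather than part of the original: ToUInt32 is a hypothetical helper, the program is a .NET 6+ top-level program, and the octet alternative [01]?\d\d? is used so that one-, two- and three-digit octets are all accepted:

```csharp
using System;
using System.Text.RegularExpressions;

// One octet, 0-255 (leading zeros tolerated), repeated four times.
const string Octet = @"([01]?\d\d?|2[0-4]\d|25[0-5])";
var ip = new Regex("^" + Octet + @"\." + Octet + @"\." + Octet + @"\." + Octet + "$");

// Hypothetical helper: the address as a 32-bit number, or null if it is not a dotted quad.
uint? ToUInt32(string text)
{
    Match m = ip.Match(text);
    if (!m.Success) return null;
    uint value = 0;
    for (int i = 1; i <= 4; i++)
        value = (value << 8) | uint.Parse(m.Groups[i].Value);
    return value;
}

Console.WriteLine(ToUInt32("55.54.53.52"));       // 926299444
Console.WriteLine(ToUInt32("1.2.3.4"));           // 16909060
Console.WriteLine(ToUInt32("256.1.1.1") == null); // True
```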

Remove the path from a file name

string t12 = @"c:\file.txt";
string p12 = @"^.*\\";
string r12 = Regex.Replace(t12, p12, "");

Join a string that is split across several lines

string t13 = @"this is
    a split line";
string p13 = @"\s*\r?\n\s*";
string r13 = Regex.Replace(t13, p13, " ");

Extract all numbers from a string

string t14 = @"
    test 1
    test 1, 2.3
    test 47
";
string p14 = @"(\d+\.?\d*|\.\d+)";
MatchCollection mc14 = Regex.Matches(t14, p14);

Find all-caps words

string t15 = "this is a test of all caps";
string p15 = @"(\b[^\Wa-z0-9_]+\b)";
MatchCollection mc15 = Regex.Matches(t15, p15);

Find lowercase words

string t16 = "this is a test of lowercase";
string p16 = @"(\b[^\WA-Z0-9_]+\b)";
MatchCollection mc16 = Regex.Matches(t16, p16);

Find words whose first letter is uppercase

string t17 = "this is a test of initial caps";
string p17 = @"(\b[^\Wa-z0-9_][^\WA-Z0-9_]*\b)";
MatchCollection mc17 = Regex.Matches(t17, p17);
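The sample strings above contain no capital letters, so the first and third patterns find nothing in them. To see the three patterns side by side, the following sketch runs them over a mixed-case sentence invented for this purpose (.NET 6+ top-level program; Words is a hypothetical helper):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input made up for this sketch; it mixes all three kinds of words.
string sample = "This IS a Test OF ALL Caps and lowercase words";

// Hypothetical helper: the text of every match of the pattern in the sample.
string[] Words(string pattern) =>
    Regex.Matches(sample, pattern).Cast<Match>().Select(m => m.Value).ToArray();

string[] caps  = Words(@"(\b[^\Wa-z0-9_]+\b)");             // all-caps words
string[] lower = Words(@"(\b[^\WA-Z0-9_]+\b)");             // lowercase words
string[] init  = Words(@"(\b[^\Wa-z0-9_][^\WA-Z0-9_]*\b)"); // initial capital

Console.WriteLine(string.Join(" ", caps));  // IS OF ALL
Console.WriteLine(string.Join(" ", lower)); // a and lowercase words
Console.WriteLine(string.Join(" ", init));  // This Test Caps
```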

Find links in simple HTML

string t18 = @"
<html>
<a href=""first.htm"">first tag text</a>
<a href=""next.htm"">next tag text</a>
</html>
";
string p18 = @"<A[^>]*?HREF\s*=\s*[""']?" + @"([^'"" >]+?)[ '""]?>";
MatchCollection mc18 = Regex.Matches(t18, p18,
    RegexOptions.Singleline | RegexOptions.IgnoreCase);
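The link target is in group 1 of each match. A sketch that collects the targets, assuming a .NET 6+ top-level program and repeating t18 and p18 so the program is self-contained:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string t18 = @"
<html>
<a href=""first.htm"">first tag text</a>
<a href=""next.htm"">next tag text</a>
</html>
";
string p18 = @"<A[^>]*?HREF\s*=\s*[""']?" + @"([^'"" >]+?)[ '""]?>";

var targets = new List<string>();
// IgnoreCase lets <A ... HREF in the pattern match <a href in the text.
foreach (Match m in Regex.Matches(t18, p18, RegexOptions.Singleline | RegexOptions.IgnoreCase))
{
    // Group 1 holds the link target without the surrounding quotes.
    targets.Add(m.Groups[1].Value);
}
Console.WriteLine(string.Join(", ", targets)); // first.htm, next.htm
```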
