C# Regular Expression Learning

Source: Internet
Author: User

The power of regular expressions should not be underestimated. A few characters can often do the work of dozens of lines of code, greatly simplifying otherwise repetitive code.

I have used regular expressions a lot in JavaScript. Today I got familiar with how to use them in C#, so here are my notes!

If you treat regular expressions as a language, learning them is like learning any other language: you go from their history to the basic syntax, and from advanced features to performance optimization.

History:

The "ancestor" of regular expressions can be traced back to early research on how the human nervous system works. Two neuroscientists, Warren McCulloch and Walter Pitts, developed a mathematical way to describe these neural networks. In 1956, building on McCulloch and Pitts's earlier work, the mathematician Stephen Kleene published a paper titled "Representation of Events in Nerve Nets and Finite Automata", which introduced the concept of regular expressions. Kleene used these expressions to describe what he called the algebra of regular sets, hence the name "regular expression". Later, Ken Thompson, the principal inventor of UNIX, applied this work to his early research on computational search algorithms. The first practical application of regular expressions was Thompson's QED editor. As the saying goes, the rest is history: since then, regular expressions have been an important part of text editors and search tools.

Basic syntax:

\d (a decimal digit such as 0-9; in .NET it also matches other Unicode decimal digits unless RegexOptions.ECMAScript is set)

\D (any character except a digit)

\w (a word character: letter, digit, or underscore)

\W (any character except a word character)

\s (a whitespace character)

\S (any character except whitespace)

. (any character except a line break \n)

[abc] (any one of the characters listed in the square brackets)

[^abc] (any character except those listed in the square brackets)

\b (a word boundary)

\B (a position that is not a word boundary)

^ (the start of the string)

$ (the end of the string)

{n} (exactly n occurrences of the preceding element)

{n,m} (n to m occurrences of the preceding element)

{n,} (n or more occurrences of the preceding element)

? (zero or one occurrence)

+ (one or more occurrences)

* (zero or more occurrences)

(a|b) (matches either a or b)
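To see several of these tokens side by side, here is a minimal sketch (C# 9+ top-level statements; the helper `CountMatches` and the sample string are ours, for illustration only) that counts how many matches each pattern finds in one input:

```csharp
using System;
using System.Text.RegularExpressions;

// Count how many non-overlapping matches a pattern finds in the input.
static int CountMatches(string input, string pattern) =>
    Regex.Matches(input, pattern).Count;

string sample = "cat catalog 2024 a_b";

Console.WriteLine(CountMatches(sample, @"\d"));        // 4: each digit of 2024
Console.WriteLine(CountMatches(sample, @"\bcat\b"));   // 1: only the standalone word "cat"
Console.WriteLine(CountMatches(sample, @"\Bat"));      // 2: "at" inside "cat" and "catalog"
Console.WriteLine(CountMatches(sample, @"\w+"));       // 4: cat, catalog, 2024, a_b
Console.WriteLine(CountMatches(sample, @"\d{2,3}"));   // 1: greedy {2,3} takes "202", leaving "4"
Console.WriteLine(CountMatches(sample, @"(cat|dog)")); // 2: "cat" and the "cat" in "catalog"
```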

 

Here are some basic examples to help you get familiar with the syntax above.

1. Match three digits, such as 134

\d{3}

2. Match a word that starts with a letter, continues with one or more digits, and ends with a letter, such as a123b

^[A-Za-z]\d+[A-Za-z]$

3. Match a landline number such as 021-81234563 or 0512-81755456

^\d{3,4}-\d{8}$

4. Match a positive integer

^[1-9][0-9]*$

5. Match a number with exactly two decimal places, such as 0.25 or 123.45

^(0|[1-9][0-9]*)\.\d{2}$

6. Match a (Chinese) postal code, which is six digits

^\d{6}$

7. Match a (Chinese mainland) mobile phone number

^[1][3-9]\d{9}$

8. Match an ID card number with 15 or 18 digits (this simple check ignores the X that can appear as the last character of an 18-character number)

^(\d{18}|\d{15})$

9. Match Chinese characters

^[\u4e00-\u9fa5]{1,}$

10. Match a URL

^http(s)?://([\w-]+\.)+[\w-]+(/[\w ./?%&=-]*)?$
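A quick way to experiment with these patterns is a small harness like the sketch below. The patterns are our cleaned-up reading of the examples above, written as C# verbatim strings, and the sample inputs are made up; treat it as a starting point, not production-grade validation.

```csharp
using System;
using System.Text.RegularExpressions;

// Our cleaned-up reading of the example patterns, as C# verbatim strings.
const string Landline = @"^\d{3,4}-\d{8}$";
const string PositiveInt = @"^[1-9][0-9]*$";
const string TwoDecimals = @"^(0|[1-9][0-9]*)\.\d{2}$";
const string Mobile = @"^[1][3-9]\d{9}$";
const string IdCard = @"^(\d{18}|\d{15})$";
const string Chinese = @"^[\u4e00-\u9fa5]{1,}$";
const string Url = @"^http(s)?://([\w-]+\.)+[\w-]+(/[\w ./?%&=-]*)?$";

// Print whether a sample input matches the given pattern.
void Show(string pattern, string input) =>
    Console.WriteLine($"{input} -> {Regex.IsMatch(input, pattern)}");

Show(Landline, "0512-81755456");            // True
Show(PositiveInt, "012");                   // False: leading zero
Show(TwoDecimals, "123.45");                // True
Show(TwoDecimals, "1.5");                   // False: only one decimal place
Show(Mobile, "13812345678");                // True
Show(IdCard, "110101199003071234");         // True (18 digits)
Show(Chinese, "\u4e2d\u6587");              // True: two CJK characters
Show(Url, "https://www.example.com/a?b=1"); // True
```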

That covers the basic syntax. Now let's look at how to use it in C#.

System.Text.RegularExpressions.Regex

This class provides the following methods for working with regular expressions:

1. IsMatch: checks whether the input matches the pattern. Sample code:

    // Verify a mobile phone number
    public bool IsMobile(string mobile)
    {
        return System.Text.RegularExpressions.Regex.IsMatch(mobile, @"^[1][3-9]\d{9}$");
    }
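If the same pattern is checked many times, one option is to build a `Regex` instance once and reuse it, as in the sketch below (the names `IsMobile` and `IsMobileStrict` are illustrative). The static `Regex.IsMatch` also works, because .NET caches recently used static patterns. The sketch also shows a .NET detail worth knowing: without `RegexOptions.Multiline`, `$` matches at the end of the string or just before a final `\n`, so `\z` is the stricter anchor.

```csharp
using System;
using System.Text.RegularExpressions;

// Build each pattern once and reuse it; RegexOptions.Compiled trades
// start-up time for faster matching on repeated calls.
var mobileRegex = new Regex(@"^[1][3-9]\d{9}$", RegexOptions.Compiled);
// Same check, but \z anchors at the true end of the string.
var strictMobileRegex = new Regex(@"^[1][3-9]\d{9}\z", RegexOptions.Compiled);

bool IsMobile(string mobile) => mobileRegex.IsMatch(mobile);
bool IsMobileStrict(string mobile) => strictMobileRegex.IsMatch(mobile);

Console.WriteLine(IsMobile("13812345678"));         // True
Console.WriteLine(IsMobile("13812345678\n"));       // True: $ also matches before a final \n
Console.WriteLine(IsMobileStrict("13812345678\n")); // False
```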

2. Split: splits a string wherever the pattern matches

Sample code:

    // Split the string on every digit
    public string[] SplitStr(string str)
    {
        return System.Text.RegularExpressions.Regex.Split(str, @"[0-9]");
    }

    protected void btn_split_Click(object sender, EventArgs e)
    {
        string[] result = SplitStr(this.tb_pwd.Text);
        int len = result.Length;
        for (int i = 0; i < len; i++)
        {
            // Skip the empty strings left between adjacent digits
            if (result[i] != "")
            {
                Response.Write("<script>alert('Split! " + result[i] + "')</script>");
            }
        }
    }
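`Regex.Split` keeps empty strings wherever two delimiters are adjacent or a delimiter sits at either end of the input, which is why the handler above skips `""` entries. A minimal console sketch without the ASP.NET page (`SplitOnDigits` is a hypothetical helper):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Split on every single digit, as SplitStr does above.
string[] SplitOnDigits(string input) => Regex.Split(input, @"[0-9]");

string[] parts = SplitOnDigits("ab12cd3");
// Adjacent digits and a trailing digit leave empty strings behind.
Console.WriteLine(string.Join("|", parts)); // ab||cd|

// Filter out the empties, like the result[i] != "" check in the handler.
string[] nonEmpty = parts.Where(p => p.Length > 0).ToArray();
Console.WriteLine(string.Join("|", nonEmpty)); // ab|cd

// Splitting on runs of digits avoids the empty entry between "1" and "2".
Console.WriteLine(string.Join("|", Regex.Split("ab12cd3", @"[0-9]+"))); // ab|cd|
```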

 

3. Replace

Replaces every match with the specified string:

    // Replace every digit in str1 with str2
    public string ReplaceWord(string str1, string str2)
    {
        return System.Text.RegularExpressions.Regex.Replace(str1, @"\d", str2);
    }
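Because `\d` matches a single digit, `ReplaceWord` replaces each digit separately. The sketch below (helper names are ours) contrasts that with `\d+` and with the `MatchEvaluator` overload of `Regex.Replace`, which computes each replacement from the match:

```csharp
using System;
using System.Text.RegularExpressions;

// Same idea as ReplaceWord above: each single digit is replaced.
string ReplaceDigits(string input, string replacement) =>
    Regex.Replace(input, @"\d", replacement);

// Variant: \d+ replaces each run of digits once.
string ReplaceDigitRuns(string input, string replacement) =>
    Regex.Replace(input, @"\d+", replacement);

// Variant: a MatchEvaluator (here a lambda) computes each replacement.
string DoubleNumbers(string input) =>
    Regex.Replace(input, @"\d+", m => (int.Parse(m.Value) * 2).ToString());

Console.WriteLine(ReplaceDigits("a1b22", "#"));    // a#b##
Console.WriteLine(ReplaceDigitRuns("a1b22", "#")); // a#b#
Console.WriteLine(DoubleNumbers("a1b22"));         // a2b44
```

Note that `$` is special in a replacement string (`$1`, `${name}`, `$$`), so write `$$` for a literal dollar sign.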

 

4. Matches

Returns the collection of all matches:

    // Find repeated words (this regular expression could still be optimized)
    public string[] RepeatWords(string str)
    {
        System.Text.RegularExpressions.MatchCollection matches =
            System.Text.RegularExpressions.Regex.Matches(str, @"\b(?<word>\w+)\s+(\k<word>)\b",
                System.Text.RegularExpressions.RegexOptions.Compiled | System.Text.RegularExpressions.RegexOptions.IgnoreCase);
        int aindex = matches.Count;
        if (aindex != 0)
        {
            string[] repeatword = new string[aindex];
            int i = 0;
            foreach (System.Text.RegularExpressions.Match match in matches)
            {
                string word = match.Groups["word"].Value;
                repeatword[i] = word;
                i++;
            }
            return repeatword;
        }
        else
        {
            return null;
        }
    }
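A slightly simpler variant of `RepeatWords` collects the results in a `List<string>` and returns an empty array rather than `null` when nothing repeats. This is a sketch of one alternative (`FindRepeatedWords` is our name), not a change the original requires:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical variant of RepeatWords: returns an empty array instead of null.
string[] FindRepeatedWords(string text)
{
    var words = new List<string>();
    // \k<word> must repeat exactly the text captured by the named group "word".
    foreach (Match m in Regex.Matches(text, @"\b(?<word>\w+)\s+\k<word>\b", RegexOptions.IgnoreCase))
    {
        words.Add(m.Groups["word"].Value);
    }
    return words.ToArray();
}

Console.WriteLine(string.Join(",", FindRepeatedWords("This is is a test test case"))); // is,test
Console.WriteLine(FindRepeatedWords("no repeats here").Length);                     // 0
```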

 

Advanced features of regular expressions

1. Groups and non-capturing groups

A group stores the characters matched by the part of the pattern inside its parentheses, so that later parts of the pattern can refer to them by index.

For example, suppose you need to match abc123abc.

We can write ^(abc)123\1$. Here () is a capturing group whose condition is abc. Later in the pattern, \1 matches the same text the group captured. If there are two groups, \2 refers to the second group, and so on.

How do we use this in C#?

    using System;
    using System.Text.RegularExpressions;

    string x = "abc123abc";
    Regex r = new Regex(@"^(abc)123\1$");
    if (r.IsMatch(x))
    {
        Console.WriteLine("group1 value:" + r.Match(x).Groups[1].Value); // output: abc
    }

Why Groups[1]? Groups[0] always holds the entire match, that is, the whole string that satisfied the pattern; the captured groups are stored from index 1 onward.
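To make the numbering concrete, this sketch (the sample input is ours) inspects `Groups[0]` and `Groups[1]` of a single match:

```csharp
using System;
using System.Text.RegularExpressions;

Match m = Regex.Match("xx abc123abc yy", @"(abc)123\1");

// Groups[0] is always the whole match; numbered groups start at 1.
Console.WriteLine(m.Groups[0].Value); // abc123abc
Console.WriteLine(m.Groups[1].Value); // abc
Console.WriteLine(m.Groups[1].Index); // 3: position of the captured "abc" in the input
Console.WriteLine(m.Groups.Count);    // 2: group 0 plus one capturing group
```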

We can also name the group:

    using System;
    using System.Text.RegularExpressions;

    string x = "abc123abc";
    Regex r = new Regex(@"^(?<test>abc)123\k<test>$");
    if (r.IsMatch(x))
    {
        Console.WriteLine("group1 value:" + r.Match(x).Groups["test"].Value); // output: abc
    }

Isn't that easier to read?

Sometimes we want to group part of a pattern without saving what that group matches. In that case, we can use (?:...):

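    using System;
    using System.Text.RegularExpressions;

    string x = "abc123abc";
    // (?:abc) groups without capturing, so \1 cannot refer to it;
    // writing \1 here would make the Regex constructor throw.
    Regex r = new Regex(@"^(?:abc)123abc$");
    if (r.IsMatch(x))
    {
        // There is no group 1, so Value is an empty string.
        Console.WriteLine("group1 value:" + r.Match(x).Groups[1].Value); // output: group1 value:
    }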
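The difference between capturing and non-capturing groups shows up in `Groups.Count`. The sketch below (inputs are illustrative) compares the two, and also shows that a backreference such as `\1` to a non-capturing group is rejected with an `ArgumentException` when the pattern is parsed:

```csharp
using System;
using System.Text.RegularExpressions;

string x = "abc123abc";

// (abc) captures, so the match has group 0 plus group 1.
int capturingCount = Regex.Match(x, @"^(abc)123abc$").Groups.Count;
// (?:abc) only groups; nothing is stored, so only group 0 exists.
int nonCapturingCount = Regex.Match(x, @"^(?:abc)123abc$").Groups.Count;
Console.WriteLine($"{capturingCount} {nonCapturingCount}"); // 2 1

// A backreference to a group that does not exist is rejected at parse time.
bool threw = false;
try
{
    _ = new Regex(@"^(?:abc)123\1$");
}
catch (ArgumentException)
{
    threw = true;
}
Console.WriteLine(threw); // True
```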

2. Greedy and non-greedy mode

By default, quantifiers are greedy: with + or *, the engine always tries to match as much text as possible. Adding ? after the quantifier (*? or +?) switches it to non-greedy mode, so it matches as little as possible.

    using System;
    using System.Text.RegularExpressions;

    string x = "Live for nothing, die for something";
    Regex r1 = new Regex(@".*thing");
    if (r1.IsMatch(x))
    {
        Console.WriteLine("Match:" + r1.Match(x).Value); // output: Live for nothing, die for something
    }
    Regex r2 = new Regex(@".*?thing");
    if (r2.IsMatch(x))
    {
        Console.WriteLine("Match:" + r2.Match(x).Value); // output: Live for nothing
    }
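Lazy quantifiers are often used to match the shortest delimited span. In the sketch below the input string is made up, and the pattern only illustrates greedy versus lazy matching; it is not a robust way to parse HTML:

```csharp
using System;
using System.Text.RegularExpressions;

string html = "<b>one</b> and <b>two</b>";

// Greedy: .* runs on to the last </b> it can reach.
string greedy = Regex.Match(html, @"<b>.*</b>").Value;
// Lazy: .*? stops at the first </b> that lets the match succeed.
string lazy = Regex.Match(html, @"<b>.*?</b>").Value;
int lazyCount = Regex.Matches(html, @"<b>.*?</b>").Count;

Console.WriteLine(greedy);    // <b>one</b> and <b>two</b>
Console.WriteLine(lazy);      // <b>one</b>
Console.WriteLine(lazyCount); // 2
```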

3. Backtracking and non-backtracking

In the default greedy mode, when the engine runs into a dead end partway through a match, it backtracks, giving characters back one at a time until the next part of the pattern can match.

For example, consider matching (.*)abc against 123abc123abc. First, .* greedily matches all the way to the end of the string, and then the engine tries to match a. No characters are left, so the engine backtracks until a matches the a in the last abc, then matches b and then c. The overall match is therefore 123abc123abc.

Now consider the non-backtracking mode. Again .* devours everything up to the end of the string like a hungry wolf, and then the engine finds that a cannot match. In this mode no backtracking is performed, so the match fails. Some scenarios need exactly this kind of non-backtracking (atomic) match. Syntax example: (?>.*)abc
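The two behaviors described above can be checked directly. A minimal sketch using the same input as the text; the only difference between the two patterns is the atomic group `(?>...)`:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "123abc123abc";

// Ordinary group: .* backtracks until "abc" can match at the end.
Match normal = Regex.Match(input, @"(.*)abc");
Console.WriteLine(normal.Success);         // True
Console.WriteLine(normal.Value);           // 123abc123abc
Console.WriteLine(normal.Groups[1].Value); // 123abc123

// Atomic group: once (?>.*) has consumed the rest of the string,
// it never gives characters back, so "abc" can never match.
Match atomic = Regex.Match(input, @"(?>.*)abc");
Console.WriteLine(atomic.Success);         // False
```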

 

4. Lookahead (forward pre-search) and lookbehind (reverse pre-search)

These are not easy to explain in words, so here are examples.

Lookahead (forward pre-search)

    using System;
    using System.Text.RegularExpressions;

    // No space between the digits and "used", so (?=used) sees it directly
    string x = "1024used 2048free";
    Regex r1 = new Regex(@"\d{4}(?=used)");
    if (r1.Matches(x).Count == 1)
    {
        Console.WriteLine("r1 match:" + r1.Match(x).Value); // output: 1024
    }
    Regex r2 = new Regex(@"\d{4}(?!used)");
    if (r2.Matches(x).Count == 1)
    {
        Console.WriteLine("r2 match:" + r2.Match(x).Value); // output: 2048
    }

r1 matches four digits that are followed by used, so it matches 1024; r2 matches four digits that are not followed by used, so it matches 2048.

 

Lookbehind (reverse pre-search)

    using System;
    using System.Text.RegularExpressions;

    // No space after the colon, so (?<=used:) sits right before the digits
    string x = "used:1024 free:2048";
    Regex r1 = new Regex(@"(?<=used:)\d{4}");
    if (r1.Matches(x).Count == 1)
    {
        Console.WriteLine("r1 match:" + r1.Match(x).Value); // output: 1024
    }
    Regex r2 = new Regex(@"(?<!used:)\d{4}");
    if (r2.Matches(x).Count == 1)
    {
        Console.WriteLine("r2 match:" + r2.Match(x).Value); // output: 2048
    }

r1 matches four digits that are preceded by used:, so it matches 1024; r2 matches four digits that are not preceded by used:, so it matches 2048.
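Lookbehind and lookahead are often combined to extract text between delimiters without including the delimiters in the match. In this sketch the log-style input and the helper `BracketContents` are hypothetical:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// (?<=\[) and (?=\]) check for the brackets
// without including them in the matched text.
string[] BracketContents(string text) =>
    Regex.Matches(text, @"(?<=\[)[^\]]+(?=\])")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

Console.WriteLine(string.Join(",", BracketContents("[INFO] started in [42ms]"))); // INFO,42ms
```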

 

The examples make this easy to follow. Note also that lookahead and lookbehind are not capturing groups, so what they check is not saved.
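One way to confirm that lookaround assertions store nothing is to compare `Groups.Count` for a lookbehind pattern and an equivalent pattern with ordinary capturing groups. A sketch; the input mirrors the lookbehind example:

```csharp
using System;
using System.Text.RegularExpressions;

string x = "used:1024 free:2048";

// Lookbehind version: the "used:" check is not stored in any group.
Match look = Regex.Match(x, @"(?<=used:)\d{4}");
// Capturing version: "used:" and the digits each get a numbered group.
Match captured = Regex.Match(x, @"(used:)(\d{4})");

Console.WriteLine($"{look.Value} {look.Groups.Count}");         // 1024 1
Console.WriteLine($"{captured.Value} {captured.Groups.Count}"); // used:1024 3
Console.WriteLine(captured.Groups[2].Value);                     // 1024
```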

 

 
