The power of regular expressions should not be underestimated. A few characters can often replace dozens of lines of code and greatly simplify redundant logic.
I used regular expressions a lot in JS in the past. Today I am getting familiar with using them in C#, so I am taking notes.
If you treat regular expressions as a language, learning them is like learning any other language: from historical origins to basic syntax, from advanced features to performance optimization.
History:
The ancestors of regular expressions can be traced back to early research on how the human nervous system works. Warren McCulloch and Walter Pitts, two neurophysiologists, developed a mathematical way to describe these neural networks. In 1956, the mathematician Stephen Kleene, building on McCulloch and Pitts's earlier work, published a paper titled "Representation of Events in Nerve Nets and Finite Automata" that introduced the concept of regular expressions. Regular expressions were the expressions he used to describe what he called "the algebra of regular sets", hence the term "regular expression". Later, this work was applied to early research on computational search algorithms by Ken Thompson, the principal inventor of UNIX. The first practical application of regular expressions was the QED editor. The rest, as they say, is history. Regular expressions have been an important part of text editors and search tools ever since.
Basic syntax characters:
\d (any digit, 0-9)
\D (any character except a digit)
\w (any word character: letters, digits, and underscore)
\W (any character except a word character)
\s (any whitespace character)
\S (any character except whitespace)
. (any character except a line break)
[...] (any one of the characters listed in the square brackets)
[^...] (any character except those listed in the square brackets)
\b (a word boundary)
\B (a non-word boundary)
^ (the start of the string)
$ (the end of the string)
{n} (exactly n occurrences of the preceding element)
{n,m} (between n and m occurrences of the preceding element)
{n,} (n or more occurrences of the preceding element)
? (zero or one occurrence of the preceding element)
+ (one or more occurrences of the preceding element)
* (zero or more occurrences of the preceding element)
(a|b) (either a or b)
Below are some basic examples to help you get familiar with the syntax above.
1. Match three digits, such as 134
\d{3}
2. Match a word that starts with a letter, ends with a letter, and has one or more digits in between, such as a123b
^[A-Za-z]\d+[A-Za-z]$
3. Match a landline phone number, such as 021-81234563 or 0512-81755456
^\d{3,4}-\d{8}$
4. Match a positive integer
[1-9][0-9]*
5. Match a number with two decimal places, such as 3.14
^(0|[1-9][0-9]*)\.\d{2}$
6. Match a zip code
^\d{6}$
7. Match a mobile phone number
^1[3-9]\d{9}$
8. Match an ID card number (18 or 15 digits)
(^\d{18}$)|(^\d{15}$)
9. Match Chinese characters
^[\u4e00-\u9fa5]{1,}$
10. Match a URL
^http(s)?://([\w-]+\.)+[\w-]+(/[\w./?%&=-]*)?$
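To try a few of these patterns quickly, here is a minimal sketch of a console program. The class name PatternSamples and the helper Check are hypothetical names used only for illustration, and the patterns are written with their ^ and $ anchors so the whole input has to match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternSamples
{
    // Hypothetical helper: returns true when the pattern matches the input.
    public static bool Check(string pattern, string input)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        Console.WriteLine(Check(@"^[A-Za-z]\d+[A-Za-z]$", "a123b"));        // True
        Console.WriteLine(Check(@"^\d{3,4}-\d{8}$", "021-81234563"));       // True
        Console.WriteLine(Check(@"^\d{3,4}-\d{8}$", "0512-81755456"));      // True
        Console.WriteLine(Check(@"^\d{6}$", "20000"));                      // False: only five digits
        Console.WriteLine(Check(@"^1[3-9]\d{9}$", "13812345678"));          // True
        Console.WriteLine(Check(@"^[\u4e00-\u9fa5]{1,}$", "\u6b63\u5219")); // True: two Chinese characters
    }
}
```

Without the anchors, IsMatch would also accept inputs that merely contain a matching substring, which is rarely what a validation routine wants.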
That covers the basic syntax. Now let's look at how C# uses it.
System.Text.RegularExpressions.Regex
This class provides the following methods for working with regular expressions:
1. IsMatch: tests whether the input matches the pattern. Sample code:
    // Verify a mobile phone number
    public bool IsMobile(string mobile)
    {
        return System.Text.RegularExpressions.Regex.IsMatch(mobile, @"^1[3-9]\d{9}$");
    }
2. Split: splits a string wherever the pattern matches
Sample code:
    // Split the string at every digit
    public string[] SplitStr(string str)
    {
        return System.Text.RegularExpressions.Regex.Split(str, @"[0-9]");
    }

    protected void btn_split_Click(object sender, EventArgs e)
    {
        string[] result = SplitStr(this.tb_pwd.Text);
        for (int i = 0; i < result.Length; i++)
        {
            // Skip the empty strings produced by adjacent digits
            if (result[i] != "")
            {
                Response.Write("<script>alert('split! " + result[i] + "')</script>");
            }
        }
    }
3. Replace: replaces the matched text
    // Replace every digit in the string with the specified text
    public string ReplaceWord(string str1, string str2)
    {
        return System.Text.RegularExpressions.Regex.Replace(str1, @"\d", str2);
    }
4. Matches: returns the collection of all matches
    // Find repeated words (the regular expression still needs optimizing)
    public string[] RepeatWords(string str)
    {
        System.Text.RegularExpressions.MatchCollection matches =
            System.Text.RegularExpressions.Regex.Matches(str, @"\b(?<word>\w+)\s+(\k<word>)\b",
                System.Text.RegularExpressions.RegexOptions.Compiled | System.Text.RegularExpressions.RegexOptions.IgnoreCase);
        int count = matches.Count;
        if (count != 0)
        {
            string[] repeatWord = new string[count];
            int i = 0;
            foreach (System.Text.RegularExpressions.Match match in matches)
            {
                // The named group "word" holds the word that was repeated
                repeatWord[i] = match.Groups["word"].Value;
                i++;
            }
            return repeatWord;
        }
        else
        {
            return null;
        }
    }
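To see what the duplicate-word pattern actually captures, the following sketch runs it against a short sentence. The class and method names are hypothetical, and as a small design variation it returns an empty array instead of null when nothing is found, so callers can skip the null check.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RepeatedWordsDemo
{
    // Hypothetical helper: collects each word that is immediately repeated.
    public static string[] FindRepeatedWords(string text)
    {
        MatchCollection matches = Regex.Matches(
            text, @"\b(?<word>\w+)\s+\k<word>\b", RegexOptions.IgnoreCase);

        var words = new List<string>();
        foreach (Match m in matches)
        {
            words.Add(m.Groups["word"].Value);
        }
        return words.ToArray();
    }

    public static void Main()
    {
        string[] found = FindRepeatedWords("This is is a test test");
        Console.WriteLine(string.Join(", ", found)); // is, test
    }
}
```

Note that the "is" inside "This" does not trigger a match, because \b requires the repeated word to start at a word boundary.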
Advanced features of regular expressions
1. Capturing and non-capturing groups
A group stores the text matched by the sub-pattern inside parentheses, so that later parts of the pattern can refer back to it by index.
For example, suppose we need to match abc123abc.
We can write ^(abc)123\1$. Here ( ) is a capturing group whose sub-pattern is abc. Later in the pattern, \1 matches whatever that group captured, so we do not have to repeat it. If there are two groups, \2 refers to the second one.
How do we use this in C#? (The snippets from here on assume using System; and using System.Text.RegularExpressions;.)
    string x = "abc123abc";
    Regex r = new Regex(@"^(abc)123\1$");
    if (r.IsMatch(x))
    {
        Console.WriteLine("group1 value: " + r.Match(x).Groups[1].Value); // output: abc
    }
Why Groups[1]? Because Groups[0] holds the entire match; the captured groups are stored after it, starting at index 1.
We can also name the group:
    string x = "abc123abc";
    Regex r = new Regex(@"^(?<test>abc)123\1$");
    if (r.IsMatch(x))
    {
        Console.WriteLine("group1 value: " + r.Match(x).Groups["test"].Value); // output: abc
    }
Isn't that more readable?
Sometimes we want to group part of a pattern but do not want to save what it matched. In that case we can use (?:...):
    string x = "abc123abc";
    // No \1 here: the pattern has no capturing group, and .NET rejects
    // a backreference to a group that does not exist.
    Regex r = new Regex(@"^(?:abc)123abc$");
    if (r.IsMatch(x))
    {
        Console.WriteLine("group1 value: " + r.Match(x).Groups[1].Value); // output: an empty string
    }
2. Greedy and non-greedy matching
By default, regular expressions are greedy: quantifiers such as + and * match as much text as possible. Appending ? to a quantifier switches it to non-greedy mode, where it matches as little as possible.
    string x = "Live for nothing, die for something";
    Regex r1 = new Regex(@".*thing");
    if (r1.IsMatch(x))
    {
        Console.WriteLine("match: " + r1.Match(x).Value); // output: Live for nothing, die for something
    }
    Regex r2 = new Regex(@".*?thing");
    if (r2.IsMatch(x))
    {
        Console.WriteLine("match: " + r2.Match(x).Value); // output: Live for nothing
    }
3. Backtracking and non-backtracking
In the default greedy mode, when the engine runs into a dead end it backtracks: it gives back characters one at a time until the rest of the pattern can match.
For example, when (.*)abc is matched against 123abc123abc, the greedy .* first consumes everything up to the end of the string. The engine then tries to match a, finds nothing left, and backtracks until a can match the a in the last abc. It then matches b and c, so the result is 123abc123abc.
Now consider the same process in non-backtracking mode. The .* again devours everything up to the end of the string like a hungry wolf, and the engine finds that a cannot match. In this mode no backtracking is performed, so the match fails. Some scenarios need exactly this kind of non-backtracking match. Syntax example: (?>.*)abc
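The difference between the two modes can be checked with a short sketch like the one below. The class and method names are hypothetical; the patterns and the input are the ones used in the explanation above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BacktrackingDemo
{
    // Greedy .* runs to the end of the input, then backtracks until "abc" can match.
    public static Match Backtracking(string input)
    {
        return Regex.Match(input, @"(.*)abc");
    }

    // The atomic group (?>.*) never gives back what it consumed, so "abc" cannot match.
    public static Match NonBacktracking(string input)
    {
        return Regex.Match(input, @"(?>.*)abc");
    }

    public static void Main()
    {
        string input = "123abc123abc";

        Match m1 = Backtracking(input);
        Console.WriteLine(m1.Success);         // True
        Console.WriteLine(m1.Value);           // 123abc123abc
        Console.WriteLine(m1.Groups[1].Value); // 123abc123

        Match m2 = NonBacktracking(input);
        Console.WriteLine(m2.Success);         // False
    }
}
```

Groups[1] in the first match shows how far the engine had to back off: .* ends up holding only 123abc123, leaving the final abc for the rest of the pattern.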
4. Lookahead and lookbehind
These are easier to show than to explain, so here are some examples.
Lookahead
    string x = "1024 used 2048 free";
    Regex r1 = new Regex(@"\d{4}(?= used)");
    if (r1.Matches(x).Count == 1)
    {
        Console.WriteLine("r1 match: " + r1.Match(x).Value); // output: 1024
    }
    Regex r2 = new Regex(@"\d{4}(?! used)");
    if (r2.Matches(x).Count == 1)
    {
        Console.WriteLine("r2 match: " + r2.Match(x).Value); // output: 2048
    }
r1 matches four digits that are followed by " used", so it matches 1024. r2 matches four digits that are not followed by " used", so it matches 2048.
Lookbehind
    string x = "used:1024 free:2048";
    Regex r1 = new Regex(@"(?<=used:)\d{4}");
    if (r1.Matches(x).Count == 1)
    {
        Console.WriteLine("r1 match: " + r1.Match(x).Value); // output: 1024
    }
    Regex r2 = new Regex(@"(?<!used:)\d{4}");
    if (r2.Matches(x).Count == 1)
    {
        Console.WriteLine("r2 match: " + r2.Match(x).Value); // output: 2048
    }
r1 matches four digits that are preceded by "used:", so it matches 1024. r2 matches four digits that are not preceded by "used:", so it matches 2048.
The examples make the idea easy to grasp. Note also that lookahead and lookbehind groups are not captured.
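The point that lookaround groups are not saved can be illustrated by comparing them with ordinary capturing groups. This is a minimal sketch with hypothetical class and method names, and the input string is an assumed sample written without spaces after the colons.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Lookbehind and lookahead only assert context; they consume and capture nothing.
    public static Match WithLookaround(string input)
    {
        return Regex.Match(input, @"(?<=used:)\d{4}(?= free)");
    }

    // The same context written as capturing groups becomes part of the match.
    public static Match WithCapturingGroups(string input)
    {
        return Regex.Match(input, @"(used:)(\d{4})( free)");
    }

    public static void Main()
    {
        string input = "used:1024 free:2048";

        Match a = WithLookaround(input);
        Console.WriteLine(a.Value);        // 1024
        Console.WriteLine(a.Groups.Count); // 1 (only group 0, the whole match)

        Match b = WithCapturingGroups(input);
        Console.WriteLine(b.Value);        // used:1024 free
        Console.WriteLine(b.Groups.Count); // 4 (group 0 plus three captures)
    }
}
```

With lookaround the match value is just the digits, which is convenient when the surrounding text is needed as context but should not appear in the result.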