C# Regular Expression Notes

Last Update:2018-12-06 Source: Internet

Author: User

Tags expression engine

For a while, learning regular expressions was very popular; at the time I could see several regular expression posts a day on CSDN. During that period I learned the basics from the forum and from the "C# String and Regular Expression Reference Manual" published by Wrox Press, and earned about 1000 points on CSDN along the way. When I went looking for that book today, I could no longer find it. I use regular expressions less often now, so I am sorting out my old notes here so that I do not forget them.

(1) The "@" symbol
Although "@" is not itself a "member" of C# regular expressions, the two almost always appear together. "@" indicates that the string literal following it is a "verbatim string", in which "\" is taken literally instead of starting an escape sequence. This is easier to see with an example; the following two statements are equivalent:
string x = "D:\\My Huang\\My Doc";
string y = @"D:\My Huang\My Doc";
In fact, C# reports a compile error for the declaration below, because "\" is used in C# for escape sequences such as "\n" (line feed), and "\M" is not a valid one:
string x = "D:\My Huang\My Doc";
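
Verbatim strings matter for regular expressions because patterns are full of backslashes. The following is a minimal sketch of that point; the class name VerbatimDemo and its helper methods are made up for illustration and are not part of the original notes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimDemo
{
    // True when the escaped literal and the verbatim literal denote the same string.
    public static bool SamePath()
    {
        string x = "D:\\My Huang\\My Doc";
        string y = @"D:\My Huang\My Doc";
        return x == y;
    }

    // The same regex pattern written without and with "@".
    public static bool SamePattern()
    {
        return "^\\d+$" == @"^\d+$";
    }

    public static void Main()
    {
        Console.WriteLine(SamePath());                       // True
        Console.WriteLine(SamePattern());                    // True
        Console.WriteLine(Regex.IsMatch("1024", @"^\d+$")); // True
    }
}
```

Writing patterns as verbatim strings keeps them identical to what the regex engine sees, which is why most of the examples in these notes use "@".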

(2) Basic syntax characters
\d  Digits 0-9
\D  The complement of \d (taking all characters as the universal set, same below), that is, all non-digit characters
\w  Word characters: uppercase and lowercase letters, digits 0-9, and the underscore
\W  The complement of \w
\s  Whitespace characters, including line feed \n, carriage return \r, tab \t, vertical tab \v, and form feed \f
\S  The complement of \s
.   Any character except line feed \n
[…]  Matches any single character listed in []
[^…]  Matches any single character not listed in []
Note that in .NET, \d and \w also match non-ASCII Unicode digits and letters by default; RegexOptions.ECMAScript restricts them to the ASCII ranges listed above.
The following provides some simple examples:

Code
string i = "\n";
string m = "3";
Regex r = new Regex(@"\D");
// Same as Regex r = new Regex("\\D");
// r.IsMatch(i) result: true
// r.IsMatch(m) result: false

string i = "%";
string m = "3";
Regex r = new Regex("[a-z0-9]");
// Matches a lowercase letter or a digit
// r.IsMatch(i) result: false
// r.IsMatch(m) result: true
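
The examples above cover \D and a positive character class. As a complement, here is a small sketch exercising \w, \s, and a negated class [^…], plus the effect of RegexOptions.ECMAScript on \d. The class and method names (CharClassDemo, IsWord, and so on) are hypothetical and chosen only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // True when s consists only of word characters (\w).
    public static bool IsWord(string s) => Regex.IsMatch(s, @"^\w+$");

    // True when s contains at least one whitespace character (\s).
    public static bool HasWhitespace(string s) => Regex.IsMatch(s, @"\s");

    // True when s contains at least one character that is not a lowercase vowel.
    public static bool HasNonVowel(string s) => Regex.IsMatch(s, "[^aeiou]");

    // True when s contains a digit; asciiOnly switches on RegexOptions.ECMAScript.
    public static bool HasDigit(string s, bool asciiOnly) =>
        Regex.IsMatch(s, @"\d", asciiOnly ? RegexOptions.ECMAScript : RegexOptions.None);

    public static void Main()
    {
        Console.WriteLine(IsWord("my_var1"));          // True
        Console.WriteLine(IsWord("my var"));           // False
        Console.WriteLine(HasWhitespace("a\tb"));      // True
        Console.WriteLine(HasNonVowel("aei"));         // False
        Console.WriteLine(HasDigit("\u0663", false));  // True: Arabic-Indic digit three
        Console.WriteLine(HasDigit("\u0663", true));   // False: ASCII digits only
    }
}
```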

(3) Positioning characters (anchors)
A "positioning character" represents a zero-width, virtual character, that is, a position. Intuitively, you can think of it as the tiny gap between two characters.
^  The expression after it must be at the beginning of the string
$  The expression before it must be at the end of the string (or just before a final line feed)
\b  Matches a word boundary
\B  Matches a non-word boundary
In addition, the expression after \A must be at the beginning of the string, the expression before \z must be at the very end of the string, and the expression before \Z must be at the end of the string or just before a final line break.
The following provides some simple examples:

Code
string i = "Live for nothing, die for something";
Regex r1 = new Regex("^Live for nothing, die for something$");
// r1.IsMatch(i) true
Regex r2 = new Regex("^Live for nothing, die for some$");
// r2.IsMatch(i) false
Regex r3 = new Regex("^Live for nothing, die for some");
// r3.IsMatch(i) true

string i = @"Live for nothing,
die for something"; // multiple lines; the counts below assume the line break is "\r\n"
Regex r1 = new Regex("^Live for nothing, die for something$");
Console.WriteLine("r1 match count:" + r1.Matches(i).Count); // 0
Regex r2 = new Regex("^Live for nothing, die for something$", RegexOptions.Multiline);
Console.WriteLine("r2 match count:" + r2.Matches(i).Count); // 0
Regex r3 = new Regex("^Live for nothing,\r\ndie for something$");
Console.WriteLine("r3 match count:" + r3.Matches(i).Count); // 1
Regex r4 = new Regex("^Live for nothing,$");
Console.WriteLine("r4 match count:" + r4.Matches(i).Count); // 0
Regex r5 = new Regex("^Live for nothing,$", RegexOptions.Multiline);
Console.WriteLine("r5 match count:" + r5.Matches(i).Count); // 0
Regex r6 = new Regex("^Live for nothing,\r\n$");
Console.WriteLine("r6 match count:" + r6.Matches(i).Count); // 0
Regex r7 = new Regex("^Live for nothing,\r\n$", RegexOptions.Multiline);
Console.WriteLine("r7 match count:" + r7.Matches(i).Count); // 0
Regex r8 = new Regex("^Live for nothing,\r$");
Console.WriteLine("r8 match count:" + r8.Matches(i).Count); // 0
Regex r9 = new Regex("^Live for nothing,\r$", RegexOptions.Multiline);
Console.WriteLine("r9 match count:" + r9.Matches(i).Count); // 1
Regex r10 = new Regex("^die for something$");
Console.WriteLine("r10 match count:" + r10.Matches(i).Count); // 0
Regex r11 = new Regex("^die for something$", RegexOptions.Multiline);
Console.WriteLine("r11 match count:" + r11.Matches(i).Count); // 1
Regex r12 = new Regex("^");
Console.WriteLine("r12 match count:" + r12.Matches(i).Count); // 1
Regex r13 = new Regex("$");
Console.WriteLine("r13 match count:" + r13.Matches(i).Count); // 1
Regex r14 = new Regex("^", RegexOptions.Multiline);
Console.WriteLine("r14 match count:" + r14.Matches(i).Count); // 2
Regex r15 = new Regex("$", RegexOptions.Multiline);
Console.WriteLine("r15 match count:" + r15.Matches(i).Count); // 2
Regex r16 = new Regex("^Live for nothing,\r$\n^die for something$", RegexOptions.Multiline);
Console.WriteLine("r16 match count:" + r16.Matches(i).Count); // 1
// For a multi-line string, once the Multiline option is set, ^ and $ can match multiple times: ^ after each "\n" and $ before each "\n".

string i = "Live for nothing, die for something";
string m = "Live for nothing, die for some thing";
Regex r1 = new Regex(@"\bthing\b");
Console.WriteLine("r1 match count:" + r1.Matches(i).Count); // 0
Regex r2 = new Regex(@"thing\b");
Console.WriteLine("r2 match count:" + r2.Matches(i).Count); // 2
Regex r3 = new Regex(@"\bthing\b");
Console.WriteLine("r3 match count:" + r3.Matches(m).Count); // 1
Regex r4 = new Regex(@"\bfor something\b");
Console.WriteLine("r4 match count:" + r4.Matches(i).Count); // 1
// \b is usually used to constrain a match to a complete word
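
The examples above do not exercise \A, \z, and \Z. The sketch below contrasts them on a string that ends with a line feed; the class AnchorDemo and its Count helper are illustrative names, not something from the original notes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Counts the matches of pattern in input (hypothetical helper).
    public static int Count(string input, string pattern,
                            RegexOptions options = RegexOptions.None)
    {
        return Regex.Matches(input, pattern, options).Count;
    }

    public static void Main()
    {
        string s = "line1\nline2\n"; // note the trailing line feed

        Console.WriteLine(Count(s, @"line2\Z"));   // 1: \Z tolerates one final \n
        Console.WriteLine(Count(s, @"line2\z"));   // 0: \z does not
        Console.WriteLine(Count(s, @"line2\n\z")); // 1: the \n is matched explicitly

        // Multiline changes the meaning of ^ but never that of \A.
        Console.WriteLine(Count(s, "^line", RegexOptions.Multiline));   // 2
        Console.WriteLine(Count(s, @"\Aline", RegexOptions.Multiline)); // 1
    }
}
```

When validating input that must not carry a trailing line break, \z is therefore the stricter choice compared with $ or \Z.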

(4) Repetition characters (quantifiers)
"Repetition characters" are one of the places where C# regular expressions show how powerful they are:
{n}  Matches the preceding element exactly n times
{n,}  Matches the preceding element n or more times
{n,m}  Matches the preceding element n to m times
?  Matches the preceding element 0 or 1 time
+  Matches the preceding element 1 or more times
*  Matches the preceding element 0 or more times
The following provides some simple examples:

Code
string x = "1024";
string y = "+1024";
string z = "1,024";
string a = "1";
string b = "-1024";
string c = "10000";
Regex r = new Regex(@"^\+?[1-9],?\d{3}$");
Console.WriteLine("x match count:" + r.Matches(x).Count); // 1
Console.WriteLine("y match count:" + r.Matches(y).Count); // 1
Console.WriteLine("z match count:" + r.Matches(z).Count); // 1
Console.WriteLine("a match count:" + r.Matches(a).Count); // 0
Console.WriteLine("b match count:" + r.Matches(b).Count); // 0
Console.WriteLine("c match count:" + r.Matches(c).Count); // 0
// Matches integers from 1000 to 9999, with an optional leading "+" and an optional thousands separator.

(5) Alternation ("select one" matching)
The (|) construct in C# regular expressions does not seem to have an official name, so I call it "select one" matching. In fact, something like [a-z] is also a kind of alternation, except that it can only match a single character, whereas (|) covers a wider range: (ab|xy) matches "ab" or "xy". Note that "|" and "()" should be used together as a unit, because the parentheses delimit the alternatives. The following provides some simple examples:

Code
string x = "0";
string y = "0.23";
string z = "100";
string a = "100.01";
string b = "9.9";
string c = "99.9";
string d = "99.";
string e = "00.1";
Regex r = new Regex(@"^\+?((100(\.0+)?)|([1-9]?[0-9])(\.\d+)?)$");
Console.WriteLine("x match count:" + r.Matches(x).Count); // 1
Console.WriteLine("y match count:" + r.Matches(y).Count); // 1
Console.WriteLine("z match count:" + r.Matches(z).Count); // 1
Console.WriteLine("a match count:" + r.Matches(a).Count); // 0
Console.WriteLine("b match count:" + r.Matches(b).Count); // 1
Console.WriteLine("c match count:" + r.Matches(c).Count); // 1
Console.WriteLine("d match count:" + r.Matches(d).Count); // 0
Console.WriteLine("e match count:" + r.Matches(e).Count); // 0
// Matches numbers from 0 to 100. The outermost parentheses contain two parts, "(100(\.0+)?)" and "([1-9]?[0-9])(\.\d+)?", in an "or" relationship: the engine first tries to match 100 (optionally followed by ".0", ".00", and so on) and, if that fails, tries the second expression, which represents a number in the range [0, 100).
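
Why "|" and "()" belong together is easiest to see by leaving the parentheses out. The sketch below compares the two forms; AlternationDemo, Ungrouped, and Grouped are hypothetical names used only here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Without parentheses "|" splits the whole pattern: (^ab) or (xy$).
    public static bool Ungrouped(string s) => Regex.IsMatch(s, "^ab|xy$");

    // With parentheses the anchors apply to both alternatives.
    public static bool Grouped(string s) => Regex.IsMatch(s, "^(ab|xy)$");

    public static void Main()
    {
        foreach (string s in new[] { "ab", "xy", "abc", "zxy" })
        {
            Console.WriteLine(s + ": ungrouped=" + Ungrouped(s) + ", grouped=" + Grouped(s));
        }
    }
}
```

For "abc" and "zxy" the ungrouped pattern still matches (it only needs "ab" at the start or "xy" at the end), while the grouped pattern rejects both.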

(6) Matching special characters
The following provides some simple examples:

Code
string x = "\\";
Regex r1 = new Regex("^\\\\$");
Console.WriteLine("r1 match count:" + r1.Matches(x).Count); // 1
Regex r2 = new Regex(@"^\\$");
Console.WriteLine("r2 match count:" + r2.Matches(x).Count); // 1
Regex r3 = new Regex("^\\$");
Console.WriteLine("r3 match count:" + r3.Matches(x).Count); // 0
// Matches "\". In r3 the regex engine sees ^\$, which looks for a literal "$" instead.

string x = "\"";
Regex r1 = new Regex("^\"$");
Console.WriteLine("r1 match count:" + r1.Matches(x).Count); // 1
Regex r2 = new Regex(@"^""$");
Console.WriteLine("r2 match count:" + r2.Matches(x).Count); // 1
// Matches a double quote

(7) Groups and non-capturing groups
The following provides some simple examples:

Code
string x = "Live for nothing, die for something";
string y = "Live for nothing, die for somebody";
Regex r = new Regex(@"^Live ([a-z]{3}) no([a-z]{5}), die \1 some\2$");
Console.WriteLine("x match count:" + r.Matches(x).Count); // 1
Console.WriteLine("y match count:" + r.Matches(y).Count); // 0
// The regular expression engine remembers the content matched inside "()" as a "group", which can be referenced by index. "\1" in the expression is a backreference to the first group, that is, the content matched by the first pair of parentheses; "\2" refers to the second group, and so forth.

string x = "Live for nothing, die for something";
Regex r = new Regex(@"^Live for no([a-z]{5}), die for some\1$");
if (r.IsMatch(x))
{
    Console.WriteLine("group1 value:" + r.Match(x).Groups[1].Value); // output: thing
}
// Obtains the content of the group. Note that this is Groups[1], because Groups[0] is the entire match, that is, the whole content of the variable x.

string x = "Live for nothing, die for something";
Regex r = new Regex(@"^Live for no(?<g1>[a-z]{5}), die for some\1$");
if (r.IsMatch(x))
{
    Console.WriteLine("group1 value:" + r.Match(x).Groups["g1"].Value); // output: thing
}
// Groups can also be indexed by name. Use the format (?<groupname>...) to give a group a name.

string x = "Live for nothing nothing";
Regex r = new Regex(@"([a-z]+) \1");
if (r.IsMatch(x))
{
    x = r.Replace(x, "$1");
    Console.WriteLine("var x:" + x); // output: Live for nothing
}
// Deletes the repeated "nothing" from the original string. Outside the expression, use "$1" to reference the first group. The version below references the group by name instead:
string x = "Live for nothing nothing";
Regex r = new Regex(@"(?<g1>[a-z]+) \1");
if (r.IsMatch(x))
{
    x = r.Replace(x, "${g1}");
    Console.WriteLine("var x:" + x); // output: Live for nothing
}

string x = "Live for nothing";
Regex r = new Regex(@"^Live for no(?:[a-z]{5})$");
if (r.IsMatch(x))
{
    Console.WriteLine("group1 value:" + r.Match(x).Groups[1].Value); // output: (empty string)
}
// Adding "?:" at the start of a group marks it as a "non-capturing group", that is, the engine does not save the content matched by this group.

(8) Greedy and non-greedy matching
The regular expression engine is greedy: as long as the pattern allows it, it matches as many characters as possible. Adding "?" after a repetition character (*, +, and so on) switches that quantifier to non-greedy (lazy) matching. See the following example:

Code
string x = "Live for nothing, die for something";
Regex r1 = new Regex(@".*thing");
if (r1.IsMatch(x))
{
    Console.WriteLine("match:" + r1.Match(x).Value); // output: Live for nothing, die for something
}
Regex r2 = new Regex(@".*?thing");
if (r2.IsMatch(x))
{
    Console.WriteLine("match:" + r2.Match(x).Value); // output: Live for nothing
}

(9) Backtracking and non-backtracking
Use the "(?>...)" syntax to match without backtracking. Because the regular expression engine is greedy, in some cases it has to backtrack in order to find a match. See the following example:

Code
string x = "Live for nothing, die for something";
Regex r1 = new Regex(@".*thing,");
if (r1.IsMatch(x))
{
    Console.WriteLine("match:" + r1.Match(x).Value); // output: Live for nothing,
}
Regex r2 = new Regex(@"(?>.*)thing,");
if (r2.IsMatch(x)) // no match
{
    Console.WriteLine("match:" + r2.Match(x).Value);
}
// In r1, ".*" is greedy and first matches all the way to the end of the string, so "thing," cannot match. The engine then backtracks, giving characters back one at a time: at "something" the "thing" matches but the "," fails, so it keeps backtracking until ".*" is "Live for no" and "thing," finally matches.
// In r2, the entire expression fails to match because backtracking into the group is forbidden.

(10) Lookahead and lookbehind (forward and reverse pre-search)
Lookahead (forward pre-search) assertion formats: positive "(?=...)", negative "(?!...)". The assertion itself is not part of the final match result. See the following example:

Code
string x = "1024 used 2048 free";
Regex r1 = new Regex(@"\d{4}(?= used)");
if (r1.Matches(x).Count == 1)
{
    Console.WriteLine("r1 match:" + r1.Match(x).Value); // output: 1024
}
Regex r2 = new Regex(@"\d{4}(?! used)");
if (r2.Matches(x).Count == 1)
{
    Console.WriteLine("r2 match:" + r2.Match(x).Value); // output: 2048
}
// The positive lookahead in r1 requires the four digits to be followed by " used"; the negative lookahead in r2 requires that the four digits not be followed by " used".

Lookbehind (reverse pre-search) assertion formats: positive "(?<=...)", negative "(?<!...)". The assertion itself is not part of the final match result. See the following example:

Code
string x = "used:1024 free:2048";
Regex r1 = new Regex(@"(?<=used:)\d{4}");
if (r1.Matches(x).Count == 1)
{
    Console.WriteLine("r1 match:" + r1.Match(x).Value); // output: 1024
}
Regex r2 = new Regex(@"(?<!used:)\d{4}");
if (r2.Matches(x).Count == 1)
{
    Console.WriteLine("r2 match:" + r2.Match(x).Value); // output: 2048
}
// The positive lookbehind in r1 requires the four digits to be preceded by "used:"; the negative lookbehind in r2 requires that the four digits not be preceded by "used:".

(11) Hexadecimal character ranges
In a regular expression, you can use "\xXX" and "\uXXXX" to denote a character ("X" stands for a hexadecimal digit), which makes it easy to write character ranges:
\xXX  A character whose code is in the range 0 to 255. For example, a space can be written as "\x20".
\uXXXX  Any character can be written as "\u" followed by its 4-digit hexadecimal code. For example, Chinese characters can be matched with "[\u4e00-\u9fa5]".
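
A short sketch of both escapes in use follows. The class name HexEscapeDemo and its methods are assumptions made for this illustration; the test string is the two characters U+4E2D and U+6587, written as escapes so that the source file stays ASCII.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HexEscapeDemo
{
    // True when s is "a", one space (written as \x20), then "b".
    public static bool IsASpaceB(string s) => Regex.IsMatch(s, @"^a\x20b$");

    // True when every character of s lies in the range U+4E00 to U+9FA5.
    public static bool IsAllChinese(string s) => Regex.IsMatch(s, @"^[\u4e00-\u9fa5]+$");

    public static void Main()
    {
        Console.WriteLine(IsASpaceB("a b"));               // True
        Console.WriteLine(IsASpaceB("a\tb"));              // False
        Console.WriteLine(IsAllChinese("\u4e2d\u6587"));   // True
        Console.WriteLine(IsAllChinese("abc"));            // False
    }
}
```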

(12) A relatively complete match for [0,100]
The following is a comprehensive example. For matching [0,100], the special cases to consider include:
* "00" is legal, "00." is legal, "00.00" is legal, and "001.100" is legal
* The empty string is illegal, a lone decimal point is illegal, and a value greater than 100 is illegal
* The value may carry a suffix, for example "1.07f" denotes a float (not considered here)

Code
Regex r = new Regex(@"^\+?0*(?:100(\.0*)?|(\d{0,2}(?=\.\d)|\d{1,2}(?=($|\.$)))(\.\d*)?)$");
string x = "";
while (true)
{
    x = Console.ReadLine();
    if (x != "exit")
    {
        if (r.IsMatch(x))
        {
            Console.WriteLine(x + " succeed!");
        }
        else
        {
            Console.WriteLine(x + " failed!");
        }
    }
    else
    {
        break;
    }
}

(13) Exact matching is sometimes difficult
In some cases it is difficult to achieve an exact match, for example for dates, URLs, and email addresses. Some of these even require studying specialized documents before you can write an accurate and complete expression. In such cases you can only settle for second best and aim for a match that is exact enough for your purposes. For example, for dates you can restrict the expression to a limited time span based on the actual needs of the application, and for email-like matching you can consider only the most common forms.
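
As one possible way to "settle for second best" with dates, the sketch below uses a deliberately limited expression for the shape and leaves calendar rules to DateTime.TryParseExact. Everything specific here is an assumption for illustration: the yyyy-MM-dd format, the 2000-2099 year range, and the names DateCheckDemo, LooksLikeDate, and IsRealDate.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class DateCheckDemo
{
    // Shape check only: yyyy-MM-dd, year limited to 2000-2099 (assumed range).
    private static readonly Regex Shape =
        new Regex(@"^20\d{2}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$");

    public static bool LooksLikeDate(string s) => Shape.IsMatch(s);

    // The regex cannot know month lengths or leap years, so the parser decides.
    public static bool IsRealDate(string s)
    {
        DateTime parsed;
        return LooksLikeDate(s) &&
               DateTime.TryParseExact(s, "yyyy-MM-dd", CultureInfo.InvariantCulture,
                                      DateTimeStyles.None, out parsed);
    }

    public static void Main()
    {
        foreach (string s in new[] { "2018-12-06", "2018-02-30", "1999-12-31" })
        {
            Console.WriteLine(s + ": shape=" + LooksLikeDate(s) + ", real=" + IsRealDate(s));
        }
    }
}
```

Here "2018-02-30" passes the shape check but is rejected by the parser, and "1999-12-31" is rejected by the expression itself because it falls outside the assumed year range.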
