Getting Started with C# Regular Expressions

Reposted from: http://www.cnblogs.com/KissKnife/archive/2008/03/23/1118423.html

Also recommended is an article on regular expressions: http://www.unibetter.com/deerchao/zhengzhe-biaodashi-jiaocheng-se.htm

(1) The "@" symbol
Prefixing a string literal with "@" makes it a "verbatim string", in which "\" is not treated as an escape character. This is easiest to understand through an example; the following two declarations are equivalent:
string x = "D:\\My Huang\\My Doc";
string y = @"D:\My Huang\My Doc";
The following declaration, however, is a compile error, because in a regular string literal "\" introduces an escape sequence (for example, "\n" is a line feed) and "\M" is not a valid escape:
string x = "D:\My Huang\My Doc";
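The equivalence described above can be checked directly. The following is a minimal sketch; the class name VerbatimStringDemo and its members are made up for illustration and are not part of the original article.

```csharp
using System;

static class VerbatimStringDemo
{
    // Regular string literal: every backslash must be doubled.
    public const string Escaped = "D:\\My Huang\\My Doc";

    // Verbatim string literal: backslashes are taken literally.
    public const string Verbatim = @"D:\My Huang\My Doc";

    // Hypothetical helper: true when both literals denote the same text.
    public static bool SameString()
    {
        return Escaped == Verbatim;
    }

    static void Main()
    {
        Console.WriteLine(SameString());      // True
        Console.WriteLine(@"\d" == "\\d");    // True: both are a backslash followed by 'd'
        Console.WriteLine(@"\d".Length);      // 2
    }
}
```

Because regular expression patterns are full of backslashes, verbatim strings are usually the more readable choice for them.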

(2) Basic syntax characters
\d  a digit, 0-9
\D  the complement of \d (the uppercase letter denotes the complement; the same applies below), that is, any non-digit character
\w  a word character: uppercase and lowercase letters, the digits 0-9, and the underscore
\W  the complement of \w
\s  a whitespace character, including line feed \n, carriage return \r, tab \t, vertical tab \v, and form feed \f
\S  the complement of \s
.  any character except the line feed \n
[...]  matches any one of the characters listed inside []
[^...]  matches any one character that is not listed inside []
Some simple examples follow:


string i = "\n";
string m = "3";
Regex r = new Regex(@"\D");
// Same as: Regex r = new Regex("\\D");
// r.IsMatch(i) result: true
// r.IsMatch(m) result: false

string i = "%";
string m = "3";
Regex r = new Regex("[a-z0-9]");
// Matches a lowercase letter or a digit
// r.IsMatch(i) result: false
// r.IsMatch(m) result: true
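The remaining character classes from the list can be exercised the same way. The sketch below is illustrative only; the class CharClassDemo and the helper Has are hypothetical names. It also shows one .NET-specific detail worth knowing: by default \d and \w match Unicode digits and letters, not only ASCII, unless RegexOptions.ECMAScript is specified.

```csharp
using System;
using System.Text.RegularExpressions;

static class CharClassDemo
{
    // Hypothetical helper: does the input contain a match for the pattern?
    public static bool Has(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }

    static void Main()
    {
        Console.WriteLine(Has("_", @"\w"));      // True: the underscore is a word character
        Console.WriteLine(Has("-", @"\w"));      // False
        Console.WriteLine(Has("-", @"\W"));      // True
        Console.WriteLine(Has("\t", @"\s"));     // True: a tab is whitespace
        Console.WriteLine(Has("a", @"\S"));      // True
        Console.WriteLine(Has("\n", "."));       // False: "." does not match a line feed
        Console.WriteLine(Has("x", "[^abc]"));   // True: x is not in the list
        Console.WriteLine(Has("a", "[^abc]"));   // False

        // U+0663 is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit.
        Console.WriteLine(Has("\u0663", @"\d"));                                     // True
        Console.WriteLine(Regex.IsMatch("\u0663", @"\d", RegexOptions.ECMAScript));  // False
    }
}
```

If only ASCII digits should be accepted, writing [0-9] explicitly is a simple alternative to the ECMAScript option.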

(3) Anchors
An "anchor" is a zero-width construct: it stands for a position rather than a character. Intuitively, you can think of an anchor as the tiny gap between two adjacent characters.
^  the characters that follow must be at the start of the string
$  the characters that precede must be at the end of the string (or immediately before a final line feed)
\b  matches a word boundary
\B  matches a position that is not a word boundary
Also available: \A (the characters that follow must be at the start of the string), \z (the characters that precede must be at the very end of the string), and \Z (the characters that precede must be at the end of the string, or immediately before a final line feed).
Some simple examples follow:


string i = "Live for nothing,die for something";
Regex r1 = new Regex("^Live for nothing,die for something$");
// r1.IsMatch(i) true
Regex r2 = new Regex("^Live for nothing,die for some$");
// r2.IsMatch(i) false
Regex r3 = new Regex("^Live for nothing,die for some");
// r3.IsMatch(i) true

string i = @"Live for nothing,
die for something"; // multiple lines; the counts below assume the line break is "\r\n"
Regex r1 = new Regex("^Live for nothing,die for something$");
Console.WriteLine("r1 match count:" + r1.Matches(i).Count); // 0
Regex r2 = new Regex("^Live for nothing,die for something$", RegexOptions.Multiline);
Console.WriteLine("r2 match count:" + r2.Matches(i).Count); // 0
Regex r3 = new Regex("^Live for nothing,\r\ndie for something$");
Console.WriteLine("r3 match count:" + r3.Matches(i).Count); // 1
Regex r4 = new Regex("^Live for nothing,$");
Console.WriteLine("r4 match count:" + r4.Matches(i).Count); // 0
Regex r5 = new Regex("^Live for nothing,$", RegexOptions.Multiline);
Console.WriteLine("r5 match count:" + r5.Matches(i).Count); // 0
Regex r6 = new Regex("^Live for nothing,\r\n$");
Console.WriteLine("r6 match count:" + r6.Matches(i).Count); // 0
Regex r7 = new Regex("^Live for nothing,\r\n$", RegexOptions.Multiline);
Console.WriteLine("r7 match count:" + r7.Matches(i).Count); // 0
Regex r8 = new Regex("^Live for nothing,\r$");
Console.WriteLine("r8 match count:" + r8.Matches(i).Count); // 0
Regex r9 = new Regex("^Live for nothing,\r$", RegexOptions.Multiline);
Console.WriteLine("r9 match count:" + r9.Matches(i).Count); // 1
Regex r10 = new Regex("^die for something$");
Console.WriteLine("r10 match count:" + r10.Matches(i).Count); // 0
Regex r11 = new Regex("^die for something$", RegexOptions.Multiline);
Console.WriteLine("r11 match count:" + r11.Matches(i).Count); // 1
Regex r12 = new Regex("^");
Console.WriteLine("r12 match count:" + r12.Matches(i).Count); // 1
Regex r13 = new Regex("$");
Console.WriteLine("r13 match count:" + r13.Matches(i).Count); // 1
Regex r14 = new Regex("^", RegexOptions.Multiline);
Console.WriteLine("r14 match count:" + r14.Matches(i).Count); // 2
Regex r15 = new Regex("$", RegexOptions.Multiline);
Console.WriteLine("r15 match count:" + r15.Matches(i).Count); // 2
Regex r16 = new Regex("^Live for nothing,\r$\n^die for something$", RegexOptions.Multiline);
Console.WriteLine("r16 match count:" + r16.Matches(i).Count); // 1
For a multiline string, once the Multiline option is set, ^ and $ can match at more than one position: ^ also matches right after each "\n", and $ also matches right before each "\n". The "\r" is an ordinary character to the engine, which is why r5 finds nothing while r9 succeeds.

string i = "Live for nothing,die for something";
string m = "Live for nothing,die for some thing";
Regex r1 = new Regex(@"\bthing\b");
Console.WriteLine("r1 match count:" + r1.Matches(i).Count); // 0
Regex r2 = new Regex(@"thing\b");
Console.WriteLine("r2 match count:" + r2.Matches(i).Count); // 2
Regex r3 = new Regex(@"\bthing\b");
Console.WriteLine("r3 match count:" + r3.Matches(m).Count); // 1
Regex r4 = new Regex(@"\bfor something\b");
Console.WriteLine("r4 match count:" + r4.Matches(i).Count); // 1
\b is typically used to restrict a match to a complete word.
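The examples above do not cover \A, \Z, and \z. The following sketch contrasts them with ^ and $; the class StringAnchorDemo, the helper Has, and the sample strings are assumptions made up for illustration. The main point is that \A, \Z, and \z ignore the Multiline option, whereas ^ and $ change meaning under it.

```csharp
using System;
using System.Text.RegularExpressions;

static class StringAnchorDemo
{
    // Hypothetical helper wrapping Regex.IsMatch with explicit options.
    public static bool Has(string input, string pattern, RegexOptions options)
    {
        return Regex.IsMatch(input, pattern, options);
    }

    static void Main()
    {
        string oneLine = "Live for nothing\n";   // ends with a line feed
        string twoLines = "first\nsecond";

        Console.WriteLine(Has(oneLine, @"nothing\Z", RegexOptions.None));  // True: \Z tolerates a final \n
        Console.WriteLine(Has(oneLine, @"nothing\z", RegexOptions.None));  // False: \z is the very end only
        Console.WriteLine(Has(oneLine, @"nothing$", RegexOptions.None));   // True: behaves like \Z here

        Console.WriteLine(Has(twoLines, @"^second", RegexOptions.Multiline));   // True: ^ matches after \n
        Console.WriteLine(Has(twoLines, @"\Asecond", RegexOptions.Multiline));  // False: \A is string start only
        Console.WriteLine(Has(twoLines, @"first$", RegexOptions.Multiline));    // True: $ matches before \n
        Console.WriteLine(Has(twoLines, @"first\Z", RegexOptions.Multiline));   // False: the \n is not final
    }
}
```

When validating a whole input value, \A and \z are a reasonable choice because they cannot be satisfied by a position in the middle of the string or by a trailing line feed.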

(4) Quantifiers (repetition characters)
Quantifiers are one of the places where C# regular expressions are "very good and very powerful":
{n}  matches the preceding element exactly n times
{n,}  matches the preceding element n or more times
{n,m}  matches the preceding element n to m times
?  matches the preceding element 0 or 1 time
+  matches the preceding element 1 or more times
*  matches the preceding element 0 or more times
Some simple examples follow:


string x = "1024";
string y = "+1024";
string z = "1,024";
string a = "1";
string b = "-1024";
string c = "10000";
Regex r = new Regex(@"^\+?[1-9],?\d{3}$");
Console.WriteLine("x match count:" + r.Matches(x).Count); // 1
Console.WriteLine("y match count:" + r.Matches(y).Count); // 1
Console.WriteLine("z match count:" + r.Matches(z).Count); // 1
Console.WriteLine("a match count:" + r.Matches(a).Count); // 0
Console.WriteLine("b match count:" + r.Matches(b).Count); // 0
Console.WriteLine("c match count:" + r.Matches(c).Count); // 0
Matches an integer from 1000 to 9999, with an optional leading "+" and an optional thousands separator.
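The example above uses only "?" and "{n}". As a rough illustration of the other quantifiers, the sketch below tests each one against whole strings; QuantifierDemo and its helper Full are hypothetical names, and the sample inputs are invented. It also shows that a quantifier applies to the single element before it, which may be a parenthesized group.

```csharp
using System;
using System.Text.RegularExpressions;

static class QuantifierDemo
{
    // Hypothetical helper: true when the entire input matches the pattern body.
    public static bool Full(string input, string body)
    {
        return Regex.IsMatch(input, "^(?:" + body + ")$");
    }

    static void Main()
    {
        Console.WriteLine(Full("123", @"\d{3}"));      // True: exactly three digits
        Console.WriteLine(Full("12", @"\d{3}"));       // False
        Console.WriteLine(Full("12", @"\d{2,}"));      // True: two or more
        Console.WriteLine(Full("12345", @"\d{2,4}"));  // False: more than four
        Console.WriteLine(Full("ac", "ab*c"));         // True: zero b's is allowed
        Console.WriteLine(Full("ac", "ab+c"));         // False: at least one b is required
        Console.WriteLine(Full("color", "colou?r"));   // True: the u is optional

        // The quantifier binds to the preceding element only.
        Console.WriteLine(Full("abb", "ab{2}"));       // True: {2} applies to b
        Console.WriteLine(Full("abab", "(ab){2}"));    // True: {2} applies to the group
        Console.WriteLine(Full("abb", "(ab){2}"));     // False
    }
}
```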

(5) Alternation
The (|) construct in C# regular expressions is usually called alternation; here we also refer to it as "choose one to match". In fact, a character class such as [a-z] is also a kind of alternation, except that it can only choose among single characters, whereas (|) covers a wider range: (ab|xy) means match "ab" or match "xy". Note that "|" and "()" work together as a unit here. Some simple examples follow:


string x = "0";
string y = "0.23";
string z = "100";
string a = "100.01";
string b = "9.9";
string c = "99.9";
string d = "99.";
string e = "00.1";
Regex r = new Regex(@"^\+?((100(\.0+)?)|([1-9]?[0-9])(\.\d+)?)$");
Console.WriteLine("x match count:" + r.Matches(x).Count); // 1
Console.WriteLine("y match count:" + r.Matches(y).Count); // 1
Console.WriteLine("z match count:" + r.Matches(z).Count); // 1
Console.WriteLine("a match count:" + r.Matches(a).Count); // 0
Console.WriteLine("b match count:" + r.Matches(b).Count); // 1
Console.WriteLine("c match count:" + r.Matches(c).Count); // 1
Console.WriteLine("d match count:" + r.Matches(d).Count); // 0
Console.WriteLine("e match count:" + r.Matches(e).Count); // 0
Matches a number from 0 to 100. The outermost parentheses contain two alternatives, "(100(\.0+)?)" and "([1-9]?[0-9])(\.\d+)?", joined by "|": the regular expression engine first tries to match 100, and if that fails, it tries the second alternative (which represents a number in the range [0,100)). Note that the decimal point must be escaped as "\."; an unescaped "." would match any character.
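The remark that "|" and "()" belong together can be made concrete. In the sketch below (AlternationDemo and Has are hypothetical names), leaving out the parentheses lets "|" split the entire pattern, anchors included, which is rarely what is intended.

```csharp
using System;
using System.Text.RegularExpressions;

static class AlternationDemo
{
    // Hypothetical helper: does the input contain a match for the pattern?
    public static bool Has(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }

    static void Main()
    {
        // Without parentheses the pattern means: "^ab" or "xy$".
        Console.WriteLine(Has("abc", "^ab|xy$"));    // True: "^ab" alone is satisfied
        // With parentheses both anchors apply to either alternative.
        Console.WriteLine(Has("abc", "^(ab|xy)$"));  // False
        Console.WriteLine(Has("xy", "^(ab|xy)$"));   // True

        // Alternatives are tried from left to right; the first one that succeeds is used.
        Console.WriteLine(Regex.Match("something", "some|something").Value);  // some
    }
}
```

Because of the left-to-right order, it is usually safer to list the longer or more specific alternative first when one alternative is a prefix of another.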

(6) Matching special characters
Some simple examples follow:


string x = "\\";
Regex r1 = new Regex("^\\\\$");
Console.WriteLine("r1 match count:" + r1.Matches(x).Count); // 1
Regex r2 = new Regex(@"^\\$");
Console.WriteLine("r2 match count:" + r2.Matches(x).Count); // 1
Regex r3 = new Regex("^\\$"); // the pattern is ^\$, which looks for a literal "$"
Console.WriteLine("r3 match count:" + r3.Matches(x).Count); // 0
The above matches "\". The backslash has to be escaped once for the regular expression and, in a regular string literal, once more for C#.

string x = "\"";
Regex r1 = new Regex("^\"$");
Console.WriteLine("r1 match count:" + r1.Matches(x).Count); // 1
Regex r2 = new Regex(@"^""$");
Console.WriteLine("r2 match count:" + r2.Matches(x).Count); // 1
The above matches a double quote. In a verbatim string, a double quote is written as "".

(7) Groups and non-capturing groups
Here are a few simple examples:


string x = "Live for nothing,die for something";
string y = "Live for nothing,die for somebody";
Regex r = new Regex(@"^Live ([a-z]{3}) no([a-z]{5}),die \1 some\2$");
Console.WriteLine("x match count:" + r.Matches(x).Count); // 1
Console.WriteLine("y match count:" + r.Matches(y).Count); // 0
The regular expression engine remembers whatever is matched inside "()" as a "group", which can then be referenced by index. "\1" in the expression is a backreference to the first group, that is, to the text matched by the first pair of parentheses ("for"); "\2" refers to the second group in the same way ("thing").

string x = "Live for nothing,die for something";
Regex r = new Regex(@"^Live for no([a-z]{5}),die for some\1$");
if (r.IsMatch(x))
{
    Console.WriteLine("group1 value:" + r.Match(x).Groups[1].Value); // output: thing
}
Gets the content of a group. Note that the index is Groups[1], because Groups[0] is the entire matched string, which here is the full content of the variable x.

string x = "Live for nothing,die for something";
Regex r = new Regex(@"^Live for no(?<g1>[a-z]{5}),die for some\1$");
if (r.IsMatch(x))
{
    Console.WriteLine("group1 value:" + r.Match(x).Groups["g1"].Value); // output: thing
}
Groups can also be accessed by name. Use the following syntax to give a group a name: (?<groupname>...).

string x = "Live for nothing nothing";
Regex r = new Regex(@"([a-z]+) \1");
if (r.IsMatch(x))
{
    x = r.Replace(x, "$1");
    Console.WriteLine("var x:" + x); // output: Live for nothing
}
Deletes the duplicated "nothing" in the original string. Outside the expression, in the replacement string, "$1" refers to the first group. The version below refers to the group by name instead:
string x = "Live for nothing nothing";
Regex r = new Regex(@"(?<g1>[a-z]+) \1");
if (r.IsMatch(x))
{
    x = r.Replace(x, "${g1}");
    Console.WriteLine("var x:" + x); // output: Live for nothing
}

string x = "Live for nothing";
Regex r = new Regex(@"^Live for no(?:[a-z]{5})$");
if (r.IsMatch(x))
{
    Console.WriteLine("group1 value:" + r.Match(x).Groups[1].Value); // output: (empty)
}
Starting a group with "?:" makes it a "non-capturing group", that is, the engine does not save the content the group matched.

(8) Greedy and non-greedy matching
The regular expression engine is greedy by default: as long as the pattern allows it, a quantifier matches as many characters as possible. You can make a quantifier ("repeat description character") such as * or + non-greedy by appending "?" to it. Take a look at the following example:


string x = "Live for nothing,die for something";
Regex r1 = new Regex(@".*thing");
if (r1.IsMatch(x))
{
    Console.WriteLine("match:" + r1.Match(x).Value);
    // output: Live for nothing,die for something
}
Regex r2 = new Regex(@".*?thing");
if (r2.IsMatch(x))
{
    Console.WriteLine("match:" + r2.Match(x).Value);
    // output: Live for nothing
}

(9) Backtracking and non-backtracking
Use "(?>...)" to declare a non-backtracking (atomic) group. Because quantifiers are greedy, the engine sometimes has to backtrack in order to find a match. Consider the following example:


string x = "Live for nothing,die for something";
Regex r1 = new Regex(@".*thing,");
if (r1.IsMatch(x))
{
    Console.WriteLine("match:" + r1.Match(x).Value); // output: Live for nothing,
}
Regex r2 = new Regex(@"(?>.*)thing,");
if (r2.IsMatch(x)) // no match
{
    Console.WriteLine("match:" + r2.Match(x).Value);
}
In r1, ".*" is greedy and first consumes everything up to the end of the string. The engine then backtracks until "thing" can match (in "something"), but the following "," fails there, so it keeps backtracking and finally succeeds at "nothing,".
In r2, backtracking into the group is forbidden, so once ".*" has consumed the rest of the string nothing is left for "thing," and the whole expression fails to match.

(10) Lookahead and lookbehind
Lookahead syntax: positive "(?=...)", negative "(?!...)".
Lookbehind syntax: positive "(?<=...)", negative "(?<!...)".
In both cases the assertion itself is not part of the final match result. See the following lookahead example:


string x = "1024 used 2048 free";
Regex r1 = new Regex(@"\d{4}(?= used)");
if (r1.Matches(x).Count == 1)
{
    Console.WriteLine("r1 match:" + r1.Match(x).Value); // output: 1024
}
Regex r2 = new Regex(@"\d{4}(?! used)");
if (r2.Matches(x).Count == 1)
{
    Console.WriteLine("r2 match:" + r2.Match(x).Value); // output: 2048
}
The positive lookahead in r1 requires the four-digit number to be immediately followed by " used"; the negative lookahead in r2 requires that the four-digit number not be followed by " used".
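Lookbehind works the same way in the other direction: it checks what comes immediately before the current position. The following sketch applies it to a string like "1024 used 2048 free"; the class LookbehindDemo and the helper First are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookbehindDemo
{
    // Hypothetical helper: the text of the first match, or "" if there is none.
    public static string First(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    static void Main()
    {
        string x = "1024 used 2048 free";

        // Positive lookbehind: four digits that are preceded by "used ".
        Console.WriteLine(First(x, @"(?<=used )\d{4}"));   // 2048

        // Negative lookbehind: four digits that are not preceded by "used ".
        Console.WriteLine(First(x, @"(?<!used )\d{4}"));   // 1024

        // The asserted text is not part of the match, and only one number qualifies.
        Console.WriteLine(Regex.Matches(x, @"(?<!used )\d{4}").Count);  // 1
    }
}
```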

(11) Hexadecimal character ranges
In regular expressions, you can use "\xXX" and "\uXXXX" to denote a single character ("X" stands for a hexadecimal digit), and such escapes can also serve as the endpoints of a character range:
\xXX  a character whose code is in the range 0 to 255; for example, a space can be written as "\x20".
\uXXXX  any character (UTF-16 code unit) can be written as "\u" followed by its 4-digit hexadecimal code; for example, Chinese characters can be matched with "[\u4e00-\u9fa5]".
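A short sketch of both escape forms follows. HexEscapeDemo and ContainsCjk are hypothetical names; the range [\u4e00-\u9fa5] is taken from the text above and should be treated as covering the common CJK ideographs rather than every Chinese character in Unicode.

```csharp
using System;
using System.Text.RegularExpressions;

static class HexEscapeDemo
{
    // Hypothetical helper: true if the text contains a character in \u4e00-\u9fa5.
    public static bool ContainsCjk(string s)
    {
        return Regex.IsMatch(s, @"[\u4e00-\u9fa5]");
    }

    static void Main()
    {
        Console.WriteLine(Regex.IsMatch(" ", @"^\x20$"));         // True: \x20 is a space
        Console.WriteLine(Regex.IsMatch("A", @"^\u0041$"));       // True: \u0041 is 'A'
        Console.WriteLine(Regex.IsMatch("b", @"^[\x61-\x63]$"));  // True: the range a-c

        // "\u4e2d\u6587" is the C# escape for the two characters of the word "Chinese".
        Console.WriteLine(ContainsCjk("C# \u4e2d\u6587"));        // True
        Console.WriteLine(ContainsCjk("C# regex"));               // False
    }
}
```

Note that "\x" takes exactly two hexadecimal digits and "\u" exactly four in .NET patterns.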


(12) A more complete match for [0,100]
The following is a more comprehensive example that matches numbers in [0,100]. The special cases to consider include:
* "00" is valid, "00." is valid, "00.00" is valid, "001.100" is valid
* The empty string is invalid, a lone decimal point is invalid, anything greater than 100 is invalid
* Numeric suffixes, such as "1.07f" meaning that the value is a float (not considered here)


Regex r = new Regex(@"^\+?0*(?:100(\.0*)?|(\d{0,2}(?=\.\d)|\d{1,2}(?=($|\.$)))(\.\d*)?)$");
string x = "";
while (true)
{
    x = Console.ReadLine();
    // ReadLine returns null at end of input; treat that like "exit"
    if (x != null && x != "exit")
    {
        if (r.IsMatch(x))
        {
            Console.WriteLine(x + " succeed!");
        }
        else
        {
            Console.WriteLine(x + " failed!");
        }
    }
    else
    {
        break;
    }
}

(13) Exact matching is sometimes difficult
Some requirements are hard to match exactly, such as dates, URLs, and email addresses; for some of them you even need to study the relevant specifications to write an accurate and complete expression. In such cases you can only settle for second best and aim for a reasonably accurate match. For example, for dates you can restrict the pattern to a limited time span based on what the application actually needs, and for email addresses you can handle only the most common forms.
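One way to "settle for second best" with dates is sketched below, under assumptions that are not from the original article: the input format is yyyy-MM-dd, only the years 1900-2099 matter, and the names DateCheckDemo, LooksLikeDate, and IsRealDate are invented. The regular expression checks only the shape; the calendar rules (month lengths, leap years) are left to DateTime.TryParseExact instead of being encoded in the pattern.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class DateCheckDemo
{
    // Shape check only: year 1900-2099, month 01-12, day 01-31.
    static readonly Regex Shape =
        new Regex(@"^(19|20)\d{2}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$");

    public static bool LooksLikeDate(string s)
    {
        return Shape.IsMatch(s);
    }

    // Shape check plus a real calendar check done by the framework.
    public static bool IsRealDate(string s)
    {
        DateTime ignored;
        return LooksLikeDate(s)
            && DateTime.TryParseExact(s, "yyyy-MM-dd", CultureInfo.InvariantCulture,
                                      DateTimeStyles.None, out ignored);
    }

    static void Main()
    {
        Console.WriteLine(LooksLikeDate("2008-03-23"));  // True
        Console.WriteLine(LooksLikeDate("2008-13-01"));  // False: there is no month 13
        Console.WriteLine(LooksLikeDate("2008-02-30"));  // True: the shape is fine
        Console.WriteLine(IsRealDate("2008-02-30"));     // False: February has no 30th day
        Console.WriteLine(IsRealDate("2008-03-23"));     // True
    }
}
```

Splitting the work this way keeps the pattern short and readable, at the cost of a second validation step.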
