C# Tutorial: C# Regular Expressions

C# Regular Expressions

A regular expression is a pattern that is matched against input text. The .NET Framework provides a regular expression engine that performs this matching. A pattern consists of one or more character literals, operators, or constructs.
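As a minimal sketch of matching a pattern against input text, the snippet below calls the static Regex.IsMatch method. The class name FirstMatchDemo, the helper ContainsDigits, and the sample strings are made up for illustration and are not part of the original tutorial.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstMatchDemo
{
    // Hypothetical helper: true when the input contains at least one run of digits.
    public static bool ContainsDigits(string input)
    {
        return Regex.IsMatch(input, @"\d+");
    }

    public static void Main()
    {
        Console.WriteLine(ContainsDigits("Room 101")); // True
        Console.WriteLine(ContainsDigits("Lobby"));    // False
    }
}
```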

Defining Regular Expressions

The following list gives the categories of characters, operators, and constructs used to define regular expressions:

Character escapes

Character classes

Anchors

Grouping constructs

Quantifiers

Backreference constructs

Alternation constructs

Substitutions

Miscellaneous constructs

Character escapes

The backslash character (\) in a regular expression indicates either that the character that follows it is a special character, or that the character should be interpreted literally.

The following entries list the escape characters, each with a description, an example pattern, and the text that the pattern matches:

\a
  Description: Matches the bell (alarm) character, \u0007.
  Pattern: \a
  Matches: "\u0007" in "Warning!" + '\u0007'

\b
  Description: In a character class, matches the backspace character, \u0008.
  Pattern: [\b]{3,}
  Matches: "\b\b\b\b" in "\b\b\b\b"

\t
  Description: Matches a tab, \u0009.
  Pattern: (\w+)\t
  Matches: "Name\t" and "Addr\t" in "Name\tAddr\t"

\r
  Description: Matches a carriage return, \u000D. (\r is not equivalent to the newline character, \n.)
  Pattern: \r\n(\w+)
  Matches: "\r\nHello" in "\r\nHello\nWorld."

\v
  Description: Matches a vertical tab, \u000B.
  Pattern: [\v]{2,}
  Matches: "\v\v\v" in "\v\v\v"

\f
  Description: Matches a form feed, \u000C.
  Pattern: [\f]{2,}
  Matches: "\f\f\f" in "\f\f\f"

\n
  Description: Matches a newline, \u000A.
  Pattern: \r\n(\w+)
  Matches: "\r\nHello" in "\r\nHello\nWorld."

\e
  Description: Matches the escape character, \u001B.
  Pattern: \e
  Matches: "\x001B" in "\x001B"

\nnn
  Description: Specifies a character by its octal representation (nnn consists of two or three digits).
  Pattern: \w\040\w
  Matches: "a b" and "c d" in "a bc d"

\xnn
  Description: Specifies a character by its hexadecimal representation (nn consists of exactly two digits).
  Pattern: \w\x20\w
  Matches: "a b" and "c d" in "a bc d"

\cX, \cx
  Description: Matches the ASCII control character specified by X or x, where X or x is the letter of the control character.
  Pattern: \cC
  Matches: "\x0003" in "\x0003" (Ctrl-C)

\unnnn
  Description: Matches a Unicode character by its hexadecimal representation (exactly four digits, represented by nnnn).
  Pattern: \w\u0020\w
  Matches: "a b" and "c d" in "a bc d"

\
  Description: When followed by a character that is not recognized as an escaped character, matches that character.
  Pattern: \d+[\+-x\*]\d+
  Matches: "2+2" and "3*9" in "(2+2) * 3*9"
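The escapes above can be tried out with a short program. This is only a sketch: the class name EscapeDemo and the helper FindAll are hypothetical names introduced here, while Regex.Matches and Regex.IsMatch are the standard .NET calls.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] FindAll(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        // \040 (octal), \x20 (hexadecimal) and \u0020 (Unicode) all denote a space.
        Console.WriteLine(string.Join("|", FindAll("a bc d", @"\w\040\w")));   // a b|c d
        Console.WriteLine(string.Join("|", FindAll("a bc d", @"\w\x20\w")));   // a b|c d
        Console.WriteLine(string.Join("|", FindAll("a bc d", @"\w\u0020\w"))); // a b|c d

        // The input is a regular C# string, so \t in it is a real tab character.
        Console.WriteLine(FindAll("Name\tAddr\t", @"(\w+)\t").Length);         // 2

        // A backslash before a metacharacter such as * makes it literal.
        Console.WriteLine(Regex.IsMatch("3*9", @"\d\*\d"));                    // True
    }
}
```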

Character classes

A character class matches any one of a set of characters.

The following entries list the character classes, each with a description, an example pattern, and the text that the pattern matches:

[character_group]
  Description: Matches any single character in character_group. By default, the match is case-sensitive.
  Pattern: [mn]
  Matches: "m" in "mat"; "m" and "n" in "moon"

[^character_group]
  Description: Negation: matches any single character that is not in character_group. By default, characters in character_group are case-sensitive.
  Pattern: [^aei]
  Matches: "v" and "l" in "avail"

[first-last]
  Description: Character range: matches any single character in the range from first to last.
  Pattern: [A-Z]
  Matches: "A" and "B" in "AB123"

.
  Description: Wildcard: matches any single character except \n. To match a literal period character (. or \u002E), precede it with the escape character (\.).
  Pattern: a.e
  Matches: "ave" in "have"; "ate" in "mate"

\p{name}
  Description: Matches any single character in the Unicode general category or named block specified by name.
  Pattern: \p{Lu}
  Matches: "C" and "L" in "City Lights"

\P{name}
  Description: Matches any single character that is not in the Unicode general category or named block specified by name.
  Pattern: \P{Lu}
  Matches: "i", "t" and "y" in "City"

\w
  Description: Matches any word character.
  Pattern: \w
  Matches: "R", "o", "m" and "1" in "Room#1"

\W
  Description: Matches any non-word character.
  Pattern: \W
  Matches: "#" in "Room#1"

\s
  Description: Matches any white-space character.
  Pattern: \w\s
  Matches: "D " in "ID A1.3"

\S
  Description: Matches any non-white-space character.
  Pattern: \s\S
  Matches: " _" in "int __ctr"

\d
  Description: Matches any decimal digit.
  Pattern: \d
  Matches: "4" in "4 = IV"

\D
  Description: Matches any character other than a decimal digit.
  Pattern: \D
  Matches: " ", "=", " ", "I" and "V" in "4 = IV"
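A few of the character classes above, run against the same sample inputs, might look like the following sketch. ClassDemo and FindAll are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ClassDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] FindAll(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join("|", FindAll("Room#1", @"\w")));           // R|o|o|m|1
        Console.WriteLine(string.Join("|", FindAll("Room#1", @"\W")));           // #
        Console.WriteLine(string.Join("|", FindAll("avail", @"[^aei]")));        // v|l
        Console.WriteLine(string.Join("|", FindAll("City Lights", @"\p{Lu}")));  // C|L
        Console.WriteLine(string.Join("|", FindAll("4 = IV", @"\d")));           // 4
    }
}
```

Note that \w reports "o" twice for "Room#1", because every matching character is returned as a separate match.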

Anchors

Anchors, or atomic zero-width assertions, cause a match to succeed or fail depending on the current position in the string, but they do not cause the engine to advance through the string or consume characters.

The following entries list the anchors, each with a description, an example pattern, and the text that the pattern matches:

^
  Description: The match must start at the beginning of the string or line.
  Pattern: ^\d{3}
  Matches: "567" in "567-777-"

$
  Description: The match must occur at the end of the string, or before \n at the end of the line or string.
  Pattern: -\d{4}$
  Matches: "-2012" in "8-12-2012"

\A
  Description: The match must occur at the start of the string.
  Pattern: \A\w{3}
  Matches: "Cod" in "Code-007-"

\Z
  Description: The match must occur at the end of the string, or before \n at the end of the string.
  Pattern: -\d{3}\Z
  Matches: "-007" in "Bond-901-007"

\z
  Description: The match must occur at the end of the string.
  Pattern: -\d{3}\z
  Matches: "-333" in "-901-333"

\G
  Description: The match must occur at the point where the previous match ended.
  Pattern: \G\(\d\)
  Matches: "(1)", "(3)" and "(5)" in "(1)(3)(5)[7](9)"

\b
  Description: The match must occur on a boundary between a \w (alphanumeric) character and a \W (non-alphanumeric) character.
  Pattern: \b\w+\s\w+\b
  Matches: "them theme" and "them them" in "them theme them them"

\B
  Description: The match must not occur on a \b boundary.
  Pattern: \Bend\w*\b
  Matches: "ends" and "ender" in "end sends endure lender"
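The difference between anchors is easiest to see side by side. The sketch below reuses the sample inputs from the entries above and adds one case with a trailing newline to contrast $ with \z; AnchorDemo and FindAll are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] FindAll(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join("|", FindAll("567-777-", @"^\d{3}")));                    // 567
        Console.WriteLine(string.Join("|", FindAll("end sends endure lender", @"\Bend\w*\b"))); // ends|ender

        // \G requires each match to start where the previous one ended,
        // so matching stops at "[7]" and "(9)" is never reached.
        Console.WriteLine(string.Join("|", FindAll("(1)(3)(5)[7](9)", @"\G\(\d\)")));           // (1)|(3)|(5)

        // $ tolerates a single trailing newline; \z does not.
        Console.WriteLine(Regex.IsMatch("8-12-2012\n", @"-\d{4}$"));   // True
        Console.WriteLine(Regex.IsMatch("8-12-2012\n", @"-\d{4}\z"));  // False
    }
}
```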

Grouping constructs

Grouping constructs delineate subexpressions of a regular expression and are typically used to capture substrings of an input string.

The following entries list the grouping constructs, each with a description, an example pattern, and the text that the pattern matches:

(subexpression)
  Description: Captures the matched subexpression and assigns it a one-based ordinal number (group 0 is the entire match).
  Pattern: (\w)\1
  Matches: "ee" in "deep"

(?<name>subexpression)
  Description: Captures the matched subexpression into a named group.
  Pattern: (?<double>\w)\k<double>
  Matches: "ee" in "deep"

(?<name1-name2>subexpression)
  Description: Defines a balancing group definition.
  Pattern: (((?'Open'\()[^\(\)]*)+((?'Close-Open'\))[^\(\)]*)+)*(?(Open)(?!))$
  Matches: "((1-3)*(3-1))" in "3+2^((1-3)*(3-1))"

(?:subexpression)
  Description: Defines a noncapturing group.
  Pattern: Write(?:Line)?
  Matches: "WriteLine" in "Console.WriteLine()"

(?imnsx-imnsx:subexpression)
  Description: Applies or disables the specified options within subexpression.
  Pattern: A\d{2}(?i:\w+)\b
  Matches: "A12xl" and "A12XL" in "A12xl A12XL a12xl"

(?=subexpression)
  Description: Zero-width positive lookahead assertion.
  Pattern: \w+(?=\.)
  Matches: "is", "ran" and "out" in "He is. The dog ran. The sun is out."

(?!subexpression)
  Description: Zero-width negative lookahead assertion.
  Pattern: \b(?!un)\w+\b
  Matches: "sure" and "used" in "unsure sure unity used"

(?<=subexpression)
  Description: Zero-width positive lookbehind assertion.
  Pattern: (?<=19)\d{2}\b
  Matches: "99", "50" and "05" in "1851 1999 1950 1905 2003"

(?<!subexpression)
  Description: Zero-width negative lookbehind assertion.
  Pattern: (?<!19)\d{2}\b
  Matches: "51" and "03" in "1851 1999 1950 1905 2003"

(?>subexpression)
  Description: Nonbacktracking (also known as "greedy" or atomic) subexpression.
  Pattern: [13579](?>A+B+)
  Matches: "1ABB", "3ABB" and "5AB" in "1ABB 3ABBC 5AB 5AC"
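To illustrate how captured groups and lookaround assertions behave in code, here is a small sketch. GroupDemo, FindAll, and DoubledChar are hypothetical names introduced for this example; the patterns and inputs come from the entries above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] FindAll(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();
    }

    // Hypothetical helper: returns the character captured by the named group
    // "double" in the first doubled-character match, or null if there is none.
    public static string DoubledChar(string input)
    {
        Match m = Regex.Match(input, @"(?<double>\w)\k<double>");
        return m.Success ? m.Groups["double"].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(DoubledChar("deep"));                                                     // e
        Console.WriteLine(string.Join("|", FindAll("1851 1999 1950 1905 2003", @"(?<=19)\d{2}\b"))); // 99|50|05
        Console.WriteLine(string.Join("|", FindAll("1851 1999 1950 1905 2003", @"(?<!19)\d{2}\b"))); // 51|03
        Console.WriteLine(string.Join("|", FindAll("unsure sure unity used", @"\b(?!un)\w+\b")));    // sure|used
        Console.WriteLine(string.Join("|", FindAll("1ABB 3ABBC 5AB 5AC", @"[13579](?>A+B+)")));      // 1ABB|3ABB|5AB
    }
}
```

The lookbehind matches return only the two digits, not the "19" or "18" before them, because assertions are zero-width and consume no characters.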

Quantifiers

A quantifier specifies how many instances of the previous element (which can be a character, a group, or a character class) must be present in the input string for a match to occur.

The following entries list the quantifiers, each with a description, an example pattern, and the text that the pattern matches:

*
  Description: Matches the previous element zero or more times.
  Pattern: \d*\.\d
  Matches: ".0", "19.9", "219.9"

+
  Description: Matches the previous element one or more times.
  Pattern: be+
  Matches: "bee" in "been"; "be" in "bent"

?
  Description: Matches the previous element zero or one time.
  Pattern: rai?n
  Matches: "ran", "rain"

{n}
  Description: Matches the previous element exactly n times.
  Pattern: ,\d{3}
  Matches: ",043" in "1,043.6"; ",876", ",543" and ",210" in "9,876,543,210"

{n,}
  Description: Matches the previous element at least n times.
  Pattern: \d{2,}
  Matches: "166", "29", "1930"

{n,m}
  Description: Matches the previous element at least n times, but no more than m times.
  Pattern: \d{3,5}
  Matches: "166", "17668"; "19302" in "193024"

*?
  Description: Matches the previous element zero or more times, but as few times as possible.
  Pattern: \d*?\.\d
  Matches: ".0", "19.9", "219.9"

+?
  Description: Matches the previous element one or more times, but as few times as possible.
  Pattern: be+?
  Matches: "be" in "been"; "be" in "bent"

??
  Description: Matches the previous element zero or one time, but as few times as possible.
  Pattern: rai??n
  Matches: "ran", "rain"

{n}?
  Description: Matches the previous element exactly n times.
  Pattern: ,\d{3}?
  Matches: ",043" in "1,043.6"; ",876", ",543" and ",210" in "9,876,543,210"

{n,}?
  Description: Matches the previous element at least n times, but as few times as possible.
  Pattern: \d{2,}?
  Matches: "16" in "166"; "29"; "19" and "30" in "1930"

{n,m}?
  Description: Matches the previous element between n and m times, but as few times as possible.
  Pattern: \d{3,5}?
  Matches: "166"; "176" in "17668"; "193" and "024" in "193024"
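The difference between a greedy quantifier and its lazy counterpart (the same quantifier followed by ?) shows up when both are run on the same input. The following is a sketch only; QuantifierDemo and FindAll are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] FindAll(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        // Greedy + takes as many "e" characters as it can; lazy +? takes only one.
        Console.WriteLine(string.Join("|", FindAll("been", "be+")));    // bee
        Console.WriteLine(string.Join("|", FindAll("been", "be+?")));   // be

        // Greedy {3,5} takes five digits; lazy {3,5}? stops at three, leaving room for a second match.
        Console.WriteLine(string.Join("|", FindAll("193024", @"\d{3,5}")));   // 19302
        Console.WriteLine(string.Join("|", FindAll("193024", @"\d{3,5}?")));  // 193|024

        // ? makes the "i" optional.
        Console.WriteLine(string.Join("|", FindAll("ran rain", "rai?n")));    // ran|rain
    }
}
```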

Backreference constructs

A backreference allows a previously matched subexpression to be identified later in the same regular expression.

The following entries list the backreference constructs, each with a description, an example pattern, and the text that the pattern matches:

\number
  Description: Backreference. Matches the value of a numbered subexpression.
  Pattern: (\w)\1
  Matches: "ee" in "seek"

\k<name>
  Description: Named backreference. Matches the value of a named expression.
  Pattern: (?<char>\w)\k<char>
  Matches: "ee" in "seek"
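As a rough sketch, both backreference forms can be checked against the "seek" input. The last call, which looks for a repeated word, is an extra illustration that does not appear in the original tutorial; BackrefDemo and FindAll are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BackrefDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] FindAll(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        // Numbered and named backreferences find the same doubled character.
        Console.WriteLine(string.Join("|", FindAll("seek", @"(\w)\1")));               // ee
        Console.WriteLine(string.Join("|", FindAll("seek", @"(?<char>\w)\k<char>")));  // ee

        // Illustrative extra: \1 repeats a whole captured word.
        Console.WriteLine(string.Join("|", FindAll("it is is here", @"\b(\w+)\s\1\b"))); // is is
    }
}
```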

Alternation constructs

Alternation constructs modify a regular expression to enable either/or matching.

The following entries list the alternation constructs, each with a description, an example pattern, and the text that the pattern matches:

|
  Description: Matches any one of the elements separated by the vertical bar (|) character.
  Pattern: th(e|is|at)
  Matches: "this" and "the" in "this is the day."

(?(expression)yes|no)
  Description: Matches yes if the regular expression pattern designated by expression matches; otherwise, matches the optional no part. expression is interpreted as a zero-width assertion.
  Pattern: (?(A)A\d{2}\b|\b\d{3}\b)
  Matches: "A10" and "910" in "A10 C103 910"

(?(name)yes|no)
  Description: Matches yes if name, a named or numbered capturing group, has a match; otherwise, matches the optional no part.
  Pattern: (?<quoted>")?(?(quoted).+?"|\S+\s)
  Matches: Dogs.jpg (followed by a space) and "Yiska playing.jpg" in the input Dogs.jpg "Yiska playing.jpg"
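The alternation constructs above could be exercised as in the sketch below. AlternationDemo and FindAll are hypothetical names; the quoted pattern is written as a C# verbatim string, so each double quote inside it is doubled.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] FindAll(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        // Plain alternation: "th" followed by "e", "is" or "at".
        Console.WriteLine(string.Join("|", FindAll("this is the day.", "th(e|is|at)")));              // this|the

        // Conditional on a lookahead: if the next character is "A", match A plus two digits;
        // otherwise match a standalone three-digit number.
        Console.WriteLine(string.Join("|", FindAll("A10 C103 910", @"(?(A)A\d{2}\b|\b\d{3}\b)")));     // A10|910

        // Conditional on a named group: a quoted file name may contain spaces,
        // an unquoted one ends at the first white-space character.
        string[] files = FindAll("Dogs.jpg \"Yiska playing.jpg\"", @"(?<quoted>"")?(?(quoted).+?""|\S+\s)");
        Console.WriteLine(files.Length);                                                               // 2
    }
}
```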

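Substitutions

Substitutions are regular expression language elements that are used in replacement patterns.

The following entries list the substitution elements, each with a description, an example pattern, a replacement pattern, an input string, and the result string:

$number
  Description: Substitutes the substring matched by group number.
  Pattern: \b(\w+)(\s)(\w+)\b
  Replacement pattern: $3$2$1
  Input string: "one two"
  Result string: "two one"

${name}
  Description: Substitutes the substring matched by the named group name.
  Pattern: \b(?<word1>\w+)(\s)(?<word2>\w+)\b
  Replacement pattern: ${word2} ${word1}
  Input string: "one two"
  Result string: "two one"

$$
  Description: Substitutes a literal "$".
  Pattern: \b(\d+)\s?USD
  Replacement pattern: $$$1
  Input string: "103 USD"
  Result string: "$103"

$&
  Description: Substitutes a copy of the whole match.
  Pattern: (\$*(\d*(\.+\d+)?){1})
  Replacement pattern: **$&
  Input string: "$1.30"
  Result string: "**$1.30**"

$`
  Description: Substitutes all the text of the input string before the match.
  Pattern: B+
  Replacement pattern: $`
  Input string: "AABBCC"
  Result string: "AAAACC"

$'
  Description: Substitutes all the text of the input string after the match.
  Pattern: B+
  Replacement pattern: $'
  Input string: "AABBCC"
  Result string: "AACCCC"

$+
  Description: Substitutes the last group that was captured.
  Pattern: B+(C+)
  Replacement pattern: $+
  Input string: "AABBCCDD"
  Result string: "AACCDD"

$_
  Description: Substitutes the entire input string.
  Pattern: B+
  Replacement pattern: $_
  Input string: "AABBCC"
  Result string: "AAAABBCCCC"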
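Substitution elements are passed as the replacement argument of Regex.Replace. The sketch below runs several of the rows above through the static overload Regex.Replace(input, pattern, replacement); the class name SubstitutionDemo and the helper Swap are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstitutionDemo
{
    // Hypothetical helper: swaps two words separated by one white-space character.
    public static string Swap(string input)
    {
        return Regex.Replace(input, @"\b(\w+)(\s)(\w+)\b", "$3$2$1");
    }

    public static void Main()
    {
        Console.WriteLine(Swap("one two"));                                        // two one
        Console.WriteLine(Regex.Replace("one two",
            @"\b(?<word1>\w+)(\s)(?<word2>\w+)\b", "${word2} ${word1}"));          // two one
        Console.WriteLine(Regex.Replace("103 USD", @"\b(\d+)\s?USD", "$$$1"));     // $103
        Console.WriteLine(Regex.Replace("AABBCC", "B+", "$`"));                    // AAAACC
        Console.WriteLine(Regex.Replace("AABBCC", "B+", "$'"));                    // AACCCC
        Console.WriteLine(Regex.Replace("AABBCC", "B+", "$_"));                    // AAAABBCCCC
        Console.WriteLine(Regex.Replace("AABBCCDD", "B+(C+)", "$+"));              // AACCDD
    }
}
```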

Miscellaneous constructs

The following entries list the miscellaneous constructs, each with a description and an example:

(?imnsx-imnsx)
  Description: Sets or disables options such as case insensitivity in the middle of a pattern.
  Example: \bA(?i)b\w+\b matches "ABA" and "Able" in "ABA Able Act"

(?#comment)
  Description: Inline comment. The comment ends at the first closing parenthesis.
  Example: \bA(?#Matches words starting with A)\w+\b

# [to end of line]
  Description: X-mode comment. The comment starts at an unescaped # and continues to the end of the line.
  Example: (?x)\bA\w+\b#Matches words starting with A
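A brief sketch of these constructs in use follows. MiscDemo and FindAll are hypothetical names, and the input "ABA Able Act" is the one from the first entry above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MiscDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] FindAll(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        const string input = "ABA Able Act";

        // (?i) turns on case-insensitive matching for the rest of the pattern,
        // so "b" matches both "B" and "b" while the leading "A" stays case-sensitive.
        Console.WriteLine(string.Join("|", FindAll(input, @"\bA(?i)b\w+\b")));   // ABA|Able

        // An inline comment does not change what the pattern matches.
        Console.WriteLine(string.Join("|",
            FindAll(input, @"\bA(?#Matches words starting with A)\w+\b")));      // ABA|Able|Act

        // With (?x), unescaped white space is ignored and # starts a comment.
        Console.WriteLine(string.Join("|",
            FindAll(input, @"(?x)\bA\w+\b  # Matches words starting with A")));  // ABA|Able|Act
    }
}
```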

Regex class

The Regex class is used to represent a regular expression.

The following entries list some commonly used methods of the Regex class:

1. public bool IsMatch(string input)
   Indicates whether the regular expression specified in the Regex constructor finds a match in the specified input string.

2. public bool IsMatch(string input, int startat)
   Indicates whether the regular expression specified in the Regex constructor finds a match in the specified input string, beginning at the specified starting position in the string.

3. public static bool IsMatch(string input, string pattern)
   Indicates whether the specified regular expression finds a match in the specified input string.

4. public MatchCollection Matches(string input)
   Searches the specified input string for all occurrences of the regular expression.

5. public string Replace(string input, string replacement)
   In the specified input string, replaces all strings that match the regular expression pattern with the specified replacement string.

6. public string[] Split(string input)
   Splits the input string into an array of substrings at the positions defined by the regular expression pattern specified in the Regex constructor.

For a complete list of the methods and properties of the Regex class, see Microsoft's C# documentation.
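The six methods listed above can be seen together in one short program. This is a sketch under simple assumptions: the class name RegexMethodsDemo, the field Whitespace, and the sample strings are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexMethodsDemo
{
    // One Regex instance for runs of white space, shared by the calls below.
    public static readonly Regex Whitespace = new Regex(@"\s+");

    public static void Main()
    {
        // 1. Instance IsMatch: the whole input is searched.
        Console.WriteLine(Whitespace.IsMatch("Hello World"));       // True
        // 2. IsMatch with startat: searching begins at index 6 ("World").
        Console.WriteLine(Whitespace.IsMatch("Hello World", 6));    // False
        // 3. Static IsMatch: the pattern is passed with the input.
        Console.WriteLine(Regex.IsMatch("Hello", @"\s+"));          // False
        // 4. Matches: one match per run of white space.
        Console.WriteLine(Whitespace.Matches("a b  c").Count);      // 2
        // 5. Replace: every run becomes a single hyphen.
        Console.WriteLine(Whitespace.Replace("a b  c", "-"));       // a-b-c
        // 6. Split: the input is cut at every run.
        Console.WriteLine(string.Join(",", Whitespace.Split("a b  c"))); // a,b,c
    }
}
```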

Example 1

The following example matches words that begin with 'S':

    using System;
    using System.Text.RegularExpressions;

    namespace RegExApplication
    {
        class Program
        {
            // Prints the pattern, then every match found in the text.
            private static void ShowMatch(string text, string expr)
            {
                Console.WriteLine("The Expression: " + expr);
                MatchCollection mc = Regex.Matches(text, expr);
                foreach (Match m in mc)
                {
                    Console.WriteLine(m);
                }
            }

            static void Main(string[] args)
            {
                string str = "A Thousand Splendid Suns";
                Console.WriteLine("Matching words that start with 'S': ");
                ShowMatch(str, @"\bS\S*");
                Console.ReadKey();
            }
        }
    }

When the above code is compiled and executed, it produces the following result:

    Matching words that start with 'S':
    The Expression: \bS\S*
    Splendid
    Suns

Example 2

The following example matches words that begin with 'm' and end with 'e':

    using System;
    using System.Text.RegularExpressions;

    namespace RegExApplication
    {
        class Program
        {
            // Prints the pattern, then every match found in the text.
            private static void ShowMatch(string text, string expr)
            {
                Console.WriteLine("The Expression: " + expr);
                MatchCollection mc = Regex.Matches(text, expr);
                foreach (Match m in mc)
                {
                    Console.WriteLine(m);
                }
            }

            static void Main(string[] args)
            {
                string str = "make maze and manage to measure it";
                Console.WriteLine("Matching words that start with 'm' and end with 'e':");
                ShowMatch(str, @"\bm\S*e\b");
                Console.ReadKey();
            }
        }
    }

When the above code is compiled and executed, it produces the following result:

    Matching words that start with 'm' and end with 'e':
    The Expression: \bm\S*e\b
    make
    maze
    manage
    measure

Example 3

The following example replaces each run of extra white space with a single space:

    using System;
    using System.Text.RegularExpressions;

    namespace RegExApplication
    {
        class Program
        {
            static void Main(string[] args)
            {
                string input = "Hello   World   ";
                string pattern = "\\s+";   // one or more white-space characters
                string replacement = " ";  // a single space
                Regex rgx = new Regex(pattern);
                string result = rgx.Replace(input, replacement);

                Console.WriteLine("Original String: {0}", input);
                Console.WriteLine("Replacement String: {0}", result);
                Console.ReadKey();
            }
        }
    }

When the above code is compiled and executed, it produces the following result (each line keeps the white space that trails its string):

    Original String: Hello   World   
    Replacement String: Hello World 


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
