C# Regular Expression Overview

Source: Internet
Author: User
Tags: control characters

Over the past two days I have been working on data validation for a web page. Much of that validation relies on regular expressions, which has been a headache. I found a lot of material on the Internet and am collecting some of it here for later use.

The following table shows a complete list of metacharacters and their behaviors in the context of a regular expression:
Character Description
\ Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline. The sequence "\\" matches "\" and "\(" matches "(".
^ Matches the position at the start of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after "\n" or "\r".
$ Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before "\n" or "\r".
* Matches the preceding subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.
+ Matches the preceding subexpression one or more times. For example, zo+ matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
? Matches the preceding subexpression zero or one time. For example, do(es)? matches the "do" in "do" or in "does". ? is equivalent to {0,1}.
{n} n is a non-negative integer. Matches exactly n times. For example, o{2} does not match the "o" in "Bob", but matches the two o's in "food".
{n,} n is a non-negative integer. Matches at least n times. For example, o{2,} does not match the "o" in "Bob", but matches all the o's in "foooood". o{1,} is equivalent to o+. o{0,} is equivalent to o*.
{n,m} m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, o{1,3} matches the first three o's in "fooooood". o{0,1} is equivalent to o?. Note that there must be no space between the comma and the numbers.
? When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much as possible. For example, in the string "oooo", o+? matches a single "o", while o+ matches all the o's.
. Matches any single character except "\n". To match any character including "\n", use a pattern such as [\s\S].
(pattern) Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0…$9 properties in JScript. To match literal parentheses, use \( or \).
(?:pattern) Matches pattern but does not capture the match; that is, it is a non-capturing match and is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, industr(?:y|ies) is a more economical expression than industry|industries.
(?=pattern) Positive lookahead: matches the search string at any point where a string matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, Windows (?=95|98|NT|2000) matches "Windows" in "Windows 2000", but not "Windows" in "Windows 3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern) Negative lookahead: matches the search string at any point where a string not matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, Windows (?!95|98|NT|2000) matches "Windows" in "Windows 3.1", but not "Windows" in "Windows 2000". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
x|y Matches either x or y. For example, z|food matches "z" or "food". (z|f)ood matches "zood" or "food".
[xyz] A character set. Matches any one of the enclosed characters. For example, [abc] matches the "a" in "plain".
[^xyz] A negative character set. Matches any character not enclosed. For example, [^abc] matches the "p" in "plain".
[a-z] A character range. Matches any character in the specified range. For example, [a-z] matches any lowercase letter from "a" through "z".
[^a-z] A negative character range. Matches any character not in the specified range. For example, [^a-z] matches any character that is not in the range "a" through "z".
\b Matches a word boundary, that is, the position between a word and a space. For example, er\b matches the "er" in "never", but not the "er" in "verb".
\B Matches a non-word boundary. For example, er\B matches the "er" in "verb", but not the "er" in "never".
\cx Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as a literal "c" character.
\d Matches a digit character. Equivalent to [0-9].
\D Matches a non-digit character. Equivalent to [^0-9].
\f Matches a form feed. Equivalent to \x0c and \cL.
\n Matches a newline. Equivalent to \x0a and \cJ.
\r Matches a carriage return. Equivalent to \x0d and \cM.
\s Matches any whitespace character, including space, tab, and form feed. Equivalent to [ \f\n\r\t\v].
\S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t Matches a tab. Equivalent to \x09 and \cI.
\v Matches a vertical tab. Equivalent to \x0b and \cK.
\w Matches any word character, including the underscore. Equivalent to [A-Za-z0-9_].
\W Matches any non-word character. Equivalent to [^A-Za-z0-9_].
\xn Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, \x41 matches "A". \x041 is equivalent to \x04 followed by "1". This allows ASCII codes to be used in regular expressions.
\num Matches num, where num is a positive integer. A backreference to a captured match. For example, (.)\1 matches two consecutive identical characters.
\n Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captured subexpressions, n is a backreference followed by the literal m. If neither condition holds and n and m are both octal digits (0-7), \nm matches the octal escape value nm.
\nml Matches the octal escape value nml when n is an octal digit (0-3) and m and l are both octal digits (0-7).
\un Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
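
Three rows of the table are easier to follow when run: greedy versus non-greedy quantifiers, capturing versus non-capturing groups, and backreferences. The sketch below uses Python's `re` module purely for illustration (an assumption; the article shows no code of its own), with the `oooo`, `industr(y|ies)`, and `(.)\1` patterns taken from the table. The input string "bookkeeper" is made up for this example.

```python
import re

# Greedy vs. non-greedy: o+ takes every "o", o+? takes as few as it can.
greedy = re.search(r"o+", "oooo").group()
lazy = re.search(r"o+?", "oooo").group()

# Capturing vs. non-capturing group on the same input.
capturing = re.search(r"industr(y|ies)", "industries")
non_capturing = re.search(r"industr(?:y|ies)", "industries")

# Backreference: \1 must repeat whatever group 1 captured.
doubled = re.search(r"(.)\1", "bookkeeper").group()

print(greedy, lazy)              # oooo o
print(capturing.groups())        # ('ies',)
print(non_capturing.groups())    # ()
print(doubled)                   # oo
```

Both group forms match the same text; the only difference is whether the alternative that matched is stored for later use.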

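The lookahead and word-boundary rows describe assertions that check a position without consuming characters. As a rough illustration, again in Python's `re` module (an assumption, chosen only because it accepts the same syntax for these patterns), the sketch below runs the "Windows" and "er" examples from the table. The combined input "never verb" is made up for this example.

```python
import re

positive = re.compile(r"Windows (?=95|98|NT|2000)")
negative = re.compile(r"Windows (?!95|98|NT|2000)")

hit = positive.search("Windows 2000")
# The lookahead is not consumed: the match ends right after the space.
print(repr(hit.group()), hit.end())              # 'Windows ' 8
print(positive.search("Windows 3.1"))            # None
print(repr(negative.search("Windows 3.1").group()))  # 'Windows '
print(negative.search("Windows 2000"))           # None

# \b and \B pick different occurrences of "er" in the same text.
text = "never verb"
at_boundary = re.search(r"er\b", text).start()   # the "er" ending "never"
inside_word = re.search(r"er\B", text).start()   # the "er" inside "verb"
print(at_boundary, inside_word)                  # 3 7
```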

The following are examples:
"^The": matches any string that starts with "The" ("There", "The cat", etc.);
"of despair$": matches any string that ends with "of despair";
"^abc$": matches a string that both starts and ends with "abc", which can only be "abc" itself;
"notice": matches any string that contains "notice".
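
The anchor examples above can be checked with a few lines of code. This is a minimal sketch in Python's `re` module (an assumption made for illustration); `contains_match` is a hypothetical helper, and the sample sentences are invented.

```python
import re

def contains_match(pattern, text):
    # Hypothetical helper: True if the pattern matches anywhere in text.
    return re.search(pattern, text) is not None

print(contains_match(r"^The", "There is a cat"))          # True
print(contains_match(r"^The", "In The End"))              # False
print(contains_match(r"of despair$", "a cry of despair")) # True
print(contains_match(r"^abc$", "abc"))                    # True
print(contains_match(r"^abc$", "abcd"))                   # False
print(contains_match(r"notice", "please notice me"))      # True
```

Without ^ or $, a pattern may match anywhere in the string, which is why "notice" succeeds in the middle of a sentence.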

The three symbols *, +, and ? indicate how many times a character or group of characters may repeat. They mean "zero or more", "one or more", and "zero or one", respectively. The following are examples:

"ab*": matches a string that has an a followed by zero or more b's ("a", "ab", "abbb", ...);
"ab+": matches a string that has an a followed by one or more b's;
"ab?": matches a string that has an a followed by zero or one b;
"a?b+$": matches a string that ends with zero or one a followed by one or more b's.
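
To see how the three quantifiers differ, the sketch below tests each pattern against the same small list of samples. It uses Python's `re` module as an illustrative assumption; `whole_matches` is a hypothetical helper that applies `re.fullmatch`, so each pattern must account for the entire sample rather than just a part of it.

```python
import re

samples = ["a", "ab", "abbb", "b"]

def whole_matches(pattern):
    # Hypothetical helper: keep the samples that the pattern matches in full.
    return [s for s in samples if re.fullmatch(pattern, s)]

star = whole_matches(r"ab*")
plus = whole_matches(r"ab+")
optional = whole_matches(r"ab?")
# a?b+$ is anchored only at the end, so re.search is the natural call here.
tail = [s for s in samples if re.search(r"a?b+$", s)]

print(star)      # ['a', 'ab', 'abbb']
print(plus)      # ['ab', 'abbb']
print(optional)  # ['a', 'ab']
print(tail)      # ['ab', 'abbb', 'b']
```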

You can also use a range enclosed in braces to indicate the range of repeated times.

"ab{2}": matches a string that has an a followed by exactly two b's ("abb");
"ab{2,}": matches a string that has an a followed by at least two b's;
"ab{3,5}": matches a string that has an a followed by three to five b's.
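
The bounded forms can be compared in the same way. This sketch (Python's `re` module, used here as an illustrative assumption) applies `re.fullmatch` so that the number of b's in each invented sample has to fall within the stated bounds; `whole_matches` is a hypothetical helper.

```python
import re

samples = ["ab", "abb", "abbb", "abbbbbb"]

def whole_matches(pattern):
    # Hypothetical helper: keep the samples that the pattern matches in full.
    return [s for s in samples if re.fullmatch(pattern, s)]

exactly_two = whole_matches(r"ab{2}")
two_or_more = whole_matches(r"ab{2,}")
three_to_five = whole_matches(r"ab{3,5}")

print(exactly_two)    # ['abb']
print(two_or_more)    # ['abb', 'abbb', 'abbbbbb']
print(three_to_five)  # ['abbb']
```

The last sample has six b's, so it satisfies the open-ended bound {2,} but not {3,5}.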

Note that you must always specify the lower bound of the range (for example, "{0,2}" rather than "{,2}"). Also, you may have noticed that *, +, and ? are equivalent to "{0,}", "{1,}", and "{0,1}", respectively.
There is also |, which acts as an "or" operator:

"hi|hello": matches a string that contains "hi" or "hello";
"(b|cd)ef": matches a string that contains "bef" or "cdef";
"(a|b)*c": matches a string that has a mixed sequence of a's and b's followed by a "c";
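
A short sketch of the alternation examples follows, written with Python's `re` module as an illustrative assumption. The input strings are invented for the demonstration.

```python
import re

# Every occurrence of either alternative, in order of appearance.
greetings = re.findall(r"hi|hello", "hi there, hello")

# Parentheses limit the alternation to part of the pattern.
grouped = re.search(r"(b|cd)ef", "abcdef")

# (a|b)*c: any mix of a's and b's (possibly none) followed by c.
mixed_ok = re.fullmatch(r"(a|b)*c", "abbac") is not None
only_c = re.fullmatch(r"(a|b)*c", "c") is not None
mixed_bad = re.fullmatch(r"(a|b)*c", "abd") is not None

print(greetings)                          # ['hi', 'hello']
print(grouped.group(), grouped.group(1))  # cdef cd
print(mixed_ok, only_c, mixed_bad)        # True True False
```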

The period (.) stands for any single character:

"a.[0-9]": matches a string that has an "a" followed by any single character and then a digit;
"^.{3}$": matches a string of exactly three characters.
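
The two dot patterns can be tried out as follows. This is a small sketch using Python's `re` module (an illustrative assumption) with invented inputs; it also shows that, by default, the dot does not match a newline.

```python
import re

found = re.search(r"a.[0-9]", "xa-7y").group()
three = re.search(r"^.{3}$", "abc") is not None
four = re.search(r"^.{3}$", "abcd") is not None
# By default the dot does not match a newline character.
across_newline = re.search(r"a.[0-9]", "a\n7") is not None

print(found)            # a-7
print(three, four)      # True False
print(across_newline)   # False
```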

Square brackets indicate that certain characters can appear at a specific position in a string:

"[ab]": matches a string that has an "a" or a "b" (equivalent to "a|b");
"[a-d]": indicates that a string contains lowercase letters a
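
Character sets and ranges can be explored with a quick experiment. The sketch below uses Python's `re` module as an illustrative assumption; the words "cabbage" and "Hello, dad" are invented, while "plain" comes from the metacharacter table earlier in the article.

```python
import re

either = re.findall(r"[ab]", "cabbage")
in_range = re.findall(r"[a-d]", "Hello, dad")

# First character of "plain" inside and outside the set [abc].
first_in_set = re.search(r"[abc]", "plain").group()
first_not_in_set = re.search(r"[^abc]", "plain").group()

print(either)                          # ['a', 'b', 'b', 'a']
print(in_range)                        # ['d', 'a', 'd']
print(first_in_set, first_not_in_set)  # a p
```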
