Summary of common metacharacters in Regular Expressions

Source: Internet
Author: User

The regular expression language consists of two basic character types: literal (normal) text characters and metacharacters. Metacharacters are what give regular expressions their processing power. A metacharacter construct can be a single character inside [] (for example, [a] matches the single lowercase character a), or it can be a character sequence (for example, [a-d] matches any one character from a to d, while \w matches any English letter, digit, or underscore). Some common metacharacters are listed below:

. matches any character except \n (note that this metacharacter is a period).
[abcde] matches any one of the characters a, b, c, d, e.
[a-h] matches any one character from a to h.
[^fgh] matches any one character that is not f, g, or h.
\w matches any one upper- or lowercase English letter, digit from 0 to 9, or underscore; equivalent to [a-zA-Z0-9_]
\W matches any one character that is not an English letter, a digit from 0 to 9, or an underscore; equivalent to [^a-zA-Z0-9_]
\s matches any one whitespace character; equivalent to [ \f\n\r\t\v]
\S matches any one non-whitespace character; equivalent to [^\s]
\d matches a single digit from 0 to 9; equivalent to [0-9]
\D matches any single character that is not a digit from 0 to 9; equivalent to [^0-9]
[\u4e00-\u9fa5] matches any single Chinese character (the range is written with Unicode code points)
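The list above can be tried out directly. The following is a minimal sketch that assumes Python's built-in re module (the article does not name a language, so that choice and the sample strings are assumptions). re.ASCII is passed so that \w is limited to [a-zA-Z0-9_] as described; without it, Python's \w also matches non-English letters.

```python
import re

sample = "Tel: 555-0199, 中文 ok_1"  # hypothetical sample text

# \d matches one digit at a time.
digits = re.findall(r"\d", sample)

# [a-h] matches one character in the range a..h.
a_to_h = re.findall(r"[a-h]", "abcxyz")

# [^fgh] matches any single character that is not f, g, or h.
not_fgh = re.findall(r"[^fgh]", "fig")

# \w with re.ASCII is equivalent to [a-zA-Z0-9_].
word_chars = re.findall(r"\w", "a_1-é", flags=re.ASCII)

# [\u4e00-\u9fa5] matches a single Chinese character.
chinese = re.findall(r"[\u4e00-\u9fa5]", sample)

print(digits, a_to_h, not_fgh, word_chars, chinese)
```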
Regular expression quantifiers
The metacharacters above each match a single character. To match several characters at once, you also need a quantifier. Some common quantifiers are listed below, together with the word-boundary and anchor metacharacters (n and m both represent integers, with 0 < n < m):
* matches the preceding element zero or more times; equivalent to {0,}
? matches the preceding element zero or one time; equivalent to {0,1}
{n} matches the preceding element exactly n times
{n,} matches the preceding element at least n times
{n,m} matches the preceding element from n to m times
+ matches the preceding element one or more times; equivalent to {1,}
\b matches a word boundary
^ anchors the match to the start of the string, so the string must begin with the pattern that follows
$ anchors the match to the end of the string, so the string must end with the pattern that precedes it
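A short sketch of how the quantifiers and anchors above behave, again assuming Python's re module (an assumption, since the article is language-neutral). The test strings are made up for illustration.

```python
import re

# {n}: exactly n repetitions; fullmatch requires the whole string to match.
exactly_three = re.fullmatch(r"a{3}", "aaa") is not None
too_many = re.fullmatch(r"a{3}", "aaaa") is not None

# {n,m}: between n and m repetitions; greedy, so it takes as many as allowed.
two_to_four = re.match(r"a{2,4}", "aaaaaa").group()

# * allows zero repetitions, + requires at least one.
star_on_empty = re.fullmatch(r"\d*", "") is not None
plus_on_empty = re.fullmatch(r"\d+", "") is not None

# ^ and $ anchor the pattern to the start and end of the string.
anchored = re.search(r"^\d+$", "2024") is not None
unanchored = re.search(r"^\d+$", "year 2024") is not None

print(exactly_three, too_many, two_to_four, star_on_empty, plus_on_empty, anchored, unanchored)
```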

Note:
(1) Because "\", "?", "*", "^", "$", "+", "(", ")", "|", "{", "[", and similar characters have special meanings in regular expressions, you must escape them with a backslash to use them literally. For example, to require at least one "\" in the string, write the regular expression as \\+.
(2) You can enclose several metacharacters or literal characters in parentheses to form a group. For example, ^(13)[4-9]\d{8}$ matches an 11-digit mobile phone number that starts with 13, followed by a digit from 4 to 9 and eight more digits.
(3) Chinese characters are matched by their Unicode code points, and \uXXXX denotes a single Unicode character. For example, \u4e00 is the Chinese character "一" (one) and \u9fa5 is the Chinese character "龥". These are the first and last code points of the range used here to represent Chinese characters, and the range covers 20902 characters.
(4) \b indicates the start or end of a word. Taking "123a 345b 456 789d" as the sample string, the regular expression \b\d{3}\b matches only 456.
(5) "|" means "or". For example, (z|j|q) matches any one of the letters z, j, and q. Inside square brackets "|" is an ordinary character, so [z|j|q] also matches "|" itself; the character-class equivalent is [zjq].
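The notes above can be checked with a few lines of code. This is a sketch under the assumption that Python's re module is used; the sample strings other than "123a 345b 456 789d" are hypothetical.

```python
import re

# Note (1): a literal backslash must be escaped, so "one or more backslashes" is \\+.
backslashes = re.findall(r"\\+", r"a\b\\c")

# Note (2): a group, a character class, and a counted run of digits.
phone = re.compile(r"^(13)[4-9]\d{8}$")
phone_ok = phone.match("13512345678") is not None   # 13, then 5, then eight digits
phone_bad = phone.match("13012345678") is not None  # third digit 0 is outside [4-9]

# Note (3): inclusive number of code points from \u4e00 to \u9fa5.
cjk_count = 0x9FA5 - 0x4E00 + 1

# Note (4): \b\d{3}\b only matches a three-digit run that stands alone as a word.
standalone = re.findall(r"\b\d{3}\b", "123a 345b 456 789d")

# Note (5): | means "or" in a group, but is a literal character inside [...].
group_alt = re.findall(r"(?:z|j|q)", "z|j")
class_alt = re.findall(r"[z|j|q]", "z|j")

print(backslashes, phone_ok, phone_bad, cjk_count, standalone, group_alt, class_alt)
```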

Expression Match
/^\s*$/ Matches empty lines.
/\d{2}-\d{5}/ Matches an ID number that consists of two digits, one hyphen, and five digits.
/<\s*(\S+)(\s[^>]*)?>[\s\S]*<\s*\/\1\s*>/ Matches HTML tags.
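As an illustration of the three expressions above, here is a small sketch that assumes Python's re module (the surrounding /.../ delimiters are dropped because Python passes patterns as strings). The input strings are hypothetical.

```python
import re

# ^\s*$ : a line that is empty or contains only whitespace.
blank = re.compile(r"^\s*$")
blank_flags = [blank.match(line) is not None for line in ["", "   ", "text"]]

# \d{2}-\d{5} : two digits, a hyphen, and five digits.
id_text = re.search(r"\d{2}-\d{5}", "ID: 12-34567").group()

# HTML element: an opening tag, any content, and a closing tag
# whose name repeats the first captured group (\1).
tag = re.compile(r"<\s*(\S+)(\s[^>]*)?>[\s\S]*<\s*/\1\s*>")
tag_name = tag.search('<p class="x">hello</p>').group(1)
mismatched = tag.search("<p>hello</div>") is None

print(blank_flags, id_text, tag_name, mismatched)
```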

The following table contains a complete list of metacharacters and their behavior in the context of a regular expression:

Character Description
\ Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a line feed. The sequence "\\" matches "\", and "\(" matches "(".
^ Matches the position at the start of the input string. If the RegExp object's Multiline property is set, ^ also matches the position after "\n" or "\r".
$ Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position before "\n" or "\r".
* Matches the preceding character or subexpression zero or more times. For example, "zo*" matches "z" and "zoo". * is equivalent to {0,}.
+ Matches the preceding character or subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but does not match "z". + is equivalent to {1,}.
? Matches the preceding character or subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.
{n} n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two "o"s in "food".
{n,} n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the "o"s in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".
{n,m} m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, "o{1,3}" matches the first three "o"s in "fooooood". "o{0,1}" is equivalent to "o?". Note: you cannot insert a space between the comma and the numbers.
? When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the matching mode is non-greedy. Non-greedy mode matches as little of the searched string as possible, while the default greedy mode matches as much as possible. For example, in the string "oooo", "o+?" matches only a single "o", while "o+" matches all the "o"s.
. Matches any single character except "\n". To match any character including "\n", use a pattern such as "[\s\S]".
(pattern) Matches pattern and captures the matched subexpression. The captured match can be retrieved from the resulting Matches collection using the $0...$9 properties. To match the parenthesis characters ( ), use "\(" or "\)".
(?:pattern) Matches pattern but does not capture the matched subexpression; that is, it is a non-capturing match that is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, "industr(?:y|ies)" is a more economical expression than "industry|industries".
(?=pattern) Performs a positive lookahead search, which matches the string at any point where a string matching pattern begins. It is a non-capturing match, that is, the match is not captured for later use. For example, "Windows(?=95|98|NT|2000)" matches "Windows" in "Windows2000", but does not match "Windows" in "Windows3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern) Performs a negative lookahead search, which matches the string at any point where a string not matching pattern begins. It is a non-capturing match, that is, the match is not captured for later use. For example, "Windows(?!95|98|NT|2000)" matches "Windows" in "Windows3.1", but does not match "Windows" in "Windows2000". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
x|y Matches x or y. For example, "z|food" matches "z" or "food". "(z|f)ood" matches "zood" or "food".
[xyz] Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
[^xyz] Negated character set. Matches any character that is not enclosed. For example, "[^abc]" matches the "p" in "plain".
[a-z] Character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter in the range "a" to "z".
[^a-z] Negated character range. Matches any character that is not in the specified range. For example, "[^a-z]" matches any character that is not in the range "a" to "z".
\b Matches a word boundary, that is, the position between a word and a space. For example, "er\b" matches the "er" in "never", but does not match the "er" in "verb".
\B Matches a non-word boundary. "er\B" matches the "er" in "verb", but does not match the "er" in "never".
\cx Matches the control character indicated by x. For example, \cM matches Control-M, or a carriage return. x must be in the range A-Z or a-z. If it is not, c is assumed to be a literal "c" character.
\d Matches a digit character. Equivalent to [0-9].
\D Matches a non-digit character. Equivalent to [^0-9].
\f Matches a form feed. Equivalent to \x0c and \cL.
\n Matches a line feed. Equivalent to \x0a and \cJ.
\r Matches a carriage return. Equivalent to \x0d and \cM.
\s Matches any whitespace character, including spaces, tabs, and form feeds. Equivalent to [ \f\n\r\t\v].
\S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t Matches a tab. Equivalent to \x09 and \cI.
\v Matches a vertical tab. Equivalent to \x0b and \cK.
\w Matches any word character, including the underscore. Equivalent to [A-Za-z0-9_].
\W Matches any non-word character. Equivalent to [^A-Za-z0-9_].
\xn Matches n, where n is a hexadecimal escape value. The hexadecimal escape value must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". This allows ASCII codes to be used in regular expressions.
\num Matches num, where num is a positive integer. This is a backreference to a captured match. For example, "(.)\1" matches two consecutive identical characters.
\n Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal character m. If neither of the preceding conditions holds, \nm matches the octal escape value nm, where n and m are octal digits (0-7).
\nml Matches the octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).
\un Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
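Several rows of the table are easier to see in action. The sketch below assumes Python's re module, which is not the engine the table describes; there the Multiline property corresponds to the re.MULTILINE flag, and captured groups are read with groups() rather than $0...$9. Sample strings are taken from the table where possible, otherwise they are hypothetical.

```python
import re

# Greedy vs. non-greedy: "o+" takes every "o", "o+?" takes as few as possible.
greedy = re.search(r"o+", "oooo").group()
lazy = re.search(r"o+?", "oooo").group()

# Capturing vs. non-capturing groups.
captured = re.search(r"industr(y|ies)", "industries").groups()
uncaptured = re.search(r"industr(?:y|ies)", "industries").groups()

# Lookahead checks what follows without consuming it.
positive = re.search(r"Windows(?=95|98|NT|2000)", "Windows2000").group()
negative = re.search(r"Windows(?!95|98|NT|2000)", "Windows2000")

# Backreference: (.)\1 matches two consecutive identical characters.
doubled = re.findall(r"(.)\1", "bookkeeper")

# Multiline mode: ^ also matches right after a line break.
line_starts = re.findall(r"^\w+", "one\ntwo", flags=re.MULTILINE)

# "." skips "\n", while [\s\S] matches any character at all.
dot = re.search(r"a.b", "a\nb")
any_char = re.search(r"a[\s\S]b", "a\nb")

# \x41 is the hexadecimal escape for "A"; \b tests for a word boundary.
hex_a = re.fullmatch(r"\x41", "A") is not None
er_in_never = re.search(r"er\b", "never") is not None
er_in_verb = re.search(r"er\b", "verb") is not None

print(greedy, lazy, captured, uncaptured, positive, negative, doubled, line_starts)
```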

User Name

/^[a-z0-9_-]{}$/

Password

/^[a-z0-9_-]{6,18}$/
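A minimal sketch of applying the password pattern above, assuming Python's re module. The helper name is_valid_password is hypothetical. fullmatch is used because Python's $ also matches just before a trailing newline, which a plain match would let through.

```python
import re

# 6 to 18 characters, each a lowercase letter, digit, underscore, or hyphen.
PASSWORD = re.compile(r"^[a-z0-9_-]{6,18}$")

def is_valid_password(text: str) -> bool:
    """Return True if the whole string satisfies the password pattern."""
    return PASSWORD.fullmatch(text) is not None

print(is_valid_password("my-us3r_n4m3"), is_valid_password("short"))
```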

Hexadecimal value

/^#?([a-f0-9]{6}|[a-f0-9]{3})$/
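The hexadecimal pattern above accepts an optional "#" followed by either six or three lowercase hex digits. A small sketch, assuming Python's re module and a hypothetical helper named hex_digits:

```python
import re

HEX_VALUE = re.compile(r"^#?([a-f0-9]{6}|[a-f0-9]{3})$")

def hex_digits(text: str):
    """Return the hex digits without the leading '#', or None if there is no match."""
    m = HEX_VALUE.fullmatch(text)
    return m.group(1) if m else None

print(hex_digits("#a3c113"), hex_digits("fff"), hex_digits("#4d82h4"))
```

Because the class is [a-f0-9], uppercase digits such as "#ABCDEF" are rejected unless a case-insensitive flag (re.IGNORECASE in Python) is added.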

Email

/^([\w\d_.-]+)@([\w\d_-]+\.)+\w{2,4}$/

/^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$/

/^[a-z\d]+(\.[a-z\d]+)*@([\da-z](-[\da-z])?)+(\.{1,2}[a-z]+)+$/
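To show how the capture groups of the second email pattern split an address, here is a sketch assuming Python's re module. The helper name email_parts and the addresses are hypothetical, and the pattern is only a rough filter, not a complete address validator.

```python
import re

# Group 1: local part, group 2: domain name, group 3: top-level domain (2 to 6 characters).
EMAIL = re.compile(r"^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$")

def email_parts(text: str):
    """Return (local, domain, tld) if text matches the pattern, else None."""
    m = EMAIL.fullmatch(text)
    return m.groups() if m else None

print(email_parts("john.doe@example.com"))
print(email_parts("john@example.something"))  # last part is longer than 6 characters
```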

URL

/^(https?:\/\/)?([\da-z.-]+)\.([a-z.]{2,6})([\/\w.-]*)*\/?$/

/^(https?:\/\/)?([\w\d_-]+\.)+\w{2,4}(\/[\w\d.?-_%=&]+)*$/
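The first URL pattern can be exercised as follows. This is a sketch assuming Python's re module, where "/" needs no escaping; the helper name url_parts and the URLs are hypothetical.

```python
import re

# Optional scheme, domain name, top-level domain, then an optional path.
URL = re.compile(r"^(https?://)?([\da-z.-]+)\.([a-z.]{2,6})([/\w.-]*)*/?$")

def url_parts(text: str):
    """Return (domain, tld) if text matches the pattern, else None."""
    m = URL.fullmatch(text)
    return (m.group(2), m.group(3)) if m else None

print(url_parts("https://www.example.com/path/page.html"))
print(url_parts("http://google.com/some/file!.html"))  # "!" is not allowed in the path
```

The path part nests one quantifier inside another, ([/\w.-]*)*, so a long path that ends in a disallowed character can make a backtracking engine very slow. Short inputs such as the ones here are unaffected.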

IP address

/((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)/

Or

/^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/
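Each octet alternative in the anchored IP pattern covers one numeric range: 25[0-5] for 250 to 255, 2[0-4][0-9] for 200 to 249, and [01]?[0-9][0-9]? for 0 to 199. A sketch assuming Python's re module, with a hypothetical helper name is_ipv4:

```python
import re

OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
# Three octets each followed by a dot, then a final octet.
IPV4 = re.compile(r"^(?:" + OCTET + r"\.){3}" + OCTET + r"$")

def is_ipv4(text: str) -> bool:
    """Return True if text is four dot-separated numbers, each from 0 to 255."""
    return IPV4.fullmatch(text) is not None

print(is_ipv4("192.168.1.1"), is_ipv4("256.1.1.1"), is_ipv4("1.2.3"))
```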

HTML Tag

/^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/
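The HTML tag pattern above uses the backreference \1 so that the closing tag must repeat the opening tag's name, and it also accepts a self-closing form. A sketch assuming Python's re module; the helper name tag_info and the inputs are hypothetical, and a regular expression like this only handles simple, single-element strings rather than arbitrary HTML.

```python
import re

# Group 1: tag name, group 2: attributes, group 3: inner content.
HTML_TAG = re.compile(r"^<([a-z]+)([^<]+)*(?:>(.*)</\1>|\s+/>)$")

def tag_info(text: str):
    """Return (tag name, inner content) if text matches the pattern, else None."""
    m = HTML_TAG.fullmatch(text)
    return (m.group(1), m.group(3)) if m else None

print(tag_info('<a href="x">link</a>'))
print(tag_info("<br />"))       # self-closing, so there is no inner content
print(tag_info("<a>text</b>"))  # closing tag does not repeat the tag name
```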

