Common Regular Expressions (SHARE)

Last Update:2018-12-07 Source: Internet

Author: User


A regular expression is a text pattern consisting of ordinary characters (such as the letters a through z) and special characters, known as metacharacters. The pattern describes one or more strings to match when searching a body of text. A regular expression serves as a template for matching a character pattern against the string being searched. For example:

JScript              VBScript            Matches
/^[ \t]*$/           "^[ \t]*$"          A blank line.
/\d{2}-\d{5}/        "\d{2}-\d{5}"       Validates an ID number consisting of two digits, a hyphen, and five digits.
/<(.*)>.*<\/\1>/     "<(.*)>.*<\/\1>"    An HTML tag.
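As a quick check, the three patterns in the table can be run in JavaScript. This is only a minimal sketch: the sample strings are made up for illustration, and the blank-line pattern is assumed to be /^[ \t]*$/.

```javascript
// The three patterns from the table; the test strings are hypothetical.
const blankLine = /^[ \t]*$/;
const idNumber = /\d{2}-\d{5}/;
const htmlTag = /<(.*)>.*<\/\1>/;

console.log(blankLine.test("   \t"));       // true
console.log(blankLine.test("  x "));        // false
console.log(idNumber.test("ID 12-34567"));  // true
console.log(htmlTag.test("<b>bold</b>"));   // true
console.log(htmlTag.test("<b>bold</i>"));   // false: \1 requires the same tag name
```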

The following table shows a complete list of metacharacters and their behaviors in the context of a regular expression:
Character Description
\  Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n", while '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
^  Matches the position at the start of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after '\n' or '\r'.
$  Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'.
*  Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+  Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?  Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" or "does". ? is equivalent to {0,1}.
{n}  n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}  n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}  m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.
?  When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the matching is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.  Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]'.
(pattern)  Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0…$9 properties in JScript. To match parenthesis characters, use '\(' or '\)'.
(?:pattern)  Matches pattern but does not capture the match; that is, it is a non-capturing match that is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.
(?=pattern)  Positive lookahead: matches the search string at any point where a string matching pattern begins. This is a non-capturing match; the match is not stored for later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000", but not "Windows" in "Windows 3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)  Negative lookahead: matches the search string at any point where a string not matching pattern begins. This is a non-capturing match; the match is not stored for later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1", but not "Windows" in "Windows 2000". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
x|y  Matches either x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]  A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz]  A negated character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in "plain".
[a-z]  A character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter in the range 'a' to 'z'.
[^a-z]  A negated character range. Matches any character not in the specified range. For example, '[^a-z]' matches any character that is not in the range 'a' to 'z'.
\b  Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never", but not the 'er' in "verb".
\B  Matches a non-word boundary. 'er\B' matches the 'er' in "verb", but not the 'er' in "never".
\cx  Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as a literal 'c' character.
\d  Matches a digit character. Equivalent to [0-9].
\D  Matches a non-digit character. Equivalent to [^0-9].
\f  Matches a form feed character. Equivalent to \x0c and \cL.
\n  Matches a newline character. Equivalent to \x0a and \cJ.
\r  Matches a carriage return character. Equivalent to \x0d and \cM.
\s  Matches any whitespace character, including space, tab, and form feed. Equivalent to [ \f\n\r\t\v].
\S  Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t  Matches a tab character. Equivalent to \x09 and \cI.
\v  Matches a vertical tab character. Equivalent to \x0b and \cK.
\w  Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.
\W  Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn  Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num  Matches num, where num is a positive integer. A backreference to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n  Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, n is an octal escape value if n is an octal digit (0-7).
\nm  Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither condition holds, \nm matches the octal escape value nm when n and m are octal digits (0-7).
\nml  Matches the octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).
\un  Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
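
A few of the less obvious entries in the table (non-greedy quantifiers, lookahead, backreferences, and non-capturing groups) are easier to see in action. The following is a minimal JavaScript sketch; the variable names are chosen here for illustration only.

```javascript
// Greedy vs. non-greedy quantifiers on the string "oooo".
const greedy = "oooo".match(/o+/)[0];   // takes as many o's as possible
const lazy = "oooo".match(/o+?/)[0];    // takes as few o's as possible

// Lookahead checks the text that follows without consuming it.
const positive = /Windows (?=95|98|NT|2000)/;
const negative = /Windows (?!95|98|NT|2000)/;

// Backreference: \1 repeats whatever group 1 captured.
const doubled = /(.)\1/;

// Non-capturing group: groups the alternatives without storing a submatch.
const industry = /industr(?:y|ies)/;

console.log(greedy, lazy);                         // oooo o
console.log(positive.test("Windows 2000"));        // true
console.log(positive.test("Windows 3.1"));         // false
console.log(negative.test("Windows 3.1"));         // true
console.log(doubled.exec("food")[0]);              // oo
console.log("industries".match(industry).length);  // 1 (full match only, no submatch)
```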

The following are examples:
"^The": matches any string that starts with "The" ("There", "The cat", etc.);
"of despair$": matches any string that ends with "of despair";
"^abc$": matches a string that both starts and ends with "abc", which can only be "abc" itself;
"notice": matches any string that contains "notice".
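
The anchor examples above can be checked with RegExp.prototype.test. This is a small sketch with made-up sample strings; the constant names are illustrative.

```javascript
const startsWithThe = /^The/;
const endsWithDespair = /of despair$/;
const exactlyAbc = /^abc$/;
const containsNotice = /notice/;

console.log(startsWithThe.test("There"));                // true
console.log(startsWithThe.test("See The cat"));          // false: "The" is not at the start
console.log(endsWithDespair.test("depths of despair"));  // true
console.log(exactlyAbc.test("abc"));                     // true
console.log(exactlyAbc.test("abcabc"));                  // false
console.log(containsNotice.test("take notice of this")); // true
```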

The three symbols '*', '+', and '?' specify how many times a character or group of characters may repeat. They mean "zero or more", "one or more", and "zero or one", respectively. The following are examples:

"ab*": matches a string that has an a followed by zero or more b's ("a", "ab", "abbb", ...);
"ab+": matches a string that has an a followed by one or more b's;
"ab?": matches a string that has an a followed by zero or one b;
"a?b+$": matches zero or one a followed by one or more b's at the end of the string.
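
To see what each quantifier actually consumes, it helps to print the matched substring rather than just true or false. The sketch below assumes a small helper, firstMatch, which is not part of the original text.

```javascript
// Hypothetical helper: returns the first matched substring, or null.
const firstMatch = (re, s) => {
  const m = s.match(re);
  return m ? m[0] : null;
};

console.log(firstMatch(/ab*/, "abbb"));   // abbb
console.log(firstMatch(/ab*/, "a"));      // a   (zero b's is allowed)
console.log(firstMatch(/ab+/, "a"));      // null (at least one b is required)
console.log(firstMatch(/ab?/, "abbb"));   // ab  (at most one b is taken)
console.log(/a?b+$/.test("xxabbb"));      // true
console.log(/a?b+$/.test("abbbx"));       // false: the b's are not at the end
```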

You can also use a range enclosed in braces to specify how many times something repeats.

"ab{2}": matches a string that has an a followed by exactly two b's ("abb");
"ab{2,}": matches a string that has an a followed by at least two b's;
"ab{3,5}": matches a string that has an a followed by three to five b's.

Note that you must always specify the lower limit of the range (for example, "{0,2}" rather than "{,2}"). You may also have noticed that '*', '+', and '?' are equivalent to "{0,}", "{1,}", and "{0,1}", respectively.
There is also '|', which represents the "or" operation:

"hi|hello": matches a string that contains "hi" or "hello";
"(b|cd)ef": matches "bef" or "cdef";
"(a|b)*c": matches a sequence of mixed "a"s and "b"s followed by a "c";
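
The brace ranges and the lower-limit rule can be sketched in JavaScript as follows. The helper firstMatch and the sample strings are assumptions made for this example. In JavaScript's default (non-Unicode) mode, a pattern such as /ab{,2}/ is not an error; the braces are simply read as literal text, which is why the lower limit matters.

```javascript
// Hypothetical helper: returns the first matched substring, or null.
const firstMatch = (re, s) => {
  const m = s.match(re);
  return m ? m[0] : null;
};

console.log(firstMatch(/ab{2}/, "abbbb"));      // abb
console.log(firstMatch(/ab{2,}/, "abbbb"));     // abbbb
console.log(firstMatch(/ab{2,}/, "ab"));        // null
console.log(firstMatch(/ab{3,5}/, "abbbbbbb")); // abbbbb (an a plus five b's)

// Without a lower limit, "{,2}" is treated as literal characters.
console.log(/ab{,2}/.test("abb"));     // false
console.log(/ab{,2}/.test("ab{,2}"));  // true

// '*', '+', and '?' behave like {0,}, {1,}, and {0,1}.
const samples = ["a", "ab", "abb", "b"];
const sameResults = samples.every((s) =>
  /^ab*$/.test(s) === /^ab{0,}$/.test(s) &&
  /^ab+$/.test(s) === /^ab{1,}$/.test(s) &&
  /^ab?$/.test(s) === /^ab{0,1}$/.test(s)
);
console.log(sameResults);              // true
```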
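
A short JavaScript sketch of the alternation examples, again using an assumed firstMatch helper and invented sample strings:

```javascript
// Hypothetical helper: returns the first matched substring, or null.
const firstMatch = (re, s) => {
  const m = s.match(re);
  return m ? m[0] : null;
};

console.log(firstMatch(/hi|hello/, "oh hello"));  // hello
console.log(firstMatch(/hi|hello/, "this"));      // hi (found inside "this")
console.log(firstMatch(/(b|cd)ef/, "xcdef"));     // cdef
console.log(firstMatch(/(b|cd)ef/, "cef"));       // null
console.log(firstMatch(/(a|b)*c/, "abbac"));      // abbac
console.log(firstMatch(/(a|b)*c/, "c"));          // c (zero a's or b's is allowed)
```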

'.' stands for any single character:

"a.[0-9]": matches a string that has an "a" followed by any character and then a digit;
"^.{3}$": matches a string of exactly three arbitrary characters;
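
The two dot patterns can be tried out as below. The sample strings are made up; the third check illustrates that, by default, the dot does not match a newline.

```javascript
const aAnyDigit = /a.[0-9]/;
const exactlyThree = /^.{3}$/;

console.log(aAnyDigit.test("a-7"));    // true
console.log(aAnyDigit.test("a7"));     // false: no character between "a" and the digit
console.log(aAnyDigit.test("a\n7"));   // false: "." does not match "\n"
console.log(exactlyThree.test("abc")); // true
console.log(exactlyThree.test("ab"));  // false
```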

Square brackets specify which characters may appear at a given position in a string:

"[ab]": matches a single "a" or "b" (equivalent to "a|b");
"[a-d]": matches one lowercase letter from 'a' to 'd' (equivalent to "a|b|c|d" or "[abcd]");
"^[a-zA-Z]": matches a string that starts with a letter;
"[0-9]%": matches a digit followed by a percent sign;
",[a-zA-Z0-9]$": matches a string that ends with a comma followed by a letter or digit.
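
The bracket examples can be sketched in JavaScript as follows, with an assumed firstMatch helper and sample strings invented for the purpose:

```javascript
// Hypothetical helper: returns the first matched substring, or null.
const firstMatch = (re, s) => {
  const m = s.match(re);
  return m ? m[0] : null;
};

console.log(firstMatch(/[ab]/, "cab"));      // a
console.log(firstMatch(/[a-d]/, "xyzd"));    // d
console.log(/^[a-zA-Z]/.test("9 lives"));    // false: starts with a digit
console.log(firstMatch(/[0-9]%/, "up 25%")); // 5%
console.log(/,[a-zA-Z0-9]$/.test("x,1"));    // true
console.log(/,[a-zA-Z0-9]$/.test("x,1,"));   // false: ends with a comma
```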

You can also use '^' inside square brackets to exclude characters; the '^' must be the first character inside the brackets. (For example, "%[^a-zA-Z]%" matches two percent signs with a single non-letter character between them.)

To match them literally, the characters "^.$()|*+?{\" must each be preceded by the escape character '\'.

Note that most of these characters do not need to be escaped inside square brackets (the exceptions are '\', ']', a leading '^', and a '-' between two characters).
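
A minimal JavaScript check of the negated set, using sample strings chosen for illustration:

```javascript
const nonLetterBetweenPercents = /%[^a-zA-Z]%/;

console.log(nonLetterBetweenPercents.test("%5%"));   // true
console.log(nonLetterBetweenPercents.test("%a%"));   // false: a letter is excluded
console.log(nonLetterBetweenPercents.test("%%"));    // false: one character is required
console.log(nonLetterBetweenPercents.test("%12%"));  // false: only one character is allowed
```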
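
When a literal string has to be embedded in a pattern, the escaping can be automated. The following is one common approach, sketched under the assumption that the pattern is built with the RegExp constructor; the function name escapeRegExp is illustrative and not a built-in.

```javascript
// Prefix every regex metacharacter in the text with a backslash.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const literal = new RegExp("^" + escapeRegExp("1+1=2?") + "$");

console.log(escapeRegExp("1+1=2?"));     // 1\+1=2\?
console.log(literal.test("1+1=2?"));     // true
console.log(/^1+1=2?$/.test("1+1=2?"));  // false: unescaped + and ? act as quantifiers

// Inside square brackets, ".", "$", and "*" are literal without escaping.
console.log(/^[.$*]$/.test("$"));        // true
console.log(/^[.$*]$/.test("a"));        // false
```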

C# Regular Expression Summary

Digits only: "^[0-9]*$".
Exactly n digits: "^\d{n}$".
At least n digits: "^\d{n,}$".
Between m and n digits: "^\d{m,n}$".
Zero, or a number that starts with a non-zero digit: "^(0|[1-9][0-9]*)$".
Positive numbers with two decimal places: "^[0-9]+(\.[0-9]{2})?$".
Positive numbers with one to three decimal places: "^[0-9]+(\.[0-9]{1,3})?$".
Non-zero positive integers: "^\+?[1-9][0-9]*$".
Non-zero negative integers: "^\-[1-9][0-9]*$".
Exactly 3 characters: "^.{3}$".
Strings consisting of the 26 English letters: "^[A-Za-z]+$".
Strings consisting of the 26 uppercase English letters: "^[A-Z]+$".
Strings consisting of the 26 lowercase English letters: "^[a-z]+$".
Strings consisting of digits and the 26 English letters: "^[A-Za-z0-9]+$".
Strings consisting of digits, the 26 English letters, or underscores: "^\w+$".
Verify a user password: "^[a-zA-Z]\w{5,17}$". The correct format starts with a letter, is 6 to 18 characters long, and contains only letters, digits, and underscores.
Check for characters such as ^ % & ' , ; = ? $ and ": "[^%&',;=?$\x22]+". As written, this is a negated set, so it matches runs of characters other than % & ' , ; = ? $ and ".
Chinese characters only: "^[\u4e00-\u9fa5]{0,}$".
Verify an email address: "^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$".
Verify an Internet URL: "^http://([\w-]+\.)+[\w-]+(/[\w-./?%&=]*)?$".
Verify a phone number: "^(\d{3,4}-)?\d{7,8}$". The correct formats are "XXX-XXXXXXX", "XXX-XXXXXXXX", "XXXX-XXXXXXX", "XXXX-XXXXXXXX", "XXXXXXX", and "XXXXXXXX".
Verify an ID card number (15 or 18 digits): "^(\d{15}|\d{18})$".
Verify the 12 months of a year: "^(0?[1-9]|1[0-2])$". The correct formats are "01" to "09" and "1" to "12".
Verify the 31 days of a month: "^((0?[1-9])|((1|2)[0-9])|30|31)$". The correct formats are "01" to "09" and "1" to "31".
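
Although this summary is labeled for C#, several of the patterns carry over unchanged to JavaScript regex literals. The sketch below collects a few of them in a lookup table; the names validators and isValid are assumptions made for this example, and the password, email, and date strings are invented test data.

```javascript
// A few patterns from the summary, written as JavaScript regex literals.
const validators = {
  digits: /^[0-9]*$/,
  nonZeroPositiveInt: /^\+?[1-9][0-9]*$/,
  password: /^[a-zA-Z]\w{5,17}$/,
  email: /^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/,
  month: /^(0?[1-9]|1[0-2])$/,
  day: /^((0?[1-9])|((1|2)[0-9])|30|31)$/,
};

// Hypothetical helper: true if the input matches the named pattern.
function isValid(kind, input) {
  return validators[kind].test(input);
}

console.log(isValid("nonZeroPositiveInt", "+42"));      // true
console.log(isValid("nonZeroPositiveInt", "042"));      // false
console.log(isValid("password", "abc_12"));             // true
console.log(isValid("password", "1abcdef"));            // false: must start with a letter
console.log(isValid("email", "user.name@example.com")); // true
console.log(isValid("month", "13"));                    // false
console.log(isValid("day", "31"));                      // true
```

Patterns like these check only the shape of the input. They do not confirm that an email address exists or that a given day is valid for a particular month.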
Use regular expressions to restrict text box input in a web page form:

Allow only Chinese characters: onkeyup="value=value.replace(/[^\u4E00-\u9FA5]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\u4E00-\u9FA5]/g,''))"

Allow only full-width characters: onkeyup="value=value.replace(/[^\uFF00-\uFFFF]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\uFF00-\uFFFF]/g,''))"

Allow only digits: onkeyup="value=value.replace(/[^\d]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\d]/g,''))"

Allow only digits and English letters: onkeyup="value=value.replace(/[^A-Za-z0-9]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^A-Za-z0-9]/g,''))"

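The following JavaScript program uses a regular expression to extract the file name from a URL. For a URL whose last path segment is page1.htm, the result is page1.

var s = "http://www.h2bbs.com"
// Drop everything up to the last "/" and everything from the next "." onward.
// (The URL above has no path, so for it the result is "www".)
s = s.replace(/(.*\/){0,}([^\.]+).*/ig, "$2")
alert(s)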
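
The same replacement can be wrapped in a function so that it runs outside a browser. This is a minimal sketch: the function name fileBaseName and the example.com URLs are hypothetical and serve only to show the "page1" result.

```javascript
// Returns the last path segment of a URL without its extension.
function fileBaseName(url) {
  // Group 1 swallows everything up to the last "/";
  // group 2 captures the characters before the next ".".
  return url.replace(/(.*\/){0,}([^\.]+).*/ig, "$2");
}

console.log(fileBaseName("http://example.com/docs/page1.htm")); // page1
console.log(fileBaseName("http://example.com/page1.htm"));      // page1
```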

Regular expression matching double-byte characters (including Chinese characters): [^\x00-\xff]

Application: calculate the length of a string, counting each double-byte character as 2 and each ASCII character as 1:

String.prototype.len = function () { return this.replace(/[^\x00-\xff]/g, "aa").length; }

Regular expression matching blank lines: \n[\s| ]*\r

Regular expression matching HTML tags: /<(.*)>.*<\/\1>|<(.*) \/>/

Regular expression matching leading and trailing whitespace: (^\s*)|(\s*$)

String.prototype.trim = function ()
{
    // Remove whitespace at the start and at the end of the string
    return this.replace(/(^\s*)|(\s*$)/g, "");
}

Use regular expressions to break down and convert IP addresses:

The following JavaScript program uses a regular expression to match an IP address and convert it to the corresponding numeric value:

function IP2V(ip)
{
    // Regular expression matching a dotted-quad IP address
    var m = /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/.exec(ip);
    if (m)
    {
        // Each of the four parts is one base-256 digit of the result
        return m[1] * Math.pow(256, 3) + m[2] * Math.pow(256, 2) + m[3] * 256 + m[4] * 1;
    }
    else
    {
        throw new Error("Not a valid IP address!");
    }
}

However, this task may be simpler without regular expressions, by using the split function directly. The program is as follows:

var ip = "10.100.20.168"
ip = ip.split(".")
alert("The IP value is: " + (ip[0] * 256 * 256 * 256 + ip[1] * 256 * 256 + ip[2] * 256 + ip[3] * 1))

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
