A Summary of Common Regular Expression Metacharacters

Source: Internet
Author: User

Regular expression metacharacters and ordinary characters:
According to regular expression syntax, a matching pattern is composed of a sequence of characters, which fall into two kinds: ordinary characters and metacharacters.

I. Ordinary characters:

Most characters stand only for themselves. These are called ordinary characters; all letters and digits are examples.
That is, an ordinary character matches only the same character in the string being searched.

II. Metacharacters:
If patterns could contain only ordinary characters, which match nothing but themselves, regular expressions would be neither flexible nor powerful. Regular expression syntax therefore also defines a set of special characters that are not matched literally but instead carry special meaning.

For example, the following characters have special meaning:

^ $ . * + ? = ! : | \ / ( ) [ ] { }

Although all of the characters above have special meanings, some of them are special only in certain contexts.
To match one of these characters literally, escape it with a preceding backslash (\). For example, to match a literal $, write \$; an unescaped $ matches the end-of-string position instead. These special characters are what give regular expressions their power.
They are called metacharacters because they are the basic building blocks from which expressions that match complex text are constructed.
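
The difference between an escaped and an unescaped metacharacter can be sketched as follows. The article does not name a host language, so Python's `re` module is an assumption here, and the sample string is made up for illustration.

```python
import re

price = "Total: $5"  # hypothetical sample text

# An unescaped $ is an anchor: it matches the end-of-string position,
# not a dollar sign.
end_anchor_pos = re.search(r"$", price).start()

# Escaped with a backslash, \$ matches a literal dollar sign.
dollar_pos = re.search(r"\$", price).start()

print(end_anchor_pos)  # 9, the position just past the last character
print(dollar_pos)      # 7, the index of the "$" character

# re.escape adds the backslashes needed to match an arbitrary string literally.
literal = "1+1=2?"
escaped = re.escape(literal)
print(re.fullmatch(escaped, literal) is not None)  # True
```

Letting the library escape a string is usually safer than adding backslashes by hand, because the set of metacharacters differs slightly between regex flavors.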

This section only introduces the concept; the sections below describe each metacharacter in more detail. The regular expression language consists of two basic character types: literal (ordinary) text characters and metacharacters. Metacharacters are what give regular expressions their processing power. A metacharacter construct can be a single character placed in [] (such as [a], which matches the single lowercase character a), a range (such as [a-d], which matches any one of a, b, c, and d), or an escape sequence (such as \w, which matches any letter, digit, or underscore). Some common metacharacters are:

. matches any single character except \n (note that this metacharacter is a period).
[abcde] matches any one of the characters a, b, c, d, e
[a-h] matches any one character in the range a to h
[^fgh] matches any one character that is not f, g, or h
\w matches any one uppercase or lowercase letter, digit 0 to 9, or underscore; equivalent to [a-zA-Z0-9_]
\W matches any one character that is not a letter, a digit 0 to 9, or an underscore; equivalent to [^a-zA-Z0-9_]
\s matches any one whitespace character; equivalent to [ \f\n\r\t\v]
\S matches any one non-whitespace character; equivalent to [^\s]
\d matches any single digit from 0 to 9; equivalent to [0-9]
\D matches any single character that is not a digit from 0 to 9; equivalent to [^0-9]
[\u4e00-\u9fa5] matches any single Chinese character (the range is written with Unicode escapes)
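
A minimal sketch of these character classes, assuming Python's `re` module as the host. In Python 3, \w, \d and \s are Unicode-aware by default, so the `re.ASCII` flag is passed where the ASCII-only equivalents listed above are wanted. The sample text is invented.

```python
import re

# Hypothetical sample: ASCII letters, an underscore, a digit,
# two Chinese characters (U+4E2D, U+6587), and a tab.
text = "Id_7 \u4e2d\u6587\tok"

word_chars = re.findall(r"\w", text, re.ASCII)      # [a-zA-Z0-9_]
non_word_chars = re.findall(r"\W", text, re.ASCII)  # everything else
digits = re.findall(r"\d", text, re.ASCII)          # [0-9]
whitespace = re.findall(r"\s", text)                # space and tab here
chinese = re.findall(r"[\u4e00-\u9fa5]", text)      # the range from the list

print(word_chars)      # ['I', 'd', '_', '7', 'o', 'k']
print(non_word_chars)  # [' ', '中', '文', '\t']
print(digits)          # ['7']
print(whitespace)      # [' ', '\t']
print(chinese)         # ['中', '文']

# A negated class matches any character NOT listed.
print(re.findall(r"[^a-z]", "abc1"))  # ['1']

# The period matches any character except a newline.
print(re.findall(r".", "a\nb"))       # ['a', 'b']
```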
Regular Expression Quantifiers
Each of the metacharacters above matches a single character; a quantifier is needed to match several characters at once. Here are some common quantifiers (n and m below are integers with 0 < n < m). The last three entries, \b, ^, and $, are position anchors rather than quantifiers, but they are commonly used alongside them:
* matches the preceding element 0 or more times; equivalent to {0,}
? matches the preceding element 0 or 1 times; equivalent to {0,1}
{n} matches the preceding element exactly n times
{n,} matches the preceding element at least n times
{n,m} matches the preceding element n to m times
+ matches the preceding element at least once; equivalent to {1,}
\b matches a word boundary
^ matches the start of the string, so the string must begin with whatever follows it
$ matches the end of the string, so the string must end with whatever precedes it
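
The quantifiers can be compared side by side with a small sketch. Python's `re` module is assumed as the host, and the helper name `accepted` is hypothetical.

```python
import re

samples = ["", "a", "aa", "aaa", "aaaa"]

def accepted(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

print(accepted(r"a*"))      # ['', 'a', 'aa', 'aaa', 'aaaa']
print(accepted(r"a?"))      # ['', 'a']
print(accepted(r"a+"))      # ['a', 'aa', 'aaa', 'aaaa']
print(accepted(r"a{2}"))    # ['aa']
print(accepted(r"a{2,}"))   # ['aa', 'aaa', 'aaaa']
print(accepted(r"a{2,3}"))  # ['aa', 'aaa']

# The shorthand forms behave the same as their brace equivalents.
print(accepted(r"a*") == accepted(r"a{0,}"))   # True
print(accepted(r"a+") == accepted(r"a{1,}"))   # True
print(accepted(r"a?") == accepted(r"a{0,1}"))  # True
```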

Notes
(1) The characters \ ? * ^ $ + ( ) | { [ and others already have a special meaning in a regular expression. To use one of them literally, escape it. For example, to require at least one "\" in the string, write the regular expression \\+.
(2) Multiple metacharacters or literal characters can be enclosed in parentheses to form a group, such as ^(13[4-9]\d{8})$, which matches an 11-digit mobile phone number that starts with 13.
(3) Chinese characters are matched by their Unicode code points. A single character is written as a Unicode escape: \u4e00 is the Chinese character "一" and \u9fa5 is the Chinese character "龥". These are the first and last code points of the range used above, which covers 20,902 characters.
(4) \b matches the beginning or end of a word. Given the string "123a 345b 456 789d", the regular expression \b\d{3}\b matches only 456.
(5) Use | to express alternation ("or"). For example, (z|j|q) matches any one of the letters z, j, and q. Inside square brackets | is an ordinary character, so the character class for the same three letters is [zjq].
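
Several of these notes can be checked with a short sketch (Python's `re` module assumed). One detail worth seeing in practice: inside square brackets the | character is literal, so a class such as [z|j|q] also matches a "|" character, whereas z|j|q outside brackets is true alternation.

```python
import re

# Note (4): only "456" has a word boundary on both sides.
boundary_hits = re.findall(r"\b\d{3}\b", "123a 345b 456 789d")
print(boundary_hits)  # ['456']

# Note (3): size of the inclusive range \u4e00 .. \u9fa5.
cjk_count = 0x9FA5 - 0x4E00 + 1
print(cjk_count)  # 20902

# Note (5): "|" inside a character class is an ordinary character.
in_class = re.findall(r"[z|j|q]", "z|j")
alternation = re.findall(r"z|j|q", "z|j")
print(in_class)     # ['z', '|', 'j']
print(alternation)  # ['z', 'j']

# Note (1): \\+ requires at least one backslash.
print(re.search(r"\\+", r"C:\temp") is not None)  # True
print(re.search(r"\\+", "no backslash here"))     # None
```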

Expression Matches
/^\s*$/ Matches a blank line.
/\d{2}-\d{5}/ Validates an ID number consisting of two digits, a hyphen, and five digits.
/<\s*(\S+)(\s[^>]*)?>[\s\S]*<\s*\/\1\s*>/ Matches an HTML tag.
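
A rough sketch of these three expressions in use, assuming Python's `re` module. The HTML pattern is written here as <\s*(\S+)(\s[^>]*)?>[\s\S]*<\s*\/\1\s*>, which is an assumed whitespace-free reading of the expression above; the sample inputs are invented.

```python
import re

BLANK_LINE = re.compile(r"^\s*$")
ID_NUMBER = re.compile(r"\d{2}-\d{5}")
# \1 refers back to the tag name captured by the first group,
# so the closing tag must use the same name as the opening tag.
HTML_TAG = re.compile(r"<\s*(\S+)(\s[^>]*)?>[\s\S]*<\s*/\1\s*>")

lines = ["first", "", "   ", "last"]
blank_flags = [BLANK_LINE.match(line) is not None for line in lines]
print(blank_flags)  # [False, True, True, False]

print(ID_NUMBER.fullmatch("12-34567") is not None)  # True
print(ID_NUMBER.fullmatch("1-234567") is not None)  # False

tag = HTML_TAG.search('<div class="x">hi</div>')
print(tag.group(1))                     # div
print(HTML_TAG.search("<b>text</i>"))   # None: closing tag does not match
```

A regular expression like this is only a quick check; nested or malformed markup generally calls for a real HTML parser.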

The following table contains a complete list of metacharacters and their behavior in the context of regular expressions:

Character Description
\ Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline. The sequence "\\" matches "\", and "\(" matches "(".
^ Matches the position at the start of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after "\n" or "\r".
$ Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before "\n" or "\r".
* Matches the preceding character or subexpression zero or more times. For example, "zo*" matches "z" and "zoo". * is equivalent to {0,}.
+ Matches the preceding character or subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but does not match "z". + is equivalent to {1,}.
? Matches the preceding character or subexpression zero times or once. For example, "do(es)?" matches "do" and "does". ? is equivalent to {0,1}.
{n} n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two "o"s in "food".
{n,} n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all of the "o"s in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".
{n,m} m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, "o{1,3}" matches the first three "o"s in "fooooood". "o{0,1}" is equivalent to "o?". Note: you cannot insert a space between the comma and the numbers.
? When this character follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match becomes non-greedy. A non-greedy pattern matches as little of the searched string as possible, while the default greedy pattern matches as much as possible. For example, in the string "oooo", "o+?" matches only a single "o", while "o+" matches all of the "o"s.
. Matches any single character except "\n". To match any character including "\n", use a pattern such as "[\s\S]".
(pattern) Matches pattern and captures the matching subexpression. A captured match can be retrieved from the resulting Matches collection using the $0...$9 properties. To match a literal parenthesis character, use "\(" or "\)".
(?:pattern) A subexpression that matches pattern but does not capture the match; that is, it is a non-capturing match and is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, "industr(?:y|ies)" is a more economical expression than "industry|industries".
(?=pattern) A subexpression that performs a positive lookahead search, which matches the search string at any point where a string that matches pattern begins. It is a non-capturing match; that is, the match is not captured for later use. For example, "Windows (?=95|98|NT|2000)" matches "Windows" in "Windows 2000", but not "Windows" in "Windows 3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that make up the lookahead.
(?!pattern) A subexpression that performs a negative lookahead search, which matches the search string at any point where a string that does not match pattern begins. It is a non-capturing match; that is, the match is not captured for later use. For example, "Windows (?!95|98|NT|2000)" matches "Windows" in "Windows 3.1", but not "Windows" in "Windows 2000". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that make up the lookahead.
x|y Matches x or y. For example, "z|food" matches "z" or "food". "(z|f)ood" matches "zood" or "food".
[xyz] A character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
[^xyz] A negated character set. Matches any character that is not enclosed. For example, "[^abc]" matches the "p" in "plain".
[a-z] A character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter in the range "a" through "z".
[^a-z] A negated character range. Matches any character that is not in the specified range. For example, "[^a-z]" matches any character that is not in the range "a" through "z".
\b Matches a word boundary, that is, the position between a word and a space. For example, "er\b" matches the "er" in "never", but not the "er" in "verb".
\B Matches a non-word boundary. "er\B" matches the "er" in "verb", but not the "er" in "never".
\cx Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character. The value of x must be in the range A-Z or a-z. If it is not, c is assumed to be a literal "c" character.
\d Matches a digit character. Equivalent to [0-9].
\D Matches a non-digit character. Equivalent to [^0-9].
\f Matches a form feed character. Equivalent to \x0c and \cL.
\n Matches a newline character. Equivalent to \x0a and \cJ.
\r Matches a carriage return character. Equivalent to \x0d and \cM.
\s Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t Matches a tab character. Equivalent to \x09 and \cI.
\v Matches a vertical tab character. Equivalent to \x0b and \cK.
\w Matches any word character, including underscore. Equivalent to "[A-Za-z0-9_]".
\W Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
\xn Matches n, where n is a hexadecimal escape code. The hexadecimal escape code must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". This allows ASCII codes to be used in regular expressions.
\num Matches num, where num is a positive integer. It is a backreference to a captured match. For example, "(.)\1" matches two consecutive identical characters.
\n Identifies either an octal escape code or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape code.
\nm Identifies either an octal escape code or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal character m. If neither condition holds, \nm matches the octal value nm, where n and m are octal digits (0-7).
\nml When n is an octal digit (0-3) and m and l are octal digits (0-7), matches the octal escape code nml.
\un Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
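
Some of the table's less obvious entries (greedy versus non-greedy matching, lookahead, backreferences, non-capturing groups, word boundaries, and multiline anchors) are sketched below. The table describes a RegExp object with a Multiline property; Python's `re` module and its `re.MULTILINE` flag are used here as an assumed stand-in, and details such as \cx or $0...$9 do not carry over to Python.

```python
import re

# Greedy vs. non-greedy quantifiers.
greedy = re.search(r"o+", "oooo").group()
lazy = re.search(r"o+?", "oooo").group()
print(greedy, lazy)  # oooo o

# Positive and negative lookahead: the lookahead text is checked
# but is not part of the match.
pos = re.compile(r"Windows (?=95|98|NT|2000)")
neg = re.compile(r"Windows (?!95|98|NT|2000)")
print(pos.search("Windows 2000").group() == "Windows ")  # True
print(pos.search("Windows 3.1"))                         # None
print(neg.search("Windows 3.1") is not None)             # True
print(neg.search("Windows 2000"))                        # None

# Backreference: \1 repeats whatever group 1 captured.
doubled = re.search(r"(.)\1", "hello").group()
print(doubled)  # ll

# Non-capturing group: findall returns whole matches, not group contents.
words = re.findall(r"industr(?:y|ies)", "industry and industries")
print(words)  # ['industry', 'industries']

# \b and \B.
print(bool(re.search(r"er\b", "never")), bool(re.search(r"er\b", "verb")))  # True False
print(bool(re.search(r"er\B", "verb")), bool(re.search(r"er\B", "never")))  # True False

# Multiline mode lets ^ match after each newline.
single = re.findall(r"^\w+", "one\ntwo")
multi = re.findall(r"^\w+", "one\ntwo", re.MULTILINE)
print(single, multi)  # ['one'] ['one', 'two']

# Hexadecimal and Unicode escapes.
print(re.fullmatch(r"\x41", "A") is not None)        # True
print(re.fullmatch(r"\u00a9", "\u00a9") is not None)  # True
```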

User name

/^[a-z0-9_-]{3,16}$/

Password

/^[a-z0-9_-]{6,18}$/
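
A minimal sketch of the user name and password patterns in use, assuming Python's `re` module. The function names are hypothetical. `fullmatch` is used because Python's $ also matches just before a trailing newline, which plain `match` would let through.

```python
import re

USERNAME = re.compile(r"^[a-z0-9_-]{3,16}$")
PASSWORD = re.compile(r"^[a-z0-9_-]{6,18}$")

def is_username(value):
    """True if value is 3 to 16 lowercase letters, digits, '_' or '-'."""
    return USERNAME.fullmatch(value) is not None

def is_password(value):
    """True if value is 6 to 18 lowercase letters, digits, '_' or '-'."""
    return PASSWORD.fullmatch(value) is not None

print(is_username("my-us3r_n4m3"))                     # True
print(is_username("th1s1s-wayt00_l0ngt0beausername"))  # False: longer than 16
print(is_username("Abc"))                              # False: uppercase letter
print(is_password("myp4ssw0rd"))                       # True
print(is_password("mypa$$w0rd"))                       # False: "$" not allowed
```

Note that the password pattern only restricts the character set and length; it says nothing about password strength.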

Hexadecimal value

/^#?([a-f0-9]{6}|[a-f0-9]{3})$/
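
A short sketch of the hexadecimal value check, assuming Python's `re` module and the pattern written without stray spaces as ^#?([a-f0-9]{6}|[a-f0-9]{3})$. The function name is hypothetical.

```python
import re

HEX_VALUE = re.compile(r"^#?([a-f0-9]{6}|[a-f0-9]{3})$")

def is_hex_value(value):
    """True for an optional '#' followed by exactly 6 or 3 lowercase hex digits."""
    return HEX_VALUE.fullmatch(value) is not None

print(is_hex_value("#a3c113"))  # True
print(is_hex_value("fff"))      # True: the leading "#" is optional
print(is_hex_value("#4d82h4"))  # False: "h" is not a hex digit
print(is_hex_value("#ffff"))    # False: four digits is neither 3 nor 6
```

The six-digit alternative is listed first; with the anchors in place the order does not change the result, but it documents the more common form.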

E-Mail

/^([\w\d_\.-]+)@([\w\d_-]+\.)+\w{2,4}$/

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

/^[a-z\d]+(\.[a-z\d]+)*@([\da-z](-[\da-z])?)+(\.{1,2}[a-z]+)+$/
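
A sketch of one of the e-mail patterns in use, assuming Python's `re` module. The pattern below, ^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$, is an assumed reading of the second expression above with its backslashes in place; it is a simplified check and rejects some addresses that are technically valid.

```python
import re

EMAIL = re.compile(r"^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$")

def parse_email(value):
    """Return (local part, domain, top-level domain) or None if no match."""
    m = EMAIL.fullmatch(value)
    return m.groups() if m else None

print(parse_email("john@doe.com"))        # ('john', 'doe', 'com')
print(parse_email("john@doe.something"))  # None: final label longer than 6
print(parse_email("john.doe.com"))        # None: no "@"
```

The three capture groups make the pieces of the address available after validation, which is the main reason to keep the parentheses.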

URL

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w\.-]*)*\/?$/

/^(https?:\/\/)?([\w\d_-]+\.)+\w{2,4}(\/[\w\d.?\-_%=&]+)*$/
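
A sketch of a URL check, assuming Python's `re` module. The pattern is an assumed reading of the first URL expression above with its backslashes in place: ^(https?://)?([\da-z.-]+)\.([a-z.]{2,6})([/\w.-]*)*/?$. The function name is hypothetical, and the sample URLs are illustrative only.

```python
import re

URL = re.compile(r"^(https?://)?([\da-z.-]+)\.([a-z.]{2,6})([/\w.-]*)*/?$")

def is_url(value):
    """True if value looks like an optional scheme, a host, a TLD and a path."""
    return URL.fullmatch(value) is not None

print(is_url("http://net.tutsplus.com/about"))  # True
print(is_url("example.com"))                    # True: the scheme is optional
print(is_url("http://example"))                 # False: no dot-separated TLD
print(is_url("example.com/path!"))              # False: "!" is not allowed

m = URL.fullmatch("http://net.tutsplus.com/about")
print(m.group(1), m.group(2), m.group(3))  # http:// net.tutsplus com
```

The trailing group ([/\w.-]*)* nests one quantifier inside another, which can backtrack heavily on long inputs that fail to match; for untrusted input a dedicated URL parser is usually a safer choice.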

IP Address

/((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)/

Or

/^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/
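
A sketch of the anchored IP address pattern, assuming Python's `re` module and the whitespace-free form of the expression. The octet sub-pattern is pulled into a hypothetical `OCTET` variable only to make the structure easier to read.

```python
import re

# One octet: 250-255, 200-249, or 0-199 (with an optional leading 0 or 1).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
# Three octets each followed by a dot, then a final octet.
IPV4 = re.compile(r"^(?:" + OCTET + r"\.){3}" + OCTET + r"$")

def is_ipv4(value):
    """True if value is four dot-separated numbers, each in 0..255."""
    return IPV4.fullmatch(value) is not None

print(is_ipv4("192.168.1.1"))      # True
print(is_ipv4("255.255.255.255"))  # True
print(is_ipv4("256.60.124.136"))   # False: 256 is out of range
print(is_ipv4("1.2.3"))            # False: only three octets
```

The first expression in the text has no ^ and $ anchors, so it would also find an address embedded in a longer string; the anchored form is the one to use for validation.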

HTML tags

/^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/
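
A sketch of the HTML tag pattern on a few short inputs, assuming Python's `re` module and the expression read as ^<([a-z]+)([^<]+)*(?:>(.*)</\1>|\s+/>)$. The sample tags are invented.

```python
import re

HTML_TAG = re.compile(r"^<([a-z]+)([^<]+)*(?:>(.*)</\1>|\s+/>)$")

paired = HTML_TAG.fullmatch('<a href="x">link</a>')
print(paired.group(1))  # a     (tag name)
print(paired.group(3))  # link  (content between the tags)

self_closing = HTML_TAG.fullmatch("<br />")
print(self_closing.group(1))  # br

# The backreference \1 rejects a closing tag with a different name.
print(HTML_TAG.fullmatch("<b>text</i>"))  # None
```

The group ([^<]+)* repeats an already repeated class, so a long input that fails to match can take a very long time to reject. The examples here are deliberately short; for real markup an HTML parser is the more robust tool.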

