Regular expressions in JS

Source: Internet
Author: User
Tags: uppercase letter, expression, engine

1. Regular expression rules

1.1 Ordinary characters

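Letters, digits, Chinese characters, underscores, and any punctuation marks that are not given a special meaning in the sections below are "ordinary characters". An ordinary character in an expression matches the same character in the string being searched.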

Example 1: The expression "c", when matched against the string "abcde", succeeds. The matched content is "c", and the matched position starts at 2 and ends at 3. (Note: whether indexes start at 0 or at 1 may vary with the programming language.)

Example 2: The expression "bcd", when matched against the string "abcde", succeeds. The matched content is "bcd", and the matched position starts at 1 and ends at 4.
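The two examples can be reproduced in JavaScript roughly as follows. This is a minimal sketch: `locate` is a hypothetical helper name introduced here, and the end position is computed as start plus match length, which is an assumption about how the article counts positions.

```javascript
// Return the matched text with its start and end offsets (0-based, end exclusive),
// or null when the pattern does not match.
function locate(re, text) {
  const m = re.exec(text);
  if (m === null) return null;
  return { text: m[0], start: m.index, end: m.index + m[0].length };
}

const single = locate(/c/, "abcde");
const triple = locate(/bcd/, "abcde");

console.log(single); // { text: 'c', start: 2, end: 3 }
console.log(triple); // { text: 'bcd', start: 1, end: 4 }
```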

1.2 Simple escape characters

Expression    Can match
\r, \n        Carriage return and line feed
\t            Tab
\\            The "\" character itself
\^            The "^" character itself
\$            The "$" character itself
\.            The decimal point "." itself

Example 1: The expression "\$d", when matched against the string "abc$de", succeeds. The matched content is "$d", and the matched position starts at 3 and ends at 5.
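A short sketch of the escapes in JavaScript regex literals, where the escape character is the backslash. The test strings other than "abc$de" are made up for illustration.

```javascript
// "\$" matches a literal dollar sign; an unescaped "$" would be an end-of-input anchor.
const dollar = /\$d/.exec("abc$de");
console.log(dollar[0], dollar.index, dollar.index + dollar[0].length); // $d 3 5

// "\t" and "\n" in a pattern match the tab and line-feed characters.
const hasTab = /\t/.test("a\tb");
const hasNewline = /\n/.test("a\nb");

// "\\" in a regex literal matches a single backslash character.
const hasBackslash = /\\/.test("C:\\temp");

console.log(hasTab, hasNewline, hasBackslash); // true true true
```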

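1.3 Expressions that can match 'multiple characters'

Some expressions can match any one character from a whole set. For example, "\d" can match any digit. Although such an expression can match any character in its set, each occurrence matches exactly one character, not several.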

Expression    Can match
\d            Any one digit, 0~9
\w            Any one letter, digit, or underscore, that is, any one of A~Z, a~z, 0~9, _
\s            Any one whitespace character, including spaces, tabs, and form feeds
.             Any one character except the line feed (\n)

Example 1: The expression "\d\d", when matched against "abc123", succeeds. The matched content is "12", and the matched position starts at 3 and ends at 5.

Example 2: The expression "a.\d", when matched against "aaa100", succeeds. The matched content is "aa1", and the matched position starts at 1 and ends at 4.
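The same checks can be sketched in JavaScript. The last three lines use made-up single-character inputs to show what each class accepts.

```javascript
const twoDigits = /\d\d/.exec("abc123");
console.log(twoDigits[0], twoDigits.index); // 12 3

const dotDigit = /a.\d/.exec("aaa100");
console.log(dotDigit[0], dotDigit.index);   // aa1 1

const underscoreIsWord = /\w/.test("_");    // "_" is in the \w set
const tabIsSpace = /\s/.test("\t");         // a tab is whitespace
const dotMatchesLf = /./.test("\n");        // by default "." does not match a line feed
console.log(underscoreIsWord, tabIsSpace, dotMatchesLf); // true true false
```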

1.4 Custom expressions that can match 'multiple characters'

Square brackets [ ] enclose a set of characters and match any one of them. [^ ] encloses a set of characters and matches any one character that is not in the set. As before, although the expression can match any one of several characters, it matches only one character at a time.

Expression    Can match
[ab5@]        "a" or "b" or "5" or "@"
[^abc]        Any one character other than "a", "b", "c"
[f-k]         Any one letter from "f" to "k"
[^A-F0-3]     Any one character other than "A"~"F" and "0"~"3"

Example 1: The expression "[bcd][bcd]", when matched against "abc123", succeeds. The matched content is "bc", and the matched position starts at 1 and ends at 3.

Example 2: The expression "[^abc]", when matched against "abc123", succeeds. The matched content is "1", and the matched position starts at 3 and ends at 4.
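A brief JavaScript sketch of custom character sets. The single-letter probes for the ranges are illustrative inputs, not taken from the article.

```javascript
const pair = /[bcd][bcd]/.exec("abc123");
console.log(pair[0], pair.index);     // bc 1

const notAbc = /[^abc]/.exec("abc123");
console.log(notAbc[0], notAbc.index); // 1 3

// A range inside brackets covers every character between its two ends.
const inRange = /[f-k]/.test("h");        // "h" lies between "f" and "k"
const outOfRange = /[f-k]/.test("e");     // "e" does not
// A negated set with two ranges rejects members of either range.
const negatedHit = /[^A-F0-3]/.test("G"); // "G" is outside both ranges
const negatedMiss = /[^A-F0-3]/.test("2"); // "2" is inside "0"~"3"
console.log(inRange, outOfRange, negatedHit, negatedMiss); // true false true false
```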

1.5 Special symbols for modifying the number of matches

A special symbol that modifies the number of matches, placed after an expression, repeats that expression so that it does not have to be written out several times. The symbol goes immediately after the expression it modifies. For example, "[bcd][bcd]" can be written as "[bcd]{2}".

Expression    Role
{n}           The expression repeats exactly n times. For example, "\w{2}" is equivalent to "\w\w", and "a{5}" is equivalent to "aaaaa"
{m,n}         The expression repeats at least m times and at most n times. For example, "ba{1,3}" can match "ba", "baa", or "baaa"
{m,}          The expression repeats at least m times. For example, "\w\d{2,}" can match "a12", "_456", "M12344" ...
?             The expression matches 0 or 1 times, equivalent to {0,1}. For example, "a[cd]?" can match "a", "ac", "ad"
+             The expression appears at least once, equivalent to {1,}. For example, "a+b" can match "ab", "aab", "aaab" ...
*             The expression appears 0 or more times, equivalent to {0,}. For example, "\^*b" can match "b", "^^^b" ...

Example 1: The expression "\d+\.?\d*", when matched against "It costs $12.5", succeeds. The matched content is "12.5", and the matched position starts at 10 and ends at 14.

Example 2: The expression "go{2,8}gle", when matched against "Ads by goooooogle", succeeds. The matched content is "goooooogle", and the matched position starts at 7 and ends at 17.
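The two examples, plus a check of the stated equivalences, might look like this in JavaScript. The `^` and `$` anchors in the last two patterns are added here only so that each test string must match in full.

```javascript
const price = /\d+\.?\d*/.exec("It costs $12.5");
console.log(price[0], price.index, price.index + price[0].length); // 12.5 10 14

const logo = /go{2,8}gle/.exec("Ads by goooooogle");
console.log(logo[0], logo.index, logo.index + logo[0].length);     // goooooogle 7 17

// "?" behaves like {0,1}: the optional part may be absent or appear once, not twice.
const optional = ["a", "ac", "ad", "acd"].map((s) => /^a[cd]?$/.test(s));
// "{1,3}" needs at least one "a" and allows at most three.
const bounded = ["b", "ba", "baaa", "baaaa"].map((s) => /^ba{1,3}$/.test(s));

console.log(optional); // [ true, true, true, false ]
console.log(bounded);  // [ false, true, true, false ]
```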

1.6 Other special symbols that represent abstract meanings

Expression    Role
^             Matches the position where the string starts; does not match any characters
$             Matches the position where the string ends; does not match any characters
\b            Matches a word boundary, that is, the position between a word and a space; does not match any characters

Example 1: The expression "^aaa", when matched against "xxx aaa xxx", fails. Because "^" must match at the beginning of the string, "^aaa" can match only when "aaa" is at the beginning of the string, for example "aaa xxx xxx".

Example 2: The expression "aaa$", when matched against "xxx aaa xxx", fails. Because "$" must match at the end of the string, "aaa$" can match only when "aaa" is at the end of the string, for example "xxx xxx aaa".

Example 3: The expression ".\b.", when matched against "@@@abc", succeeds. The matched content is "@a", and the matched position starts at 2 and ends at 4.
Further explanation: "\b" is similar to "^" and "$" in that it matches no characters itself, but it requires that, of the two sides of the position where it stands in the match result, one side is in the "\w" range and the other side is not.

Example 4: The expression "\bend\b", when matched against "weekend,endfor,end", succeeds. The matched content is "end", and the matched position starts at 15 and ends at 18.
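A JavaScript sketch of the four examples. The input "@@@abc" for the ".\b." case is an assumed string chosen to be consistent with the stated result ("@a" at positions 2 to 4).

```javascript
// "^" and "$" constrain where a match may sit; they consume no characters.
const startFails = /^aaa/.test("xxx aaa xxx");
const startWorks = /^aaa/.test("aaa xxx xxx");
const endFails = /aaa$/.test("xxx aaa xxx");
const endWorks = /aaa$/.test("xxx xxx aaa");
console.log(startFails, startWorks, endFails, endWorks); // false true false true

// "\b" holds only where a \w character meets a non-\w character (or an end of the string).
const boundary = /.\b./.exec("@@@abc");
console.log(boundary[0], boundary.index); // @a 2

// Only the final "end" has a boundary on both sides.
const word = /\bend\b/.exec("weekend,endfor,end");
console.log(word[0], word.index);         // end 15
```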

Expression    Role
|             "Or" between the expressions on its left and right: matches either the left side or the right side
( )           (1) The expression in parentheses is modified as a whole when the number of matches is modified. (2) In the match result, the content matched by the expression in parentheses can be obtained individually

Example 5: The expression "Tom|Jack", when matched against the string "I'm Tom, he is Jack", succeeds. The matched content is "Tom", and the matched position starts at 4 and ends at 7. Matching again finds the next match: the matched content is "Jack", and the matched position starts at 15 and ends at 19.

Example 6: The expression "(go\s*)+", when matched against "Let's go go go!", succeeds. The matched content is "go go go", and the matched position starts at 6 and ends at 14.

Example 7: The expression "¥(\d+\.?\d*)", when matched against "$10.9,¥20.5", succeeds. The matched content is "¥20.5", and the matched position starts at 6 and ends at 11. The content matched by the parentheses, obtained separately, is "20.5".
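In JavaScript, "matching again" is usually done with the `g` flag, which makes `exec` resume from `lastIndex`. The sketch below assumes that approach; the variable names are illustrative.

```javascript
const nameRe = /Tom|Jack/g;
const sentence = "I'm Tom, he is Jack";
const firstName = nameRe.exec(sentence);  // first call finds the leftmost alternative
const secondName = nameRe.exec(sentence); // second call resumes after the first match
console.log(firstName[0], firstName.index);   // Tom 4
console.log(secondName[0], secondName.index); // Jack 15

// The quantifier applies to the whole parenthesized group.
const goRun = /(go\s*)+/.exec("Let's go go go!");
console.log(goRun[0], goRun.index); // go go go 6

// Index 0 holds the whole match; index 1 holds what the first pair of parentheses matched.
const yen = /¥(\d+\.?\d*)/.exec("$10.9,¥20.5");
console.log(yen[0], yen[1], yen.index); // ¥20.5 20.5 6
```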

2. Some advanced rules in regular expressions

2.1 Greedy and non-greedy in the number of matches

For the text "dxxxdxxxd", the examples are as follows:

Expression      Match result
(d)(\w+)        "\w+" matches all the characters after the first "d": "xxxdxxxd"
(d)(\w+)(d)     "\w+" matches all the characters between the first "d" and the last "d": "xxxdxxx". Although "\w+" could also match the last "d", it gives that "d" up so that the whole expression can match

As these examples show, "\w+" always matches as many characters as it can. In the second example it does not match the last "d", but only so that the whole expression can succeed. Likewise, expressions with "*" and "{m,n}" match as many times as possible, and an expression with "?" matches rather than skips whenever either is possible. This matching principle is called "greedy" mode.

Non-greedy mode:

Adding a "?" after a special symbol that modifies the number of matches makes the expression match as few times as possible, so that an expression that may either match or not match prefers not to match. This matching principle is called "non-greedy" mode, also known as "reluctant" mode. If matching fewer times would make the whole expression fail, the non-greedy expression, like the greedy one, matches a little more so that the whole expression can succeed.

Expression       Match result
(d)(\w+?)        "\w+?" matches as few characters as possible after the first "d", so it matches only one "x"
(d)(\w+?)(d)     For the whole expression to match, "\w+?" has to match "xxx" so that the following "d" can match. So the result is that "\w+?" matches "xxx"

Example 1: The expression "<td>(.*)</td>", when matched against the string "<td><p>aa</p></td> <td><p>bb</p></td>", succeeds. The matched content is the entire string "<td><p>aa</p></td> <td><p>bb</p></td>": the "</td>" in the expression matches the last "</td>" in the string.

Example 2: By contrast, the expression "<td>(.*?)</td>", matched against the same string as in Example 1, gets only "<td><p>aa</p></td>". Matching again gets the second one, "<td><p>bb</p></td>".
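The greedy and non-greedy results above can be checked with a sketch like the following. The `/` inside `</td>` is escaped because it would otherwise end a JavaScript regex literal; that escaping is specific to the literal syntax, not to the expression itself.

```javascript
const subject = "dxxxdxxxd";
// Index 2 of each result is what the second pair of parentheses matched.
console.log(/(d)(\w+)/.exec(subject)[2]);      // xxxdxxxd
console.log(/(d)(\w+)(d)/.exec(subject)[2]);   // xxxdxxx
console.log(/(d)(\w+?)/.exec(subject)[2]);     // x
console.log(/(d)(\w+?)(d)/.exec(subject)[2]);  // xxx

const cells = "<td><p>aa</p></td> <td><p>bb</p></td>";
// Greedy ".*" runs to the last "</td>", so the whole string is one match.
const greedyCell = /<td>(.*)<\/td>/.exec(cells)[0];
// Non-greedy ".*?" stops at the first "</td>"; the g flag collects every match.
const lazyCells = cells.match(/<td>(.*?)<\/td>/g);

console.log(greedyCell === cells); // true
console.log(lazyCells);            // [ '<td><p>aa</p></td>', '<td><p>bb</p></td>' ]
```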

2.2 Backreferences \1, \2 ...

When an expression is matched, the expression engine records the string matched by each sub-expression enclosed in parentheses "( )". When the match result is obtained, the strings matched by the parenthesized sub-expressions can be obtained separately. These recorded strings can be used not only after the match is over but also during the matching process: a later part of the expression can refer to a preceding "substring matched within parentheses". The reference is written as "\" plus a number. "\1" refers to the string matched within the 1st pair of parentheses, "\2" refers to the string matched within the 2nd pair of parentheses, and so on. If a pair of parentheses contains another pair of parentheses, the outer pair is numbered first. In other words, the pair whose left parenthesis "(" comes first gets the lower number.

Example 1: The expression "('|")(.*?)(\1)", when matched against "'Hello', "World"", succeeds. The matched content is "'Hello'". Matching again matches ""World"".

Example 2: The expression "(\w)\1{4,}", when matched against "aa bbbb abcdefg ccccc 111121111 999999999", succeeds. The matched content is "ccccc". Matching again gets "999999999". This expression requires that the same character in the "\w" range be repeated at least 5 times. Note the difference from "\w{5,}", which accepts any 5 or more "\w" characters even if they differ.

Example 3: The expression "<(\w+)\s*(\w+(=('|").*?\4)\s*)*>.*?</\1>", when matched against "<td id='td1' style="bgcolor:white"></td>", succeeds. If <td> is not paired with </td>, the match fails; if it is changed to another matching pair, the match also succeeds.
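A JavaScript sketch of the three backreference examples. The unpaired input used for the failing case is a made-up string, and `/` is escaped inside the literal for syntactic reasons only.

```javascript
// \1 must repeat whichever quote character the first group matched.
const quoted = /('|")(.*?)(\1)/g;
const greeting = `'Hello', "World"`;
const firstQuoted = quoted.exec(greeting)[0];
const secondQuoted = quoted.exec(greeting)[0];
console.log(firstQuoted, secondQuoted); // 'Hello' "World"

// (\w)\1{4,} needs the SAME character five or more times in a row.
const runs = "aa bbbb abcdefg ccccc 111121111 999999999".match(/(\w)\1{4,}/g);
console.log(runs); // [ 'ccccc', '999999999' ]

// \1 ties the closing tag name to the opening one; \4 ties each closing quote to its opening quote.
const tagRe = /<(\w+)\s*(\w+(=('|").*?\4)\s*)*>.*?<\/\1>/;
const paired = tagRe.test(`<td id='td1' style="bgcolor:white"></td>`);
const unpaired = tagRe.test(`<td id='td1'></tr>`);
console.log(paired, unpaired); // true false
```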

2.3 Forward pre-search (lookahead), match and no match; reverse pre-search (lookbehind), match and no match

The symbols "^", "$", and "\b" match no characters themselves; they only attach a condition to one of the two ends of the string or to a gap between characters. This section introduces a more flexible way of attaching a condition to the "two ends" or to a "gap".

Forward pre-search: "(?=xxxxx)", "(?!xxxxx)"

Format "(?=xxxxx)": the condition it attaches to the "gap" or "end" where it stands is that the text to the right of that gap must match the expression xxxxx. Because it is only a condition attached to the gap, it does not affect the following expressions, which still match the characters after the gap. This is similar to "\b", which also matches no characters itself: "\b" only examines the characters before and after the gap and does not affect what the following expressions match.

Example 1: The expression "Windows (?=NT|XP)", when matched against "Windows 98, Windows NT, Windows 2000", matches only the "Windows " in "Windows NT"; the other occurrences of "Windows " are not matched.

Example 2: The expression "(\w)((?=\1\1\1)(\1))+", when matched against the string "aaa ffffff 999999999", matches the first 4 of the 6 "f" characters and the first 7 of the 9 "9" characters. The expression can be read as: when a letter or digit is repeated 4 or more times, match everything before its last 2 occurrences. Of course, the expression does not have to be written this way; it is written so here for demonstration.

Format "(?!xxxxx)": the text to the right of the gap must not match the expression xxxxx.

Example 3: The expression "((?!\bstop\b).)+", when matched against "fdjka ljfdl stop fjdsla fdj", matches from the beginning up to the position just before "stop". If the string contains no "stop", the entire string is matched.

Example 4: The expression "do(?!\w)", when matched against the string "done, do, dog", matches only "do". In this example, "(?!\w)" after "do" has the same effect as "\b".
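The four forward pre-search examples might be checked in JavaScript as follows; the string without "stop" is a made-up input for the second half of Example 3.

```javascript
// (?=NT|XP) requires "NT" or "XP" to follow, but leaves it out of the match.
const osList = "Windows 98, Windows NT, Windows 2000";
const nt = /Windows (?=NT|XP)/.exec(osList);
console.log(JSON.stringify(nt[0]), nt.index); // "Windows " 12

// Each repetition consumes one more copy of \1 only while three copies still lie ahead.
const runsAhead = "aaa ffffff 999999999".match(/(\w)((?=\1\1\1)(\1))+/g);
console.log(runsAhead); // [ 'ffff', '9999999' ]

// (?!...) rejects any position where the word "stop" begins.
const untilStop = /((?!\bstop\b).)+/;
const beforeStop = untilStop.exec("fdjka ljfdl stop fjdsla fdj")[0];
const wholeString = untilStop.exec("fdjka ljfdl fjdsla")[0];
console.log(JSON.stringify(beforeStop)); // "fdjka ljfdl "
console.log(wholeString);                // fdjka ljfdl fjdsla

// "do" must not be followed by another word character.
const standaloneDo = "done, do, dog".match(/do(?!\w)/g);
console.log(standaloneDo); // [ 'do' ]
```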

Reverse pre-search: "(?<=xxxxx)", "(?<!xxxxx)"

These two formats are similar in concept to forward pre-search. Reverse pre-search places its requirement on the left side of the gap rather than the right: "(?<=xxxxx)" requires that the text to the left must match the specified expression, and "(?<!xxxxx)" requires that it must not. As with forward pre-search, both are conditions attached to the gap where they stand, and they match no characters themselves.

Example 5: The expression "(?<=\d{4})\d+(?=\d{4})", when matched against "1234567890123456", matches the middle 8 digits, excluding the first 4 digits and the last 4 digits. Because JScript.RegExp does not support reverse pre-search, this article does not provide a demonstration of this example.
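Although the article's JScript.RegExp engine lacks reverse pre-search, lookbehind was added to the JavaScript language in ECMAScript 2018, so Example 5 can be tried on engines that implement it (current Node.js, for instance). The sketch below assumes such an engine; the second pattern and its input are illustrative additions.

```javascript
// (?<=\d{4}) needs four digits to the left; (?=\d{4}) needs four digits to the right.
const middle = /(?<=\d{4})\d+(?=\d{4})/.exec("1234567890123456");
console.log(middle[0], middle.index); // 56789012 4

// (?<!\$) rejects a number that starts right after a dollar sign.
const notPrices = "$10 20".match(/(?<!\$)\b\d+/g);
console.log(notPrices); // [ '20' ]
```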

3. Other general rules

3.1 In an expression, "\xXX" and "\uXXXX" can represent a single character ("X" denotes a hexadecimal digit)

Form      Character range
\xXX      Characters numbered in the range 0 to 255. For example, a space can be written as "\x20"
\uXXXX    Any character can be written as "\u" followed by its 4-digit hexadecimal number. For example, "\u4E2D" (the character "中")
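A small sketch of both forms in JavaScript. The code point U+4E2D (the character "中") is used here as an illustrative value, and the range test at the end is an added example.

```javascript
const spaceByHex = /\x20/.test(" ");         // \x20 is the space character
const hanByUnicode = /\u4E2D/.test("中");    // \u4E2D is the character "中"
console.log(spaceByHex, hanByUnicode);       // true true

// The two forms can also bound a range inside brackets.
const latin1Only = /^[\x00-\xFF]+$/;
console.log(latin1Only.test("abc"), latin1Only.test("中")); // true false
```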

3.2 The expressions "\s", "\d", "\w", and "\b" have special meanings; the corresponding uppercase letters have the opposite meanings

Expression    Can match
\S            Any one non-whitespace character ("\s" matches any one whitespace character)
\D            Any one non-digit character
\W            Any one character other than letters, digits, and underscores
\B            A non-word boundary, that is, a gap where both sides are in the "\w" range or both sides are outside the "\w" range
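The uppercase forms can be tried out as below. All input strings are made up for illustration.

```javascript
const nonSpace = "a1 _b".match(/\S+/g); // runs of non-whitespace
const nonDigit = "a1b22".match(/\D/g);  // every non-digit character
const nonWord = "a_1-b".match(/\W/g);   // "_" is a \w character, "-" is not
console.log(nonSpace, nonDigit, nonWord); // [ 'a1', '_b' ] [ 'a', 'b' ] [ '-' ]

// \B requires that there is NO word boundary before "end",
// so only the "end" inside "weekend" qualifies.
const inner = /\Bend/.exec("weekend end");
console.log(inner.index); // 4
```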

3.3 Summary of the characters that have a special meaning in expressions and need "\" in front to match the character itself

Character    Description
^            Matches the starting position of the input string. To match the "^" character itself, use "\^"
$            Matches the end position of the input string. To match the "$" character itself, use "\$"
( )          Mark the beginning and end of a sub-expression. To match the parentheses, use "\(" and "\)"
[ ]          Define a custom expression that can match 'multiple characters'. To match the brackets, use "\[" and "\]"
{ }          Symbols that modify the number of matches. To match the curly braces, use "\{" and "\}"
.            Matches any character except the line feed (\n). To match the decimal point itself, use "\."
?            Modifies the number of matches to 0 or 1. To match the "?" character itself, use "\?"
+            Modifies the number of matches to at least 1. To match the "+" character itself, use "\+"
*            Modifies the number of matches to 0 or any number. To match the "*" character itself, use "\*"
|            The "or" relationship between the expressions on its left and right. To match "|" itself, use "\|"
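When a pattern is built from arbitrary text, the characters in this summary have to be escaped first. The sketch below uses `escapeRegExp`, a hypothetical helper written for this illustration (JavaScript has traditionally had no built-in for this); it also escapes the backslash itself.

```javascript
// Prefix every special character with a backslash so it matches itself.
// "$&" in the replacement stands for the character that was just matched.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const rawText = "1+1=2?";
const literalRe = new RegExp(escapeRegExp(rawText));

console.log(escapeRegExp("a.b"));              // a\.b
console.log(literalRe.test("is 1+1=2? yes"));  // true
// Unescaped, "+" and "?" act as quantifiers, so the same text no longer matches itself.
console.log(new RegExp(rawText).test(rawText)); // false
```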

3.4 If the match of a parenthesized sub-expression "( )" does not need to be recorded for later use, the "(?:xxxxx)" format can be used

Example 1: The expression "(?:(\w)\1)+", when matched against "a bbccdd efg", gives the result "bbccdd". What the "(?:)" parentheses match is not recorded, so "(\w)" is the first recorded group and is referenced with "\1".
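A short JavaScript check of this example. Because the "(?:)" group is not recorded, the result array holds only the whole match and one recorded group.

```javascript
const doubled = /(?:(\w)\1)+/.exec("a bbccdd efg");
console.log(doubled[0], doubled.index); // bbccdd 2
// Entry 0 is the whole match; entry 1 is (\w) from the last repetition.
console.log(doubled.length, doubled[1]); // 2 d
```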

3.5 Commonly used expression property settings: IgnoreCase, Singleline, Multiline, Global

Expression property    Description
IgnoreCase             By default, the letters in an expression are case-sensitive. Setting IgnoreCase makes matching case-insensitive. Some expression engines extend the notion of case to the whole Unicode range
Singleline             By default, the decimal point "." matches any character except the line feed (\n). Setting Singleline makes the decimal point match all characters, including line feeds
Multiline              By default, "^" and "$" match only the start ① and the end ④ of the string. Setting Multiline lets "^" match ① and also position ③, the start of the next line after a line feed, and lets "$" match ④ and also position ②, the end of a line before a line feed
Global                 Mainly takes effect when the expression is used for replacement: setting Global replaces all matches

Positions referred to in the Multiline row:

①xxxxxxxxx②\n
③xxxxxxxxx④
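In JavaScript these four properties correspond to the flags `i`, `s`, `m`, and `g` written after the closing slash (the `s` flag was added in ECMAScript 2018). A sketch with made-up inputs:

```javascript
// i: ignore case
const caseless = /abc/i.test("ABC");

// s: let "." match line feeds as well
const dotDefault = /a.b/.test("a\nb");
const dotAll = /a.b/s.test("a\nb");

// m: let "^" and "$" match at line starts and line ends inside the string
const twoLines = "one\ntwo";
const firstOnly = twoLines.match(/^\w+/g);
const eachLine = twoLines.match(/^\w+/gm);

// g: replace every match instead of only the first
const once = "a-b-c".replace(/-/, "+");
const all = "a-b-c".replace(/-/g, "+");

console.log(caseless, dotDefault, dotAll); // true false true
console.log(firstOnly, eachLine);          // [ 'one' ] [ 'one', 'two' ]
console.log(once, all);                    // a+b-c a+b+c
```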
