- Literal characters in regular expressions
Character  Matches
Alphanumeric character  Itself
\0  The NUL character (\u0000)
\t  Tab (\u0009)
\n  Line feed (\u000A)
\v  Vertical tab (\u000B)
\f  Form feed (\u000C)
\r  Carriage return (\u000D)
\xnn  The Latin character specified by the hexadecimal number nn. For example, \x0A is equivalent to \n.
\uxxxx  The Unicode character specified by the hexadecimal number xxxx. For example, \u0009 is equivalent to \t.
\cX  The control character ^X. For example, \cJ is equivalent to \n.
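The equivalences in this table can be checked directly. The following is a minimal sketch; the variable names are only illustrative.

```javascript
// Each comparison pairs two escapes that name the same character.
const sameLineFeed = "\x0A" === "\n";    // \xnn form
const sameTab = "\u0009" === "\t";       // \uxxxx form
const controlJ = /\cJ/.test("\n");       // \cJ is Ctrl-J, the line feed
const nulMatch = /\0/.test("a\u0000b");  // \0 matches the NUL character

console.log(sameLineFeed, sameTab, controlJ, nulMatch); // true true true true
```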
- Character classes in regular expressions
Character  Matches
[...]  Any one character listed between the brackets
[^...]  Any one character not listed between the brackets
.  Any character except a newline or another Unicode line terminator
\w  Any ASCII word character; equivalent to [a-zA-Z0-9_]
\W  Any character that is not an ASCII word character; equivalent to [^a-zA-Z0-9_]
\s  Any Unicode whitespace character, including [ \f\n\r\t\v]
\S  Any character that is not Unicode whitespace. Note that \w and \S are not the same thing.
\d  Any ASCII digit; equivalent to [0-9]
\D  Any character other than an ASCII digit; equivalent to [^0-9]
[\b]  A literal backspace (special case)
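A short sketch of how these classes behave on one sample string. The string and the variable names are assumptions made up for illustration.

```javascript
const classSample = "id_42: ok\t!";

const wordChars = classSample.match(/\w/g).join("");       // letters, digits, underscore
const digitChars = classSample.match(/\d/g).join("");      // ASCII digits only
const nonWordCount = classSample.match(/\W/g).length;      // ":", " ", tab, "!"
const vowelChars = classSample.match(/[aeiou]/g).join(""); // a custom class
const matchesBackspace = /[\b]/.test("\b");                // [\b] is a backspace, not a word boundary
const dotMatchesNewline = /./.test("\n");                  // "." skips line terminators

console.log(wordChars, digitChars, nonWordCount, vowelChars);
// id_42ok 42 4 io
```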
- Repetition characters in regular expressions
Character  Meaning
{n,m}  Match the previous item at least n times but no more than m times
{n,}  Match the previous item n or more times
{n}  Match the previous item exactly n times
?  Match the previous item zero or one time; that is, the item is optional. Equivalent to {0,1}
+  Match the previous item one or more times. Equivalent to {1,}
*  Match the previous item zero or more times. Equivalent to {0,}
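The repetition characters can be compared side by side. This is a minimal sketch; the patterns are anchored with ^ and $ so that the counts apply to the whole string, and all names are illustrative.

```javascript
const exactlyThree = /^\d{3}$/;   // {n}
const twoToFour = /^\d{2,4}$/;    // {n,m}
const twoOrMore = /^\d{2,}$/;     // {n,}
const optionalU = /^colou?r$/;    // ?  is {0,1}
const oneOrMoreA = /^a+$/;        // +  is {1,}
const zeroOrMoreA = /^a*$/;       // *  is {0,}

console.log(exactlyThree.test("123"), exactlyThree.test("1234")); // true false
console.log(twoToFour.test("12"), twoToFour.test("12345"));       // true false
console.log(optionalU.test("color"), optionalU.test("colour"));   // true true
console.log(oneOrMoreA.test(""), zeroOrMoreA.test(""));           // false true
```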
Parentheses serve several purposes in regular expressions. One is to group separate items into a single subexpression, so that the items can be treated as one unit by |, *, +, ?, and so on. Another is to define subpatterns within the complete pattern: when a regular expression successfully matches a target string, you can extract the portions of the target string that matched each parenthesized subpattern.
- Alternation, grouping, and reference characters in regular expressions
Character  Meaning
|  Alternation. Match either the subexpression to the left of the symbol or the subexpression to the right.
(...)  Grouping. Combine several items into a single unit that can be used with |, *, +, ?, and so on, and remember the characters that match this group for use with later references.
(?:...)  Grouping only. Combine items into a single unit, but do not remember the characters that match this group.
\n  Match the same characters that were matched when group number n was first matched. Groups are subexpressions within (possibly nested) parentheses. Group numbers are assigned by counting left parentheses from left to right; groups of the form (?:...) are not numbered.
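The sketch below illustrates alternation, extracting parenthesized subpatterns, group numbering by left parenthesis, and a \n reference. The sample strings and names are assumptions chosen for illustration.

```javascript
// Alternation inside a group, followed by an optional "s".
const petPattern = /^(cat|dog)s?$/;

// Each parenthesized subpattern can be extracted from the match result.
const dateParts = /(\d{4})-(\d{2})-(\d{2})/.exec("Released 2014-03-09.");

// Groups are numbered by their left parenthesis, so the outer group is 1.
const nested = /((\d+)-(\d+))/.exec("pages 10-20");

// \1 must repeat whatever quote character group 1 matched.
const quotedPattern = /(['"])[^'"]*\1/;

console.log(petPattern.test("dogs"));                      // true
console.log(dateParts[1], dateParts[2], dateParts[3]);     // 2014 03 09
console.log(nested[1], nested[2], nested[3]);              // 10-20 10 20
console.log(quotedPattern.test('"hi"'), quotedPattern.test("'hi\"")); // true false
```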
- Anchor characters in regular expressions
Character  Meaning
^  Match the beginning of the string and, in multiline searches, the beginning of a line
$  Match the end of the string and, in multiline searches, the end of a line
\b  Match a word boundary; that is, the position between a \w character and a \W character, or between a \w character and the beginning or end of the string (but note that [\b] matches a backspace)
\B  Match a position that is not a word boundary
(?=p)  Positive lookahead assertion. Require that the following characters match the pattern p, but do not include those characters in the match
(?!p)  Negative lookahead assertion. Require that the following characters do not match the pattern p
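A minimal sketch of the anchors above, using made-up sample strings. Note that the lookahead checks the following text without adding it to the match.

```javascript
const wholeWordJava = /\bjava\b/i;
const insideWord = /\Bscript/i;

const javaAlone = wholeWordJava.test("I like Java.");   // "Java" stands alone
const javaInside = wholeWordJava.test("JavaScript");    // no boundary after "Java"
const scriptInside = insideWord.test("JavaScript");     // "Script" starts mid-word
const scriptAlone = insideWord.test("script");          // starts at a boundary

const lookaheadMatch = /Java(?=Script)/.exec("JavaScript")[0]; // "Script" is checked, not consumed
const negativeOnScript = /Java(?!Script)/.test("JavaScript");
const negativeOnOther = /Java(?!Script)/.test("Javanese");

console.log(javaAlone, javaInside, scriptInside, scriptAlone); // true false true false
console.log(lookaheadMatch, negativeOnScript, negativeOnOther); // Java false true
```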
- Regular expression flags
Character  Meaning
i  Perform case-insensitive matching
g  Perform a global match; that is, find all matches rather than stopping after the first
m  Multiline mode. ^ matches the beginning of a line as well as the beginning of the string, and $ matches the end of a line as well as the end of the string
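The effect of each flag can be seen by running similar patterns over the same text. The sample text here is an assumption for illustration.

```javascript
const flagText = "One\ntwo\nONE";

const allIgnoringCase = flagText.match(/one/gi);  // g + i: every match, any case
const firstIgnoringCase = flagText.match(/one/i)[0]; // no g: only the first match
const lineStarts = flagText.match(/^\w+/gm);      // m: ^ matches at each line start
const stringStartOnly = flagText.match(/^\w+/g);  // no m: ^ matches only at the string start

console.log(allIgnoringCase);   // [ 'One', 'ONE' ]
console.log(firstIgnoringCase); // One
console.log(lineStarts);        // [ 'One', 'two', 'ONE' ]
console.log(stringStartOnly);   // [ 'One' ]
```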
RegExp object in JS
- Constructors:
Explicit constructor. Syntax: new RegExp("pattern"[, "flags"]).
Implicit constructor (regular expression literal). Syntax: /pattern/[flags].
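Both forms build the same kind of object. The sketch below assumes a hypothetical search word to show why the explicit constructor is useful when the pattern is built at run time; note that backslashes must be doubled inside the string form.

```javascript
// The same pattern written both ways.
const fromLiteral = /\d+\.\d+/g;
const fromConstructor = new RegExp("\\d+\\.\\d+", "g");

// The constructor accepts a pattern assembled at run time.
const searchWord = "cat"; // hypothetical user-supplied word
const dynamicPattern = new RegExp("\\b" + searchWord + "\\b", "i");

console.log(fromLiteral.source === fromConstructor.source); // true
console.log("pi is 3.14".match(fromConstructor)[0]);        // 3.14
console.log(dynamicPattern.test("The Cat sat"));            // true
```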
- Static properties
index |
The starting position of the first match found by the current pattern, counted from 0. Its initial value is -1. The index property changes after every successful match. |
input |
Returns the string currently being searched; can be abbreviated as $_. Its initial value is the empty string "". |
lastIndex |
The position just after the last character of the most recent match of the current pattern, counted from 0; it is often used as the starting position for a continued search. Its initial value is -1, which means the search starts from the beginning of the string. The value of lastIndex changes after every successful match. |
lastMatch |
The most recent string matched by the current pattern; can be abbreviated as $&. Its initial value is the empty string "". The value of lastMatch changes after every successful match. |
lastParen |
If the pattern contains parenthesized submatches, this is the substring matched by the last submatch; can be abbreviated as $+. Its initial value is the empty string "". The value of lastParen changes after every successful match. |
leftContext |
Everything to the left of the most recent match in the searched string; can be abbreviated as $` (the backquote, found on the key below Esc). Its initial value is the empty string "". The value changes after every successful match. |
$1...$9 |
These properties are read-only. If the pattern contains parenthesized submatches, $1...$9 hold the content captured by the 1st through 9th submatches. A pattern may contain any number of parenthesized submatches, but the RegExp object stores only nine of them; in current JavaScript engines these are the first nine. The result arrays returned by some methods of a RegExp instance give access to all of the submatches. |
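These static properties are legacy features, so the following is only a sketch under the assumption that the engine still exposes them (current Node.js does). It reads only the properties that are commonly available; the static index and lastIndex described above are left out because they may not exist outside older JScript environments.

```javascript
const phonePattern = /(\d{3})-(\d{4})/;
const phoneFound = phonePattern.test("tel: 555-1234 (home)");

// Read the static properties immediately after the match,
// because any later match overwrites them.
const legacySnapshot = {
  first: RegExp.$1,
  second: RegExp.$2,
  lastMatch: RegExp.lastMatch,
  lastParen: RegExp.lastParen,
  leftContext: RegExp.leftContext,
  input: RegExp.input,
};

console.log(phoneFound, legacySnapshot);
```

In new code, the array returned by exec carries the same information without relying on shared global state.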
- Instance properties
global |
Returns the state of the global flag (g) specified when the RegExp instance was created. The property is true if the g flag was set when the instance was created and false otherwise. The default is false. |
ignoreCase |
Returns the state of the ignoreCase flag (i) specified when the RegExp instance was created. The property is true if the i flag was set when the instance was created and false otherwise. The default is false. |
multiline |
Returns the state of the multiline flag (m) specified when the RegExp instance was created. The property is true if the m flag was set when the instance was created and false otherwise. The default is false. |
source |
Returns the pattern text specified when the RegExp instance was created. |
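A brief sketch of reading these instance properties; the pattern itself is only an example.

```javascript
const itemPattern = /^item\s+(\d+)$/gi;

const flagStates = {
  global: itemPattern.global,         // g was given
  ignoreCase: itemPattern.ignoreCase, // i was given
  multiline: itemPattern.multiline,   // m was not given
  source: itemPattern.source,         // pattern text without delimiters or flags
};

console.log(flagStates.global, flagStates.ignoreCase, flagStates.multiline); // true true false
console.log(flagStates.source); // ^item\s+(\d+)$
```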
- Instance methods
exec |
Syntax: exec(str). This method searches a string using the pattern specified when the RegExp instance was created and returns an array containing the search results. If the global flag (g) is set, you can search through a string repeatedly by calling exec or test multiple times; each call starts searching at the position given by the RegExp object's lastIndex property. If the global flag is not set, exec and test ignore lastIndex and search from the beginning of the string. If exec finds no match, it returns null. If it finds a match, it returns an array and updates the static properties of the RegExp object to reflect the match. Element 0 of the returned array holds the complete match, and elements 1 through n hold the submatches defined in the pattern, in order. |
test |
Syntax: test(str). This method checks whether a string contains a match for the pattern specified when the RegExp instance was created. It returns true if so and false otherwise. If a match is found, the static properties of the RegExp object are updated to reflect the match. |
compile |
Syntax: compile("pattern"[, "flags"]). This method replaces the pattern used by the RegExp instance and compiles the new pattern into an internal format so that later matches run faster. |
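The repeated-search behavior of exec with the g flag can be sketched as a loop. The input string and names below are assumptions for illustration.

```javascript
const unitPattern = /(\d+)([a-z]+)/g;
const cssText = "10px 2em 300ms";
const unitHits = [];

let unitMatch;
// With the g flag, each call resumes at unitPattern.lastIndex
// and returns null once no further match exists.
while ((unitMatch = unitPattern.exec(cssText)) !== null) {
  unitHits.push({
    text: unitMatch[0],          // element 0: the complete match
    value: unitMatch[1],         // element 1: first submatch
    unit: unitMatch[2],          // element 2: second submatch
    index: unitMatch.index,      // where this match starts
    next: unitPattern.lastIndex, // where the next search will start
  });
}

// Without the g flag, every call searches from the start of the string.
const firstNumber = /\d+/;
const repeatIndexes = [firstNumber.exec(cssText).index, firstNumber.exec(cssText).index];

console.log(unitHits.map((hit) => hit.text)); // [ '10px', '2em', '300ms' ]
console.log(repeatIndexes);                   // [ 0, 0 ]
```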
- RegExp description
By default, regular expressions match as much as possible (greedy matching). When ? immediately follows another quantifier (*, +, ?, {n}, {n,}, {n,m}), matching switches to matching as little as possible (non-greedy matching).
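The difference between the two modes shows up whenever more than one match length is possible. A minimal sketch with an assumed sample string:

```javascript
const markup = "<b>one</b> and <b>two</b>";

const greedyTag = markup.match(/<b>.*<\/b>/)[0];  // runs to the last </b>
const lazyTag = markup.match(/<b>.*?<\/b>/)[0];   // stops at the first </b>
const greedyDigits = "12345".match(/\d{2,4}/)[0];  // takes as many as allowed
const lazyDigits = "12345".match(/\d{2,4}?/)[0];   // takes as few as allowed

console.log(greedyTag);                // <b>one</b> and <b>two</b>
console.log(lazyTag);                  // <b>one</b>
console.log(greedyDigits, lazyDigits); // 1234 12
```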
Grouping combines part of a regular expression into a single unit. A backreference matches the content captured by an earlier group.
(1) (pattern) combines the pattern in parentheses into a group that can be operated on as a unit, and captures it as a submatch.
Submatches are stored in a buffer in the order in which they appear from left to right in the pattern. Buffer numbering starts at 1, and up to 99 captured submatches can be stored. The captured content can be retrieved from the programming language or backreferenced within the regular expression itself. To match the literal parentheses "(" and ")", use "\(" and "\)" in the regular expression.
(2) \num matches the content stored in the buffer numbered num, where num is a one- or two-digit positive decimal integer identifying a specific buffer. This is called a backreference to a submatch. One of the most useful applications of backreferences is expressing that the same text must appear again. For example, to match five consecutive digits you can use \d{5}, which matches 12345. To match five consecutive identical digits, such as 55555 or 11111, you must use (\d)\1{4}: \1 means "the same content that the preceding (\d) captured", and \1{4} means that captured content appears four more times in a row. As another example, to find the doubled words in "Is is the cost of of gasoline going up up?", you can use /\b([a-z]+) \1\b/gi.
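Both backreference patterns can be tried out as follows. The doubled-word sentence used here is an assumed test string containing repeated words, since the pattern only matches where a word is immediately repeated.

```javascript
const fiveDigits = /\d{5}/;
const fiveSameDigits = /(\d)\1{4}/;

// \1 repeats the word captured by group 1; the i flag lets "Is is" match.
const doubledWords = /\b([a-z]+) \1\b/gi;
const doubledSentence = "Is is the cost of of gasoline going up up?";
const doubledHits = doubledSentence.match(doubledWords);

console.log(fiveDigits.test("12345"), fiveSameDigits.test("12345")); // true false
console.log(fiveSameDigits.test("55555"));                           // true
console.log(doubledHits); // [ 'Is is', 'of of', 'up up' ]
```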
(3) (?:pattern) combines the pattern in parentheses into a group that can be operated on as a unit, but does not capture it as a submatch. In other words, the pattern is a non-capturing match, and the matched content is not stored in the buffer for later use. This is useful when grouping is needed but the group is not meant to act as a submatch.
(4) (?=pattern) is a positive lookahead. At the current position in the string being searched, the text that follows must match pattern, but that text is not treated as part of the match result and is not stored in the capture buffer for later use. A lookahead most often appears at the end of a pattern, although it may be placed anywhere in it.
(5) (?!pattern) is a negative lookahead. At the current position in the string being searched, the text that follows must not match pattern. Otherwise it behaves the same as a positive lookahead.
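The three constructs in (3) to (5) can be contrasted with an ordinary capturing group. This is a sketch using assumed sample strings.

```javascript
// Capturing: both groups land in the result array.
const withCapture = /(Windows) (95|98|NT)/.exec("Windows NT");

// Non-capturing: the first group is grouped but not stored.
const withoutCapture = /(?:Windows) (95|98|NT)/.exec("Windows NT");

// Positive lookahead: the version is required but not part of the match.
const aheadMatch = /Windows (?=95|98|NT)/.exec("Windows NT")[0];

// Negative lookahead: the match fails if a listed version follows.
const rejectsListed = /Windows (?!95|98|NT)/.test("Windows NT");
const acceptsOther = /Windows (?!95|98|NT)/.test("Windows 3.1");

console.log(withCapture.length, withoutCapture.length); // 3 2
console.log(JSON.stringify(aheadMatch));                // "Windows "
console.log(rejectsListed, acceptsOther);               // false true
```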
Examples
* Simple example
# Notes
(?<=exp)  Zero-width positive lookbehind assertion (not supported before ES2018)
(?<!exp)  Zero-width negative lookbehind assertion (not supported before ES2018)
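Engines that implement ES2018 do accept both lookbehind forms. The following is a sketch that assumes such an engine (for example a recent Node.js); on older engines these literals are a syntax error.

```javascript
const orderLine = "cost: $42, qty: 7";

// Positive lookbehind: digits that directly follow a dollar sign.
const dollarAmount = orderLine.match(/(?<=\$)\d+/)[0];

// Negative lookbehind: a whole number that does not follow a dollar sign.
const plainNumber = orderLine.match(/(?<!\$)\b\d+/)[0];

console.log(dollarAmount, plainNumber); // 42 7
```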
# References
Regular Expression 30-minute getting started tutorial http://www.jb51.net/tools/zhengze.html