Regular expression literals and object creation
As with strings and numbers, a literal of a primitive type represents the same value each time it appears in a program. An object literal (initializer expression) such as {} or [], by contrast, creates a new object each time it is evaluated. For example, if you write var a = [] in the body of a loop, each iteration creates a new empty array. Regular expression literals are a special case. The ECMAScript 3 specification says that a regular expression literal is converted to a RegExp object when it is parsed, and every evaluation of the code containing that literal returns the same object. The ECMAScript 5 specification reverses this: each evaluation of a regular expression literal returns a new object. IE has always implemented the ECMAScript 5 behavior, and most current browsers have switched to it, even though they do not yet implement the full standard.
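One quick way to observe the ECMAScript 5 rule is to evaluate the same literal twice and compare the results. This minimal sketch assumes an ES5-or-later engine, such as any current browser or Node.js. makePattern is a hypothetical name used only for illustration. The example also shows why the rule matters: a pattern with the g modifier keeps state in lastIndex, and separate objects do not share that state.

```javascript
// makePattern is a hypothetical helper. Each call evaluates the same
// regex literal again, which under ES5+ creates a new RegExp object,
// just as evaluating [] creates a new array.
function makePattern() {
  return /abc/g;
}

const first = makePattern();
const second = makePattern();
console.log(first === second); // false: two distinct objects
console.log([] === []);        // false: array literals behave the same way

// Because the objects are distinct, lastIndex is not shared between them.
first.lastIndex = 2;
console.log(second.lastIndex); // 0
```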
1.1 Literal characters
All letters and digits in a regular expression match themselves literally. JavaScript regular expression syntax also supports certain nonalphabetic characters, which are written as escape sequences beginning with a backslash (\). For example, the escape sequence \n matches a newline character. These escape sequences are listed in Table 10-1.
Table 10-1: Literal characters in regular expressions
Character | Matches
Alphanumeric character | Itself
\0 | The NUL character (\u0000)
\t | Tab (\u0009)
\n | Newline (\u000A)
\v | Vertical tab (\u000B)
\f | Form feed (\u000C)
\r | Carriage return (\u000D)
\xnn | The Latin character specified by the hexadecimal number nn; for example, \x0A is the same as \n
\uxxxx | The Unicode character specified by the hexadecimal number xxxx; for example, \u0009 is the same as \t
\cX | The control character ^X; for example, \cJ is equivalent to the newline character \n
In regular expressions, a number of punctuation characters have special meanings: ^ $ . * + ? = ! : | \ / ( ) [ ] { }. To match one of these characters literally, precede it with a backslash. For example, the regular expression /\\/ matches any string that contains a backslash.
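The following sketch exercises a few entries from Table 10-1 and the escaping rule for punctuation. It assumes any ES5-or-later engine, and the sample strings are arbitrary.

```javascript
// Escape sequences from Table 10-1 match characters that are hard to type.
const hasNewline = /\n/.test("line1\nline2"); // true
// \x0A, \u000A and \cJ all denote the same newline character.
const sameNewline = [/\x0A/, /\u000A/, /\cJ/].every((re) => re.test("\n")); // true

// Punctuation with a special meaning must be escaped to be matched literally.
const hasBackslash = /\\/.test("C:\\temp"); // true: the string holds one backslash
const anyChar = /a.c/.test("abc");          // true: "." matches any character
const literalDot = /a\.c/.test("abc");      // false: "\." matches only a dot
console.log(hasNewline, sameNewline, hasBackslash, anyChar, literalDot);
```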
1.2 Character classes
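Individual literal characters can be combined into character classes by placing them within square brackets. A character class matches any one character it contains, so /[abc]/ matches any one of the letters a, b, or c. A negated character class is written with a caret (^) as the first character inside the left bracket, and it matches any character except those listed: /[^abc]/ matches any one character other than a, b, or c. A hyphen indicates a range of characters: /[a-z]/ matches any lowercase Latin letter, and /[a-zA-Z0-9]/ matches any Latin letter or digit. For example, /[\u0400-\u04FF]/ matches any Cyrillic character.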
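Here is a short sketch of bracketed character classes. It assumes an ES5-or-later engine, and the variable names and sample strings are illustrative only.

```javascript
const vowel = /[aeiou]/;         // any one of the listed letters
const notVowel = /[^aeiou]/;     // any one character except those letters
const lowerOrDigit = /[a-z0-9]/; // ranges are written with a hyphen
const cyrillic = /[\u0400-\u04FF]/;

console.log(vowel.test("sky"));         // false: no vowel present
console.log(notVowel.test("aei"));      // false: every character is a vowel
console.log(lowerOrDigit.test("ABC7")); // true: "7" falls in the range 0-9
console.log(cyrillic.test("Привет"));   // true
console.log(cyrillic.test("Hello"));    // false
```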
Table 10-2: Regular expression character classes
Character | Matches
[...] | Any one character between the brackets
[^...] | Any one character not between the brackets
. | Any character except newline or another Unicode line terminator
\w | Any ASCII word character; equivalent to [a-zA-Z0-9_]
\W | Any character that is not an ASCII word character; equivalent to [^a-zA-Z0-9_]
\s | Any Unicode whitespace character
\S | Any character that is not Unicode whitespace (note that \w and \S are not the same thing)
\d | Any ASCII digit; equivalent to [0-9]
\D | Any character other than an ASCII digit; equivalent to [^0-9]
[\b] | A literal backspace (special case)
Note that these special character-class escapes can also be used within square brackets. For example, because \s matches any whitespace character and \d matches any digit, /[\s\d]/ matches any one whitespace character or digit. There is one special case: the escape \b has a special meaning. Inside a character class it represents the backspace character, so to include a literal backspace in a regular expression, use a character class with one element: /[\b]/.
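The escapes from Table 10-2 can be checked the same way. In this sketch the sample strings are chosen for illustration. Note that \w is limited to ASCII and includes the underscore, and that [\b] matches the backspace character U+0008.

```javascript
const underscore = /\w/.test("_");           // true: \w includes the underscore
const nonAscii = /\w/.test("ж");             // false: \w covers ASCII word characters only
const spaceOrDigit = /[\s\d]/.test("ab 1");  // true: escapes work inside a class
const noSpaceOrDigit = /[\s\d]/.test("abc"); // false
const backspace = /[\b]/.test("a\bb");       // true: "\b" in a string literal is U+0008
const digitsOnly = "a1 b2".replace(/\D/g, ""); // "12": every non-digit removed
console.log(underscore, nonAscii, spaceOrDigit, noSpaceOrDigit, backspace, digitsOnly);
```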
1.3 Repetition
To specify how many times an element of a pattern may repeat, follow it with a repetition marker. Because some kinds of repetition are very common, special characters represent those cases. For example, + matches one or more occurrences of the previous item. Table 10-3 summarizes the repetition syntax.
Table 10-3: Regular expression repetition characters
Character | Meaning
{n,m} | Match the previous item at least n times but no more than m times
{n,} | Match the previous item n or more times
{n} | Match exactly n occurrences of the previous item
? | Match zero or one occurrence of the previous item; that is, the previous item is optional. Equivalent to {0,1}
+ | Match one or more occurrences of the previous item. Equivalent to {1,}
* | Match zero or more occurrences of the previous item. Equivalent to {0,}
Here are some examples:
/\d{2,4}/     // Match between two and four digits
/\w{3}\d?/    // Match exactly three word characters and an optional digit
/\s+java\s+/  // Match "java" with one or more spaces before and after
/[^(]*/       // Match zero or more characters that are not open parentheses
Be careful when using the * and ? repetition characters. Because they can match zero occurrences of whatever precedes them, they are allowed to match nothing. For example, the regular expression /a*/ actually matches the string "bbbb", because that string contains zero occurrences of the letter a.
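These calls apply the repetition examples above to sample strings chosen for illustration. The last call shows /a*/ succeeding with an empty match.

```javascript
const digits = "Call 5551234 now".match(/\d{2,4}/)[0]; // "5551": stops after 4 digits
const spaced = /\s+java\s+/.test("I like java a lot");   // true
const embedded = /\s+java\s+/.test("javascript");        // false: no surrounding whitespace
const beforeParen = "abc(def)".match(/[^(]*/)[0];        // "abc"

// /a*/ succeeds on "bbbb" by matching zero a's at position 0.
const empty = "bbbb".match(/a*/);
console.log(digits, spaced, embedded, beforeParen, JSON.stringify(empty[0]), empty.index);
```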
Non-greedy repetition
The repetition characters listed in Table 10-3 match as many times as possible while still allowing any following parts of the regular expression to match. This is called "greedy" repetition. It is also possible to specify non-greedy repetition: simply follow the repetition character or characters with a question mark, as in ??, +?, *?, or even {1,5}?. For example, the regular expression /a+/ matches one or more consecutive occurrences of the letter a. When applied to the string "aaa", it matches all three letters. The pattern /a+?/ also matches one or more occurrences of the letter a, but it matches as few characters as possible. Applied to the same string "aaa", this pattern matches only the first letter.
Non-greedy repetition does not always produce the result you expect. Consider the pattern /a+b/, which matches one or more a's followed by the letter b. When applied to the string "aaab", it matches the entire string. Now try the non-greedy version, /a+?b/, which matches the letter b preceded by as few a's as possible. Applied to the same string "aaab", you might expect it to match only one a and the final b. In fact, it matches the entire string, just like the greedy version. This is because regular expression pattern matching finds the first position in the string at which a match is possible. Since a match is possible starting at the first character of the string, shorter matches starting at later characters are never considered.
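The following sketch reproduces both cases described above. It then shows a situation where non-greedy matching does change the result: stopping at the first closing quotation mark. The sample strings are illustrative.

```javascript
const greedy = "aaa".match(/a+/)[0];    // "aaa"
const lazy = "aaa".match(/a+?/)[0];     // "a"
const greedyB = "aaab".match(/a+b/)[0]; // "aaab"
const lazyB = "aaab".match(/a+?b/)[0];  // "aaab": the match still starts at index 0

// Where non-greedy matching helps: stop at the first closing quote.
const said = 'say "one" and "two"';
const wide = said.match(/".*"/)[0];     // '"one" and "two"'
const narrow = said.match(/".*?"/)[0];  // '"one"'
console.log(greedy, lazy, greedyB, lazyB, wide, narrow);
```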
1.4 Alternation, grouping, and references
Regular expression syntax also includes special characters for specifying alternatives, grouping subexpressions, and referring to previous subexpressions. The | character separates alternatives. For example, /ab|cd|ef/ matches the string "ab", the string "cd", or the string "ef". Likewise, /\d{3}|[a-z]{4}/ matches either three digits or four lowercase letters.
Note that alternatives are tried from left to right until a match is found. If the left alternative matches, the right alternative is ignored, even if it would have produced a "better" match. Thus, when the pattern /a|ab/ is applied to the string "ab", it matches only the first letter.
Table 10-4: Regular expression alternation, grouping, and reference characters
Character | Meaning
| | Alternation: match either the subexpression to the left or the subexpression to the right
(...) | Grouping: group items into a single unit that can be used with *, +, ?, |, and so on, and remember the characters that match this group for use with later references
(?:...) | Grouping only: group items into a single unit, but do not remember the characters that match this group
\n | Match the same characters that were matched when group number n was first matched. Groups are subexpressions within (possibly nested) parentheses. Group numbers are assigned by counting left parentheses from left to right. Groups formed with (?: are not numbered
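Here is a minimal sketch of alternation, grouping, and back-references. The quoted-string pattern is an illustrative example: it requires the closing quote to be the same character as the opening one.

```javascript
const leftmost = "ab".match(/a|ab/)[0];         // "a": the left alternative wins
const repeatedGroup = /^(ab)+$/.test("ababab"); // true: + applies to the whole group
const repeatedChar = /^ab+$/.test("ababab");    // false: + applies only to b

// \1 must repeat whatever the first group matched (the opening quote).
const quoted = /(['"])[^'"]*\1/;
const same = quoted.test("'hello'");            // true
const mixed = quoted.test("'hello\"");          // false: quotes do not pair up

// (?:...) groups without numbering, so the month is group 1.
const m = "2024-05".match(/(?:\d{4})-(\d{2})/);
console.log(leftmost, repeatedGroup, repeatedChar, same, mixed, m[1]); // ... "05"
```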
1.5 Specifying match positions
As described earlier, many elements of a regular expression match a single character in a string. For example, \s matches a single whitespace character. Other regular expression elements match the positions between characters rather than actual characters.
The most commonly used anchor elements are ^, which matches the beginning of the string, and $, which matches the end of the string.
Table 10-5: Regular expression anchor characters
Character | Meaning
^ | Match the beginning of the string or, in multiline searches, the beginning of a line
$ | Match the end of the string or, in multiline searches, the end of a line
\b | Match a word boundary: the position between a \w character and a \W character, or between a \w character and the beginning or end of the string. (Note, however, that [\b] matches backspace.)
\B | Match a position that is not a word boundary
(?=p) | A zero-width positive lookahead assertion: require that the following characters match the pattern p, but do not include those characters in the match
(?!p) | A zero-width negative lookahead assertion: require that the following characters do not match the pattern p
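The sketch below exercises each anchor from Table 10-5 on sample strings chosen for illustration. It assumes an ES5-or-later engine, where lookahead assertions are available.

```javascript
const whole = /^java$/.test("java");               // true
const prefixOnly = /^java$/.test("javascript");    // false: $ requires the end of the string
const word = /\bjava\b/.test("I like java");       // true
const inside = /\bjava\b/.test("javascript");      // false: "s" follows, so no boundary
const nonBoundary = /\Bscript/.test("javascript"); // true: "a" and "s" are both \w

// (?=:) requires a colon next but leaves it out of the match.
const title = "JavaScript: notes".match(/[Jj]ava([Ss]cript)?(?=:)/)[0]; // "JavaScript"
// (?!Script) rejects "Java" only when "Script" follows it.
const notScript1 = /Java(?!Script)/.test("JavaScript"); // false
const notScript2 = /Java(?!Script)/.test("JavaBeans");  // true
console.log(whole, prefixOnly, word, inside, nonBoundary, title, notScript1, notScript2);
```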
1.6 Modifiers
The last piece of regular expression syntax is the set of modifiers, which specify high-level pattern-matching rules. Unlike the rest of the syntax, modifiers are placed outside the / characters: they appear after the second slash rather than between the two. ECMAScript 5 supports three modifiers. The i modifier makes pattern matching case-insensitive. The g modifier makes matching global, so that all matches within the searched string are found. The m modifier performs matching in multiline mode: if the string being searched contains newlines, the ^ and $ anchors match the beginning and end of each line in addition to the beginning and end of the whole string. For example, /java$/im matches "java" as well as "Java\nis fun".
Table 10-6: Regular expression modifiers
Character | Meaning
i | Perform case-insensitive matching
g | Perform a global match; that is, find all matches rather than stopping after the first one
m | Multiline mode: ^ matches the beginning of a line or the beginning of the string, and $ matches the end of a line or the end of the string
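This brief check covers the three modifiers. It uses the example from the text, plus an illustrative string for g and i.

```javascript
const text = "Java\nis fun";
const multi = /java$/im.test(text); // true: in multiline mode $ also matches before "\n"
const single = /java$/i.test(text); // false: without m, $ matches only at the very end

const all = "Java and java".match(/java/gi);  // ["Java", "java"]: g finds every match
const first = "Java and java".match(/java/i); // without g: first match only
console.log(multi, single, all, first[0]);
```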
2. String methods for pattern matching
2.1 search()
search() takes a regular expression as its argument and returns the character position of the start of the first matching substring, or -1 if there is no match.
If the argument to search() is not a regular expression, it is first converted to one by passing it to the RegExp constructor. search() does not support global searches: it ignores the g modifier of its regular expression argument.
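A few calls illustrate the points above: the -1 result, the conversion of a string argument through RegExp(), and the ignored g modifier. The sample strings are arbitrary. The third and fourth calls show that the string "." is treated as a pattern, not as a literal dot.

```javascript
const found = "JavaScript".search(/script/i); // 4
const missing = "JavaScript".search(/python/); // -1: no match
// A string argument goes through RegExp(), so "." is a metacharacter here.
const anyChar = "a+b.c".search(".");           // 0
const realDot = "a+b.c".search("\\.");         // 3
// The g modifier is ignored: search() always reports the first match.
const firstOnly = "java java".search(/java/g); // 0
console.log(found, missing, anyChar, realDot, firstOnly);
```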
"JavaScript".search(/script/i);   // returns 4
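2.2 replace()
The replace() method performs a search-and-replace operation. Its first argument is a regular expression and its second argument is the replacement string. If the regular expression has the g modifier, every match is replaced; otherwise only the first match is replaced. In the replacement string, $1, $2, and so on refer to the text matched by the corresponding parenthesized subexpressions.
"JavaScript".replace(/javascript/gi, "a");   // returns "a"
// A quotation begins with a quotation mark, ends with a quotation mark,
// and contains no quotation marks in between (captured as group 1)
var quote = /"([^"]*)"/g;
// Replace the straight quotation marks with curly ones, leaving the
// quoted text (stored in $1) unchanged
var text = 'He said "hello" and "goodbye".';
text.replace(quote, '“$1”');   // returns 'He said “hello” and “goodbye”.'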
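The sketch below shows three things: the effect of the g modifier on replace(), group references such as $1 and $2, and a replacement function, which is a standard alternative to a replacement string. The sample strings are illustrative.

```javascript
const s = "one two two";
const once = s.replace(/two/, "2");   // "one 2 two": no g, first match only
const every = s.replace(/two/g, "2"); // "one 2 2"
// $2 and $1 in the replacement refer to the parenthesized groups.
const swapped = "Doe, John".replace(/(\w+),\s*(\w+)/, "$2 $1"); // "John Doe"
// A function may also compute each replacement from the matched text.
const lengths = "a1b22".replace(/\d+/g, (m) => "[" + m.length + "]"); // "a[1]b[2]"
console.log(once, every, swapped, lengths);
```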
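2.3 match()
The match() method is the most general of the String regular expression methods. Its only argument is a regular expression (or a value that it converts to one with the RegExp() constructor). It returns an array containing the results of the match, or null if there is no match. If the regular expression has the g modifier, the array contains every match in the string. Without g, match() looks for the first match only: element 0 of the array is the matched text, and the remaining elements are the substrings matched by the parenthesized subexpressions.
"1 plus 2 equals 3".match(/\d+/g)   // returns ["1", "2", "3"]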
For example, the following code parses a URL:
var url = /(\w+):\/\/([\w.]+)\/(\S*)/;
var text = "Visit my blog at http://www.example.com/~david";
var result = text.match(url);
if (result != null) {
    var fullurl = result[0];   // Contains "http://www.example.com/~david"
    var protocol = result[1];  // Contains "http"
    var host = result[2];      // Contains "www.example.com"
    var path = result[3];      // Contains "~david"
}
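The difference between global and non-global match() calls matters when a pattern has parenthesized groups. This sketch reuses the URL pattern above on an illustrative string containing two URLs. example.com and example.org are placeholder domains.

```javascript
const url = /(\w+):\/\/([\w.]+)\/(\S*)/;
const text = "See http://example.com/x and https://example.org/y";

// Without g: details of the first match, including the parenthesized groups.
const first = text.match(url);
console.log(first[0], first[1], first.index); // http://example.com/x http 4

// With g: every full match, but no group details.
const all = text.match(new RegExp(url.source, "g"));
console.log(all); // both URLs

// No match yields null rather than an empty array.
const none = "no links here".match(url);
console.log(none); // null
```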
2.4 split()
The split() method breaks the string on which it is called into an array of substrings, using its argument as the separator.
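Here is a sketch of split() with both a string separator and a regular expression separator. It also shows two behaviors worth knowing: capturing groups in the separator appear in the result, and an optional second argument limits the number of elements. The sample strings are illustrative.

```javascript
const plain = "123,456,789".split(",");   // ["123", "456", "789"]
const loose = "1, 2 ,3".split(/\s*,\s*/); // ["1", "2", "3"]
// Capturing groups in a regex separator are included in the result.
const withSeps = "a1b2c".split(/(\d)/);   // ["a", "1", "b", "2", "c"]
// The optional second argument caps the number of elements returned.
const limited = "a b c d".split(/\s+/, 2); // ["a", "b"]
console.log(plain, loose, withSeps, limited);
```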
"123,456,789".split(",");   // returns ["123", "456", "789"]
The argument to split() can also be a regular expression, which makes split() very powerful. For example, you can specify a separator that allows any amount of whitespace on either side:
"1, 2, 3, 4, 5".split(/\s*,\s*/);   // returns ["1", "2", "3", "4", "5"]
3. RegExp objects
Regular expressions are represented as RegExp objects. In addition to the RegExp() constructor, RegExp objects support three methods and a number of properties.
The RegExp() constructor takes one or two string arguments and creates a new RegExp object. The first argument contains the body of the regular expression: the text that would appear between the slashes in a regular expression literal. Note that both string literals and regular expressions use the \ character to introduce escape sequences. So when you pass a regular expression to RegExp() as a string literal, you must replace each \ character with \\. The second argument is optional. If it is supplied, it specifies the regular expression's modifiers, and it must be g, i, m, or a combination of those letters. For example:
// Find all five-digit numbers in a string. Note the double \\ in the string literal.
var zipcode = new RegExp("\\d{5}", "g");
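The sketch below confirms that the doubled backslash produces the intended pattern. It also shows a common reason to use the constructor: building a pattern from text known only at run time. escapeRegExp is a hypothetical helper written for this example, not a built-in. It backslash-escapes the metacharacters that would otherwise be read as pattern syntax.

```javascript
// "\\d" in the string literal becomes \d in the pattern.
const zipcode = new RegExp("\\d{5}", "g");
console.log(zipcode.source);                   // \d{5}
const zips = "10001 and 94105".match(zipcode); // ["10001", "94105"]

// Hypothetical helper (not a built-in): escape metacharacters in runtime text
// so that RegExp() matches the text literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, (ch) => "\\" + ch);
}
const exact = new RegExp("^" + escapeRegExp("a.b") + "$");
console.log(exact.test("a.b"), exact.test("axb")); // true false
```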
3.1 RegExp properties
Each RegExp object has five properties. The source property is a read-only string containing the text of the regular expression. The global property is a read-only boolean that specifies whether the regular expression has the g modifier. The ignoreCase property is a read-only boolean that specifies whether the regular expression has the i modifier. The multiline property is a read-only boolean that specifies whether the regular expression has the m modifier. The final property, lastIndex, is a read/write integer. For patterns with the g modifier, it stores the position in the string at which the next search is to begin. It is used by the exec() and test() methods, described below.
Property | Description | Firefox (since version) | IE (since version)
global | Whether the RegExp object has the g modifier | 1 | 4
ignoreCase | Whether the RegExp object has the i modifier | 1 | 4
lastIndex | An integer specifying the character position at which the next match begins | 1 | 4
multiline | Whether the RegExp object has the m modifier | 1 | 4
source | The source text of the regular expression | 1 | 4
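Here is a quick look at these properties on a pattern with all three modifiers. The sample string is illustrative. The last two lines show exec() advancing lastIndex on a global pattern.

```javascript
const re = /java/gim;
console.log(re.source);                              // java
console.log(re.global, re.ignoreCase, re.multiline); // true true true
console.log(re.lastIndex);                           // 0: nothing searched yet

// exec() on a pattern with g moves lastIndex just past the match.
const hit = re.exec("I like Java");
console.log(hit.index, re.lastIndex);                // 7 11
```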
3.2 RegExp methods
RegExp objects define two methods that perform pattern-matching operations, and they behave similarly to the String methods described earlier. The main RegExp pattern-matching method is exec(). It is similar to the String match() method described in section 2.3. The difference is that exec() is a RegExp method that takes a string argument, whereas match() is a String method that takes a RegExp argument.
Method | Description | Firefox (since version) | IE (since version)
compile | Compiles (recompiles) the regular expression; now deprecated | 1 | 4
exec | Searches a string for a match; returns an array describing the match, including its position, or null | 1 | 4
test | Searches a string for a match; returns true or false | 1 | 4
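3.2.1 exec()
exec() searches the specified string for a match and returns null if it finds none. Otherwise it returns an array just like the one match() returns for a non-global search. Element 0 is the matching text, the remaining elements are the substrings matched by any parenthesized subexpressions, and the index property holds the position at which the match starts. When the pattern has the g modifier, exec() also sets lastIndex to the position just after the match, so calling it repeatedly on the same string steps through every match: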
var pattern = /Java/g;
var text = "JavaScript is more fun than Java!";
var result;
while ((result = pattern.exec(text)) != null) {
    console.log("Matched '" + result[0] + "'" +
                " at position " + result.index +
                "; next search begins at " + pattern.lastIndex);
}
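The loop above is a common idiom, and it can be wrapped in a reusable function. In this sketch findAll is a hypothetical helper name. It resets lastIndex before starting, and it steps past empty matches, which would otherwise leave lastIndex unchanged and loop forever.

```javascript
// findAll is a hypothetical helper name, not a built-in.
function findAll(pattern, text) {
  if (!pattern.global) throw new Error("findAll expects a pattern with the g modifier");
  pattern.lastIndex = 0; // start at the beginning of the string
  const found = [];
  let m;
  while ((m = pattern.exec(text)) !== null) {
    found.push({ match: m[0], index: m.index, next: pattern.lastIndex });
    // An empty match does not advance lastIndex; step past it to avoid looping forever.
    if (m[0] === "") pattern.lastIndex++;
  }
  return found;
}

const hits = findAll(/Java/g, "JavaScript is more fun than Java!");
console.log(hits); // two matches: index 0 (next 4) and index 28 (next 32)
```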
3.2.2 test()
The other RegExp method is test(), which is much simpler than exec(). It takes a string as its argument and returns true if the string contains a match for the regular expression:
var pattern = /java/i;
pattern.test("JavaScript");   // returns true
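The lastIndex behavior described in section 3.1 has a consequence here. When test() is called on a pattern with the g modifier, it starts at lastIndex and updates it, just as exec() does. Reusing such a pattern on the same string can therefore give alternating results, as this sketch shows. The sample string is illustrative.

```javascript
// With g, test() starts at lastIndex and updates it, like exec().
const globalRe = /java/gi;
const r1 = globalRe.test("JavaScript"); // true; lastIndex is now 4
const r2 = globalRe.test("JavaScript"); // false; the search resumed at index 4
const r3 = globalRe.test("JavaScript"); // true; the failed search reset lastIndex to 0

// Without g, every call searches from the beginning.
const plainRe = /java/i;
const r4 = plainRe.test("JavaScript");
const r5 = plainRe.test("JavaScript");
console.log(r1, r2, r3, r4, r5); // true false true true true
```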
4. Commonly used regular expressions
Phone number
/^([\+][0-9]{1,3}([ \.\-])?)?([\(][0-9]{1,6}[\)])?([0-9 \.\-]{1,32})(([A-Za-z \:]{1,11})?[0-9]{1,4}?)$/
Email address
/^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$/i
Date (YYYY-MM-DD)
/^\d{4}[\/\-](0?[1-9]|1[012])[\/\-](0?[1-9]|[12][0-9]|3[01])$/
IPv4
/^((([01]?[0-9]{1,2})|(2[0-4][0-9])|(25[0-5]))[.]){3}(([0-1]?[0-9]{1,2})|(2[0-4][0-9])|(25[0-5]))$/
URL
/^(https?|ftp):\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*@)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?$/i
Number (optional sign, optional thousands separators, and an optional decimal part, such as +123,123.123)
/^[\-\+]?((([0-9]{1,3})([,][0-9]{3})*)|([0-9]+))?([\.]([0-9]+))?$/
2 to 20 Chinese characters, or 2 to 20 English letters
/^([\u4e00-\u9fa5]{2,20})$|^([a-zA-Z]{2,20})$/