A daily collection of JS regular expressions (JavaScript regular expressions)

Source: Internet
Author: User
Tags character classes modifier modifiers regular expression repetition

1. Regular expression literals and creating RegExp objects

As with strings and numbers, a primitive-type literal with a given value obviously represents the same value wherever it appears in the program. Object literals (initializer expressions) such as {} and [] create a new object each time they are evaluated. For example, if you write var a = [] in the body of a loop, a new empty array is created on each iteration. Regular expression literals are a special case. The ECMAScript 3 specification says that a regular expression literal is converted to a RegExp object when the code is parsed, and that each evaluation of the same literal returns the same object. The ECMAScript 5 specification reverses this: each evaluation of a regular expression literal returns a new object. IE has always behaved as ECMAScript 5 requires, and most recent browser versions have started to follow ECMAScript 5 as well, although they do not yet implement the whole standard.
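The ECMAScript 5 behavior described above can be checked with a short sketch. The function name makePattern is hypothetical and used only for illustration, and the results shown assume an engine that follows ECMAScript 5 or later.

```javascript
// Hypothetical helper: evaluates the same regular expression literal on every call.
function makePattern() {
  return /abc/g;
}

var first = makePattern();
var second = makePattern();

// Under ECMAScript 5 semantics each evaluation yields a distinct object,
// so state such as lastIndex is not shared between the two.
first.lastIndex = 2;
console.log(first === second);  // false
console.log(second.lastIndex);  // 0
```

Under the older ECMAScript 3 rule both calls would have returned the same object, and the change to lastIndex would have been visible through both variables.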

1.1 Literal characters

All alphabetic characters and digits in a regular expression match themselves literally. JavaScript regular expression syntax also supports certain non-alphabetic characters through escape sequences that begin with a backslash (\). For example, the sequence \n matches a newline character. These escape sequences are listed in Table 10-1.

Table 10-1: Literal characters in regular expressions

Character                Matches
Alphanumeric character   Itself
\0                       The NUL character (\u0000)
\t                       Tab (\u0009)
\n                       Newline (\u000A)
\v                       Vertical tab (\u000B)
\f                       Form feed (\u000C)
\r                       Carriage return (\u000D)
\xnn                     The Latin character specified by the hexadecimal number nn; for example, \x0A is the same as \n
\uxxxx                   The Unicode character specified by the hexadecimal number xxxx; for example, \u0009 is the same as \t
\cX                      The control character ^X; for example, \cJ is equivalent to the newline character \n

In regular expressions, a number of punctuation characters have special meanings. They are: ^ $ . * + ? = ! : | \ / ( ) [ ] { }

To match one of these characters literally, prefix it with a backslash. For example, the regular expression /\\/ matches any string that contains a backslash.
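The escape sequences from Table 10-1 can be tried directly. The following is a small sketch; the variable names and sample strings are made up for illustration.

```javascript
// Each pattern below matches the single character its escape sequence names.
var hasBackslash = /\\/.test("C:\\temp");  // the string contains one backslash
var hexNewline   = /\x0A/.test("\n");      // \x0A is the same character as \n
var unicodeTab   = /\u0009/.test("\t");    // \u0009 is the same character as \t
var controlJ     = /\cJ/.test("\n");       // \cJ is Control-J, the newline
var literalPlus  = /1\+1/.test("1+1=2");   // the escaped + is matched literally

console.log(hasBackslash, hexNewline, unicodeTab, controlJ, literalPlus);
// true true true true true
```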

1.2 Character classes

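A character class is written by placing characters within square brackets, and it matches any one character that it contains; a hyphen between two characters specifies a range. For example, /[\u0400-\u04FF]/ matches any one Cyrillic character.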

Table 10-2: Regular expression character classes

Character   Matches
[...]       Any one character between the square brackets
[^...]      Any one character not between the square brackets
.           Any character except newline or another Unicode line terminator
\w          Any ASCII word character; equivalent to [a-zA-Z0-9_]
\W          Any character that is not an ASCII word character; equivalent to [^a-zA-Z0-9_]
\s          Any Unicode whitespace character
\S          Any character that is not Unicode whitespace; note that \w and \S are not the same thing
\d          Any ASCII digit; equivalent to [0-9]
\D          Any character other than an ASCII digit; equivalent to [^0-9]
[\b]        A literal backspace (special case)

Note that these special escapes can also be used within square brackets. Because \s matches any whitespace character and \d matches any digit, /[\s\d]/ matches any one whitespace character or digit. There is one special case. As described later, the escape \b has a special meaning of its own; when used within a character class, however, it represents the backspace character. So to represent a backspace literally in a regular expression, use a character class with one element: /[\b]/.
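A few quick checks of the character classes above, written as a sketch with made-up sample strings:

```javascript
var wsOrDigit = /[\s\d]/;

var noMatch   = wsOrDigit.test("abc");           // no whitespace and no digit
var hasSpace  = wsOrDigit.test("a b");           // contains a space
var hasDigit  = wsOrDigit.test("a1");            // contains a digit
var backspace = /[\b]/.test("\b");               // [\b] is the backspace character U+0008
var cyrillic  = /[\u0400-\u04FF]/.test("\u0416"); // U+0416 lies in the Cyrillic range
var wordChar  = /\w/.test("_");                  // the underscore counts as a word character

console.log(noMatch, hasSpace, hasDigit, backspace, cyrillic, wordChar);
// false true true true true true
```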

1.3 Repetition

A character or pattern can be followed by a marker that specifies how many times it may repeat. Because certain types of repetition are very common, there are special characters that represent them. For example, + matches one or more occurrences of the previous pattern. Table 10-3 summarizes the repetition syntax.

Table 10-3: Regular expression repetition characters

Character   Meaning
{n,m}       Match the previous item at least n times but no more than m times
{n,}        Match the previous item n or more times
{n}         Match exactly n occurrences of the previous item
?           Match zero or one occurrences of the previous item; that is, the previous item is optional. Equivalent to {0,1}
+           Match one or more occurrences of the previous item. Equivalent to {1,}
*           Match zero or more occurrences of the previous item. Equivalent to {0,}

Here are some examples:

/\d{2,4}/      // Match between two and four digits
/\w{3}\d?/     // Match exactly three word characters and an optional digit
/\s+java\s+/   // Match "java" with one or more spaces before and after
/[^(]*/        // Match zero or more characters that are not an opening parenthesis

Be careful when using the * and ? repetition characters. Because they can match zero occurrences of the preceding item, they are allowed to match nothing. For example, the regular expression /a*/ actually matches the string "bbbb", because the string contains zero occurrences of the letter a.
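The repetition syntax can be exercised with a short sketch; the sample strings are invented for illustration.

```javascript
// {2,4} takes at most four digits even though five are available.
var digits = "abc 12345".match(/\d{2,4}/)[0];   // "1234"

// Three word characters followed by an optional digit.
var word = /\w{3}\d?/.exec("abc1")[0];          // "abc1"

// Everything up to, but not including, the opening parenthesis.
var beforeParen = "f(x)".match(/[^(]*/)[0];     // "f"

// /a*/ succeeds on "bbbb" by matching the empty string at position 0.
var empty = /a*/.exec("bbbb");

console.log(digits, word, beforeParen);
console.log(empty[0] === "", empty.index);      // true 0
```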

Non-greedy repetition

The repetition characters listed in Table 10-3 match as many times as possible while still allowing the rest of the regular expression to match. We say that this repetition is "greedy". Repetition can also be done in a non-greedy way: simply follow the repetition character or characters with a question mark, as in ??, +?, *?, or {1,5}?. For example, the regular expression /a+/ matches one or more occurrences of the letter a. When applied to the string "aaa", it matches all three letters. The pattern /a+?/ also matches one or more occurrences of the letter a, but it matches as few characters as possible. When applied to the same string "aaa", this pattern matches only the first letter a.

Non-greedy repetition may not always produce the result you expect. Consider the pattern /a+b/, which matches one or more a's followed by the letter b. When applied to the string "aaab", it matches the entire string. Now try the non-greedy version, /a+?b/, which should match the letter b preceded by as few a's as possible. When applied to the same string "aaab", you might expect it to match only one a and the final b. In fact, this pattern matches the entire string, exactly as the greedy version does. This is because regular expression pattern matching always looks for the first position in the string at which a match is possible. Because a match is possible starting at the first character of the string, shorter matches that start at later characters are never considered.
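The difference between greedy and non-greedy repetition, including the "aaab" case discussed above, can be reproduced with this sketch:

```javascript
var greedy      = "aaa".match(/a+/)[0];     // takes all three letters
var lazy        = "aaa".match(/a+?/)[0];    // takes only the first letter
var greedyWithB = "aaab".match(/a+b/)[0];   // the whole string
var lazyWithB   = "aaab".match(/a+?b/)[0];  // also the whole string: the match
                                            // must start at the first "a"

console.log(greedy, lazy, greedyWithB, lazyWithB);
// aaa a aaab aaab
```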

1.4 Alternation, grouping, and references

The regular expression grammar also includes special characters for specifying alternatives, grouping subexpressions, and referring to previous subexpressions. The | character separates alternatives. For example, /ab|cd|ef/ matches the string "ab", the string "cd", or the string "ef". And /\d{3}|[a-z]{4}/ matches either three digits or four lowercase letters.

Note that alternatives are tried from left to right until a match is found. If the left alternative matches, the right alternative is ignored, even if it would have produced a "better" match. Thus, when the pattern /a|ab/ is applied to the string "ab", it matches only the first letter.

Table 10-4: Regular expression alternation, grouping, and reference characters

Character   Meaning
|           Alternation: match either the subexpression to the left or the subexpression to the right
(...)       Grouping: group items into a single unit that can be used with *, +, ?, |, and so on. Also remember the characters that match this group for use with later references
(?:...)     Grouping only: group items into a single unit, but do not remember the characters that match this group
\n          Match the same characters that were matched when group number n was first matched. Groups are subexpressions within (possibly nested) parentheses. Group numbers are assigned by counting left parentheses from left to right. Groups formed with (?: are not numbered
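The entries in Table 10-4 can be illustrated with a brief sketch. The pattern named quoted and the sample strings are assumptions chosen for the example.

```javascript
// Alternatives are tried left to right, so /a|ab/ stops after "a".
var leftWins = /a|ab/.exec("ab")[0];               // "a"

// Either three digits or four lowercase letters.
var letters = /\d{3}|[a-z]{4}/.exec("abcd")[0];    // "abcd"

// \1 must match the same quote character that group 1 matched.
var quoted = /(['"])[^'"]*\1/;
var sameQuotes  = quoted.test("'hello'");          // true
var mixedQuotes = quoted.test("'hello\"");         // false

// (?:...) groups without numbering, so "c" becomes group 1.
var groups = /(?:ab)+(c)/.exec("ababc");           // ["ababc", "c"]

console.log(leftWins, letters, sameQuotes, mixedQuotes, groups.length, groups[1]);
// a abcd true false 2 c
```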

1.5 Specifying match position

As described earlier, many elements of a regular expression match a single character in a string. For example, \s matches a single whitespace character. Other regular expression elements match the positions between characters rather than actual characters. These elements are called anchors.
The most commonly used anchors are ^, which matches the beginning of the string, and $, which matches the end of the string.

Table 10-5: Regular expression anchor characters

Character   Meaning
^           Match the beginning of the string and, in multiline searches, the beginning of a line
$           Match the end of the string and, in multiline searches, the end of a line
\b          Match a word boundary, that is, the position between a \w character and a \W character, or between a \w character and the beginning or end of the string (note, however, that [\b] matches backspace)
\B          Match a position that is not a word boundary
(?=p)       A zero-width positive lookahead assertion: require that the following characters match the pattern p, but do not include those characters in the match
(?!p)       A zero-width negative lookahead assertion: require that the following characters do not match the pattern p
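The anchors in Table 10-5 can be seen at work in the following sketch. The sample strings are illustrative only.

```javascript
// ^ and $ pin the pattern to the whole string.
var exact   = /^JavaScript$/.test("JavaScript");          // true
var inexact = /^JavaScript$/.test("JavaScript is fun");   // false

// \b requires a word boundary on both sides of "java".
var wholeWord  = /\bjava\b/i.test("I like Java a lot");   // true
var insideWord = /\bjava\b/i.test("JavaScript");          // false

// Positive lookahead: the colon must follow but is not part of the match.
var beforeColon = /[Jj]ava([Ss]cript)?(?=:)/.exec("JavaScript: The Definitive Guide")[0];

// Negative lookahead: "Java" must not be followed by "Script".
var notScript = /Java(?!Script)([A-Z]\w*)/;
var beans  = notScript.test("JavaBeans");                  // true
var script = notScript.test("JavaScript");                 // false

console.log(exact, inexact, wholeWord, insideWord, beforeColon, beans, script);
// true false true false JavaScript true false
```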

1.6 Modifiers

There is one final element of regular expression grammar: the modifiers (flags), which specify high-level pattern-matching rules. Unlike the rest of regular expression syntax, modifiers are placed outside the / characters; that is, they do not appear between the two slashes but after the second slash. JavaScript supports three modifiers. The i modifier specifies that pattern matching should be case-insensitive. The g modifier specifies that pattern matching should be global, that is, all matches within the searched string should be found. The m modifier performs pattern matching in multiline mode: if the string to be searched contains newlines, the ^ and $ anchors match the beginning and end of each line in addition to the beginning and end of the whole string. For example, the pattern /java$/im matches "java" as well as "Java\nis fun".

Table 10-6: Regular expression modifiers

Character   Meaning
i           Perform case-insensitive matching
g           Perform a global match, that is, find all matches rather than stopping after the first one
m           Multiline mode: ^ matches the beginning of a line or of the string, and $ matches the end of a line or of the string
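A small sketch of the three modifiers, using invented sample strings:

```javascript
var multiLine = "Java\nis fun";

// With m, $ also matches at the end of the first line.
var withM    = /java$/im.test(multiLine);   // true
var withoutM = /java$/i.test(multiLine);    // false: "Java" is not at the end of the string

// With g, match() returns every match; i makes the comparison case-insensitive.
var allMatches = "Java java JAVA".match(/java/gi);
var firstMatch = "Java java JAVA".match(/java/i);

console.log(withM, withoutM);    // true false
console.log(allMatches);         // [ 'Java', 'java', 'JAVA' ]
console.log(firstMatch[0]);      // Java
```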

2. String methods for pattern matching

2.1 search()

Its argument is a regular expression. It returns the character position of the start of the first matching substring, or -1 if there is no match.
If the argument to search() is not a regular expression, it is first converted to one by passing it to the RegExp constructor. search() does not support global searches; it ignores the g modifier of its regular expression argument.
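The rules above, including the -1 result and the ignored g modifier, can be checked with a minimal sketch (sample strings are illustrative):

```javascript
var found      = "JavaScript".search(/script/i);  // 4: "Script" starts at index 4
var missing    = "JavaScript".search(/python/);   // -1: no match
var gIgnored   = "JavaScript".search(/a/g);       // 1: g has no effect, first "a" is reported
var fromString = "JavaScript".search("Script");   // 4: the string is converted to a RegExp

console.log(found, missing, gIgnored, fromString);
// 4 -1 1 4
```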

"JavaScript".search(/script/i);  // returns 4

2.2 replace()

The replace() method performs a search-and-replace operation. Its first argument is a regular expression and its second argument is the replacement string.

"JavaScript".replace(/javascript/gi, "a");  // "a"

// A quotation is a quotation mark, followed by any number of
// non-quotation-mark characters (which are remembered), followed by another quotation mark
var quote = /"([^"]*)"/g;
var text = 'He said "stop"';  // sample input, for illustration only
// Replace the straight quotation marks with curly quotation marks,
// leaving the quoted text (stored in $1) unchanged
text.replace(quote, '“$1”');
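As a further sketch of how $1 and the g modifier interact in replace(), the following uses a made-up sentence and swaps double quotes for single quotes:

```javascript
var quote = /"([^"]*)"/g;
var sample = 'She said "yes" and then "no".';

// With g, every quoted span is rewritten; $1 inserts the remembered text.
var requoted = sample.replace(quote, "'$1'");

// Without g, only the first match is replaced.
var firstOnly = sample.replace(/"([^"]*)"/, "'$1'");

console.log(requoted);   // She said 'yes' and then 'no'.
console.log(firstOnly);  // She said 'yes' and then "no".
```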

2.3 match()

The match() method is the most general of the String regular expression methods. Its only argument is a regular expression (or a value that is converted to a regular expression by passing it to the RegExp() constructor), and it returns an array that contains the results of the match.

"1 plus 2 equals 3".match(/\d+/g);  // returns ["1", "2", "3"]

If the regular expression does not have the g modifier, match() returns an array whose first element is the matched text and whose remaining elements are the substrings matched by the parenthesized groups. For example, the following code parses a URL:

var url = /(\w+):\/\/([\w.]+)\/(\S*)/;
var text = "Visit my blog at http://www.example.com/~david";
var result = text.match(url);
if (result != null) {
  var fullurl = result[0];   // Contains "http://www.example.com/~david"
  var protocol = result[1];  // Contains "http"
  var host = result[2];      // Contains "www.example.com"
  var path = result[3];      // Contains "~david"
}
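The two modes of match() can be compared side by side. This is a sketch with an invented sentence; the variable names are arbitrary.

```javascript
var sentence = "1 plus 2 equals 3";

// Global: an array of every match, with no group information.
var all = sentence.match(/\d+/g);

// Non-global: the full match, then one entry per parenthesized group,
// plus an index property giving the match position.
var single = sentence.match(/(\d+) plus (\d+)/);

// No match at all yields null, so check before indexing.
var none = sentence.match(/\d{5}/g);

console.log(all);                              // [ '1', '2', '3' ]
console.log(single[0], single[1], single[2]);  // 1 plus 2 1 2
console.log(single.index, none);               // 0 null
```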

2.4 split()

This method breaks the string on which it is called into an array of substrings, using its argument as the separator.

"123,456,789".split(",");  // returns ["123", "456", "789"]

The argument to split() can also be a regular expression, which makes the method much more powerful. For example, you can specify a separator that allows any amount of whitespace on either side:

"1, 2, 3, 4, 5".split(/\s*,\s*/);  // returns ["1", "2", "3", "4", "5"]

3. RegExp objects

Regular expressions are represented by RegExp objects. In addition to the RegExp() constructor, RegExp objects support three methods and a number of properties.
The RegExp() constructor takes one or two string arguments and creates a new RegExp object; the second argument is optional. The first argument contains the body of the regular expression, that is, the text that would appear between the two slashes in a regular expression literal. Note that both string literals and regular expressions use the \ character for escape sequences, so when you pass a regular expression to RegExp() as a string literal, each \ must be written as \\. The second argument, if supplied, specifies the modifiers of the regular expression. It may only be g, i, m, or a combination of those letters. For example:

// Find all five-digit numbers in a string. Note the use of "\\" rather than "\" here.
var zipcode = new RegExp("\\d{5}", "g");
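The need for the doubled backslash can be demonstrated with a sketch that compares the constructor form with the equivalent literal. The variable names and the sample sentence are illustrative.

```javascript
var fromLiteral = /\d{5}/g;
var fromString  = new RegExp("\\d{5}", "g");

// Both forms describe the same pattern.
var sameSource = fromString.source === fromLiteral.source;

var zips = "Zip codes: 12345 and 67890".match(fromString);

// With a single backslash the string literal "\d{5}" is just "d{5}",
// so the resulting pattern looks for five letter d's instead of five digits.
var wrong = new RegExp("\d{5}");

console.log(sameSource, zips);                            // true [ '12345', '67890' ]
console.log(wrong.test("12345"), wrong.test("ddddd"));    // false true
```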

3.1 RegExp properties

Each RegExp object has five properties. The source property is a read-only string that contains the text of the regular expression. The global property is a read-only boolean value that specifies whether the regular expression has the g modifier. The ignoreCase property is a read-only boolean value that specifies whether the regular expression has the i modifier. The multiline property is a read-only boolean value that specifies whether the regular expression has the m modifier. The final property, lastIndex, is a read/write integer. For patterns with the g modifier, this property stores the position in the string at which the next search is to begin. It is used by the exec() and test() methods, described below.

Property     Description                                                             FF   IE
global       Whether the RegExp object has the g modifier.                           1    4
ignoreCase   Whether the RegExp object has the i modifier.                           1    4
lastIndex    An integer marking the character position at which the next match starts.   1    4
multiline    Whether the RegExp object has the m modifier.                           1    4
source       The source text of the regular expression.                              1    4
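The five properties can be inspected directly. The following sketch uses an arbitrary pattern and sample string:

```javascript
var re = /java/gi;

var before = {
  source: re.source,          // "java"
  global: re.global,          // true
  ignoreCase: re.ignoreCase,  // true
  multiline: re.multiline,    // false
  lastIndex: re.lastIndex     // 0 before any search
};

// Because the pattern is global, a successful test() advances lastIndex
// to the position just past the match.
re.test("Java and java");
var afterFirstSearch = re.lastIndex;   // 4

console.log(before, afterFirstSearch);
```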

3.2 RegExp methods

RegExp objects define two methods that perform pattern-matching operations. They behave similarly to the String methods described above. The main RegExp pattern-matching method is exec(). It is similar to the String method match() described above, except that it is a RegExp method that takes a string, rather than a String method that takes a RegExp.

Method    Description                                                                          FF   IE
compile   Compiles the regular expression.                                                     1    4
exec      Searches the string for a match. Returns the matched text and records its position.  1    4
test      Searches the string for a match. Returns true or false.                              1    4

3.2.1 exec()

var pattern = /java/g;
var text = "JavaScript is more fun than java!";
var result;
while ((result = pattern.exec(text)) != null) {
  alert("Matched '" + result[0] + "'" +
        " at position " + result.index +
        "; next search begins at " + pattern.lastIndex);
}
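The same loop can be run outside a browser by collecting the results instead of calling alert(). This sketch is a variation on the code above, not a replacement for it: the i modifier is added as an assumption so that both "Java" and "java" are found, and the found array is a name introduced for illustration.

```javascript
var pattern = /java/gi;  // variation for illustration: i added so both spellings match
var text = "JavaScript is more fun than java!";
var found = [];
var result;

// Each call to exec() resumes at pattern.lastIndex and returns null
// once no further match exists, which also resets lastIndex to 0.
while ((result = pattern.exec(text)) !== null) {
  found.push({ match: result[0], index: result.index, next: pattern.lastIndex });
}

console.log(found);
// [ { match: 'Java', index: 0, next: 4 }, { match: 'java', index: 28, next: 32 } ]
console.log(pattern.lastIndex);  // 0
```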

3.2.2 test()

The other RegExp method is test(), which is simpler than exec(). It takes a string and returns true if the string contains a match for the regular expression:

var pattern = /java/i;
pattern.test("JavaScript");  // returns true
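Because test() uses lastIndex when the pattern has the g modifier, repeated calls on the same global pattern can give alternating results. A minimal sketch of that effect, with an illustrative variable name:

```javascript
var globalPattern = /java/g;

var firstCall  = globalPattern.test("java");  // true; lastIndex becomes 4
var indexAfter = globalPattern.lastIndex;
var secondCall = globalPattern.test("java");  // false; the search started at index 4,
                                              // and the failure resets lastIndex to 0
var thirdCall  = globalPattern.test("java");  // true again

console.log(firstCall, indexAfter, secondCall, thirdCall);
// true 4 false true
```

One way to avoid the surprise is to omit the g modifier when only a yes-or-no answer is needed.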

4. Commonly used regular expressions

Phone number

/^([\+][0-9]{1,3}([ \.\-])?)?([\(][0-9]{1,6}[\)])?([0-9 \.\-]{1,32})(([A-Za-z \:]{1,11})?[0-9]{1,4}?)$/

Email address

/^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$/i

Date (YYYY-MM-DD)

/^\d{4}[\/\-](0?[1-9]|1[012])[\/\-](0?[1-9]|[12][0-9]|3[01])$/

IPv4

/^((([01]?[0-9]{1,2})|(2[0-4][0-9])|(25[0-5]))[.]){3}(([0-1]?[0-9]{1,2})|(2[0-4][0-9])|(25[0-5]))$/

URL

/^(https?|ftp):\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*@)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?$/i

Number (allow +-123,123.123)

/^[\-\+]?((([0-9]{1,3})([,][0-9]{3})*)|([0-9]+))?([\.]([0-9]+))?$/

2-20 English or Chinese characters
/^([\u4e00-\u9fa5]{2,20})$|^([a-zA-Z]{2,20})$/
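Patterns like these are usually wrapped in a small validation helper. The sketch below applies the date pattern from this section; the function name looksLikeDate is hypothetical. Note that the pattern only checks the shape of the input, so an impossible date such as February 30 still passes.

```javascript
// Date pattern from this section: YYYY-MM-DD or YYYY/MM/DD, leading zeros optional.
var datePattern = /^\d{4}[\/\-](0?[1-9]|1[012])[\/\-](0?[1-9]|[12][0-9]|3[01])$/;

// Hypothetical helper: reports whether a string has the expected date format.
function looksLikeDate(value) {
  return datePattern.test(value);
}

console.log(looksLikeDate("2024-02-30"));  // true: format only, not calendar validity
console.log(looksLikeDate("2024/1/5"));    // true: slashes and single digits are allowed
console.log(looksLikeDate("2024-13-01"));  // false: there is no month 13
console.log(looksLikeDate("24-01-01"));    // false: the year needs four digits
```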
