DOM note (10): JavaScript Regular Expression and dom Regular Expression

Last Update:2014-12-23 Source: Internet

Author: User


I. RegExp

ECMAScript supports regular expressions through the RegExp type. The syntax is similar to Perl:

var exp = /pattern/flags;

The pattern part is any simple or complex regular expression. Each regular expression can carry one or more flags that control how the pattern is applied.

Regular expression pattern matching supports three flags:

g: global mode. The pattern is applied to the entire string instead of stopping when the first match is found.

i: case-insensitive mode. Letter case is ignored when matching.

m: multiline mode. ^ and $ match at the start and end of every line, not only at the start and end of the whole string, so matching continues past the end of a line of text.
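The effect of each flag can be seen on a small input. This is a minimal sketch; the sample string and variable names are made up for illustration and do not come from the original notes.

```javascript
// Hypothetical three-line sample text.
const sample = "Cat\ncat\nCAT";

// No flags: stops at the first case-sensitive match.
const first = sample.match(/cat/);
// g: returns every case-sensitive match.
const globalMatches = sample.match(/cat/g);
// gi: returns every match regardless of case.
const anyCase = sample.match(/cat/gi);
// gi without m: ^ only matches at the start of the whole string.
const startOnly = sample.match(/^cat/gi);
// gim: ^ also matches at the start of each line.
const lineStarts = sample.match(/^cat/gim);

console.log(first.index);    // 4
console.log(globalMatches);  // [ 'cat' ]
console.log(anyCase);        // [ 'Cat', 'cat', 'CAT' ]
console.log(startOnly);      // [ 'Cat' ]
console.log(lineStarts);     // [ 'Cat', 'cat', 'CAT' ]
```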

1. Create a regular expression

JavaScript can create a regular expression in two ways: with a literal or with the RegExp constructor.

// Literal form
var pattern1 = /[bc]at/i;
var pattern2 = /\[bc\]at/ig;
// RegExp constructor
var pattern3 = new RegExp("[bc]at", "i");       // equivalent to pattern1
var pattern4 = new RegExp("\\[bc\\]at", "ig");  // equivalent to pattern2

The two forms differ in two respects: how special characters are escaped and how instances are created.

The pattern parameter of the RegExp constructor is a string, so every backslash in the pattern must itself be escaped. Each \ in a literal becomes \\ in the equivalent string:

Literal pattern	Equivalent string
/\[bc\]/	"\\[bc\\]"
/\d.\d{1,2}/	"\\d.\\d{1,2}"
/\w\\hello/	"\\w\\\\hello"
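The double escaping required by the constructor can be checked by comparing the source of both forms. This is a small sketch with illustrative variable names; the under-escaped case is included only to show what goes wrong when a backslash is forgotten.

```javascript
const literal = /\[bc\]at/i;
const constructed = new RegExp("\\[bc\\]at", "i");

// With single backslashes the string is just "[bc]at",
// so the constructor builds a character class instead.
const underEscaped = new RegExp("\[bc\]at", "i");

console.log(literal.source === constructed.source); // true
console.log(constructed.test("[bc]at"));            // true
console.log(literal.test("bat"));                   // false
console.log(underEscaped.test("bat"));              // true
```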

In ECMAScript 3, a regular expression literal always shares the same RegExp instance, whereas every call to the constructor creates a new instance.

var re = null, i;
for (i = 0; i < 3; i++) {
    re = /cat/g;
    re.test("catastrophe");
}
for (i = 0; i < 3; i++) {
    re = new RegExp("cat", "g");
    re.test("catastrophe");
}

In earlier browsers, such as IE6, the calls in the first loop do not all return true. The shared instance keeps its lastIndex between iterations, so the first test() returns true, the second starts searching at index 3 and returns false (which resets lastIndex to 0), and the third returns true again. Every call in the second loop returns true.

In ECMAScript 5, a regular expression literal creates a new instance each time it is evaluated, just like the RegExp constructor. Therefore every call in both loops returns true in modern browsers.
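The shared-instance behavior can still be reproduced in a modern engine by deliberately reusing one global regex. The sketch below is illustrative only; the variable names are not from the original.

```javascript
// One global regex reused across calls: lastIndex carries over.
const shared = /cat/g;
const sharedResults = [];
for (let i = 0; i < 3; i++) {
  sharedResults.push(shared.test("catastrophe"));
}

// A fresh regex per iteration, which is what a literal produces since ECMAScript 5.
const freshResults = [];
for (let i = 0; i < 3; i++) {
  freshResults.push(/cat/g.test("catastrophe"));
}

console.log(sharedResults); // [ true, false, true ]
console.log(freshResults);  // [ true, true, true ]
```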

2. RegExp instance properties

Property	Description
global	Boolean, whether the g flag is set
ignoreCase	Boolean, whether the i flag is set
multiline	Boolean, whether the m flag is set
lastIndex	Integer, the character position at which the next search starts
source	String representation of the pattern
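These properties can be read directly from any RegExp instance. A short sketch, with a variable name chosen only for this example:

```javascript
const flagged = /\[bc\]at/gi;

console.log(flagged.global);     // true
console.log(flagged.ignoreCase); // true
console.log(flagged.multiline);  // false
console.log(flagged.source);     // \[bc\]at
console.log(flagged.lastIndex);  // 0

// With the g flag set, a successful match moves lastIndex past the match.
flagged.test("[bc]at");
console.log(flagged.lastIndex);  // 6
```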

3. Methods

There are two common methods, exec() and test(), each of which accepts a string parameter.

If a match exists, exec() returns an array with two additional properties, index and input. index is the position of the match in the string, and input is the source string, that is, the argument passed to exec(). The first item of the array is the string that matches the entire pattern, and the remaining items are the strings matched by the capture groups (if there are no capture groups, the array has a single item). If no match exists, exec() returns null.

test() indicates whether a match exists in the string. It returns true if one does and false otherwise.

var text = "mom and dad and baby";
var pattern = /mom( and dad( and baby)?)?/gi;
var matches = pattern.exec(text);
alert(matches.length);  // 3
alert(matches.index);   // 0
alert(matches.input);   // "mom and dad and baby"
alert(matches[0]);      // "mom and dad and baby"
alert(matches[1]);      // " and dad and baby"
alert(matches[2]);      // " and baby"

If the global flag is not set, exec() always returns the first match, no matter how many times it is called on the same string. If the global flag is set, each call to exec() continues searching from the position stored in lastIndex, that is, just after the previous match.
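The difference between the two modes shows up when exec() is called repeatedly. The following sketch uses made-up variable names and collects each match together with the lastIndex value after the call.

```javascript
const source = "cat,bat,sat,fat";

// Without g: every call restarts from the beginning.
const single = /.at/;
const a = single.exec(source);
const b = single.exec(source);
console.log(a[0], a.index, b[0], b.index); // cat 0 cat 0

// With g: each call resumes at lastIndex until exec() returns null.
const finder = /.at/g;
const found = [];
let m;
while ((m = finder.exec(source)) !== null) {
  found.push({ text: m[0], index: m.index, next: finder.lastIndex });
}
console.log(found.map(function (f) { return f.text + "@" + f.index; }));
// [ 'cat@0', 'bat@4', 'sat@8', 'fat@12' ]
```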

II. String type

String is the object wrapper type for string values, just as Number and Boolean are for their primitives (see DOM note (9): reference types, primitive wrapper types, and singleton built-in objects), so a string can also be created with new. Pattern matching is useful in string processing, and the String type defines several methods for it.

match(pattern): pattern is a regular expression literal or a RegExp object. When the pattern has no g flag, match() is essentially the same as calling exec() on the pattern.

var text = "cat,bat,sat,fat";
var pattern = /.at/;
var matches = text.match(pattern);
// var matches = pattern.exec(text);
alert(matches.index);      // 0
alert(matches[0]);         // "cat"
alert(pattern.lastIndex);  // 0
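When the pattern does carry the g flag, match() behaves differently from exec(): it returns a plain array of every matched substring, without index or input. A brief sketch with illustrative names:

```javascript
const animals = "cat,bat,sat,fat";

const one = animals.match(/.at/);   // like exec(): first match plus index/input
const all = animals.match(/.at/g);  // every matched substring
const none = animals.match(/dog/);  // no match

console.log(one[0], one.index); // cat 0
console.log(all);               // [ 'cat', 'bat', 'sat', 'fat' ]
console.log(all.index);         // undefined
console.log(none);              // null
```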

search(pattern): takes the same parameter as match(). search() always starts from the beginning of the string and returns the index of the first match, or -1 if there is no match.

var text = "cat,bat,sat,fat";
var pos = text.search(/at/);
alert(pos);  // 1

replace(pattern, replacement): replaces the matched text with replacement. The first argument can be a string or a RegExp object; a string replaces only its first occurrence. The second argument can be a string, which may refer to capture groups with $1, $2, and so on, or a function.

var text = "cat,bat,sat,fat";
var pattern = /(.at)/g;
var re = text.replace(pattern, "word($1)");
alert(re);  // word(cat),word(bat),word(sat),word(fat)

If a referenced capture group did not take part in the match, an empty string is used in its place.
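A few replacement strings illustrate how capture-group references are substituted. This is a sketch with invented inputs; $& (the whole match) is a standard replacement pattern that the notes above do not mention.

```javascript
// $1 and $2 refer to the first and second capture groups.
const swapped = "John Smith".replace(/(\w+)\s(\w+)/, "$2, $1");

// $& refers to the entire match.
const wrapped = "cat,bat".replace(/.at/g, "[$&]");

// The optional group (x)? does not take part in the match, so $1 is empty.
const optional = "ab".replace(/a(x)?/, "<$1>");

console.log(swapped);  // Smith, John
console.log(wrapped);  // [cat],[bat]
console.log(optional); // <>b
```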

If the second parameter is a function and the pattern has no capture groups, the function receives three arguments: the matched text, the position of the match in the string, and the original string. Its return value is used as the replacement.

function htmlEscape(text) {
    return text.replace(/[<>"&]/g, function (match, pos, originalText) {
        switch (match) {
            case "<": return "&lt;";
            case ">": return "&gt;";
            case "&": return "&amp;";
            case "\"": return "&quot;";
        }
    });
}
// returns: &lt;p class=&quot;greeting&quot;&gt;helloWorld&lt;/p&gt;
alert(htmlEscape("<p class=\"greeting\">helloWorld</p>"));

If the regular expression defines capture groups, the function receives the matched text, then the first capture group, the second capture group, and so on. The last two arguments are still the position of the match and the original string.
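The argument order with capture groups can be made visible by recording what the callback receives. A minimal sketch; the input string and the parameter names (key, value, offset, whole) are arbitrary choices for this example.

```javascript
const seen = [];
const rewritten = "x=1, y=22".replace(/(\w)=(\d+)/g,
  function (match, key, value, offset, whole) {
    // match: entire match, key/value: capture groups 1 and 2,
    // offset: position of the match, whole: the original string.
    seen.push({ match: match, key: key, value: value, offset: offset, whole: whole });
    return key + ":" + value;
  });

console.log(rewritten);      // x:1, y:22
console.log(seen[1].match);  // y=22
console.log(seen[1].offset); // 5
```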

split(separator[, limit]): splits the string and returns an array. separator can be a plain string or a RegExp object. The optional limit caps the number of items in the returned array.

var colorText = "red,blue,yellow,black";
alert(colorText.split(","));     // ["red", "blue", "yellow", "black"]
alert(colorText.split(",", 2));  // ["red", "blue"]
alert(colorText.split(/\W/));    // ["red", "blue", "yellow", "black"]
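A regular expression separator is most useful when the delimiters are irregular. The sketch below uses a made-up input; it also shows that a capture group in the separator causes the captured text to be included in the result.

```javascript
const messy = "red, blue;yellow  black";

// Split on any run of commas, semicolons, or whitespace.
const parts = messy.split(/[,;\s]+/);
// limit keeps only the first two items.
const firstTwo = messy.split(/[,;\s]+/, 2);
// A capture group in the separator keeps the separators in the output.
const withSeparators = "a1b2c".split(/(\d)/);

console.log(parts);          // [ 'red', 'blue', 'yellow', 'black' ]
console.log(firstTwo);       // [ 'red', 'blue' ]
console.log(withSeparators); // [ 'a', '1', 'b', '2', 'c' ]
```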

III. Regular expression rules

Character	Description
\	Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a line break. The sequence "\\" matches "\", and "\(" matches "(".
^	Matches the start of the input string. If the multiline property of the RegExp object is set, ^ also matches the position after "\n" or "\r".
$	Matches the end of the input string. If the multiline property of the RegExp object is set, $ also matches the position before "\n" or "\r".
*	Matches the preceding subexpression zero or more times. For example, "zo*" matches "z" and "zoo". * is equivalent to {0,}.
+	Matches the preceding subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?	Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" and "does". ? is equivalent to {0,1}.
{n}	n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two o's in "food".
{n,}	n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the o's in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*".
{n,m}	m and n are non-negative integers with n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". Note that there must be no space between the comma and the numbers.
?	When this character immediately follows another quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy match consumes as little of the searched string as possible, while the default greedy match consumes as much as possible. For example, in the string "oooo", "o+?" matches a single "o", while "o+" matches all the o's.
.	Matches any single character except a line terminator such as "\n". To match any character including "\n", use a pattern like "[\s\S]".
(pattern)	Matches pattern and captures the match. The captured match can be retrieved from the result of the match, for example the array returned by exec(); VBScript uses the SubMatches collection and JScript uses the $0…$9 properties. To match literal parentheses, use "\(" or "\)".
(?:pattern)	Matches pattern but does not capture the match, that is, the match is not stored for later use. This is useful when the "|" character is used to combine parts of a pattern. For example, "industr(?:y|ies)" is a more concise expression than "industry|industries".
(?=pattern)	Positive lookahead: matches the search string at any position where a string matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, "Windows(?=95|98|NT|2000)" matches "Windows" in "Windows2000", but not "Windows" in "Windows3.1". A lookahead does not consume characters: after a match, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)	Negative lookahead: matches the search string at any position where a string not matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, "Windows(?!95|98|NT|2000)" matches "Windows" in "Windows3.1", but not "Windows" in "Windows2000". A lookahead does not consume characters: after a match, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead.
x|y	Matches x or y. For example, "z|food" matches "z" or "food". "(z|f)ood" matches "zood" or "food".
[xyz]	Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
[^xyz]	Negated character set. Matches any character not enclosed. For example, "[^abc]" matches the "p" in "plain".
[a-z]	Character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter from "a" to "z".
[^a-z]	Negated character range. Matches any character not in the specified range. For example, "[^a-z]" matches any character that is not in the range "a" to "z".
\b	Matches a word boundary, that is, the position between a word and a space. For example, "er\b" matches the "er" in "never", but not the "er" in "verb".
\B	Matches a non-word boundary. "er\B" matches the "er" in "verb", but not the "er" in "never".
\d	Matches a digit character. Equivalent to [0-9].
\D	Matches a non-digit character. Equivalent to [^0-9].
\s	Matches any whitespace character, including spaces, tabs, and form feeds. For ASCII input it is equivalent to [ \f\n\r\t\v].
\S	Matches any non-whitespace character. For ASCII input it is equivalent to [^ \f\n\r\t\v].
\w	Matches any word character, including the underscore. Equivalent to "[A-Za-z0-9_]".
\W	Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
\f, \n, \r, \t, \v	Match a form feed, line feed, carriage return, horizontal tab, and vertical tab respectively. They are equivalent to \x0c and \cL, \x0a and \cJ, \x0d and \cM, \x09 and \cI, and \x0b and \cK.
\cx	Matches the control character specified by x. For example, \cM matches Control-M, the carriage return character. The value of x must be in the range A-Z or a-z; otherwise c is treated as a literal "c" character.
\xn	Matches n, where n is a hexadecimal escape value. The hexadecimal escape value must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". ASCII codes can therefore be used in regular expressions.
\num	Matches num, where num is a positive integer: a backreference to a captured match. For example, "(.)\1" matches two consecutive identical characters.
\n	Identifies an octal escape value or a backreference. If at least n captured subexpressions precede \n, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm	Identifies an octal escape value or a backreference. If at least nm captured subexpressions precede \nm, nm is a backreference. If at least n captured subexpressions precede \nm, n is a backreference followed by the literal m. If neither condition holds and n and m are octal digits (0-7), \nm matches the octal escape value nm.
\nml	If n is an octal digit (0-3) and m and l are octal digits (0-7), matches the octal escape value nml.
\un	Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
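Several of the rules in the table can be verified directly. The following sketch reuses the table's own examples where possible; the HTML-like string and the variable names are invented for illustration.

```javascript
// Greedy vs non-greedy quantifiers.
const greedy = "<b>bold</b>".match(/<.+>/)[0];
const lazy = "<b>bold</b>".match(/<.+?>/)[0];

// Positive and negative lookahead.
const positive = /Windows(?=95|98|NT|2000)/;
const negative = /Windows(?!95|98|NT|2000)/;

// Backreference: two consecutive identical characters.
const doubled = "hello".match(/(.)\1/)[0];

// Non-capturing group: the result array holds only the whole match.
const industry = "industries".match(/industr(?:y|ies)/);

console.log(greedy, lazy);                  // <b>bold</b> <b>
console.log(positive.test("Windows2000"));  // true
console.log(positive.test("Windows3.1"));   // false
console.log(negative.test("Windows3.1"));   // true
console.log(doubled);                       // ll
console.log(industry.length);               // 1
console.log(/er\b/.test("never"), /er\b/.test("verb")); // true false
console.log(/er\B/.test("verb"), /er\B/.test("never")); // true false
```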

IV. Common regular expressions: a summary of common regular expressions

Originally published at: http://www.ido321.com/1355.html

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
