ECMAScript supports regular expressions through the RegExp type.
Using a Perl-like syntax, you can create a regular expression: var expression = /pattern/flags;
The pattern part can be any simple or complex regular expression, and may contain character classes, quantifiers, groups, lookaheads, and backreferences.
Each regular expression can carry one or more flags that indicate how the expression should behave.
In ECMAScript 3 and 5, the matching mode of a regular expression is controlled by the following three flags:
g: Global mode, meaning the pattern is applied to the whole string rather than stopping as soon as the first match is found
i: Case-insensitive mode, meaning the case of the pattern and the string is ignored when determining matches
m: Multiline mode, meaning that matching continues past the end of a line of text: ^ and $ also match at the start and end of each line
A regular expression is a combination of a pattern and these flags, and different combinations produce different results:
var pattern1 = /at/g; // matches all instances of "at" in a string
var pattern2 = /[bc]at/i; // matches the first instance of "bat" or "cat", regardless of case
var pattern3 = /.at/gi; // matches all three-character combinations ending in "at", regardless of case
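To make the effect of each flag concrete, here is a minimal sketch that applies a few patterns with match() and test(). The sample strings and variable names are assumptions chosen for illustration, and console.log stands in for alert so that the snippet also runs outside a browser.

```javascript
var sample = "Cat, bat, sat";

// Without g, match() reports only the first occurrence
var first = sample.match(/at/);
// With g, match() returns every occurrence
var all = sample.match(/at/g);

// i ignores case, so the uppercase "C" in "Cat" is accepted
var caseInsensitive = /[bc]at/i.test("Cat");
var caseSensitive = /[bc]at/.test("Cat");

// m lets ^ match at the start of every line, not only at the start of the string
var lines = "one\ntwo";
var withoutM = lines.match(/^\w+/g);
var withM = lines.match(/^\w+/gm);

console.log(first[0], first.index);           // at 1
console.log(all.length);                      // 3
console.log(caseInsensitive, caseSensitive);  // true false
console.log(withoutM.length, withM.length);   // 1 2
```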
As with regular expressions in other languages, every metacharacter that should be matched literally in a pattern must be escaped. The metacharacters are: ( [ { \ ^ $ | ) ? * + . ] }
var pattern1 = /[bc]at/i; // matches the first instance of "bat" or "cat", regardless of case
var pattern2 = /\[bc\]at/i; // matches the first instance of the literal text "[bc]at", regardless of case
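The difference between the two patterns above can be checked directly. This is only an illustrative sketch; the variable names and test strings are assumptions, and console.log is used in place of alert.

```javascript
var unescaped = /[bc]at/i;  // character class: "b" or "c", followed by "at"
var escaped = /\[bc\]at/i;  // literal "[", then "bc", then "]", followed by "at"

console.log(unescaped.test("cat"));     // true
console.log(unescaped.test("[bc]at"));  // false
console.log(escaped.test("cat"));       // false
console.log(escaped.test("[bc]at"));    // true
```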
Another way to create a regular expression is to use the RegExp constructor, which accepts two arguments: a string containing the pattern to match and an optional string of flags.
var pattern1 = /[bc]at/i;
var pattern2 = new RegExp("[bc]at", "i");
Here pattern1 and pattern2 are two fully equivalent regular expressions.
Note that both arguments passed to the RegExp constructor here are strings, not regular expression literals.
Because the pattern argument of the RegExp constructor is a string, characters sometimes have to be double-escaped: every metacharacter must be double-escaped, as must characters that are already escaped. The following literal patterns and constructor strings are equivalent:
Literal pattern          Equivalent string
/\[bc\]at/               "\\[bc\\]at"
/\.at/                   "\\.at"
/name\/age/              "name\\/age"
/\d.\d{1,2}/             "\\d.\\d{1,2}"
/\w\\hello\\123/         "\\w\\\\hello\\\\123"
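A short sketch of why the doubling is needed, using the first pair from the list above. The variable names are assumptions for illustration only.

```javascript
// Literal form: one backslash escapes the bracket for the regex engine
var literal = /\[bc\]at/;
// Constructor form: "\\" in the string literal yields one backslash in the pattern
var constructed = new RegExp("\\[bc\\]at");

console.log(constructed.source === literal.source);  // true
console.log(constructed.test("[bc]at"));             // true

// With single backslashes the string literal drops them ("\[" is just "["),
// so the pattern becomes [bc]at: a character class, not literal brackets
var underEscaped = new RegExp("\[bc\]at");
console.log(underEscaped.source);          // [bc]at
console.log(underEscaped.test("[bc]at"));  // false
```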
A regular expression literal does not behave exactly like a regular expression created with the RegExp constructor.
In ECMAScript 3, a regular expression literal always shares the same RegExp instance, whereas every call to the constructor creates a new RegExp instance.
var re = null, i;
for (i = 0; i < 10; i++) { re = /cat/g; re.test("catastrophe"); }
for (i = 0; i < 10; i++) { re = new RegExp("cat", "g"); re.test("catastrophe"); }
In the first loop, only one RegExp instance is created for /cat/ even though the literal appears inside the loop body. Because instance properties such as lastIndex are not reset, the second call to test() in the loop fails: the first call finds "cat", and the next call starts searching at index 3, where there is nothing left to find.
In the second loop, the RegExp constructor creates a new instance on every iteration, so every call to test() returns true.
ECMAScript 5 specifies that a regular expression literal must create a new RegExp instance each time it is evaluated, just as if the RegExp constructor were called directly. IE9+, Firefox 4+, and Chrome have all changed their behavior accordingly.
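The failure described for the first loop comes from lastIndex being carried over between calls. It can be reproduced in any engine by deliberately reusing one global expression, as in this sketch (variable names are illustrative assumptions):

```javascript
var re = /cat/g;  // one instance, reused for every call

var firstCall = re.test("catastrophe");   // finds "cat" at index 0
var indexAfterFirst = re.lastIndex;       // the next search starts after "cat"
var secondCall = re.test("catastrophe");  // searches from index 3 and finds nothing
var indexAfterSecond = re.lastIndex;      // a failed search resets lastIndex
var thirdCall = re.test("catastrophe");   // starts from 0 again and succeeds

console.log(firstCall, indexAfterFirst);    // true 3
console.log(secondCall, indexAfterSecond);  // false 0
console.log(thirdCall);                     // true
```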
Each RegExp instance has the following properties, which provide information about the pattern:
global: A Boolean value indicating whether the g flag is set
ignoreCase: A Boolean value indicating whether the i flag is set
multiline: A Boolean value indicating whether the m flag is set
lastIndex: An integer indicating the character position at which the next search starts, counting from 0
source: The string representation of the regular expression, always returned in literal form (without the opening and closing slashes) rather than as the string pattern passed to the constructor
These properties expose every aspect of a regular expression, but they are of little practical use because the same information is already visible in the pattern declaration.
var pattern1 = /\[bc\]at/i;
alert(pattern1.global); // false
alert(pattern1.ignoreCase); // true
alert(pattern1.multiline); // false
alert(pattern1.lastIndex); // 0
alert(pattern1.source); // "\[bc\]at"
var pattern2 = new RegExp("\\[bc\\]at", "i");
alert(pattern2.global); // false
alert(pattern2.ignoreCase); // true
alert(pattern2.multiline); // false
alert(pattern2.lastIndex); // 0
alert(pattern2.source); // "\[bc\]at"
The source property holds the canonical string form of the pattern, that is, the text as it would appear in a literal, so both declarations have the same source value even though the first is written as a literal and the second is built with the constructor.
The primary method of a RegExp object is exec(), which is designed specifically for capturing groups.
exec() takes one argument, the string to which the pattern is applied, and returns an array of information about the first match, or null if there is no match.
The returned array, although an instance of Array, has two additional properties: index and input.
index is the position of the match in the string, and input is the string to which the regular expression was applied.
In the array, the first item is the string that matches the entire pattern, and each remaining item is the string matched by the corresponding capturing group in the pattern.
var text = "Mom and Dad and baby";
var pattern = /mom( and dad( and baby)?)?/gi;
// The pattern contains two capturing groups: the innermost group matches " and baby", and the group enclosing it matches " and dad" or " and dad and baby"
var matches = pattern.exec(text);
alert(matches.index); // 0, because the match starts at the beginning of the string
alert(matches.input); // "Mom and Dad and baby"
alert(matches[0]); // "Mom and Dad and baby", the first item is the entire matched string
alert(matches[1]); // " and Dad and baby", the second item is the content matched by the first capturing group
alert(matches[2]); // " and baby", the third item is the content matched by the second capturing group
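Because both groups in this pattern are optional, it may help to see what exec() returns when a group does not take part in the match, and when nothing matches at all. The following is a small sketch; the input strings are assumptions chosen for illustration.

```javascript
var pattern = /mom( and dad( and baby)?)?/i;

// Only the outer group takes part in this match
var partial = pattern.exec("mom and dad");
console.log(partial[0]);      // mom and dad
console.log(partial[1]);      // " and dad" (with a leading space)
console.log(partial[2]);      // undefined, the inner group matched nothing
console.log(partial.length);  // 3, one slot per group even when unmatched

// No match at all: exec() returns null, so check before reading items
var none = pattern.exec("no parents here");
console.log(none);            // null
```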
The exec() method returns information about only one match at a time, even when the global flag (g) is set on the pattern.
Without the global flag, calling exec() repeatedly on the same string always returns information about the first match.
With the global flag set, each call to exec() continues searching the string for the next match.
var text = "cat, bat, sat, fat";
var pattern1 = /.at/;
var matches = pattern1.exec(text);
alert(matches.index); // 0
alert(matches[0]); // cat
alert(pattern1.lastIndex); // 0, in non-global mode the next search still starts at 0
matches = pattern1.exec(text);
alert(matches.index); // 0
alert(matches[0]); // cat, in non-global mode every call returns the first match
alert(pattern1.lastIndex); // 0; IE implements lastIndex incorrectly and changes its value even in non-global mode
var pattern2 = /.at/g;
matches = pattern2.exec(text);
alert(matches.index); // 0
alert(matches[0]); // cat
alert(pattern2.lastIndex); // 3, in global mode the next search starts after the returned match
matches = pattern2.exec(text);
alert(matches.index); // 5
alert(matches[0]); // bat
alert(pattern2.lastIndex); // 8, the next search again continues after the returned match
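In practice, the global behavior of exec() is often used in a loop that runs until the method returns null. The sketch below collects every match; the found array and its object shape are assumptions made for this example.

```javascript
var text = "cat, bat, sat, fat";
var pattern = /.at/g;  // g is required here, otherwise the loop would never end
var found = [];
var match;

// exec() returns null once no further match exists, which ends the loop
while ((match = pattern.exec(text)) !== null) {
  found.push({ value: match[0], index: match.index });
}

console.log(found.length);    // 4
console.log(found[1].value);  // bat
console.log(found[1].index);  // 5
```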
The second method of a regular expression is test(), which takes a string argument.
It returns true if the pattern matches the argument and false otherwise.
var text = "000-00-0000";
var pattern = /\d{3}-\d{2}-\d{4}/;
if (pattern.test(text)) { alert("The pattern was matched."); }
This method is often used to validate user input, because we only need to know whether the input is valid.
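One caveat when validating input with test(): a pattern without anchors succeeds if it matches anywhere inside the string. The sketch below contrasts the unanchored pattern with an anchored variant; hasExpectedFormat is a hypothetical helper name, not part of any API.

```javascript
// Hypothetical helper: ^ and $ force the whole input to match, not just a part of it
function hasExpectedFormat(input) {
  return /^\d{3}-\d{2}-\d{4}$/.test(input);
}

var unanchored = /\d{3}-\d{2}-\d{4}/;

console.log(unanchored.test("id: 000-00-0000!"));    // true, a substring matches
console.log(hasExpectedFormat("id: 000-00-0000!"));  // false
console.log(hasExpectedFormat("000-00-0000"));       // true
```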
The toLocaleString() and toString() methods inherited by a RegExp instance both return the literal form of the regular expression, regardless of how the expression was created.
var pattern = new RegExp("\\[bc\\]at", "gi");
alert(pattern.toString()); // /\[bc\]at/gi
alert(pattern.toLocaleString()); // /\[bc\]at/gi
The valueOf() method of a regular expression returns the regular expression itself.
The RegExp constructor has properties that apply to all regular expressions in scope and that change according to the most recent regular expression operation performed.
Another unique aspect of these properties is that each can be accessed by a long property name or a short property name, although Opera does not support the short names.
input ($_): the last string matched against; Opera does not implement this property
lastMatch ($&): the last matched text; Opera does not implement this property
leftContext ($`): the text before lastMatch in the input string
rightContext ($'): the text after lastMatch in the input string
lastParen ($+): the last matched capturing group; Opera does not implement this property
multiline ($*): a Boolean value indicating whether all expressions use multiline mode; IE and Opera do not implement this property
Use these properties to extract more specific information about the operations performed by exec() and test().
var text = "this has been a short summer";
var pattern = /(.)hort/g;
// The pattern matches any single character followed by "hort" and places that first character in a capturing group
if (pattern.test(text)) {
  alert(RegExp.input); // this has been a short summer
  alert(RegExp.leftContext); // this has been a
  alert(RegExp.rightContext); // summer
  alert(RegExp.lastMatch); // short
  alert(RegExp.lastParen); // s
  alert(RegExp.multiline); // false in browsers that implement it
}
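Since these constructor properties are shared by every regular expression and are not implemented uniformly, the same information can also be derived from the array that exec() returns. The following is only a sketch of that alternative; the local variable names merely mirror the constructor property names and are assumptions.

```javascript
var text = "this has been a short summer";
var pattern = /(.)hort/;

var match = pattern.exec(text);

var input = match.input;                                          // the string searched
var lastMatch = match[0];                                         // the matched text
var leftContext = input.slice(0, match.index);                    // text before the match
var rightContext = input.slice(match.index + lastMatch.length);   // text after the match
var lastParen = match[match.length - 1];                          // last capturing group

console.log(lastMatch);     // short
console.log(leftContext);   // "this has been a " (with a trailing space)
console.log(rightContext);  // " summer" (with a leading space)
console.log(lastParen);     // s
```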
The long property names can be replaced with the corresponding short names, but because most of the short names are not valid ECMAScript identifiers, they must be accessed with bracket notation.
For example: RegExp.$_; RegExp["$`"]; RegExp["$'"]; RegExp["$&"]; RegExp["$+"]; RegExp["$*"];
In addition to the properties above, there are nine constructor properties for storing capturing groups.
They are accessed as RegExp.$1, RegExp.$2 ... RegExp.$9, and hold the first through ninth matched capturing groups, respectively.
These properties are filled in automatically when exec() or test() is called.
var text = "this has been a short summer";
var pattern = /(..)or(.)/g;
if (pattern.test(text)) {
  alert(RegExp.$1); // sh
  alert(RegExp.$2); // t
}
This creates a pattern with two capturing groups and tests a string against it.
Even though test() returns only a Boolean value, the $1 and $2 properties of the RegExp constructor are filled in with the strings matched by the corresponding capturing groups.
Although regular expression support in ECMAScript is fairly complete, it still lacks some advanced regular expression features found in other languages.
The following features are not supported by ECMAScript 5 regular expressions: the \A and \Z anchors that match the beginning and end of a string, lookbehind, union and intersection classes, atomic groups, Unicode support, named capturing groups, the s (single-line) and x (free-spacing) matching modes, conditional matching, and regular expression comments. Later editions of the language have since added some of these, such as lookbehind, named capturing groups, and the s flag.
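Some of the missing features have well-known workarounds. As one example, where the s (single-line) mode is unavailable, the character class [\s\S] is commonly used in place of the dot to match any character including a newline. The sketch below assumes a two-line sample string chosen for illustration.

```javascript
var text = "first line\nsecond line";

// The dot does not match a newline, so this pattern cannot span both lines
var dotOnly = /first.*second/.test(text);

// [\s\S] matches whitespace or non-whitespace, that is, any character at all
var anyChar = /first[\s\S]*second/.test(text);

console.log(dotOnly);  // false
console.log(anyChar);  // true
```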