ECMAScript supports regular expressions through the RegExp type.
A regular expression can be created with the following Perl-like syntax.
var expression = /pattern/flags;
The pattern part can be any simple or complex regular expression and may contain character classes, quantifiers, grouping, lookaheads, and backreferences.
Each regular expression can have one or more flags that indicate how the expression should behave.
Regular expression matching supports the following three flags.
g: Global mode. The pattern is applied to the whole string instead of stopping as soon as the first match is found;
i: Case-insensitive mode. The case of the pattern and the string is ignored when determining matches;
m: Multiline mode. When the end of one line of text is reached, matching continues into the next line to look for further items that match the pattern.
A regular expression is therefore a combination of a pattern and any of these three flags. Different combinations produce different results.
// Match all instances of "at" in a string
var pattern1 = /at/g;
// Match the first instance of "bat" or "cat", regardless of case
var pattern2 = /[bc]at/i;
// Match all three-character combinations ending with "at", regardless of case
var pattern3 = /.at/gi;
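The effect of the m flag is easiest to see with an anchor. The sketch below uses an assumed two-line sample string and the standard String match() method to compare the same pattern with and without m; the variable names are illustrative only.

```javascript
// Assumed sample text: two lines separated by a newline.
var lines = "first cat\nsecond bat";

// Without m, $ matches only at the very end of the whole string.
var endOnly = /at$/g;
// With m, $ also matches immediately before each line break.
var everyLine = /at$/gm;

var endOnlyMatches = lines.match(endOnly);
var everyLineMatches = lines.match(everyLine);

console.log(endOnlyMatches.length);   // 1
console.log(everyLineMatches.length); // 2
```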
As with regular expressions in other languages, all metacharacters used in a pattern must be escaped when they are meant literally.
The regular expression metacharacters are:
( [ { \ ^ $ | ) ? * + . ] }
Each of these metacharacters has one or more special uses in regular expressions, so to match one of these characters as it appears in a string, you must escape it.
// Match the first instance of "bat" or "cat", regardless of case
var pattern1 = /[bc]at/i;
// Match the first instance of "[bc]at", regardless of case
var pattern2 = /\[bc\]at/i;
// Match all three-character combinations ending with "at", regardless of case
var pattern3 = /.at/gi;
// Match all instances of ".at", regardless of case
var pattern4 = /\.at/gi;
In the example above, pattern1 matches the first instance of "bat" or "cat", regardless of case. To match "[bc]at" literally, both square brackets must be escaped. In pattern3, the period stands for any single character that can precede "at" to form a match. To match ".at" literally, the period itself must be escaped.
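When the text to be matched literally is only known at run time, the escaping can be done programmatically. The following is a minimal sketch, assuming a hypothetical helper named escapeRegExp (not a built-in function) that places a backslash in front of every metacharacter listed above.

```javascript
// Hypothetical helper: prefix each metacharacter with a backslash.
// In the replacement string, $& stands for the matched character.
function escapeRegExp(text) {
    return text.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

var escaped = escapeRegExp("[bc]at");
console.log(escaped); // \[bc\]at

// The escaped string now matches the literal text "[bc]at" only.
var literalPattern = new RegExp(escaped, "i");
console.log(literalPattern.test("[BC]AT")); // true
console.log(literalPattern.test("bat"));    // false
```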
All of these examples define regular expressions in literal form. The other way to create a regular expression is with the RegExp constructor, which accepts two arguments: a string pattern to match and an optional string of flags.
Any expression that can be defined with a literal can also be defined with the constructor, as in the following example:
// Match the first instance of "bat" or "cat", regardless of case
var pattern1 = /[bc]at/i;
// Same as pattern1, but created with the constructor
var pattern2 = new RegExp("[bc]at", "i");
Here pattern1 and pattern2 are two fully equivalent regular expressions.
Note that both arguments passed to the RegExp constructor are strings (the pattern is supplied as a string rather than as a regular expression literal).
Because the pattern argument of the RegExp constructor is a string, characters sometimes need to be double escaped.
All metacharacters must be double escaped, as must characters that are already escaped, such as \n (the \ character, which is normally written as \\ in a string, becomes \\\\ in a regular expression string).
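One way to check the double escaping is to compare the source property of a literal with that of an expression built from a string. The sketch below does this for a few sample patterns; the array and variable names are assumptions made for illustration.

```javascript
// Each pair holds a literal and the string assumed to build the same pattern.
var escapePairs = [
    [/\[bc\]at/, "\\[bc\\]at"],
    [/\.at/, "\\.at"],
    [/\d.\d{1,2}/, "\\d.\\d{1,2}"],
    [/\w\\hello\\123/, "\\w\\\\hello\\\\123"]
];

var allSourcesEqual = true;
for (var k = 0; k < escapePairs.length; k++) {
    var fromLiteral = escapePairs[k][0];
    var fromString = new RegExp(escapePairs[k][1]);
    if (fromLiteral.source !== fromString.source) {
        allSourcesEqual = false;
    }
}
console.log(allSourcesEqual); // true

// The last pattern matches a word character, a backslash, "hello",
// another backslash, and "123".
var lastFromString = new RegExp(escapePairs[3][1]);
console.log(lastFromString.test("a\\hello\\123")); // true
```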
The following table shows some patterns, with the literal form on the left and, on the right, the string to use when defining the same pattern with the RegExp constructor.
Literal pattern | Equivalent string
/\[bc\]at/ | "\\[bc\\]at"
/\.at/ | "\\.at"
/name\/age/ | "name\\/age"
/\d.\d{1,2}/ | "\\d.\\d{1,2}"
/\w\\hello\\123/ | "\\w\\\\hello\\\\123"
Creating a regular expression with a literal is not exactly the same as creating one with the RegExp constructor. In ECMAScript 3, a regular expression literal always shares the same RegExp instance, whereas each call to the constructor creates a new instance. Consider the following example:
var re = null,
    i;
for (i = 0; i < 10; i++) {
    re = /cat/g;
    re.test("catastrophe");
}
for (i = 0; i < 10; i++) {
    re = new RegExp("cat", "g");
    re.test("catastrophe");
}
In the first loop, only one RegExp instance is actually created for /cat/, even though the literal appears in the loop body. Because instance properties are not reset, every second call to test() in the loop fails. The first call to test() finds "cat", but the second call starts searching at index 3 (the end of the last match) and finds nothing. Because that search reaches the end of the string, the next call to test() starts from the beginning again.
The second loop uses the RegExp constructor to create the regular expression on each iteration. Because every iteration creates a new RegExp instance, each call to test() returns true.
ECMAScript 5 explicitly requires a regular expression literal to behave as if the RegExp constructor were called directly, creating a new RegExp instance each time.
Internet Explorer, Firefox 4+, and Chrome have all been updated accordingly.
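The influence of lastIndex on a reused instance can still be observed in an ECMAScript 5 engine by creating the global expression once, outside the loop. The following sketch does that deliberately and compares it with creating a fresh instance on every iteration; the variable names are illustrative.

```javascript
// One instance, created once and reused on every iteration.
var shared = /cat/g;
var sharedResults = [];
for (var i = 0; i < 4; i++) {
    // lastIndex carries over, so every second call starts at index 3 and fails.
    sharedResults.push(shared.test("catastrophe"));
}
console.log(sharedResults); // [ true, false, true, false ]

// A new instance per iteration always starts with lastIndex equal to 0.
var freshResults = [];
for (var j = 0; j < 4; j++) {
    var fresh = new RegExp("cat", "g");
    freshResults.push(fresh.test("catastrophe"));
}
console.log(freshResults); // [ true, true, true, true ]
```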
5.4.1 RegExp Instance Properties
Each instance of RegExp has the following properties, which enable you to obtain various information about the pattern.
global: A Boolean value indicating whether the g flag has been set.
ignoreCase: A Boolean value indicating whether the i flag has been set.
lastIndex: An integer giving the character position at which the search for the next match will start, counting from 0.
multiline: A Boolean value indicating whether the m flag has been set.
source: The string representation of the regular expression, returned in literal form (without the opening and closing slashes) rather than as the string pattern passed to the constructor.
These properties expose various aspects of a regular expression, but they are of limited use because the same information is already present in the pattern declaration.
var pattern1 = /\[bc\]at/i;
console.log(pattern1.global);     // false
console.log(pattern1.ignoreCase); // true
console.log(pattern1.multiline);  // false
console.log(pattern1.lastIndex);  // 0
console.log(pattern1.source);     // \[bc\]at
var pattern2 = new RegExp("\\[bc\\]at", "i");
console.log(pattern2.global);     // false
console.log(pattern2.ignoreCase); // true
console.log(pattern2.multiline);  // false
console.log(pattern2.lastIndex);  // 0
console.log(pattern2.source);     // \[bc\]at
Note that although the first pattern is a literal and the second is created with the RegExp constructor, their source properties are the same. The source property holds the canonical string form of the pattern, that is, the string used in the literal form.
5.4.2 RegExp Instance Methods
The primary method of a RegExp object is exec(), which is designed specifically for capturing groups.
exec() accepts one argument, the string to which the pattern is applied, and returns an array of information about the first match, or null if there is no match.
The returned array, although an instance of Array, has two additional properties: index and input.
index is the position of the match in the string, and input is the string the regular expression was applied to.
In the array, the first item is the string that matches the entire pattern, and the remaining items are the strings that match the capturing groups in the pattern (if the pattern has no capturing groups, the array contains only one item).
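The shape of the array returned by exec() can be sketched as follows. The sample text and the pattern with two nested capturing groups are assumptions chosen for illustration.

```javascript
// Illustrative text and pattern; both are assumptions for this sketch.
var family = "mom and dad and baby";
var familyPattern = /mom( and dad( and baby)?)?/gi;

var familyMatch = familyPattern.exec(family);
console.log(familyMatch.index);  // 0
console.log(familyMatch.input);  // mom and dad and baby
console.log(familyMatch[0]);     // mom and dad and baby (entire match)
console.log(familyMatch[1]);     //  and dad and baby (first capturing group)
console.log(familyMatch[2]);     //  and baby (second capturing group)
console.log(familyMatch.length); // 3
```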
The exec() method returns only one match at a time, even when the global flag (g) is set on the pattern.
Without the global flag, calling exec() repeatedly on the same string always returns information about the first match.
With the global flag set, each call to exec() continues searching the string for a new match, as shown in the following example:
var text = "cat,bat,sat,fat";
var pattern1 = /.at/;
var matches = pattern1.exec(text);
console.log(matches.index);      // 0
console.log(matches[0]);         // cat
console.log(pattern1.lastIndex); // 0
matches = pattern1.exec(text);
console.log(matches.index);      // 0
console.log(matches[0]);         // cat
console.log(pattern1.lastIndex); // 0
var pattern2 = /.at/g;
matches = pattern2.exec(text);
console.log(matches.index);      // 0
console.log(matches[0]);         // cat
console.log(pattern2.lastIndex); // 3
matches = pattern2.exec(text);
console.log(matches.index);      // 4
console.log(matches[0]);         // bat
console.log(pattern2.lastIndex); // 7
In this example, the first pattern, pattern1, is not global, so each call to exec() returns the first match ("cat"). The second pattern, pattern2, is global, so each call to exec() returns the next match in the string until the end of the string is reached.
Also note how the lastIndex property of the pattern changes. In global mode, lastIndex advances after each call to exec(), while in non-global mode it stays unchanged.
Note: IE's JavaScript implementation has a discrepancy in the lastIndex property: it is updated on every call, even in non-global mode.
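Because a global pattern resumes from lastIndex, exec() can be called in a loop to walk through every match. A minimal sketch, assuming an illustrative comma-separated sample string and variable names:

```javascript
var animals = "cat,bat,sat,fat";
var atPattern = /.at/g;
var found = [];
var match;

// exec() returns null once no further match exists, which ends the loop.
while ((match = atPattern.exec(animals)) !== null) {
    found.push(match[0] + "@" + match.index);
}

console.log(found);               // [ 'cat@0', 'bat@4', 'sat@8', 'fat@12' ]
console.log(atPattern.lastIndex); // 0 (reset after the failed final search)
```

The g flag is essential here: without it, exec() would keep returning the first match and the loop would never end.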
The second method of a regular expression is test(), which accepts a string argument and returns true if the pattern matches the argument and false otherwise. This method is handy when you only want to know whether the target string matches a pattern and do not need the matched text itself.
For this reason, test() is often used in if statements, as in the following example:
var text = "000-00-0000";
var pattern = /\d{3}-\d{2}-\d{4}/;
if (pattern.test(text)) {
    console.log("The pattern was matched.");
}
In this example, a regular expression is used to test a sequence of digits. If the input text matches the pattern, a message is displayed.
This usage is common when validating user input, where we usually only care whether the input is valid, not why it is invalid.
The toLocaleString() and toString() methods inherited by RegExp instances both return the literal form of the regular expression, regardless of how the expression was created.
For example:
var pattern = new RegExp("\\[bc\\]at", "gi");
console.log(pattern.toString());       // /\[bc\]at/gi
console.log(pattern.toLocaleString()); // /\[bc\]at/gi
Even though the pattern in this example was created by calling the RegExp constructor, toLocaleString() and toString() still return its string representation as if it had been created in literal form.
Note: The valueOf() method of a regular expression returns the regular expression itself.
5.4.3 RegExp Constructor Properties
The RegExp constructor itself has several properties, which would be considered static properties in other languages. These properties apply to all regular expressions in scope and change according to the most recent regular expression operation performed.
Another unusual aspect of these properties is that each can be accessed in two ways.
In other words, each property has a long name and a short name (Opera is the exception, as it does not support the short names).
The following table lists the properties of the RegExp constructor.
Long property name | Short property name | Description
input | $_ | The last string matched against. Opera does not implement this property.
lastMatch | $& | The last matched text. Opera does not implement this property.
lastParen | $+ | The last matched capturing group. Opera does not implement this property.
leftContext | $` | The text before lastMatch in the input string.
multiline | $* | A Boolean value indicating whether all expressions use multiline mode. IE and Opera do not implement this property.
rightContext | $' | The text after lastMatch in the input string.
These properties can be used to extract more specific information about the operation performed by exec() or test(), as in the following example:
var text = "this has been a short summer";
var pattern = /(.)hort/g;
// Note: Opera does not support the input, lastMatch, lastParen, and multiline properties.
// Internet Explorer does not support the multiline property.
if (pattern.test(text)) {
    console.log(RegExp.input);        // this has been a short summer
    console.log(RegExp.leftContext);  // this has been a
    console.log(RegExp.rightContext); //  summer
    console.log(RegExp.lastMatch);    // short
    console.log(RegExp.lastParen);    // s
    console.log(RegExp.multiline);    // false (where the property is implemented)
}
The code above creates a pattern that matches any single character followed by hort and places that first character in a capturing group.
The individual RegExp constructor properties return the following values:
The input property returns the original string;
The leftContext property returns the text before the word short, and the rightContext property returns the text after it;
The lastMatch property returns the most recent text that matched the entire regular expression, that is, short;
The lastParen property returns the most recently matched capturing group, which is s in this example.
As mentioned earlier, the long property names used in the example can be replaced with the corresponding short names. Because most of the short names are not valid ECMAScript identifiers, they must be accessed with bracket syntax, as shown below.
var text = "this has been a short summer";
var pattern = /(.)hort/g;
// Note: Opera does not support the input, lastMatch, lastParen, and multiline properties.
// Internet Explorer does not support the multiline property.
if (pattern.test(text)) {
    console.log(RegExp.$_);    // this has been a short summer
    console.log(RegExp["$`"]); // this has been a
    console.log(RegExp["$'"]); //  summer
    console.log(RegExp["$&"]); // short
    console.log(RegExp["$+"]); // s
    console.log(RegExp["$*"]); // false (where the property is implemented)
}
In addition to the properties described above, there are up to nine constructor properties for storing capturing groups. They are accessed as RegExp.$1, RegExp.$2, ... RegExp.$9 and hold the first, second, ... ninth matched capturing groups, respectively. These properties are filled in automatically when exec() or test() is called.
They can then be used as follows:
var text = "this has been a short summer";
var pattern = /(..)or(.)/g;
if (pattern.test(text)) {
    console.log(RegExp.$1); // sh
    console.log(RegExp.$2); // t
}
This creates a pattern with two capturing groups and tests a string against it. Even though test() returns only a Boolean value, the RegExp constructor properties $1 and $2 are automatically populated with the strings matched by the corresponding capturing groups.
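The same captured text is also available on the array returned by exec(), which keeps the result tied to one specific call instead of to shared constructor state. The following sketch shows this alternative; it is a different technique from the constructor properties described above, and the variable names are illustrative.

```javascript
var sentence = "this has been a short summer";
var groupPattern = /(..)or(.)/;

// exec() returns the full match followed by one item per capturing group.
var groups = groupPattern.exec(sentence);
console.log(groups[0]); // short
console.log(groups[1]); // sh
console.log(groups[2]); // t
```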
5.4.4 Pattern Limitations
Although regular expression support in ECMAScript is fairly complete, it still lacks some of the advanced features found in other languages, especially Perl.
The following features are not supported by ECMAScript regular expressions as of ECMAScript 5:
The \A and \Z anchors for matching the start and end of a string (the caret ^ and dollar sign $ are supported for this purpose);
Lookbehind (lookahead is fully supported);
Union and intersection classes;
Atomic grouping;
Unicode support (except for single characters, such as \uFFFF);
Named capturing groups (numbered capturing groups are supported);
The s (single-line) and x (free-spacing) matching modes;
Conditional matching;
Regular expression comments.
Even with these limitations, ECMAScript regular expressions are still very powerful and can handle most pattern-matching tasks.