Detailed explanation of Regular Expressions
Introduction
In short, regular expressions are a powerful tool for pattern matching and replacement. Their main uses are as follows:
Test a string for a pattern. For example, you can test an input string to see whether it contains a phone number or a credit card number. This is called data validation.
Replace text. You can use a regular expression to identify specific text in a document, and then delete it entirely or replace it with other text.
Extract a substring from a string based on a pattern match. This can be used to find specific text in a document or an input field.
Basic syntax
Now that we have a preliminary understanding of what regular expressions are used for, let's look at their syntax.
The regular expression format is generally as follows:
/love/
The part between the "/" delimiters is the pattern to be matched in the target object. You only need to place the pattern you want to match between the "/" delimiters. To let users define patterns more flexibly, regular expressions provide special "metacharacters". Metacharacters are characters that have a special meaning in a regular expression. They can be used to specify how the leading character (that is, the character immediately before the metacharacter) may appear in the target object.
Frequently Used metacharacters include "+", "*", and "?".
The "+" metacharacter specifies that its leading character must appear one or more times consecutively in the target object.
The "*" metacharacter specifies that its leading character must appear zero or more times consecutively in the target object.
The "?" metacharacter specifies that its leading character must appear zero or one time in the target object.
Next, let's take a look at the specific application of the regular expression metacharacters.
/fo+/
Because the regular expression above contains the "+" metacharacter, it matches strings in the target object such as "fool", "fo", or "football", in which the letter f is followed by one or more consecutive letter o's.
/eg*/
Because the regular expression above contains the "*" metacharacter, it matches strings in the target object such as "easy", "ego", or "egg", in which the letter e is followed by zero or more consecutive letter g's.
/Wil?/
Because the regular expression above contains the "?" metacharacter, it matches strings in the target object such as "Win" or "Wilson", in which the letter i is followed by zero or one letter l.
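The three examples above can be checked directly in JavaScript. This is a minimal sketch that assumes the patterns are /fo+/, /eg*/, and /Wil?/; the variable names are illustrative only.

```javascript
const plus = /fo+/;      // "f" followed by one or more "o"
const star = /eg*/;      // "e" followed by zero or more "g"
const optional = /Wil?/; // "Wi" followed by zero or one "l"

console.log(plus.test("football"));       // true
console.log(plus.test("f"));              // false: at least one "o" is required
console.log("egg".match(star)[0]);        // "egg"
console.log("easy".match(star)[0]);       // "e"
console.log("Wilson".match(optional)[0]); // "Wil"
console.log("Win".match(optional)[0]);    // "Wi"
```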
Sometimes you do not know how many characters to match. To handle this uncertainty, regular expressions support quantifiers (also called qualifiers). Quantifiers specify how many times a given component of a regular expression must appear for a match to succeed.
{n} n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,} n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m} m and n are both non-negative integers, where n <= m. Matches at least n times and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the two numbers.
These quantifiers let you specify precisely how often a pattern may occur in the matched object. For example, the regular expression /jim{2,6}/ specifies that the character m may appear 2 to 6 times in a row. It therefore matches strings such as "jimmy" or "jimmmmmy".
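The brace quantifiers can be tried out as follows. This is a sketch rather than a reference: the variable names are made up, and the last case assumes JavaScript's default (non-Unicode) mode, where braces that do not form a valid quantifier are read as literal characters.

```javascript
const exactlyTwo = /o{2}/;
const twoOrMore = /o{2,}/;
const oneToThree = /o{1,3}/;
const jim = /jim{2,6}/;

console.log(exactlyTwo.test("Bob"));          // false: only one "o"
console.log("food".match(exactlyTwo)[0]);     // "oo"
console.log("foooood".match(twoOrMore)[0]);   // "ooooo"
console.log("fooooood".match(oneToThree)[0]); // "ooo"
console.log("jimmmmmy".match(jim)[0]);        // "jimmmmm"

// A space inside the braces stops them from being read as a quantifier,
// so this pattern looks for the literal text "o{1, 3}".
const withSpace = /o{1, 3}/;
console.log(withSpace.test("ooo"));           // false
console.log(withSpace.test("o{1, 3}"));       // true
```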
After a preliminary understanding of how to use regular expressions, let's take a look at the usage of several other important metacharacters.
Code
- \s: matches a single whitespace character, including tab and line break;
- \S: matches any single character that is not whitespace;
- \d: matches a digit from 0 to 9;
- \w: matches a letter, digit, or underscore;
- \W: matches any character that \w does not match;
- .: matches any character except a line break.
(Note: \s and \S, as well as \w and \W, can be regarded as inverses of each other.)
Next, let's take a look at how to use the above metacharacters in regular expressions through examples.
/\s+/
The regular expression above matches one or more whitespace characters in the target object.
/\d000/
If we have a complex financial statement in hand, we can use the regular expression above to easily find every amount written as a digit followed by three zeros, such as RMB amounts in the thousands.
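The character-class metacharacters above can be exercised on a small sample. The sample text and variable names below are invented for illustration.

```javascript
// Hypothetical two-line statement with a tab and a line break in it.
const statement = "Total:\t5000 RMB\nTax: 120 RMB";

const fields = statement.split(/\s+/);       // split on runs of whitespace
const thousands = statement.match(/\d000/g); // a digit followed by "000"
const words = "a_1 b-2".match(/\w+/g);       // runs of letters, digits, underscores
const nonWords = "a_1 b-2".match(/\W/g);     // everything \w does not match
const noLineBreak = "a\nb".match(/./g);      // "." skips the line break

console.log(fields);      // ["Total:", "5000", "RMB", "Tax:", "120", "RMB"]
console.log(thousands);   // ["5000"]
console.log(words);       // ["a_1", "b", "2"]
console.log(nonWords);    // [" ", "-"]
console.log(noLineBreak); // ["a", "b"]
```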
In addition to the metacharacters described above, regular expressions have another special kind of character: the anchor (positioning character). An anchor specifies where the pattern must appear in the target object. Commonly used anchors include "^", "$", "\b", and "\B".
Code
- The "^" anchor specifies that the pattern must appear at the start of the target string.
- The "$" anchor specifies that the pattern must appear at the end of the target string.
- The "\b" anchor specifies that the pattern must appear at the beginning or end of a word in the target string.
- The "\B" anchor specifies that the pattern must lie inside a word in the target string, that is, the match cannot be at the beginning or end of a word.
Similarly, we can regard "^" and "$", as well as "\b" and "\B", as two pairs of opposite operators. For example:
/^hell/
Because the regular expression above contains the "^" anchor, it matches strings in the target object that start with "hell", "hello", or "hellhound".
/ar$/
Because the regular expression above contains the "$" anchor, it matches strings in the target object that end with "car", "bar", or "ar".
/\bbom/
Because the pattern above starts with the "\b" anchor, it matches words in the target object that start with "bomb" or "bom".
/man\b/
Because the pattern above ends with the "\b" anchor, it matches words in the target object that end with "human", "woman", or "man".
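A short sketch of the four anchors in JavaScript, using lowercase versions of the patterns discussed above. The variable names and the counterexamples ("shell", "art", "abomination", "manual") are added here for contrast and are not from the original text.

```javascript
const startsWithHell = /^hell/;
const endsWithAr = /ar$/;
const wordStartsWithBom = /\bbom/;
const wordEndsWithMan = /man\b/;
const manInsideWord = /\Bman/;

console.log(startsWithHell.test("hello world"));    // true
console.log(startsWithHell.test("shell"));          // false: "hell" is not at the start
console.log(endsWithAr.test("car"));                // true
console.log(endsWithAr.test("art"));                // false: "ar" is not at the end
console.log(wordStartsWithBom.test("a bomb"));      // true: "bom" starts a word
console.log(wordStartsWithBom.test("abomination")); // false: "bom" is inside a word
console.log(wordEndsWithMan.test("woman"));         // true
console.log(wordEndsWithMan.test("manual"));        // false: "man" is followed by a letter
console.log(manInsideWord.test("woman"));           // true: "man" does not start the word
console.log(manInsideWord.test("man"));             // false: "man" starts the word
```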
To let users define patterns more flexibly, regular expressions allow a pattern to specify a range of characters rather than one specific character. For example:
Code
- /[A-Z]/ matches any uppercase letter from A to Z.
- /[a-z]/ matches any lowercase letter from a to z.
- /[0-9]/ matches any digit from 0 to 9.
- /([a-z][A-Z][0-9])+/ matches any string made up of a lowercase letter, an uppercase letter, and a digit in that order, repeated one or more times, for example "aB0".
Note that you can use "()" in a regular expression to group parts of a pattern together. Everything inside the "()" must appear together in the target object. Therefore, the regular expression above cannot match a string such as "abc", because the last character in "abc" is a letter rather than a digit.
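The grouping behaviour can be sketched as follows, assuming the group is a lowercase letter, an uppercase letter, and a digit. The "^" and "$" anchors are an addition here so that the whole string, not just part of it, has to consist of repeated groups.

```javascript
// One or more repetitions of: lowercase letter, uppercase letter, digit.
const unit = /^([a-z][A-Z][0-9])+$/;

console.log(unit.test("aB0"));    // true: one complete group
console.log(unit.test("aB0cD1")); // true: two complete groups
console.log(unit.test("abc"));    // false: second character is not uppercase, third is not a digit
console.log(unit.test("aB"));     // false: the group is incomplete
```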
If we want an "or" operation similar to the one in programming logic, choosing one of several alternative patterns, we can use the pipe character "|". For example, the regular expression /to|too|2/ matches "to", "too", or "2" in the target object.
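One detail worth seeing in code is that JavaScript tries the alternatives from left to right and stops at the first one that succeeds. The sketch below compares the pattern from the text with a reordered variant; the reordered pattern and the variable names are illustrative additions.

```javascript
const shortFirst = /to|too|2/; // the pattern from the text
const longFirst = /too|to|2/;  // same alternatives, longest first

console.log("too".match(shortFirst)[0]);  // "to": the first alternative already succeeds
console.log("too".match(longFirst)[0]);   // "too"
console.log("me 2".match(shortFirst)[0]); // "2"
```

Listing the longer alternative first is a common way to make sure it is preferred when one alternative is a prefix of another.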
There is another common operator in regular expressions: the negation "[^]". Unlike the "^" anchor described above, "[^]" specifies that the matched character must not be one of the characters listed in the brackets. For example, the regular expression /[^A-C]/ matches any character in the target object except A, B, and C. In general, when "^" appears at the start of "[]", it is a negation operator; when "^" is outside "[]", it is an anchor.
Finally, when you need a metacharacter itself to appear in the pattern, you can use the escape character "\". For example, the regular expression /Th\*/ matches the literal text "Th*" in the target object, with "*" no longer acting as a quantifier.
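Negation and escaping can both be checked with a few lines of JavaScript. The variable names are illustrative, and the last lines show one assumption-free consequence of escaping: when the pattern is written as a string for the RegExp constructor, the backslash itself has to be doubled.

```javascript
const notAtoC = /[^A-C]/g;  // "^" inside brackets negates the set
const startsAtoC = /^[A-C]/; // "^" outside brackets is an anchor
console.log("ABCDE".match(notAtoC)); // ["D", "E"]
console.log(startsAtoC.test("Cat")); // true

const literalStar = /Th\*/;  // "\*" is a literal asterisk
const quantifiedH = /Th*/;   // "*" applies to "h"
console.log(literalStar.test("Th*")); // true
console.log(literalStar.test("The")); // false
console.log(quantifiedH.test("The")); // true

// In a string literal the backslash must be written twice.
const fromString = new RegExp("Th\\*");
console.log(fromString.test("Th*"));  // true
```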
Once constructed, a regular expression is evaluated much like an arithmetic expression: from left to right and according to an order of precedence. The precedence, from highest to lowest, is as follows:
Code
- 1. \ Escape character
- 2. (), (?:), (?=), [] Parentheses and square brackets
- 3. *, +, ?, {n}, {n,}, {n,m} Quantifiers
- 4. ^, $, \anymetacharacter Anchors and sequences
- 5. | "Or" operation
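Two consequences of this ordering are easy to demonstrate: a quantifier binds more tightly than the sequence of characters before it, and "|" binds most loosely of all. The patterns and names below are illustrative examples, not taken from the original text.

```javascript
// Quantifiers apply to the single preceding item unless parentheses group more.
const lastCharRepeated = /ab+/; // "a" followed by one or more "b"
const groupRepeated = /(ab)+/;  // one or more repetitions of "ab"
console.log("abbb".match(lastCharRepeated)[0]); // "abbb"
console.log("abab".match(lastCharRepeated)[0]); // "ab"
console.log("abab".match(groupRepeated)[0]);    // "abab"

// "|" has the lowest precedence, so the anchors attach to one alternative each.
const looseAnchors = /^ab|cd$/;   // (starts with "ab") or (ends with "cd")
const tightAnchors = /^(ab|cd)$/; // the whole string is "ab" or "cd"
console.log(looseAnchors.test("abxyz")); // true
console.log(tightAnchors.test("abxyz")); // false
console.log(tightAnchors.test("cd"));    // true
```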
Usage example
JavaScript 1.2 includes a powerful RegExp object that can be used to match regular expressions. Its test() method checks whether the target object contains the pattern and returns true or false accordingly.
We can use JavaScript to write the following script to verify the validity of the email address entered by the user.
Code
- <html>
- <head>
- <script language="javascript1.2">
- <!-- start hiding
- function verifyAddress(obj)
- {
- var email = obj.email.value;
- var pattern =
- /^([a-zA-Z0-9_-])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-])+/;
- var flag = pattern.test(email);
- if (flag)
- {
- alert("Your email address is correct!");
- return true;
- }
- else
- {
- alert("Please try again!");
- return false;
- }
- }
- // stop hiding -->
- </script>
- </head>
- <body>
- <form onsubmit="return verifyAddress(this);">
- <input name="email" type="text">
- <input type="submit">
- </form>
- </body>
- </html>
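The same check can be tried outside a browser. The sketch below reuses the pattern from the script above and, for comparison, adds a hypothetical stricter variant that is not part of the original: it allows more than one character after each dot and uses "$" so that trailing characters are also checked. The function and constant names are made up for this example, and neither pattern is a complete email validator.

```javascript
// The pattern from the script above, unchanged.
const ORIGINAL_PATTERN = /^([a-zA-Z0-9_-])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-])+/;

// Hypothetical stricter variant: each dot must be followed by one or more
// allowed characters, and "$" requires the match to reach the end of the input.
const STRICT_PATTERN = /^[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+$/;

function looksLikeEmail(email, pattern = ORIGINAL_PATTERN) {
  return pattern.test(email);
}

console.log(looksLikeEmail("user@example.com"));                 // true
console.log(looksLikeEmail("user@example"));                     // false: no dot after the domain name
console.log(looksLikeEmail("a@b.c!!!"));                         // true: nothing checks the trailing "!!!"
console.log(looksLikeEmail("a@b.c!!!", STRICT_PATTERN));         // false
console.log(looksLikeEmail("user@example.com", STRICT_PATTERN)); // true
```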
Regular Expression object
This object contains a regular expression pattern and flags that indicate how to apply the pattern.
Code
- Syntax 1: re = /pattern/[flags]
- Syntax 2: re = new RegExp("pattern", ["flags"])
Parameters
re
Required. The name of the variable to which the regular expression pattern is assigned.
pattern
Required. The regular expression pattern to use. If syntax 1 is used, delimit the pattern with "/" characters. If syntax 2 is used, enclose the pattern in quotation marks.
flags
Optional. If syntax 2 is used, enclose the flags in quotation marks. Flags can be combined; the available flags are:
Code
- g (global search for all occurrences of the pattern)
- i (ignore case)
- m (multiline search)
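As a rough illustration, the two syntaxes produce equivalent objects, and the flags change what is matched. The sample text and variable names are invented; the flags property read below is available in current JavaScript engines and is used here only to compare the two objects.

```javascript
const literal = /ain/gi;                     // syntax 1
const constructed = new RegExp("ain", "gi"); // syntax 2
console.log(literal.source === constructed.source); // true
console.log(literal.flags === constructed.flags);   // true

const lines = "Rain\nAIN";
const allMatches = lines.match(/ain/gi);   // g: every occurrence, i: any case
const firstLine = lines.match(/^ain/gi);   // "^" only matches at the start of the string
const anyLine = lines.match(/^ain/gim);    // m: "^" also matches after a line break

console.log(allMatches); // ["ain", "AIN"]
console.log(firstLine);  // null
console.log(anyLine);    // ["AIN"]
```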
Example
The following example creates an object (re) that contains a regular expression pattern and its flags, to demonstrate the use of the regular expression object. The resulting regular expression object is then used in the match method:
Code
- function matchDemo()
- {
- var r, re; // declare the variables.
- var s = "The rain in Spain falls mainly in the plain";
- re = new RegExp("ain", "g"); // create the regular expression object.
- r = s.match(re); // search for matches in string s.
- return (r);
- }
Return value: ain,ain,ain,ain
Properties: lastIndex property | source property
Methods: compile method | exec method | test method
Requirements: version 3
See also: RegExp object | regular expression syntax | String object
exec method
Runs a search on a string using a regular expression pattern and returns an array containing the results of the search.
rgexp.exec(str)
Parameters
rgexp
Required. A regular expression object that contains the regular expression pattern and available flags.
str
Required. The String object or string literal to be searched.
Description
If the exec method does not find a match, it returns null. If it finds a match, exec returns an array and updates the properties of the global RegExp object to reflect the result. Element 0 of the array contains the complete match, and elements 1 to n contain any submatches within the match. This is equivalent to the match method without the global flag set.
If the global flag is set for the regular expression, exec starts searching at the position indicated by the value of lastIndex. If the global flag is not set, exec ignores the value of lastIndex and searches from the beginning of the string.
The array returned by the exec method has three properties: input, index, and lastIndex. The input property contains the entire searched string. The index property contains the position of the matched substring within the searched string. The lastIndex property contains the position immediately after the last character of the match.
Example
The following example illustrates the usage of the exec method:
Code
- function regExpTest()
- {
- var ver = Number(ScriptEngineMajorVersion() + "." + ScriptEngineMinorVersion())
- if (ver >= 5.5) { // test the JScript version.
- var src = "The rain in Spain falls mainly in the plain.";
- var re = /\w+/g; // create the regular expression pattern.
- var arr;
- while ((arr = re.exec(src)) != null)
- document.write(arr.index + "-" + arr.lastIndex + arr + "\t");
- }
- else {
- alert("Use a newer version of JScript");
- }
- }
Returned value: 0-3The 4-8rain 9-11in 12-17Spain 18-23falls 24-30mainly 31-33in 34-37the 38-43plain
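The JScript example above depends on ScriptEngineMajorVersion and document.write, which are not available everywhere. A comparable sketch for a current JavaScript engine follows; it reads lastIndex from the regular expression object, which is where standard JavaScript keeps it. The variable names and the second, submatch example are illustrative additions.

```javascript
const src = "The rain in Spain";
const wordRe = /\w+/g; // the g flag makes exec resume from lastIndex
const found = [];
let m;
while ((m = wordRe.exec(src)) !== null) {
  // m.index is where the match starts; wordRe.lastIndex is just past its end.
  found.push(`${m.index}-${wordRe.lastIndex} ${m[0]}`);
}
console.log(found); // ["0-3 The", "4-8 rain", "9-11 in", "12-17 Spain"]

// Element 0 is the complete match; elements 1..n are the submatches.
const parts = /(\w+)@(\w+)/.exec("joe@example");
console.log(parts[0], parts[1], parts[2]); // joe@example joe example

const noMatch = /xyz/.exec(src);
console.log(noMatch); // null
```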
test method
Returns a Boolean value indicating whether the pattern exists in the searched string.
rgexp.test(str)
Parameters
rgexp
Required. A regular expression object that contains the regular expression pattern and available flags.
str
Required. The string to be searched.
Description
The test method checks whether a pattern exists in the string. If it does, true is returned; otherwise, false is returned.
The test method does not modify the properties of the global RegExp object.
Example
The following example illustrates the usage of the test method:
Code
- function testDemo(re, s)
- {
- var s1; // declare the variable.
- // Check whether the string contains the regular expression pattern.
- if (re.test(s)) // test whether the pattern exists.
- s1 = " contains "; // s contains the pattern.
- else
- s1 = " does not contain "; // s does not contain the pattern.
- return ("'" + s + "'" + s1 + "'" + re.source + "'"); // return a string.
- }
Function call: document.write(testDemo(/ain+/, "The rain in Spain falls mainly in the plain."));
Returned value: 'The rain in Spain falls mainly in the plain.' contains 'ain+'
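A compact variant of the same idea is sketched below, with a hypothetical function name. The last lines show a behaviour of current JavaScript engines that is easy to trip over: when the regular expression carries the g flag, test() resumes from the object's own lastIndex, so repeated calls on the same string can give different answers.

```javascript
function describeMatch(re, s) {
  const verdict = re.test(s) ? "contains" : "does not contain";
  return `'${s}' ${verdict} '${re.source}'`;
}

console.log(describeMatch(/ain+/, "The rain in Spain")); // 'The rain in Spain' contains 'ain+'
console.log(describeMatch(/snow/, "The rain in Spain")); // 'The rain in Spain' does not contain 'snow'

const globalRe = /ain/g;
const firstCall = globalRe.test("rain");  // true, lastIndex is now 4
const secondCall = globalRe.test("rain"); // false, the search started at index 4
console.log(firstCall, secondCall, globalRe.lastIndex); // true false 0
```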
match method
Runs a search on a string using a regular expression pattern and returns an array containing the results of the search.
stringObj.match(rgexp)
Parameters
stringObj
Required. The String object or string literal to be searched.
rgexp
Required. A regular expression object that contains the regular expression pattern and available flags. It can also be a variable name or string literal that contains the regular expression pattern and flags.
Description
If the match method does not find a match, it returns null. If it finds a match, it returns an array and updates the properties of the global RegExp object to reflect the result.
The array returned by the match method has three properties: input, index, and lastIndex. The input property contains the entire searched string. The index property contains the position of the matched substring within the searched string. The lastIndex property contains the position immediately after the last character of the last match.
If the global flag (g) is not set, element 0 of the array contains the entire match, and elements 1 to n contain any submatches that occurred within the match. This is equivalent to the exec method without the global flag set. If the global flag is set, elements 0 to n contain all matches.
Example
The following example demonstrates the usage of the match method:
Code
- function matchDemo()
- {
- var r, re; // declare the variables.
- var s = "The rain in Spain falls mainly in the plain";
- re = /ain/i; // create the regular expression pattern.
- r = s.match(re); // try to match the search string.
- return (r); // return the first occurrence of "ain".
- }
Return value: ain
This example shows how to use the match method with the g flag set.
Code
- function matchDemo()
- {
- var r, re; // declare the variables.
- var s = "The rain in Spain falls mainly in the plain";
- re = /ain/ig; // create the regular expression pattern.
- r = s.match(re); // try to match the search string.
- return (r); // the returned array contains all four
- // occurrences of "ain".
- }
Return value: ain,ain,ain,ain
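The difference between the two modes can be seen side by side in the following sketch. The capture group in the first pattern and the variable names are additions made here to show a submatch.

```javascript
const s = "The rain in Spain falls mainly in the plain";

// Without g: the first match, its submatches, and its position.
const first = s.match(/(r)ain/i);
console.log(first[0], first[1], first.index); // rain r 4

// With g: every complete match, with no submatches or position.
const all = s.match(/ain/gi);
console.log(all); // ["ain", "ain", "ain", "ain"]

// No match at all gives null rather than an empty array.
const none = s.match(/snow/);
console.log(none); // null
```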
The following lines of code show a string literal being used as the pattern, here with the replace method.
Code
- var r, re = "Spain";
- r = "The rain in Spain".replace(re, "Canada");
- return r;
Return value: The rain in Canada
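A string pattern and a regular expression pattern behave differently when the text occurs more than once. The following sketch, with illustrative variable names, shows that a string pattern replaces only the first occurrence, while a regular expression with the g flag replaces them all.

```javascript
const phrase = "The rain in Spain";

const withString = phrase.replace("Spain", "Canada"); // string pattern
const firstOnly = phrase.replace("ain", "AIN");       // only the first "ain"
const everyOne = phrase.replace(/ain/g, "AIN");       // every "ain"

console.log(withString); // "The rain in Canada"
console.log(firstOnly);  // "The rAIN in Spain"
console.log(everyOne);   // "The rAIN in SpAIN"
```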
search method
Returns the position of the first substring that matches the regular expression.
stringObj.search(rgexp)
Parameters
stringObj
Required. The String object or string literal to be searched.
rgexp
Required. A regular expression object that contains the regular expression pattern and available flags.
Description
The search method indicates whether a match exists. If a match is found, the search method returns an integer giving the offset of the match from the beginning of the string. If no match is found, -1 is returned.
Example
The following example demonstrates how to use the search method.
Code
- function searchDemo()
- {
- var r, re; // declare the variables.
- var s = "The rain in Spain falls mainly in the plain.";
- re = /falls/i; // create the regular expression pattern.
- r = s.search(re); // search the string.
- return (r); // return the position of the match.
- }
Returned value: 18
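Both outcomes of search can be seen in this short sketch; the second pattern and the variable names are added for illustration.

```javascript
const sentence = "The rain in Spain falls mainly in the plain.";

const foundAt = sentence.search(/falls/i); // offset of the first match
const missing = sentence.search(/snow/);   // -1 when nothing matches

console.log(foundAt); // 18
console.log(missing); // -1
```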