Regular Expression
If you have never used regular expressions, you may not be familiar with the term or the concept. However, they are not as novel as you might think.
Think about how you search for files on your hard disk. You most likely use the ? and * characters to help find the files you are looking for. The ? character matches a single character in a file name, while * matches zero or more characters. A pattern such as 'data?.dat' finds the following files:
data1.dat
data2.dat
datax.dat
dataN.dat
Using the * character instead of the ? character expands the number of files found. 'data*.dat' matches all of the following file names:
data.dat
data1.dat
data2.dat
data12.dat
datax.dat
dataXYZ.dat
Although this method of searching for files is certainly useful, it is also very limited. The limited ability of the ? and * wildcard characters gives you an idea of what regular expressions can do, but regular expressions are far more powerful and flexible.
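To make the comparison concrete, the following is a minimal sketch of how the two wildcards could be expressed as regular expressions in JavaScript (used here as a stand-in for JScript). The helper name wildcardToRegExp and the case-insensitive "i" flag are assumptions made for this illustration; they are not part of the original text.

```javascript
// Hypothetical helper: turn a simple file wildcard into an anchored RegExp.
function wildcardToRegExp(wildcard) {
  // Escape regex metacharacters other than ? and *, so "." means a literal dot.
  const escaped = wildcard.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  // ? becomes "any single character"; * becomes "zero or more characters".
  const pattern = escaped.replace(/\?/g, ".").replace(/\*/g, ".*");
  // Assumption: file names are compared without regard to case.
  return new RegExp("^" + pattern + "$", "i");
}

const single = wildcardToRegExp("data?.dat");
const many = wildcardToRegExp("data*.dat");

console.log(single.test("data1.dat"));   // true
console.log(single.test("data12.dat"));  // false
console.log(many.test("data.dat"));      // true
console.log(many.test("dataXYZ.dat"));   // true
```

The wildcard syntax maps onto only two regular expression constructs, which is why it is so much more limited.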
Early Origins
The "ancestors" of regular expressions can be traced back to early research on how the human nervous system works. Warren McCulloch and Walter Pitts, two neuroscientists, developed a mathematical way of describing these neural networks.
In 1956, a mathematician named Stephen Kleene, building on the earlier work of McCulloch and Pitts, published a paper titled "Representation of Events in Nerve Nets and Finite Automata" that introduced the concept of regular expressions. Regular expressions were expressions used to describe what he called "the algebra of regular sets", hence the term "regular expression".
Subsequently, his work found its way into some early efforts on computational search algorithms by Ken Thompson, the principal inventor of UNIX. The first practical application of regular expressions was the QED editor on UNIX.
The rest, as they say, is history. Regular expressions have been an important part of text-based editors and search tools ever since.
Uses for Regular Expressions
In a typical search-and-replace operation, you must provide the exact text you are looking for. That technique may be adequate for simple search-and-replace tasks in static text, but it lacks flexibility and makes searching dynamic text difficult, if not impossible.
With regular expressions, you can:
Test for a pattern within a string. For example, you can test an input string to see whether a telephone number pattern or a credit card number pattern occurs in it. This is called data validation.
Replace text. You can use a regular expression to identify specific text in a document and either remove it completely or replace it with other text.
Extract a substring from a string based on a pattern match. You can find specific text within a document or an input field.
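The three tasks listed above might look like this in JavaScript (used here as a stand-in for JScript). The sample string and the NNN-NNNN phone format are assumptions made for illustration; this is a sketch, not a general-purpose phone number validator.

```javascript
// Hypothetical sample input, for illustration only.
const input = "Call 555-1234 or 555-9876 before noon.";

// 1. Test for a pattern: is there anything shaped like NNN-NNNN?
const hasPhone = /\d{3}-\d{4}/.test(input);

// 2. Replace text: mask every match (the "g" flag means "all occurrences").
const masked = input.replace(/\d{3}-\d{4}/g, "###-####");

// 3. Extract substrings: collect every match into an array.
const numbers = input.match(/\d{3}-\d{4}/g);

console.log(hasPhone); // true
console.log(masked);   // Call ###-#### or ###-#### before noon.
console.log(numbers);  // [ '555-1234', '555-9876' ]
```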
For example, if you need to search an entire web site to remove outdated material and replace some HTML formatting tags, you can use a regular expression to test each file and see whether the material or the HTML formatting tags occur in it. That way, you can narrow the affected files down to those that contain the material to be removed or changed. You can then use a regular expression to remove the outdated material, and finally use regular expressions to search for and replace the tags that need replacing.
Regular expressions are also useful in a language that is not known for its string-handling ability. VBScript, a subset of Visual Basic, has a rich set of string-handling functions. JScript, like C, does not. Regular expressions significantly improve string handling in JScript. However, regular expressions may also be more efficient to use in VBScript, because they let you perform multiple string manipulations in a single expression.
A regular expression is a pattern of text that consists of ordinary characters (for example, the letters a through z) and special characters, known as metacharacters. The pattern describes one or more strings to match when searching a body of text. The regular expression serves as a template for matching a character pattern to the string being searched.
Here are some examples of regular expressions you might encounter:
JScript    VBScript    Matches
/^[ \t]*$/    "^[ \t]*$"    Matches a blank line.
/\d{2}-\d{5}/    "\d{2}-\d{5}"    Validates an ID number consisting of two digits, a hyphen, and another five digits.
/<(.*)>.*<\/\1>/    "<(.*)>.*<\/\1>"    Matches an HTML tag.
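As a quick check of the three sample patterns, here is a minimal JavaScript sketch (JavaScript standing in for JScript). The blank-line pattern is written with a character class, [ \t], meaning "space or tab". The test strings are made up for illustration.

```javascript
const blankLine = /^[ \t]*$/;       // only spaces and tabs, or nothing at all
const idNumber = /\d{2}-\d{5}/;     // two digits, a hyphen, five digits
const htmlTag = /<(.*)>.*<\/\1>/;   // \1 must repeat the captured tag name

const blankOk = blankLine.test("  \t ");
const blankBad = blankLine.test("  x ");
const idOk = idNumber.test("12-34567");
const idBad = idNumber.test("1-34567");
const tagMatch = htmlTag.exec("<b>bold</b>");
const tagMismatch = htmlTag.test("<b>bold</i>");

console.log(blankOk, blankBad);   // true false
console.log(idOk, idBad);         // true false
console.log(tagMatch[1]);         // b
console.log(tagMismatch);         // false
```

Note that the ID pattern is not anchored, so it also succeeds when the digits appear inside a longer string; adding ^ and $ would restrict it to the whole input.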
The following table shows a complete list of metacharacters and their behaviors in the context of a regular expression:
Character    Description
\    Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n". '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
^    Matches the position at the beginning of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position following '\n' or '\r'.
$    Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position preceding '\n' or '\r'.
*    Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+    Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?    Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches the "do" in "do" or "does". ? is equivalent to {0,1}.
{n}    n is a nonnegative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}    n is a nonnegative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}    m and n are nonnegative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that you cannot put a space between the comma and the numbers.
?    When this character immediately follows any of the other quantifiers (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much of the searched string as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.    Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]'.
(pattern)    Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0…$9 properties in JScript. To match parentheses characters, use '\(' or '\)'.
(?:pattern)    Matches pattern but does not capture the match; that is, it is a non-capturing match that is not stored for possible later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.
(?=pattern)    Positive lookahead: matches the search string at any point where a string matching pattern begins. This is a non-capturing match; that is, the match is not captured for possible later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". Lookaheads do not consume characters; that is, after a match occurs, the search for the next match begins immediately following the last match, not after the characters that made up the lookahead.
(?!pattern)    Negative lookahead: matches the search string at any point where a string not matching pattern begins. This is a non-capturing match; that is, the match is not captured for possible later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1" but not "Windows" in "Windows 2000". Lookaheads do not consume characters; that is, after a match occurs, the search for the next match begins immediately following the last match, not after the characters that made up the lookahead.
x|y    Matches either x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]    A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz]    A negative character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in "plain".
[a-z]    A range of characters. Matches any character in the specified range. For example, '[a-z]' matches any lowercase alphabetic character in the range 'a' through 'z'.
[^a-z]    A negative range of characters. Matches any character not in the specified range. For example, '[^a-z]' matches any character not in the range 'a' through 'z'.
\b    Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B    Matches a nonword boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
\cx    Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z. If not, c is assumed to be a literal 'c' character.
\d    Matches a digit character. Equivalent to [0-9].
\D    Matches a nondigit character. Equivalent to [^0-9].
\f    Matches a form-feed character. Equivalent to \x0c and \cL.
\n    Matches a newline character. Equivalent to \x0a and \cJ.
\r    Matches a carriage return character. Equivalent to \x0d and \cM.
\s    Matches any white space character, including space, tab, form-feed, and so on. Equivalent to [ \f\n\r\t\v].
\S    Matches any non-white space character. Equivalent to [^ \f\n\r\t\v].
\t    Matches a tab character. Equivalent to \x09 and \cI.
\v    Matches a vertical tab character. Equivalent to \x0b and \cK.
\w    Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.
\W    Matches any nonword character. Equivalent to '[^A-Za-z0-9_]'.
\xn    Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' & "1". This allows ASCII codes to be used in regular expressions.
\num    Matches num, where num is a positive integer. A reference back to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n    Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, n is an octal escape value if n is an octal digit (0-7).
\nm    Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither of the preceding conditions holds, \nm matches the octal escape value nm when n and m are octal digits (0-7).
\nml    Matches the octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).
\un    Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
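Several rows of the table are easier to follow when run. The following JavaScript sketch (JavaScript standing in for JScript) exercises the table's own examples for greedy and non-greedy quantifiers, lookahead, word boundaries, backreferences, and non-capturing groups. The variable names are chosen for this illustration only.

```javascript
// Greedy versus non-greedy: same input, different amount matched.
const greedy = "oooo".match(/o+/)[0];   // as many o's as possible
const lazy = "oooo".match(/o+?/)[0];    // as few o's as possible

// Lookahead checks what follows without making it part of the match.
const positive = /Windows (?=95|98|NT|2000)/;
const negative = /Windows (?!95|98|NT|2000)/;
const matchedText = "Windows 2000".match(positive)[0];

// Word boundary (\b) versus nonword boundary (\B).
const endsWord = /er\b/;
const insideWord = /er\B/;

// Backreference: \1 repeats whatever group 1 captured.
const doubled = /(.)\1/.exec("food");

// Non-capturing group: grouping without storing a submatch.
const industry = /industr(?:y|ies)/.exec("industries");

console.log(greedy, lazy);                   // oooo o
console.log(positive.test("Windows 3.1"));   // false
console.log(negative.test("Windows 3.1"));   // true
console.log(JSON.stringify(matchedText));    // "Windows "
console.log(endsWord.test("never"), endsWord.test("verb")); // true false
console.log(insideWord.test("verb"));        // true
console.log(doubled[0], doubled[1]);         // oo o
console.log(industry[0], industry.length);   // industries 1
```

The matched text for the lookahead example stops after "Windows " because the version number is only inspected, not consumed.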
Create a regular expression
Regular expressions are constructed in the same way that arithmetic expressions are created. That is, small expressions are combined by using a variety of metacharacters and operators to create larger expressions.
You construct a regular expression by putting the various components of the expression pattern between a pair of delimiters. For JScript, the delimiters are a pair of forward slash (/) characters. For example:
/expression/
For VBScript, a pair of quotation marks ("") delimits the regular expression. For example:
"expression"
In both of the examples shown above, the regular expression pattern (expression) is stored in the Pattern property of the RegExp object.
The components of a regular expression can be individual characters, sets of characters, ranges of characters, choices between characters, or any combination of these components.
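As an illustration of the two ways of supplying a pattern, the sketch below uses modern JavaScript, where a pattern can be written as a /.../ literal or passed as a string to the RegExp constructor (the string form resembles the quoted VBScript style). In JavaScript the stored pattern text is exposed through the source property; treating it as the counterpart of the Pattern property mentioned above is an assumption made for this example.

```javascript
// Literal form: the pattern sits between forward slashes.
const literal = /\d{2}-\d{5}/;

// String form: backslashes must be doubled inside a JavaScript string.
const constructed = new RegExp("\\d{2}-\\d{5}");

// Both objects hold the same pattern text.
const samePattern = literal.source === constructed.source;

console.log(literal.source);  // \d{2}-\d{5}
console.log(samePattern);     // true
console.log(constructed.test("12-34567")); // true
```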
Order of Precedence
Once you have constructed a regular expression, it is evaluated much like an arithmetic expression; that is, it is evaluated from left to right and follows an order of precedence.
The following table lists the regular expression operators in order of precedence, from highest to lowest:
Operator(s)    Description
\    Escape
(), (?:), (?=), []    Parentheses and brackets
*, +, ?, {n}, {n,}, {n,m}    Quantifiers
^, $, \anymetacharacter    Anchors and sequences
|    Alternation
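The effect of this ordering shows up most clearly with quantifiers and alternation. The following JavaScript sketch (JavaScript standing in for JScript) uses made-up test strings to show how parentheses change what a quantifier or an alternation applies to.

```javascript
// A quantifier binds more tightly than a sequence: * applies only to "b".
const bRepeated = /^ab*$/;
// Parentheses group first, so * now applies to the whole "ab".
const abRepeated = /^(ab)*$/;

// Alternation binds most loosely: this means (^a) or (b$).
const loose = /^a|b$/;
// Parentheses restrict the alternation to a single character.
const grouped = /^(a|b)$/;

console.log(bRepeated.test("abbb"), bRepeated.test("abab"));   // true false
console.log(abRepeated.test("abab"), abRepeated.test("abbb")); // true false
console.log(loose.test("apple"), loose.test("crab"));          // true true
console.log(grouped.test("apple"), grouped.test("a"));         // false true
```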
Ordinary characters
Ordinary characters consist of all the printing and non-printing characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase alphabetic characters, all digits, all punctuation marks, and some symbols.
The simplest form of a regular expression is a single, ordinary character that matches itself in a searched string. For example, the single-character pattern 'A' matches the letter 'A' wherever it appears in the searched string. Here are some examples of single-character regular expression patterns:
/a/
/7/
/M/
The equivalent VBScript single-character regular expressions are:
"a"
"7"
"M"
You can combine a number of single characters to form a larger expression. For example, the following JScript regular expression is nothing more than an expression created by combining the single-character expressions 'a', '7', and 'M'.
/a7M/
The equivalent VBScript expression is:
"a7M"
Note that there is no concatenation operator. All that is required is that you put one character after another.
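A short JavaScript sketch (JavaScript standing in for JScript) of the combined pattern, written here as /a7M/. The test strings are invented for illustration.

```javascript
// Three ordinary characters placed side by side form one pattern.
const pattern = /a7M/;

const found = pattern.test("xa7My");        // the sequence occurs inside the string
const wrongCase = pattern.test("a7m");      // ordinary characters are case-sensitive by default
const separated = pattern.test("a 7 M");    // the characters must be adjacent
const position = "xa7My".search(pattern);   // index where the match starts

console.log(found, wrongCase, separated); // true false false
console.log(position);                    // 1
```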
Special characters
A number of metacharacters require special treatment when you try to match them. To match these special characters, you must first escape them, that is, precede them with a backslash character (\). The following table lists the special characters and their meanings:
Special character    Comment
$    Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position preceding '\n' or '\r'. To match the $ character itself, use \$.
( )    Marks the beginning and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \).
*    Matches the preceding subexpression zero or more times. To match the * character, use \*.
+    Matches the preceding subexpression one or more times. To match the + character, use \+.
.    Matches any single character except the newline character \n. To match ., use \.
[    Marks the beginning of a bracket expression. To match [, use \[.
?    Matches the preceding subexpression zero or one time, or indicates a non-greedy quantifier. To match the ? character, use \?.
\    Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character 'n'. '\n' matches a newline character. The sequence '\\' matches "\", while '\(' matches "(".
^    Matches the position at the beginning of the input string, except when used in a bracket expression, where it negates the character set. To match the ^ character itself, use \^.
{    Marks the beginning of a quantifier expression. To match {, use \{.
|    Indicates a choice between two items. To match |, use \|.
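The sketch below, in JavaScript (standing in for JScript), shows what happens when a special character is left unescaped, followed by a small helper that escapes every special character in a piece of literal text. The helper name escapeRegExp is hypothetical and is defined here for illustration; it is not a built-in function.

```javascript
// Escaped: \$ and \. are matched literally.
const escapedPrice = /\$5\.00/.test("Price: $5.00");

// Unescaped: $ means "end of input", so nothing can follow it and the match fails.
const unescapedPrice = /$5.00/.test("Price: $5.00");

// Hypothetical helper: put a backslash in front of every special character.
function escapeRegExp(text) {
  return text.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

const literalText = "(1+1)*2";
const safePattern = new RegExp(escapeRegExp(literalText));

console.log(escapedPrice, unescapedPrice);  // true false
console.log(escapeRegExp(literalText));     // \(1\+1\)\*2
console.log(safePattern.test("(1+1)*2=4")); // true
```

A helper of this kind is useful when the text to search for comes from user input rather than from a hand-written pattern.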
Non-printable characters
A number of useful non-printing characters must occasionally be used. The following table lists the escape sequences used to represent them:
Character    Meaning
\cx    Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z. If not, c is assumed to be a literal 'c' character.
\f    Matches a form-feed character. Equivalent to \x0c and \cL.
\n    Matches a newline character. Equivalent to \x0a and \cJ.
\r    Matches a carriage return character. Equivalent to \x0d and \cM.
\s    Matches any white space character, including space, tab, form-feed, and so on. Equivalent to [ \f\n\r\t\v].
\S    Matches any non-white space character. Equivalent to [^ \f\n\r\t\v].
\t    Matches a tab character. Equivalent to \x09 and \cI.
\v    Matches a vertical tab character. Equivalent to \x0b and \cK.
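The equivalences in the table can be checked directly. This is a small JavaScript sketch (JavaScript standing in for JScript); the sample strings are invented for illustration.

```javascript
// Three spellings of a carriage return: \r, \x0d, and \cM.
const crForms = [/\r/, /\x0d/, /\cM/].every((re) => re.test("\r"));

// Three spellings of a newline: \n, \x0a, and \cJ.
const lfForms = [/\n/, /\x0a/, /\cJ/].every((re) => re.test("\n"));

// Three spellings of a tab: \t, \x09, and \cI.
const tabForms = [/\t/, /\x09/, /\cI/].every((re) => re.test("\t"));

// \s covers the white space characters; \S is its complement.
const formFeedIsSpace = /\s/.test("\f");
const spaceIsNonSpace = /\S/.test(" ");

// Splitting on \s treats tab and newline alike.
const parts = "a\tb\nc".split(/\s/);

console.log(crForms, lfForms, tabForms);       // true true true
console.log(formFeedIsSpace, spaceIsNonSpace); // true false
console.log(parts);                            // [ 'a', 'b', 'c' ]
```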
Character matching
The period (.) matches any single printing or non-printing character in a string, except a newline character (\n). The following JScript regular expression matches 'aac', 'abc', 'acc', 'adc', and so on, as well as 'a1c', 'a2c', 'a-c', and 'a#c':
/a.c/
The equivalent VBScript regular expression is: