Reference: Tom's blog, http://blog.csdn.net/yan11cn/article/details/5004279
The RegExp object provides simple regular expression support in VBScript. All properties and methods related to regular expressions in VBScript are associated with this object.
Dim re
Set re = New RegExp
This object has three properties and three methods, as shown in Table 9-1.
Table 9-1
Properties | Methods
Global property | Execute method
IgnoreCase property | Replace method
Pattern property | Test method
The following sections describe these properties and methods in depth. They also describe the regular expression characters that you will use in patterns.
1. Global property
The Global property sets or returns a Boolean value that indicates whether the pattern should match every occurrence in the entire string or only the first occurrence (see Table 9-2).
Table 9-2
Syntax | object.Global [= value]
object | A RegExp object
value | True or False. If Global is True, the entire string is searched and every match is found; otherwise only the first match is found. The default is False (not True, as some Microsoft documentation states).
The following example uses the Global property to make sure that every standalone "in" is changed.
Dim re, s
Set re = New RegExp
re.Pattern = "\bin"
re.Global = True
s = "The rain in Spain falls mainly on the plains."
MsgBox re.Replace(s, "in the country")
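To make the effect of Global easier to see, here is a minimal sketch that runs the same replacement twice, once with each setting. The pattern "ain" and the replacement "AIN" are illustrative choices, not taken from the original example.

```vbscript
Dim re, s
Set re = New RegExp
re.Pattern = "ain"                ' illustrative pattern (an assumption)
s = "The rain in Spain falls mainly on the plains."

re.Global = False                 ' the default: stop after the first match
MsgBox re.Replace(s, "AIN")       ' The rAIN in Spain falls mainly on the plains.

re.Global = True                  ' replace every match in the string
MsgBox re.Replace(s, "AIN")       ' The rAIN in SpAIN falls mAINly on the plAINs.
```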
2. IgnoreCase property
The IgnoreCase property sets or returns a Boolean value that indicates whether the pattern match is case-sensitive (see Table 9-3).
Table 9-3
Syntax | object.IgnoreCase [= value]
object | A RegExp object
value | True or False. If IgnoreCase is False, the search is case-sensitive; if it is True, the search is not case-sensitive. The default is False (not True, as some Microsoft documentation states).
Let's continue with the example used for the Global property. If the string to be searched may contain "In" with a capital letter, we must tell VBScript to ignore case when matching.
Dim re, s
Set re = New RegExp
re.Pattern = "\bin"
re.Global = True
re.IgnoreCase = True
s = "The rain in Spain falls mainly on the plains."
MsgBox re.Replace(s, "in the country")
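The sample string above contains no capitalized "In", so the setting makes no visible difference there. As a small sketch, the following uses a hypothetical string that does start with "In", plus a stricter pattern "\bin\b" and a marker replacement "<in>", all chosen only for illustration.

```vbscript
Dim re, s
Set re = New RegExp
re.Pattern = "\bin\b"             ' "in" as a whole word (illustrative pattern)
re.Global = True
s = "In Spain, rain falls in spring."   ' hypothetical sample string

re.IgnoreCase = False             ' case-sensitive: "In" is not matched
MsgBox re.Replace(s, "<in>")      ' In Spain, rain falls <in> spring.

re.IgnoreCase = True              ' case-insensitive: "In" is matched too
MsgBox re.Replace(s, "<in>")      ' <in> Spain, rain falls <in> spring.
```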
3. Pattern property
The Pattern property sets or returns the regular expression used for the search (see Table 9-4).
All of the preceding examples use Pattern.
Dim re, s
Set re = New RegExp
re.Pattern = "\bin"
re.Global = True
s = "The rain in Spain falls mainly on the plains."
MsgBox re.Replace(s, "in the country")
Table 9-4
Syntax | object.Pattern [= "searchstring"]
object | A RegExp object
searchstring | Optional. The regular expression string to search for. It may contain regular expression special characters.
4. Regular expression characters
The power of regular expressions comes not from using plain strings as patterns, but from using special characters in the pattern. Table 9-5 lists these characters and what they do.
An uppercase special character has the opposite meaning of its lowercase counterpart.
Table 9-5
Character | Description
\ | Marks the next character as either a special character or a literal.
^ | Matches the beginning of the input.
$ | Matches the end of the input.
* | Matches the preceding character zero or more times.
+ | Matches the preceding character one or more times.
? | Matches the preceding character zero or one time.
. | Matches any single character except a newline.
(pattern) | Matches pattern and remembers the match. The matched substring can be retrieved from the resulting Matches collection, using items [0]...[n]. To match the parentheses themselves, precede them with a backslash: "\(" or "\)".
(?:pattern) | Matches pattern but does not capture the match, that is, the match is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, "anomal(?:y|ies)" is more economical than "anomaly|anomalies".
(?=pattern) | Positive lookahead: matches at any point where the text that follows matches pattern. This is a non-capturing match, that is, the match is not stored for later use. For example, "Windows (?=95|98|NT|2000|XP|Vista)" matches "Windows" in "Windows Vista" but not "Windows" in "Windows 3.1".
(?!pattern) | Negative lookahead, the opposite of the previous entry: matches at any point where the text that follows does not match pattern. This is a non-capturing match, that is, the match is not stored for later use. For example, "Windows (?!95|98|NT|2000|XP|Vista)" matches "Windows" in "Windows 3.1" but not "Windows" in "Windows Vista".
x|y | Matches either x or y.
{n} | Matches exactly n times (n must be a non-negative integer).
{n,} | Matches at least n times (n must be a non-negative integer; note the trailing comma).
{n,m} | Matches at least n and at most m times (m and n must both be non-negative integers).
[xyz] | Matches any one of the enclosed characters (xyz is a character set).
[^xyz] | Matches any character that is not enclosed (^xyz is the complement of the character set).
[a-z] | Matches any character in the specified range (a-z is a range of characters).
[^m-z] | Matches any character outside the specified range (^m-z is the complement of the range).
\b | Matches a word boundary, that is, the position between a word and a space.
\B | Matches a non-word boundary.
\d | Matches a digit. Equivalent to [0-9].
\D | Matches a non-digit. Equivalent to [^0-9].
\f | Matches a form feed (page break).
\n | Matches a newline.
\r | Matches a carriage return.
\s | Matches any whitespace character, including space, tab, and form feed. Equivalent to "[ \f\n\r\t\v]".
\S | Matches any non-whitespace character. Equivalent to "[^ \f\n\r\t\v]".
\t | Matches a tab.
\v | Matches a vertical tab.
\w | Matches any letter, digit, or underscore. Equivalent to "[A-Za-z0-9_]".
\W | Matches any character that is not a letter, digit, or underscore. Equivalent to "[^A-Za-z0-9_]".
\. | Matches .
\| | Matches |
\{ | Matches {
\} | Matches }
\\ | Matches \
\[ | Matches [
\] | Matches ]
\( | Matches (
\) | Matches )
$num | Matches num, where num is a positive integer. A reference back to a remembered match.
\n | Matches n, where n is an octal escape value. Octal escape values must be 1, 2, or 3 digits long.
\uxxxx | Matches the character expressed in Unicode form as xxxx (four hexadecimal digits).
\xn | Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long.
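The two lookahead entries in the table are the easiest to misread, so here is a minimal sketch that checks the table's own "Windows" examples with the Test method. It assumes a VBScript version whose RegExp object supports lookahead (version 5.5 or later).

```vbscript
Dim re
Set re = New RegExp

' Positive lookahead: "Windows " must be followed by one of the listed names
re.Pattern = "Windows (?=95|98|NT|2000|XP|Vista)"
MsgBox re.Test("Windows Vista")   ' True
MsgBox re.Test("Windows 3.1")     ' False

' Negative lookahead: "Windows " must NOT be followed by one of the listed names
re.Pattern = "Windows (?!95|98|NT|2000|XP|Vista)"
MsgBox re.Test("Windows Vista")   ' False
MsgBox re.Test("Windows 3.1")     ' True
```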
Many of these codes need no further explanation, but some are easier to understand with an example.
Matching a class of characters
You have already seen a simple pattern:
re.Pattern = "in"
Often, though, you need to match a whole class of characters. You do this by placing the characters to be matched in square brackets. For example, the following code replaces a single digit with a more general term.
Dim re, s
Set re = New RegExp
re.Pattern = "[23456789]"
s = "Spain received 3 millimeters of rain last week."
MsgBox re.Replace(s, "success")
The output of this code is shown in Figure 9-11.
Figure 9-11
In this example, the digit "3" is replaced with the text "success". As you might expect, you can shorten this pattern by specifying a range. The following pattern does the same job as the previous one.
Dim re, s
Set re = New RegExp
re.Pattern = "[2-9]"
s = "Spain received 3 millimeters of rain last week."
MsgBox re.Replace(s, "success")
Replacing digits and non-digits
You often need to replace digits. In fact, the pattern [0-9] (which covers all digits) is used so often that it has an equivalent shortcut: \d.
Dim re, s
Set re = New RegExp
re.Pattern = "\d"
s = "a b c d e f 1 g 2 h ... 10 z"
MsgBox re.Replace(s, "a number")
The string after replacement is shown in Figure 9-12.
Figure 9-12
What if you want to match non-digit characters? Use the ^ symbol inside the square brackets.
Using ^ outside square brackets has a completely different meaning, which is discussed later.
Any of the following patterns can then be used to match non-digit characters:
re.Pattern = "[^0-9]" ' the hard way
re.Pattern = "[^\d]" ' a little shorter
re.Pattern = "[\D]" ' another of those special characters
The last pattern uses another special character. In most cases these special characters only save typing (or serve as a handy mnemonic), but in some cases, such as matching tabs and other non-printable characters, they are genuinely useful.
Anchoring and shortening patterns
Three special characters are used to anchor a pattern. They do not match any characters themselves, but they require the rest of the pattern to appear at the beginning of the input (^ used outside of []), at the end of the input ($), or at a word boundary (\b, which you have already seen).
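As a minimal sketch of the three anchors, the following applies each one to the same string. The sample string and the uppercase replacement texts are assumptions chosen only to make the matched positions visible.

```vbscript
Dim re, s
Set re = New RegExp
re.Global = True
s = "in the rain in Spain"        ' hypothetical sample string

re.Pattern = "^in"                ' "in" only at the beginning of the input
MsgBox re.Replace(s, "IN")        ' IN the rain in Spain

re.Pattern = "ain$"               ' "ain" only at the end of the input
MsgBox re.Replace(s, "AIN")       ' in the rain in SpAIN

re.Pattern = "\bin\b"             ' "in" only as a whole word
MsgBox re.Replace(s, "IN")        ' IN the rain IN Spain
```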
Another way to shorten a pattern is to use a repeat count. The basic idea is to specify the number of repetitions after the part of the pattern to be repeated. For example, the following pattern matches a multi-digit number and replaces it; the result is shown in Figure 9-13.
Dim re, s
Set re = New RegExp
re.Pattern = "\d{3}"
s = "Spain received 100 millimeters of rain in the last 2 weeks."
MsgBox re.Replace(s, "a whopping number of")
Figure 9-13
If the repeat count is not used, as shown in Figure 9-14, "00" is left behind in the resulting string.
Figure 9-14
Dim re, s
Set re = New RegExp
re.Pattern = "\d"
s = "Spain received 100 millimeters of rain in the last 2 weeks."
MsgBox re.Replace(s, "a whopping number of")
Note that setting re.Global = True does not fix this, because the result would then contain "a whopping number of" four times, once for each digit. The result is shown in Figure 9-15.
Figure 9-15
Dim re, s
Set re = New RegExp
re.Global = True
re.Pattern = "\d"
s = "Spain received 100 millimeters of rain in the last 2 weeks."
MsgBox re.Replace(s, "a whopping number of")
Specifying a range or a minimum number of matches
You can specify a minimum number of matches with {min,} or a range with {min,max}. Some frequently used repeat counts also have special shortcuts.
re.Pattern = "\d+" ' one or more digits, same as \d{1,}
re.Pattern = "\d*" ' zero or more digits, same as \d{0,}
re.Pattern = "\d?" ' optional: zero or one digit, same as \d{0,1}
Dim re, s
Set re = New RegExp
re.Global = True
re.Pattern = "\d+"
s = "Spain received 100 millimeters of rain in the last 2 weeks."
MsgBox re.Replace(s, "a number")
The output of this code is shown in Figure 9-16. Note that the whole string "100" is replaced at once.
Figure 9-16
Dim re, s
Set re = New RegExp
re.Global = True
re.Pattern = "\d*"
s = "Spain received 100 millimeters of rain in the last 2 weeks."
MsgBox re.Replace(s, "a number")
The output of the above code is shown in Figure 9-17. Because \d* can also match zero digits, the replacement string is inserted between every two non-digit characters, and the numbers themselves are replaced.
Figure 9-17
Dim re, s
Set re = New RegExp
re.Global = True
re.Pattern = "\d?"
s = "Spain received 100 millimeters of rain in the last 2 weeks."
MsgBox re.Replace(s, "a number")
The output of the above code is shown in Figure 9-18. "a number" is inserted between every two non-digit characters, and each digit is replaced.
Figure 9-18
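To see why \d* and \d? put the replacement text between non-digit characters, it can help to list what Execute actually finds. The following is a minimal sketch, not part of the original example; the short sample string is an assumption chosen to keep the list small. Both quantifiers allow a match of zero digits, so the engine can report empty matches (Length 0) at positions where no digit is present, and Replace substitutes the replacement text at each of them.

```vbscript
Dim re, s, colMatches, match, report
Set re = New RegExp
re.Global = True
re.Pattern = "\d*"
s = "a1b"                         ' hypothetical sample string

Set colMatches = re.Execute(s)
report = colMatches.Count & " match(es):" & vbCrLf
For Each match In colMatches
    ' FirstIndex is zero-based; Length is 0 for an empty match
    report = report & "index " & match.FirstIndex & ", length " & match.Length & vbCrLf
Next
MsgBox report
```

Changing the pattern to "\d+" in this sketch leaves only the non-empty matches, which is why that version replaces the numbers without inserting anything elsewhere.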
Remembering matches
The last special characters to discuss are those that remember matches. They are useful when you want to use part or all of the matched text in the replacement text. See the Replace method for an example that uses remembered matches.
To demonstrate this, and to bring together everything discussed so far about special characters, let's do something practical: search a string and find the URLs in it. To limit the complexity and size of the example, we look only for the "http:" protocol, but we handle all kinds of DNS domain names, with any number of domain levels. Don't worry if you don't know how DNS works; it is enough to know what a URL typed into a browser looks like.
The next section, which covers the other methods of the RegExp object, explains this code in more detail. For now, you only need to know that Execute performs the pattern match and returns each match through a collection. Here is the code:
Dim re, s, colMatches, match
Set re = New RegExp
re.Global = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "http://www.kingsley-hughes.com is a valid web address. And so is"
s = s & vbCrLf & "http://www.wrox.com. And"
s = s & vbCrLf & "http://www.pc.ibm.com - even with 4 levels."
Set colMatches = re.Execute(s)
For Each match In colMatches
    MsgBox "Found valid URL: " & match.Value
Next
As you would expect, the real work is done by the line that sets the pattern. It looks a little daunting, but it is easy to understand. Let's break it down:
1. The pattern starts with the fixed string http://. The main part of the pattern is then enclosed in parentheses. The parenthesized part of the pattern below matches one level of a DNS name, including the trailing dot:
re.Pattern = "http://(\w[\w-]*\w\.)*\w+"
This group starts with the special character \w, which you have seen before. It matches [A-Za-z0-9_], that is, all English letters and digits, plus the underscore.
2. Next, square brackets are used to match a letter, digit, or hyphen, because DNS names may contain hyphens. Why not use the same pattern as before? Simple: a valid DNS name cannot start or end with a hyphen. The * then repeats this character class zero or more times (the [\w-]* part).
re.Pattern = "http://(\w[\w-]*\w\.)*\w+"
3. Then a strict letter-or-digit match (\w) ensures that the domain name does not end with a hyphen. The last element inside the parentheses matches the dot (.) that separates the levels of the DNS hierarchy.
The dot cannot be used on its own, because it is a special character that normally matches any character except a newline. Use a backslash to escape it.
4. With all of this wrapped in parentheses, you only need to apply * to repeat the group. The parenthesized group in the pattern below therefore matches any valid domain-name level followed by its dot. In other words, it matches every level of the whole DNS name.
re.Pattern = "http://(\w[\w-]*\w\.)*\w+"
5. The pattern ends with the one or more characters required for the top-level domain (such as com, org, or edu), matched by the final \w+.
re.Pattern = "http://(\w[\w-]*\w\.)*\w+"
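The rules in the breakdown above can be checked with a short sketch. It adds ^ and $ around the pattern (an assumption made here, not part of the original) so that the entire string must be a URL, and the two invalid addresses are hypothetical examples.

```vbscript
Dim re
Set re = New RegExp
' ^ and $ are added for this check so that the whole string must match
re.Pattern = "^http://(\w[\w-]*\w\.)*\w+$"

MsgBox re.Test("http://www.wrox.com")             ' True
MsgBox re.Test("http://www.kingsley-hughes.com")  ' True: hyphen inside a level
MsgBox re.Test("http://-wrox.com")                ' False: a level starts with a hyphen
MsgBox re.Test("http://www.wrox-.com")            ' False: a level ends with a hyphen
```

Without the anchors, Test would report True for the last address as well, because the pattern would simply match the leading part "http://www.wrox".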
5. Execute method
This method applies the regular expression to a string and returns a Matches collection. It is the trigger in your code that actually runs the pattern match against the string. For more information, see Table 9-6.
Table 9-6
Syntax | object.Execute(string)
object | Always a RegExp object
string | Required. The text string to be searched.
The regular expression search uses the Pattern property of the RegExp object.
Dim re, s, colMatches, match
Set re = New RegExp
re.Global = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "http://www.kingsley-hughes.com is a valid web address. And so is"
s = s & vbCrLf & "http://www.wrox.com. And"
s = s & vbCrLf & "http://www.pc.ibm.com - even with 4 levels."
Set colMatches = re.Execute(s)
For Each match In colMatches
    MsgBox "Found valid URL: " & match.Value
Next
Note that some other languages handle regular expression results differently: they treat the result of executing a pattern as a Boolean value that indicates whether the pattern was found. In VBScript, Execute returns a collection instead. Because of this difference, you will often find that regular expression code converted from other languages does not work in VBScript.
Some Microsoft documentation contained this error too, although most of it has since been corrected.
Remember that the result of Execute is always a collection, even if it is an empty one. You can test for an empty result with If re.Execute(s).Count = 0, or use the Test method, which is designed for exactly this purpose.
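The check described above might be written as follows. This is a sketch rather than code from the original; it reuses the rainfall sentence from earlier and reads the FirstIndex, Length, and Value properties of each Match object (FirstIndex is a zero-based offset).

```vbscript
Dim re, s, colMatches, match
Set re = New RegExp
re.Global = True
re.Pattern = "\d+"
s = "Spain received 100 millimeters of rain in the last 2 weeks."

Set colMatches = re.Execute(s)
If colMatches.Count = 0 Then
    ' Execute never returns a Boolean; an empty collection means "not found"
    MsgBox "No match found."
Else
    For Each match In colMatches
        MsgBox "'" & match.Value & "' at index " & match.FirstIndex & _
               ", length " & match.Length
    Next
End If
```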
6. Replace method
This method replaces the text found by a regular expression search. For more information, see Table 9-7.
Table 9-7
Syntax | object.Replace(string1, string2)
object | Always a RegExp object
string1 | Required. The text string in which the replacement is made.
string2 | Required. The replacement text string.
The Replace method returns a copy of string1 in which the text matched by RegExp.Pattern has been replaced with string2. If no match is found in the string, a copy of string1 is returned unchanged.
Dim re, s
Set re = New RegExp
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "http://www.kingsley-hughes.com is a valid web address. And so is"
s = s & vbCrLf & "http://www.wrox.com. And"
s = s & vbCrLf & "http://www.pc.ibm.com - even with 4 levels."
MsgBox re.Replace(s, "** top secret! **")
The output of the above code is shown in Figure 9-19.
Figure 9-19
The Replace method can also substitute subexpressions from the pattern. To do this, use the special characters $1, $2, and so on in the replacement text. These "parameters" refer to the remembered matches.
7. Backreferencing
A remembered match is the text matched by one part of the pattern; referring to it later is called backreferencing. You use parentheses to mark the part that is to be stored in a temporary buffer. Each captured match is stored in the order in which it is encountered (from left to right in the regular expression pattern). The buffers that store the matches are numbered starting at 1, up to a maximum of 99. You access them in order with $1, $2, and so on.
You can use the non-capturing metacharacters ("?:", "?=", or "?!") to skip parts of the regular expression.
In the following example, the first five words (each consisting of one or more non-whitespace characters) are remembered, and only four of them appear in the replacement text:
Dim re, s
Set re = New RegExp
re.Pattern = "(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)"
s = "VBScript is not very cool."
MsgBox re.Replace(s, "$1 $2 $4 $5")
The output of this code is shown in Figure 9-20.
Figure 9-20
Note that this code adds one (\S+)\s+ pair for each word in the string. This gives the code better control over the string being processed and prevents the tail of the string from being tacked onto the displayed result. When using backreferencing, make sure the output is what you expect!
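A smaller sketch of the same idea: two remembered words are swapped by naming them in reverse order in the replacement text. The sample string is hypothetical.

```vbscript
Dim re, s
Set re = New RegExp
re.Pattern = "(\S+)\s+(\S+)"      ' $1 = first word, $2 = second word
s = "hello world"                 ' hypothetical sample string
MsgBox re.Replace(s, "$2 $1")     ' displays: world hello
```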
8. Test method
The Test method runs a regular expression search against a string and returns a Boolean value that indicates whether a match was found. See Table 9-8.
Table 9-8
Syntax | object.Test(string)
object | A RegExp object
string | Required. The text string on which the regular expression search is performed.
If a match is found, the Test method returns True; otherwise it returns False. This is useful for determining whether a string contains a certain pattern. Note that you often need to make the pattern case-insensitive, as in the following example:
Dim re, s
Set re = New RegExp
re.IgnoreCase = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "some long string with http://www.wrox.com buried in it."
If re.Test(s) Then
    MsgBox "Found a URL."
Else
    MsgBox "No URL found."
End If
The output of this code is shown in Figure 9-21.