Microsoft | Regular Expression Backreferences
One of the most important features of regular expressions is the ability to store part of a matched pattern for later reuse. Recall that placing parentheses around a regular expression pattern, or part of a pattern, causes that part of the expression to be stored in a temporary buffer. You can suppress the saving of that part of the regular expression by using the non-capturing metacharacters '?:', '?=', or '?!'.
Each captured submatch is stored as it is encountered, from left to right, in the regular expression pattern. The buffers that store the submatches are numbered starting at 1 and continue consecutively up to a maximum of 99 subexpressions. Each buffer can be accessed using '\n', where n is a one- or two-digit decimal number that identifies a particular buffer.
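The numbering rules above can be sketched in modern JavaScript as follows. This is a minimal illustration rather than part of the original reference: the sample strings and variable names are assumptions, and it reads submatches from the array returned by exec() instead of from a buffer.

```javascript
// Capturing groups are numbered left to right, starting at 1.
const dateMatch = /(\d{4})-(\d{2})-(\d{2})/.exec("Released 1999-12-31.");
// Index 0 holds the whole match; indexes 1..3 hold the three submatches.
const groupValues = dateMatch.slice(1); // ["1999", "12", "31"]

// (?:...) groups without capturing, so the next group is still number 1.
const nonCapturing = /(?:ab)+(c)/.exec("ababc");
// nonCapturing[1] === "c", and there is no nonCapturing[2]

// Inside the pattern, \1 refers back to whatever group 1 matched.
const doubledLetter = /([a-z])\1/.exec("balloon");
// doubledLetter[0] === "ll"
```

The non-capturing form is useful when parentheses are needed only for grouping or repetition and the matched text does not need to be saved.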
One of the simplest and most useful applications of backreferences is locating two identical words that occur consecutively in text. Take a look at the following sentence:
Is is the cost of of gasoline going up up?
As written, the sentence above clearly has a problem: several words are duplicated. It would be nice to have a way to fix the sentence without having to look for duplicates of each individual word. The following JScript regular expression uses a single subexpression to do this.
/\b([a-z]+) \1\b/gi
The equivalent VBScript expression is:
"\b([a-z]+) \1\b"
In this example, the subexpression is everything between the parentheses. The captured expression consists of one or more alphabetic characters, as specified by '[a-z]+'. The second part of the regular expression is a reference to the previously captured submatch, that is, the second occurrence of the word just matched by the parenthesized expression. '\1' specifies the first submatch. The word-boundary metacharacters ('\b') ensure that only separate words are detected. Without them, phrases such as "is issued" or "this is" would be incorrectly identified by the expression.
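The effect of the word boundaries can be checked with a short JavaScript sketch. The helper name findDoubledWords and the sample phrases are assumptions made for illustration; the two patterns differ only in the presence of '\b'.

```javascript
// The pattern from the text, and a variant with the word boundaries removed.
const withBoundaries = /\b([a-z]+) \1\b/gi;
const withoutBoundaries = /([a-z]+) \1/gi;

// Hypothetical helper: return every doubled-word match, or an empty array.
function findDoubledWords(pattern, text) {
  return text.match(pattern) || [];
}

// A genuine duplicate is found by both patterns.
const realDuplicate = findDoubledWords(withBoundaries, "the cost of of gasoline");

// Without \b, "is" inside "This" and at the start of "issued" is misread
// as a repeated word; with \b, neither phrase matches.
const falseThisIs = findDoubledWords(withoutBoundaries, "This is fine");
const falseIsIssued = findDoubledWords(withoutBoundaries, "It is issued");
const strictThisIs = findDoubledWords(withBoundaries, "This is fine");
const strictIsIssued = findDoubledWords(withBoundaries, "It is issued");
```

In both false positives the matched text is "is is", which spans parts of two different words.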
In the JScript expression, the global flag ('g') following the regular expression indicates that the expression is applied to as many matches as it can find in the input string. Case insensitivity is specified by the case-insensitivity flag ('i') at the end of the expression. The multiline flag ('m') allows potential matches on either side of a newline character: with it, '^' and '$' match at line breaks as well as at the ends of the string. For VBScript, the flags cannot be set in the expression; they must be set explicitly using properties of the RegExp object.
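To see what each flag contributes, the following JavaScript sketch applies the same doubled-word pattern with different flag combinations. The helper collapseDoubles and the sample sentence are assumptions for illustration. The 'm' flag is left out because this pattern contains neither '^' nor '$', so it would not change the result.

```javascript
// Sample sentence with three doubled words (illustrative input).
const sentence = "Is is the cost of of gasoline going up up?";

// Hypothetical helper: build the doubled-word pattern with the given flags
// and collapse each match to a single copy of the word.
function collapseDoubles(text, flags) {
  const pattern = new RegExp("\\b([a-z]+) \\1\\b", flags);
  return text.replace(pattern, "$1");
}

// No flags: case-sensitive, and only the first match ("of of") is replaced.
const noFlags = collapseDoubles(sentence, "");
// "Is is the cost of gasoline going up up?"

// 'g' only: every lowercase duplicate is replaced, but "Is is" is missed.
const globalOnly = collapseDoubles(sentence, "g");
// "Is is the cost of gasoline going up?"

// 'i' only: "Is is" now matches, but it is the only replacement made.
const ignoreCaseOnly = collapseDoubles(sentence, "i");
// "Is the cost of of gasoline going up up?"

// 'g' and 'i' together: all three duplicates are collapsed.
const globalIgnoreCase = collapseDoubles(sentence, "gi");
// "Is the cost of gasoline going up?"
```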
Using the regular expression shown above, the following JScript code uses the submatch information to replace each occurrence of two consecutive identical words in a string with a single occurrence of the same word:
var ss = "Is is the cost of of gasoline going up up?.\n";
var re = /\b([a-z]+) \1\b/gim;    // Create the regular expression pattern.
var rv = ss.replace(re, "$1");    // Replace two occurrences with one.
The closest equivalent VBScript code is as follows:
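Dim ss, re, rv
ss = "Is is the cost of of gasoline going up up?." & vbNewLine
Set re = New RegExp
re.Pattern = "\b([a-z]+) \1\b"    ' Same pattern as the JScript literal.
re.Global = True                  ' Equivalent of the g flag.
re.IgnoreCase = True              ' Equivalent of the i flag.
re.MultiLine = True               ' Equivalent of the m flag.
rv = re.Replace(ss, "$1")         ' Replace two occurrences with one.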
Note that in the VBScript code, the global, case-insensitivity, and multiline flags are set using the appropriately named properties of the RegExp object.
The '$1' in the replace method refers to the first saved submatch. If the pattern had more than one submatch, you would refer to the others consecutively with '$2', '$3', and so on.
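A pattern with several subexpressions shows how the numbered replacement references line up with the groups. The following JavaScript sketch uses made-up sample strings and variable names. It relies only on the standard replacement references of String.prototype.replace.

```javascript
// Two capturing groups: $1 is the surname, $2 is the given name.
const reordered = "Doe, Jane".replace(/(\w+), (\w+)/, "$2 $1");
// reordered === "Jane Doe"

// Three groups, rearranged from year-month-day to day/month/year.
const dayFirst = "1999-12-31".replace(/(\d{4})-(\d{2})-(\d{2})/, "$3/$2/$1");
// dayFirst === "31/12/1999"
```

Because each reference is resolved by group number, the replacement string can reorder or drop submatches freely.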
Another use of backreferences is to break a Uniform Resource Identifier (URI) into its component parts. Suppose you want to break the following URI down into the protocol (ftp, http, and so on), the domain address, and the page/path:
http://msdn.microsoft.com:80/scripting/default.htm
The following regular expressions provide this functionality. For JScript:
/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/
For VBScript:
"(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)"
The first parenthesized subexpression captures the protocol part of the web address. It matches any word that precedes a colon and two forward slashes. The second parenthesized subexpression captures the domain address. It matches any sequence of one or more characters other than '/' or ':' (the leading '^' inside the brackets negates the set). The third parenthesized subexpression captures the port number, if one is specified. It matches a colon followed by zero or more digits. Finally, the fourth parenthesized subexpression captures the path and/or page information specified by the web address. It matches zero or more characters other than '#' or a space.
When the regular expression is applied to the URI shown above, the submatches contain the following:
RegExp.$1 contains "http"
RegExp.$2 contains "msdn.microsoft.com"
RegExp.$3 contains ":80"
RegExp.$4 contains "/scripting/default.htm"
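In current JavaScript the same submatches are usually read from the array returned by exec(), because the static RegExp.$1 through RegExp.$9 properties are deprecated. The following is a minimal sketch under that assumption. The function name splitUri and the second, port-less URI are hypothetical and not part of the original example.

```javascript
// The URI pattern from the text, unchanged.
const uriPattern = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;

// Hypothetical helper: return the four submatches as named fields,
// or null when the text does not look like a URI.
function splitUri(uri) {
  const match = uriPattern.exec(uri);
  if (match === null) {
    return null;
  }
  // match[0] is the whole match; match[1]..match[4] are the submatches.
  const [, protocol, host, port, path] = match;
  return { protocol, host, port, path };
}

const msdn = splitUri("http://msdn.microsoft.com:80/scripting/default.htm");
// msdn.protocol === "http", msdn.host === "msdn.microsoft.com",
// msdn.port === ":80", msdn.path === "/scripting/default.htm"

// When the optional port group does not take part in the match,
// its submatch is undefined rather than an empty string.
const noPort = splitUri("ftp://example.com/pub");
// noPort.port === undefined, noPort.path === "/pub"
```

Note that the third submatch keeps the leading colon, exactly as in the list above; strip it separately if only the number is needed.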