Regular Expression Tutorial: Alternation, Grouping, and Backreferences

Source: Internet
Author: User
Alternation and Grouping

Alternation uses the '|' character to let a pattern choose between two or more alternatives. By extending the chapter-heading regular expression, you can make it match more than just chapter headings. However, this is not as straightforward as you might think. When you use alternation, the largest possible expression on each side of the '|' character is matched. You might think that the following JScript and VBScript expressions match either 'Chapter' or 'Section' at the beginning of a line, followed by one or two digits at the end of the line:

/^Chapter|Section [1-9][0-9]{0,1}$/

"^Chapter|Section [1-9][0-9]{0,1}$"

Unfortunately, the regular expressions shown above match either the word 'Chapter' at the beginning of a line, or the word 'Section' followed by one or two digits at the end of a line. If the input string is 'Chapter 22', the expression matches only the word 'Chapter'. If the input string is 'Section 22', the expression matches 'Section 22'. That is not the intent here, so there must be a way to make the regular expression do what you want, and there is.
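The behavior described above can be checked with a short script. This is a minimal sketch in modern JavaScript rather than classic JScript; the variable names are chosen only for illustration.

```javascript
// Without parentheses, '|' splits the whole pattern into two alternatives:
//   ^Chapter        or        Section [1-9][0-9]{0,1}$
var re = /^Chapter|Section [1-9][0-9]{0,1}$/;

var chapterMatch = "Chapter 22".match(re);
var sectionMatch = "Section 22".match(re);

console.log(chapterMatch[0]); // "Chapter"    (the digits are not part of the match)
console.log(sectionMatch[0]); // "Section 22"
```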

You can use parentheses to limit the scope of the alternation, so that it applies only to the two words 'Chapter' and 'Section'. However, parentheses are tricky as well, because they are also used to create subexpressions, some of which is covered later in the discussion of subexpressions. By taking the regular expression shown above and adding parentheses in the appropriate place, you can make the regular expression match either 'Chapter 1' or 'Section 3'.

The following regular expression uses parentheses to group 'Chapter' and 'Section' so that the expression works correctly. For JScript:

/^(Chapter|Section) [1-9][0-9]{0,1}$/

For VBScript:

"^(Chapter|Section) [1-9][0-9]{0,1}$"

These expressions work correctly, but they produce an interesting by-product. Placing parentheses around 'Chapter|Section' establishes the proper grouping, but it also causes whichever of the two words matched to be captured for later use. Because the expression shown above contains only one set of parentheses, there can be only one captured submatch. You can refer to this submatch using the Submatches collection in VBScript or the $1-$9 properties of the RegExp object in JScript.
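As a rough illustration of the captured submatch, the sketch below reads it from the array returned by match() instead of the RegExp.$1 property; that choice is an assumption made for portability to current JavaScript engines, not something the tutorial prescribes.

```javascript
// The parentheses group the alternation and also capture the word that matched.
var re = /^(Chapter|Section) [1-9][0-9]{0,1}$/;

var m1 = "Chapter 22".match(re);
var m2 = "Section 3".match(re);

console.log(m1[0]); // "Chapter 22"  (the whole match)
console.log(m1[1]); // "Chapter"     (the first and only submatch)
console.log(m2[1]); // "Section"
```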

Sometimes capturing a submatch is desirable, and sometimes it is not. In the example shown here, all you really want is to use the parentheses to group the choice between the words 'Chapter' and 'Section'; you do not need to refer to the match later. In fact, do not capture submatches unless you really need them. A regular expression that avoids capturing is more efficient because it does not spend time and memory storing those submatches.

You can place '?:' inside the opening parenthesis, before the pattern, to prevent the match from being stored for later use. The following modification of the regular expression shown above provides the same functionality without saving the submatch. For JScript:

/^(?:Chapter|Section) [1-9][0-9]{0,1}$/

For VBScript:

"^(?:Chapter|Section) [1-9][0-9]{0,1}$"
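One way to see the difference between the capturing and non-capturing forms is to compare the arrays that match() returns. The following is only a sketch in modern JavaScript; the variable names are illustrative.

```javascript
var capturing = /^(Chapter|Section) [1-9][0-9]{0,1}$/;
var nonCapturing = /^(?:Chapter|Section) [1-9][0-9]{0,1}$/;

var a = "Section 3".match(capturing);
var b = "Section 3".match(nonCapturing);

// The result array holds the whole match plus one entry per capturing group.
console.log(a.length); // 2
console.log(b.length); // 1
console.log(b[0]);     // "Section 3"  (both patterns match the same text)
```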

In addition to the '?:' metacharacter, two other non-capturing metacharacters are used for what are called lookahead matches. A positive lookahead, specified using '?=', matches the search string at any point where a string matching the regular expression pattern in the parentheses begins. A negative lookahead, specified using '?!', matches the search string at any point where a string not matching the pattern begins.

For example, suppose you have a document that contains references to Windows 3.1, Windows 95, Windows 98, and Windows NT. Suppose further that you need to update the document by finding all references to Windows 95, Windows 98, and Windows NT and changing them to Windows 2000. You can use the following JScript regular expression, which uses a positive lookahead, to match the 'Windows' in Windows 95, Windows 98, and Windows NT:

/Windows (?=95 |98 |NT )/

To do the same thing in VBScript, you can use the following expression:

"Windows (?=95 |98 |NT )"

When a match is found, the search for the next match begins immediately after the matched text, not after the characters that make up the lookahead. For example, if the expression shown above matches the 'Windows' in 'Windows 98', the search continues right after that matched text, before '98', rather than after '98'.
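The sketch below illustrates that a lookahead does not consume the characters it inspects. It makes two assumptions that are not in the original: the sample text is invented, and the space between 'Windows' and the version is written inside the pattern so that strings such as 'Windows 95 ' actually match.

```javascript
// Hypothetical sample text; each version is followed by a space, as the pattern requires.
var text = "Windows 3.1 then Windows 95 then Windows 98 then Windows NT then";

var re = /Windows (?=95 |98 |NT )/g;
var followers = [];
var m;
while ((m = re.exec(text)) !== null) {
  // m[0] is "Windows "; the lookahead text is not consumed,
  // so lastIndex points at the version that follows.
  followers.push(text.slice(re.lastIndex, re.lastIndex + 2));
}
console.log(followers); // [ '95', '98', 'NT' ]

// A negative lookahead finds the 'Windows ' that is NOT followed by those versions.
var neg = /Windows (?!95 |98 |NT )/.exec(text);
console.log(text.slice(neg.index, neg.index + 11)); // "Windows 3.1"
```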

Backreferences

One of the most important features of regular expressions is the ability to store part of a matched pattern for later reuse. Recall that placing parentheses around a regular expression pattern, or part of a pattern, causes that part of the expression to be stored in a temporary buffer. You can override the saving of that part of the regular expression by using the non-capturing metacharacters '?:', '?=', or '?!'.

Each captured submatch is stored in the order in which it is encountered, from left to right, in the regular expression pattern. The buffers that store submatches are numbered starting at 1 and continuing up to a maximum of 99 subexpressions. Each buffer can be accessed using '\n', where n is one or two decimal digits identifying a specific buffer.

One of the simplest and most useful applications of backreferences is locating two identical words that occur next to each other in text. Take the following sentence:
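To make the left-to-right numbering concrete, here is a small sketch with two captured submatches. The pattern and test strings are invented for illustration: it matches four-character words that read the same forwards and backwards.

```javascript
// Group 1 captures the first character, group 2 the second.
// \2 must then repeat group 2, and \1 must repeat group 1.
var mirrored = /\b(\w)(\w)\2\1\b/;

console.log(mirrored.test("abba")); // true
console.log(mirrored.test("noon")); // true
console.log(mirrored.test("abab")); // false
```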

Is is the cost of of gasoline going up up?

As written, the sentence above clearly has a problem with several words repeated. It would be nice to have a way to fix the sentence without looking for duplicates of every single word. The following JScript regular expression uses a single subexpression to do that.

/\b([a-z]+) \1\b/gi

The equivalent VBScript expression is:

"\b([a-z]+) \1\b"

In this example, the subexpression is everything between the parentheses. The captured expression consists of one or more alphabetic characters, as specified by '[a-z]+'. The second part of the regular expression is a reference to the previously captured submatch, that is, the second occurrence of the word just matched by the parenthesized expression. '\1' specifies the first submatch. The word boundary metacharacters ensure that only whole words are detected. Without them, phrases such as "is issued" or "This is" would be incorrectly identified by the expression.
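The role of the word boundaries can be checked by comparing the pattern with and without '\b'. This is a sketch only; the variable names are illustrative.

```javascript
var withBoundaries = /\b([a-z]+) \1\b/i;
var withoutBoundaries = /([a-z]+) \1/i;

// Without \b, the "is" inside "This" pairs up with the following "is".
console.log(withoutBoundaries.test("This is"));   // true  (false positive)
console.log(withBoundaries.test("This is"));      // false

// Without \b, "is" also pairs up with the start of "issued".
console.log(withoutBoundaries.test("is issued")); // true  (false positive)
console.log(withBoundaries.test("is issued"));    // false

// A genuine doubled word is still found.
console.log(withBoundaries.test("of of"));        // true
```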

In the JScript expression, the global flag ('g') following the regular expression indicates that the expression is applied to as many matches as it can find in the input string. Case insensitivity is specified by the ignore-case flag ('i') at the end of the expression. The multiline flag ('m') specifies that potential matches may occur on either side of a newline character. In VBScript, the flags cannot be set in the expression itself; they must be set explicitly using properties of the RegExp object.

Using the regular expression shown above, the following JScript code uses the submatch information to replace each occurrence of two consecutive identical words in a string with a single occurrence of that word:
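The effect of the 'g' and 'i' flags can be seen by running the same replacement with different flag combinations. The following is a sketch in modern JavaScript, using the tutorial's sample sentence without its trailing period and newline.

```javascript
var s = "Is is the cost of of gasoline going up up?";

// No 'g': only the first doubled word is replaced.
var once = s.replace(/\b([a-z]+) \1\b/i, "$1");
console.log(once); // "Is the cost of of gasoline going up up?"

// 'g' and 'i': every doubled word is replaced, ignoring case.
var all = s.replace(/\b([a-z]+) \1\b/gi, "$1");
console.log(all); // "Is the cost of gasoline going up?"

// 'g' without 'i': [a-z] cannot match the capital "I", so "Is is" is left alone.
var caseSensitive = s.replace(/\b([a-z]+) \1\b/g, "$1");
console.log(caseSensitive); // "Is is the cost of gasoline going up?"
```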

var ss = "Is is the cost of of gasoline going up up?.\n";
var re = /\b([a-z]+) \1\b/gim;    // Create the regular expression pattern.
var rv = ss.replace(re, "$1");    // Replace each doubled word with a single word.

The closest equivalent VBScript code is as follows:

Dim ss, re, rv
ss = "Is is the cost of of gasoline going up up?." & vbNewLine
Set re = New RegExp
re.Pattern = "\b([a-z]+) \1\b"
re.Global = True
re.IgnoreCase = True
re.MultiLine = True
rv = re.Replace(ss, "$1")

Note that in the VBScript code, the global, case-insensitivity, and multiline flags are set using the appropriately named properties of the RegExp object.

Use $1 in the replace method to refer to the first saved submatch. If you had more than one submatch, you would refer to them consecutively using $2, $3, and so on.
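As a brief illustration of referring to more than one submatch in the replacement string, the sketch below swaps two captured words. The input string and variable names are invented for the example.

```javascript
// Group 1 captures the family name, group 2 the given name.
var fullName = "Doe, John";
var swapped = fullName.replace(/(\w+), (\w+)/, "$2 $1");

console.log(swapped); // "John Doe"
```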

Another use of backreferences is to break a Uniform Resource Identifier (URI) into its component parts. Suppose you want to break the following URI into its protocol (ftp, http, and so on), domain address, and page/path:

http://msdn.microsoft.com:80/scripting/default.htm

The following regular expressions provide that functionality. For JScript:

/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/

For VBScript:

"(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)"

The first parenthesized subexpression captures the protocol part of the web address. It matches any word that precedes a colon and two forward slashes. The second parenthesized subexpression captures the domain address. It matches one or more characters other than '/' or ':' (the leading '^' negates the character class). The third parenthesized subexpression captures the port number, if one is specified. It matches a colon followed by zero or more digits, and the whole group is optional. Finally, the fourth parenthesized subexpression captures the path and/or page information specified by the web address. It matches zero or more characters other than '#' or a space.

When the regular expression is applied to the URI shown above, the submatches contain the following:

RegExp.$1 contains "http"

RegExp.$2 contains "msdn.microsoft.com"

RegExp.$3 contains ":80"

RegExp.$4 contains "/scripting/default.htm"
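The same decomposition can be sketched in modern JavaScript by reading the submatches from the array returned by match(); using the array instead of the RegExp.$1 to RegExp.$4 properties is an assumption made for this example.

```javascript
var uri = "http://msdn.microsoft.com:80/scripting/default.htm";
var re = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;

var parts = uri.match(re);

console.log(parts[1]); // "http"
console.log(parts[2]); // "msdn.microsoft.com"
console.log(parts[3]); // ":80"  (undefined when the URI has no port)
console.log(parts[4]); // "/scripting/default.htm"
```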
