Microsoft Regular Expression Tutorial (4): Quantifiers and Anchors

Source: Internet
Author: User
Quantifiers

Sometimes you do not know in advance how many characters need to match. To accommodate this uncertainty, regular expressions support quantifiers. A quantifier specifies how many times a given component of a regular expression must occur for a match to succeed.

The following table describes the quantifiers and their meanings:

Character Description
* Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+ Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
? Matches the preceding subexpression zero times or once. For example, 'do(es)?' matches the "do" in "do" or in "does". ? is equivalent to {0,1}.
{n} n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,} n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'.
{n,m} m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.
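To see the table's examples in action, here is a small sketch in JavaScript, the modern descendant of JScript. It assumes a current engine such as Node.js rather than Windows Script Host, and simply reports what each pattern from the table matches in its sample string.

```javascript
// Each entry pairs a pattern from the table with one of its sample strings.
// String.prototype.match without the g flag returns the first match, or null.
const examples = [
  [/zo*/, "z"],           // * : zero or more o's
  [/zo*/, "zoo"],
  [/zo+/, "zo"],          // + : one or more o's
  [/zo+/, "z"],           //     no o at all, so no match
  [/do(es)?/, "does"],    // ? : the "es" group is optional
  [/o{2}/, "Bob"],        // {n} : exactly two o's in a row
  [/o{2}/, "food"],
  [/o{2,}/, "foooood"],   // {n,} : at least two o's
  [/o{1,3}/, "fooooood"], // {n,m} : between one and three o's
];

for (const [re, input] of examples) {
  const m = input.match(re);
  console.log(`${re} on "${input}":`, m ? `"${m[0]}"` : "no match");
}
```

Note how 'o{2,}' takes all five o's of "foooood", while 'o{1,3}' stops at three even though six are available.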

In a very large input document, the number of chapters can easily exceed nine, so you need a way to handle two- or three-digit chapter numbers. Quantifiers provide this capability. The following JScript regular expression matches chapter headings with any number of digits:

/Chapter [1-9][0-9]*/

The following VBScript regular expression performs the same match:

"Chapter [1-9][0-9]*"

Note that the quantifier appears after the range expression, so it applies to the entire range expression, which in this case specifies only the digits 0 through 9.

The '+' quantifier is not used here, because a digit is not required in the second or any subsequent position. The '?' character is not used either, because it would limit chapter numbers to two digits. The expression must match at least one digit after 'Chapter' and a space character.
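The reasoning above can be checked with a quick comparison. This sketch assumes a modern JavaScript engine (for example Node.js) and uses made-up headings; it runs the '*' pattern from the text next to the '+' and '?' variants that the text rejects.

```javascript
// Hypothetical chapter headings, used only for illustration.
const headings = ["Chapter 7", "Chapter 42", "Chapter 123"];

const star = /Chapter [1-9][0-9]*/; // first digit required, then any number of digits
const plus = /Chapter [1-9][0-9]+/; // would demand at least two digits
const opt = /Chapter [1-9][0-9]?/;  // would allow at most two digits

for (const h of headings) {
  // Returns the matched text, or "no match".
  const show = (re) => { const m = h.match(re); return m ? m[0] : "no match"; };
  console.log(h, "| * ->", show(star), "| + ->", show(plus), "| ? ->", show(opt));
}
```

Only the '*' version accepts "Chapter 7" and also matches all of "Chapter 123"; the '+' version rejects single-digit chapters, and the '?' version cuts "Chapter 123" down to "Chapter 12".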

If you know that chapter numbers never exceed 99, you can use the following JScript expression to specify at least one digit, but no more than two digits:

/Chapter [0-9]{1,2}/

For VBScript, you can use the following regular expression:

"Chapter [0-9]{1,2}"

The disadvantage of the expression above is that a chapter number greater than 99 still matches, but only its first two digits are matched. Another drawback is that someone could create a Chapter 0, and it would still match. Better JScript expressions for matching at most two digits are the following:

/Chapter [1-9][0-9]?/

Or

/Chapter [1-9][0-9]{0,1}/

For VBScript, the following expressions are equivalent to the ones above:

"Chapter [1-9][0-9]?"

Or

"Chapter [1-9][0-9]{0,1}"
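The two drawbacks can be compared directly. This sketch assumes a modern JavaScript engine and uses a small helper, `firstMatch`, that is hypothetical and exists only for this example. It shows that requiring the first digit to be 1 through 9 fixes the Chapter 0 problem, while a three-digit number is still truncated to two digits; the line anchors discussed later in this tutorial address that second issue.

```javascript
const loose = /Chapter [0-9]{1,2}/;  // accepts a leading 0
const better = /Chapter [1-9][0-9]?/; // first digit must be 1-9

// Hypothetical helper: returns the matched text, or null when there is no match.
const firstMatch = (re, s) => { const m = s.match(re); return m ? m[0] : null; };

console.log(firstMatch(loose, "Chapter 0"));    // "Chapter 0"  (unwanted match)
console.log(firstMatch(better, "Chapter 0"));   // null
console.log(firstMatch(loose, "Chapter 123"));  // "Chapter 12" (truncated)
console.log(firstMatch(better, "Chapter 123")); // "Chapter 12" (still truncated)
```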

The '*', '+', and '?' quantifiers are all called greedy, because they match as much text as possible. Sometimes this is not what you want; sometimes you want a minimal match instead.

For example, you may want to search an HTML document for a chapter title enclosed in an H1 tag. The text in the document might look like this:

<H1>Chapter 1 – Introduction to Regular Expressions</H1>

The following expression matches everything from the opening less-than sign (<) to the greater-than sign at the end of the closing H1 tag:

/<.*>/

The VBScript regular expression is:

"<.*>"

If you want to match only the opening H1 tag, the following non-greedy expression matches just <H1>:

/<.*?>/

Or, for VBScript:

"<.*?>"

Placing '?' after a '*', '+', or '?' quantifier converts the expression from a greedy match to a non-greedy, or minimal, match.
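Here is the H1 example from above run both ways. The sketch assumes a modern JavaScript engine such as Node.js; the greedy pattern runs on to the last '>' in the string, while the lazy pattern stops at the first one.

```javascript
const html = "<H1>Chapter 1 – Introduction to Regular Expressions</H1>";

const greedy = html.match(/<.*>/)[0]; // .* grabs as much as possible, then backs off to the last '>'
const lazy = html.match(/<.*?>/)[0];  // .*? grabs as little as possible, stopping at the first '>'

console.log(greedy); // the whole string, from "<H1>" through "</H1>"
console.log(lazy);   // "<H1>"
```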

Anchors

So far, the examples have found chapter titles wherever they appear. Any occurrence of the string 'Chapter' followed by a space and a number might be a real chapter title or a cross-reference to another chapter. Because a real chapter title always appears at the beginning of a line, you need a way to find only the titles and not the cross-references.

Anchors provide this capability. They let you fix a regular expression to the beginning or end of a line. They also let you create regular expressions that match only within a word, or only at the beginning or end of a word. The following table lists the anchors and their meanings:

Character Description
^ Matches the position at the beginning of the input string. If the RegExp object's Multiline property is set, ^ also matches the position following '\n' or '\r'.
$ Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position preceding '\n' or '\r'.
\b Matches a word boundary, that is, the position between a word character and a non-word character such as a space (or the start or end of the string).
\B Matches a non-word boundary.
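The effect of the Multiline setting on '^' and '$' is easy to see in JavaScript, where the corresponding option is the `m` flag. This is a sketch assuming a modern engine such as Node.js; the three-line sample text is made up for illustration.

```javascript
const text = "first line\nChapter 2\nlast line";

// Without the m flag, ^ and $ refer only to the start and end of the whole string.
console.log(/^first/.test(text));    // true: the string starts with "first"
console.log(/^Chapter/.test(text));  // false: "Chapter" starts the second line, not the string
console.log(/2$/.test(text));        // false: the string does not end with "2"

// With the m flag, ^ also matches right after a line break and $ right before one.
console.log(/^Chapter/m.test(text)); // true
console.log(/2$/m.test(text));       // true
```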

You cannot apply a quantifier to an anchor. Because there is never more than one position immediately before or after a line break or word boundary, expressions such as '^*' are not permitted.
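A JavaScript engine enforces this rule when the pattern is compiled. The sketch below, which assumes a modern engine such as Node.js, tries to build two patterns that put a quantifier directly on an anchor and records what happens.

```javascript
// Attempt to compile patterns that quantify an anchor: "^*" and "\b+".
const results = ["^*", "\\b+"].map((source) => {
  try {
    new RegExp(source);
    return "compiled";
  } catch (e) {
    return e.name; // the engine reports a SyntaxError ("Nothing to repeat")
  }
});

console.log(results); // [ 'SyntaxError', 'SyntaxError' ]
```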

To match text at the beginning of a line, use the '^' character at the beginning of the regular expression. Do not confuse this use of '^' with its use inside a bracket expression, where a leading '^' negates the character set. The two meanings are fundamentally different.
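The difference between the two meanings of '^' shows up with a single word. This sketch assumes a modern JavaScript engine; outside brackets '^' anchors the match, while as the first character inside brackets it means "any character except these".

```javascript
const word = "Chapter";

// Outside brackets, ^ anchors the match to the start of the input.
console.log(word.match(/^C/)[0]);   // "C"

// As the first character inside brackets, ^ negates the class: any character except "C".
console.log(word.match(/[^C]/)[0]); // "h"
```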

To match text at the end of a line, use the '$' character at the end of the regular expression.

The following JScript regular expression matches a chapter title with a chapter number of at most two digits at the beginning of a line:

/^Chapter [1-9][0-9]{0,1}/

The VBScript regular expression with the same behavior is:

"^Chapter [1-9][0-9]{0,1}"
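Here is the anchored expression applied to a tiny, made-up document. The sketch assumes a modern JavaScript engine and tests each line on its own, so '^' simply means the start of that line; the cross-reference in the middle of a sentence is no longer matched.

```javascript
// A small, hypothetical document: one real title and one cross-reference.
const lines = [
  "Chapter 1",
  "See Chapter 3 for details.",
];

const anywhere = /Chapter [1-9][0-9]{0,1}/;
const atStart = /^Chapter [1-9][0-9]{0,1}/;

// Each line is tested separately, so ^ means "start of this line".
console.log(lines.filter((l) => anywhere.test(l))); // both lines
console.log(lines.filter((l) => atStart.test(l)));  // only "Chapter 1"
```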

A real chapter title not only appears at the beginning of a line; it is also the only text on that line, so it ends the line as well. The following expression ensures that the match is a chapter title and not a cross-reference. It does this by creating a regular expression that matches only when the text both starts and ends a line.

/^Chapter [1-9][0-9]{0,1}$/

For VBScript:

"^Chapter [1-9][0-9]{0,1}$"
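To run the fully anchored expression over a whole document at once, the anchors must match at every line break. In JavaScript that takes the `m` flag, which plays the role of the Multiline property from the table; the `g` flag is added so every title is found. This sketch assumes a modern engine with `String.prototype.matchAll` (for example Node.js 12 or later), and the sample lines are made up.

```javascript
const doc = [
  "Chapter 1",
  "Chapter 2 is covered below.", // starts like a title but has more text on the line
  "See Chapter 3 for details.",  // cross-reference in the middle of a line
  "Chapter 123",                 // more than two digits
  "Chapter 12",
].join("\n");

// g finds every match; m lets ^ and $ match at each line boundary.
const title = /^Chapter [1-9][0-9]{0,1}$/gm;

const found = [...doc.matchAll(title)].map((m) => m[0]);
console.log(found); // [ 'Chapter 1', 'Chapter 12' ]
```

Because '$' is required, "Chapter 123" is now rejected outright instead of being cut down to "Chapter 12".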

Matching word boundaries works a little differently, but it adds a very important capability to regular expressions. A word boundary is the position between a word and a space; a non-word boundary is any other position. The following JScript expression matches the first three characters of the word 'Chapter' because they appear after a word boundary:

/\bCha/

For VBScript:

"\bCha"
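A quick check of the leading '\b', assuming a modern JavaScript engine; the test strings are made up. "Cha" matches only where the preceding position is a word boundary.

```javascript
const startOfWord = /\bCha/;

console.log(startOfWord.test("Chapter"));     // true: "Cha" begins the input, which is a boundary
console.log(startOfWord.test("see Chapter")); // true: a space followed by "C" is a boundary
console.log(startOfWord.test("SubChapter"));  // false: "b" followed by "C" is not a boundary
```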

Here, the position of the '\b' operator is critical. If it is placed at the beginning of the string to be matched, it looks for the match at the beginning of a word; if it is placed at the end of the string, it looks for the match at the end of a word. For example, the following expressions match 'ter' in the word 'Chapter' because it appears before a word boundary:

/ter\b/

And

"ter\b"
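The trailing '\b' can be checked the same way. This sketch assumes a modern JavaScript engine and uses made-up strings; "ter" matches only when a word boundary follows it.

```javascript
const endOfWord = /ter\b/;

console.log(endOfWord.test("Chapter"));   // true: "ter" ends the input
console.log(endOfWord.test("Chapter 4")); // true: "r" followed by a space is a boundary
console.log(endOfWord.test("terminal"));  // false: "ter" is followed by the letter "m"
```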

The following expressions match 'apt' where it occurs in the middle of 'Chapter', but not where it occurs in 'aptitude':

/\Bapt/

And

"\Bapt"

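This is because 'apt' occurs at a non-word boundary in the word 'Chapter', but at a word boundary in the word 'aptitude'. Unlike '\b', the '\B' operator does not tie the match to the beginning or end of a word. Its position still matters, however: it determines which side of the text must continue the word. For example, 'apt\B' would match 'apt' in both 'Chapter' and 'aptitude', because in each case the next character is a letter.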
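The sketch below, assuming a modern JavaScript engine, compares '\Bapt' with the variant 'apt\B' (an illustrative pattern, not one from the examples above) on the two words discussed, to show which words each one accepts.

```javascript
const inside = /\Bapt/;   // "apt" must be preceded by a word character
const trailing = /apt\B/; // "apt" must be followed by a word character

for (const word of ["Chapter", "aptitude"]) {
  console.log(word, inside.test(word), trailing.test(word));
}
// Chapter true true
// aptitude false true
```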
