Regular Expression Tutorial: Lookaround (Lookahead and Lookbehind) Explained

This article explains lookaround (lookahead and lookbehind) in regular expressions through worked examples. The details are as follows:

Note: In all examples, the text matched by the regular expression is enclosed in [ and ] within the source text. Some examples are implemented in Java; where a regular expression is used in Java, this is noted in the corresponding section. All Java examples were tested under JDK 1.6.0_13.

I. Problem Introduction

Suppose we want to match the text between a pair of tags on an HTML page, for example the page title, that is, the text between <title> and </title>:

Text:

Regular Expression: <[Tt][Ii][Tt][Ll][Ee]>.*?</[Tt][Ii][Tt][Ll][Ee]>

Result:

Analysis: <[Tt][Ii][Tt][Ll][Ee]> matches the tag name regardless of case. This pattern matches the title tags together with the text between them, but it is not ideal: we only want the text between the tags, not the tags themselves. To solve this problem, we need lookaround.
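The following Java sketch shows the problem. Because the article's sample HTML is not available, the input string is a hypothetical stand-in, and the class and method names are illustrative. Note that the match still contains both tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleWithoutLookaround {
    // [Tt][Ii][Tt][Ll][Ee] spells "title" in any case; the tags are consumed by the match.
    static final Pattern TITLE =
            Pattern.compile("<[Tt][Ii][Tt][Ll][Ee]>.*?</[Tt][Ii][Tt][Ll][Ee]>");

    // Returns the first match, or null if the input contains no title element.
    static String firstTitleMatch(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical input; not the article's original sample text.
        String html = "<head><TITLE>My Home Page</title></head>";
        System.out.println(firstTitleMatch(html)); // prints <TITLE>My Home Page</title>
    }
}
```

Java's Pattern.CASE_INSENSITIVE flag, or the inline (?i) modifier, would be a shorter alternative to spelling out each letter as a character class.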

II. Lookahead (Forward Lookup)

A lookahead specifies a pattern that must match but is not returned in the result. Syntactically, a lookahead is a subexpression that starts with ?=, followed by the text to match, all inside parentheses: (?=...).

Let's look at an example that matches the protocol section of a URL:

Text: http://blog.csdn.net/mhmyqn

Regular Expression: .+(?=:)

Result: [http]://blog.csdn.net/mhmyqn

Analysis: The protocol is the part of the URL before the colon. The pattern .+ matches any text, and the subexpression (?=:) matches the colon, but the colon does not appear in the result. The ?= tells the regular expression engine to check that a : follows, without including it in the returned match. If we used a plain (:) instead of the lookahead (?=:), the match would be http:, which includes the colon and is not what we want.
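A minimal Java sketch of the URL example, comparing the lookahead with a plain capturing group. The helper name firstMatch is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProtocolLookahead {
    // Returns the first match of regex in input, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String url = "http://blog.csdn.net/mhmyqn";
        // Lookahead: a colon must follow, but it is not part of the match.
        System.out.println(firstMatch(".+(?=:)", url)); // prints http
        // Plain group: the colon is consumed and appears in the match.
        System.out.println(firstMatch(".+(:)", url));   // prints http:
    }
}
```

Because .+ is greedy, a URL with a second colon, such as a port in http://example.com:8080/, would make .+(?=:) match http://example.com; the pattern [^:]+(?=:) stops at the first colon and returns only http.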

Note: "Ahead" and "behind" describe where the lookaround condition sits relative to the text being matched. A lookahead checks the text to the right of the match, so it is written after it: xxx(?=yyy). A lookbehind checks the text to the left, so it is written before it: (?<=yyy)xxx, which is introduced later.

III. Lookbehind (Backward Lookup)

The lookbehind operator is ?<=, written as (?<=...). Not every regular expression engine supports lookbehind: JavaScript did not when this article was written (support arrived with ECMAScript 2018), whereas Java does.

For example, to find the prices in the following text (a $ followed by a number) without including the currency symbol in the result:

Text: category1: $136.25, category2: $28, category3: $88.60

Regular Expression: (?<=\$)\d+(\.\d+)?

Result: category1: $[136.25], category2: $[28], category3: $[88.60]

Analysis: (?<=\$) matches $ without consuming it, and \d+(\.\d+)? matches an integer or a decimal. The results show that the currency symbol is excluded and only the price is matched. Without lookbehind, the pattern \$\d+(\.\d+)? would include $ in each result, and the pattern \d+(\.\d+)? would also match the digits in category1, category2, and category3, which is not what we want.
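A Java sketch that runs the three patterns discussed above against the same text, so the differences are visible side by side. The findAll helper is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PriceLookbehind {
    // Collects every non-overlapping match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "category1: $136.25, category2: $28, category3: $88.60";
        // Lookbehind: $ must precede the number but is not included.
        System.out.println(findAll("(?<=\\$)\\d+(\\.\\d+)?", text)); // [136.25, 28, 88.60]
        // Escaped $ consumed as ordinary text: the symbol is part of each match.
        System.out.println(findAll("\\$\\d+(\\.\\d+)?", text));      // [$136.25, $28, $88.60]
        // No condition at all: digits in the category names match too.
        System.out.println(findAll("\\d+(\\.\\d+)?", text));         // [1, 136.25, 2, 28, 3, 88.60]
    }
}
```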

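Note: A lookahead pattern can have variable length and may contain metacharacters such as ., *, and +. Lookbehind is more restricted: some engines require a fixed-length lookbehind, and Java requires one with an obvious maximum length, so bounded forms such as ? or {m,n} are fine while unbounded * and + are best avoided. A single . is not a problem, because it always matches exactly one character.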
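As an illustration of a bounded lookbehind, the sketch below allows up to two whitespace characters between $ and the number. This variant pattern is an assumption made for illustration, not part of the original example; the quantifier {0,2} gives the lookbehind a maximum length of three characters, which Java accepts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedLookbehind {
    // \s{0,2} is bounded, so the lookbehind is at most 1 + 2 characters long.
    static final Pattern PRICE = Pattern.compile("(?<=\\$\\s{0,2})\\d+");

    // Returns every number preceded by $ and at most two whitespace characters.
    static List<String> prices(String input) {
        List<String> found = new ArrayList<String>();
        Matcher m = PRICE.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // "$30" and "$ 5" qualify; the bare 7 has no $ before it.
        System.out.println(prices("$30, $ 5, and 7")); // [30, 5]
    }
}
```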

IV. Combining Lookahead and Lookbehind

By combining lookahead and lookbehind, we can solve the earlier problem of matching only the text between the HTML tags:

Text:

Regular Expression: (?<=<[Tt][Ii][Tt][Ll][Ee]>).*?(?=</[Tt][Ii][Tt][Ll][Ee]>)

Result:

Analysis: The result shows that the problem is solved. (?<=<[Tt][Ii][Tt][Ll][Ee]>) is a lookbehind: it matches <title> without consuming it. (?=</[Tt][Ii][Tt][Ll][Ee]>) is a lookahead: it matches </title> without consuming it. The final match contains only the text between the tags.
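The combined pattern in Java, as a sketch. The sample HTML is hypothetical because the article's original text is not available, and the names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleLookaround {
    // Lookbehind for the opening tag, lazy body, lookahead for the closing tag.
    static final Pattern TITLE_TEXT = Pattern.compile(
            "(?<=<[Tt][Ii][Tt][Ll][Ee]>).*?(?=</[Tt][Ii][Tt][Ll][Ee]>)");

    // Returns only the text between the title tags, or null if there is none.
    static String titleText(String html) {
        Matcher m = TITLE_TEXT.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical input; not the article's original sample text.
        String html = "<head><TITLE>My Home Page</title></head>";
        System.out.println(titleText(html)); // prints My Home Page
    }
}
```

By default . does not match line terminators; if a title could span several lines, compiling the pattern with Pattern.DOTALL would allow that.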

V. Negative Lookaround

The lookahead and lookbehind described so far are used to locate the text to be returned, by specifying what must come before or after it. These are called positive lookahead and positive lookbehind. There are also negative lookahead and negative lookbehind, which succeed only when the given pattern does not appear after or before the current position.

Lookaround operators:

(?=...)  Positive lookahead
(?!...)  Negative lookahead
(?<=...) Positive lookbehind
(?<!...) Negative lookbehind
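To compare the four operators directly, this Java sketch applies each one to a hypothetical string, "foobar foobaz barbar", and reports every match with its start index so it is clear which occurrence matched. The class and helper names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundOperators {
    // Returns each match as "text@startIndex".
    static List<String> matches(String regex, String input) {
        List<String> out = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group() + "@" + m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "foobar foobaz barbar";
        System.out.println(matches("foo(?=bar)", text));  // foo followed by bar:      [foo@0]
        System.out.println(matches("foo(?!bar)", text));  // foo not followed by bar:  [foo@7]
        System.out.println(matches("(?<=foo)ba.", text)); // ba? preceded by foo:      [bar@3, baz@10]
        System.out.println(matches("(?<!foo)ba.", text)); // ba? not preceded by foo:  [bar@14, bar@17]
    }
}
```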

For example, suppose a piece of text contains prices (a $ followed by a number) and quantities, and we need to find each separately. First, the prices:

Text: I paid $30 for 10 apples, 15 oranges, and 10 pears. I saved $5 on this order.

Regular Expression: (?<=\$)\d+

Result: I paid $[30] for 10 apples, 15 oranges, and 10 pears. I saved $[5] on this order.

Next, the quantities:

Text: I paid $30 for 10 apples, 15 oranges, and 10 pears. I saved $5 on this order.

Regular Expression: \b(?<!\$)\d+\b

Result: I paid $30 for [10] apples, [15] oranges, and [10] pears. I saved $5 on this order.

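Analysis: (?<!\$) is a negative lookbehind, so only numbers not preceded by $ are matched. The \b word boundaries are also needed: without them, the 0 in $30 would still match, because it is preceded by 3 rather than $.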
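The price and quantity examples in Java, plus a third pattern that drops the \b anchors to show why they matter. The findAll helper is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PriceAndQuantity {
    // Collects every non-overlapping match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "I paid $30 for 10 apples, 15 oranges, and 10 pears. I saved $5 on this order.";
        // Positive lookbehind: numbers preceded by $.
        System.out.println(findAll("(?<=\\$)\\d+", text));       // [30, 5]
        // Negative lookbehind with word boundaries: whole numbers not preceded by $.
        System.out.println(findAll("\\b(?<!\\$)\\d+\\b", text)); // [10, 15, 10]
        // Without \b, the 0 of $30 slips through because it follows 3, not $.
        System.out.println(findAll("(?<!\\$)\\d+", text));       // [0, 10, 15, 10]
    }
}
```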

VI. Summary

With lookahead and lookbehind, you can precisely control what the final match contains. Lookaround lets a subexpression specify where a match may occur; the text it checks must match but is not consumed, so it does not appear in the result.

PS: Here are two convenient regular expression tools for reference:

JavaScript regular expression online tester:
http://tools.jb51.net/regex/javascript

Regular expression generator:
http://tools.jb51.net/regex/create_reg

I hope this article helps you learn regular expressions.
