Python Regular Expressions (Intermediate)

Source: Internet
Author: User

Link: http://www.bkjia.com/article/99372.htm

As promised in the previous article, this article introduces subexpressions, lookahead and lookbehind (forward and backward search), and backreferences. Apart from backreferences, which are irreplaceable in some cases, you should already be able to write regular expressions for most situations before starting this article.

1. Subexpressions

The concept of a subexpression is easy to grasp: it means combining several characters so that they behave as one big "character". Still unclear? Take an example. Suppose we need to match strings in the form of an IP address (for now we ignore whether each number is within a valid range; treat that as an exercise for later). How would we write an expression for an address like 192.168.1.1?

Answer 1: \d+.?\d+.?\d+.?\d+

Not good. It is cumbersome, it cannot limit the number of digits in each part, and the unescaped . matches any character rather than only a dot.

Answer 2: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}

Still long-winded, but at least it keeps the number of digits in each part within a reasonable range.

Answer 3: (\d{1,3}\.){3}\d{1,3}

Here the subexpression takes a number followed by a dot, such as 123., and treats it as a single unit, then specifies how many times that unit repeats, which is concise and effective. In general, once you enclose a group of characters in parentheses, you can treat the contents of the parentheses as one character and apply any of the metacharacters covered earlier to control how it is matched.
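The grouped form can be tried out with a short sketch. The names ip_pattern and samples are made up for illustration, and fullmatch is used here only so that the whole string has to fit the pattern; as the text says, the value range of each number is not checked.

```python
import re

# A number of 1-3 digits followed by a dot is grouped as one unit and
# repeated three times, then a final number of 1-3 digits follows.
ip_pattern = re.compile(r"(\d{1,3}\.){3}\d{1,3}")

# Illustrative inputs (not from the original article).
samples = ["192.168.1.1", "10.0.0.256", "1234.1.1.1", "192.168.1"]

# fullmatch requires the entire string to fit the pattern.
results = {s: bool(ip_pattern.fullmatch(s)) for s in samples}
for text, ok in results.items():
    print(text, ok)
```

Note that 10.0.0.256 is accepted: the pattern only controls the number of digits, not the 0-255 range.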

2. Lookahead and lookbehind (forward and backward search)

Now we finally come to lookahead and lookbehind. Why "finally"? Do you remember the first example in the introductory article?

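Suppose you are writing a crawler and have fetched the HTML source of a web page. Somewhere in it is a fragment in which the text hello world sits between an opening tag and a closing tag, and you want to extract only hello world.

The regular expression for this wraps the pattern for the text in two special groups, (?<=...) and (?=...). The first, (?<=...), says that the text inside it must appear immediately before the match. The second, (?=...), says that the text inside it must appear immediately after the match. Neither becomes part of the matched result.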
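A minimal sketch of this extraction follows. The original HTML fragment is not shown here, so the h1 markup and the variable names html, pattern, and extracted are assumptions chosen only for illustration.

```python
import re

# Assumed sample input: the <h1> markup is an illustration only.
html = "<html><body><h1>hello world</h1></body></html>"

# (?<=<h1>) requires "<h1>" immediately before the match;
# (?=</h1>) requires "</h1>" immediately after it.
# Neither is included in the returned text.
pattern = re.compile(r"(?<=<h1>).+?(?=</h1>)")

match = pattern.search(html)
extracted = match.group(0) if match else None
print(extracted)  # hello world
```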

Put simply: the text you want to match is XX, but only where it appears in the form AXXB. You can then write a regular expression like this:

p = r"(?<=A)XX(?=B)"

The matched string is XX alone. The lookbehind and the lookahead do not have to appear together; if you only need one of the two conditions, write only that one.
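The sketch below compares the two-sided pattern with the one-sided variants. The sample string and the helper name starts are hypothetical; start positions are recorded so that it is visible which occurrences of XX each pattern accepts.

```python
import re

# XX occurs three times: after A and before B, after A only, before B only.
text = "AXXB AXXC DXXB"

def starts(pattern: str) -> list[int]:
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

both = starts(r"(?<=A)XX(?=B)")    # prefix A and suffix B both required
prefix_only = starts(r"(?<=A)XX")  # only the prefix A is required
suffix_only = starts(r"XX(?=B)")   # only the suffix B is required

print(both)         # [1]
print(prefix_only)  # [1, 6]
print(suffix_only)  # [1, 11]
```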

So there is no need to memorize which one is called lookahead and which lookbehind. Just remember that ?<= is followed by the required prefix and ?= is followed by the required suffix.

In effect, lookahead and lookbehind check the whole string AXXB but return only XX. That means that, if you prefer, you can do without them: match the string together with its prefix and suffix, and then slice the result.
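As a rough illustration of that equivalence, the following sketch obtains XX in three ways. The variable names are made up, and the capturing-group variant is an addition of this sketch that reuses the subexpression idea from section 1.

```python
import re

text = "AXXB"

# Lookaround: the match itself is only "XX".
with_lookaround = re.search(r"(?<=A)XX(?=B)", text).group(0)

# No lookaround: match the whole "AXXB", then cut off prefix and suffix.
whole = re.search(r"AXXB", text).group(0)
by_slicing = whole[len("A"):-len("B")]

# A capturing subexpression returns the middle part without manual slicing.
by_group = re.search(r"A(XX)B", text).group(1)

print(with_lookaround, by_slicing, by_group)  # XX XX XX
```

One practical difference is that Python's re module only accepts a lookbehind whose pattern has a fixed width, so for a variable-length prefix the capturing-group form is the one that works.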

3. Backreferences

Unlike lookahead and lookbehind, backreferences cannot always be worked around: in some cases there is no substitute for them. If you want to use regular expressions in real applications, you need to understand and master backreferences.

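Let's start with an example.

Originally you wanted to match the content between one pair of heading tags. Now you want the headings of every level, so a first attempt might put the character class [1-6] where the level number goes, in both the opening tag and the closing tag.

In this way, you can match all the headings on the HTML page, that is, the content of every heading from the first level to the sixth.

But consider a heading whose closing tag has a different level from its opening tag. Have you noticed the problem? The two [1-6] classes are matched independently, so the pattern accepts the mismatched pair as well.

The fix is to turn the level in the opening tag into a subexpression, ([1-6]), and to write \1 in the closing tag. Run against such input, the new pattern no longer matches the mismatched heading.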

Did you notice the \1? That position would normally hold [1-6], but we wrote \1 instead. As mentioned before, the escape character \ turns a special character into an ordinary one, and an ordinary character into a special one. So what does the ordinary digit 1 become when escaped? Here, \1 refers to the first subexpression: it is dynamic and takes whatever text the first subexpression actually matched. For example, if the first subexpression is [1-6] and it matched 1 in the actual string, then \1 is 1; if it matched 2, then \1 is 2.
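The difference can be seen in a small sketch. The sample HTML and the names loose and strict are assumptions for illustration, not the article's original code.

```python
import re

# Assumed sample input: one well-formed heading and one whose closing
# tag has the wrong level.
html = "<h1>hello world</h1> <h2>broken heading</h3>"

# Two independent [1-6] classes: the closing level may differ.
loose = re.compile(r"<h[1-6]>.*?</h[1-6]>")

# ([1-6]) is the first subexpression; \1 must repeat what it matched.
strict = re.compile(r"<h([1-6])>.*?</h\1>")

loose_hits = [m.group(0) for m in loose.finditer(html)]
strict_hits = [m.group(0) for m in strict.finditer(html)]

print(loose_hits)   # ['<h1>hello world</h1>', '<h2>broken heading</h3>']
print(strict_hits)  # ['<h1>hello world</h1>']
```

finditer with group(0) is used instead of findall because findall returns only the captured subexpression once a pattern contains parentheses.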

Similarly, \2, \3, ... refer to the second, third, ... subexpressions.
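To show more than one backreference at once, here is a hypothetical pattern (not from the original article) that finds four-character sequences whose second half mirrors the first.

```python
import re

# The first (\w) is subexpression 1 and the second (\w) is subexpression 2;
# \2\1 then requires the same two characters again in reverse order.
mirror = re.compile(r"(\w)(\w)\2\1")

text = "abba noon abcd"
mirrored = [m.group(0) for m in mirror.finditer(text)]
print(mirrored)  # ['abba', 'noon']
```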

So a backreference is a "dynamic" part of a regular expression: it lets the pattern adapt to what was actually matched earlier in the string.

That concludes the intermediate article. Many details of regular expressions have not been covered, and there are many metacharacters that I have not explained. But once you have grasped the outline and understood the principles, the rest is mostly a matter of looking things up in a reference table.

Readers who have made it this far are encouraged to read "Regular Expressions are required". The first article and several examples in this article are based on it.

The above is all the content of this article. I hope the content of this article will help you in your study or work. If you have any questions, you can leave a message and share it with us!
