Python full stack development 6. Regular expressions in Python

Source: Internet
Author: User

Regular expressions use a single string to describe and match a set of strings that follow certain syntax rules. They are very powerful for text processing and are often used in web crawlers to extract specific content. Python has no regular expression syntax built into the language itself, but the standard library's re module provides it once imported. The following describes how to use regular expressions in Python.

I. Matching rules

The figure below lists the matching rules of Python regular expressions. Apart from the grouping part of the figure, you should be familiar with all of them.

II. findall

findall() returns all matches as a list. If nothing matches, it returns an empty list. If the pattern contains a group, the list holds the text captured by the group rather than the whole match.

import re

digits = re.findall(r'\d', '4g6gggg9,9')  # \d matches a digit; every match goes into a list
print(digits)  # ['4', '6', '9', '9']
print(re.findall(r'\w', 'ds.._4'))  # ['d', 's', '_', '4'], \w matches a letter, digit, or underscore
print(re.findall(r'^sk', 'skggj,fd,7'))  # ['sk'], ^ anchors the match at the start
print(re.findall(r'^sk', 'kggj,fd,7'))  # []
print(re.findall(r'k{3,5}', 'ffkkkkk'))  # ['kkkkk'], the preceding character 'k' 3 to 5 times
print(re.findall(r'a{2}', 'aasdaaaaaf'))  # ['aa', 'aa', 'aa'], the preceding character 'a' exactly twice
print(re.findall(r'a*x', 'aaaaaax'))  # ['aaaaaax'], * matches the preceding character 0 or more times, greedily
print(re.findall(r'\d*', 'www33333'))  # ['', '', '', '33333', ''], \d* also matches the empty string
print(re.findall(r'a+c', 'aaaacccc'))  # ['aaaac'], + matches the preceding character 1 or more times, greedily
print(re.findall(r'a?c', 'aaaacccc'))  # ['ac', 'c', 'c', 'c'], ? matches the preceding character 0 or 1 time
print(re.findall(r'a[.]d', 'acdggg abd'))  # [], . loses its special meaning inside []
print(re.findall(r'[a-z]', 'h43.hb-gg'))  # ['h', 'h', 'b', 'g', 'g']
print(re.findall(r'[^a-z]', 'h43.hb-gg'))  # ['4', '3', '.', '-'], ^ inside [] negates the set
print(re.findall(r'ax$', 'dsadax'))  # ['ax'], $ anchors the match at the end
print(re.findall(r'a(\d+)b', 'a23666b'))  # ['23666']
print(re.findall(r'a(\d+?)b', 'a23666b'))  # ['23666'], with a condition on both sides, non-greedy changes nothing
print(re.findall(r'a(\d+)', 'a23b'))  # ['23']
print(re.findall(r'a(\d+?)', 'a23b'))  # ['2'], adding ? switches to non-greedy mode
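
The examples above mix two behaviours that are easy to confuse: what findall returns when the pattern has a group, and how a trailing ? changes greediness. The sketch below isolates them on one hypothetical sample string (the variable names are illustrative only).

```python
import re

text = "a23b a7b"  # hypothetical sample text

# No group in the pattern: findall returns the full matches.
whole = re.findall(r"a\d+b", text)

# One group in the pattern: findall returns only what the group captured.
captured = re.findall(r"a(\d+)b", text)

# Greedy: \d+ takes as many digits as it can.
greedy = re.findall(r"a(\d+)", text)

# Non-greedy: \d+? stops after the first digit because nothing follows it.
lazy = re.findall(r"a(\d+?)", text)

print(whole)     # ['a23b', 'a7b']
print(captured)  # ['23', '7']
print(greedy)    # ['23', '7']
print(lazy)      # ['2', '7']
```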

III. match and search

match tries to match from the beginning of the string. If the start of the string does not fit the regular expression, the match fails and the function returns None. If it succeeds, call group() on the result to get the matched text. search is similar to match, but it scans the whole string and returns the first place where the pattern matches. See the code below.

import re

print(re.match(r'a\d', 'a333333a4').group())  # a3, only the first match is returned
print(re.match(r'a\d', 'ta333333a4'))  # None, the string does not start with a match
print(re.search(r'a\d', 'ta333333a4').group())  # a3, search does not require a match at the start
print(re.search(r'a(\d+)', 'a23b').group())  # a23, group() returns the whole match; group(1) would return only 23
print(re.search(r'a(\d+?)', 'a2366666666666b').group())  # a2, non-greedy mode
print(re.search(r'a(\d+)b', 'a23666b').group(1))  # 23666, group(1) returns the first group
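
Because both functions return None when nothing matches, calling group() directly on the result raises AttributeError for non-matching input. A minimal sketch of a safer pattern follows; first_number is a hypothetical helper name, not part of the re module.

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    m = re.search(r"\d+", text)
    # Check the match object before calling group() on it.
    return m.group() if m else None

# match only succeeds at the start of the string, so this is False.
starts_with_number = re.match(r"\d+", "ta333") is not None

print(first_number("ta333a4"))    # 333
print(first_number("no digits"))  # None
print(starts_with_number)         # False
```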

IV. split, sub, and subn

split cuts the string at every match and returns the pieces as a list. sub replaces the matches with another string and returns the new string. subn does the same but also returns the number of replacements. The following code shows their usage.

import re

print(re.split(r'\d', 'sd.4,r5'))  # ['sd.', ',r', ''], split on digits; note the trailing empty string
print(re.sub(r'\d', 'OK', '3,4sfds.6hhh'))  # OK,OKsfds.OKhhh
print(re.sub(r'\d', 'OK', '3,4sfds.6hhh', count=2))  # OK,OKsfds.6hhh, count=2 replaces only the first two matches
print(re.subn(r'\d', 'OK', '3,4sfds.6hhh'))  # ('OK,OKsfds.OKhhh', 3), the number of replacements is returned too
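
In practice the results of these calls are usually unpacked or filtered rather than printed. The following sketch, with illustrative variable names, shows one way to do that on the same sample string.

```python
import re

line = "3,4sfds.6hhh"

# split keeps empty strings when a match sits at the edge of the text.
parts = re.split(r"\d", line)
non_empty = [p for p in parts if p]

# subn returns a (new_string, replacement_count) tuple that can be unpacked.
new_text, n = re.subn(r"\d", "OK", line)

# count limits how many matches sub replaces, counted from the left.
limited = re.sub(r"\d", "OK", line, count=1)

print(parts)        # ['', ',', 'sfds.', 'hhh']
print(non_empty)    # [',', 'sfds.', 'hhh']
print(new_text, n)  # OK,OKsfds.OKhhh 3
print(limited)      # OK,4sfds.6hhh
```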

V. Raw strings, compilation, grouping

1. Raw strings

Careful readers will have noticed that every pattern above is written with an r in front. Why write it this way? The code below explains.

import re

# In an ordinary Python string "\b" is the ASCII backspace character,
# while in a regular expression \b means "match a word boundary".
print(re.findall("\bblow", "jason blow cat"))  # [], here \b is a backspace, so nothing matches
print(re.findall("\\bblow", "jason blow cat"))  # ['blow'], the backslash is escaped by hand
print(re.findall(r"\bblow", "jason blow cat"))  # ['blow'], with a raw string no escaping is needed

You may notice that "\d" works in a regular expression even without a raw string. That is because \d is not a recognized escape sequence in a Python string literal, so the backslash stays in the string and the regular expression engine still sees \d and treats it as a digit. Recent Python versions warn about such unrecognized escapes, so it is both more rigorous and simpler to always write patterns as raw strings.
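
To see why the prefix matters, it can help to look at the strings themselves before they ever reach the re module. A small sketch, with illustrative variable names:

```python
plain = "\bblow"     # \b is turned into a single backspace character
escaped = "\\bblow"  # the backslash is escaped by hand
raw = r"\bblow"      # the raw string keeps the backslash as written

print(len(plain), len(escaped), len(raw))  # 5 6 6
print(escaped == raw)                      # True
print(plain[0] == "\x08")                  # True, \x08 is the backspace character
```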

2. Compile

If a pattern is used many times, we can compile it first and then reuse the compiled object instead of writing the pattern out each time. The following shows how.

import re

c = re.compile(r'\d')  # compile once; afterwards just call methods on c
print(c.findall('as3..56,'))  # ['3', '5', '6']
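
A compiled pattern offers the same operations as the module-level functions (findall, search, sub and so on) as methods. The sketch below reuses one compiled pattern across several hypothetical input lines.

```python
import re

digit = re.compile(r"\d")

lines = ["as3..56,", "no digits here", "7 up"]  # hypothetical inputs

# The same compiled object is reused for every line.
counts = [len(digit.findall(s)) for s in lines]
masked = digit.sub("#", "as3..56,")

print(counts)         # [3, 0, 1]
print(masked)         # as#..##,
print(digit.pattern)  # \d, the original pattern text is kept on the object
```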

3. Group

Besides simple matching, regular expressions can also extract substrings. Parentheses () mark a group to extract. A pattern can contain several groups, and there are many ways to use them. Here is a brief introduction.

import re

# findall returns a list, so group() cannot be used on its result
print(re.findall(r'(\d+)-([a-z])', '34324-dfsdfs777-hhh'))  # [('34324', 'd'), ('777', 'h')]
print(re.search(r'(\d+)-([a-z])', '34324-dfsdfs777-hhh').group(0))  # 34324-d, the whole match
print(re.search(r'(\d+)-([a-z])', '34324-dfsdfs777-hhh').group(1))  # 34324, the first group
print(re.search(r'(\d+)-([a-z])', '34324-dfsdfs777-hhh').group(2))  # d, the second group
# Calling .group(3) on the same match would raise IndexError: no such group
print(re.search(r"(jason)kk\1", "xjasonkkjason").group())  # jasonkkjason, \1 refers back to the first group
print(re.search(r'(\d)gg\1', '2j333gg3jjj8').group())  # 3gg3, \1 reuses the first group (\d)
# Why does the next call return None? \1 does not mean "any digit"; it must match
# the same text the first group captured. The first group captured 3, but the
# character after gg is 7, so 3gg7 does not match.
print(re.search(r'(\d)gg\1', '2j333gg7jj8'))  # None
# (?P<first>...) declares a named group and (?P=first) refers back to it by name
print(re.search(r'(?P<first>\d)abc(?P=first)', '1abc1').group())  # 1abc1
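
Named groups can also be read back by name, which tends to be easier to follow than positional indexes. A short sketch using the same sample string; the group names number and letter are illustrative choices.

```python
import re

m = re.search(r"(?P<number>\d+)-(?P<letter>[a-z])", "34324-dfsdfs777-hhh")

all_groups = m.groups()      # every captured group as a tuple
by_name = m.group("number")  # one group looked up by its name
as_dict = m.groupdict()      # all named groups as a dictionary

print(all_groups)  # ('34324', 'd')
print(by_name)     # 34324
print(as_dict)     # {'number': '34324', 'letter': 'd'}
```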
5. Comprehensive Exercises

Check an IP address, such as 192.168.1.1. Let's see how the code is implemented.

import re

c = re.compile(r'((1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)\.){3}(1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)')
print(c.search('245.255.256.25asdsa10.11.244.10').group())  # 10.11.244.10; 245.255.256.25 is not valid, so it does not match

Here is an explanation of the pattern above. First look at ((1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)\.): 1\d\d matches 100-199, | means "or", 2[0-4]\d matches 200-249, 25[0-5] matches 250-255, [1-9]\d|\d matches 10-99 and 0-9, and \. matches a dot. {3} repeats this group three times, and the final group matches the last field. To match an IP address, first understand the form of each field, then write the pattern. If you are interested, see the calculator implementation in my Python practice directory, which also uses regular expressions.
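
The pattern above finds an address somewhere inside a longer text. To check that a whole string is exactly one IP address, one option is fullmatch. The sketch below assumes the same per-field rules; OCTET, IPV4 and is_valid_ipv4 are hypothetical names introduced for illustration.

```python
import re

# One field: 100-199, 200-249, 250-255, 10-99, or 0-9 (non-capturing group).
OCTET = r"(?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)"
# Three "field + dot" repetitions followed by the last field.
IPV4 = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}")

def is_valid_ipv4(text):
    """Return True only if the entire string is one dotted IPv4 address."""
    return IPV4.fullmatch(text) is not None

print(is_valid_ipv4("192.168.1.1"))     # True
print(is_valid_ipv4("245.255.256.25"))  # False, 256 is out of range
print(is_valid_ipv4("01.2.3.4"))        # False, leading zero is rejected

# search only needs part of the text to match, so it accepts a prefix of 256.
print(IPV4.search("1.2.3.256").group())  # 1.2.3.25
```

The last line shows why search alone is not a validity check: the final field stops at 25 and the trailing 6 is simply ignored.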
