[Python] Getting started with Python regular expressions: basic metacharacters and matching operations

Source: Internet
Author: User

Regular expressions are very useful for text processing. I used to find them difficult, but they are easy to get started with once you study them carefully.
The first thing to learn is the metacharacters:
[]: Defines a character set. A set matches exactly one character; [abc] matches a, b, or c.
- Most other metacharacters lose their special meaning inside [].

- '^' at the start of a set indicates the complement, and '-' between two characters indicates a range.

>>> import re
>>> r = 'a[abc]c'
>>> re.findall(r, 'abc aac adc ')
['abc', 'aac']
>>> r = r'a[bcd$]'
>>> re.findall(r, 'ad')
['ad']
>>> re.findall(r, 'ab')
['ab']
>>> r = r'[^abc]dd'
>>> re.findall(r, 'add cdd fdd')
['fdd']
>>> r = r'[a-z]bc'
>>> re.findall(r, 'bbc ...abc')
['bbc', 'abc']
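The rules for character sets can be checked with a few extra cases. This is a minimal sketch; the sample strings are made up for illustration and are not from the original examples.

```python
import re

# Quantifier and wildcard symbols are ordinary characters inside a set.
literal_symbols = re.findall(r'[.*+?]', 'a.b*c+d?')

# '^' negates only when it is the first character of the set.
caret_first = re.findall(r'[^ab]', 'ab^c')
caret_later = re.findall(r'[ab^]', 'ab^c')

# '-' forms a range between two characters, but is literal at the end of a set.
in_range = re.findall(r'[a-c]', 'a-cz')
literal_dash = re.findall(r'[ac-]', 'a-cz')

print(literal_symbols)  # ['.', '*', '+', '?']
print(caret_first)      # ['^', 'c']
print(caret_later)      # ['a', 'b', '^']
print(in_range)         # ['a', 'c']
print(literal_dash)     # ['a', '-', 'c']
```

Note that the backslash keeps its escaping role inside a set, so not every metacharacter becomes literal there.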

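^: Matches the beginning of the string (or of each line in multiline mode)

>>> r = r'^abc'
>>> re.findall(r, 'abcd ...')
['abc']
>>> re.findall(r, 'bcd ...')
[]

$: Matches the end of the string (or of each line in multiline mode)

>>> r = 'abc$'
>>> re.findall(r, 'bcd ...ab')
[]
>>> re.findall(r, 'bcd ...abc')
['abc']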
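By default the two anchors apply to the whole string, not to each line inside it. The sketch below uses the standard re.MULTILINE flag to show the difference; the three-line sample text is an assumption made for the example.

```python
import re

text = "abc first\nxyz second\nabc third"

# Without flags, ^ anchors only at the very start of the string.
start_only = re.findall(r'^abc', text)

# With re.MULTILINE, ^ also matches right after each newline.
every_line = re.findall(r'^abc', text, re.MULTILINE)

# Likewise, $ matches just before each newline under re.MULTILINE.
line_ends = re.findall(r'[a-z][a-z][a-z][a-z][a-z]$', text, re.MULTILINE)

print(start_only)  # ['abc']
print(every_line)  # ['abc', 'abc']
print(line_ends)   # ['first', 'econd', 'third']
```

The last pattern matches exactly five lowercase letters before a line end, which is why the second line yields 'econd' rather than 'second'.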

\: Escapes a metacharacter so that it is matched literally.

>>> # Match the literal string ^#
>>> r = r'\^#'
>>> re.findall(r, 'ddcd^#nihao')
['^#']
>>> r = r'^#'  # without the escape, ^ anchors to the start of the string, so nothing matches
>>> re.findall(r, 'ddcd^#nihao')
[]
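When the text to match comes from a variable, escaping each metacharacter by hand is error-prone. A minimal sketch using the standard re.escape function, which adds the backslashes for us (the variable names are only for illustration):

```python
import re

literal = '^#'                  # text we want to find exactly as written
pattern = re.escape(literal)    # backslash-escapes the special characters

found = re.findall(pattern, 'ddcd^#nihao')
print(found)  # ['^#']

# Another literal full of metacharacters.
found_sum = re.findall(re.escape('1+1=2'), 'we know 1+1=2 here')
print(found_sum)  # ['1+1=2']
```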

Other common \ combinations:
\d matches any decimal digit: [0-9]
\D matches any non-digit character: [^0-9]
\s matches any whitespace character: [ \t\n\r\f\v]
\S matches any non-whitespace character: [^ \t\n\r\f\v]
\w matches any letter, digit, or underscore: [a-zA-Z0-9_]
\W matches any character that is not a letter, digit, or underscore: [^a-zA-Z0-9_]
. matches any character except a line break
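The shorthand classes can be compared side by side on one short string. This is an illustrative sketch; the sample string is made up so that it contains one character of each kind.

```python
import re

sample = 'a1_ \t-'   # letter, digit, underscore, space, tab, hyphen

digits = re.findall(r'\d', sample)
non_digits = re.findall(r'\D', sample)
spaces = re.findall(r'\s', sample)
word_chars = re.findall(r'\w', sample)
non_word = re.findall(r'\W', sample)

# '.' matches every character except the newline.
dots = re.findall(r'.', 'x\ny')

print(digits)      # ['1']
print(non_digits)  # ['a', '_', ' ', '\t', '-']
print(spaces)      # [' ', '\t']
print(word_chars)  # ['a', '1', '_']
print(non_word)    # [' ', '\t', '-']
print(dots)        # ['x', 'y']
```

In Python 3, \d, \s and \w applied to str patterns also match non-ASCII digits, whitespace and letters; the bracketed sets listed above describe the ASCII case.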
 

>>> # For example, match a phone number such as 0561-4564620
>>> r = r'\d\d\d\d-\d\d\d\d\d\d\d'
>>> re.findall(r, '0561-4564620')
['0561-4564620']
>>> # For example, match a mailbox name of exactly 3 letters or digits
>>> r = r'\w\w\w@126\.com'  # the dot is escaped so that it matches a literal '.'
>>> re.findall(r, 'ly1@126.com')
['ly1@126.com']
>>> re.findall(r, 'ly*@126.com')  # special characters cannot match
[]
>>> re.findall(r, '000000')
[]
>>> # For example, a password must contain exactly 6 characters
>>> r = r'^......$'
>>> re.findall(r, '20140901')
[]
>>> re.findall(r, '11111e')
['11111e']
>>> re.findall(r, '11111e5')
[]

The examples above impose very few conditions, because each metacharacter constrains exactly one character. What if we need to match a run of arbitrary length, or a character that may or may not be present?
For example, the phone numbers 010-3333333, 0103333333, 05614444444 and 0561-3333333 start with a 3-digit or 4-digit area code, may or may not have a '-' in the middle, and end with 7 digits. This requires the repetition metacharacters.

*: The preceding character can be matched zero or more times.
a[bcd]*b matches 'abb', 'abbbccddb' and so on; the number of repetitions is unlimited.

>>> r = r'\d*b'  # a 'b' preceded by any number of digits, including none
>>> re.findall(r, '333b 4444 d4d')
['333b']
>>> re.findall(r, '333b 4444 d4d b')
['333b', 'b']

+: The preceding character must be matched one or more times, that is, at least once.

>>> # Compare with * above
>>> r = r'\d+b'
>>> re.findall(r, '333b 4444 d4d b 3b')  # the lone 'b' does not match, because at least one digit is required
['333b', '3b']

?: The preceding character is matched zero times or once; it is optional.

>>> r = r'\d-?\d'
>>> re.findall(r, '3-4 34 3--4 d-4')  # '3--4' has two '-' characters, so it does not match
['3-4', '34']

|: Alternation ("or"); matches either the expression on its left or the expression on its right.
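A short sketch of alternation follows. The sample words are invented for the example, and the (?:...) non-capturing group is a standard re feature that this article does not otherwise cover; it is used here only to limit the scope of |.

```python
import re

# Plain alternation: either whole word may match.
either = re.findall(r'cat|dog', 'cat dog bird')

# | has the lowest precedence, so it splits the entire pattern in two.
# This means "gra" OR "ey", not "gr" followed by a-or-e and then "y".
unscoped = re.findall(r'gra|ey', 'gray grey')

# A non-capturing group (?:...) restricts the alternation to one position.
scoped = re.findall(r'gr(?:a|e)y', 'gray grey groy')

print(either)    # ['cat', 'dog']
print(unscoped)  # ['gra', 'ey']
print(scoped)    # ['gray', 'grey']
```

A plain (...) group would also limit the alternation, but re.findall would then return only the group contents, which is why the non-capturing form is used in this sketch.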

{n} {m,n}: Matches exactly n times, or between m and n times.

>>> # For example, the phone numbers described above
>>> r = r'^\d{3,4}-?\d{7}'
>>> re.findall(r, '0103333333')
['0103333333']
>>> re.findall(r, '010-4444444')
['010-4444444']
>>> re.findall(r, '0561-3333333')
['0561-3333333']
>>> re.findall(r, '05614444444')
['05614444444']
>>> re.findall(r, '0561-33333333')  # why do 8 trailing digits match? The pattern does not require the string to end after 7 digits
['0561-3333333']
>>> r = r'^\d{3,4}-?\d{7}$'
>>> re.findall(r, '0561-33333333')
[]

Can {} replace the *, + and ? shown above? Of course:
* is the same as {0,}, + is the same as {1,}, and ? is the same as {0,1}
However, this is not recommended; it is better to use the symbols.
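The equivalence can be checked directly by running each symbol and its brace form on the same text. This is a small sketch; the test string reuses the one from the + example, and the `results` dictionary is only a helper for the illustration.

```python
import re

text = '333b 4444 d4d b 3b'

# Each pair holds a symbol form and its brace form.
pairs = [
    (r'\d*b', r'\d{0,}b'),
    (r'\d+b', r'\d{1,}b'),
    (r'\d?b', r'\d{0,1}b'),
]

results = {}
for symbol_form, brace_form in pairs:
    with_symbol = re.findall(symbol_form, text)
    with_braces = re.findall(brace_form, text)
    results[symbol_form] = (with_symbol, with_braces)
    print(symbol_form, with_symbol, with_symbol == with_braces)
```

Each line printed ends with True, since both spellings describe the same repetition.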

Here are some functions of the re module:
re.compile(): When a regular expression is used repeatedly, we can compile it once in advance and reuse the compiled object, which improves efficiency.

>>> r = r'^\d{3,4}-?\d{7}$'
>>> res = re.compile(r)
>>> res  # a compiled pattern object
<_sre.SRE_Pattern object at 0x0299b728>
>>> res.findall('010-4444444')
['010-4444444']
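A compiled pattern is most convenient when the same expression is applied to many strings. The sketch below is one possible way to use it; the constant name `PHONE` and the candidate list are assumptions made for the example.

```python
import re

# Compile once, then reuse the pattern object for every candidate.
PHONE = re.compile(r'^\d{3,4}-?\d{7}$')

candidates = ['010-3333333', '0103333333', '0561-33333333', 'abc']

# findall returns an empty (falsy) list when nothing matches.
valid = [s for s in candidates if PHONE.findall(s)]

print(valid)          # ['010-3333333', '0103333333']
print(PHONE.pattern)  # the original pattern string is kept on the object
```

The re module also caches recently used patterns internally, so the speed gain from compiling by hand is often modest; the main benefit is having a named, reusable object.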

This has been only a brief introduction. The metacharacters and the many functions of the re module take practice to master. Topics not covered here include greedy and non-greedy matching, the difference between search and match, splitting strings, extracting substrings, and matching word boundaries.
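As a starting point for those further topics, here is a hedged sketch that touches each one briefly using standard re features. The sample strings are invented, and each topic has more to it than one example can show.

```python
import re

html = '<b>one</b><b>two</b>'

# Greedy: .* grabs as much as possible, so both tags end up in one match.
greedy = re.findall(r'<b>.*</b>', html)
# Non-greedy: .*? stops at the first closing tag it can reach.
lazy = re.findall(r'<b>.*?</b>', html)

# match only tries at the start of the string; search scans forward.
at_start = re.match(r'\d+', 'abc 123')
anywhere = re.search(r'\d+', 'abc 123')

# split cuts the string wherever the pattern matches.
parts = re.split(r'[,;]\s*', 'a, b;c')

# \b is a word boundary, so only the standalone word 'cat' matches.
whole_words = re.findall(r'\bcat\b', 'cat concat cats cat.')

print(greedy)            # ['<b>one</b><b>two</b>']
print(lazy)              # ['<b>one</b>', '<b>two</b>']
print(at_start)          # None
print(anywhere.group())  # 123
print(parts)             # ['a', 'b', 'c']
print(whole_words)       # ['cat', 'cat']
```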


