[Python 3 Series] Regular expressions

Last Update:2017-07-21 Source: Internet

Author: User


Regular expressions, regex for short, are a way of describing text patterns. For example, \d is a regular expression that stands for a digit character, that is, any single digit from 0 to 9.

Steps for use

All of Python's regular expression functions are in the re module.

The steps for using regular expressions in Python are as follows:

① Import the regular expression module with import re.

② Create a Regex object with the re.compile() function.

③ Pass the string you want to search to the Regex object's search() method. It returns a Match object.

④ Call the Match object's group() method to get the string of text that actually matched.
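The four steps above can be sketched together as follows. The sample sentence and the variable names are assumptions chosen for illustration only.

```python
import re  # Step 1: import the regular expression module.

# Step 2: compile a pattern (three digits, a hyphen, eight digits).
phone_regex = re.compile(r'\d{3}-\d{8}')

# Step 3: search the text; the result is a Match object, or None if nothing matches.
match = phone_regex.search('My number is 021-68000000')

# Step 4: group() returns the text that actually matched.
matched_text = match.group() if match else None
print(matched_text)  # 021-68000000
```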

Character classes

Character class    Meaning

\d    Any digit from 0 to 9

\D    Any character that is not a digit from 0 to 9

\w    Any letter, digit, or underscore (a "word" character)

\W    Any character that is not a letter, digit, or underscore

\s    A space, tab, or newline (a "whitespace" character)

\S    Any character that is not a space, tab, or newline
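As a rough illustration of the character classes in the table, the sketch below applies each one to a made-up sample string with re.findall(). The sample text and variable names are assumptions, not part of the original article.

```python
import re

sample = 'Room_42 opens\tat 9 am'

digits = re.findall(r'\d', sample)       # each single digit
non_digits = re.findall(r'\D+', sample)  # runs of characters that are not digits
words = re.findall(r'\w+', sample)       # runs of letters, digits, and underscores
blanks = re.findall(r'\s', sample)       # each space, tab, or newline
non_blanks = re.findall(r'\S+', sample)  # runs of non-whitespace characters

print(digits)      # ['4', '2', '9']
print(words)       # ['Room_42', 'opens', 'at', '9', 'am']
print(non_blanks)  # ['Room_42', 'opens', 'at', '9', 'am']
```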

Regular expression symbols

? matches zero or one of the preceding group

* matches zero or more of the preceding group

+ matches one or more of the preceding group

| matches one of several alternative expressions

() parentheses create a "group"

{n} matches exactly n of the preceding group

{n,} matches n or more of the preceding group

{,m} matches 0 to m of the preceding group

{n,m} matches at least n and at most m of the preceding group

{n,m}? or *? or +? performs a non-greedy match of the preceding group

^spam means the string must begin with spam

spam$ means the string must end with spam

. matches any character except a newline

\d, \w, and \s match a digit, a word character, and a whitespace character

\D, \W, and \S match any character that is not a digit, a word character, or a whitespace character, respectively

[ABC] matches any one of the characters in the square brackets

[^ABC] matches any character that is not in the square brackets
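To make the symbol list more concrete, here is a small sketch that exercises several of the symbols. The sample strings are assumptions invented for the illustration, and it uses the module-level re.search() and re.findall() functions for brevity.

```python
import re

optional = re.search(r'Bat(wo)?man', 'Batwoman').group()    # ? : zero or one "wo"
repeated = re.search(r'Bat(wo)*man', 'Batwowowoman').group()  # * : zero or more "wo"
either = re.search(r'cat|dog', 'I have a dog').group()      # | : one of the alternatives
exact = re.search(r'(Ha){3}', 'HaHaHaHa').group()           # {n} : exactly three
at_least = re.search(r'(Ha){2,}', 'HaHaHaHa').group()       # {n,} : two or more
at_most = re.search(r'(Ha){,2}', 'HaHaHa').group()          # {,m} : up to two
starts = re.search(r'^spam', 'spam and eggs') is not None   # ^ : must begin with spam
ends = re.search(r'spam$', 'eggs and spam') is not None     # $ : must end with spam
no_newline = re.findall(r'.', 'a\nb')                       # . : skips the newline
in_class = re.findall(r'[ABC]', 'ABCDE')                    # any of A, B, C
not_in_class = re.findall(r'[^ABC]', 'ABCDE')               # anything except A, B, C

print(exact, at_least, at_most)  # HaHaHa HaHaHaHa HaHa
print(in_class, not_in_class)    # ['A', 'B', 'C'] ['D', 'E']
```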

Regular Expression methods

1. compile()

Passing a string value that represents the regular expression to re.compile() returns a Regex pattern object.

If you want whitespace and comments inside the regular expression string to be ignored, pass re.VERBOSE as the second argument.

If matching should be case-insensitive, pass re.IGNORECASE or re.I.

If you want the period character to match newlines as well, pass re.DOTALL.

re.compile() accepts only a single value as its second argument; to get around this limitation, combine several flags with the pipe character (|).

>>> import re
>>> phonenum = re.compile(r'\d\d\d-\d\d\d\d\d\d\d\d')
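Combining the three flags with the pipe character might look like the sketch below. The pattern and the test string are assumptions made up for the example; only re.VERBOSE, re.IGNORECASE, and re.DOTALL come from the text.

```python
import re

# re.VERBOSE allows whitespace and comments inside the pattern;
# re.IGNORECASE and re.DOTALL are combined with it using |.
pattern = re.compile(r'''
    hello   # the word hello, in any letter case
    .*      # any characters, including newlines (because of re.DOTALL)
    world   # the word world
    ''', re.VERBOSE | re.IGNORECASE | re.DOTALL)

text = 'HELLO\nbig\nWorld'
result = pattern.search(text).group()

# Without re.DOTALL the period stops at the first newline, so nothing matches.
without_dotall = re.compile(r'hello.*world', re.IGNORECASE).search(text)

print(repr(result))    # 'HELLO\nbig\nWorld'
print(without_dotall)  # None
```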

2. group()

The Match object has a group() method that returns the text that actually matched in the searched string.

Adding parentheses creates a "group" in the regular expression. The first pair of parentheses in the regular expression string is group 1, and the second pair is group 2. Passing the integer 1 or 2 to the Match object's group() method returns the corresponding part of the matched text. Passing 0, or no argument at all, returns the entire matched text. If you want to get all the groups at once, use the groups() method.

>>> import re
>>> phonenum = re.compile(r'(\d\d\d)-(\d\d\d\d\d\d\d\d)')
>>> mo = phonenum.search('My number is 021-68000000')
>>> print(mo.group(0))
021-68000000
>>> print(mo.group(1))
021
>>> print(mo.group(2))
68000000
>>> print(mo.groups())
('021', '68000000')

3. search()

The search() method of the Regex object scans the string passed to it for the first place where the regular expression matches. If the pattern is not found in the string, search() returns None. If the pattern is found, search() returns a Match object.

>>> import re
>>> phonenum = re.compile(r'\d\d\d-\d\d\d\d\d\d\d\d')
>>> mo = phonenum.search('My number is 021-68000000')
>>> print(mo.group())
021-68000000
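Because search() returns None when nothing matches, calling group() directly on the result can raise an AttributeError. One way to guard against that is sketched below; the helper name find_phone and the sample strings are hypothetical.

```python
import re

phonenum = re.compile(r'\d\d\d-\d\d\d\d\d\d\d\d')

def find_phone(text):
    """Return the first phone number in text, or None if there is none."""
    mo = phonenum.search(text)
    # search() returns None when nothing matches, so check before calling group().
    return mo.group() if mo is not None else None

found = find_phone('Call 021-68000000 or 010-12345678')
missing = find_phone('No number here')

print(found)    # 021-68000000 (only the first match is returned)
print(missing)  # None
```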

4. findall()

search() returns a Match object for the first matching text in the searched string, whereas the findall() method returns a list of strings containing every match in the searched string.

There are two points to note about the return value of findall():

① If it is called on a regular expression that has no groups, such as \d\d\d-\d\d\d-\d\d\d\d, it returns a list of the matching strings, such as ['123-456-7890', '000-000-0000'].

② If it is called on a regular expression that has groups, such as (\d\d\d)-(\d\d\d)-(\d\d\d\d), it returns a list of tuples of strings, such as [('123', '456', '7890'), ('000', '000', '0000')].

>>> import re
>>> phonenum = re.compile(r'(\d\d\d)')
>>> phonenum.search('68000000')
<_sre.SRE_Match object; span=(0, 3), match='680'>
>>> phonenum.findall('68000000')
['680', '000']
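The two cases described above, a pattern without groups and a pattern with several groups, can be compared side by side. This is a minimal sketch; the sample text and variable names are assumptions.

```python
import re

text = 'Cell: 123-456-7890 Work: 000-000-0000'

no_groups = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
with_groups = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')

plain = no_groups.findall(text)     # list of whole matches
tuples = with_groups.findall(text)  # list of tuples, one item per group

print(plain)   # ['123-456-7890', '000-000-0000']
print(tuples)  # [('123', '456', '7890'), ('000', '000', '0000')]
```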

5. sub()

The sub() method of the Regex object takes two arguments. The first argument is the string that replaces each match that is found. The second argument is the string to be searched with the regular expression. The sub() method returns the string after the replacements have been made.

>>> import re
>>> phonenum = re.compile(r'021-6800')
>>> phonenum.sub('8800', 'My number is 021-68000000.')
'My number is 88000000.'
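By default sub() replaces every match, not just the first one. The sketch below masks each digit; the masking character and the use of the optional count argument are choices made for this illustration rather than something the original article covers.

```python
import re

digit = re.compile(r'\d')
sentence = 'My number is 021-68000000.'

# Every digit is replaced, because sub() substitutes all matches by default.
masked = digit.sub('*', sentence)

# The optional count argument limits how many replacements are made.
masked_once = digit.sub('*', sentence, count=1)

print(masked)       # My number is ***-********.
print(masked_once)  # My number is *21-68000000.
```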

Greedy and non-greedy matching

Python's regular expressions are "greedy" by default, which means that when more than one match is possible they match the longest string they can. The "non-greedy" version of the curly braces, written with a question mark after the closing brace, matches the shortest string possible.

A question mark can have two meanings in a regular expression: it can declare a non-greedy match or mark an optional group. The two meanings are completely unrelated.

>>> import re
>>> phonenum01 = re.compile(r'(\d\d\d){1,3}')
>>> phonenum02 = re.compile(r'(\d\d\d){1,3}?')
>>> mo01 = phonenum01.search('68000000')
>>> mo02 = phonenum02.search('68000000')
>>> mo01.group()
'680000'
>>> mo02.group()
'680'
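The same distinction applies to *? and +?, and the two unrelated meanings of the question mark can be shown next to each other. This is only a sketch; the sample strings are assumptions.

```python
import re

text = '<first> and <second>'

# Greedy: .* runs to the last > in the string.
greedy = re.search(r'<.*>', text).group()

# Non-greedy: .*? stops at the first > it can.
lazy = re.search(r'<.*?>', text).group()

# Here the question mark means "optional": the u may or may not be present.
optional = re.findall(r'colou?r', 'color and colour')

print(greedy)    # <first> and <second>
print(lazy)      # <first>
print(optional)  # ['color', 'colour']
```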

This article is from the "garbled Age" blog; please keep this source attribution: http://juispan.blog.51cto.com/943137/1949567

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
