A detailed introduction to the re module of the Python standard library

Source: Internet
Author: User
This article introduces the re module of the Python standard library.

The re module provides a set of powerful regular expression tools that let you quickly check whether a given string matches a given pattern (the match function) or contains a match for it (the search function). Regular expressions are strings written in a compact (and cryptic) syntax.

1. Common methods

Method    Description
match(pattern, string, flags=0)    If the start of string matches the regular expression pattern, the corresponding Match object is returned; otherwise None is returned.
search(pattern, string, flags=0)    Scans string; if some position matches the regular expression pattern, a Match object is returned; otherwise None is returned.
sub(pattern, repl, string, count=0, flags=0)    Replaces the parts of string that match pattern with repl, performing at most count replacements.
subn(pattern, repl, string, count=0, flags=0)    Like sub, but returns a tuple consisting of the resulting string and the number of replacements made.
split(pattern, string, maxsplit=0, flags=0)    Splits string at the substrings matched by pattern.
findall(pattern, string, flags=0)    Returns all non-overlapping matches of pattern in string as a list.
compile(pattern, flags=0)    Compiles a regular expression pattern into a Pattern object, so that you can use the match and search methods of that object.
purge()    Clears the regular expression cache.
escape(string)    Puts a backslash in front of the characters in string that have special meaning in a regular expression (Python 2 escaped every character other than letters and digits).
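
The compile, purge, and escape entries in the table have no example of their own later in this article, so here is a minimal sketch of how they might be used, written for Python 3. The sample strings and variable names are made up for illustration.

```python
import re

# Compile once, then reuse the Pattern object's own match/search methods.
digits = re.compile(r'\d+')
print(digits.search('order 66 shipped').group())  # 66
print(digits.match('order 66 shipped'))           # None: no digits at the start

# re.escape makes a literal string safe to embed in a pattern.
literal = 'a.b*c'
escaped = re.escape(literal)
print(re.search(escaped, 'xx a.b*c yy').group())  # a.b*c
print(re.search(escaped, 'aXbbc'))                # None: '.' and '*' are literal now
print(re.search(literal, 'aXbbc').group())        # aXbbc: unescaped, they are special

# re.purge clears the module-level cache of compiled patterns.
re.purge()
```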
2. Special match characters

Syntax    Description
.    Matches any character except a line break
^    Matches at the start of the string
$    Matches at the end of the string
*    Matches the preceding character 0 or more times
+    Matches the preceding character 1 or more times
?    Matches the preceding character 0 times or once
{m,n}    Matches the preceding character m to n times
\    Escapes a special character
[]    Defines a character set
|    Or: matches the expression on either the left or the right
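
To make the table concrete, the following sketch applies each special character to one sample string with re.findall. The sample text is invented for illustration, and the code assumes Python 3.

```python
import re

text = 'cat\ncot caat ct dog'

print(re.findall(r'c.t', text))       # ['cat', 'cot']
print(re.findall(r'cat.cot', text))   # []: '.' does not match the line break
print(re.findall(r'^c.t', text))      # ['cat']
print(re.findall(r'd.g$', text))      # ['dog']
print(re.findall(r'ca*t', text))      # ['cat', 'caat', 'ct']
print(re.findall(r'ca+t', text))      # ['cat', 'caat']
print(re.findall(r'ca?t', text))      # ['cat', 'ct']
print(re.findall(r'ca{2,3}t', text))  # ['caat']
print(re.findall(r'c[ao]t', text))    # ['cat', 'cot']
print(re.findall(r'cat|dog', text))   # ['cat', 'dog']
print(re.findall(r'1\+1', '1+1=2'))   # ['1+1']: the backslash makes '+' literal
```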
3. Module methods

re.match(pattern, string, flags=0)

Matches from the start of the string. If pattern matches, a Match object is returned (the Match object is described later); otherwise None is returned. flags is the matching mode (described below), which controls how the regular expression is matched.

    import re
    a = 'abcdefg'
    print(re.match(r'abc', a))          # a Match object, e.g. <_sre.SRE_Match object at 0x0000000001D94578> (the repr varies by Python version)
    print(re.match(r'abc', a).group())  # abc
    print(re.match(r'cde', a))          # None: the match failed
re.search(pattern, string, flags=0)

Scans the string for the first substring that matches. If one is found, a Match object is returned; otherwise None is returned.

    import re
    a = 'abcdefg'
    print(re.search(r'bc', a))          # a Match object, e.g. <_sre.SRE_Match object at 0x0000000001D94578> (the repr varies by Python version)
    print(re.search(r'bc', a).group())  # bc
    print(re.search(r'123', a))         # None
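
Because both match and search return None on failure, calling .group() directly on the result raises an AttributeError when nothing matches. A minimal sketch of guarding against that, assuming Python 3; the helper name first_number is hypothetical and not part of the re module:

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    m = re.search(r'\d+', text)
    return m.group() if m else None

print(first_number('abc123def'))           # 123
print(first_number('abcdef'))              # None

# match only succeeds at the start of the string; search scans all of it.
print(re.match(r'\d+', 'abc123'))          # None
print(re.match(r'\d+', '123abc').group())  # 123
```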
re.sub(pattern, repl, string, count=0, flags=0)

Replaces the parts of string that match pattern with repl, performing at most count replacements (any remaining matches are left untouched; count=0 replaces all of them), and returns the resulting string.

    import re
    a = 'a1b2c3'
    print(re.sub(r'\d+', '0', a))  # replace the digits with '0'      -> a0b0c0
    print(re.sub(r'\s+', '0', a))  # replace the whitespace with '0'  -> a1b2c3
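
The count parameter is described above but not exercised. A short sketch, assuming Python 3 and reusing the same sample string:

```python
import re

a = 'a1b2c3'

limited = re.sub(r'\d', '0', a, count=2)  # stop after two replacements
unlimited = re.sub(r'\d', '0', a)         # count defaults to 0: replace all

print(limited)    # a0b0c3
print(unlimited)  # a0b0c0
```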
re.subn(pattern, repl, string, count=0, flags=0)

Works like sub(), but returns a tuple containing the new string and the number of replacements made.

    import re
    a = 'a1b2c3'
    print(re.subn(r'\d+', '0', a))  # replace the digits with '0'  -> ('a0b0c0', 3)
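
One use for the count returned by subn is to tell whether anything was replaced at all. A minimal sketch, assuming Python 3; the variable names are illustrative:

```python
import re

new_text, n = re.subn(r'\d+', '0', 'a1b2c3')
print(new_text, n)         # a0b0c0 3

# When nothing matches, the string comes back unchanged and the count is 0.
unchanged, n_none = re.subn(r'\s+', '0', 'a1b2c3')
print(unchanged, n_none)   # a1b2c3 0
```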
re.split(pattern, string, maxsplit=0, flags=0)

Splits string at each substring that matches pattern. If capturing parentheses are used in pattern, the text matched by the groups is also included in the returned list. maxsplit is the maximum number of splits to perform (0 means no limit).

    import re
    a = 'a1b1c'
    print(re.split(r'\d', a))    # ['a', 'b', 'c']
    print(re.split(r'(\d)', a))  # ['a', '1', 'b', '1', 'c']
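
The effect of maxsplit is easiest to see on a string with more separators than the limit allows. A brief sketch, assuming Python 3 and an invented sample string:

```python
import re

a = 'a1b2c3d'

two_splits = re.split(r'\d', a, maxsplit=2)    # the rest stays in one piece
one_split = re.split(r'(\d)', a, maxsplit=1)   # the group keeps the separator

print(two_splits)  # ['a', 'b', 'c3d']
print(one_split)   # ['a', '1', 'b2c3d']
```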
re.findall(pattern, string, flags=0)

Returns a list of the non-overlapping substrings in string that match pattern.

    import re
    a = 'a1b2c3d4'
    print(re.findall(r'\d', a))  # ['1', '2', '3', '4']
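
"Non-overlapping" means that the scan resumes after the end of each match, so a character is never counted twice. A small sketch, assuming Python 3:

```python
import re

# 'aaaaa' contains four overlapping occurrences of 'aa', but only two
# non-overlapping ones are reported.
pairs = re.findall(r'aa', 'aaaaa')
runs = re.findall(r'\d+', 'a1b22c333')

print(pairs)  # ['aa', 'aa']
print(runs)   # ['1', '22', '333']
```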
4. Match object

If re.match() or re.search() matches successfully, a Match object is returned. It contains a lot of information about the match, which you can obtain through the attributes and methods the Match object provides. For example:

    >>> import re
    >>> text = 'he has 2 books and 1 pen'
    >>> ob = re.search(r'(\d+)', text)
    >>> print(ob.string)    # the text used for matching
    he has 2 books and 1 pen
    >>> ob.re               # the Pattern object used for matching, i.e. re.compile(r'(\d+)')
    >>> print(ob.group())   # the string captured by one or more groups
    2
    >>> print(ob.groups())  # the strings captured by all groups, as a tuple
    ('2',)
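
Besides group() and groups(), a Match object also records where the match occurred. The following sketch, assuming Python 3, uses a two-group pattern chosen for illustration to show group(n), start(), end(), and span():

```python
import re

text = 'he has 2 books and 1 pen'
m = re.search(r'(\d+) (\w+)', text)

print(m.group())             # 2 books  (the whole match)
print(m.group(1))            # 2
print(m.group(2))            # books
print(m.groups())            # ('2', 'books')
print(m.start(), m.end())    # 7 14
print(m.span(2))             # (9, 14)  (position of the second group)
print(text[m.start():m.end()] == m.group())  # True
```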
5. Pattern object

A Pattern object is returned by re.compile(). It provides many methods with the same names as the functions of the re module, and they behave similarly. For example:

    >>> import re
    >>> pa = re.compile(r'(\d+)')
    >>> print(pa.split('he has 2 books and 1 pen'))
    ['he has ', '2', ' books and ', '1', ' pen']
    >>> print(pa.findall('he has 2 books and 1 pen'))
    ['2', '1']
    >>> print(pa.sub('much', 'he has 2 books and 1 pen'))
    he has much books and much pen
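
A typical reason to hold on to a Pattern object is to apply the same expression to many strings. A minimal sketch, assuming Python 3; the list of lines is invented sample data:

```python
import re

pa = re.compile(r'(\d+)')
lines = ['he has 2 books', 'no numbers here', '1 pen and 10 pencils']

# The same compiled pattern is reused for every line.
found = [pa.findall(line) for line in lines]
print(found)       # [['2'], [], ['1', '10']]

# The original pattern string stays available on the object.
print(pa.pattern)  # (\d+)
```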
6. Matching modes

Several matching modes can take effect at the same time by combining them with the bitwise OR operator '|', for example re.I | re.M. Below are some common flags.

  • re.I (re.IGNORECASE): case-insensitive matching

    >>> pa = re.compile('abc', re.I)
    >>> pa.findall('AbCdEfG')
    ['AbC']
  • re.L (re.LOCALE): locale-dependent character set

This flag supports locale-specific character sets. For example, in an English environment \w stands for [a-zA-Z0-9_], that is, English letters, digits, and the underscore. In a French environment some French characters would not match unless the L flag is added. However, it seems to be of no use in a Chinese environment, where it still cannot match Chinese characters. Note that in Python 3.6 and later, re.L can only be used with bytes patterns.

  • re.M (re.MULTILINE): multi-line mode, changing the behavior of '^' and '$' so that they also match at the start and end of each line

    >>> pa = re.compile(r'^\d+')
    >>> pa.findall('123 456\n789 012\n345 678')
    ['123']
    >>> pa_m = re.compile(r'^\d+', re.M)
    >>> pa_m.findall('123 456\n789 012\n345 678')
    ['123', '789', '345']
  • re.S (re.DOTALL): dot-matches-all mode, changing the behavior of '.'

By default, '.' matches any character except the line break \n. With this flag, the dot matches any character, including the line break.
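
A short sketch of the difference, assuming Python 3 and a made-up two-line string:

```python
import re

text = 'line1\nline2'

default_hit = re.search(r'.+', text).group()
dotall_hit = re.search(r'.+', text, re.S).group()

print(repr(default_hit))                      # 'line1'  ('.' stops at the line break)
print(repr(dotall_hit))                       # 'line1\nline2'
print(re.findall(r'line1.line2', text))       # []
print(re.findall(r'line1.line2', text, re.S)) # ['line1\nline2']
```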

  • re.U (re.UNICODE): interprets characters according to the Unicode character set
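
In Python 3, str patterns already match according to Unicode by default, so re.U is effectively redundant there. The sketch below assumes Python 3 and contrasts the default with re.ASCII, a Python 3 flag not covered in this article that restricts \w to [a-zA-Z0-9_]. The sample string is invented.

```python
import re

text = 'café 中文 abc'

unicode_words = re.findall(r'\w+', text)          # Unicode-aware by default
ascii_words = re.findall(r'\w+', text, re.ASCII)  # ASCII-only word characters

print(unicode_words)  # ['café', '中文', 'abc']
print(ascii_words)    # ['caf', 'abc']
```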

  • re.X (re.VERBOSE): verbose mode

In this mode the regular expression can span multiple lines, whitespace is ignored, and comments can be added. The following two regular expressions are equivalent:

    a = re.compile(r"""\d +  # the integral part
                       \.    # the decimal point
                       \d *  # some fractional digits""", re.X)
    b = re.compile(r"\d+\.\d*")

In this mode, to match a literal space you must write '\ ' (a backslash followed by a space).
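
The equivalence, the escaped space, and the combination of flags with '|' can be checked with a sketch like the following, assuming Python 3. The test strings and the names c, d, and e are illustrative only.

```python
import re

a = re.compile(r"""
    \d+   # the integral part
    \.    # the decimal point
    \d*   # some fractional digits
""", re.X)
b = re.compile(r'\d+\.\d*')

# Both patterns agree on every sample.
samples = ['3.14', '10.', 'abc', '42']
agree = all(bool(a.match(s)) == bool(b.match(s)) for s in samples)
print(agree)  # True

# An unescaped space is ignored in verbose mode; an escaped one is matched.
c = re.compile(r'hello\ world', re.X)
d = re.compile(r'hello world', re.X)
print(bool(c.match('hello world')))  # True
print(bool(d.match('hello world')))  # False
print(bool(d.match('helloworld')))   # True

# Flags combine with the bitwise OR operator.
e = re.compile(r'ab c', re.X | re.I)
print(bool(e.match('ABC')))          # True
```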

The above is a detailed introduction to the re module of the Python standard library.