Python 3: how to use regular expressions gracefully (detailed explanation, part six)

Source: Internet
Author: User

Modifying a string

Now that we've covered how to search for characters, let's talk about how a regular expression modifies a string.

The regular expression modifies the string using the following methods:

Method    Purpose
split()   Split the string wherever the regular expression matches, and return the pieces as a list
sub()     Find all substrings that match and replace them with new content
subn()    Do the same thing as sub(), but return the new string and the number of replacements



Split string

The split() method of a regular expression splits the string wherever the pattern matches and returns the pieces as a list. It is much like the split() method of strings, but it allows far more general delimiters. As you may have guessed, there is also a module-level function: re.split()

.split(string[, maxsplit=0])

Splits string wherever the regular expression matches. If capturing parentheses are used in the RE, their contents are also returned as part of the resulting list. You can limit the number of splits by passing the maxsplit parameter. If maxsplit is nonzero, at most maxsplit splits are made, and the remainder of the string is returned as the last element of the list. The default value of 0 means there is no limit.


In the example below, the delimiter is any non-alphanumeric character:

    >>> import re
    >>> p = re.compile(r'\W+')
    >>> p.split('This was a test, short and sweet, of split().')
    ['This', 'was', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
    >>> p.split('This was a test, short and sweet, of split().', 3)
    ['This', 'was', 'a', 'test, short and sweet, of split().']


Sometimes you are interested not only in the text between the delimiters but also in the delimiters themselves (that is, what the regular expression matched). If capturing parentheses are used in the RE, the delimiters are returned in the list as well:

    >>> p = re.compile(r'\W+')
    >>> p2 = re.compile(r'(\W+)')
    >>> p.split('This... is a test.')
    ['This', 'is', 'a', 'test', '']
    >>> p2.split('This... is a test.')
    ['This', '... ', 'is', ' ', 'a', ' ', 'test', '.', '']
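
One consequence of wrapping the whole pattern in a capturing group is that nothing from the input is lost, so joining the pieces gives back the original string. A minimal sketch of this (the variable names are only illustrative):

```python
import re

text = 'This... is a test.'

# Without a capturing group the delimiters are discarded.
words = re.split(r'\W+', text)

# With the whole pattern in a capturing group, the delimiters are kept
# in the result, interleaved with the text between them.
pieces = re.split(r'(\W+)', text)

# Joining the pieces reproduces the input exactly.
rebuilt = ''.join(pieces)

print(words)
print(pieces)
print(rebuilt == text)
```

This can be useful when you want to process the words of a text but keep its punctuation and spacing unchanged.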


The module-level function re.split() takes the RE as its first argument; the other parameters are the same:

    >>> re.split(r'[\W]+', 'Words, words, words.')
    ['Words', 'words', 'words', '']
    >>> re.split(r'([\W]+)', 'Words, words, words.')
    ['Words', ', ', 'words', ', ', 'words', '.', '']
    >>> re.split(r'[\W]+', 'Words, words, words.', maxsplit=1)
    ['Words', 'words, words.']
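
To illustrate the earlier point that a regular expression allows more general delimiters than the split() method of strings, here is a small sketch. The sample string and the choice of delimiters (commas, semicolons, and whitespace) are assumptions made up for this illustration:

```python
import re

line = 'apple, banana;cherry  date'

# str.split() accepts only one fixed separator string.
by_comma = line.split(', ')

# re.split() can treat any run of commas, semicolons, or whitespace
# as a single delimiter.
fields = re.split(r'[,;\s]+', line)

print(by_comma)   # ['apple', 'banana;cherry  date']
print(fields)     # ['apple', 'banana', 'cherry', 'date']
```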



Search and replace

Another common task is to find all the matching parts and replace them with a different string. The sub() method does exactly this! It takes a replacement argument, which can be either a string or a function that processes the match, and the string to be processed.

.sub(replacement, string[, count=0])

Returns the string obtained by replacing the leftmost non-overlapping occurrences of the RE in string with replacement. If no match is found, string is returned unchanged.

The optional count argument is the maximum number of matches to replace; it must be a non-negative integer. The default value of 0 means that all matches are replaced.


Below is an example of using the sub() method, which replaces colour names with the word colour:

    >>> p = re.compile('(blue|white|red)')
    >>> p.sub('colour', 'blue socks and red shoes')
    'colour socks and colour shoes'
    >>> p.sub('colour', 'blue socks and red shoes', count=1)
    'colour socks and red shoes'


The subn() method does the same thing as sub(), but returns a tuple of two elements: the new string and the number of replacements that were made.

    >>> p = re.compile('(blue|white|red)')
    >>> p.subn('colour', 'blue socks and red shoes')
    ('colour socks and colour shoes', 2)
    >>> p.subn('colour', 'no colours at all')
    ('no colours at all', 0)
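
The count returned by subn() is handy when you need to know whether anything was changed at all. A minimal sketch, in which the helper name normalise_colours is hypothetical and not part of the re module:

```python
import re

COLOUR_RE = re.compile('(blue|white|red)')

def normalise_colours(text):
    """Return (new_text, changed); changed is True if any colour name was replaced."""
    new_text, count = COLOUR_RE.subn('colour', text)
    return new_text, count > 0

print(normalise_colours('blue socks and red shoes'))
print(normalise_colours('no colours at all'))
```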


Empty matches are replaced only when they are not adjacent to a previous empty match (this is the behaviour of Python 3.7 and later):

    >>> p = re.compile('x*')
    >>> p.sub('-', 'abxd')
    '-a-b--d-'


If the replacement argument is a string, any backslash escapes in it are processed. That is, \n is converted to a newline character, \r is converted to a carriage return, and so on. An unknown escape such as \& is left alone (in Python 3.7 and later, an unknown escape made of a backslash and an ASCII letter, such as \j, raises an error). A backreference, such as \6, is replaced by the substring matched by the corresponding capturing group in the RE. This lets you insert portions of the original string into the resulting string.

The following example matches the word section followed by a string enclosed in { and }, and changes section to subsection:

    >>> p = re.compile('section{ ( [^}]* ) }', re.VERBOSE)
    >>> p.sub(r'subsection{\1}', 'section{First} section{second}')
    'subsection{First} subsection{second}'


The Little Turtle explains:
1. Do you remember? re.VERBOSE is turned on here, so whitespace in the pattern is ignored. There are a lot of symbols in this pattern, and separating them with spaces makes it less messy to read.
2. Here, the \1 in r'subsection{\1}' refers to the string matched by the group ([^}]*) in the pattern.

You can also use Python's extended syntax (?P<name>...) to define named groups; the syntax for referring to a named group is \g<name>. \g<name> is replaced by the substring matched by the group called name. In addition, \g<number> refers to a group by its number. \g<2> is equivalent to \2, but we prefer \g<2> because it avoids ambiguity. For example, \g<2>0 means a reference to group 2 followed by the literal character '0', whereas \20 would be interpreted as a reference to group 20.

    >>> p = re.compile('section{ (?P<name> [^}]* ) }', re.VERBOSE)
    >>> p.sub(r'subsection{\1}', 'section{First}')
    'subsection{First}'
    >>> p.sub(r'subsection{\g<1>}', 'section{First}')
    'subsection{First}'
    >>> p.sub(r'subsection{\g<name>}', 'section{First}')
    'subsection{First}'
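
The difference between \g<2>0 and \20 can be checked directly. The sketch below uses a made-up two-group pattern and input string; it assumes only that a reference to a group the pattern does not have is rejected with re.error:

```python
import re

p = re.compile(r'(\w+)-(\w+)')

# \g<2>0 means "the text of group 2, then a literal 0".
result = p.sub(r'\g<2>0', 'left-right')
print(result)   # right0

# \20 is read as a reference to group 20, which this pattern does not
# have, so the re module rejects the replacement template.
try:
    p.sub(r'\20', 'left-right')
    failed = False
except re.error as exc:
    failed = True
    print('error:', exc)
```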


Sometimes simple string substitution is not enough, and you need to do some extra work during the replacement. No problem, that can be done too! The replacement argument can also be a function, which is called for every non-overlapping match of the pattern. On each call, the function receives the match object as its argument, so it can use that object to compute the new string and return it.

In the example below, the replacement function replaces the decimal number with the hexadecimal number:

    >>> def hexrepl(match):
    ...     "Return the hex string for a decimal number"
    ...     value = int(match.group())
    ...     return hex(value)
    ...
    >>> p = re.compile(r'\d+')
    >>> p.sub(hexrepl, 'Call 65490 for printing, 49152 for user code.')
    'Call 0xffd2 for printing, 0xc000 for user code.'
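
A replacement function is also a natural place for a table lookup. The following is only a sketch: the TRANSLATIONS dictionary and the translate function are hypothetical names invented for this illustration.

```python
import re

# Hypothetical lookup table, used only for this illustration.
TRANSLATIONS = {'blue': 'bleu', 'white': 'blanc', 'red': 'rouge'}

p = re.compile(r'\b(blue|white|red)\b')

def translate(match):
    """Return the table entry for the colour name that was matched."""
    return TRANSLATIONS[match.group(1)]

result = p.sub(translate, 'blue socks and red shoes')
print(result)   # bleu socks and rouge shoes
```

Because the function receives the match object, it can look at match.group(1) to decide what to return, which a fixed replacement string cannot do.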


When using the module-level re.sub() function, the regular expression pattern is passed as the first argument. The pattern can be a string or a compiled object. If you need to specify regular expression flags, you can pass a compiled object as the first argument, or use an inline modifier in the pattern string; for example, sub("(?i)b+", "x", "bbbb BBBB") returns 'x x'. In Python 3, re.sub() also accepts a flags keyword argument.
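
The ways of making a module-level substitution case-insensitive can be compared side by side. This is a small sketch using the same sample string; the third variant relies on the flags keyword argument that re.sub() accepts in Python 3:

```python
import re

text = 'bbbb BBBB'

# Inline modifier inside the pattern string.
inline = re.sub('(?i)b+', 'x', text)

# Flag supplied when compiling; the compiled object is passed as the pattern.
compiled = re.sub(re.compile('b+', re.IGNORECASE), 'x', text)

# Flag passed by keyword to the module-level function.
keyword = re.sub('b+', 'x', text, flags=re.IGNORECASE)

print(inline, compiled, keyword, sep=' | ')   # x x | x x | x x
```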
