Python -- Regular Expression (4)

Source: Internet
Author: User
Tags: alphanumeric characters

5. Modify the string

So far, we have only searched a static string. Regular expressions are also commonly used to modify strings, and pattern objects provide the following methods for doing so:

Method/attribute    Function
split()             Splits the string into a list wherever the regular expression matches.
sub()               Finds all substrings that the regular expression matches and replaces them with a new string.
subn()              Does the same thing as sub(), but returns the new string and the number of replacements.
--------------------------------------------------

Split the string
The split() method of a pattern object splits a string wherever the regular expression matches and returns the pieces as a list. It is similar to the split() method of a Python string, but it is much more general about what can act as a delimiter: the string method only supports splitting on whitespace or on a fixed string. As you would expect, there is also a module-level function, re.split().

split(string[, maxsplit=0])
Splits string wherever the regular expression matches. If the regular expression contains capturing groups, their contents are also returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit splits are performed.

You can limit the number of splits by passing a value for maxsplit. When maxsplit is nonzero, at most maxsplit splits are made, and the remainder of the string is returned as the final element of the list. In the following example, the delimiter is any sequence of non-alphanumeric characters.

>>> import re
>>> p = re.compile(r'\W+')
>>> p.split('This is a test,short and sweet,of split().')
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
>>> p.split('This is a test,short and sweet,of split().', 3)
['This', 'is', 'a', 'test,short and sweet,of split().']
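
To make the contrast with the string method concrete, here is a minimal sketch. The sample text and the names `delimiter`, `by_comma`, `all_fields`, and `first_two` are made up for illustration and do not come from the original article.

```python
import re

text = 'alpha, beta;gamma   delta'

# str.split() handles only one fixed separator (or runs of whitespace).
by_comma = text.split(', ')

# A pattern can treat commas, semicolons, and whitespace as one delimiter class.
delimiter = re.compile(r'[,;\s]+')
all_fields = delimiter.split(text)

# maxsplit=2 performs at most two splits; the rest stays in the last item.
first_two = delimiter.split(text, maxsplit=2)

print(by_comma)    # ['alpha', 'beta;gamma   delta']
print(all_fields)  # ['alpha', 'beta', 'gamma', 'delta']
print(first_two)   # ['alpha', 'beta', 'gamma   delta']
```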
Sometimes you are interested not only in the text between delimiters but also in the delimiters themselves. If the regular expression contains capturing groups, their contents are also returned as part of the list. Compare the following calls:

>>> p = re.compile(r'\W+')
>>> p2 = re.compile(r'(\W+)')
>>> p.split('This... is a test.')
['This', 'is', 'a', 'test', '']
>>> p2.split('This... is a test.')
['This', '... ', 'is', ' ', 'a', ' ', 'test', '.', '']
The module-level function re.split() works the same way, except that it takes the regular expression as its first argument.
>>> re.split(r'[\W]+', 'Words,words,words.')
['Words', 'words', 'words', '']
>>> re.split(r'([\W]+)', 'Words,words,words.')
['Words', ',', 'words', ',', 'words', '.', '']
>>> re.split(r'[\W]+', 'Words,words,words.', 1)
['Words', 'words,words.']
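
One consequence of capturing the delimiters is that nothing from the input is lost, so joining the pieces reproduces the original string. The sketch below illustrates this with the same sample string; the variable names are illustrative only, and passing maxsplit by keyword is a style choice rather than something the article prescribes.

```python
import re

text = 'Words,words,words.'

# With a capturing group, the separators are kept in the result ...
parts = re.split(r'(\W+)', text)

# ... so concatenating the pieces reproduces the input exactly.
rebuilt = ''.join(parts)

# Passing maxsplit by keyword keeps the call unambiguous.
head, tail = re.split(r'\W+', text, maxsplit=1)

print(parts)       # ['Words', ',', 'words', ',', 'words', '.', '']
print(rebuilt)     # Words,words,words.
print(head, tail)  # Words words,words.
```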
--------------------------------------------------
Search and replace
Another common task is to find all the matches for a pattern and replace them with a different string. The sub() method takes a replacement value, which can be either a string or a function, and the string to be processed.

sub(replacement, string[, count=0])
Returns the string obtained by replacing the leftmost non-overlapping matches of the regular expression in string with replacement. If the pattern is not found, string is returned unchanged.
The optional argument count is the maximum number of matches to replace. count must be a non-negative integer. The default value of 0 means that all matches are replaced.

The following is a simple example of using the sub() method. It replaces colour names with the word colour.
>>> p = re.compile('(blue|white|red)')
>>> p.sub('colour', 'blue socks and red shoes')
'colour socks and colour shoes'
>>> p.sub('colour', 'blue socks and red shoes', 1)
'colour socks and red shoes'
The subn() method does the same work as sub(), but returns a 2-tuple containing the new string and the number of replacements that were made:
>>> p = re.compile('blue|white|red')
>>> p.subn('colour', 'blue socks and red shoes')
('colour socks and colour shoes', 2)
>>> p.subn('colour', 'no colours at all')
('no colours at all', 0)
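
As a small sketch of how count and the subn() return value might be used together (the sample sentence and variable names here are invented for illustration):

```python
import re

colour = re.compile(r'blue|white|red')
text = 'blue socks, red shoes, white hat'

# count=2 replaces only the first two matches, scanning left to right.
limited = colour.sub('colour', text, count=2)

# subn() returns (new_string, number_of_substitutions_made).
replaced, n = colour.subn('colour', text)

# The count is handy for detecting whether anything changed at all.
unchanged, zero = colour.subn('colour', 'no colours at all')

print(limited)          # colour socks, colour shoes, white hat
print(replaced, n)      # colour socks, colour shoes, colour hat 3
print(unchanged, zero)  # no colours at all 0
```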
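Empty matches are replaced only when they are not adjacent to a previous empty match. The result shown is for Python 3.7 and later; earlier versions did not replace an empty match adjacent to any previous match and returned '-a-b-d-'.
>>> p = re.compile('x*')
>>> p.sub('-', 'abxd')
'-a-b--d-'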
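
The sketch below shows where empty matches come from, using an input whose result does not depend on the Python version. The names `star` and `plus` are illustrative; switching to `x+` is one possible way to avoid empty matches, not something the article requires.

```python
import re

star = re.compile(r'x*')

# 'x*' can match the empty string, so a match exists at every position
# where no 'x' is present, including the very start and end.
with_empty = star.sub('-', 'abc')

# Requiring at least one 'x' avoids empty matches entirely.
plus = re.compile(r'x+')
without_empty = plus.sub('-', 'abxd')

print(with_empty)     # -a-b-c-
print(without_empty)  # ab-d
```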
If replacement is a string, any backslash escapes in it are processed. For example, \n is converted to a newline character, \r is converted to a carriage return, and so on. Unknown escapes such as \& are left alone; since Python 3.7, an unknown escape made of a backslash and an ASCII letter, such as \j, raises an error instead of being left unchanged. Backreferences, such as \6, are replaced with the substring matched by the corresponding capturing group in the RE. This lets you insert parts of the original text into the replacement string.

The following example matches the word section followed by a string enclosed in braces {}, and changes section to subsection:
>>> p = re.compile('section{([^}]*)}', re.VERBOSE)
>>> p.sub(r'subsection{\1}', 'section{First} section{Second}')
'subsection{First} subsection{Second}'
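
Here is a brief sketch of both kinds of escape processing in a replacement string. The `key=value` input and the name `pair` are made up for this example.

```python
import re

pair = re.compile(r'(\w+)=(\w+)')

# \2 and \1 insert the text captured by groups 2 and 1, swapping each pair.
swapped = pair.sub(r'\2=\1', 'key=value name=id')

# Even in a raw replacement string, sub() turns \n into a newline.
lines = pair.sub(r'\1\n', 'a=1 b=2')

print(swapped)      # value=key id=name
print(repr(lines))  # 'a\n b\n'
```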
You can also refer to named groups defined with the Python extension syntax (?P<name>...). \g<name> is replaced by the substring matched by the group named "name", and \g<number> refers to a group by its number. \g<2> is therefore equivalent to \2, but it is unambiguous in a replacement string such as \g<2>0, which means the contents of group 2 followed by the literal character '0'. Written as \20, it would be interpreted as a reference to group 20. The following three substitutions are equivalent, but use three different forms of the replacement string:
>>> p = re.compile('section{ (?P<name> [^}]*) }', re.VERBOSE)
>>> p.sub(r'subsection{\1}', 'section{First}')
'subsection{First}'
>>> p.sub(r'subsection{\g<1>}', 'section{First}')
'subsection{First}'
>>> p.sub(r'subsection{\g<name>}', 'section{First}')
'subsection{First}'
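
The ambiguity between \g<1>0 and \10 can be seen in a short sketch. The patterns, the group name `num`, and the sample text are illustrative assumptions; the failing call is wrapped in try/except because a reference to a group the pattern does not define raises re.error.

```python
import re

digits = re.compile(r'(\d+)')

# \g<1>0 means "group 1, then a literal 0", so a zero is appended to each number.
times_ten = digits.sub(r'\g<1>0', '7 apples, 12 pears')

# The same works with a named group (the name 'num' is hypothetical).
named = re.compile(r'(?P<num>\d+)')
named_times_ten = named.sub(r'\g<num>0', '7 apples, 12 pears')

# \10 is read as a reference to group 10, which this pattern does not have.
try:
    digits.sub(r'\10', '7 apples')
    group_ten_accepted = True
except re.error:
    group_ten_accepted = False

print(times_ten)           # 70 apples, 120 pears
print(named_times_ten)     # 70 apples, 120 pears
print(group_ten_accepted)  # False
```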
    
   
replacement can also be a function, which gives you even more control. If replacement is a function, it is called for every non-overlapping match of the pattern. On each call, the function receives the match object for that match as its argument, and it can use this object to compute the replacement string and return it.

In the following example, the replacement function converts decimal numbers into hexadecimal:
>>> def hexrepl(match):
...     '''Return the hex string for a decimal number'''
...     value = int(match.group())
...     return hex(value)
...
>>> p = re.compile(r'\d+')
>>> p.sub(hexrepl, 'Call 65490 for printing, 49152 for user code.')
'Call 0xffd2 for printing, 0xc000 for user code.'
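
A replacement function can also read individual groups from the match object. The following is a minimal sketch under made-up assumptions: the `UNITS` table, the `to_metres` function, and the sample sentence are hypothetical and only illustrate the technique.

```python
import re

# Hypothetical conversion table used only for this illustration.
UNITS = {'km': 1000, 'm': 1}

def to_metres(match):
    """Return the matched distance converted to metres."""
    value, unit = match.group(1), match.group(2)
    return f'{int(value) * UNITS[unit]} m'

distance = re.compile(r'(\d+)\s*(km|m)\b')
converted = distance.sub(to_metres, '5 km then 300 m')

print(converted)  # 5000 m then 300 m
```

Because the function runs once per match, any lookup or arithmetic that depends on the matched text can be done at replacement time.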
When you use the module-level function re.sub(), the pattern is passed as the first argument. The pattern can be a string or a compiled pattern object. To specify regular expression flags, either pass a compiled pattern object as the first argument or use an embedded modifier in the pattern string; for example, sub("(?i)b+", "x", "bbbb BBBB") returns 'x x'. Current Python versions also accept a flags argument in re.sub().
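
The options for making a module-level substitution case-insensitive can be compared side by side. This sketch reuses the example string from the text; the variable names are illustrative.

```python
import re

text = 'bbbb BBBB'

# 1. Embedded modifier inside the pattern string.
inline = re.sub(r'(?i)b+', 'x', text)

# 2. A precompiled pattern object carrying the flag.
compiled = re.sub(re.compile(r'b+', re.IGNORECASE), 'x', text)

# 3. The flags keyword argument of re.sub().
keyword = re.sub(r'b+', 'x', text, flags=re.IGNORECASE)

print(inline, compiled, keyword, sep=' | ')  # x x | x x | x x
```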
