Python Regular Expressions



References:

Blog: Python Regular Expression Guide

Blog: Re module of Python


I would like to thank the bloggers for their selfless dedication. This article mainly introduces a few simple uses of the re module in Python. For detailed usage, please refer to the two blog posts listed above.

 

Python is very powerful at text processing, thanks in large part to modules such as re, which is an extremely capable tool for working with text. Other languages also provide regular expression modules, of course, but I personally feel that Python's re module does the job very well.

 

So what is a regular expression? It sounds mysterious. Put bluntly: suppose I give you a large text file and ask you to find the strings in it that follow a certain rule. How would you find them? For example, if you want to find the exact string ABCDabcd, you might say you can simply use the str methods to search for it. But what if you want to find something like ABCD???abcd, where each question mark stands for any digit, that is, ABCD followed by three digits followed by abcd? That is much harder with plain string searching, and it is exactly the kind of problem regular expressions handle well.
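The "ABCD, three digits, abcd" search described above could be sketched roughly like this. The sample text is made up purely for illustration.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "noise ABCD123abcd more noise ABCDxyzabcd ABCD987abcd"

# \d{3} matches exactly three digits between the two literal parts.
pattern = r"ABCD\d{3}abcd"

matches = re.findall(pattern, text)
print(matches)  # ['ABCD123abcd', 'ABCD987abcd']
```

Note that ABCDxyzabcd is skipped, because xyz are not digits.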

 

A brief introduction to the process of using regular expressions:

A. Initialize a regular expression engine (compile the pattern).

B. Use this engine to search for matching results in a given text.
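As a minimal sketch of these two steps, assuming a made-up sample sentence and the variable name engine (both are only for illustration):

```python
import re

# Step A: compile the pattern once to get a reusable Pattern object.
engine = re.compile(r"\d+")

# Step B: use the compiled object to search a given text.
text = "order 66 shipped in 3 days"  # sample text, made up
first = engine.search(text)
all_numbers = engine.findall(text)

print(first.group())  # 66
print(all_numbers)    # ['66', '3']
```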

 

Let's take a look at the re module's syntax.




Functions in re:

 

re.compile(pattern, flags=0)

Returns a compiled regular expression object (a Pattern object). The following two statements are equivalent:

Statement 1:

prog = re.compile(pattern)

result = prog.match(string)

Statement 2:

result = re.match(pattern, string)
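To see that the two statements behave the same way, here is a small check. The pattern and string values are placeholders chosen for this sketch.

```python
import re

pattern = r"[a-z]+"     # placeholder pattern for this sketch
string = "hello world"  # placeholder input

# Statement 1: compile first, then match with the compiled object.
prog = re.compile(pattern)
result1 = prog.match(string)

# Statement 2: let the module-level function do both steps.
result2 = re.match(pattern, string)

print(result1.group(), result2.group())  # hello hello
```

Compiling first is mostly worthwhile when the same pattern is reused many times.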


re.match(pattern, string, flags=0)

Tries to match at the beginning of string. If the pattern does not match at the beginning, None is returned. If it does match, a single Match object is returned. (The usage of the Match object is introduced later.)
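A short illustration of the "beginning only" behavior, using made-up inputs:

```python
import re

m1 = re.match(r"\d+", "123abc")  # digits at the very start
m2 = re.match(r"\d+", "abc123")  # digits exist, but not at the start

print(m1.group())  # 123
print(m2)          # None
```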

 

re.search(pattern, string, flags=0)

Scans the entire string. If no match exists, None is returned. Otherwise, only the Match object for the first match is returned.
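For comparison, a sketch of search() on made-up inputs. It finds a match anywhere in the string, but still reports only the first one:

```python
import re

m = re.search(r"\d+", "abc123def456")
missing = re.search(r"\d+", "no digits here")

print(m.group())  # 123
print(m.span())   # (3, 6)
print(missing)    # None
```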

 

re.split(pattern, string, maxsplit=0, flags=0)

Splits string by the occurrences of the pattern. If the pattern contains capturing parentheses, the text matched by the groups is also included in the returned list. maxsplit is the maximum number of splits: maxsplit=1 splits once and returns the remainder of the string as the last element. The default value is 0, which means no limit.

Example:

>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split(r'(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
>>> re.split(r'\W+', 'Words, words, words.', maxsplit=1)
['Words', 'words, words.']
>>> re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)
['0', '3', '9']

This may look hard to understand at first. As a hint, "\W" matches a non-word character, and "+" matches the preceding element one or more times. With that in mind the examples are easier to read. Also, if the pattern matches at the beginning or end of the string, the returned list starts or ends with an empty string. If the pattern does not match at all, a list containing only the entire string is returned.
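The two edge cases just mentioned can be checked with a small sketch (the input strings are made up):

```python
import re

# The separator matches at both ends, so the list starts and ends with ''.
both_ends = re.split(r"\W+", "...words, words...")
print(both_ends)  # ['', 'words', 'words', '']

# No match anywhere: the whole string comes back as a one-element list.
no_match = re.split(r"\d+", "no digits")
print(no_match)  # ['no digits']
```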

 

re.findall(pattern, string, flags=0)

Finds all non-overlapping substrings that match the pattern and returns them as a list, in the order found when scanning from left to right. If no match exists, an empty list is returned. (I think this is the most commonly used function.)
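A minimal findall() sketch on made-up text, covering both the normal case and the empty-list case:

```python
import re

text = "one1two22three333"  # sample text, made up

found = re.findall(r"\d+", text)
nothing = re.findall(r"\d+", "none")

print(found)    # ['1', '22', '333']
print(nothing)  # []
```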

 

re.finditer(pattern, string, flags=0)

Returns an iterator that yields each match result (a Match object) in order.

Example:

import re

p = re.compile(r'\d+')
for m in p.finditer('one1two2three3four4'):
    print(m.group(), end=' ')

### Output ###
# 1 2 3 4

 

re.sub(pattern, repl, string, count=0, flags=0)

Replaces each matched substring in string with repl and returns the resulting string.

When repl is a string, you can use \id, \g<id>, or \g<name> to reference a group. \0 cannot be used as a group reference; write \g<0> to insert the entire match.

When repl is a function, it must accept exactly one argument (a Match object) and return the replacement string (the returned string is inserted as-is, so group references in it are not expanded).

count specifies the maximum number of replacements. If it is not specified (or is 0), all matches are replaced.
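The three ways of calling sub() described above might look like this in practice. The helper name double and the sample strings are invented for this sketch.

```python
import re

# String replacement: \2 and \1 swap the two captured words.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
print(swapped)  # world hello

# Function replacement: called with each Match object, returns the new text.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, "3 apples and 10 pears")
print(doubled)  # 6 apples and 20 pears

# count limits how many matches are replaced.
limited = re.sub(r"\d+", "N", "1 2 3", count=2)
print(limited)  # N N 3
```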

 

re.subn(pattern, repl, string, count=0, flags=0)

Does the same thing as sub(), but returns a tuple (new_string, number_of_subs_made).
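A quick subn() sketch with a made-up input, unpacking the returned tuple:

```python
import re

new_string, number_of_subs_made = re.subn(r"\d+", "#", "a1b22c333")

print(new_string)           # a#b#c#
print(number_of_subs_made)  # 3
```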

 


Match object:


A Match object is the value returned by match() and search(), and the value yielded by the iterator from finditer().

Attributes:

string: The text used for matching.

re: The Pattern object used for matching.

pos: The index in the text at which the regular expression started searching. It is the same as the pos argument passed to Pattern.match() or Pattern.search().

endpos: The index in the text at which the regular expression stopped searching. It is the same as the endpos argument passed to Pattern.match() or Pattern.search().

lastindex: The integer index of the last matched capturing group. If no group was matched, the value is None.

lastgroup: The name of the last matched capturing group. If that group has no name, or no group was matched, the value is None.
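To make the attributes concrete, here is a small sketch. The pattern, the sample text, and the starting position 5 are all arbitrary choices for illustration.

```python
import re

pattern = re.compile(r"(?P<key>\w+)=(?P<value>\d+)")
text = "x=10; y=20"  # sample text, made up

# Start searching at index 5, so the first pair (x=10) is skipped.
m = pattern.search(text, 5)

print(m.group())          # y=20
print(m.string == text)   # True
print(m.re is pattern)    # True
print(m.pos, m.endpos)    # 5 10
print(m.lastindex)        # 2
print(m.lastgroup)        # value
```

When endpos is not passed, it defaults to the length of the string, which is 10 here.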

 

Methods:

group([group1, ...]):

Returns the string captured by one or more groups. If multiple arguments are given, the results are returned as a tuple. group1 can be a group number or a group name; number 0 refers to the entire matched substring. With no arguments, group(0) is returned. A group that did not capture anything returns None; a group that captured several times returns the last captured substring.

groups([default]):

Returns the strings captured by all groups as a tuple. It is roughly equivalent to calling group(1, 2, ..., last). default is the value used for groups that did not capture anything; it defaults to None.

groupdict([default]):

Returns a dictionary with the names of the named groups as keys and the captured substrings as values. Groups without a name are not included. default has the same meaning as in groups().
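The following sketch exercises group(), groups(), and groupdict() on a made-up input. The pattern has one unnamed group, one named group, and one optional group that does not take part in the match.

```python
import re

m = re.match(r"(\w+) (?P<last>\w+)(?: (\d+))?", "Isaac Newton")

print(m.group())           # Isaac Newton
print(m.group(1, "last"))  # ('Isaac', 'Newton')
print(m.group(3))          # None (the optional group captured nothing)
print(m.groups())          # ('Isaac', 'Newton', None)
print(m.groups("n/a"))     # ('Isaac', 'Newton', 'n/a')
print(m.groupdict())       # {'last': 'Newton'}

# A group that captures several times keeps only its last capture.
repeated = re.match(r"(\d)+", "123")
print(repeated.group(1))   # 3
```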

start([group]):

Returns the index in the string where the substring captured by the specified group starts (the index of its first character). group defaults to 0.

end([group]):

Returns the index in the string where the substring captured by the specified group ends (the index of its last character + 1). group defaults to 0.

span([group]):

Returns (start(group), end(group)).
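Because end() is the last index + 1, the start and end values can be used directly as slice bounds. A small sketch on made-up text:

```python
import re

text = "price: 42 USD"  # sample text, made up
m = re.search(r"(\d+) (\w+)", text)

print(m.start(), m.end())  # 7 13
print(m.span(1))           # (7, 9)
print(m.span(2))           # (10, 13)

# Slicing with start/end gives back exactly what the group captured.
currency = text[m.start(2):m.end(2)]
print(currency)            # USD
```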

expand(template):

Substitutes the matched groups into template and returns the result. You can use \id, \g<id>, or \g<name> in template to reference groups; \0 cannot be used as a group reference. \id and \g<id> are equivalent, but \10 is interpreted as the 10th group. To express \1 followed by the character '0', you must write \g<1>0.
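A short expand() sketch with a made-up pattern and input, including the \g<1>0 case mentioned above:

```python
import re

m = re.match(r"(\d+)-(?P<name>\w+)", "7-seven")

by_name = m.expand(r"\g<name> is \1")
print(by_name)     # seven is 7

# \g<1>0 means "group 1, then a literal 0"; \10 would mean group 10.
with_zero = m.expand(r"\g<1>0")
print(with_zero)   # 70
```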


