Python Regular Expressions

Last Update:2014-11-12 Source: Internet

Author: User



References:

Blog: Python Regular Expression Guide

Blog: re module of Python

Many thanks to these bloggers for generously sharing their work. This article introduces a few simple uses of Python's re module; for more detailed usage, please refer to the two blog posts listed above.

Python is very capable at text processing, thanks in large part to modules such as re, which is an extremely powerful tool for working with text. Other languages provide regular expression modules too, but I personally feel that Python's re module does the job very well.

So what is a regular expression? It sounds mysterious, but put plainly: suppose I give you a large text file and ask you to find every piece of text in it that follows a certain pattern. How would you do it? If you need to find a literal string such as ABCDabcd, you might say you can simply search for it with the methods of the str class. But what if you need to find text of the form ABCD???abcd, where each question mark stands for any digit, that is, ABCD, then three digits, then abcd? That is much harder with plain string searching. Regular expressions are designed to handle exactly this kind of problem.
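To make that concrete, here is a minimal sketch of the "ABCD, three digits, abcd" search. The sample text and variable names are made up for illustration; \d{3} is the standard way to say "exactly three digits".

```python
import re

# Hypothetical sample text: only the middle token has the
# "ABCD + three digits + abcd" shape.
text = "ABCDabcd ABCD123abcd ABCD12abcd"

# \d{3} means "exactly three digits".
found = re.findall(r"ABCD\d{3}abcd", text)
print(found)  # ['ABCD123abcd']
```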

A brief outline of how regular expressions are used:

A. Compile a regular expression into a pattern object (the "engine").

B. Use that object to search a given text for matching results.
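The two steps can be sketched roughly as follows. The pattern, the sample text, and the names engine and result are only illustrative.

```python
import re

# Step A: compile the pattern into a reusable pattern object.
engine = re.compile(r"\d+")

# Step B: use that object to search a given text (sample text is made up).
result = engine.search("order 42 shipped")
if result is not None:
    print(result.group())  # 42
```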

Let's look at the functions that the re module provides.

Functions in re:

re.compile(pattern, flags=0)

Compiles a pattern and returns a regular expression (pattern) object. The following two statements are equivalent:

Statement 1:

prog = re.compile(pattern)

result = prog.match(string)

Statement 2:

result = re.match(pattern, string)

re.match(pattern, string, flags=0)

Matches only at the beginning of string. If the pattern does not match at the beginning, None is returned. If it matches, a match object for that match is returned. (Match objects are described later in this article.)

re.search(pattern, string, flags=0)

Scans through the entire string. If no match is found, None is returned. Otherwise, a match object for the first match is returned.
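The difference between match() and search() is easiest to see side by side. This is a small sketch with a made-up sample string:

```python
import re

text = "say hello"

# match() only tries at index 0, so this fails.
at_start = re.match(r"hello", text)

# search() scans forward and finds "hello" later in the string.
anywhere = re.search(r"hello", text)

print(at_start)         # None
print(anywhere.span())  # (4, 9)
```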

re.split(pattern, string, maxsplit=0, flags=0)

Splits string at each match of the pattern. If the pattern contains capturing parentheses, the text matched by the groups is also included in the returned list. maxsplit is the maximum number of splits: with maxsplit=1 the string is split at most once, and the remainder is returned as the last element. The default value is 0, which means no limit.

Example:

>>> re.split(r'\W+', 'Words, words, words.')

['Words', 'words', 'words', '']

>>> re.split(r'(\W+)', 'Words, words, words.')

['Words', ', ', 'words', ', ', 'words', '.', '']

>>> re.split(r'\W+', 'Words, words, words.', maxsplit=1)

['Words', 'words, words.']

>>> re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)

['0', '3', '9']

This may look hard to follow at first. As a hint, "\W" matches a non-word character, and "+" matches the preceding item one or more times. With that, the examples are easier to read. Also, if the separator matches at the beginning or end of the string, the returned list starts or ends with an empty string. If the pattern does not match at all, a list containing the whole string is returned.
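The two edge cases just mentioned can be checked with a short sketch (the sample strings are only illustrative):

```python
import re

# The separator matches at both ends, so the list starts and ends with ''.
edges = re.split(r"\W+", "...words, words...")
print(edges)  # ['', 'words', 'words', '']

# No separator is found: the whole string comes back as a one-element list.
whole = re.split(r"\W+", "words")
print(whole)  # ['words']
```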

re.findall(pattern, string, flags=0)

Finds all non-overlapping substrings that match the pattern and returns them as a list, in the order they are found from left to right. If nothing matches, an empty list is returned. (I think this is the most commonly used function.)
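A quick sketch of findall() with made-up input. The last call shows one detail worth knowing: when the pattern contains more than one capturing group, each list item is a tuple of the groups rather than the whole match.

```python
import re

numbers = re.findall(r"\d+", "one1two22three333")
print(numbers)  # ['1', '22', '333']

nothing = re.findall(r"\d+", "no digits here")
print(nothing)  # []

# With several capturing groups, each item is a tuple of the groups.
pairs = re.findall(r"(\w+)=(\d+)", "a=1, b=2")
print(pairs)  # [('a', '1'), ('b', '2')]
```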

re.finditer(pattern, string, flags=0)

Returns an iterator that yields a match object for each match, in order.

Example:

import re

# Compile a pattern that matches one or more digits.
p = re.compile(r'\d+')

# Print each match, separated by spaces.
for m in p.finditer('one1two2three3four4'):
    print(m.group(), end=' ')

### Output ###
# 1 2 3 4

re.sub(pattern, repl, string, count=0, flags=0)

Replaces each matched substring in string with repl and returns the resulting string.

When repl is a string, you can use \id, \g<id>, or \g<name> to reference a group. \0 cannot be used as a group reference; to insert the entire match, use \g<0>.

When repl is a function, it must accept a single argument (the match object) and return the replacement string. The returned string is used literally, so group references in it are not expanded.

count specifies the maximum number of replacements. If it is omitted or 0, all matches are replaced.
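Here is a small sketch of the three points above: a string repl with group references, a function repl, and count. The sample strings and the helper name double are made up for illustration.

```python
import re

# String repl: \N and \g<name> refer to captured groups.
swapped = re.sub(r"(?P<y>\d+)-(\d+)-(\d+)", r"\3/\2/\g<y>", "2014-11-12")
print(swapped)  # 12/11/2014

# Function repl: receives the match object, returns the replacement text.
def double(m):
    return str(int(m.group()) * 2)

doubled = re.sub(r"\d+", double, "3 apples and 10 pears")
print(doubled)  # 6 apples and 20 pears

# count limits how many matches are replaced.
first_only = re.sub(r"\d+", "N", "1 2 3", count=1)
print(first_only)  # N 2 3
```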

re.subn(pattern, repl, string, count=0, flags=0)

Does the same thing as sub(), but returns a tuple (new_string, number_of_subs_made).
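For example, collapsing runs of whitespace and counting how many runs were replaced might look like this (the input string is only illustrative):

```python
import re

# subn() returns both the new string and the number of replacements.
new_string, number_of_subs_made = re.subn(r"\s+", " ", "a   b \t c")
print(new_string)           # a b c
print(number_of_subs_made)  # 2
```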

Match object:

A match object is the value returned by match() and search(), and the type of item yielded by finditer().

Attributes:

string: The text that was passed in for matching.

re: The pattern object that produced this match.

pos: The index in the text at which the regular expression started searching. It has the same value as the pos parameter passed to Pattern.match() or Pattern.search().

endpos: The index in the text at which the regular expression stopped searching. It has the same value as the endpos parameter passed to Pattern.match() or Pattern.search().

lastindex: The integer index of the last capturing group that matched. If no group matched, the value is None.

lastgroup: The name of the last capturing group that matched. If that group has no name, or no group matched, the value is None.
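These attributes can be inspected directly. The sketch below uses a made-up pattern with one named group and one optional unnamed group, and passes pos and endpos explicitly to Pattern.search():

```python
import re

pattern = re.compile(r"(?P<word>[a-z]+)(\d+)?")

# Search only the slice [2, 9) of the text: pos=2, endpos=9.
m = pattern.search("--abc123--", 2, 9)

print(m.string)         # --abc123--
print(m.re is pattern)  # True
print(m.pos, m.endpos)  # 2 9
print(m.lastindex)      # 2
print(m.lastgroup)      # None, because group 2 has no name

# Here only the named group matches, so lastgroup is its name.
m2 = pattern.search("abc")
print(m2.lastindex, m2.lastgroup)  # 1 word
```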

Methods:

group([group1, ...]):

Returns the string captured by one or more groups. If several arguments are given, the results are returned as a tuple. Each argument can be a group number or a group name; number 0 stands for the entire matched substring. With no arguments, group(0) is returned. A group that did not capture anything returns None; a group that captured several times returns the last substring it captured.

groups([default]):

Returns a tuple of the strings captured by all groups. It is roughly equivalent to calling group(1, 2, ..., last). default is the value used for groups that did not capture anything; it defaults to None.

groupdict([default]):

Returns a dictionary whose keys are the names of the named groups and whose values are the substrings they captured. Groups without a name are not included. default has the same meaning as in groups().
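A short sketch of group(), groups(), and groupdict(), using a made-up pattern in which the third group is optional and does not take part in the match:

```python
import re

m = re.match(r"(?P<user>\w+)@(?P<host>\w+)(\.com)?", "alice@example")

print(m.group())           # alice@example
print(m.group(1, "host"))  # ('alice', 'example')
print(m.group(3))          # None, the optional group captured nothing
print(m.groups())          # ('alice', 'example', None)
print(m.groups(""))        # ('alice', 'example', '')
print(m.groupdict())       # {'user': 'alice', 'host': 'example'}

# A group that captures several times keeps only its last capture.
last = re.match(r"(\d)+", "123").group(1)
print(last)  # 3
```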

start([group]):

Returns the index in the string at which the substring captured by the given group starts (the index of its first character). group defaults to 0.

end([group]):

Returns the index in the string at which the substring captured by the given group ends (the index of its last character + 1). group defaults to 0.

span([group]):

Returns (start(group), end(group)).
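Because end() is the index of the last character plus one, start() and end() can be used directly as slice bounds. A minimal sketch with a made-up string:

```python
import re

text = "range 10-25 ok"
m = re.search(r"(\d+)-(\d+)", text)

print(m.start(), m.end())  # 6 11
print(m.span(2))           # (9, 11)

# Slicing with start/end gives back exactly what the group captured.
first = text[m.start(1):m.end(1)]
print(first)  # 10
```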

expand(template):

Substitutes the matched groups into template and returns the result. In template you can use \id, \g<id>, or \g<name> to reference groups; \0 cannot be used, but \g<0> refers to the entire match. \id and \g<id> are equivalent, except that \10 is interpreted as group 10. To write group 1 followed by the character '0', you must use \g<1>0.
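A small sketch of expand() with a made-up two-word match, including the \g<1>0 form for "group 1 followed by a literal 0":

```python
import re

m = re.match(r"(\w+) (\w+)", "hello world")

reordered = m.expand(r"\2 \1")
print(reordered)  # world hello

# \g<1>0 is group 1 followed by the character '0', not group 10.
with_zero = m.expand(r"\g<1>0")
print(with_zero)  # hello0

# \g<0> inserts the entire match.
bracketed = m.expand(r"[\g<0>]")
print(bracketed)  # [hello world]
```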

