Python Automated Operations and Maintenance: Common Modules - re

Source: Internet
Author: User

1. Introduction

A regular expression is itself a small, highly specialized programming language. In Python it is embedded through the re module, which a program can call directly to perform regular expression matching. A regular expression pattern is compiled into a sequence of bytecodes, which is then executed by a matching engine written in C.

2. Character meanings commonly used in regular expressions
2.1 Ordinary characters and 11 metacharacters:


The role of the backslash \ deserves emphasis:

    • A backslash followed by a metacharacter removes its special function (the special character is escaped to an ordinary character).

    • A backslash followed by certain ordinary characters gives them a special function (these are the predefined characters).

    • A backslash followed by a number references the string matched by the group with that number.

>>> import re
>>> print(re.search(r'(tina)(fei)haha\2', 'tinafeihahafei tinafeihahatina').group())
tinafeihahafei
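The three roles of the backslash listed above can be sketched side by side. This is a minimal illustration; the sample strings and variable names are made up for the example.

```python
import re

# 1. Backslash + metacharacter: the dot loses its "any character" meaning.
literal_dot = re.findall(r'a\.c', 'a.c abc')

# 2. Backslash + ordinary character: \d is the predefined class "any digit".
digits = re.findall(r'\d', 'a1b22')

# 3. Backslash + number: \1 must repeat whatever group 1 matched.
repeated = re.search(r'(\w+) \1', 'hello hello world').group()

print(literal_dot)  # ['a.c']
print(digits)       # ['1', '2', '2']
print(repeated)     # hello hello
```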

2.2 Predefined character sets (can be used inside a character set [...])


Pay particular attention to the \b word boundary:

>>> print(re.findall('\btina', 'tian tinaaaa'))
[]
>>> print(re.findall(r'\btina', 'tian tinaaaa'))
['tina']
>>> print(re.findall(r'\btina', 'tian#tinaaaa'))
['tina']
>>> print(re.findall(r'\btina\b', 'tian#[email protected]'))
['tina']
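The first call above returns an empty list because, without the r prefix, Python itself turns '\b' into a backspace character before the regex engine ever sees it. A small sketch of that difference, with illustrative variable names:

```python
import re

plain = '\btina'   # '\b' is the single backspace character (ASCII 8)
raw = r'\btina'    # backslash and 'b' both reach the regex engine: word boundary

plain_hits = re.findall(plain, 'tian tinaaaa')
raw_hits = re.findall(raw, 'tian tinaaaa')

# \b only matches between a word character and a non-word character,
# so 'tina' in the middle of a word is not found.
inside_word = re.findall(r'\btina', 'xtina')

print(len(plain), len(raw))  # 5 6
print(plain_hits)            # []
print(raw_hits)              # ['tina']
print(inside_word)           # []
```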

2.3 Special Grouping usage


3. Functions commonly used in the re module
3.1 compile()
Compiles a regular expression pattern and returns a pattern object. (Compiling frequently used regular expressions into pattern objects can be somewhat more efficient.)
Format:
re.compile(pattern, flags=0)
pattern: the expression string to compile.
flags: compile flags that modify how the regular expression matches, such as case sensitivity and multiline matching. Commonly used flags include re.I (ignore case), re.M (multiline matching), and re.S (make . match newlines as well).

>>> import re
>>> tt = "Tina is a good girl, she is cool, clever, and so on..."
>>> rr = re.compile(r'\w*oo\w*')
>>> print(rr.findall(tt))
['good', 'cool']
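As a rough sketch of how compile flags change matching, the example below uses three standard flags from the re module (re.I, re.M, re.S). The sample text and variable names are invented for illustration.

```python
import re

text = "Tina\ntina is cool"

# re.I (re.IGNORECASE): ignore case when matching.
ignore_case = re.compile(r'tina', re.I).findall(text)

# re.M (re.MULTILINE): ^ and $ also match at the start and end of each line.
line_starts = re.compile(r'^\w+', re.M).findall(text)

# re.S (re.DOTALL): . also matches a newline.
without_dotall = re.compile(r'Tina.tina').search(text)
with_dotall = re.compile(r'Tina.tina', re.S).search(text)

print(ignore_case)             # ['Tina', 'tina']
print(line_starts)             # ['Tina', 'tina']
print(without_dotall)          # None
print(with_dotall is not None) # True
```

Flags can be combined with the | operator, for example re.I | re.M.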

3.2 match()
Determines whether the RE matches at the beginning of the string. Note: this is not a full match. If the string still has characters left after the pattern ends, the match is still considered successful. For a full match, add the boundary anchor '$' at the end of the expression.
Format: re.match(pattern, string, flags=0)

>>> import re
>>> print(re.match('com', 'comwww.runcomoob').group())
com
>>> print(re.match('com', 'Comwww.runcomoob', re.I).group())
Com
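The note about full matching can be checked with a short sketch. It also uses re.fullmatch, a standard re function that the text above does not mention, as an alternative to appending '$'.

```python
import re

s = 'comwww.runcomoob'

prefix = re.match(r'com', s)      # succeeds: the pattern fits the start of s
anchored = re.match(r'com$', s)   # None: characters remain after 'com'
exact = re.match(r'com$', 'com')  # succeeds: nothing is left over
full = re.fullmatch(r'com', s)    # None: fullmatch requires the whole string

print(prefix.group())  # com
print(anchored)        # None
print(exact.group())   # com
print(full)            # None
```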

3.3 search()

Format: re.search(pattern, string, flags=0)

The re.search function scans the string for the pattern and returns as soon as the first match is found; if nothing in the string matches, it returns None.

print(re.search(r'\dcom', 'www.4comrunoob.5com').group())

The output is as follows:

4com

* Note: once match and search succeed, they return a match object, which has the following methods:

    • group() returns the string matched by the RE

    • start() returns the position where the match starts

    • end() returns the position where the match ends

    • span() returns a tuple (start, end) with the positions of the match

    • group() can also take several group numbers at once, in which case it returns the strings matched by those groups.

a. group() returns the whole string matched by the RE.
b. group(n, m) returns the strings matched by groups n and m, and raises IndexError if a group number does not exist.
c. groups() returns a tuple containing the strings of all groups in the regular expression, from group 1 to the last group. groups() usually takes no arguments.

import re
a = "123abc456"
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(0))  # 123abc456, the whole match
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(1))  # 123
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(2))  # abc
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(3))  # 456

group(1) returns the part matched by the first pair of parentheses, group(2) the part matched by the second, and group(3) the part matched by the third.
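The match object methods listed above can be exercised together on one match. This is a small sketch; the input string and variable names are chosen only for illustration.

```python
import re

m = re.search(r'([0-9]+)([a-z]+)([0-9]+)', 'xx123abc456')

whole = m.group()        # same as m.group(0)
start, end = m.start(), m.end()
span = m.span()
first_and_third = m.group(1, 3)
all_groups = m.groups()

# Asking for a group that does not exist raises IndexError.
try:
    m.group(4)
    missing_group_error = False
except IndexError:
    missing_group_error = True

print(whole)                # 123abc456
print(start, end, span)     # 2 11 (2, 11)
print(first_and_third)      # ('123', '456')
print(all_groups)           # ('123', 'abc', '456')
print(missing_group_error)  # True
```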

3.4 findall()
re.findall scans the whole string and returns all matching substrings as a list.
Format:

re.findall(pattern, string, flags=0)

import re
p = re.compile(r'\d+')
print(p.findall('o1n2m3k4'))

The output is as follows:

['1', '2', '3', '4']

import re
tt = "Tina is a good girl, she is cool, clever, and so on..."
rr = re.compile(r'\w*oo\w*')
print(rr.findall(tt))
print(re.findall(r'(\w)*oo(\w)', tt))  # () denotes a sub-expression (group)

The output is as follows:

['good', 'cool']
[('g', 'd'), ('c', 'l')]
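The two results above differ because findall changes what it returns once the pattern contains groups. A brief sketch of the cases, using a shortened sample string; the non-capturing form (?:...) is standard re syntax for grouping without capturing.

```python
import re

text = "good cool"

# No groups: each item is the whole match.
no_groups = re.findall(r'\w*oo\w*', text)

# One group: each item is only what that group captured.
one_group = re.findall(r'\w*(oo)\w*', text)

# Several groups: each item is a tuple with one entry per group.
two_groups = re.findall(r'(\w*)oo(\w*)', text)

# Non-capturing groups (?:...) group without changing the result shape.
non_capturing = re.findall(r'(?:\w)*oo(?:\w)', text)

print(no_groups)      # ['good', 'cool']
print(one_group)      # ['oo', 'oo']
print(two_groups)     # [('g', 'd'), ('c', 'l')]
print(non_capturing)  # ['good', 'cool']
```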

3.5 finditer()
Searches the string and returns an iterator that yields each match result (match object) in order, covering all substrings that the RE matches.

Format: re.finditer(pattern, string, flags=0)

import re
it = re.finditer(r'\d+', '12 drumm44ers drumming, 11 ... 10 ...')
for i in it:
    print(i)
    print(i.group())
    print(i.span())

The output is as follows:

<_sre.SRE_Match object; span=(0, 2), match='12'>
12
(0, 2)
<_sre.SRE_Match object; span=(8, 10), match='44'>
44
(8, 10)
<_sre.SRE_Match object; span=(24, 26), match='11'>
11
(24, 26)
<_sre.SRE_Match object; span=(31, 33), match='10'>
10
(31, 33)
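Because finditer yields match objects rather than strings, it is convenient when both the matched text and its position are needed. A minimal sketch with an invented input string:

```python
import re

# Collect (text, span) pairs from each match object the iterator yields.
matches = [(m.group(), m.span()) for m in re.finditer(r'\d+', 'a1bb22ccc333')]

print(matches)  # [('1', (1, 2)), ('22', (4, 6)), ('333', (9, 12))]
```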


3.6 split()
Splits the string at every substring that the pattern matches and returns the pieces as a list. For example, re.split(r'\s+', text) splits a string into a list of words on whitespace.
Format:

re.split(pattern, string[, maxsplit])

maxsplit specifies the maximum number of splits; if it is not given, the string is split at every match.

>>> print(re.split(r'\d+', 'one1two2three3four4five5'))
['one', 'two', 'three', 'four', 'five', '']
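A short sketch of how maxsplit limits the result, plus one behavior the text above does not cover: when the pattern contains a capturing group, the separators are kept in the list. The input strings are illustrative.

```python
import re

# Split at every match: the trailing '' comes from the final separator.
all_parts = re.split(r'\d+', 'one1two2three3')

# maxsplit=1: split only once, the rest of the string stays intact.
one_split = re.split(r'\d+', 'one1two2three3', maxsplit=1)

# A capturing group in the pattern keeps the separators in the result.
with_separators = re.split(r'(\d+)', 'one1two')

print(all_parts)        # ['one', 'two', 'three', '']
print(one_split)        # ['one', 'two2three3']
print(with_separators)  # ['one', '1', 'two']
```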

3.7 sub()
Replaces every substring of the string that the RE matches and returns the resulting string.
Format:

re.sub(pattern, repl, string, count)

>>> import re
>>> text = "JGood is a handsome boy, he is cool, clever, and so on..."
>>> print(re.sub(r'\s+', '-', text))
JGood-is-a-handsome-boy,-he-is-cool,-clever,-and-so-on...

The second argument is the replacement string, in this case '-'.
The fourth argument is the maximum number of replacements. The default is 0, which means every match is replaced.

re.sub also accepts a function as the replacement, for more complex processing of each match.
For example, re.sub(r'\s', lambda m: '[' + m.group(0) + ']', text, 0) replaces each space ' ' in the string with '[ ]'.

>>> import re
>>> text = "JGood is a handsome boy, he is cool, clever, and so on..."
>>> print(re.sub(r'\s+', lambda m: '[' + m.group(0) + ']', text, 0))
JGood[ ]is[ ]a[ ]handsome[ ]boy,[ ]he[ ]is[ ]cool,[ ]clever,[ ]and[ ]so[ ]on...
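The function form of repl receives a match object and returns the replacement text, so the replacement can depend on what was matched. A minimal sketch; the helper name double and the input string are hypothetical.

```python
import re

def double(match):
    """Return the matched number multiplied by two, as a string."""
    return str(int(match.group(0)) * 2)

every_match = re.sub(r'\d+', double, 'a1 b22 c3')
first_only = re.sub(r'\d+', double, 'a1 b22 c3', count=1)

print(every_match)  # a2 b44 c6
print(first_only)   # a2 b22 c3
```

Passing count by keyword, as here, keeps it from being confused with the flags argument.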

3.8 subn()
Works like sub(), but returns a tuple of the replaced string and the number of replacements made.
Format:

subn(pattern, repl, string, count=0, flags=0)

>>> print(re.subn('[1-2]', 'A', '123456abcdef'))
('AA3456abcdef', 2)
>>> print(re.sub("g.t", "have", 'I get A, I got B, I gut C'))
I have A, I have B, I have C
>>> print(re.subn("g.t", "have", 'I get A, I got B, I gut C'))
('I have A, I have B, I have C', 3)
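Since subn returns a tuple, it can be unpacked directly, and the count is a simple way to tell whether anything was replaced at all. A small sketch with illustrative names:

```python
import re

new_text, replaced = re.subn(r'g.t', 'have', 'I get A, I got B, I gut C')
unchanged, none_replaced = re.subn(r'xyz', '-', 'abc')

print(new_text, replaced)        # I have A, I have B, I have C 3
print(unchanged, none_replaced)  # abc 0
```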

4. Points to note
4.1 The difference between re.match, re.search, and re.findall:
re.match matches only at the beginning of the string; if the beginning does not conform to the regular expression, the match fails and the function returns None. re.search scans the entire string until it finds a match. re.findall returns every match in the string as a list.

import re
a = re.search(r'[\d]', 'abc33').group()
print(a)
p = re.match(r'[\d]', 'abc33')
print(p)
b = re.findall(r'[\d]', "abc33")
print(b)

Execution result:

3
None
['3', '3']

4.2 Greedy vs. non-greedy matching
*?, +?, ??, {m,n}?: the plain *, + and so on are greedy, that is, they match as much as possible; appending ? makes them lazy (non-greedy).

print(re.findall(r"a(\d+?)", 'a23b'))
print(re.findall(r"a(\d+)", 'a23b'))

Execution result:

['2']
['23']

print(re.match('<(.*)>', '<H1>title<H1>').group())
print(re.match('<(.*?)>', '<H1>title<H1>').group())

Execution result:

<H1>title<H1>
<H1>

print(re.findall(r"a(\d+)b", 'a3333b'))
print(re.findall(r"a(\d+?)b", 'a3333b'))

The output is as follows:

['3333']
['3333']

Note that when the pattern has definite conditions both before and after the quantifier, greediness makes no difference: the non-greedy mode has no effect.
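The remaining lazy quantifiers, ?? and {m,n}?, follow the same rule as *? and +?. A brief sketch with invented inputs:

```python
import re

# {m,n} takes as many repetitions as allowed; {m,n}? takes as few as allowed.
greedy_range = re.search(r'\d{2,4}', '12345').group()
lazy_range = re.search(r'\d{2,4}?', '12345').group()

# ? prefers one occurrence; ?? prefers zero.
greedy_optional = re.search(r'ab?', 'ab').group()
lazy_optional = re.search(r'ab??', 'ab').group()

print(greedy_range, lazy_range)        # 1234 12
print(greedy_optional, lazy_optional)  # ab a
```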

4.3 A small pitfall with flags
print(re.split('a', '1A1a2A3', re.I))  # the output is not case-insensitive
This is because the signature is re.split(pattern, string, maxsplit, flags), with four parameters. When we pass three positional arguments, re.I is taken as the third parameter, maxsplit, so it does not act as a flag. To make re.I take effect here, write flags=re.I.
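The pitfall can be made explicit: re.I is an integer flag with the value 2, so passing it in the third position is the same as asking for at most two splits. A minimal sketch that spells out both calls by keyword:

```python
import re

flag_value = int(re.I)

# What the positional call effectively does: maxsplit=2, no flags.
as_maxsplit = re.split('a', '1A1a2A3', maxsplit=flag_value)

# What was intended: case-insensitive splitting.
as_flag = re.split('a', '1A1a2A3', flags=re.I)

print(flag_value)   # 2
print(as_maxsplit)  # ['1A1', '2A3']
print(as_flag)      # ['1', '1', '2', '3']
```

The same positional trap applies to the count parameter of re.sub and re.subn.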

5. Small regular expression exercises
5.1 Matching phone numbers

>>> print(re.compile(r'\d{3}-\d{6}').findall('010-628888'))
['010-628888']

5.2 Matching IP addresses

>>> re.search(r"(([01]?\d?\d|2[0-4]\d|25[0-5])\.){3}([01]?\d?\d|2[0-4]\d|25[0-5])", "192.168.1.1")
<_sre.SRE_Match object; span=(0, 11), match='192.168.1.1'>
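re.search only finds an address somewhere inside the string. To check that a whole string is an address, one option is re.fullmatch. The sketch below reuses the octet alternatives from the pattern above; the names OCTET, IPV4 and is_ipv4 are hypothetical helpers, not part of the re module.

```python
import re

# One octet, 0-255, written with a non-capturing group.
OCTET = r'(?:25[0-5]|2[0-4]\d|[01]?\d?\d)'
IPV4 = re.compile(r'(?:' + OCTET + r'\.){3}' + OCTET)

def is_ipv4(s):
    """Return True only if the entire string is a dotted IPv4 address."""
    return IPV4.fullmatch(s) is not None

# search still succeeds when valid-looking digits are buried in a longer string.
loose = IPV4.search('1192.168.1.1999') is not None

print(is_ipv4('192.168.1.1'))  # True
print(is_ipv4('256.1.1.1'))    # False
print(is_ipv4('192.168.1'))    # False
print(loose)                   # True
```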

This article is from the "Hyun-dimensional" blog, please be sure to keep this source http://xuanwei.blog.51cto.com/11489734/1955306
