Learning the functions of Python's re (regular expression) module

Source: Internet
Author: User

This article looks at regular expressions in Python. Regular expression syntax is already covered extensively elsewhere on the Internet, so it is not explained here. Instead, the article introduces the regular expression functions most commonly used in Python's re module.

Method/Property    Role
match()            Determines whether the RE matches at the beginning of the string
search()           Scans the string to find the location where the RE matches
findall()          Finds all substrings that the RE matches and returns them as a list
finditer()         Finds all substrings that the RE matches and returns them as an iterator

The match() function only checks whether the RE matches at the beginning of the string, whereas search() scans the entire string.

match() reports a match only if it starts at position 0. If the match would start anywhere else, match() returns None.

search() scans the entire string and reports the first match it finds.
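
The difference between the two functions can be seen in a minimal sketch. The sample string and variable names here are made up for illustration and do not come from the original article.

```python
import re

text = "say hello world"

# match() only succeeds if the pattern matches at position 0.
at_start = re.match(r"hello", text)

# search() scans forward and reports the first match anywhere in the string.
anywhere = re.search(r"hello", text)

print(at_start)          # None, because "hello" is not at the start
print(anywhere.span())   # (4, 9)
```

Because match() returns None on failure, the result should be checked before calling methods on it.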


When a match succeeds, match() and search() return a Match object, and finditer() yields one Match object per match. A Match object has the following methods:

Method/Property    Role
group()            Returns the string that is matched by the RE
start()            Returns the position where the match started
end()              Returns the position where the match ended
span()             Returns a tuple containing the location of the match (start, end)
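
As a small illustration of these four methods, the following sketch runs them on one match. The input string is an arbitrary example chosen for this sketch.

```python
import re

m = re.search(r"\d+", "order 66 shipped")

matched_text = m.group()   # the substring that matched
start_pos = m.start()      # index where the match begins
end_pos = m.end()          # index just past the last matched character
start_end = m.span()       # (start, end) as a tuple

print(matched_text, start_pos, end_pos, start_end)   # 66 6 8 (6, 8)
```

Note that end() is exclusive, so text[m.start():m.end()] is the same string as m.group().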

group() returns the string matched by the whole RE. It also accepts one or more group numbers, in which case it returns the strings matched by the corresponding groups.

1. group() returns the entire string matched by the RE.

2. group(n, m) returns a tuple of the strings matched by groups n and m, and raises an IndexError exception if a group number does not exist.

#!python
>>> import re
>>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.group(0)
'abcd'
>>> m.group(1)
'abc'
>>> m.group(2)
'b'
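
The two remaining cases of group(), passing several group numbers at once and asking for a group that does not exist, might look like this. It is a sketch that reuses the same pattern, and the variable names are illustrative only.

```python
import re

m = re.match(r"(a(b)c)d", "abcd")

whole = m.group()        # same as m.group(0)
pair = m.group(1, 2)     # several group numbers return a tuple

try:
    m.group(3)           # the pattern only defines groups 1 and 2
    missing_group_error = None
except IndexError as exc:
    missing_group_error = exc

print(whole, pair, type(missing_group_error).__name__)
# abcd ('abc', 'b') IndexError
```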

The groups() method returns a tuple containing the strings matched by all groups in the regular expression, from group 1 up to the last group. groups() normally takes no arguments, and the tuple holds one entry for each group defined in the regular expression.

#!python
>>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.groups()
('abc', 'b')

Use an index to get the content of a particular group, for example: m.groups()[0]

>>> p2 = re.compile(r'(\d)+\w', re.X)
>>> p2.match('123a b12123c').group()    # the string matched by the whole RE (\d)+\w
'123a'
>>> p2.match('123a b12123c').group(0)
'123a'
>>> p2.match('123a b12123c').group(1)   # the string matched by the first group, (\d); a repeated group keeps only its last repetition
'3'
>>> p2.match('123a b12 123c').groups()
('3',)

re.match: matches from the beginning of the string and returns a Match object, or None

re.match tries to match a pattern at the beginning of the string. The following example matches the first word.

import re

text = "JGood is a handsome boy, he is cool, clever, and so on..."
m = re.match(r"(\w+)\s", text)
if m:
    # group(0) is the whole match (word plus trailing whitespace), group(1) is the word itself
    print(m.group(0), '\n', m.group(1))
else:
    print('not match')

The function prototype of re.match is: re.match(pattern, string, flags)

The first parameter is the regular expression, here r"(\w+)\s". If the match succeeds, a Match object is returned; otherwise None is returned.

The second parameter is the string to match against.

The third parameter is the flags value, which controls how the regular expression is matched, for example case sensitivity and multiline matching.
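
To make the flags parameter more concrete, here is a minimal sketch using two standard flags, re.IGNORECASE and re.MULTILINE. The sample text is invented for this illustration.

```python
import re

text = "first line\nSecond Line"

# Without flags the match is case-sensitive.
case_sensitive = re.match(r"FIRST", text)

# re.IGNORECASE makes letters match regardless of case.
ignore_case = re.match(r"FIRST", text, re.IGNORECASE)

# re.MULTILINE lets ^ match at the start of every line; flags combine with |.
line_starts = re.findall(r"^\w+", text, re.MULTILINE | re.IGNORECASE)

print(case_sensitive)        # None
print(ignore_case.group())   # first
print(line_starts)           # ['first', 'Second']
```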


re.search: looks for a match anywhere in the string and returns a Match object for the first match found, or None

The re.search function scans the string for the pattern and stops at the first match. If nothing in the string matches, it returns None.

import re

text = "JGood is a handsome boy, he is cool, clever, and so on..."
m = re.search(r'\shan(ds)ome\s', text)
if m:
    # group(0) is the whole match, group(1) is the captured 'ds'
    print(m.group(0), m.group(1))
else:
    print('not search')

The function prototype of re.search is: re.search(pattern, string, flags)

Each parameter has the same meaning as in re.match.

The difference between re.match and re.search: re.match matches only at the beginning of the string. If the beginning of the string does not conform to the regular expression, the match fails and the function returns None. re.search scans the entire string until it finds a match.

re.sub: replaces all matches and returns the resulting string; if nothing matches, the original string is returned unchanged

re.sub replaces matches in a string. For example, re.sub(r'\s+', '-', text) replaces every run of whitespace in text with '-'.

The function prototype of re.sub is: re.sub(pattern, repl, string, count)

The second parameter, repl, is the replacement string, in this example '-'.

The fourth parameter, count, is the maximum number of replacements. The default is 0, which means that every match is replaced.

re.sub also accepts a function as the replacement, which allows more complex processing of each match. For example, re.sub(r'\s', lambda m: '[' + m.group(0) + ']', text, 0) wraps every space in the string in brackets, turning ' ' into '[ ]'.

The sub() method takes a replacement value, which can be a string or a function, and the string to be processed.

When the module-level re.sub() function is used, the pattern is passed as the first argument. The pattern may be a string or a compiled pattern object (a RegexObject). To specify a regular expression flag, either pass a compiled pattern object as the first argument or use an inline modifier in the pattern; for example, sub("(?i)b+", "x", "bbbb BBBB") returns 'x x'. Since Python 2.7 and 3.1, re.sub() also accepts a flags argument.
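
The ways of making a substitution case-insensitive can be compared side by side. This is a sketch only; the third form relies on the flags keyword argument, which is assumed to be available (Python 2.7+ or 3.1+).

```python
import re

# Inline modifier: (?i) at the start of the pattern turns on case-insensitivity.
inline = re.sub(r"(?i)b+", "x", "bbbb BBBB")

# Compiled pattern: the flag is attached when the pattern object is created.
compiled = re.compile(r"b+", re.IGNORECASE).sub("x", "bbbb BBBB")

# flags keyword on the module-level function (newer Python versions only).
keyword = re.sub(r"b+", "x", "bbbb BBBB", flags=re.IGNORECASE)

print(inline, "|", compiled, "|", keyword)   # x x | x x | x x
```

Passing the flag positionally as the fourth argument would be a mistake, since that position is count.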

import re

def hexrepl(match):
    "Return the hex string for a decimal number"
    value = int(match.group())
    return hex(value)

p = re.compile(r'\d+')

print(p.sub(hexrepl, 'Call 65490 for printing, 49152 for user code.'))
# Call 0xffd2 for printing, 0xc000 for user code.


import re

text = "JGood is a handsome boy, he is cool, clever, and so on..."

print(re.sub(r'\s+', '-', text))
# JGood-is-a-handsome-boy,-he-is-cool,-clever,-and-so-on...

print(re.sub(r'\s', lambda m: '[' + m.group(0) + ']', text))
# JGood[ ]is[ ]a[ ]handsome[ ]boy,[ ]he[ ]is[ ]cool,[ ]clever,[ ]and[ ]so[ ]on...

print(re.sub(r'a', lambda m: '[' + m.group(0) + ']', text))  # put [] around every 'a'; str.replace() would also work here
# JGood is [a] h[a]ndsome boy, he is cool, clever, [a]nd so on...

subn() works like sub(), but returns a tuple containing the new string and the number of replacements made.

print(re.subn('i', 'I', 'Paris in the spring'))  # ('ParIs In the sprIng', 3)
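
One possible use of the replacement count is to find out whether anything changed at all. The following sketch uses made-up input strings and variable names.

```python
import re

# Collapse runs of whitespace and learn how many runs were rewritten.
new_text, n_replaced = re.subn(r"\s+", " ", "too   many    spaces here")

# When nothing matches, the string comes back unchanged with a count of 0.
unchanged, n_none = re.subn(r"\d", "#", "no digits")

print(new_text, n_replaced)   # too many spaces here 3
print(unchanged, n_none)      # no digits 0
```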

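Empty matches are replaced only when they are not adjacent to a previous match. This describes Python versions before 3.7. Since Python 3.7, an empty match adjacent to a previous non-empty match is also replaced, so the call below returns '-a-b--d-' on those versions.

#!python
>>> p = re.compile('x*')
>>> p.sub('-', 'abxd')
'-a-b-d-'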

re.split: splits a string and returns the pieces as a list

You can use re.split to split a string. For example, re.split(r'\s+', text) splits the string into a list of words on whitespace.

split(string[, maxsplit=0])

You can limit the number of splits by setting maxsplit. When maxsplit is nonzero, at most maxsplit splits are made, and the remainder of the string is returned as the last element of the list. In the following example, the delimiter is any sequence of non-alphanumeric characters.

#!python
>>> p = re.compile(r'\W+')
>>> p.split('This is a test, short and sweet, of split().')
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
>>> p.split('This is a test, short and sweet, of split().', 3)
['This', 'is', 'a', 'test, short and sweet, of split().']
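
The same idea works with the module-level re.split function, where maxsplit can be passed by keyword. The "key=value" input below is a hypothetical example, not taken from the article.

```python
import re

line = "key1=a; key2=b; key3=c"

all_parts = re.split(r";\s*", line)                # split at every delimiter
first_only = re.split(r";\s*", line, maxsplit=1)   # split once, keep the rest intact

print(all_parts)    # ['key1=a', 'key2=b', 'key3=c']
print(first_only)   # ['key1=a', 'key2=b; key3=c']
```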

Sometimes you are interested not only in the text between delimiters but also in what the delimiters were. If capturing parentheses are used in the RE, the delimiter values are also returned as part of the list. Compare the following calls:

re.split("([ab])", "carbs")   # ['c', 'a', 'r', 'b', 's']: the delimiter is a or b, and the result includes the delimiters a and b

re.split("([ab]#)", "carbs")  # ['carbs']: the delimiter is a# or b#, which never occurs, so the result is ['carbs']


#!python
>>> p = re.compile(r'\W+')
>>> p2 = re.compile(r'(\W+)')
>>> p.split('This... is a test.')
['This', 'is', 'a', 'test', '']
>>> p2.split('This... is a test.')
['This', '... ', 'is', ' ', 'a', ' ', 'test', '.', '']
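
Keeping the delimiters is handy for simple tokenizing. As a rough sketch, with an arithmetic string chosen purely for illustration, a capturing group keeps the operators while the same pattern without a group drops them.

```python
import re

expression = "3+4*2"

# Capturing group: the operators are returned between the operands.
tokens = re.split(r"([+\-*/])", expression)

# No capturing group: only the operands are returned.
operands = re.split(r"[+\-*/]", expression)

print(tokens)     # ['3', '+', '4', '*', '2']
print(operands)   # ['3', '4', '2']
```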

re.findall: returns all matching strings as a list

re.findall returns all the matches found in a string. For example, re.findall(r'\w*oo\w*', text) returns all the words in the string that contain 'oo'.

(pattern) matches pattern and captures the match. When the RE contains groups, findall returns the captured groups instead of the whole match.


import re

text = "JGood is a handsome boy,he is handsome and cool,clever,and so on ...."

print(re.findall(r'\w*oo\w*', text))      # result: ['JGood', 'cool']

print(re.findall(r'(\w)*oo(\w)*', text))  # () marks a subexpression; result: [('G', 'd'), ('c', 'l')]
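
How groups change what findall returns can be sketched with three variants of the same pattern. The non-capturing form (?:...) is standard re syntax, but it is not mentioned in the original article, so treat it as an addition for comparison.

```python
import re

text = "JGood is a handsome boy, he is cool"

no_groups = re.findall(r"\w*oo\w*", text)               # whole matches
with_groups = re.findall(r"(\w*)oo(\w*)", text)         # tuples of captured groups
non_capturing = re.findall(r"(?:\w*)oo(?:\w*)", text)   # (?:...) groups do not capture

print(no_groups)       # ['JGood', 'cool']
print(with_groups)     # [('JG', 'd'), ('c', 'l')]
print(non_capturing)   # ['JGood', 'cool']
```

Here the quantifier sits inside the parentheses, so each group captures everything before or after 'oo' rather than only the last repeated character.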

In Python 2.2 and later, you can also use the finditer() method.

#!python
>>> p = re.compile(r'\d+')
>>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
>>> iterator
<callable_iterator object at 0x401833ac>
>>> for match in iterator:
...     print(match.group(), match.span())
...
12 (0, 2)
11 (22, 24)
10 (29, 31)
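
Since finditer() yields Match objects, the text and the position of each match can be collected together. This is a minimal sketch; the input string and names are assumptions made for the example.

```python
import re

pattern = re.compile(r"\d+")

# Each item produced by finditer() is a Match object, so group() and span() are available.
found = [(m.group(), m.span()) for m in pattern.finditer("12 drummers drumming, 11 pipers")]

print(found)   # [('12', (0, 2)), ('11', (22, 24))]
```

Compared with findall(), this is useful when the positions of the matches matter as well as the matched text.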


re.compile

re.compile compiles a regular expression into a regular expression object. Compiling a regular expression that is used often can improve efficiency somewhat. Here is an example of using a regular expression object:

import re

text = "JGood is a handsome boy, he is cool, clever, and so on..."
regex = re.compile(r'\w*oo\w*')
print(regex.findall(text))  # find all words containing 'oo'
print(regex.sub(lambda m: '[' + m.group(0) + ']', text))  # enclose every word containing 'oo' in []
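
A compiled object is most useful when the same pattern is applied many times. The following sketch applies one compiled pattern to several lines; the list of lines and the variable names are hypothetical.

```python
import re

word_with_oo = re.compile(r"\w*oo\w*")   # compiled once

lines = ["JGood is cool", "nothing here", "look at the moon"]

# The same pattern object is reused for every line.
hits = {line: word_with_oo.findall(line) for line in lines}

print(hits["JGood is cool"])      # ['JGood', 'cool']
print(hits["nothing here"])       # []
print(hits["look at the moon"])   # ['look', 'moon']
```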

Transferred from: http://www.python8.org/a/fenleiwenzhang/yuyanjichu/2009/0901/150.html
