Python re (regular expression) module in detail

Source: Internet
Author: User


One, Python escape characters

Regular expressions use the backslash "\" to introduce special forms and as an escape character. This conflicts with Python's own string syntax, which also treats "\" as an escape character. To match a literal "\" in a regular expression, the pattern itself must contain "\\", and because a Python string literal requires each of those backslashes to be escaped in turn, the pattern ends up being written as "\\\\".
Writing patterns this way is tedious, so to keep regular expressions readable Python provides the raw string. A raw string is a string literal with the prefix "r", such as r"\n", which is the two characters "\" and "n" rather than a line break. This form is recommended when writing regular expressions in Python. One caution: do not use raw strings when writing file paths, because there is a trap there (one known pitfall is that a raw string literal cannot end in a single backslash).
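To make the backslash counting concrete, here is a minimal sketch. It is written for Python 3 (print as a function); the variable names and the sample path are only illustrative and are not part of the original article.

```python
import re

# The same pattern written two ways; both match one literal backslash.
escaped = "\\\\"   # ordinary literal: four backslashes typed, two stored
raw = r"\\"        # raw literal: two backslashes typed, two stored

path = "C:\\temp"  # illustrative text that contains a single backslash

print(escaped == raw)              # True
print(len(r"\n"), len("\n"))       # 2 1
print(re.findall(raw, path))       # ['\\'], i.e. one backslash was found
```

The raw form is shorter and reads the same way the regular expression engine sees it.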

Two, regular expression metacharacters:

. matches any character except a line feed
^ matches the start of the string
$ matches the end of the string
[] matches any one character from a specified set
? repeats the previous character 0 or 1 times
* repeats the previous character 0 or more times
+ repeats the previous character 1 or more times
{m} repeats the previous character exactly m times
{m,n} repeats the previous character m to n times
\d matches any digit, equivalent to [0-9]
\D matches any non-digit character, equivalent to [^0-9]
\s matches any whitespace character, equivalent to [ \t\n\r\f\v]
\S matches any non-whitespace character, equivalent to [^ \t\n\r\f\v]
\w matches any alphanumeric character or underscore, equivalent to [a-zA-Z0-9_]
\W matches any non-alphanumeric character, equivalent to [^a-zA-Z0-9_]
\b matches the start or end of a word
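
The metacharacters above can be tried out quickly with re.findall. The following is a small sketch in Python 3 syntax; the sample sentence and variable names are made up for illustration.

```python
import re

sample = "Order 66 shipped on 2024-01-15 to_bob"  # illustrative text

digits = re.findall(r"\d+", sample)        # runs of one or more digits
year = re.findall(r"\d{4}", sample)        # exactly four digits in a row
words = re.findall(r"\w+", sample)         # letters, digits and underscore
non_space = re.findall(r"\S+", sample)     # runs of non-whitespace
first_word = re.findall(r"^\w+", sample)   # ^ anchors at the string start
optional = re.findall(r"colou?r", "color colour")  # ? makes "u" optional

print(digits)      # ['66', '2024', '01', '15']
print(year)        # ['2024']
print(non_space)   # ['Order', '66', 'shipped', 'on', '2024-01-15', 'to_bob']
print(first_word)  # ['Order']
print(optional)    # ['color', 'colour']
```

Note that \w keeps "to_bob" together because the underscore counts as a word character, while the hyphens in the date split it into three \w+ matches.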

Three, importing the regular expression module

3.1, import the regular expression module
>>> import re
3.2, view the names provided by the regular expression module

>>> dir(re)

['DEBUG', 'DOTALL', 'I', 'IGNORECASE', 'L', 'LOCALE', 'M', 'MULTILINE', 'S', 'Scanner', 'T', 'TEMPLATE', 'U', 'UNICODE', 'VERBOSE', 'X', '_MAXCACHE', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__version__', '_alphanum', '_cache', '_cache_repl', '_compile', '_compile_repl', '_expand', '_pattern_type', '_pickle', '_subx', 'compile', 'copy_reg', 'error', 'escape', 'findall', 'finditer', 'match', 'purge', 'search', 'split', 'sre_compile', 'sre_parse', 'sub', 'subn', 'sys', 'template']

>>>


Four, commonly used regular expression functions
4.1, re.search
The re.search function scans a string for a pattern and returns as soon as it finds the first match; it returns None if nothing in the string matches.
Tip: use help() when you do not know how to use a module function
>>> help(re.search)
search(pattern, string, flags=0)

First parameter: the pattern (the matching rule)
Second parameter: the string to search
Third parameter: flags, used to control how the regular expression is matched
Example: the following searches for Kuangl

>>> name = "hello,my name is Kuangl,nice to meet ..."

>>> k = re.search(r'K(uan)gl', name)

>>> if k:
...     print k.group(0), k.group(1)
... else:
...     print "Sorry,not search!"
...
Kuangl uan
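
The third parameter is easiest to understand by example. The sketch below, in Python 3 syntax, reuses the sample string from above and shows re.search returning None without a flag and succeeding with re.IGNORECASE; the lowercase pattern is an illustrative variation, not the article's original example.

```python
import re

name = "hello,my name is Kuangl,nice to meet ..."

# The string only contains "Kuangl" with a capital K, so a lowercase
# pattern finds nothing and search returns None.
no_match = re.search(r"kuangl", name)
print(no_match)                      # None

# re.IGNORECASE makes the same pattern case-insensitive.
m = re.search(r"k(uan)gl", name, re.IGNORECASE)
if m:
    # group(0) is the whole match, group(1) the first parenthesised group,
    # span() the (start, end) offsets of the match in the string.
    print(m.group(0), m.group(1), m.span())   # Kuangl uan (17, 23)
```

Always test the result before calling group(), because calling it on None raises AttributeError.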

4.2, re.match
re.match tries to match a pattern only at the beginning of the string, so in effect it tests whether the string starts with the pattern
>>> help(re.match)
match(pattern, string, flags=0)
First parameter: the pattern (the matching rule)
Second parameter: the string to match
Third parameter: flags, used to control how the regular expression is matched
Example: the following matches the word hello at the start of the string

>>> name = "hello,my name is Kuangl,nice to meet ..."

>>> k = re.match(r'(h....)', name)

>>> if k:
...     print k.group(0), '\n', k.group(1)
... else:
...     print "Sorry,not match!"
...
hello
hello

>>>


The difference between re.match and re.search: re.match matches only at the beginning of the string; if the string does not start with something that fits the pattern, the match fails and the function returns None. re.search scans the whole string until it finds a match.
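
A side-by-side sketch of that difference, in Python 3 syntax, using a shortened version of the sample string (the variable names are illustrative):

```python
import re

text = "hello,my name is Kuangl"

at_start = re.match(r"hello", text)      # pattern is at position 0
not_at_start = re.match(r"Kuangl", text) # pattern occurs later, so None
found_later = re.search(r"Kuangl", text) # search scans the whole string
anchored = re.search(r"^Kuangl", text)   # ^ makes search behave like match

print(at_start is not None)    # True
print(not_at_start)            # None
print(found_later.start())     # 17
print(anchored)                # None
```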
4.3, re.findall
re.findall returns every substring of the target string that matches the pattern

>>> help(re.findall)

findall(pattern, string, flags=0)


First parameter: the pattern (the matching rule)
Second parameter: the target string
Third parameter: optional flags, as for the functions above
The result is a list of the matching strings; if nothing matches, the list is empty.
Example: finding mail accounts

>>> mail = '<user01@mail.com> <user02@mail.com> user04@mail.com'  # the third address deliberately has no angle brackets

>>> re.findall(r'(\w+@m....[a-z]{3})', mail)

['user01@mail.com', 'user02@mail.com', 'user04@mail.com']
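
What re.findall returns depends on how many groups the pattern contains: no group gives whole matches, one group gives that group's text, and several groups give tuples. The sketch below (Python 3 syntax) illustrates this on the same mail string; the patterns are simplified variants chosen for the illustration, not the article's pattern.

```python
import re

mail = "<user01@mail.com> <user02@mail.com> user04@mail.com"

whole = re.findall(r"\w+@\w+\.[a-z]{3}", mail)      # no group: whole matches
users = re.findall(r"(\w+)@\w+\.[a-z]{3}", mail)    # one group: group text
pairs = re.findall(r"(\w+)@(\w+\.[a-z]{3})", mail)  # two groups: tuples
nothing = re.findall(r"\d{5}", mail)                # no match: empty list

print(whole)    # ['user01@mail.com', 'user02@mail.com', 'user04@mail.com']
print(users)    # ['user01', 'user02', 'user04']
print(pairs)    # [('user01', 'mail.com'), ('user02', 'mail.com'), ('user04', 'mail.com')]
print(nothing)  # []
```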


4.4, re.sub
re.sub replaces the parts of a string that match the pattern
>>> help(re.sub)
sub(pattern, repl, string, count=0)
First parameter: the pattern (the matching rule)
Second parameter: the replacement string
Third parameter: the string to process
Fourth parameter: the maximum number of replacements. The default is 0, which means that every match is replaced
Example: replace each whitespace character with -

>>> test = "Hi, nice to meet you where are your from?"

>>> re.sub(r'\s', '-', test)

'Hi,-nice-to-meet-you-where-are-your-from?'

>>> re.sub(r'\s', '-', test, 5)  # replace only the first 5 matches

'Hi,-nice-to-meet-you-where are your from?'

>>>
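
When the number of replacements matters, re.subn is a convenient companion to re.sub because it also returns the count. The following sketch is in Python 3 syntax; the whitespace-collapsing line is an extra illustration that does not come from the article.

```python
import re

test = "Hi, nice to meet you where are your from?"

# count limits how many matches are replaced (0 means all of them).
first_five = re.sub(r"\s", "-", test, count=5)

# subn returns a (new_string, number_of_replacements) tuple.
replaced, how_many = re.subn(r"\s", "-", test)

# \s+ treats a run of mixed whitespace as a single match.
collapsed = re.sub(r"\s+", " ", "a   b \t c")

print(first_five)  # Hi,-nice-to-meet-you-where are your from?
print(how_many)    # 8
print(collapsed)   # a b c
```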


4.5, re.split
re.split splits a string wherever the pattern matches

>>> help(re.split)

split(pattern, string, maxsplit=0)


First parameter: the pattern (the matching rule)
Second parameter: the string to split
Third parameter: the maximum number of splits. The default is 0, which means that the string is split at every match
Example: splitting the whole string

>>> test = "Hi, nice to meet you where are your from?"

>>> re.split(r"\s+", test)

['Hi,', 'nice', 'to', 'meet', 'you', 'where', 'are', 'your', 'from?']

>>> re.split(r"\s+", test, 3)  # split only at the first three matches

['Hi,', 'nice', 'to', 'meet you where are your from?']

>>>
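
Unlike str.split, re.split can split on several different delimiters at once, and it keeps the delimiters when the pattern contains a group. A small Python 3 sketch follows; the comma and semicolon sample is made up for illustration.

```python
import re

test = "Hi, nice to meet you where are your from?"

# Stop after two splits; the rest of the string stays in the last item.
head = re.split(r"\s+", test, maxsplit=2)

# One pattern covers "," and ";" with optional surrounding whitespace.
fields = re.split(r"\s*[,;]\s*", "alpha, beta;gamma ,delta")

# A capturing group makes split return the separators as well.
with_seps = re.split(r"(\s)", "a b")

print(head)       # ['Hi,', 'nice', 'to meet you where are your from?']
print(fields)     # ['alpha', 'beta', 'gamma', 'delta']
print(with_seps)  # ['a', ' ', 'b']
```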


4.6, re.compile
re.compile compiles a regular expression into a pattern object that can be reused
>>> help(re.compile)
compile(pattern, flags=0)
First parameter: the pattern (the matching rule)
Second parameter: flags
Example:

>>> test = "Hi, nice to meet you where are your from?"

>>> k = re.compile(r'\w*o\w*')  # matches words that contain the letter o

>>> dir(k)

['__copy__', '__deepcopy__', 'findall', 'finditer', 'match', 'scanner', 'search', 'split', 'sub', 'subn']

>>> print k.findall(test)  # show every word that contains o
['to', 'you', 'your', 'from']

>>> print k.sub(lambda m: '[' + m.group(0) + ']', test)  # enclose each word that contains o in []
Hi, nice [to] meet [you] where are [your] [from]?


>>>
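
A compiled pattern exposes the same operations as the module-level functions, so it can be built once and reused. The sketch below (Python 3 syntax, illustrative variable names) uses finditer to get the position of each match as well as its text.

```python
import re

test = "Hi, nice to meet you where are your from?"
word_with_o = re.compile(r"\w*o\w*")   # compiled once, reused below

# finditer yields match objects, so offsets are available too.
hits = [(m.start(), m.group(0)) for m in word_with_o.finditer(test)]

# The replacement may be a function that receives each match object.
bracketed = word_with_o.sub(lambda m: "[" + m.group(0) + "]", test)

print(hits)       # [(9, 'to'), (17, 'you'), (31, 'your'), (36, 'from')]
print(bracketed)  # Hi, nice [to] meet [you] where are [your] [from]?
```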
A script that downloads files using the urllib2, re, and os modules

#!/usr/bin/env python
import urllib2
import re
import os

# Fetch the page and collect every src="http://....js"> attribute in it.
url = 'http://image.baidu.com/channel/wallpaper'
read = urllib2.urlopen(url).read()
pat = re.compile(r'src="http://.+?\.js">')
urls = re.findall(pat, read)
for i in urls:
    # Strip the src=" prefix and the "> suffix to leave the bare URL.
    url = i.replace('src="', '').replace('">', '')
    try:
        iread = urllib2.urlopen(url).read()
        # Save each file under the last component of its URL.
        name = os.path.basename(url)
        with open(name, 'wb') as jsname:
            jsname.write(iread)
    except Exception:
        print url, "url error"
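
For readers on Python 3, where urllib2 no longer exists, the same idea can be sketched with urllib.request. This is only a sketch under assumptions: the function names, the sample HTML, and the example.com URLs are hypothetical, and the pattern uses a capturing group so that no string replacement is needed afterwards. The download function is defined but not called here, so running the block makes no network request.

```python
import os
import re
import urllib.request

# Capture the URL itself; [^"] keeps the match inside one attribute value.
SCRIPT_SRC = re.compile(r'src="(http://[^"]+?\.js)"')

def extract_js_urls(html):
    """Return every http:// URL ending in .js found in src="..." attributes."""
    return SCRIPT_SRC.findall(html)

def download(url, directory="."):
    """Fetch url and save it under its base name; return the path written."""
    path = os.path.join(directory, os.path.basename(url))
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())
    return path

# Hypothetical page fragment used only to exercise the pattern.
sample_html = (
    '<script src="http://example.com/static/app.js"></script>'
    '<img src="http://example.com/a.png">'
)
found = extract_js_urls(sample_html)
print(found)   # ['http://example.com/static/app.js']
```

Separating extraction from downloading makes the regular expression easy to check against sample text before any request is made.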
