A first regular expression in Python
1. import re: Python's regular expression module
2. A first regular expression: compile it with re.compile(r'imooc'), then call pattern.match('imooc python')
Example:
import re
pa = re.compile(r'imooc')          # returns a Pattern object, pa
ma = pa.match('imooc python')      # returns a Match object, ma
print(ma.group())                  # the matched text
print(ma.span())                   # the (start, end) range of the match
print(ma.string)                   # the string that was matched against
print(ma.re)                       # the Pattern object

pa = re.compile(r'imooc', re.I)    # I == IGNORECASE: case-insensitive matching
ma = pa.match('Imooc python')
print(ma.group())

pa = re.compile(r'(imooc)')        # parentheses put the match into a group
ma = pa.match('imooc python')
print(ma.groups())

ma = re.match(r'imooc', 'imooc python')   # match directly, without compiling first
print(ma.group())
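One thing the example does not show: match only succeeds when the pattern matches at the start of the string, and it returns None otherwise, so calling group() on a failed match raises AttributeError. A minimal sketch in Python 3 syntax (the sample strings are made up for illustration):

```python
import re

pa = re.compile(r'imooc', re.I)

hit = pa.match('IMOOC python')     # pattern found at position 0
miss = pa.match('python imooc')    # pattern is present, but not at the start

print(hit.group(), hit.span())
print(miss)                        # None: match() does not scan ahead

# Guard against None before touching the Match object.
result = miss.group() if miss else 'no match'
print(result)
```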
Python regular expression syntax
Character | Description
. | Match any character (except \n)
[...] | Match any one character from the set
\d / \D | Match a digit / non-digit
\s / \S | Match a whitespace / non-whitespace character
\w / \W | Match a word character [a-zA-Z0-9_] / non-word character
Single-character match example:
import re
ma = re.match(r'.', 'abc0')            # any single character
print(ma.group())
ma = re.match(r'{[abc]}', '{b}')       # one of a, b or c inside literal braces
print(ma.group())
ma = re.match(r'[a-z]', 'a')           # one lowercase letter
print(ma.group())
ma = re.match(r'\[[\w]\]', '[a]')      # one word character inside literal brackets
print(ma.group())
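To see how the uppercase classes mirror the lowercase ones, the following sketch runs each class over the same short string with findall. The test string is an assumption chosen so that every class has something to hit; note that \w also accepts the underscore.

```python
import re

text = 'a1 _'   # a letter, a digit, a space and an underscore

digits = re.findall(r'\d', text)       # ['1']
non_digits = re.findall(r'\D', text)   # ['a', ' ', '_']
words = re.findall(r'\w', text)        # ['a', '1', '_']
non_words = re.findall(r'\W', text)    # [' ']
spaces = re.findall(r'\s', text)       # [' ']

print(digits, non_digits, words, non_words, spaces)
```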
Character | Description
* | Match the preceding character 0 or more times
+ | Match the preceding character 1 or more times
? | Match the preceding character 0 or 1 times
{m} / {m,n} | Match the preceding character exactly m times / between m and n times
*? / +? / ?? | Make the match non-greedy (match as few characters as possible)
Example:
# -*- coding: utf-8 -*-
import re
ma = re.match(r'[A-Z][a-z]*', 'Aa')                   # a capital letter, then any lowercase letters
print(ma.group())
ma = re.match(r'[_a-zA-Z]+[_\w]*', '_helloworld01')   # match a variable name
print(ma.group())
ma = re.match(r'[0-9]?[0-9]', '99')                   # match 0 to 99
print(ma.group())
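The table lists the non-greedy forms, but the example does not exercise them. A small sketch of the difference, using made-up input strings:

```python
import re

html = '<b>one</b><b>two</b>'

# Greedy: .* grabs as much as it can, so the match runs to the last </b>.
greedy = re.match(r'<b>.*</b>', html).group()
# Non-greedy: .*? stops at the first </b> that lets the pattern succeed.
lazy = re.match(r'<b>.*?</b>', html).group()

# The same idea applies to {m,n}: greedy takes n, non-greedy takes m.
most = re.match(r'\d{2,3}', '12345').group()
least = re.match(r'\d{2,3}?', '12345').group()

print(greedy)   # <b>one</b><b>two</b>
print(lazy)     # <b>one</b>
print(most, least)   # 123 12
```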
Boundary Matching
Character | Description
^ | Match the start of the string
$ | Match the end of the string
\A / \Z | Match only at the very beginning / very end of the string
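Example:
import re
# The address below is obscured in the source; substitute one with 4 to 10 word characters before @163.com.
ma = re.match(r'^[\w]{4,10}@163\.com$', '[email protected]')
print(ma.group())
ma = re.match(r'\Aimooc[\w]*', 'imooc python')
print(ma.group())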
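With single-line input, ^ and \A (or $ and \Z) look identical. The difference shows up with newlines and the re.MULTILINE flag, which these notes do not cover; the sketch below goes slightly beyond them and uses made-up strings.

```python
import re

text = 'first line\nimooc second'

# Without flags, ^ only matches at the start of the whole string.
plain = re.findall(r'^\w+', text)
# With re.MULTILINE (re.M), ^ also matches right after each newline...
multi = re.findall(r'^\w+', text, re.M)
# ...while \A still means the very start of the string.
strict = re.findall(r'\A\w+', text, re.M)

# $ tolerates one trailing newline; \Z does not.
dollar = re.search(r'imooc$', 'imooc\n')
end_only = re.search(r'imooc\Z', 'imooc\n')

print(plain, multi, strict)
print(dollar is not None, end_only is not None)
```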
Group Matching
Character | Description
| | Match either the expression on the left or the one on the right
(ab) | Treat the expression in parentheses as a group
\<number> | Refer to the string matched by group number <number>
(?P<name>) | Give the group an alias
(?P=name) | Refer to the string matched by the group with alias name
Example:
# -*- coding: utf-8 -*-
import re
ma = re.match(r'[0-9]?\d$|100', '100')                  # match 0 to 100
print(ma.group())
# The address below is obscured in the source; substitute one with 4 to 6 word characters before @163.com or @126.com.
ma = re.match(r'[\w]{4,6}@(163|126)\.com', '[email protected]')
print(ma.group())
ma = re.match(r'<([\w]+>)\1', '<book>book>')            # \1 repeats whatever group 1 matched
print(ma.group())
ma = re.match(r'<([\w]+>)[\w]+</\1', '<book>python</book>')
print(ma.group())
ma = re.match(r'<(?P<mark>[\w]+>)[\w]+</(?P=mark)', '<book>python</book>')
print(ma.group())
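Named groups can also be read back by name after a match. A short sketch, assuming hypothetical group names tag and body and the same kind of input as in the example:

```python
import re

# (?P<tag>...) names the opening tag; (?P=tag) requires the closing tag to repeat it.
pattern = re.compile(r'<(?P<tag>\w+)>(?P<body>\w+)</(?P=tag)>')

ma = pattern.match('<book>python</book>')
print(ma.group('tag'))     # book
print(ma.group('body'))    # python
print(ma.groupdict())      # both named groups as a dict

# A mismatched closing tag fails the backreference, so there is no match.
bad = pattern.match('<book>python</title>')
print(bad)                 # None
```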
Other methods of the re module
1: search(pattern, string, flags=0) finds the first match anywhere in a string
2: findall(pattern, string, flags=0) finds every match and returns them as a list
3: sub(pattern, repl, string, count=0, flags=0) replaces the parts of a string that match a regular expression with another value
4: split(pattern, string, maxsplit=0, flags=0) splits the string at each match and returns a list of the pieces
Example:
import re
pa = re.compile(r'<[\w]+>')
ma = re.search(pa, '<a><b><c><d><efg>h<i>')
print(ma.group())

pa = re.compile(r'<[\w]+>')
l = re.findall(pa, '<a><b><c><d><efg>h<i>')
print(l)

s = re.sub(r'\d+', '', 'I was born in 1993')    # replace each run of digits with an empty string
print(s)

def add1(match):
    # sub also accepts a function as repl; it receives each Match object
    val = match.group()
    num = int(val) + 1
    return str(num)

s = re.sub(r'\d+', add1, 'i was born in 1993, you are born in 1992')
print(s)

l = re.split(r':| |,', 'imooc:c c++ java python,c#')
print(l)
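The count and maxsplit parameters in the signatures above limit how many matches are acted on, and findall returns tuples when the pattern contains more than one group. A brief sketch with a made-up input string:

```python
import re

text = 'a1b22c333'

first_only = re.sub(r'\d+', '#', text, count=1)   # replace just the first match
all_replaced = re.sub(r'\d+', '#', text)          # count=0 means replace every match
parts = re.split(r'\d+', text, maxsplit=1)        # split at the first match only
pairs = re.findall(r'([a-z])(\d+)', text)         # one tuple per match, one item per group

print(first_only)     # a#b22c333
print(all_replaced)   # a#b#c#
print(parts)          # ['a', 'b22c333']
print(pairs)          # [('a', '1'), ('b', '22'), ('c', '333')]
```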
Practice
Download the images on a web page to local disk
1: Fetch the web page
2: Extract the image addresses
3: Fetch each image's contents and save them locally
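The three steps could be sketched roughly as follows in Python 3. This is only one possible approach, not the notes' own solution: the function names, the IMG_SRC pattern, the file-naming scheme and the example.com URLs are all assumptions made for illustration, and a regex this simple will miss images on pages whose markup is less regular.

```python
import re
import urllib.request
from urllib.parse import urljoin

# Assumed pattern: an <img> tag whose src ends in a common image extension.
IMG_SRC = re.compile(r'<img[^>]+src=["\']([^"\']+\.(?:jpg|jpeg|png|gif))["\']', re.I)

def fetch_html(url):
    """Step 1: download the page and decode it as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode('utf-8', errors='replace')

def find_image_urls(html, base_url):
    """Step 2: extract image addresses, resolving relative ones against base_url."""
    return [urljoin(base_url, src) for src in IMG_SRC.findall(html)]

def save_images(urls, prefix='img'):
    """Step 3: fetch each image and write it to a local file (hypothetical naming)."""
    for i, url in enumerate(urls):
        ext = url.rsplit('.', 1)[-1]
        urllib.request.urlretrieve(url, '%s_%d.%s' % (prefix, i, ext))

# Step 2 can be tried offline on a hand-written HTML fragment.
sample = '<p><img class="x" src="/pics/a.jpg"><IMG SRC=\'http://example.com/b.PNG\'></p>'
found = find_image_urls(sample, 'http://example.com/page.html')
print(found)
```

With a real page, the calls would chain as save_images(find_image_urls(fetch_html(url), url)).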
Python Regular Expression Learning notes