A regular expression is a pattern language for matching and filtering strings.
Full syntax reference: http://tool.oschina.net/uploads/apidocs/jquery/regexp.html
1. re.match: tries to match a pattern from the start of the string. If the pattern does not match at the very beginning, the call returns None. Signature: re.match(pattern, string, flags=0).
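A minimal sketch of the start-anchored behaviour described above. The sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Hello 123 World"

at_start = re.match(r"Hello", text)      # pattern occurs at position 0
not_at_start = re.match(r"World", text)  # "World" is present, but not at position 0

print(at_start)      # a Match object
print(not_at_start)  # None

# Check for None before calling .group(); calling it on None raises AttributeError.
if not_at_start is None:
    print("no match at the start of the string")
```

Because a failed match returns None rather than raising, it is usually worth checking the result before reading groups from it.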
2. Match target: wrap the part you want in parentheses and read it back with group().
    import re
    content = 'Hello 1234567 World_this is a regex demo'
    result = re.match(r'^Hello\s(\d+)\sWorld.*demo$', content)  # regular expression
    print(result.group(1))  # contents of the first parenthesized group: 1234567
    print(result.span())    # (start, end) positions of the match: (0, 40)
3. Greedy match: .* consumes as many characters as possible, so only the last digit is left for \d+.
    import re
    content = 'Hello 1234567 World_this is a regex demo'
    result = re.match(r'^He.*(\d+)\sWorld.*demo$', content)
    print(result.group(1))  # outputs 7
Non-greedy match: .*? consumes as few characters as possible, so \d+ captures the whole number.
    import re
    content = 'Hello 1234567 World_this is a regex demo'
    result = re.match(r'^He.*?(\d+).*demo$', content)
    print(result.group(1))  # outputs 1234567
Matching mode: by default . does not match a line break; the re.S flag makes it match line breaks too.
    import re
    content = '''Hello 1234567 World_this
    is a regex demo'''
    result = re.match(r'^He.*?(\d+).*?demo$', content, re.S)  # re.S lets . match line breaks
    print(result.group(1))  # outputs 1234567
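The three behaviours in this item (greedy, non-greedy, and the re.S flag) can be compared side by side. This is a sketch on an assumed two-line sample string, not the only way to write these patterns.

```python
import re

# Assumed sample: the same sentence as above, split across two lines.
content = "Hello 1234567 World_this\nis a regex demo"

greedy = re.match(r"^He.*(\d+).*demo$", content, re.S)
lazy = re.match(r"^He.*?(\d+).*?demo$", content, re.S)
no_flag = re.match(r"^He.*?(\d+).*?demo$", content)  # "." cannot cross "\n"

print(greedy.group(1))  # 7
print(lazy.group(1))    # 1234567
print(no_flag)          # None
```

Without re.S the pattern fails outright, because "demo" sits on the second line and . stops at the line break.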
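4. Escape: special characters such as $ and . must be escaped with a backslash to be matched literally.
    import re
    content = 'Price is $5.00'
    result = re.match(r'Price is \$5\.00', content)  # matches only once the special characters are escaped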
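When the literal text is not known in advance, the standard library's re.escape can add the backslashes for you. The sketch below uses an assumed sample string and is an alternative to escaping by hand, not something the original notes rely on.

```python
import re

# Assumed sample values for illustration.
content = "Price is $5.00"
literal = "$5.00"

escaped = re.escape(literal)             # adds backslashes before "$" and "."
found = re.search(escaped, content)      # treats the text literally
unescaped = re.search(literal, content)  # "$" means end-of-string here, so this fails

print(escaped)
print(found.group())
print(unescaped)
```

The unescaped attempt returns None because $ anchors to the end of the string and nothing can follow it.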
5. re.search: scans the whole string and returns the first successful match. To keep patterns simple, prefer search over match when you can.
    import re
    content = 'Hello 1234567 World_this is a regex demo'
    result = re.search(r'Hello\s(\d+)\sWorld.*demo$', content)
    print(result.group())  # outputs the whole matched string
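The difference from re.match shows up once the target no longer sits at the start of the string. A small sketch, assuming a sample with some extra leading text:

```python
import re

# Assumed sample: the usual sentence with extra text in front.
content = "Extra strings Hello 1234567 World_this is a regex demo"
pattern = r"Hello\s(\d+)\sWorld.*demo$"

matched = re.match(pattern, content)    # fails: the string does not start with "Hello"
searched = re.search(pattern, content)  # succeeds: the scan finds "Hello" later on

print(matched)            # None
print(searched.group(1))  # 1234567
print(searched.span())
```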
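6. re.findall: returns every non-overlapping match as a list. When the pattern has several groups, each item is a tuple with one field per group.
    results = re.findall('regular expression', html, re.S)  # html holds the page source
    for result in results:
        print(result)
        print(result[0], result[1])  # fields are indexed from 0, one per group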
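To make the template above concrete, here is a sketch run against a small HTML fragment. The fragment, its links, and its titles are invented for illustration.

```python
import re

# Hypothetical HTML fragment, invented for illustration.
html = """
<ul>
  <li><a href="/book/1">First Book</a></li>
  <li><a href="/book/2">Second
  Book</a></li>
</ul>
"""

# Two groups, so each result is a (href, title) tuple.
results = re.findall(r'<a href="(.*?)">(.*?)</a>', html, re.S)

for href, title in results:
    # Collapse any line breaks inside the title into single spaces.
    print(href, " ".join(title.split()))
```

The second title spans a line break, which is why re.S is passed here.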
7. re.sub: string substitution
    import re
    content = 'Hello 1234567 World_this is a regex demo extra strings'
    content = re.sub(r'\d+', '', content)  # remove every run of digits
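Besides deleting text, re.sub can reuse what was matched through a group reference such as \1 in the replacement. A brief sketch on the same kind of sample string; the bracket format is an arbitrary choice for illustration.

```python
import re

content = "Hello 1234567 World_this is a regex demo"

removed = re.sub(r"\d+", "", content)         # delete the digits
wrapped = re.sub(r"(\d+)", r"[\1]", content)  # keep the digits, add brackets around them

print(removed)
print(wrapped)
```

Note that deleting the digits leaves both surrounding spaces in place.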
8. re.compile: compiles a pattern string into a regular expression object that can be reused
    import re
    content = '''Hello 1234567 World_this
    is a regex demo'''
    pattern = re.compile('Hello.*demo', re.S)  # re.S lets . match line breaks
    result = re.match(pattern, content)
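The benefit of a compiled pattern is clearest when the same expression is applied to many strings. A sketch with made-up input lines:

```python
import re

# Compile once, then reuse the pattern object's own methods.
number = re.compile(r"\d+")

# Hypothetical input lines for illustration.
lines = ["order 12", "no digits here", "items 3 and 45"]
found = [number.findall(line) for line in lines]

print(found)  # [['12'], [], ['3', '45']]
```

A compiled object exposes match, search, findall, and sub as methods, so the flags only need to be stated once at compile time.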
Hands-on practice:
    import requests
    import re
    content = requests.get('http://book.douban.com/').text
    pattern = re.compile('<li.*?cover.*?href="(.*?)".*?title="(.*?)".*?more-meta.*?author">(.*?)</span>.*?year">(.*?)</span>.*?</li>', re.S)
    results = re.findall(pattern, content)
    # print(results)
    for result in results:
        url, name, author, date = result
        author = re.sub(r'\s', '', author)  # strip whitespace and line breaks
        date = re.sub(r'\s', '', date)
        print(url, name, author, date)
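Since the live page can change or be unreachable, the same extraction can be tried offline first. The markup below is a hypothetical stand-in that only resembles a book listing; the URL, title, author, and date are invented, and the real page structure may differ.

```python
import re

# Hypothetical markup for illustration; not copied from the real site.
html = '''
<li>
  <div class="cover"><a href="https://example.com/book/1" title="Sample Title"></a></div>
  <div class="more-meta">
    <span class="author">
      Anonymous
    </span>
    <span class="year">
      2018-01
    </span>
  </div>
</li>
'''

pattern = re.compile(
    r'<li.*?cover.*?href="(.*?)".*?title="(.*?)".*?more-meta'
    r'.*?author">(.*?)</span>.*?year">(.*?)</span>.*?</li>',
    re.S,
)

books = []
for url, name, author, date in pattern.findall(html):
    # Remove the whitespace and line breaks captured around each value.
    books.append((url, name, re.sub(r"\s", "", author), re.sub(r"\s", "", date)))

print(books)
```

Removing every whitespace character also removes spaces inside a value, so for multi-word authors str.strip() may be the gentler choice.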
Python Crawler: Regular Expressions