A regular expression filters target strings using a pattern built from specific characters and combinations of them:
Key points about the re module
Python's regular expression library is re. Import it with import re, then call re.compile(pattern, flags) to compile a regular expression string into a pattern object. The functions provided by re can then match, search, replace, split, and group strings.
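The compile-then-use workflow described above can be sketched roughly as follows. The date pattern and the sample text are made up for illustration and are not from the original post.

```python
import re

# Hypothetical pattern: a date written as YYYY-MM-DD, with three capture groups.
date_re = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# The compiled object can be reused for any number of searches.
m = date_re.search("released on 2018-03-15, updated later")

year, month, day = m.groups()   # grouping: one string per pair of parentheses
print(m.group(0))               # the whole matched text
print(year, month, day)
```

Compiling once is mainly a convenience when the same pattern is applied many times; the module-level functions such as re.search() accept the pattern string directly.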
Commonly used flag values:
re.I ignores case; re.X (verbose mode) ignores whitespace inside the pattern.
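A minimal sketch of the two flags in action. The phone-number format below is a hypothetical example, chosen only to show how re.X lets a pattern be spread over several commented lines.

```python
import re

# re.I: matching ignores letter case.
hits = re.findall(r"python", "Python PYTHON python", re.I)
print(hits)

# re.X: whitespace and # comments inside the pattern are ignored.
phone_re = re.compile(r"""
    (\d{3})   # area code (hypothetical format)
    -         # separator
    (\d{4})   # local number
""", re.X)
parts = phone_re.match("010-1234").groups()
print(parts)
```

Flags can be combined with the | operator, for example re.I | re.X.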
import re

def check(string):
    # Rough e-mail shape: name(.name)*@domain(.domain)+
    p = re.compile(r"^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$", re.I)
    if p.match(string):
        print("%s conforms to rules" % string)
    else:
        print("%s does not conform to the rules" % string)

st1 = '[email protected]'
st2 = '[email protected]'
check(st1)
check(st2)
[email protected].com conforms to rules
[email protected].com conforms to rules
re.match() matches only from the start of the string.
re.search() scans the whole string for the first match; on success it returns a match object whose span gives the start and end positions.
re.findall() returns all matched substrings as a list.
>>> print(p.match('DAA00'))
None
>>> re.match('ADF', 'SDADFG')
>>> re.search('ADF', 'SDADFGADF')
<_sre.SRE_Match object; span=(2, 5), match='ADF'>
>>> re.findall('ADF', 'SDADFGADF')
['ADF', 'ADF']
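The difference between the three functions can be seen side by side in a short script. This is only a sketch using a lowercase sample string; the variable names are chosen for illustration.

```python
import re

text = "sdadfgadf"

start_only = re.match("adf", text)    # None: text does not start with "adf"
first_hit = re.search("adf", text)    # first occurrence anywhere in text
all_hits = re.findall("adf", text)    # every non-overlapping occurrence

print(start_only)
print(first_hit.span(), first_hit.group())
print(all_hits)
```

Because re.match() and re.search() return None when nothing matches, it is usually worth checking the result before calling span() or group() on it.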
Splitting
In practice, different data sources use different separators, such as spaces, tabs, or commas. Combining a regular expression with the split() function handles all of these conveniently.
re.split(pattern, string[, maxsplit])
Splitting on a period (.):
>>> st = 'https:\\www.baidu.com'
>>> lt = re.split(r'\.', st)
>>> lt
['https:\\www', 'baidu', 'com']
Splitting on commas and whitespace:
>>> st = 'DF lx 23,77'
>>> li = re.split(r"[\s\,]", st)
>>> li
['DF', 'lx', '23', '77']
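The optional maxsplit argument from the signature above is not shown in the examples, so here is a small sketch of it. The sample line is invented, and the pattern uses + so that a run of separators counts as a single split point.

```python
import re

line = "name, age\tcity  country"

# Split on any run of commas, spaces, or tabs.
fields = re.split(r"[,\s]+", line)
print(fields)

# maxsplit=1 stops after the first split and leaves the rest untouched.
head, rest = re.split(r"[,\s]+", line, maxsplit=1)
print(head)
print(repr(rest))
```

Without the +, two adjacent separators such as ", " would produce an empty string between them in the result.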
Replacement: the sub() and subn() functions in the re library replace the text matched by a regular expression with a specified string.
sub() returns the string after replacement.
subn() returns a tuple containing the new string and the number of substitutions made.
Keyword censoring: the regular expression I wrote for this still has a few problems.
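Since no code for this step appears here, the following is only a sketch of how keyword masking with sub() and subn() might look. The word list, the sample text, and the "***" replacement are all assumptions made for illustration.

```python
import re

# Hypothetical list of words to mask.
banned = ["spam", "scam"]

# re.escape() keeps any special characters in the words literal.
pattern = re.compile("|".join(re.escape(w) for w in banned), re.I)

text = "This is Spam, not a scam. spam!"

masked = pattern.sub("***", text)            # sub() returns only the new string
masked2, count = pattern.subn("***", text)   # subn() returns (new string, count)

print(masked)
print(count)
```

The count from subn() is handy for logging how many keywords were found without scanning the text a second time.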
Downloading the pictures from Jianshu friend-making posts.
My regular expression matches 10 articles, but some of them have no pictures and for some the image tag is matched incorrectly; I will fix this when I have time. I plan to traverse the whole collection and download all the pictures, hehe, and also to determine gender and find people from my hometown.
import urllib.request
import urllib.parse
import re
import os

USER_AGENT = ('Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36')

def get_road(url0):
    # Fetch the collection page and return the article links found in it.
    req = urllib.request.Request(url0)
    req.add_header('User-Agent', USER_AGENT)
    response = urllib.request.urlopen(req)
    html = response.read().decode("utf-8")
    pattern = re.compile(r'<a class="title" target="_blank" href="(.*?)"')
    result = re.findall(pattern, html)
    return result

def get_jiaoyou_url(result, s0):
    s = s0
    return geturl(result, s)

def gethtml(ur):
    # Fetch one page and return its HTML as text.
    url = ur
    req = urllib.request.Request(url)
    req.add_header('User-Agent', USER_AGENT)
    response = urllib.request.urlopen(req)
    html = response.read().decode("utf-8")
    return html

def getpath(html):
    # reg = r'.*?\.png'
    reg = r'...'  # the image-tag pattern is missing from the source
    imgre = re.compile(reg)
    urls = imgre.findall(html)
    return urls

def geturl(url, s):
    # Prefix every matched path with s to build full URLs.
    urls = [s + str(i) for i in url]
    for i in range(len(urls)):
        print(urls[i])
    print("url_length=", len(urls))
    return urls

def download(urls):
    # Save each picture as 10.png, 11.png, ... in the target directory.
    x = 10
    print("length=", len(urls))
    for url in urls:
        filename = '/home/dflx/download/jiaoyou_photo/' + str(x) + '.png'
        urllib.request.urlretrieve(url, filename)
        x += 1
        print(x)

def download_all(urls):
    print(len(urls))
    print('---------------')
    index = 0
    while index < len(urls):
        print(urls[index])
        # download(urls[index])
        index += 1
        print("********")

def main():
    url0 = "https://www.jianshu.com/c/bd38bd199ec6"
    # ur = 'https://www.jianshu.com/p/407dac18983c'
    ur = 'https://www.jianshu.com/p/189d1b8101e6'
    html = gethtml(ur)
    path = getpath(html)
    urls = geturl(path, 'https:')
    download(urls)
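The image-matching step can be tried offline on a small HTML string before touching the network. The sketch below is not the pattern used in the script: the HTML fragment, the example.com host, and the data-original-src attribute name are all assumptions for illustration.

```python
import re

# Made-up HTML fragment standing in for a downloaded page.
html = '''
<div><img data-original-src="//img.example.com/a/1.png" alt=""></div>
<p>no picture here</p>
<div><img data-original-src="//img.example.com/b/2.jpg" alt=""></div>
'''

# Assumed pattern: capture the value of data-original-src inside an <img> tag.
img_re = re.compile(r'<img[^>]*?data-original-src="(.*?)"')

paths = img_re.findall(html)

# Protocol-relative paths get a scheme prefix, as in geturl(path, 'https:').
urls = ["https:" + p for p in paths]
print(urls)
```

Testing the pattern on a saved fragment like this makes it easier to see why some articles yield no matches or the wrong tag.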
Downloading pictures of forklift trucks
Python regular expressions and an application [downloading pictures]