Python regular expressions and an application [downloading pictures]

Source: Internet
Author: User

A regular expression filters target strings using a pattern built from a series of specific characters and combinations of them.

Key points about re

Python's regular expression library is re. Import it with import re, then call re.compile(pattern, flags) to compile a regular expression string into a regular expression object. The functions provided by re can then match, search, replace, split, and group strings.
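
As a minimal sketch of the compile-then-use workflow described above, the snippet below compiles a pattern once and reuses the resulting object. The sample strings and variable names are made up for illustration.

```python
import re

# Compile once, then reuse the pattern object for several operations.
number = re.compile(r"\d+")
text = "a1b22c333"

print(number.match(text))           # None: the string does not start with a digit
print(number.search(text).group())  # 1
print(number.findall(text))         # ['1', '22', '333']
print(number.split(text))           # ['a', 'b', 'c', '']
print(number.sub("#", text))        # a#b#c#

# Parentheses create groups that can be read back from the match object.
pair = re.compile(r"(\w+)=(\w+)")
print(pair.match("key=value").groups())  # ('key', 'value')
```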

Commonly used flag values:
re.I ignores case; re.X ignores whitespace inside the pattern.
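
The two flags can be tried out as follows. This is only an illustrative sketch; the patterns and test strings are invented for the example.

```python
import re

# re.I (re.IGNORECASE): letters match regardless of case.
word = re.compile(r"python", re.I)
print(bool(word.search("I like PyThOn")))  # True

# re.X (re.VERBOSE): whitespace and # comments inside the pattern are
# ignored, so a long pattern can be laid out over several lines.
date = re.compile(r"""
    (\d{4})  # year
    -
    (\d{2})  # month
    -
    (\d{2})  # day
""", re.X)
print(date.match("2018-03-15").groups())  # ('2018', '03', '15')

# Flags are combined with the | operator.
both = re.compile(r"h e l l o", re.I | re.X)
print(bool(both.match("HELLO")))  # True
```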

    import re

    def check(string):
        # local part, "@", then a domain with at least one dot
        p = re.compile(r"^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$", re.I)
        if p.match(string):
            print("%s conforms to the rules" % string)
        else:
            print("%s does not conform to the rules" % string)

    st1 = '[email protected]'
    st2 = '[email protected]'
    check(st1)
    check(st2)

Output:

    [email protected].com conforms to the rules
    [email protected].com conforms to the rules

re.match() matches only from the start of the string.
re.search() scans the entire string for a match; on success it returns a match object that records the start and end positions.
re.findall() returns all matched substrings as a list.

    >>> print(p.match('DAA00'))
    None
    >>> re.match('ADF', 'SDADFG')
    >>> re.search('ADF', 'SDADFGADF')
    <_sre.SRE_Match object; span=(2, 5), match='ADF'>
    >>> re.findall('ADF', 'SDADFGADF')
    ['ADF', 'ADF']
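
The start and end positions mentioned for re.search() can be read from the match object. The sketch below uses an arbitrary sample string and also shows re.finditer(), which is not covered in the text, as one way to get the position of every occurrence.

```python
import re

text = "sdadfgadf"

m = re.search(r"adf", text)
print(m.group())           # adf
print(m.start(), m.end())  # 2 5
print(m.span())            # (2, 5)

# re.match() only tries position 0, so it finds nothing here.
print(re.match(r"adf", text))  # None

# re.finditer() yields one match object per occurrence.
spans = [found.span() for found in re.finditer(r"adf", text)]
print(spans)  # [(2, 5), (6, 9)]
```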

Splitting
In practice, different data sources use different separators: spaces, tabs, commas, and so on. Combining a regular expression with the split() function handles all of them conveniently.
re.split(pattern, string[, maxsplit])

Split on '.':

    >>> st = 'https:\\www.baidu.com'
    >>> lt = re.split(r'\.', st)
    >>> lt
    ['https:\\www', 'baidu', 'com']
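
A short sketch of the optional maxsplit argument, and of why the dot has to be escaped. The host name is just a sample value.

```python
import re

host = "www.baidu.com"

# Escaped dot: split at every literal "."
print(re.split(r"\.", host))              # ['www', 'baidu', 'com']

# maxsplit limits the number of splits; the rest stays in one piece.
print(re.split(r"\.", host, maxsplit=1))  # ['www', 'baidu.com']

# An unescaped dot matches any character, so every character becomes
# a separator and only empty strings remain.
print(re.split(r".", "a.b"))              # ['', '', '', '']
```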

Split on commas and whitespace:

    >>> st = 'DF lx 23,77'
    >>> li = re.split(r"[\s,]", st)
    >>> li
    ['DF', 'lx', '23', '77']
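
When separators can appear in runs (for example a comma followed by a space), a character class alone produces empty strings. Adding + is one common way around that; the sample line below is invented.

```python
import re

line = "df  lx,23 ,77"

# Each single separator character is a split point, so runs of
# separators leave empty strings in the result.
print(re.split(r"[\s,]", "a, b"))   # ['a', '', 'b']

# With +, a whole run of separators counts as one split point.
fields = re.split(r"[\s,]+", line)
print(fields)                        # ['df', 'lx', '23', '77']

# maxsplit works the same way as before.
print(re.split(r"[\s,]+", line, maxsplit=1))  # ['df', 'lx,23 ,77']
```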

Replacement: the sub() and subn() functions in the re library replace the text matched by a regular expression with a specified string.
sub() returns the string after replacement.
subn() returns a tuple containing the new string and the number of substitutions made.
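
A small sketch of the difference between the two functions; the sample text is made up.

```python
import re

text = "tel: 010-1234, fax: 010-5678"

# sub() returns only the new string.
masked = re.sub(r"\d", "*", text)
print(masked)  # tel: ***-****, fax: ***-****

# subn() returns (new_string, number_of_substitutions).
result = re.subn(r"\d+", "#", text)
print(result)  # ('tel: #-#, fax: #-#', 4)

# count limits how many matches are replaced.
first_only = re.sub(r"\d+", "#", text, count=1)
print(first_only)  # tel: #-1234, fax: 010-5678
```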

Censoring keywords (replacing sensitive words): the regular expression I wrote for this still has a few problems.
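
The original code for this step is not shown, so the following is only one possible sketch of keyword censoring with sub(). The function name censor, its parameters, and the keyword list are all hypothetical.

```python
import re

def censor(text, keywords, mask="*"):
    """Replace every keyword in text with mask characters (hypothetical helper)."""
    if not keywords:
        return text
    # re.escape() makes each keyword literal; longer keywords go first so
    # that the alternation prefers them over shorter prefixes.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered), re.I)
    # The replacement is a function: it masks as many characters as matched.
    return pattern.sub(lambda m: mask * len(m.group()), text)

print(censor("This spam is SPAMMY junk", ["spam", "junk"]))
# This **** is ****MY ****
```

Note that this version also masks keywords inside longer words ("SPAMMY"); wrapping the alternation in \b...\b would restrict it to whole words if that is what is wanted.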

Downloading the pictures from Jianshu's friend-making (jiaoyou) articles

My regular expression matches 10 articles, but some of them have no pictures, and for others the image-tag pattern matches the wrong thing; I will fix it when I have time. The plan is to traverse the whole collection and download all the pictures, and also to work out each author's gender and find people from my hometown.
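
As a rough sketch of the image-tag matching discussed above, the snippet below pulls src values out of img tags with a regular expression. The HTML fragment, the example.com URLs, and the name image_urls are invented for illustration; Jianshu's real markup may differ, and this is not the pattern used in the original script.

```python
import re

# Hypothetical HTML fragment standing in for a downloaded article page.
sample_html = '''
<p>first paragraph</p>
<img class="pic" src="//img.example.com/a/1.png" alt="one">
<IMG SRC="//img.example.com/b/2.jpg">
<p>no picture here</p>
'''

# <img, any other attributes, then a src="..." attribute preceded by whitespace.
IMG_SRC = re.compile(r'<img\b[^>]*?\ssrc="([^"]+)"', re.I)

def image_urls(html, scheme="https:"):
    """Return the image addresses found in html, adding a scheme to //host/... URLs."""
    return [scheme + src if src.startswith("//") else src
            for src in IMG_SRC.findall(html)]

print(image_urls(sample_html))
# ['https://img.example.com/a/1.png', 'https://img.example.com/b/2.jpg']
```

An article without pictures simply yields an empty list, so the caller can skip it. For anything beyond simple pages, an HTML parser is usually more robust than a regular expression.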

    import urllib.request
    import urllib.parse
    import re
    import os

    def get_road(url0):
        # fetch the collection page and extract the article links
        req = urllib.request.Request(url0)
        req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 '
                       '(KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36')
        response = urllib.request.urlopen(req)
        html = response.read().decode("utf-8")
        pattern = re.compile(r'<a class="title" target="_blank" href="(.*?)"')
        result = re.findall(pattern, html)
        return result

    def get_jiaoyou_url(result, s0):
        s = s0
        return geturl(result, s)

    def gethtml(ur):
        # fetch one article page and return its HTML
        url = ur
        req = urllib.request.Request(url)
        req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 '
                       '(KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36')
        response = urllib.request.urlopen(req)
        html = response.read().decode("utf-8")
        return html

    def getpath(html):
        # reg = r'.*?\.png'
        reg = r'...'  # the image-tag pattern is missing from the source text
        imgre = re.compile(reg)
        urls = imgre.findall(html)
        return urls

    def geturl(url, s):
        # prefix every extracted path with s
        urls = [s + str(i) for i in url]
        for i in range(len(urls)):
            print(urls[i])
        print("url_length=", len(urls))
        return urls

    def download(urls):
        x = 10
        print("length=", len(urls))
        for url in urls:
            filename = '/home/dflx/download/jiaoyou_photo/' + str(x) + '.png'
            urllib.request.urlretrieve(url, filename)
            x += 1
            print(x)

    def download_all(urls):
        print(len(urls))
        print('---------------')
        index = 0
        while index < len(urls):
            print(urls[index])
            # download(urls[index])
            index += 1
            print("********")

    def main():
        url0 = "https://www.jianshu.com/c/bd38bd199ec6"
        # ur = 'https://www.jianshu.com/p/407dac18983c'
        ur = 'https://www.jianshu.com/p/189d1b8101e6'
        html = gethtml(ur)
        path = getpath(html)
        urls = geturl(path, 'https:')
        download(urls)
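
The download step saves every file as .png into a directory that must already exist. A hedged variation is sketched below: it keeps the remote file's extension, creates the directory if needed, and skips URLs that fail. The names target_path and download_images, the "photos" directory, and the example URL are assumptions for illustration, not part of the original script.

```python
import os
import urllib.parse
import urllib.request

def target_path(url, directory, index):
    """Build a local file name that keeps the extension of the remote file."""
    path = urllib.parse.urlparse(url).path       # drops any ?query part
    ext = os.path.splitext(path)[1] or ".png"    # fall back to .png
    return os.path.join(directory, "%d%s" % (index, ext))

def download_images(urls, directory, start=10):
    """Download each URL into directory; return the list of saved file names."""
    os.makedirs(directory, exist_ok=True)
    saved = []
    for index, url in enumerate(urls, start):
        filename = target_path(url, directory, index)
        try:
            urllib.request.urlretrieve(url, filename)
        except OSError as err:  # urllib.error.URLError is a subclass of OSError
            print("skipped", url, err)
            continue
        saved.append(filename)
    return saved

# No network access is needed to see how the file names are built:
print(target_path("https://img.example.com/a/1.jpg?w=100", "photos", 10))
```

Numbering starts at 10 only to mirror the counter in the listing above.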

Download pictures of forklift truck Pictures
