Crawling simply means fetching the content behind a link and saving it locally. So before you crawl, you need to know which link to crawl.
The page to crawl is this: http://findicons.com/pack/2787/beautiful_flat_icons
It has a lot of good icons; the goal is to download these files and save them locally as images.
How do you do it with Python3?
Step one: get the content of the parent page to crawl
import urllib.request
import re
url = "http://findicons.com/pack/2787/beautiful_flat_icons"
webpage = urllib.request.urlopen(url)
data = webpage.read()
data = data.decode('UTF-8')
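The fetch in step one can also be wrapped in a small function. This is only a sketch, not the original author's code: the names `fetch_page` and `decode_body`, the 10-second timeout, and the fallback to UTF-8 when the server reports no charset are all assumptions made for illustration.

```python
import urllib.request


def decode_body(raw, charset=None):
    """Decode raw bytes; fall back to UTF-8 and replace undecodable bytes."""
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_page(url, timeout=10):
    """Return the page at url as text (hypothetical helper)."""
    # The with-block closes the connection even if read() raises.
    with urllib.request.urlopen(url, timeout=timeout) as webpage:
        # get_content_charset() returns None when no charset is declared.
        charset = webpage.headers.get_content_charset()
        return decode_body(webpage.read(), charset)
```

Calling `fetch_page("http://findicons.com/pack/2787/beautiful_flat_icons")` would then play the role of `data` in the steps that follow.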
Step two: process the parent page content and extract the image links inside it
k = re.split(r'\s+', data)
s = []
sp = []
si = []
for i in k:
    if (re.match(r'src', i) or re.match(r'href', i)):
        if (not re.match(r'href="#"', i)):
            if (re.match(r'.*?png"', i) or re.match(r'.*?ico"', i)):
                if (re.match(r'src', i)):
                    s.append(i)
for it in s:
    if (re.match(r'.*?png"', it)):
        sp.append(it)
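Splitting the page on whitespace works for this page, but it depends on every `src="..."` attribute being a single whitespace-free token. As a hedged alternative (a different technique from the one above, not the author's method), the standard library's `html.parser` can collect the same links. The class name `PngSrcCollector`, the function `extract_png_links`, and the sample HTML are made up for this sketch.

```python
from html.parser import HTMLParser


class PngSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag that points at a .png file."""

    def __init__(self):
        super().__init__()
        self.png_links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value and value.lower().endswith(".png"):
                self.png_links.append(value)


def extract_png_links(html_text):
    """Return all .png image links found in html_text, in document order."""
    parser = PngSrcCollector()
    parser.feed(html_text)
    return parser.png_links


# Hypothetical sample input, not taken from the real page.
sample = '<a href="#"><img src="http://example.com/a.png"></a><img src="/b.ico">'
print(extract_png_links(sample))  # ['http://example.com/a.png']
```

Unlike the `sp` list above, this returns the bare URLs, so the later `re.search` for `src="(.*?)"` would not be needed.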
Step three: fetch the content of these image links and save it locally as images
cnt = 0
cou = 1
for it in sp:
    m = re.search(r'src="(.*?)"', it)
    iturl = m.group(1)
    print(iturl)
    if (iturl[0] == '/'):
        continue
    web = urllib.request.urlopen(iturl)
    itdata = web.read()
    if (cnt % 3 == 1 and cnt >= 4 and cou <= 30):
        f = open('d:/pythoncode/simplecodes/image/' + str(cou) + '.png', "wb")
        cou = cou + 1
        f.write(itdata)
        f.close()
        print(it)
    cnt = cnt + 1
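The loop above skips any link that starts with `/`. A possible variation, sketched here under assumptions, resolves such relative links against the page URL with `urllib.parse.urljoin` and creates the target directory if it is missing. The helpers `resolve_link` and `save_images` and the `limit` parameter are hypothetical, and this sketch simply saves the first `limit` links instead of applying the `cnt % 3` filter used above.

```python
import os
import urllib.request
from urllib.parse import urljoin

PAGE_URL = "http://findicons.com/pack/2787/beautiful_flat_icons"


def resolve_link(link, page_url=PAGE_URL):
    """Turn a relative link such as '/a.png' into an absolute URL."""
    return urljoin(page_url, link)


def save_images(links, directory, limit=30):
    """Download up to `limit` links into `directory` as 1.png, 2.png, ..."""
    os.makedirs(directory, exist_ok=True)
    saved = []
    for index, link in enumerate(links[:limit], start=1):
        path = os.path.join(directory, str(index) + ".png")
        # Both with-blocks close their resources even if an error occurs.
        with urllib.request.urlopen(resolve_link(link), timeout=10) as web:
            itdata = web.read()
        with open(path, "wb") as f:
            f.write(itdata)
        saved.append(path)
    return saved
```

For example, `save_images(links, 'd:/pythoncode/simplecodes/image/')` would write into the same directory as the code above, where `links` is a list of bare image URLs.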
You can set the save directory to whatever you like.
The following is the combined code:
import urllib.request
import re

url = "http://findicons.com/pack/2787/beautiful_flat_icons"
webpage = urllib.request.urlopen(url)
data = webpage.read()
data = data.decode('UTF-8')

k = re.split(r'\s+', data)
s = []
sp = []
si = []
for i in k:
    if (re.match(r'src', i) or re.match(r'href', i)):
        if (not re.match(r'href="#"', i)):
            if (re.match(r'.*?png"', i) or re.match(r'.*?ico"', i)):
                if (re.match(r'src', i)):
                    s.append(i)
for it in s:
    if (re.match(r'.*?png"', it)):
        sp.append(it)

cnt = 0
cou = 1
for it in sp:
    m = re.search(r'src="(.*?)"', it)
    iturl = m.group(1)
    print(iturl)
    if (iturl[0] == '/'):
        continue
    web = urllib.request.urlopen(iturl)
    itdata = web.read()
    if (cnt % 3 == 1 and cnt >= 4 and cou <= 30):
        f = open('d:/pythoncode/simplecodes/image/' + str(cou) + '.png', "wb")
        cou = cou + 1
        f.write(itdata)
        f.close()
        print(it)
    cnt = cnt + 1