A small Python crawler that uses regular expressions to grab the JPG images from a Baidu Baike entry page (the entry for Guangzhou). Here is my code, for reference:
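# coding=utf-8
__author__ = 'Hinfa'

import os
import re
from urllib import request as req

url = 'https://baike.baidu.com/item/%E5%B9%BF%E5%B7%9E/72101?fr=aladdin'
path = 'test//encyclopedia Guangzhou Pictures 2'
# makedirs also creates the parent 'test' directory and does not fail on a re-run
os.makedirs(path, exist_ok=True)

# Fetch the page and decode it to text
page = req.urlopen(url)
html = page.read().decode('utf-8')

# Non-greedy match from "https" up to the first ".jpg"
jpgre = re.compile(r'https.*?\.jpg')
jpglist = re.findall(jpgre, html)

# filecatalog.txt records every URL that was downloaded
with open(path + '//filecatalog.txt', 'w+') as fo:
    fo.write('Crawled JPG URLs:')
    i = 0
    for jpg in jpglist:
        # Some URLs are embedded in JSON with escaped slashes: turn "\/" into "/"
        jpg = re.sub(r'\\\/', '/', jpg)
        print(jpg)
        filepath = path + '//%d.jpg' % i
        fo.write('\n' + jpg)
        req.urlretrieve(jpg, filepath)  # save the image as 0.jpg, 1.jpg, ...
        i += 1
    fo.write('\n' + 'Total crawled: ' + str(i))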
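To see what the regular expression step does without touching the network, here is a minimal sketch that runs the crawler's pattern against a made-up HTML snippet. The sample string, the example.com URLs, the function name extract_jpg_urls, and the stricter pattern are all assumptions for illustration; none of them come from the real page or from my original code. The stricter pattern is included because `https.*?\.jpg` will run across unrelated text until it meets the next `.jpg`.

```python
import re

# Pattern from the crawler above: non-greedy match from "https" to the first ".jpg".
LOOSE_JPG = re.compile(r'https.*?\.jpg')
# Stricter variant (an assumption, not the original pattern):
# refuse to cross quotes, whitespace, or angle brackets.
STRICT_JPG = re.compile(r'https:[^"\'\s<>]*?\.jpg')


def extract_jpg_urls(html, pattern=LOOSE_JPG):
    """Return JPG URLs found in html, with JSON-style '\\/' unescaped to '/'."""
    return [url.replace('\\/', '/') for url in pattern.findall(html)]


# Hypothetical snippet: one plain URL, one JSON-escaped URL,
# one non-JPG link followed by a relative image path.
sample = (
    '<img src="https://example.com/a/1.jpg">'
    '{"pic":"https:\\/\\/example.com\\/b\\/2.jpg"}'
    '<a href="https://example.com/page.html">next</a><img src="/local/3.jpg">'
)

loose = extract_jpg_urls(sample)
strict = extract_jpg_urls(sample, STRICT_JPG)
print(loose)
print(strict)
```

On this snippet the loose pattern returns three matches, the last of which starts at the page.html link and runs on into the next image tag, so it is not a usable URL. The strict pattern returns only the two real JPG URLs. The `replace('\\/', '/')` call does the same job as the `re.sub(r'\\\/', '/', jpg)` line in the crawler.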
Program output:
Then open the filecatalog.txt file generated in the output directory. It lists the crawled URLs as follows:
The images downloaded from those URLs:
My first crawler! I'm very excited, and it also feels quite magical :-)