A Python crawler for downloading photo albums
With the urllib.request module that ships with Python 3, you can easily crawl content from a web page.
1. urllib.request.urlopen(url) opens the web page, and read() returns its content.
2. A Python regular expression picks out the image links, for example <photo='http://img3a.hualvtu.com/272492/20150223/2143e9d2b51b1_cda16.jpg'>
3. urllib.request.urlretrieve(url, filename) downloads the image at url and saves it to filename.
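Step 2 can be tried without touching the network. The sketch below runs a regular expression over a small, made-up HTML fragment. The fragment's layout (a photo='...' attribute followed by dt='...'), the second URL, and the names sample_html and extract_image_links are assumptions for illustration; the real album page may be marked up differently.

```python
import re

# Hypothetical page fragment; only the first URL comes from the example above.
sample_html = (
    "<img photo='http://img3a.hualvtu.com/272492/20150223/2143e9d2b51b1_cda16.jpg' dt='1'>"
    "<img photo='http://example.com/second.jpg' dt='2'>"
)

# Capture everything between photo=' and the first following .jpg'
PHOTO_RE = re.compile(r"photo='(.*?\.jpg)'")

def extract_image_links(html):
    """Return every .jpg URL found in a photo='...' attribute, in page order."""
    return PHOTO_RE.findall(html)

links = extract_image_links(sample_html)
for link in links:
    print(link)
```

The non-greedy .*? matters here: a greedy .* would run from the first photo=' to the last .jpg' on the page and return one long, useless match.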
In addition, os.makedirs() creates the download directory, and a log.txt text file records what was downloaded.
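A minimal sketch of the directory and log setup, assuming Python 3.2 or later so that os.makedirs accepts exist_ok. The helper name start_log and the example.com address are hypothetical, and a temporary directory stands in for the real download folder so the snippet runs anywhere.

```python
import os
import tempfile

def start_log(folder, source_url):
    """Create folder if it is missing and start a fresh log.txt inside it."""
    # exist_ok=True replaces the separate os.path.exists() check
    os.makedirs(folder, exist_ok=True)
    log_path = os.path.join(folder, 'log.txt')
    # The with block closes the file even if a write fails
    with open(log_path, 'wt', encoding='utf-8') as logfile:
        logfile.write('image download source ' + source_url + '\n')
    return log_path

# Throwaway location for the demo instead of a fixed drive path
demo_folder = os.path.join(tempfile.mkdtemp(), 'download', 'photos')
log_path = start_log(demo_folder, 'http://example.com/album')
print(log_path)
```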
The full code:
# coding=utf-8
# by qiu
import re, os
import urllib.request

page = 'http://fm.hualvtu.com/viewQuark.action?id=10150223231300000165&un=woshiyyh&reply=false'

# download html
def download_html(url):
    html = urllib.request.urlopen(url).read()
    return html.decode()

def getImage(ht):
    # capture the .jpg URL stored in each photo='...' attribute
    reg = r'photo=\'(.*?\.jpg)\' dt='
    obj = re.compile(reg)
    imglist = re.findall(obj, ht)
    folder = 'G:/download/photos/'
    if not os.path.exists(folder):
        os.makedirs(folder)
    logfile = open(folder + 'log.txt', 'wt')
    logfile.write('image download source ' + page + '\n')
    s = 1
    for i in imglist:
        try:
            print('Image %d is being downloaded...' % s)
            urllib.request.urlretrieve(i, folder + 'picture%d.jpg' % s)
        except Exception:
            # log the failed link and move on to the next image
            print("Download error")
            logfile.write(i + ' Download error\n')
            continue
        logfile.write('image %d link -- ' % s + i + '\n')
        s += 1
    logfile.close()
    print('Download finished')

html = download_html(page)
getImage(html)
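The download loop can also be pulled out into a function that returns what succeeded and what failed, which makes it testable without a network connection. This is only a sketch of one possible variant, not the author's code: the name download_all, the retrieve parameter, and the picture%d.jpg file naming are assumptions. It catches OSError (which covers urllib's URLError) and ValueError (raised for malformed URLs) rather than every exception.

```python
import os
import urllib.request

def download_all(urls, folder, retrieve=urllib.request.urlretrieve):
    """Download each URL into folder; return (saved_paths, failed_urls).

    retrieve is any callable taking (url, filename); it defaults to
    urllib.request.urlretrieve and can be swapped for a stub in tests.
    """
    os.makedirs(folder, exist_ok=True)
    saved, failed = [], []
    for url in urls:
        # Number files by successful downloads only, as the script does
        target = os.path.join(folder, 'picture%d.jpg' % (len(saved) + 1))
        try:
            retrieve(url, target)
        except (OSError, ValueError):
            failed.append(url)
            continue
        saved.append(target)
    return saved, failed
```

Calling download_all(links, 'G:/download/photos/') with the list of extracted links would then behave like the loop in the script, with the failed list available for logging afterwards.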