I finally have time to put the Python I have learned to use by writing a simple web crawler. This example uses a Python crawler to download pictures from the Baidu image gallery and save them locally. Without further ado, here is the code:
-------------------------------------------------------------------------------------------
# coding=utf-8
# Python 2 script: import the urllib and re modules
import urllib
import re


# Class that fetches the HTML of a Baidu gallery page
class GetHtml:
    def __init__(self, url):
        self.url = url

    def gethtml(self):
        page = urllib.urlopen(self.url)
        html = page.read()
        return html


# Class that processes the value returned by GetHtml.gethtml(): it extracts
# the picture link addresses from the page and downloads each picture
# (downloaded pictures are saved directly in the current local directory)
class GetImg:
    def __init__(self, html):
        self.html = html

    def getimg(self):
        reg = r'"thumbLargeUrl":"(.+?\.jpg)"'
        imgre = re.compile(reg, re.S | re.M)
        imglist = re.findall(imgre, self.html)
        # print imglist
        x = 1
        for imgurl in imglist:
            urllib.urlretrieve(imgurl, '%s.jpg' % x)
            y = x + 1
            print 'Picture %s downloaded, now downloading picture %s, please wait...' % (x, y)
            x += 1
        x -= 1
        print '-------- Download finished, %s pictures downloaded in total --------' % x


# Main entry point of the program
if __name__ == '__main__':
    url = 'http://image.baidu.com/channel?c=%E7%BE%8E%E5%A5%B3#%E7%BE%8E%E5%A5%B3'
    test = GetHtml(url)
    p = test.gethtml()
    m = GetImg(p)
    m.getimg()
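For anyone running Python 3, where `urlopen` and `urlretrieve` live in `urllib.request` and `print` is a function, the same idea might be sketched roughly as below. This is only a sketch under assumptions, not the code from this post: the function names `fetch_html`, `extract_image_urls` and `download_images` are hypothetical, the key name `thumbLargeUrl` is my reading of the pattern used above, and the `sample` string is invented data rather than real Baidu output.

```python
import re
import urllib.request

# Assumed pattern: capture the .jpg URL that follows a "thumbLargeUrl" key.
IMG_PATTERN = re.compile(r'"thumbLargeUrl"\s*:\s*"(.+?\.jpg)"')


def fetch_html(url, timeout=10):
    """Download a page and return its body decoded as text."""
    with urllib.request.urlopen(url, timeout=timeout) as page:
        return page.read().decode("utf-8", errors="replace")


def extract_image_urls(html):
    """Return every image URL matched by IMG_PATTERN, in page order."""
    return IMG_PATTERN.findall(html)


def download_images(urls):
    """Save each URL as 1.jpg, 2.jpg, ... in the current directory.

    Returns the number of pictures downloaded.
    """
    count = 0
    for count, img_url in enumerate(urls, start=1):
        urllib.request.urlretrieve(img_url, "%s.jpg" % count)
        print("Picture %s downloaded" % count)
    return count


# Hypothetical sample data for illustration only (no network access needed).
sample = (
    '{"id":1,"thumbLargeUrl":"http://example.com/a.jpg"},'
    '{"id":2,"thumbLargeUrl":"http://example.com/b.jpg"}'
)
print(extract_image_urls(sample))
```

To crawl a real page, one would chain the three steps as `download_images(extract_image_urls(fetch_html(url)))`. Keeping extraction separate from downloading makes it possible to check the regular expression against a saved page before any pictures are fetched.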
This article is from the "Simple New Life" blog; please keep this source link when reposting: http://857768.blog.51cto.com/847768/1641193
Writing a Simple Web Crawler in Python (Part 1)