Continuing to tinker with crawlers: today I'm posting a script that crawls the pictures under the "美女" (beautiful women) tag on Diandian (diandian.com). Original work.
    # -*- coding: utf-8 -*-
    #---------------------------------------
    #   Program:     Diandian beauty-picture crawler
    #   Version:     0.2
    #   Author:      Zippera
    #   Date:        2013-07-26
    #   Language:    Python 2.7
    #   Description: the number of pages to download can be set
    #---------------------------------------

    import urllib2
    import urllib
    import re

    # Captures the image URL stored in an imgsrc="..." attribute.
    pat = re.compile(r'\n.*?imgsrc="(ht.*?)".*?')
    # Captures the file name: the last 10 word characters plus the extension.
    fnp = re.compile(r'(\w{10}\.\w+)$')

    nexturl1 = "http://www.diandian.com/tag/%E7%BE%8E%E5%A5%B3?page="

    count = 1

    # Raise the upper bound to download more pages (count < 2 fetches page 1 only).
    while count < 2:
        print "Page " + str(count) + "\n"
        myurl = nexturl1 + str(count)
        myres = urllib2.urlopen(myurl)
        mypage = myres.read()
        ucpage = mypage.decode("utf-8")  # decode the raw bytes to unicode

        mat = pat.findall(ucpage)
        if len(mat):
            cnt = 1
            for item in mat:
                print "Page " + str(count) + " No." + str(cnt) + " url: " + item + "\n"
                cnt += 1
                fnr = fnp.findall(item)
                if fnr:
                    fname = fnr[0]
                    # Save the image into the current working directory.
                    urllib.urlretrieve(item, fname)
        else:
            print "no data"

        count += 1
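The two regular expressions do most of the work in the script, so it may help to watch them run on a small input without touching the network. The sketch below is written for Python 3 rather than the Python 2.7 of the script. The HTML fragment and the names SAMPLE_HTML, extract_image_urls and filename_from_url are made up for illustration; the real markup on diandian.com may well differ.

```python
import re

# The crawler's two patterns, written as raw strings.
IMG_PATTERN = re.compile(r'\n.*?imgsrc="(ht.*?)".*?')
NAME_PATTERN = re.compile(r'(\w{10}\.\w+)$')

# Hypothetical page fragment; not copied from the real site.
SAMPLE_HTML = (
    '<div class="post">\n'
    '  <img imgsrc="http://example.com/img/abcde12345.jpg" alt="">\n'
    '</div>\n'
    '<div class="post">\n'
    '  <img imgsrc="http://example.com/img/short.png" alt="">\n'
    '</div>\n'
)


def extract_image_urls(html):
    """Return every URL captured from an imgsrc="..." attribute."""
    return IMG_PATTERN.findall(html)


def filename_from_url(url):
    """Return the last 10 word characters plus extension, or None."""
    found = NAME_PATTERN.findall(url)
    return found[0] if found else None


urls = extract_image_urls(SAMPLE_HTML)
for url in urls:
    print(url, "->", filename_from_url(url))
# http://example.com/img/abcde12345.jpg -> abcde12345.jpg
# http://example.com/img/short.png -> None
```

Note the second result: a URL whose final name has fewer than ten word characters before the extension yields no match, so the crawler's "if fnr:" check would silently skip that image.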
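How to use: create a new folder, save the code in it as name.py, and run python name.py from that folder. The images are saved there, because the script writes each file under a bare file name into the current working directory.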
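The header comment says the number of pages can be set, which in the script means editing the bound of the while loop. One possible way to make the page count and the target folder explicit is sketched below in Python 3. The names build_page_urls and target_path and the folder name "images" are hypothetical and not part of the original script.

```python
import os

# Tag listing URL taken from the script; the page number is appended.
BASE_URL = "http://www.diandian.com/tag/%E7%BE%8E%E5%A5%B3?page="


def build_page_urls(num_pages, base_url=BASE_URL):
    """Return the listing URLs for pages 1..num_pages."""
    return [base_url + str(page) for page in range(1, num_pages + 1)]


def target_path(folder, fname):
    """Return the path of one image inside the chosen folder.

    The folder itself is not created here; the caller could do that
    with os.makedirs(folder, exist_ok=True) before downloading.
    """
    return os.path.join(folder, fname)


page_urls = build_page_urls(3)
for url in page_urls:
    print(url)
```

In a Python 3 port, the value returned by target_path could be passed as the second argument to urllib.request.urlretrieve, so that the images no longer depend on the directory the script is started from.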