The site http://www.hbc333.com/ is a wallpaper site that offers its images for download in a variety of resolutions, so I wanted to write a crawler script to download these pictures in bulk.
After some observation, the listing pages for the 2560x1600 resolution have URLs of the form http://www.hbc333.com/size/2560x1600/n/ (where n is the page number).
The address of each preview image on such a page is a relative path such as: /data/out/253/46659416-watch-dogs-wallpaper.jpg
The link to the original image is that path appended to the site root: http://www.hbc333.com/data/out/253/46659416-watch-dogs-wallpaper.jpg
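The URL scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names page_url and image_urls and the constant IMG_PATTERN are hypothetical, and the pattern assumes that file names consist of a numeric ID followed by lowercase words or digits separated by hyphens, as in the example path.

```python
import re
from urllib.parse import urljoin

BASE_URL = "http://www.hbc333.com"
LIST_URL = BASE_URL + "/size/2560x1600/"

# Assumed file-name shape: numeric ID, then one or more "-word" parts, then ".jpg".
IMG_PATTERN = re.compile(r"/data/out/[0-9]+/[0-9]+(?:-[a-z0-9]+)+\.jpg")


def page_url(n):
    """Return the URL of listing page n for the 2560x1600 resolution."""
    return "{}{}/".format(LIST_URL, n)


def image_urls(html):
    """Return the absolute image URLs found in a page, without duplicates."""
    found = []
    for path in IMG_PATTERN.findall(html):
        full = urljoin(BASE_URL, path)
        if full not in found:
            found.append(full)
    return found


# A hand-written HTML fragment, not real output from the site.
sample = '<img src="/data/out/253/46659416-watch-dogs-wallpaper.jpg">'
print(page_url(2))
print(image_urls(sample))
```

Keeping the URL building and the extraction in separate functions makes it possible to check the pattern against a saved page before sending any requests to the site.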
The script is as follows:

# coding=utf-8
from urllib import request
import re

url1 = "http://www.hbc333.com/size/2560x1600/"  # Wallpaper site start page


def getmainpage(url):
    # Fetch a page and return its raw bytes
    page = request.urlopen(url)
    html = page.read()
    return html


# The URL for each page is http://www.hbc333.com/size/2560x1600/n/ where n is the page number
count = 2
oriurl = 'http://www.hbc333.com'
while count < 3:
    newurl = url1 + str(count) + '/'
    count = count + 1
    print(newurl)
    html = getmainpage(newurl)
    print(html)
    html = html.decode('utf-8')  # Python 3: decode bytes to str before matching
    # Match paths such as /data/out/253/46659416-watch-dogs-wallpaper.jpg
    # (a numeric ID followed by one or more hyphen-separated words)
    paths = re.findall(r'/data/out/[0-9]+/[0-9]+(?:-[a-z]+)+\.jpg', html)
    # paths = re.findall(r'/data/out/[0-9a-zA-Z]+', html)
    x = 1
    for u in paths:
        # request.urlretrieve(oriurl + u, '%s.jpg' % x)  # uncomment to download
        print(oriurl + u)
        x = x + 1
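The script only prints the image links, with the download call left commented out. A minimal sketch of the download step might look like the following. The names local_name and download_all, the fetch parameter, and the wallpapers directory are all hypothetical; urllib.request.urlretrieve is the same standard-library call the script refers to.

```python
import os
from urllib import request
from urllib.parse import urlparse


def local_name(url):
    """Return the file name portion of an image URL."""
    return os.path.basename(urlparse(url).path)


def download_all(urls, dest_dir, fetch=request.urlretrieve):
    """Save each URL into dest_dir and return the list of newly written paths.

    fetch is any callable taking (url, target_path); it defaults to
    urllib.request.urlretrieve and can be swapped out for a dry run.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        target = os.path.join(dest_dir, local_name(url))
        if os.path.exists(target):
            continue  # skip files that were already downloaded
        fetch(url, target)
        saved.append(target)
    return saved


# Example call (performs real network requests, so it is left commented out):
# download_all(
#     ["http://www.hbc333.com/data/out/253/46659416-watch-dogs-wallpaper.jpg"],
#     "wallpapers",
# )
```

Naming the files after the last path segment of the URL, rather than a running counter, keeps the names stable across runs, so an interrupted crawl can be restarted without fetching the same images again.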
Python crawler: crawling HD images