Ramblings
I previously crawled these pictures with the requests module; today I am implementing the same crawler again with the Scrapy framework.
(The pictures are admittedly a bit risqué, but Swaiiow has long since renounced worldly desires, so just treat them as something to look at, haha.)
Item:
# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class GirlpicItem(scrapy.Item):
    title = scrapy.Field()  # group title, used later as the folder name
    index = scrapy.Field()  # page number of the picture within its group
    image = scrapy.Field()  # raw image bytes
Spider:
# coding: utf-8
from scrapy.spiders import Spider
from scrapy.http import Request
from scrapy.selector import Selector
from Girlpic.items import GirlpicItem
import scrapy
import sys

# Python 2 only: make UTF-8 the default string encoding
reload(sys)
sys.setdefaultencoding('utf-8')


class GirlpicSpider(Spider):
    name = 'Girlpic'
    allowed_domains = []  # allowed domain names
    start_urls = ["http://www.mzitu.com/all/"]

    def parse(self, response):
        # Every link in the archive list points to one picture group
        groups = response.xpath("//div[@class='main-content']//ul[@class='archives']//a")
        count = 0
        for group in groups:
            count = count + 1
            if count > 5:
                return  # Be careful here: just return, do not exit the process
            groupurl = group.xpath('@href').extract()[0]
            title = group.xpath("text()").extract()[0]
            request = scrapy.Request(url=groupurl, callback=self.getGroup,
                                     meta={'title': title, 'groupurl': groupurl},
                                     dont_filter=True)
            yield request

    def getGroup(self, response):
        # The second-to-last span in the page navigation holds the last page number
        maxindex = response.xpath("//div[@class='pagenavi']//span/text()").extract()[-2]
        for index in range(1, int(maxindex) + 1):
            pageurl = response.meta['groupurl'] + '/' + str(index)
            meta = response.meta
            meta['index'] = index
            request = scrapy.Request(url=pageurl, callback=self.getPage,
                                     meta=meta, dont_filter=True)
            yield request

    def getPage(self, response):
        # Get the picture URL
        imageurl = response.xpath("//div[@class='main-image']//img/@src").extract()[0]
        request = scrapy.Request(url=imageurl, callback=self.FormItem,
                                 meta=response.meta, dont_filter=True)
        yield request

    def FormItem(self, response):
        title = response.meta['title']
        index = response.meta['index']
        image = response.body  # raw bytes of the downloaded picture
        item = GirlpicItem(title=title, index=index, image=image)
        yield item
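The page-URL step in the spider can be tried on its own, without Scrapy. The following is only a minimal sketch in Python 3: `build_page_urls` is a hypothetical helper, and the group URL and span texts are made-up sample values rather than real data from the site. It also strips a trailing slash before joining, which the spider above does not do.

```python
def build_page_urls(group_url, span_texts):
    """Return one URL per page of a picture group.

    span_texts stands in for the list produced by the page-navigation
    XPath; like the spider, we assume the second-to-last entry is the
    highest page number.
    """
    max_index = int(span_texts[-2])
    base = group_url.rstrip('/')  # addition: avoid a double slash
    return ['{}/{}'.format(base, index) for index in range(1, max_index + 1)]


# Hypothetical sample values, for illustration only
urls = build_page_urls('http://example.com/group/', ['1', '2', '...', '4', 'Next'])
for url in urls:
    print(url)
```

If the navigation markup ever changes so that the second-to-last span is not a number, `int()` raises `ValueError`, so a real spider may want to guard that conversion.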
PipeLine:
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
import os
import codecs
import sys

# Python 2 only: make UTF-8 the default string encoding
reload(sys)
sys.setdefaultencoding('utf-8')


class GirlpicPipeline(object):
    def __init__(self):
        # Root folder for all downloads (backslash escaped to be safe)
        self.dirpath = u'd:\\Learning Materials'
        if not os.path.exists(self.dirpath):
            os.makedirs(self.dirpath)

    def process_item(self, item, spider):
        title = item['title']
        index = item['index']
        image = item['image']
        # One sub-folder per picture group, named after its title
        groupdir = os.path.join(self.dirpath, title)
        if not os.path.exists(groupdir):
            os.makedirs(groupdir)
        imagepath = os.path.join(groupdir, str(index) + u'.jpg')
        file = codecs.open(imagepath, 'wb')
        file.write(image)
        file.close()
        return item
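The saving logic of the pipeline can also be sketched as a standalone Python 3 function. This is an illustration under assumptions, not the pipeline itself: `save_image` is a hypothetical helper, the title and bytes are dummy values, and the file-name sanitizing is an addition of mine, since group titles may contain characters that Windows does not allow in folder names.

```python
import os
import re
import tempfile


def save_image(root_dir, title, index, image_bytes):
    """Write image_bytes to <root_dir>/<title>/<index>.jpg and return the path."""
    # Addition (not in the original pipeline): replace characters that
    # are not allowed in Windows file names
    safe_title = re.sub(r'[\\/:*?"<>|]', '_', title).strip()
    group_dir = os.path.join(root_dir, safe_title)
    os.makedirs(group_dir, exist_ok=True)
    image_path = os.path.join(group_dir, '{}.jpg'.format(index))
    # The with-block closes the file even if the write fails
    with open(image_path, 'wb') as f:
        f.write(image_bytes)
    return image_path


# Dummy data written to a temporary folder, for illustration only
demo_root = tempfile.mkdtemp()
saved_path = save_image(demo_root, 'demo: group?', 1, b'\xff\xd8\xff')
print(saved_path)
```

As the comment at the top of the pipeline file says, the pipeline only runs once it is registered in settings.py. Assuming the default project layout, that would look like ITEM_PIPELINES = {'Girlpic.pipelines.GirlpicPipeline': 300}.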
Python crawler: grabbing beautiful pictures (Scrapy)