Python Crawler: Scraping Pictures of Pretty Girls (Scrapy Edition)

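Preface:

I previously scraped these pictures with the requests module; today I reimplemented the crawler with the Scrapy framework.

(The pictures are admittedly a bit racy, but this old monk renounced the mortal world long ago, so consider it pure art appreciation, haha.)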

 

Item:

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class GirlpicItem(scrapy.Item):
    title = scrapy.Field()  # title of the picture group
    image = scrapy.Field()  # raw bytes of the downloaded image
    index = scrapy.Field()  # page number of the image within its group

 

Spider:

# coding: utf-8
import scrapy
from scrapy.spiders import Spider

from girlpic.items import GirlpicItem

# The original Python 2 code called reload(sys) and sys.setdefaultencoding('utf-8')
# here. Python 3 strings are already Unicode, so that hack is dropped.


class GirlpicSipder(Spider):
    name = 'girlpic'
    allowed_domains = []  # allowed domains (empty means no restriction)
    start_urls = ["http://www.mzitu.com/all/"]

    def parse(self, response):
        # Each <a> in the archive list links to one picture group
        groups = response.xpath("//div[@class='main-content']//ul[@class='archives']//a")
        count = 0
        for group in groups:
            count = count + 1
            if count > 5:
                return  # careful: just return here, do not exit the process (e.g. sys.exit(0))
            groupUrl = group.xpath('@href').extract()[0]
            title = group.xpath("text()").extract()[0]
            request = scrapy.Request(url=groupUrl, callback=self.getGroup,
                                     meta={'title': title, 'groupUrl': groupUrl},
                                     dont_filter=True)
            yield request

    def getGroup(self, response):
        # The code assumes the second-to-last <span> in the pager holds the last page number
        maxIndex = response.xpath("//div[@class='pagenavi']//span/text()").extract()[-2]
        for index in range(1, int(maxIndex) + 1):
            pageUrl = response.meta['groupUrl'] + '/' + str(index)
            # Copy meta so that every request carries its own page index
            meta = dict(response.meta, index=index)
            request = scrapy.Request(url=pageUrl, callback=self.getPage,
                                     meta=meta, dont_filter=True)
            yield request

    def getPage(self, response):
        imageurl = response.xpath("//div[@class='main-image']//img/@src").extract()[0]  # get the image URL
        request = scrapy.Request(url=imageurl, callback=self.FormItem,
                                 meta=response.meta, dont_filter=True)
        yield request

    def FormItem(self, response):
        title = response.meta['title']
        index = response.meta['index']
        image = response.body  # raw image bytes
        item = GirlpicItem(title=title, index=index, image=image)
        yield item
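The page-URL construction inside getGroup can be looked at in isolation. The following is only a minimal sketch: build_page_urls is a hypothetical helper that is not part of the spider, and example.com is a placeholder host. It also strips a trailing slash from the group URL, which the spider itself does not do.

```python
def build_page_urls(group_url, max_index):
    """Return the URL of every page in a group, numbered 1..max_index.

    Hypothetical helper for illustration; mirrors the loop in getGroup.
    """
    base = group_url.rstrip('/')  # avoid a double slash if the link ends with '/'
    return [f"{base}/{index}" for index in range(1, int(max_index) + 1)]


# example.com is a placeholder, not the site crawled by the spider
urls = build_page_urls("http://example.com/12345", "3")
print(urls)
```

max_index is accepted as a string because the spider reads it from the page text and converts it with int() in the same way.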

 

 

Pipeline:

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html

import os


class GirlpicPipeline(object):
    def __init__(self):
        # Raw string keeps the backslash literal; 學習資料 means "study materials"
        self.dirpath = r'D:\學習資料'
        os.makedirs(self.dirpath, exist_ok=True)

    def process_item(self, item, spider):
        title = item['title']
        index = item['index']
        image = item['image']
        # One sub-directory per picture group, named after its title
        groupdir = os.path.join(self.dirpath, title)
        os.makedirs(groupdir, exist_ok=True)
        imagepath = os.path.join(groupdir, str(index) + '.jpg')
        # image holds raw response bytes, so write the file in binary mode
        with open(imagepath, 'wb') as file:
            file.write(image)
        return item
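The pipeline's header comment mentions the ITEM_PIPELINES setting, which the article does not show. A minimal sketch of that fragment of settings.py might look like the following; the module path girlpic.pipelines is an assumption based on the default layout created by `scrapy startproject girlpic`, and the priority 300 is an arbitrary value.

```python
# settings.py (fragment)
# Assumption: GirlpicPipeline lives in girlpic/pipelines.py, the default
# location in a project generated by `scrapy startproject girlpic`.
ITEM_PIPELINES = {
    # Values are conventionally in the 0-1000 range; lower numbers run first.
    "girlpic.pipelines.GirlpicPipeline": 300,
}
```

With the pipeline enabled, running `scrapy crawl girlpic` from the project root should start the spider and pass each item to process_item.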

 
