Python Crawler: Scraping Pictures of Beautiful Women (Scrapy Edition)

Last updated: 2018-06-29. Source: Internet

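Preamble:

I previously scraped these pictures with the requests module; today I implemented the same thing again with the Scrapy framework.

(The pictures are admittedly a bit risqué, but this old monk has long since left worldly desires behind, so consider it purely for appreciation, hahaha.)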

Item:

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class GirlpicItem(scrapy.Item):
    title = scrapy.Field()
    image = scrapy.Field()
    index = scrapy.Field()

Spider:

# coding: utf-8
import scrapy
from scrapy.spiders import Spider

from girlpic.items import GirlpicItem


class GirlpicSipder(Spider):
    name = 'girlpic'
    allowed_domains = []  # allowed domain names
    start_urls = ["http://www.mzitu.com/all/"]

    def parse(self, response):
        # Every link in the archive list points to one picture group.
        groups = response.xpath("//div[@class='main-content']//ul[@class='archives']//a")
        count = 0
        for group in groups:
            count = count + 1
            if count > 5:
                return  # careful here: do not use os.exit(0)
            groupUrl = group.xpath('@href').extract()[0]
            title = group.xpath("text()").extract()[0]
            request = scrapy.Request(url=groupUrl, callback=self.getGroup,
                                     meta={'title': title, 'groupUrl': groupUrl},
                                     dont_filter=True)
            yield request

    def getGroup(self, response):
        # The second-to-last <span> in the pager is used as the last page number.
        maxIndex = response.xpath("//div[@class='pagenavi']//span/text()").extract()[-2]
        for index in range(1, int(maxIndex) + 1):
            pageUrl = response.meta['groupUrl'] + '/' + str(index)
            meta = response.meta
            meta['index'] = index
            request = scrapy.Request(url=pageUrl, callback=self.getPage,
                                     meta=meta, dont_filter=True)
            yield request

    def getPage(self, response):
        # Extract the image URL.
        imageurl = response.xpath("//div[@class='main-image']//img/@src").extract()[0]
        request = scrapy.Request(url=imageurl, callback=self.FormItem,
                                 meta=response.meta, dont_filter=True)
        yield request

    def FormItem(self, response):
        # The response body is the raw image data.
        title = response.meta['title']
        index = response.meta['index']
        image = response.body
        item = GirlpicItem(title=title, index=index, image=image)
        yield item
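The pagination step in getGroup can be looked at in isolation, without Scrapy. The following is only a minimal sketch: the helper names pick_max_index and build_page_urls, the sample span texts, and the sample group URL are all hypothetical and not taken from the original project or from the real site. It also strips a trailing slash before joining, which the spider itself does not do.

```python
def pick_max_index(span_texts):
    """Return the last page number, read from the second-to-last span text."""
    return int(span_texts[-2])


def build_page_urls(group_url, max_index):
    """Return the URLs of pages 1..max_index for one picture group."""
    # Deviation from the spider: strip a trailing slash to avoid "//1".
    base = group_url.rstrip('/')
    return ['{}/{}'.format(base, index) for index in range(1, max_index + 1)]


# Hypothetical sample data, not taken from the real site.
sample_spans = ['1', '2', '3', 'next']
sample_group_url = 'http://www.mzitu.com/12345'

page_urls = build_page_urls(sample_group_url, pick_max_index(sample_spans))
for url in page_urls:
    print(url)
```

If the pager markup differs from what the XPath expects, the [-2] lookup raises an IndexError or a ValueError, so a real spider would probably want to guard that step.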

Pipeline:

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
import os


class GirlpicPipeline(object):
    def __init__(self):
        # Root output directory, created when the pipeline is instantiated.
        self.dirpath = r'D:\學習資料'
        if not os.path.exists(self.dirpath):
            os.makedirs(self.dirpath)

    def process_item(self, item, spider):
        title = item['title']
        index = item['index']
        image = item['image']
        # One sub-directory per group, named after the group title.
        groupdir = os.path.join(self.dirpath, title)
        if not os.path.exists(groupdir):
            os.makedirs(groupdir)
        imagepath = os.path.join(groupdir, str(index) + '.jpg')
        # The image is raw bytes, so the file is written in binary mode.
        with open(imagepath, 'wb') as f:
            f.write(image)
        return item
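As the comment at the top of the pipeline file says, the pipeline only runs once it is listed in the ITEM_PIPELINES setting. A settings.py fragment might look like the sketch below. The module path girlpic.pipelines is an assumption based on the project name used in the spider's import, and 300 is just a typical priority value; neither appears in the original post.

```python
# settings.py (fragment)
# Assumption: the pipeline class lives in girlpic/pipelines.py.
# The number is the run order; lower values run first.
ITEM_PIPELINES = {
    'girlpic.pipelines.GirlpicPipeline': 300,
}
```

With that in place, the crawl would be started from the project directory with `scrapy crawl girlpic`, using the spider name defined above.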
