Crawling a mobile app via packet capture

1. items.py

import scrapy


class DouyuspiderItem(scrapy.Item):
    name = scrapy.Field()        # name used for the saved photo
    imagesUrls = scrapy.Field()  # URL of the photo
    imagesPath = scrapy.Field()  # local path where the photo is saved
2. spiders/douyu.py
import scrapy
import json
from douyuSpider.items import DouyuspiderItem


class DouyuSpider(scrapy.Spider):
    name = "douyu"
    # allowed_domains takes bare domain names, without the scheme
    allowed_domains = ["capi.douyucdn.cn"]

    offset = 0
    url = "http://capi.douyucdn.cn/api/v1/getVerticalRoom?limit=20&offset="
    start_urls = [url + str(offset)]

    def parse(self, response):
        # The response body is JSON; the room list sits under the "data" key
        data = json.loads(response.text)["data"]

        for each in data:
            item = DouyuspiderItem()
            item["name"] = each["nickname"]
            item["imagesUrls"] = each["vertical_src"]
            yield item

        # Request the next page of 20 rooms
        self.offset += 20
        yield scrapy.Request(self.url + str(self.offset), callback=self.parse)
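The extraction and paging steps in parse can be tried without Scrapy or network access. The following is a minimal sketch, assuming a response body shaped the way the spider reads it (a "data" list whose entries carry "nickname" and "vertical_src"). The helper names extract_items and next_page_url and the sample payload are hypothetical, and the sketch stops on an empty page, which the spider above does not do.

```python
import json

BASE_URL = "http://capi.douyucdn.cn/api/v1/getVerticalRoom?limit=20&offset="
PAGE_SIZE = 20


def extract_items(body):
    """Return one dict per room from a JSON response body (hypothetical helper)."""
    data = json.loads(body)["data"]
    return [
        {"name": each["nickname"], "imagesUrls": each["vertical_src"]}
        for each in data
    ]


def next_page_url(offset, items):
    """Return (new_offset, url) for the next page, or None if the page was empty."""
    if not items:
        return None
    new_offset = offset + PAGE_SIZE
    return new_offset, BASE_URL + str(new_offset)


# Made-up payload for illustration only; not real API output.
sample_body = json.dumps({
    "data": [
        {"nickname": "streamer_a", "vertical_src": "http://example.com/a.jpg"},
        {"nickname": "streamer_b", "vertical_src": "http://example.com/b.jpg"},
    ]
})

items = extract_items(sample_body)
following = next_page_url(0, items)
print(items[0]["name"], following)
```

Keeping the offset arithmetic in a small function makes the stopping condition explicit; inside a spider the same check would decide whether to yield another scrapy.Request.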
3. settings.py

ITEM_PIPELINES = {'douyuSpider.pipelines.ImagesPipeline': 1}

# Where the images are stored; pipelines.py reads this value later
IMAGES_STORE = "/Users/Power/lesson_python/douyuSpider/Images"

# User agent sent with each request
USER_AGENT = 'DYZB/2.290 (iPhone; iOS 9.3.4; Scale/2.00)'
4. pipelines.py
import scrapy
import os
from scrapy.pipelines.images import ImagesPipeline
from scrapy.utils.project import get_project_settings


class ImagesPipeline(ImagesPipeline):
    IMAGES_STORE = get_project_settings().get("IMAGES_STORE")

    def get_media_requests(self, item, info):
        image_url = item["imagesUrls"]
        yield scrapy.Request(image_url)

    def item_completed(self, results, item, info):
        # Standard idiom: collect the stored path of every download that
        # succeeded (see the ImagesPipeline source for the shape of results)
        image_path = [x["path"] for ok, x in results if ok]

        # Rename the downloaded file after the streamer and record the
        # same path, extension included, on the item
        new_path = self.IMAGES_STORE + "/" + item["name"] + ".jpg"
        os.rename(self.IMAGES_STORE + "/" + image_path[0], new_path)
        item["imagesPath"] = new_path

        return item

# get_media_requests generates a Request object for each image URL. The
# output of this method becomes the results argument of item_completed.
# results is a list of tuples, each of the form
# (success, image_info_or_failure). If success is True,
# image_info_or_failure is a dictionary that includes the keys url, path
# and checksum.
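To make the shape of results concrete, the filter-and-rename step can be sketched in plain Python. This is an illustration under stated assumptions, not the pipeline itself: the helper rename_first_image, the sample results list and the file names are hypothetical, a temporary directory stands in for IMAGES_STORE, and a plain Exception stands in for the failure object Scrapy would pass.

```python
import os
import tempfile


def rename_first_image(store, results, name):
    """Rename the first successfully downloaded image to <name>.jpg.

    Returns the new path, or None if no download succeeded.
    (Hypothetical helper mirroring the logic of item_completed.)
    """
    image_paths = [info["path"] for ok, info in results if ok]
    if not image_paths:
        return None
    new_path = os.path.join(store, name + ".jpg")
    os.rename(os.path.join(store, image_paths[0]), new_path)
    return new_path


with tempfile.TemporaryDirectory() as store:
    # Simulate a file that the pipeline would already have downloaded
    os.makedirs(os.path.join(store, "full"))
    downloaded = os.path.join(store, "full", "0a1b2c.jpg")
    with open(downloaded, "wb") as f:
        f.write(b"fake image bytes")

    # Made-up results: one failed download and one successful one
    results = [
        (False, Exception("download failed")),
        (True, {"url": "http://example.com/a.jpg",
                "path": "full/0a1b2c.jpg",
                "checksum": "abc123"}),
    ]

    new_path = rename_first_image(store, results, "streamer_a")
    renamed_ok = os.path.exists(new_path) and not os.path.exists(downloaded)
    no_success = rename_first_image(store, [(False, Exception("x"))], "nobody")
```

Returning None when nothing succeeded avoids indexing an empty list; in a real pipeline one option at that point is to raise scrapy.exceptions.DropItem.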
Create a new main.py file under the project root for debugging
from scrapy import cmdline

cmdline.execute('scrapy crawl douyu'.split())
Execute the program
py2 main.py