No. 342, Building a Search Engine with a Distributed Python Crawler, Scrapy Explained: Saving Crawled Data
Note: data saving is handled in the pipelines.py file.
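For a pipeline class in pipelines.py to run at all, it must be registered in the project's settings.py under ITEM_PIPELINES. A minimal sketch, assuming the project module is named `adc` and the pipeline classes are named `AdcPipeline` and `imgPipeline` (adjust these names to your own project):

```python
# settings.py (sketch) -- register the pipelines so Scrapy calls them.
# "adc" is an assumed project/module name; the numbers are priorities
# (0-1000, lower runs first), so the image downloader runs before the
# JSON writer and can fill in the image save path.
ITEM_PIPELINES = {
    'adc.pipelines.imgPipeline': 200,  # image downloader (runs first)
    'adc.pipelines.AdcPipeline': 300,  # JSON writer
}

# ImagesPipeline also needs to know which item field holds the image
# URLs and where to store downloaded files (example values):
IMAGES_URLS_FIELD = 'img'   # item field containing the image URL list
IMAGES_STORE = 'images'     # directory where downloaded images are saved
```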
Save data as a JSON file
Scrapy fires a signal when the spider closes; the pipeline's `spider_closed()` method receives the spider through that signal, which makes it the right place to clean up (for example, to close the open file) once all data has been processed.
pipelines.py:

```python
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

import codecs
import json
from scrapy.pipelines.images import ImagesPipeline  # built-in image downloader


class AdcPipeline(object):  # data-processing pipeline class
    def __init__(self):
        # open the JSON file once, when the pipeline is initialized
        self.file = codecs.open('shuju.json', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # process_item() receives every item the spider yields
        # print('Article title: ' + item['title'][0])
        # print('Article thumbnail URL: ' + item['img'][0])
        # print('Article thumbnail save path: ' + item['img_tplj'])  # filled in by the image downloader
        lines = json.dumps(dict(item), ensure_ascii=False) + '\n'  # serialize the item to one JSON line
        self.file.write(lines)  # write the JSON line to the file
        return item

    def spider_closed(self, spider):
        # triggered by a Scrapy signal when the spider finishes; close the file
        self.file.close()


class imgPipeline(ImagesPipeline):  # custom image downloader, inherits Scrapy's ImagesPipeline
    def item_completed(self, results, item, info):
        # item_completed() receives the download results, including each image's save path
        for ok, value in results:
            img_lj = value['path']  # path where the image was saved
            item['img_tplj'] = img_lj  # fill the path into the img_tplj field defined in items.py
        return item  # hand the item back so later pipelines can use it
```

Note: when you set up a custom image downloader, you also need to register it in the ITEM_PIPELINES setting, as the header comment above says.
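Stripped of the Scrapy machinery, the JSON-saving logic in `process_item()` boils down to a few lines. A standalone sketch, using a plain dict as a stand-in for the Scrapy item (the field names `title` and `img_tplj` match the fields the pipeline above touches):

```python
# Minimal standalone sketch of what AdcPipeline.process_item() does:
# serialize each item as one JSON object per line (JSON Lines format).
import codecs
import json

item = {'title': ['Sample article'], 'img_tplj': 'full/abc123.jpg'}  # stand-in for a Scrapy item

with codecs.open('shuju.json', 'w', encoding='utf-8') as f:
    line = json.dumps(dict(item), ensure_ascii=False) + '\n'  # one JSON object per line
    f.write(line)

# Each line of the file parses back independently:
with codecs.open('shuju.json', 'r', encoding='utf-8') as f:
    restored = json.loads(f.readline())
print(restored['title'][0])  # → Sample article
```

`ensure_ascii=False` is what lets non-ASCII text (such as Chinese titles) be written as readable UTF-8 instead of `\uXXXX` escapes.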