Learning Scrapy Notes (7) - Running Multiple Crawlers Based on an Excel File
Abstract: configure a single spider to crawl multiple websites from an Excel/csv file.
Often we write a separate crawler for each website, but in some cases the only difference between the websites you want to crawl is their XPath expressions. Writing a dedicated crawler for each such site is wasted effort; a single spider can crawl all of these similar websites.
First, create a project named generic and a spider named fromcsv:
scrapy startproject generic
cd generic
scrapy genspider fromcsv example.com
Create a csv file named todo.csv: each row holds a url column plus one column per field to extract, containing the XPath expression for that field.
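The original file contents are not reproduced here. A plausible layout (the URLs and XPath expressions below are hypothetical placeholders, not from the source) would be:

url,name,price
http://a.example.com/item.html,//h1/text(),//*[@id="price"]/text()
http://b.example.com/product.html,//*[@class="title"]/text(),//span[@class="cost"]/text()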
$ python
>>> import csv
>>> with open("todo.csv", "rU") as f:
...     reader = csv.DictReader(f)
...     for line in reader:
...         print line
The output is as follows (csv.DictReader yields each row as a dictionary mapping column names to cell values):
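With the hypothetical todo.csv sketched above, the printed dictionaries would look roughly like this (the exact values depend on your file; dictionary key order is arbitrary in Python 2):

{'url': 'http://a.example.com/item.html', 'name': '//h1/text()', 'price': '//*[@id="price"]/text()'}
{'url': 'http://b.example.com/product.html', 'name': '//*[@class="title"]/text()', 'price': '//span[@class="cost"]/text()'}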
Next, edit the spider (fromcsv.py) so that start_requests() reads the csv file and issues one request per row, passing the field-to-XPath mapping along in request.meta:

import csv

import scrapy
from scrapy.http import Request
from scrapy.loader import ItemLoader
from scrapy.item import Item, Field

class FromcsvSpider(scrapy.Spider):
    name = "fromcsv"

    def start_requests(self):
        with open("todo.csv", "rU") as f:
            reader = csv.DictReader(f)
            for line in reader:
                request = Request(line.pop('url'))  # pop the 'url' entry out of the dict and request it
                request.meta['fields'] = line  # pass the remaining field: xpath pairs to parse()
                yield request

    def parse(self, response):
        item = Item()  # no Item class is declared in this project's items.py
        l = ItemLoader(item=item, response=response)
        for name, xpath in response.meta['fields'].iteritems():
            if xpath:
                item.fields[name] = Field()  # dynamically create the field
                l.add_xpath(name, xpath)
        return l.load_item()
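Note that this is Python 2 code (print, iteritems(), the "rU" file mode). The trick in parse() is that fields are registered on a bare Item at runtime rather than declared in items.py. A minimal standalone sketch of that mechanism, assuming only that Scrapy is installed:

from scrapy.item import Item, Field

item = Item()                  # a bare item with no declared fields
item.fields['name'] = Field()  # register a field dynamically
item['name'] = 'example'       # the key is now accepted
print item                     # prints {'name': 'example'}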
The fromcsv.py source file is available at:
https://github.com/Kylinlin/scrapybook/blob/master/ch05%2Fgeneric%2Fgeneric%2Fspiders%2Ffromcsv.py
Run the spider: scrapy crawl fromcsv
To avoid hard-coding the file name, let the spider take it as an argument by replacing the open() call in start_requests() with:

with open(getattr(self, "file", "todo.csv"), "rU") as f:
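For context, the modified start_requests() as a whole would read (a sketch; only the open() line differs from the version shown earlier):

    def start_requests(self):
        # getattr() returns the spider's "file" attribute, set by -a file=...,
        # and falls back to "todo.csv" when the argument is absent
        with open(getattr(self, "file", "todo.csv"), "rU") as f:
            reader = csv.DictReader(f)
            for line in reader:
                request = Request(line.pop('url'))
                request.meta['fields'] = line
                yield request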
Now when you run the spider you can specify the csv file with the -a option; Scrapy sets each -a name=value pair as an attribute on the spider instance, which is exactly what getattr(self, "file", ...) picks up. If -a is omitted, todo.csv is used by default:
scrapy crawl fromcsv -a file=todo.csv