Learning Scrapy notes (7): Scrapy runs multiple crawlers based on Excel files


Abstract: run multiple crawlers based on the configuration in an Excel file.

Often we write a separate crawler for each website, but sometimes the only difference between the sites you want to crawl is their XPath expressions. In that case, writing one crawler per site is wasteful; you can use a single spider to crawl all of these similar websites.

First, create a project named generic and a spider named fromcsv:

scrapy startproject generic
cd generic
scrapy genspider fromcsv example.com

Create a csv file named todo.csv in the project's root directory. Each row describes one page to crawl: a url column, plus one column per field to extract, where each cell holds the XPath expression for that field.
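A minimal todo.csv might look like this (the URLs and XPath expressions below are hypothetical placeholders; substitute the pages and fields you actually want to crawl):

url,name,price
http://example.com/a.html,//h1/text(),//*[@id="price"]/text()
http://example.com/b.html,//*[@id="title"]/text(),//span[@class="price"]/text()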

Check the file from a Python shell:

$ python
>>> import csv
>>> with open("todo.csv", "rU") as f:
...     reader = csv.DictReader(f)
...     for line in reader:
...         print line

The output is one dictionary per csv row, keyed by the column headers; for the example file above it looks something like this (dictionary key order may vary):
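{'url': 'http://example.com/a.html', 'name': '//h1/text()', 'price': '//*[@id="price"]/text()'}
{'url': 'http://example.com/b.html', 'name': '//*[@id="title"]/text()', 'price': '//span[@class="price"]/text()'}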

Now edit fromcsv.py so that the spider builds its requests from the csv file:

import csv

import scrapy
from scrapy.http import Request
from scrapy.loader import ItemLoader
from scrapy.item import Item, Field


class FromcsvSpider(scrapy.Spider):
    name = "fromcsv"

    def start_requests(self):
        with open("todo.csv", "rU") as f:
            reader = csv.DictReader(f)
            for line in reader:
                # Pop the url element out of the row dict; the remaining
                # columns are field-name -> XPath pairs.
                request = Request(line.pop('url'))
                request.meta['fields'] = line
                yield request

    def parse(self, response):
        # The item is not declared in this project's items.py file;
        # its fields are created dynamically below.
        item = Item()
        l = ItemLoader(item=item, response=response)
        for name, xpath in response.meta['fields'].iteritems():
            if xpath:
                item.fields[name] = Field()  # dynamically declare the field
                l.add_xpath(name, xpath)
        return l.load_item()
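Note that Item() starts with no declared fields, and a Scrapy item only accepts keys that appear in its fields dict; that is why each field is declared on the fly (item.fields[name] = Field()) before its XPath is added, otherwise load_item() would raise a KeyError for the unknown field.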

 

The full source of fromcsv.py is available at:

https://github.com/Kylinlin/scrapybook/blob/master/ch05%2Fgeneric%2Fgeneric%2Fspiders%2Ffromcsv.py

 

Run the spider: scrapy crawl fromcsv

To avoid hard-coding the csv file name in the spider, replace the open() call in start_requests with:

with open(getattr(self, "file", "todo.csv"), "rU") as f:

Then, when you run the spider, you can use the -a option to specify the csv file (if -a is not given, todo.csv is used by default):

scrapy crawl fromcsv -a file=todo.csv
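For clarity, here is start_requests with that one-line change applied (the rest of the spider is unchanged):

def start_requests(self):
    # "-a file=..." on the command line becomes the spider attribute
    # self.file; getattr falls back to todo.csv when it is not supplied.
    with open(getattr(self, "file", "todo.csv"), "rU") as f:
        reader = csv.DictReader(f)
        for line in reader:
            request = Request(line.pop('url'))
            request.meta['fields'] = line
            yield request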
