Previously, when crawling data with Scrapy, I would decide inside the parse logic whether to issue the next request, i.e. collect the URLs on a page and yield a Request for each:

def parse(self, response):
    # get all URLs on the page, then request each one
    for url in urls:
        yield Request(url)
For example:

def parse(self, response):
    item = MovieItem()
    selector = Selector(response)
    movies = selector.xpath('//div[@class="info"]')
    for each_movie in movies:
        title = each_movie.xpath('div[@class="hd"]/a/span/text()').extract()
        star = each_movie.xpath('div[@class="bd"]/div[@class="star"]/span/em/text()').extract()[0]
        quote = each_movie.xpath('div[@class="bd"]/p[@class="quote"]/span/text()').extract()
    next_link = selector.xpath('//span[@class="next"]/link/@href').extract()  # next page
    if next_link:
        next_link = next_link[0]
        yield Request(self.url + next_link, callback=self.parse)
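Stripped of the Scrapy specifics, the pagination pattern above is just "process the current page, then follow the next link if one exists". A minimal stand-alone sketch (the page data here is a hypothetical stand-in for real responses, so it runs without Scrapy):

```python
def crawl(pages, start):
    """pages maps url -> (items_on_page, next_url_or_None)."""
    results = []
    url = start
    while url is not None:
        # "process" the page, then move to the next link (None ends the loop)
        items, url = pages[url]
        results.extend(items)
    return results

# hypothetical two-page site
pages = {
    '/page1': (['Movie A', 'Movie B'], '/page2'),
    '/page2': (['Movie C'], None),
}
print(crawl(pages, '/page1'))  # ['Movie A', 'Movie B', 'Movie C']
```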
Browsing the official Scrapy documentation today, I happened to notice that you can use the start_requests() method to loop over all the URLs you want to crawl:

def start_requests(self):
    urls = []
    for i in range(1, 10):
        url = 'http://www.test.com/?page=%s' % i
        page = scrapy.Request(url)
        urls.append(page)
    return urls
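Since start_requests() is called only once, the same loop can also be written as a generator instead of building a list first. A hedged sketch, with plain URL strings standing in for scrapy.Request objects so it runs without Scrapy installed:

```python
def start_urls_gen(template='http://www.test.com/?page=%s'):
    # yield each page URL lazily instead of appending to a list
    for i in range(1, 10):
        yield template % i

urls = list(start_urls_gen())
print(len(urls))  # 9 (pages 1 through 9)
print(urls[0])    # http://www.test.com/?page=1
```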
Python code should be simple and direct, so I rewrote my earlier code in the following way:

# start URL
start_urls = ["http://q.stock.sohu.com"]

# define the URLs to crawl
def start_requests(self):
    # one request per stock, daily data
    return [Request(("http://q.stock.sohu.com/hishq?code=cn_{0}"
                     "&start=" + self.begin_date +
                     "&end=" + self.end_date +
                     "&stat=1&order=d&period=d&rt=json&r=0.6618998353094041&0.8423532517054869"
                     ).format(x['code']))
            for x in self.stock_basics]
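The URL construction above can be checked in isolation. A hedged sketch: hishq_url is a hypothetical helper name, and the code and dates below are sample values rather than real self.stock_basics data (the random r parameter is dropped for readability):

```python
def hishq_url(code, begin_date, end_date):
    # adjacent string literals are concatenated, then {0} is filled in
    return ("http://q.stock.sohu.com/hishq?code=cn_{0}"
            "&start=" + begin_date + "&end=" + end_date +
            "&stat=1&order=d&period=d&rt=json").format(code)

url = hishq_url('600519', '20170101', '20170131')
print(url)
```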
Note: when you override start_requests() this way, you do not need to set start_urls; any start_urls you define will simply be ignored.
This method must return an iterable with the first Requests to crawl for this spider. This is the method called by Scrapy when the spider is opened for scraping when no particular URLs are specified. If particular URLs are specified, make_requests_from_url() is used instead to create the Requests. This method is also called only once from Scrapy, so it's safe to implement it as a generator. The default implementation uses make_requests_from_url() to generate Requests for each URL in start_urls.
REFER:
http://doc.scrapy.org/en/latest/topics/spiders.html#scrapy.spiders.Spider.start_requests
Python crawler: Scrapy framework improvement (1), custom Request crawling
https://my.oschina.net/lpe234/blog/342741
Customizing the Requests of a Scrapy crawler