I have just learned a new technique, and I feel my skills have gone up a level. It solves the problem of filling a single item from more than one page, which I had never understood before. The code is as follows.
The item is declared as follows:
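import scrapy


class QuotesItem(scrapy.Item):
    # Field names restored from the keys the spider assigns.
    quote = scrapy.Field()
    author = scrapy.Field()
    tags = scrapy.Field()
    author_full_url = scrapy.Field()
    author_born_date = scrapy.Field()
    author_born_location = scrapy.Field()
    author_description = scrapy.Field()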
spider.py is as follows:
import scrapy

from quotes_2.items import QuotesItem


class QuotesSpider(scrapy.Spider):
    name = 'quotes_2_6'
    start_urls = [
        'http://quotes.toscrape.com',
    ]
    allowed_domains = [
        'toscrape.com',
    ]

    def parse(self, response):
        for quote in response.css('div.quote'):
            item = QuotesItem()
            item['quote'] = quote.css('span.text::text').extract_first()
            item['author'] = quote.css('small.author::text').extract_first()
            item['tags'] = quote.css('div.tags a.tag::text').extract()
            # Use the link next to this quote's author, not the first one on the page.
            author_page = quote.css('small.author+a::attr(href)').extract_first()
            item['author_full_url'] = response.urljoin(author_page)
            # Carry the half-filled item to the author page through meta.
            yield scrapy.Request(url=item['author_full_url'], meta={'item': item},
                                 callback=self.parse_author, dont_filter=True)

        # Follow the pagination link, if there is one.
        next_page = response.css('li.next a::attr("href")').extract_first()
        if next_page is not None:
            next_full_url = response.urljoin(next_page)
            yield scrapy.Request(next_full_url, callback=self.parse)

    def parse_author(self, response):
        # Pick up the item that parse() attached to the request.
        item = response.meta['item']
        item['author_born_date'] = response.css('.author-born-date::text').extract_first()
        item['author_born_location'] = response.css('.author-born-location::text').extract_first()
        item['author_description'] = response.css('.author-description::text').extract_first()
        yield item

Through the meta parameter, the item is stored under the 'item' key of meta (remember that meta itself is also a dictionary).
scrapy.Request wraps the URL in a Request object, and the meta dictionary (one key, 'item', whose value is the item, which also behaves like a dictionary) travels inside that Request object to the callback, parse_author() here.
In the callback, item = response.meta['item'] works because the response carries the meta dictionary of the request that produced it, so this line takes the item back out of it.
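dont_filter=True turns off duplicate-request filtering for that request. Several quotes share the same author, so without it Scrapy would drop the repeated requests to the same author page and the items attached to them would never be yielded.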
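To see why this matters, here is a toy model of duplicate filtering in plain Python. It is only a sketch and not Scrapy's real scheduler: ToyRequest, ToyScheduler and count_scheduled are hypothetical names, the URLs are illustrative, and requests are compared by URL alone, whereas Scrapy computes a fingerprint of the request.

```python
from dataclasses import dataclass


@dataclass
class ToyRequest:
    url: str
    dont_filter: bool = False


class ToyScheduler:
    """Simplified stand-in for a scheduler with a duplicate filter."""

    def __init__(self):
        self.seen = set()
        self.queue = []

    def enqueue(self, request):
        # Only requests that allow filtering are checked and recorded.
        if not request.dont_filter:
            if request.url in self.seen:
                return False  # duplicate, silently dropped
            self.seen.add(request.url)
        self.queue.append(request)
        return True


def count_scheduled(urls, dont_filter):
    scheduler = ToyScheduler()
    return sum(scheduler.enqueue(ToyRequest(u, dont_filter)) for u in urls)


# Three quotes, two of them by the same author (illustrative URLs).
author_urls = [
    "http://quotes.toscrape.com/author/Albert-Einstein",
    "http://quotes.toscrape.com/author/J-K-Rowling",
    "http://quotes.toscrape.com/author/Albert-Einstein",
]

print(count_scheduled(author_urls, dont_filter=False))  # 2
print(count_scheduled(author_urls, dont_filter=True))   # 3
```

With filtering on, the third request is dropped, and in the spider the quote item riding on it would be lost. The cost of dont_filter=True is that the same author page is downloaded once per quote.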
It has been almost a month, and my Python has made little headway. Now I am studying Scrapy projects.