Python.scrapy.14-scrapy-source-code-analysis-part-4


Scrapy Source Code Analysis Series - 4: the scrapy.commands sub-package

The sub-package scrapy.commands defines the subcommands available to the scrapy command: bench, check, crawl, deploy, edit, fetch, genspider, list, parse, runspider, settings, shell, startproject, version, and view. Each subcommand module defines a Command class that inherits from class ScrapyCommand, as sketched below.
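To illustrate that pattern, here is a minimal sketch of a hypothetical custom subcommand (the module and class names are illustrative; the base class lives in scrapy.command in the 0.x code base analyzed here, and moved to scrapy.commands in later releases):

# hello.py - a hypothetical custom subcommand, following the same
# pattern as the built-in command modules.
from scrapy.command import ScrapyCommand  # scrapy.commands.ScrapyCommand in newer releases

class Command(ScrapyCommand):
    requires_project = False  # this command can run outside a project

    def syntax(self):
        return "[options]"

    def short_desc(self):
        return "Print a greeting (example only)"

    def run(self, args, opts):
        print("hello from a custom subcommand")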

First, let's look at the crawl subcommand, which is used to start a spider.

1. crawl.py

The focus is on the method run(self, args, opts):

def run(self, args, opts):
    if len(args) < 1:
        raise UsageError()
    elif len(args) > 1:
        raise UsageError("running 'scrapy crawl' with more than one spider is no longer supported")
    spname = args[0]

    crawler = self.crawler_process.create_crawler()           # A
    spider = crawler.spiders.create(spname, **opts.spargs)    # B
    crawler.crawl(spider)                                     # C
    self.crawler_process.start()                              # D

So the question is: where does this run() method get called from? Refer back to python.scrapy.11-scrapy-source-code-analysis-part-1, and the description of _run_print_help() in section "1.2 cmdline.py command.py". A sketch of that pattern follows.
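For reference, this is a paraphrased, self-contained sketch of the _run_print_help() idea (not the exact source): run a command callable, and on UsageError report the error through the option parser and exit:

import sys
import optparse

class UsageError(Exception):
    """Stand-in for Scrapy's UsageError."""

def _run_print_help(parser, func, *a, **kw):
    # Run func; on UsageError, report via the parser and exit,
    # as cmdline.py does before reaching cmd.run().
    try:
        func(*a, **kw)
    except UsageError as e:
        if str(e):
            parser.error(str(e))
        parser.print_help()
        sys.exit(2)

parser = optparse.OptionParser()
_run_print_help(parser, lambda: print("cmd.run() reached"))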

A: Creates an object crawler of class Crawler. When the Crawler object is created, its instance attribute spiders (a SpiderManager) is created as well, as shown below:

class Crawler(object):

    def __init__(self, settings):
        self.configured = False
        self.settings = settings
        self.signals = SignalManager(self)
        self.stats = load_object(settings['STATS_CLASS'])(self)
        self._start_requests = lambda: ()
        self._spider = None
        # TODO: move SpiderManager to CrawlerProcess
        spman_cls = load_object(self.settings['SPIDER_MANAGER_CLASS'])
        self.spiders = spman_cls.from_crawler(self)  # the type of spiders is SpiderManager

Each Crawler object owns one SpiderManager object, and the SpiderManager object manages multiple spiders, as the sketch below illustrates.
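A rough sketch of how that relationship looks from the outside, assuming the 0.x API analyzed here and a configured project (the spider name 'example' is hypothetical):

from scrapy.utils.project import get_project_settings
from scrapy.crawler import Crawler

crawler = Crawler(get_project_settings())
print(crawler.spiders.list())               # names of all spiders the manager knows
spider = crawler.spiders.create('example')  # instantiate one spider by name, as in step B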

B: Gets the Spider object.

C: Attaches the Crawler object to the Spider object (the crawler is installed on the spider); a paraphrased sketch of what crawl() does follows.
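For context, step C boils down to something like the following inside Crawler.crawl() (paraphrased from the same era of the source, not a verbatim copy):

def crawl(self, spider, requests=None):
    spider.set_crawler(self)  # install the crawler on the spider
    if requests is None:
        self._start_requests = spider.start_requests
    else:
        self._start_requests = lambda: requests
    self._spider = spider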

D: The start() method of class CrawlerProcess is as follows:

def start(self):
    if self.start_crawling():
        self.start_reactor()

def start_crawling(self):
    log.scrapy_info(self.settings)
    return self._start_crawler() is not None

def start_reactor(self):
    if self.settings.getbool('DNSCACHE_ENABLED'):
        reactor.installResolver(CachingThreadedResolver(reactor))
    reactor.addSystemEventTrigger('before', 'shutdown', self.stop)
    reactor.run(installSignalHandlers=False)  # blocking call

def _start_crawler(self):
    if not self.crawlers or self.stopping:
        return

    name, crawler = self.crawlers.popitem()
    self._active_crawler = crawler
    sflo = log.start_from_crawler(crawler)
    crawler.configure()
    crawler.install()
    crawler.signals.connect(crawler.uninstall, signals.engine_stopped)
    if sflo:
        crawler.signals.connect(sflo.stop, signals.engine_stopped)
    crawler.signals.connect(self._check_done, signals.engine_stopped)
    crawler.start()  # call the start() method of class Crawler
    return name, crawler
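The interesting part is start_reactor(): Scrapy blocks inside Twisted's reactor loop until the crawl finishes. A minimal standalone sketch of the same Twisted pattern (trigger registration plus a blocking run):

from twisted.internet import reactor

def on_shutdown():
    print("'before shutdown' trigger fired")

reactor.addSystemEventTrigger('before', 'shutdown', on_shutdown)
reactor.callLater(1, reactor.stop)  # stop the reactor after one second
reactor.run()                       # blocking call, as in start_reactor()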

The start() method of class Crawler is as follows:

@defer.inlineCallbacks
def start(self):
    yield defer.maybeDeferred(self.configure)
    if self._spider:
        yield self.engine.open_spider(self._spider, self._start_requests())  # hand off to the engine (ExecutionEngine)
    yield defer.maybeDeferred(self.engine.start)
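Note the defer.maybeDeferred() calls: they wrap a plain callable so its result can be yielded from an inlineCallbacks generator whether or not it returns a Deferred. A minimal standalone sketch of that pattern:

from twisted.internet import defer, reactor

def configure():
    return "configured"  # a synchronous step; maybeDeferred wraps it in a Deferred

@defer.inlineCallbacks
def start():
    result = yield defer.maybeDeferred(configure)
    print(result)
    reactor.stop()

reactor.callWhenRunning(start)
reactor.run()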

Class ExecutionEngine will be covered in the analysis of the scrapy.core sub-package.

2. startproject.py

3. How subcommands are loaded

In cmdline.py, the function execute() contains the following lines of code:

inproject = inside_project()
cmds = _get_commands_dict(settings, inproject)
cmdname = _pop_command_name(argv)

_get_commands_dict():

def _get_commands_dict(settings, inproject):
    cmds = _get_commands_from_module('scrapy.commands', inproject)
    cmds.update(_get_commands_from_entry_points(inproject))
    cmds_module = settings['COMMANDS_MODULE']
    if cmds_module:
        cmds.update(_get_commands_from_module(cmds_module, inproject))
    return cmds
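This is also what makes the COMMANDS_MODULE setting work: a project can ship its own subcommands and have them merged into the same dict. A hypothetical layout (module and command names are illustrative):

# myproject/commands/hello.py defines a Command(ScrapyCommand) subclass,
# and the project's settings.py points COMMANDS_MODULE at the package:
COMMANDS_MODULE = 'myproject.commands'
# After that, 'scrapy hello' is resolved via _get_commands_from_module().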

_get_commands_from_module():

def _get_commands_from_module(module, inproject):
    d = {}
    for cmd in _iter_command_classes(module):
        if inproject or not cmd.requires_project:
            cmdname = cmd.__module__.split('.')[-1]
            d[cmdname] = cmd()
    return d
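This also explains the naming convention: a subcommand's name is simply the last component of its command class's module path, which is why crawl.py provides the crawl subcommand. For example:

module_path = 'scrapy.commands.crawl'  # e.g. Command.__module__
cmdname = module_path.split('.')[-1]
print(cmdname)                         # -> 'crawl'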

To be Continued

Next, part 5 will cover the settings-related logic: python.scrapy.15-scrapy-source-code-analysis-part-5
