Scrapy Source Code Analysis Series - 4: The scrapy.commands Sub-package
The sub-package scrapy.commands defines the subcommands of the scrapy command (scrapy <subcommand>): bench, check, crawl, deploy, edit, fetch, genspider, list, parse, runspider, settings, shell, startproject, version and view. Each subcommand module defines a class Command, which is a subclass of ScrapyCommand (the base class defined in command.py).
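For orientation, here is a simplified sketch of that base-class contract. It is an abridged illustration of this era's ScrapyCommand, not a verbatim copy:

    # Abridged sketch of the ScrapyCommand interface (illustrative, not verbatim).
    class ScrapyCommand(object):
        requires_project = False  # True if the command only works inside a project

        def syntax(self):
            return ""             # e.g. "<spider>" for crawl

        def short_desc(self):
            return ""             # one-line help shown by "scrapy -h"

        def add_options(self, parser):
            pass                  # register optparse options such as -a NAME=VALUE

        def run(self, args, opts):
            raise NotImplementedError  # each subcommand overrides this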
First, let's look at the crawl command, which is used to start a spider.
1. crawl.py
The method to focus on is run(self, args, opts):
    def run(self, args, opts):
        if len(args) < 1:
            raise UsageError()
        elif len(args) > 1:
            raise UsageError("running 'scrapy crawl' with more than one spider is no longer supported")
        spname = args[0]

        crawler = self.crawler_process.create_crawler()         # A
        spider = crawler.spiders.create(spname, **opts.spargs)  # B
        crawler.crawl(spider)                                   # C
        self.crawler_process.start()                            # D
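Note opts.spargs at line B: arguments passed on the command line with -a NAME=VALUE are collected into opts.spargs and forwarded to the spider's constructor as keyword arguments. Roughly (the spider name here is made up):

    # "scrapy crawl books -a category=fiction" amounts to something like:
    spider = crawler.spiders.create('books', category='fiction')
    # i.e. the spider class registered under 'books' is instantiated
    # with category='fiction'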
So the question is: where does this run() method get called from? Go back to python.scrapy.11-scrapy-source-code-analysis-part-1 and see the description of _run_print_help() in section 1.2 (cmdline.py and command.py).
A: Creates a Crawler object. When the Crawler object is created, its instance attribute spiders (a SpiderManager) is created as well, as shown below:
    class Crawler(object):

        def __init__(self, settings):
            self.configured = False
            self.settings = settings
            self.signals = SignalManager(self)
            self.stats = load_object(settings['STATS_CLASS'])(self)
            self._start_requests = lambda: ()
            self._spider = None
            # TODO: move SpiderManager to CrawlerProcess
            spman_cls = load_object(self.settings['SPIDER_MANAGER_CLASS'])
            self.spiders = spman_cls.from_crawler(self)  # the type of self.spiders is SpiderManager
The Crawler object thus holds a single SpiderManager object, and that SpiderManager manages multiple spiders; a condensed sketch of it follows.
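The sketch below is abridged and illustrative (module scanning and error handling are trimmed), not the verbatim class:

    # Condensed sketch of SpiderManager (illustrative, not verbatim).
    class SpiderManager(object):

        def __init__(self, spider_modules):
            self.spider_modules = spider_modules  # from the SPIDER_MODULES setting
            self._spiders = {}  # name -> spider class, filled by scanning the modules

        @classmethod
        def from_crawler(cls, crawler):
            return cls(crawler.settings.getlist('SPIDER_MODULES'))

        def create(self, spider_name, **spider_kwargs):
            spcls = self._spiders[spider_name]  # raises KeyError for unknown spiders
            return spcls(**spider_kwargs)

        def list(self):
            return self._spiders.keys()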
B: Gets the Spider object.
C: Installs the Crawler object for the Spider object (binds the spider to the crawler).
D: The start() method of class CrawlerProcess is as follows:
    def start(self):
        if self.start_crawling():
            self.start_reactor()

    def start_crawling(self):
        log.scrapy_info(self.settings)
        return self._start_crawler() is not None

    def start_reactor(self):
        if self.settings.getbool('DNSCACHE_ENABLED'):
            reactor.installResolver(CachingThreadedResolver(reactor))
        reactor.addSystemEventTrigger('before', 'shutdown', self.stop)
        reactor.run(installSignalHandlers=False)  # blocking call

    def _start_crawler(self):
        if not self.crawlers or self.stopping:
            return

        name, crawler = self.crawlers.popitem()
        self._active_crawler = crawler
        sflo = log.start_from_crawler(crawler)
        crawler.configure()
        crawler.install()
        crawler.signals.connect(crawler.uninstall, signals.engine_stopped)
        if sflo:
            crawler.signals.connect(sflo.stop, signals.engine_stopped)
        crawler.signals.connect(self._check_done, signals.engine_stopped)
        crawler.start()  # calls the start() method of class Crawler
        return name, crawler
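So CrawlerProcess.start() is essentially automation around configure()/crawl()/start() plus running the Twisted reactor. The same machinery can be driven by hand; the sketch below follows the "run Scrapy from a script" recipe of the 0.2x-era docs (the spider class and its argument are placeholders):

    from twisted.internet import reactor
    from scrapy.crawler import Crawler
    from scrapy import log, signals
    from scrapy.utils.project import get_project_settings
    from myproject.spiders.followall import FollowAllSpider  # hypothetical spider

    spider = FollowAllSpider(domain='example.com')
    crawler = Crawler(get_project_settings())
    crawler.signals.connect(reactor.stop, signal=signals.spider_closed)  # stop when done
    crawler.configure()
    crawler.crawl(spider)
    crawler.start()  # the Crawler.start() shown next
    log.start()
    reactor.run()    # blocking call, just like start_reactor() above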
The start() method of class Crawler is as follows:
    @defer.inlineCallbacks
    def start(self):
        yield defer.maybeDeferred(self.configure)
        if self._spider:
            yield self.engine.open_spider(self._spider, self._start_requests())
        yield defer.maybeDeferred(self.engine.start)  # hand off to the engine (ExecutionEngine)
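The yields work because start() is decorated with defer.inlineCallbacks: each yielded Deferred suspends the method until that Deferred fires. A minimal standalone illustration of the pattern (not Scrapy code):

    from twisted.internet import defer

    @defer.inlineCallbacks
    def pipeline():
        # maybeDeferred wraps a plain callable so the result is always a Deferred
        value = yield defer.maybeDeferred(lambda: 41)
        defer.returnValue(value + 1)

    d = pipeline()        # a Deferred that fires with 42
    d.addCallback(print)  # prints 42; no running reactor is needed here,
                          # because both Deferreds are already fired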
The class ExecutionEngine will be covered in the analysis of the scrapy.core sub-package.
2. startproject.py
3. How subcommands are loaded
In cmdline.py, the method execute() contains the following lines of code:
    inproject = inside_project()
    cmds = _get_commands_dict(settings, inproject)
    cmdname = _pop_command_name(argv)
_get_commands_dict():
    def _get_commands_dict(settings, inproject):
        cmds = _get_commands_from_module('scrapy.commands', inproject)
        cmds.update(_get_commands_from_entry_points(inproject))
        cmds_module = settings['COMMANDS_MODULE']
        if cmds_module:
            cmds.update(_get_commands_from_module(cmds_module, inproject))
        return cmds
_get_commands_from_module():
    def _get_commands_from_module(module, inproject):
        d = {}
        for cmd in _iter_command_classes(module):
            if inproject or not cmd.requires_project:
                cmdname = cmd.__module__.split('.')[-1]
                d[cmdname] = cmd()
        return d
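The COMMANDS_MODULE branch in _get_commands_dict() is what lets a project ship its own subcommands. A minimal example (the module path and command name are made up; in this era the base class lives in scrapy.command, it later moved to scrapy.commands):

    # myproject/commands/hello.py -- the module name becomes the subcommand name
    from scrapy.command import ScrapyCommand

    class Command(ScrapyCommand):
        requires_project = True

        def short_desc(self):
            return "Demo command loaded via COMMANDS_MODULE"

        def run(self, args, opts):
            print("hello from a project-defined subcommand")

With COMMANDS_MODULE = 'myproject.commands' in settings.py, "scrapy hello" then becomes available inside the project.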
To be Continued
Next, the settings-related logic will be analyzed: python.scrapy.15-scrapy-source-code-analysis-part-5.