Scrapy Global Commands
To see which global commands Scrapy provides, run scrapy -h outside of a Scrapy crawler project directory.
(1) fetch command
The fetch command is mainly used to display the process of a crawler fetching a page. If it is used outside a Scrapy project directory, Scrapy's default crawler is invoked to fetch the page; if it is used inside a Scrapy project directory, the project's own crawler is called to fetch the page. Commonly used options (an example follows the log levels below):
--headers print the response's HTTP headers instead of the page body
--nolog suppress log output
--logfile=FILE write log messages to FILE
--spider=SPIDER use the specified crawler to fetch the page
--loglevel=LEVEL set the log level
Common log level values:
CRITICAL a serious error occurred
ERROR an error occurred that must be handled immediately
WARNING warning messages appeared
INFO informational output
DEBUG debugging output, mostly used during the development phase
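For example, a few invocations combining these options (the URL, spider name, and log file name here are only illustrative; the --spider form only works inside a project directory):

scrapy fetch --nolog http://example.com
scrapy fetch --headers --nolog http://example.com
scrapy fetch --spider=country_test --loglevel=INFO --logfile=fetch.log http://example.com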
(2) runspider command
The runspider command runs a single crawler file directly, without relying on a Scrapy crawler project.
I have not yet fully figured this command out; I do not see the output printed by the parse() function.
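For reference, a minimal sketch of a standalone crawler file that runspider can execute (the file name, spider name, and URL are illustrative). One reason output from parse() is easy to miss is that print() output gets interleaved with the log messages; using the spider's logger, or adding --nolog so that only print() output remains, makes it easier to spot:

# standalone_spider.py
import scrapy

class StandaloneSpider(scrapy.Spider):
    name = "standalone"
    start_urls = ["http://example.com"]

    def parse(self, response):
        # logged at INFO level, so it appears among the normal log messages
        self.logger.info("Title: %s", response.xpath("//title/text()").get())

Run it with:

scrapy runspider standalone_spider.py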
(3) settings command
The settings command views Scrapy's configuration information. If it is used inside a project directory, it views the configuration of the corresponding project; used outside a project, it views Scrapy's default configuration.
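For example, a single setting can be queried with --get (BOT_NAME is a standard Scrapy setting):

scrapy settings --get BOT_NAME

Inside a project this prints the project's bot name; outside a project it prints the default value, scrapybot.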
(4) shell command
The shell command starts Scrapy's interactive terminal, which is often used during development and debugging. Using the interactive terminal, you can debug a site's responses without starting a Scrapy crawler.
After executing the command, you can see the Scrapy objects and shortcut commands that are available for use.
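A typical session looks like this (the URL and the XPath expression are illustrative):

scrapy shell http://example.com

Once the shell starts, the listed objects (such as request and response) and shortcuts (such as fetch and view) can be used directly:

>>> response.xpath("//title/text()").get()
>>> view(response)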
(5) startproject command
Used to create a project:
scrapy startproject firstspider [project_dir]
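For reference, the generated project skeleton looks roughly like this in recent Scrapy versions:

firstspider/
    scrapy.cfg
    firstspider/
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py
        spiders/
            __init__.py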
(6) version command
The version command directly displays version information about Scrapy.
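For example:

scrapy version

Adding the -v option also prints the versions of related components (Python, Twisted, lxml, and so on):

scrapy version -v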
(7) view command
The view command downloads a web page and opens it in a browser for viewing.
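For example (the URL is illustrative):

scrapy view http://example.com

This is especially useful for pages whose content is generated by JavaScript: such content shows up during normal browsing but is missing from the copy Scrapy downloaded.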
Scrapy Project Commands
(1) bench command
The bench command tests the performance of the local hardware. When we run scrapy bench, Scrapy creates a local server and crawls it at the maximum possible speed. To test pure hardware performance and avoid interference from other factors, the benchmark only follows links and does no content processing.
Judged purely on hardware performance, the output may show that roughly 2,400 pages per minute can be crawled. This is only a reference standard; in actual crawler projects the speed will differ because of various factors. In general, comparing the actual running speed with this reference speed provides a basis for optimizing and improving the crawler project.
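The benchmark takes no arguments:

scrapy bench

While it runs, the log periodically reports throughput in lines of this form (the numbers depend on the machine):

INFO: Crawled 240 pages (at 2400 pages/min), scraped 0 items (at 0 items/min)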
(2) genspider command
The genspider command creates a crawler file. You can use the command's -l option to view the currently available crawler templates.
Use -t to generate a crawler based on one of these templates; both uses are shown below.
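For example (the domain example.com is illustrative; the template names are what recent Scrapy versions ship with):

scrapy genspider -l
Available templates:
  basic
  crawl
  csvfeed
  xmlfeed

scrapy genspider -t basic country_test example.com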
This generates the country_test.py file in the example/spiders directory.
(3) check command
The check command in Scrapy runs contract checks on a crawler file.
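Contracts are written in the docstring of a spider callback. A minimal sketch (the URL and the field name are illustrative):

def parse(self, response):
    """Checked by scrapy check.

    @url http://example.com/
    @returns items 1
    @scrapes name
    """

Running scrapy check country_test then verifies that parsing the sample URL yields at least one item and that each item has a name field.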
(4) crawl command
Starts a crawler:
scrapy crawl country_test --loglevel=DEBUG
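The -o option can also be added to save the scraped items to a file (the file name is illustrative):

scrapy crawl country_test -o items.json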
(5) list command
Lists the crawler files currently available in the project.
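For example, for the project above:

scrapy list
country_test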
(6) edit command
Edits a crawler file directly by opening the corresponding editor.
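For example:

scrapy edit country_test

The editor opened is determined by the EDITOR environment variable (or Scrapy's EDITOR setting), so this command is mainly a convenience on Linux/macOS.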