Scrapy Global Commands
To see which global commands Scrapy provides, run scrapy -h outside of a Scrapy crawler project directory.
(1) fetch command
The fetch command is mainly used to display the process of a crawler fetching a page. If it is used outside a Scrapy project directory, Scrapy's default crawler is invoked to fetch the page; if it is used inside a Scrapy project directory, the project's own crawler is called to fetch the page. Commonly used options (an example follows the log levels below):
--headers print the response's HTTP headers instead of the page body
--nolog suppress log output
--logfile=FILE write log messages to FILE
--spider=SPIDER use the specified crawler to fetch the page
--loglevel=LEVEL set the log level
Common log level values:
CRITICAL a serious error occurred
ERROR an error occurred that must be handled immediately
WARNING warning messages appeared
INFO informational output
DEBUG debugging output, mostly used during the development phase
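For example, a few invocations combining these options (the URL, spider name, and log file name here are only illustrative; the --spider form only works inside a project directory):

scrapy fetch --nolog http://example.com
scrapy fetch --headers --nolog http://example.com
scrapy fetch --spider=country_test --loglevel=INFO --logfile=fetch.log http://example.com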
(2) runspider command
The runspider command runs a single crawler file directly, without relying on a Scrapy crawler project.
I have not yet fully figured this command out; I do not see the output printed by the parse() function.
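For reference, a minimal sketch of a standalone crawler file that runspider can execute (the file name, spider name, and URL are illustrative). One reason output from parse() is easy to miss is that print() output gets interleaved with the log messages; using the spider's logger, or adding --nolog so that only print() output remains, makes it easier to spot:

# standalone_spider.py
import scrapy

class StandaloneSpider(scrapy.Spider):
    name = "standalone"
    start_urls = ["http://example.com"]

    def parse(self, response):
        # logged at INFO level, so it appears among the normal log messages
        self.logger.info("Title: %s", response.xpath("//title/text()").get())

Run it with:

scrapy runspider standalone_spider.py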
(3) settings command
The settings command views Scrapy's configuration information. If it is used inside a project directory, it views the configuration of the corresponding project; used outside a project, it views Scrapy's default configuration.
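For example, a single setting can be queried with --get (BOT_NAME is a standard Scrapy setting):

scrapy settings --get BOT_NAME

Inside a project this prints the project's bot name; outside a project it prints the default value, scrapybot.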
(4) shell command
The shell command starts Scrapy's interactive terminal, which is often used during development and debugging. Using the interactive terminal, you can debug a site's responses without starting a Scrapy crawler.
After executing the command, you can see the Scrapy objects and shortcut commands that are available for use.
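A typical session looks like this (the URL and the XPath expression are illustrative):

scrapy shell http://example.com

Once the shell starts, the listed objects (such as request and response) and shortcuts (such as fetch and view) can be used directly:

>>> response.xpath("//title/text()").get()
>>> view(response)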
(5) startproject command
Used to create a project:
scrapy startproject firstspider [project_dir]
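For reference, the generated project skeleton looks roughly like this in recent Scrapy versions:

firstspider/
    scrapy.cfg
    firstspider/
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py
        spiders/
            __init__.py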
(6) version command
The version command directly displays version information about Scrapy.
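For example:

scrapy version

Adding the -v option also prints the versions of related components (Python, Twisted, lxml, and so on):

scrapy version -v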
(7) view command
The view command downloads a web page and opens it in a browser for viewing.
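For example (the URL is illustrative):

scrapy view http://example.com

This is especially useful for pages whose content is generated by JavaScript: such content shows up during normal browsing but is missing from the copy Scrapy downloaded.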
Scrapy Project Commands
(1) bench command
The bench command tests the performance of the local hardware. When we run scrapy bench, Scrapy creates a local server and crawls it at the maximum possible speed. To test pure hardware performance and avoid interference from other factors, the benchmark only follows links and does no content processing.
Judged purely on hardware performance, the output may show that roughly 2,400 pages per minute can be crawled. This is only a reference standard; in actual crawler projects the speed will differ because of various factors. In general, comparing the actual running speed with this reference speed provides a basis for optimizing and improving the crawler project.
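The benchmark takes no arguments:

scrapy bench

While it runs, the log periodically reports throughput in lines of this form (the numbers depend on the machine):

INFO: Crawled 240 pages (at 2400 pages/min), scraped 0 items (at 0 items/min)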
(2) genspider command
The genspider command creates a crawler file. You can use the command's -l option to view the currently available crawler templates.
Use -t to generate a crawler based on one of these templates; both uses are shown below.
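For example (the domain example.com is illustrative; the template names are what recent Scrapy versions ship with):

scrapy genspider -l
Available templates:
  basic
  crawl
  csvfeed
  xmlfeed

scrapy genspider -t basic country_test example.com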
This generates the country_test.py file in the example/spiders directory.
(3) check command
The check command in Scrapy runs contract checks on a crawler file.
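Contracts are written in the docstring of a spider callback. A minimal sketch (the URL and the field name are illustrative):

def parse(self, response):
    """Checked by scrapy check.

    @url http://example.com/
    @returns items 1
    @scrapes name
    """

Running scrapy check country_test then verifies that parsing the sample URL yields at least one item and that each item has a name field.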
(4) crawl command
Starts a crawler:
scrapy crawl country_test --loglevel=DEBUG
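The -o option can also be added to save the scraped items to a file (the file name is illustrative):

scrapy crawl country_test -o items.json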
(5) list command
Lists the crawler files currently available in the project.
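For example, for the project above:

scrapy list
country_test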
(6) edit command
Edits a crawler file directly by opening the corresponding editor.
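For example:

scrapy edit country_test

The editor opened is determined by the EDITOR environment variable (or Scrapy's EDITOR setting), so this command is mainly a convenience on Linux/macOS.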