Python web crawler scrapy common commands

Source: Internet
Author: User
Tags: python, web crawler

Scrapy Global Commands

To see which global commands Scrapy provides, run scrapy -h outside of any Scrapy project directory.

(1) Fetch command

The fetch command is mainly used to display the crawl process for a given URL. When used outside a Scrapy project directory, Scrapy's default spider is invoked to fetch the page; when used inside a project directory, the project's own spider is used.

--headers display the response headers of the fetched page

--nolog suppress log output

--logfile=FILE write log messages to FILE

--spider=SPIDER use the specified spider for the fetch

--loglevel=LEVEL set the log level

Common log levels:

CRITICAL a serious error occurred

ERROR an error occurred that must be handled

WARNING warning messages

INFO informational messages

DEBUG debugging output, mostly used during development
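Putting the options together, a fetch invocation might look like this (the URL is a placeholder, not from the article):

```shell
# Fetch a page, suppress the log, and print the response headers
scrapy fetch --nolog --headers https://example.com

# Fetch with full DEBUG-level logging written to a file
scrapy fetch --loglevel=DEBUG --logfile=fetch.log https://example.com
```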

(2) Runspider command

Runs a spider file directly, without relying on a Scrapy project.

(Note from the original author: I haven't fully figured this command out yet, and I don't see the print output from the parse() function.)
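A minimal, self-contained spider can be run this way (the file name, spider name, and URL are placeholders, not from the article). Note that messages sent through self.log() go through Scrapy's logging system, so they appear in the normal log output; bare print output can be easy to miss among the log lines, which may explain the author's observation above:

```shell
# Create a standalone spider file; no project is required
cat > example_spider.py <<'EOF'
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://example.com"]

    def parse(self, response):
        # self.log() routes through Scrapy's logging, so it shows up
        # in the regular log output
        self.log("Visited %s, title: %s"
                 % (response.url, response.css("title::text").get()))
EOF

# Run the spider file directly
scrapy runspider example_spider.py
```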

(3) Settings command

Shows Scrapy's configuration information. When used inside a project directory it shows the settings of that project; when used outside a project it shows Scrapy's default settings.
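Individual settings can be queried with --get, for example:

```shell
# Print the value of a single setting
scrapy settings --get BOT_NAME
scrapy settings --get DOWNLOAD_DELAY
```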

(4) Shell command

The shell command starts Scrapy's interactive shell, which is often used during development and debugging. With it you can debug a site's response without starting a full Scrapy crawl.

After the command runs, a number of Scrapy objects (such as request and response) and shortcut functions (such as fetch and view) are available in the shell.
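For example (the URL is a placeholder):

```shell
# Open an interactive shell on a page
scrapy shell https://example.com

# Inside the shell, objects and shortcuts such as these are available:
#   response.css("title::text").get()   - extract data from the response
#   fetch("https://example.com/other")  - download another page
#   view(response)                      - open the response in a browser
```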

(5) Startproject command

Used to create a project

scrapy startproject firstspider [project_dir]

(6) Version command

The version command displays Scrapy's version information; with -v it also shows the versions of related components such as Python and Twisted.
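For example:

```shell
# Show Scrapy's version
scrapy version

# Show versions of Scrapy and related components (Python, Twisted, lxml, ...)
scrapy version -v
```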

(7) View command

Downloads a web page and opens the downloaded copy in a browser, so you can see the page as Scrapy's downloader receives it (useful when a site renders content with JavaScript, which Scrapy does not execute).
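For example (the URL is a placeholder):

```shell
# Download the page and open the downloaded copy in the default browser
scrapy view https://example.com
```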

Scrapy Project Commands

(1) Bench command

The bench command tests the performance of the local hardware. Running scrapy bench creates a local HTTP server and crawls it at maximum speed. To keep the test focused on hardware performance and avoid interference from other factors, it only follows links and does no content processing.

Based purely on hardware performance, the output showed a crawl rate of roughly 2,400 pages per minute. This is only a reference figure; in a real crawler project the actual speed varies with many factors. Comparing a project's actual crawl speed against this reference is a useful basis for optimizing and improving the project.
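The benchmark takes no arguments:

```shell
# Spins up a local server and crawls it as fast as possible,
# reporting pages/min in the log output
scrapy bench
```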

(2) Genspider command

The genspider command creates a spider file. Use its -l option to list the currently available spider templates.

  

Use -t to generate a spider from any of these templates.

  

This generates the country_test.py file in the project's example/spiders/ directory.
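The two steps above look like this (the spider name country_test is from the article; the domain is a placeholder):

```shell
# List the available spider templates (typically: basic, crawl, csvfeed, xmlfeed)
scrapy genspider -l

# Generate a spider named country_test for the given domain,
# based on the basic template
scrapy genspider -t basic country_test example.com
```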

  

(3) Check command

Use the check command in Scrapy to run contract checks on a spider. Contracts are declared in the docstrings of the spider's callback methods (e.g. @url, @returns, @scrapes).
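For example (using the spider name from the article):

```shell
# Run contract checks for all spiders in the project
scrapy check

# Check a single spider
scrapy check country_test
```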

  

(4) Crawl command

Start a crawler

scrapy crawl country_test --loglevel=DEBUG

(5) List command

Lists the spiders available in the current project.
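The command takes no arguments:

```shell
# Print the name of every spider in the current project, one per line
scrapy list
```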

(6) Edit command

Opens the given spider's file directly in an editor (the editor is chosen via the EDITOR environment variable or Scrapy's EDITOR setting).
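For example (using the spider name from the article):

```shell
# Open the country_test spider in the configured editor
# (uses the EDITOR setting / environment variable)
scrapy edit country_test
```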
