[Scrapy] [Repost] About the scrapy command


Scrapy provides two kinds of commands: project-only commands, which must be run from inside a Scrapy project, and global commands, which also work outside one. Global commands may behave differently when run inside a project than outside it, because the project's settings may be used. (A quick way to see which commands are available in the current context is shown right after the lists below.)

Global commands:

    • startproject
    • settings
    • runspider
    • shell
    • fetch
    • view
    • version

Project-only commands:

    • crawl
    • check
    • list
    • edit
    • parse
    • genspider
    • deploy
    • bench
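
Running scrapy with no arguments prints the commands available in the current context; outside a project, the header notes that no project is active and the project-only commands are hidden. A sketch (output abbreviated; the version number is illustrative):

$ scrapy
Scrapy 1.0 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  fetch         Fetch a URL using the Scrapy downloader
  runspider     Run a self-contained spider (without creating a project)
  ...

  [ more ]      More commands available when run from project directory
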
1.startproject
    • Syntax: scrapy startproject <project_name>
    • Project required: no

Creates a new Scrapy project named project_name, inside a directory of the same name.

Example:

$ scrapy startproject myproject
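
The command generates a standard project skeleton, roughly like this (the exact set of files can vary slightly between Scrapy versions):

myproject/
    scrapy.cfg            # deploy configuration file
    myproject/            # the project's Python module
        __init__.py
        items.py          # item definitions
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/          # directory where your spiders live
            __init__.py
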
2.genspider
    • Syntax: scrapy genspider [-t template] <name> <domain>
    • Project required: yes
    • Parameters
      • -l  list available templates
      • -e  edit the spider after it is created
      • -d  dump the named template to standard output
      • -t  create the spider using the given template

Creates a new spider in the current project.

This is just a convenient shortcut for creating spiders: the command generates the spider from a predefined template, but you can also write the spider's source file yourself.

Example:

$ scrapy genspider -l
Available templates:
  basic
  crawl
  csvfeed
  xmlfeed

$ scrapy genspider -d basic
import scrapy

class $classname(scrapy.Spider):
    name = "$name"
    allowed_domains = ["$domain"]
    start_urls = (
        'http://www.$domain/',
    )

    def parse(self, response):
        pass

$ scrapy genspider -t basic example example.com
Created spider 'example' using template 'basic' in module:
  mybot.spiders.example

 
3.crawl
    • Syntax: scrapy crawl [options] <spider>
    • Project required: yes
    • Parameters
      • -a NAME=VALUE  set a spider argument (may be repeated)
      • -o FILE       dump scraped items into FILE
      • -t FORMAT     serialization format for the -o dump

Starts crawling using the given spider.

Example:

$ scrapy crawl myspider
[ ... myspider starts crawling ... ]
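
The -a option deserves a note: each NAME=VALUE pair is passed to the spider's constructor, so a spider can use it to parameterize a crawl. A minimal sketch (the spider and its category argument are illustrative):

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def __init__(self, category=None, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        # build the start URL from the argument passed via -a
        self.start_urls = ['http://www.example.com/categories/%s' % category]

    def parse(self, response):
        pass

It would then be run as: $ scrapy crawl myspider -a category=electronics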

 
4.check
    • Syntax: scrapy check [options] <spider>
    • Project required: yes
    • Parameters
      • -l  only list contracts, without checking them
      • -v  print the contract test results for all spiders

Runs contract checks. So, what is a contract? A contract is a set of simple assertions embedded in a spider callback's docstring, which Scrapy uses to test the callback against a sample URL.
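
A minimal sketch of a callback carrying contracts (the URL and field names are illustrative):

def parse_item(self, response):
    """This callback is checked against the sample URL below.

    @url http://www.example.com/some/page.html
    @returns items 1
    @returns requests 0 0
    @scrapes name price
    """
    # ... normal parsing logic goes here ...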

Example:

$ scrapy check -l
first_spider
  * parse
  * parse_item
second_spider
  * parse
  * parse_item

$ scrapy check
[FAILED] first_spider:parse_item
>>> 'RetailPricex' field is missing

[FAILED] first_spider:parse
>>> returned 92 requests, expected 0..4

 
5.list
    • Syntax: scrapy list
    • Project required: yes

Lists all the available spiders in the current project, one spider per line.

Example:

$ scrapy list
spider1
spider2

 
6.edit
    • Syntax: scrapy edit <spider>
    • Project required: yes

Edits the given spider using the editor defined in the EDITOR setting.

The command is merely a convenient shortcut; developers are of course free to use any other tool or IDE to write and debug their spiders.
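
The editor comes from the EDITOR setting (which defaults to the EDITOR environment variable), so you can pin it in settings.py, for example:

EDITOR = 'vim'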

7.fetch
    • Syntax: scrapy fetch [options] <url>
    • Project required: no
    • Parameters
      • --spider=SPIDER  use this spider
      • --headers        print the response's HTTP headers instead of its body

Downloads the given URL using the Scrapy downloader and writes the fetched content to standard output.

This command fetches the page the same way the spider would download it. For example, if the spider has a USER_AGENT attribute that overrides the User-Agent header, the command will honor it.

So this command can be used to see how your spider would fetch a particular page.

If run outside a project, no per-spider behavior is applied and the default Scrapy downloader settings are used.
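
For example, a spider overriding the User-Agent header might look like this (a sketch; the spider name and header value are illustrative), and scrapy fetch --spider=myspider <url> would then fetch with that header:

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'
    # per-spider settings that override the project-wide defaults
    custom_settings = {
        'USER_AGENT': 'MyBot/1.0 (+http://www.example.com/bot)',
    }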

Example:

$ scrapy fetch --nolog http://www.example.com/some/page.html
[ ... html content here ... ]

$ scrapy fetch --nolog --headers http://www.example.com/
{'Accept-Ranges': ['bytes'],
 'Age': ['1263'],
 'Connection': ['close'],
 'Content-Length': ['596'],
 'Content-Type': ['text/html; charset=UTF-8'],
 'Date': ['Wed, 18 Aug 2010 23:59:46 GMT'],
 'Etag': ['"573c1-254-48c9c87349680"'],
 'Last-Modified': ['Fri, 30 Jul 2010 15:30:18 GMT'],
 'Server': ['Apache/2.2.3 (CentOS)']}

 
8.view
    • Syntax: scrapy view [options] <url>
    • Project required: no
    • Parameters
      • --spider=SPIDER  use this spider

Opens the given URL in a browser, displayed as your Scrapy spider would "see" it. Spiders sometimes see pages differently from regular users, so this command can be used to check what the spider actually gets and to confirm it is what you expect.

Example:

$ scrapy view http://www.example.com/some/page.html
[ ... browser starts ... ]

 
9.shell
    • Syntax: scrapy shell [url|file]
    • Project required: no

Launches the Scrapy shell for the given URL (if any), or empty if no URL is given. See the Scrapy shell documentation for more information.
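
Inside the shell, a response object and helpers such as fetch() are available for interactive inspection; a brief illustrative session:

>>> response.status
200
>>> response.xpath('//title/text()').extract_first()
u'Example Domain'
>>> fetch('http://www.example.com/another/page')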

Example:

$ scrapy shell http://www.example.com/some/page.html
[ ... scrapy shell starts ... ]

 
10.parse
    • Syntax: scrapy parse <url> [options]
    • Project required: yes
    • Parameters
      • --spider=SPIDER    skip spider auto-detection and force use of the given spider
      • -a NAME=VALUE      set a spider argument (may be repeated)
      • --callback or -c   the spider callback to use for parsing the response
      • --pipelines        process items through the item pipelines
      • --rules or -r      use CrawlSpider rules to discover the callback for parsing the response
      • --noitems          do not show the scraped items
      • --nolinks          do not show the extracted links
      • --nocolour         avoid using pygments to colorize the output
      • --depth or -d      depth level for which requests should be followed recursively (default: 1)
      • --verbose or -v    show details for each depth level

Fetches the given URL and parses it with the appropriate spider. If the --callback option is given, that spider method is used as the callback; otherwise parse is used.

Example:

$ scrapy parse http://www.example.com/ -c parse_item
[ ... scrapy log lines crawling example.com spider ... ]

>>> STATUS DEPTH LEVEL 1 <<<
# Scraped Items  ------------------------------------------------------------
[{'name': u'Example item',
  'category': u'Furniture',
  'length': u'12 cm'}]

# Requests  -----------------------------------------------------------------
[]

11.settings
    • Syntax: scrapy settings [options]
    • Project required: no

Gets the value of a Scrapy setting.

If used inside a project, the command outputs the project's setting value; otherwise it outputs the default Scrapy value for that setting.

Example:

$ scrapy settings --get BOT_NAME
scrapybot
$ scrapy settings --get DOWNLOAD_DELAY
0
12.runspider
    • Syntax: scrapy runspider <spider_file.py>
    • Project required: no

Runs a spider self-contained in a Python file, without having to create a project.
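
The file just has to contain a spider class; a minimal self-contained sketch (file name and URL are illustrative):

# myspider.py
import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'
    start_urls = ['http://www.example.com/']

    def parse(self, response):
        # yield one item containing the page title
        yield {'title': response.xpath('//title/text()').extract_first()}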

Example:

$ scrapy runspider myspider.py
[ ... spider starts crawling ... ]
13.version
    • Syntax: scrapy version [-v]
    • Project required: no

Outputs the Scrapy version. When run with -v, the command also outputs Python, Twisted and platform information, which is useful for bug reports.

14.deploy

A newer addition to Scrapy.

    • Syntax: scrapy deploy [ <target:project> | -l <target> | -L ]
    • Project required: yes

Deploys the project to a Scrapyd server. See the documentation on deploying your project for details.
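
Deploy targets are defined in the project's scrapy.cfg; a sketch (the target name, URL and project name are illustrative):

[deploy:mytarget]
url = http://localhost:6800/
project = myproject

The project could then be deployed with: $ scrapy deploy mytarget:myproject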

15.bench

A newer addition to Scrapy.

    • Syntax: scrapy bench
    • Project required: no

Runs a quick benchmark test (see the benchmarking documentation).

Custom Project Commands

You can also add your own project commands via the COMMANDS_MODULE setting. See the built-in Scrapy commands under scrapy/commands for examples of how to implement your own.

COMMANDS_MODULE

Default: '' (empty string)

The module to use for looking up custom Scrapy commands.

Example:

COMMANDS_MODULE = 'mybot.commands'
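
A custom command is a class named Command subclassing ScrapyCommand, placed in a module under the package named by COMMANDS_MODULE; a rough sketch (the module path and behavior are hypothetical):

# mybot/commands/hello.py
from scrapy.commands import ScrapyCommand

class Command(ScrapyCommand):
    requires_project = True

    def short_desc(self):
        return 'Print a greeting (illustrative custom command)'

    def run(self, args, opts):
        print('Hello from a custom command!')

With COMMANDS_MODULE set as above, this becomes available as scrapy hello inside the project.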
