Scrapy provides two types of commands: commands that must be run inside a Scrapy project (project-specific commands), and commands that also work outside one (global commands). Global commands may behave differently when run inside a project than outside one, because the project's settings may be used.
Global commands:
startproject
settings
runspider
shell
fetch
view
version
Project-only commands:
crawl
check
list
edit
parse
genspider
deploy
bench
1.startproject
- Syntax:
scrapy startproject <project_name>
- Project Required: No
Creates a new Scrapy project named project_name under a project_name directory.
Example:
$ scrapy startproject myproject
2.genspider
- Syntax:
scrapy genspider [-t template] <name> <domain>
- Project Required: Yes
- Parameters
- -l list the available templates
- -e edit the spider after it is created
- -d TEMPLATE dump the given template to standard output
- -t TEMPLATE use a custom template
Creates a new spider in the current project.
This is only a convenient shortcut for creating a spider from a predefined template; you can also write the spider source files yourself.
Example:
$ scrapy genspider -l
Available templates:
  basic
  crawl
  csvfeed
  xmlfeed

$ scrapy genspider -d basic
import scrapy

class $classname(scrapy.Spider):
    name = "$name"
    allowed_domains = ["$domain"]
    start_urls = (
        'http://www.$domain/',
        )

    def parse(self, response):
        pass

$ scrapy genspider -t basic example example.com
Created spider 'example' using template 'basic' in module:
  mybot.spiders.example
3.crawl
- Syntax:
scrapy crawl [options] <spider>
- Project Required: Yes
- Parameters
- -a NAME=VALUE set a spider argument (may be repeated; see the sketch after the example below)
- -o FILE write scraped items to FILE
- -t FORMAT set the serialization format used with -o
Starts crawling using the given spider.
Example:
$ scrapy crawl myspider
[ ... myspider starts crawling ... ]
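Arguments set with -a are passed to the spider's constructor as keyword arguments. A minimal sketch of a spider that accepts one (the category argument and the URL pattern are illustrative):

import scrapy

class MySpider(scrapy.Spider):
    name = "myspider"

    def __init__(self, category=None, *args, **kwargs):
        # `scrapy crawl myspider -a category=electronics` delivers
        # category="electronics" here as a keyword argument.
        super(MySpider, self).__init__(*args, **kwargs)
        self.start_urls = ['http://www.example.com/categories/%s' % category]

    def parse(self, response):
        self.log('Visited %s' % response.url)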
4.check
- Syntax:
scrapy check [options] <spider>
- Project Required: Yes
- Parameters
- -l only list the contracts, without running the checks
- -v print contract test results for all spiders
Runs contract checks. So what is a contract? It is a small test embedded in a spider callback's docstring; see the sketch after the example below.
Example:
$ scrapy check -l
first_spider
  * parse
  * parse_item
second_spider
  * parse
  * parse_item

$ scrapy check
[FAILED] first_spider:parse_item
>>> 'RetailPricex' field is missing

[FAILED] first_spider:parse
>>> Returned 92 requests, expected 0..4
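For illustration, a contract is a set of annotations (@url, @returns, @scrapes) in a callback's docstring that tell scrapy check to fetch the URL, run the callback, and verify the result. A minimal sketch, assuming a recent Scrapy version that accepts plain dicts as items (the URL, selectors, and field names are placeholders):

import scrapy

class FirstSpider(scrapy.Spider):
    name = "first_spider"

    def parse_item(self, response):
        """The contract annotations live in this docstring:

        @url http://www.example.com/some/page.html
        @returns items 1 1
        @scrapes name price description
        """
        # `scrapy check` fetches the @url, calls this method, and
        # verifies that exactly one item is returned and that it
        # provides the name, price and description fields.
        yield {
            'name': response.xpath('//h1/text()').extract_first(),
            'price': response.xpath('//*[@class="price"]/text()').extract_first(),
            'description': response.xpath('//p/text()').extract_first(),
        }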
5.list
- Syntax:
scrapy list
- Project Required: Yes
Lists all available spiders in the current project, one spider per line.
Examples of Use:
$ scrapy list
spider1
spider2
6.edit
- Syntax:
scrapy edit <spider>
- Project Required: Yes
Edits the given spider using the editor set in the EDITOR setting.
This command is only a convenient shortcut; developers are free to use any other tool or IDE to write and debug their spiders.
7.fetch
- Syntax:
scrapy fetch [options] <url>
- Project Required: No
- Parameters
- --spider=SPIDER use the given spider
- --headers print the response's HTTP headers instead of its body
Downloads the given URL using the Scrapy downloader and writes the fetched content to standard output.
The interesting point is that the page is fetched the way the spider would download it. For example, if the spider has a USER_AGENT attribute that overrides the User-Agent header, that attribute is used.
You can therefore use this command to check how your spider would fetch a particular page.
When run outside a project, the command falls back to the default Scrapy downloader settings.
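A minimal sketch of that behavior (the spider name and agent string are illustrative; the lowercase user_agent attribute is the one Scrapy's UserAgentMiddleware reads):

import scrapy

class PageSpider(scrapy.Spider):
    name = "pagespider"
    # UserAgentMiddleware picks this attribute up, so
    # `scrapy fetch --spider=pagespider <url>` downloads the page
    # with this User-Agent instead of the project-wide default.
    user_agent = 'MyCustomAgent/1.0 (+http://www.example.com)'

    def parse(self, response):
        pass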
Example:
$ scrapy fetch --nolog http://www.example.com/some/page.html
[ ... html content here ... ]

$ scrapy fetch --nolog --headers http://www.example.com/
{'Accept-Ranges': ['bytes'],
 'Age': ['1263'],
 'Connection': ['close'],
 'Content-Length': ['596'],
 'Content-Type': ['text/html; charset=UTF-8'],
 'Date': ['Wed, 18 Aug 2010 23:59:46 GMT'],
 'Etag': ['"573c1-254-48c9c87349680"'],
 'Last-Modified': ['Fri, 30 Jul 2010 15:30:18 GMT'],
 'Server': ['Apache/2.2.3 (CentOS)']}
8.view
- Syntax:
scrapy view [options] <url>
- Project Required: No
- Parameters
- --spider=SPIDER use the given spider
Opens the given URL in a browser, displayed the way the Scrapy spider "sees" it. Spiders sometimes see pages differently from regular users, so this command can be used to check what the spider actually gets and confirm it is what you expect.
Example:
$ scrapy view http://www.example.com/some/page.html
[ ... browser starts ... ]
9.shell
- Syntax:
scrapy shell [url|file]
- Project Required: No
Launches the Scrapy shell for the given URL (if one is given), or with nothing loaded otherwise. See the Scrapy shell documentation for more information.
Example:
$ scrapy shell http://www.example.com/some/page.html
[ ... scrapy shell starts ... ]
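Inside the shell, objects such as request and response come preloaded. A short sample session, assuming a recent Scrapy version (the values shown are illustrative):

>>> response.status
200
>>> response.xpath('//title/text()').extract()
[u'Example Domain']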
10.parse
- Syntax:
scrapy parse <url> [options]
- Project Required: Yes
- Parameters
- --spider=SPIDER bypass spider auto-detection and force the given spider
- -a NAME=VALUE set a spider argument (may be repeated)
- --callback or -c the spider method to use as the callback for parsing the response
- --pipelines process the items through the item pipelines
- --rules or -r use CrawlSpider rules to discover the callback for parsing the response
- --noitems do not show the scraped items
- --nolinks do not show the extracted links
- --nocolour avoid using pygments to colorize the output
- --depth or -d the depth to which follow-up requests are followed recursively (default: 1)
- --verbose or -v show details for each depth level
Fetches the given URL and parses it with the spider that handles it. If the --callback option is given, that spider method is used as the callback; otherwise parse is used. A sketch of such a callback follows the example below.
Example:
$ scrapy parse http://www.example.com/ -c parse_item
[ ... scrapy log lines crawling example.com spider ... ]

>>> STATUS DEPTH LEVEL 1 <<<
# Scraped Items  ------------------------------------------------------------
[{'name': u'Example item',
  'category': u'Furniture',
  'length': u'12 cm'}]

# Requests  -----------------------------------------------------------------
[]
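The callback named with -c is an ordinary spider method with the standard (self, response) signature. A minimal sketch, assuming a recent Scrapy version that accepts plain dicts as items (the spider, selector, and field name are illustrative):

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example.com"
    allowed_domains = ["example.com"]
    start_urls = ['http://www.example.com/']

    def parse_item(self, response):
        # `scrapy parse http://www.example.com/ -c parse_item` downloads
        # the URL, runs this method, and prints whatever it yields in the
        # "Scraped Items" and "Requests" sections shown above.
        yield {'name': response.xpath('//h1/text()').extract_first()}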
11.settings
- Syntax:
scrapy settings [options]
- Project Required: No
Gets the value of a Scrapy setting.
When run inside a project, the command outputs the project's setting value; otherwise it outputs the default Scrapy value.
Example:
$ scrapy settings --get BOT_NAME
scrapybot
$ scrapy settings --get DOWNLOAD_DELAY
0
12.runspider
- Syntax:
scrapy runspider <spider_file.py>
- Project Required: No
Runs a spider self-contained in a Python file, without having to create a project. A sketch of such a file follows the example below.
Example:
$ scrapy runspider myspider.py
[ ... spider starts crawling ... ]
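A minimal sketch of such a self-contained file (the spider name and URL are illustrative):

# myspider.py -- run with: scrapy runspider myspider.py
import scrapy

class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = ['http://www.example.com/']

    def parse(self, response):
        # A real spider would extract data here; this one only logs the URL.
        self.log('Visited %s' % response.url)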
13.version
- Syntax:
scrapy version [-v]
- Project Required: No
Outputs the Scrapy version. When run with -v, the command also outputs Python, Twisted, and platform information, which is useful for bug reports.
14.deploy
New feature.
- Syntax:
scrapy deploy [ <target:project> | -l <target> | -L ]
- Project Required: Yes
Deploys the project to a Scrapyd server. See the Scrapy documentation on deploying your project for more information.
15.bench
New feature.
- Syntax:
scrapy bench
- Project Required: No
Runs a quick benchmark test. See the benchmarking section of the Scrapy documentation for more information.
Custom Project Commands
You can also add your own project commands via the COMMANDS_MODULE setting. See the built-in Scrapy commands in scrapy/commands for examples of how to implement your own commands; a minimal sketch also follows the example below.
COMMANDS_MODULE
Default: '' (empty string)
The module to search for custom Scrapy commands.
Example:
COMMANDS_MODULE = 'mybot.commands'
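A minimal sketch of a custom command, assuming a recent Scrapy version (the mybot.commands package and the hello command name are hypothetical; the command's name comes from the module's file name):

# mybot/commands/hello.py
from scrapy.commands import ScrapyCommand

class Command(ScrapyCommand):
    requires_project = True

    def syntax(self):
        return "[options]"

    def short_desc(self):
        return "List the spiders known to this project"

    def run(self, args, opts):
        # self.crawler_process is attached by Scrapy before run() is called.
        for name in self.crawler_process.spider_loader.list():
            print(name)

With the COMMANDS_MODULE setting above, this would be invoked as scrapy hello.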