Scrapy provides two types of commands: commands that must be run inside a Scrapy project (project-specific commands), and commands that also work outside one (global commands). Global commands may behave differently when run inside a project than outside one, because the project's settings may be used.
Global commands:
startproject
settings
runspider
shell
fetch
view
version
Project-only commands:
crawl
check
list
edit
parse
genspider
deploy
bench
1.startproject
- Syntax:
scrapy startproject <project_name>
- Project Required: No
Creates a new Scrapy project named project_name under a project_name directory.
Example:
$ scrapy startproject myproject
2.genspider
- Syntax:
scrapy genspider [-t template] <name> <domain>
- Project Required: Yes
- Parameters
- -l list the available templates
- -e edit the spider after it is created
- -d TEMPLATE dump the given template to standard output
- -t TEMPLATE use a custom template
Creates a new spider in the current project.
This is only a convenient shortcut for creating a spider from a predefined template; you can also write the spider source files yourself.
Example:
$ scrapy genspider -l
Available templates:
  basic
  crawl
  csvfeed
  xmlfeed

$ scrapy genspider -d basic
import scrapy

class $classname(scrapy.Spider):
    name = "$name"
    allowed_domains = ["$domain"]
    start_urls = (
        'http://www.$domain/',
        )

    def parse(self, response):
        pass

$ scrapy genspider -t basic example example.com
Created spider 'example' using template 'basic' in module:
  mybot.spiders.example
3.crawl
- Syntax:
scrapy crawl [options] <spider>
- Project Required: Yes
- Parameters
- -a NAME=VALUE set a spider argument (may be repeated; see the sketch after the example below)
- -o FILE write scraped items to FILE
- -t FORMAT set the serialization format used with -o
Starts crawling using the given spider.
Example:
$ scrapy crawl myspider
[ ... myspider starts crawling ... ]
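Arguments set with -a are passed to the spider's constructor as keyword arguments. A minimal sketch of a spider that accepts one (the category argument and the URL pattern are illustrative):

import scrapy

class MySpider(scrapy.Spider):
    name = "myspider"

    def __init__(self, category=None, *args, **kwargs):
        # `scrapy crawl myspider -a category=electronics` delivers
        # category="electronics" here as a keyword argument.
        super(MySpider, self).__init__(*args, **kwargs)
        self.start_urls = ['http://www.example.com/categories/%s' % category]

    def parse(self, response):
        self.log('Visited %s' % response.url)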
4.check
- Syntax:
scrapy check [options] <spider>
- Project Required: Yes
- Parameters
- -l only list the contracts, without running the checks
- -v print contract test results for all spiders
Runs contract checks. So what is a contract? It is a small test embedded in a spider callback's docstring; see the sketch after the example below.
Example:
$ scrapy check -l
first_spider
  * parse
  * parse_item
second_spider
  * parse
  * parse_item

$ scrapy check
[FAILED] first_spider:parse_item
>>> 'RetailPricex' field is missing

[FAILED] first_spider:parse
>>> Returned 92 requests, expected 0..4
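For illustration, a contract is a set of annotations (@url, @returns, @scrapes) in a callback's docstring that tell scrapy check to fetch the URL, run the callback, and verify the result. A minimal sketch, assuming a recent Scrapy version that accepts plain dicts as items (the URL, selectors, and field names are placeholders):

import scrapy

class FirstSpider(scrapy.Spider):
    name = "first_spider"

    def parse_item(self, response):
        """The contract annotations live in this docstring:

        @url http://www.example.com/some/page.html
        @returns items 1 1
        @scrapes name price description
        """
        # `scrapy check` fetches the @url, calls this method, and
        # verifies that exactly one item is returned and that it
        # provides the name, price and description fields.
        yield {
            'name': response.xpath('//h1/text()').extract_first(),
            'price': response.xpath('//*[@class="price"]/text()').extract_first(),
            'description': response.xpath('//p/text()').extract_first(),
        }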
5.list
- Syntax:
scrapy list
- Project Required: Yes
Lists all available spiders in the current project, one spider per line.
Examples of Use:
$ scrapy list
spider1
spider2
6.edit
- Syntax:
scrapy edit <spider>
- Project Required: Yes
Edits the given spider using the editor set in the EDITOR setting.
This command is only a convenient shortcut; developers are free to use any other tool or IDE to write and debug their spiders.
7.fetch
- Syntax:
scrapy fetch [options] <url>
- Project Required: No
- Parameters
- --spider=SPIDER use the given spider
- --headers print the response's HTTP headers instead of its body
Downloads the given URL using the Scrapy downloader and writes the fetched content to standard output.
The interesting point is that the page is fetched the way the spider would download it. For example, if the spider has a USER_AGENT attribute that overrides the User-Agent header, that attribute is used.
You can therefore use this command to check how your spider would fetch a particular page.
When run outside a project, the command falls back to the default Scrapy downloader settings.
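A minimal sketch of that behavior (the spider name and agent string are illustrative; the lowercase user_agent attribute is the one Scrapy's UserAgentMiddleware reads):

import scrapy

class PageSpider(scrapy.Spider):
    name = "pagespider"
    # UserAgentMiddleware picks this attribute up, so
    # `scrapy fetch --spider=pagespider <url>` downloads the page
    # with this User-Agent instead of the project-wide default.
    user_agent = 'MyCustomAgent/1.0 (+http://www.example.com)'

    def parse(self, response):
        pass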
Example:
$ scrapy fetch --nolog http://www.example.com/some/page.html
[ ... html content here ... ]

$ scrapy fetch --nolog --headers http://www.example.com/
{'Accept-Ranges': ['bytes'],
 'Age': ['1263'],
 'Connection': ['close'],
 'Content-Length': ['596'],
 'Content-Type': ['text/html; charset=UTF-8'],
 'Date': ['Wed, 18 Aug 2010 23:59:46 GMT'],
 'Etag': ['"573c1-254-48c9c87349680"'],
 'Last-Modified': ['Fri, 30 Jul 2010 15:30:18 GMT'],
 'Server': ['Apache/2.2.3 (CentOS)']}
8.view
- Syntax:
scrapy view [options] <url>
- Project Required: No
- Parameters
- --spider=SPIDER use the given spider
Opens the given URL in a browser, displayed the way the Scrapy spider "sees" it. Spiders sometimes see pages differently from regular users, so this command can be used to check what the spider actually gets and confirm it is what you expect.
Example:
$ scrapy view http://www.example.com/some/page.html
[ ... browser starts ... ]
9.shell
- Syntax:
scrapy shell [url|file]
- Project Required: No
Launches the Scrapy shell for the given URL (if one is given), or with nothing loaded otherwise. See the Scrapy shell documentation for more information.
Example:
$ scrapy shell http://www.example.com/some/page.html
[ ... scrapy shell starts ... ]
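Inside the shell, objects such as request and response come preloaded. A short sample session, assuming a recent Scrapy version (the values shown are illustrative):

>>> response.status
200
>>> response.xpath('//title/text()').extract()
[u'Example Domain']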
10.parse
- Syntax:
scrapy parse <url> [options]
- Project Required: Yes
- Parameters
- --spider=SPIDER bypass spider auto-detection and force the given spider
- -a NAME=VALUE set a spider argument (may be repeated)
- --callback or -c the spider method to use as the callback for parsing the response
- --pipelines process the items through the item pipelines
- --rules or -r use CrawlSpider rules to discover the callback for parsing the response
- --noitems do not show the scraped items
- --nolinks do not show the extracted links
- --nocolour avoid using pygments to colorize the output
- --depth or -d the depth to which follow-up requests are followed recursively (default: 1)
- --verbose or -v show details for each depth level
Fetches the given URL and parses it with the spider that handles it. If the --callback option is given, that spider method is used as the callback; otherwise parse is used. A sketch of such a callback follows the example below.
Example:
$ scrapy parse http://www.example.com/ -c parse_item
[ ... scrapy log lines crawling example.com spider ... ]

>>> STATUS DEPTH LEVEL 1 <<<
# Scraped Items  ------------------------------------------------------------
[{'name': u'Example item',
  'category': u'Furniture',
  'length': u'12 cm'}]

# Requests  -----------------------------------------------------------------
[]
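The callback named with -c is an ordinary spider method with the standard (self, response) signature. A minimal sketch, assuming a recent Scrapy version that accepts plain dicts as items (the spider, selector, and field name are illustrative):

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example.com"
    allowed_domains = ["example.com"]
    start_urls = ['http://www.example.com/']

    def parse_item(self, response):
        # `scrapy parse http://www.example.com/ -c parse_item` downloads
        # the URL, runs this method, and prints whatever it yields in the
        # "Scraped Items" and "Requests" sections shown above.
        yield {'name': response.xpath('//h1/text()').extract_first()}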
11.settings
- Syntax:
scrapy settings [options]
- Project Required: No
Gets the value of a Scrapy setting.
When run inside a project, the command outputs the project's setting value; otherwise it outputs the default Scrapy value.
Example:
$ scrapy settings --get BOT_NAME
scrapybot
$ scrapy settings --get DOWNLOAD_DELAY
0
12.runspider
- Syntax:
scrapy runspider <spider_file.py>
- Project Required: No
Runs a spider self-contained in a Python file, without having to create a project. A sketch of such a file follows the example below.
Example:
$ scrapy runspider myspider.py
[ ... spider starts crawling ... ]
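A minimal sketch of such a self-contained file (the spider name and URL are illustrative):

# myspider.py -- run with: scrapy runspider myspider.py
import scrapy

class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = ['http://www.example.com/']

    def parse(self, response):
        # A real spider would extract data here; this one only logs the URL.
        self.log('Visited %s' % response.url)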
13.version
- Syntax:
scrapy version [-v]
- Project Required: No
Outputs the Scrapy version. When run with -v, the command also outputs Python, Twisted, and platform information, which is useful for bug reports.
14.deploy
New feature.
- Syntax:
scrapy deploy [ <target:project> | -l <target> | -L ]
- Project Required: Yes
Deploys the project to a Scrapyd server. See the Scrapy documentation on deploying your project for more information.
15.bench
New feature.
- Syntax:
scrapy bench
- Project Required: No
Runs a quick benchmark test. See the benchmarking section of the Scrapy documentation for more information.
Custom Project Commands
You can also add your own project commands via the COMMANDS_MODULE setting. See the built-in Scrapy commands in scrapy/commands for examples of how to implement your own commands; a minimal sketch also follows the example below.
COMMANDS_MODULE
Default: '' (empty string)
The module to search for custom Scrapy commands.
Example:
COMMANDS_MODULE = 'mybot.commands'
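A minimal sketch of a custom command, assuming a recent Scrapy version (the mybot.commands package and the hello command name are hypothetical; the command's name comes from the module's file name):

# mybot/commands/hello.py
from scrapy.commands import ScrapyCommand

class Command(ScrapyCommand):
    requires_project = True

    def syntax(self):
        return "[options]"

    def short_desc(self):
        return "List the spiders known to this project"

    def run(self, args, opts):
        # self.crawler_process is attached by Scrapy before run() is called.
        for name in self.crawler_process.spider_loader.list():
            print(name)

With the COMMANDS_MODULE setting above, this would be invoked as scrapy hello.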