1. Some Scrapy commands are only available inside a Scrapy project's root directory, such as the crawl command.
2. scrapy genspider taobao http://detail.tmall.com/item.htm?id=12577759834
This automatically generates taobao.py in the spiders directory:
# -*- coding: utf-8 -*-
import scrapy


class TaobaoSpider(scrapy.Spider):
    name = "taobao"
    allowed_domains = ["http://detail.tmall.com/item.htm?id=12577759834"]
    start_urls = (
        'http://www.http://detail.tmall.com/item.htm?id=12577759834/',
    )

    def parse(self, response):
        pass
Other templates are also available, for example the crawl template:
scrapy genspider taobao2 http://detail.tmall.com/item.htm?id=12577759834 --template=crawl
# -*- coding: utf-8 -*-
import scrapy
from scrapy.contrib.linkextractors import LinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule

from Project004.items import Project004Item


class Taobao2Spider(CrawlSpider):
    name = 'taobao2'
    allowed_domains = ['http://detail.tmall.com/item.htm?id=12577759834']
    start_urls = ['http://www.http://detail.tmall.com/item.htm?id=12577759834/']

    rules = (
        Rule(LinkExtractor(allow=r'items/'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        i = Project004Item()
        #i['domain_id'] = response.xpath('//input[@id="sid"]/@value').extract()
        #i['name'] = response.xpath('//div[@id="name"]').extract()
        #i['description'] = response.xpath('//div[@id="description"]').extract()
        return i
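The commented-out xpath() lines in parse_item are where extraction would happen. As a stdlib-only sketch of the same idea (no Scrapy required), ElementTree supports a similar XPath subset; the sample HTML below is invented for illustration:

```python
# Sketch: extracting values by XPath, as the commented lines in parse_item do.
# xml.etree.ElementTree supports a limited XPath subset that covers this case.
import xml.etree.ElementTree as ET

page = """
<html><body>
  <input id="sid" value="12577759834"/>
  <div id="name">Sample item</div>
  <div id="description">A demo description</div>
</body></html>
"""

root = ET.fromstring(page)
# Equivalent in spirit to response.xpath('//input[@id="sid"]/@value').extract()
domain_id = root.find('.//input[@id="sid"]').get('value')
name = root.find('.//div[@id="name"]').text
description = root.find('.//div[@id="description"]').text
print(domain_id, name, description)
```

Scrapy's own response.xpath() uses full XPath via lxml, so it is strictly more powerful than this sketch.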
3. List all spiders in the current project: scrapy list
4. fetch command usage:
A. scrapy fetch --nolog http://www.example.com/some/page.html
B. scrapy fetch --nolog --headers http://www.example.com/
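Conceptually, fetch downloads one URL with Scrapy's downloader and prints the body (or, with --headers, the response headers). A stdlib sketch of the same idea, using a throwaway local server instead of a real site so it runs offline:

```python
# Sketch of what `scrapy fetch` does: download one URL, keep body and headers.
# A local test server stands in for www.example.com so the example is offline.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<html><title>demo</title></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = "http://127.0.0.1:%d/" % server.server_address[1]
with urllib.request.urlopen(url) as resp:
    headers = dict(resp.headers)  # roughly what --headers prints
    body = resp.read()            # what a plain fetch prints

server.shutdown()
print(headers["Content-Type"], body)
```

The real command also applies the project's downloader middlewares and user agent, which this sketch omits.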
5. view command: open a page in the browser as Scrapy sees it
scrapy view http://www.example.com/some/page.html
6. View settings:
scrapy settings --get BOT_NAME
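Conceptually, the settings command looks a name up in the merged configuration: values from the project's settings.py override Scrapy's built-in defaults. The dicts below are stand-ins for illustration, not Scrapy's real internals:

```python
# Sketch: how `scrapy settings --get BOT_NAME` resolves a value.
# Project settings override built-in defaults; both dicts here are stand-ins.
defaults = {"BOT_NAME": "scrapybot", "DOWNLOAD_DELAY": 0}
project = {"BOT_NAME": "project004"}  # would come from the project's settings.py

def get_setting(name):
    merged = dict(defaults)
    merged.update(project)  # project values win over defaults
    return merged.get(name)

print(get_setting("BOT_NAME"))
```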
7. Run a self-contained spider without creating a project
scrapy runspider <spider_file.py>
8. Deploying a Scrapy project: scrapy deploy
A server environment must be set up first; spiders are typically deployed to scrapyd.
Install scrapyd: pip install scrapyd
Documentation: http://scrapyd.readthedocs.org/en/latest/install.html
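Once a project is deployed, scrapyd exposes an HTTP JSON API; a crawl is started by POSTing the project and spider names to its schedule.json endpoint. The sketch below builds (but does not send) such a request; the project and spider names, and the default localhost:6800 address, are assumptions:

```python
# Sketch: scheduling a deployed spider through scrapyd's schedule.json API.
# Builds the POST request only; sending it needs a running scrapyd instance.
import urllib.parse
import urllib.request

data = urllib.parse.urlencode({"project": "project004", "spider": "taobao"}).encode()
req = urllib.request.Request("http://localhost:6800/schedule.json", data=data)
# urllib.request.urlopen(req)  # uncomment with scrapyd running
print(req.get_method(), req.full_url)
```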
9. All available commands:
C:\users\ibm_admin\pycharmprojects\pycrawl\project004>scrapy
Scrapy 0.24.4 - project: project004

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  check         Check spider contracts
  crawl         Run a spider
  deploy        Deploy project in Scrapyd target
  edit          Edit spider
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  list          List available spiders
  parse         Parse URL (using its spider) and print the results
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy
Python Crawler Framework Scrapy Learning Notes 6 ------- Basic Commands