Introduction
Scrapyd is a daemon that runs Scrapy spiders as a service. It exposes an HTTP/JSON API for deploying, deleting, starting, and stopping spiders. Scrapyd can manage multiple projects, and each project can have multiple versions, but only the latest version is used to run spiders.
scrapyd-client is a tool dedicated to deploying Scrapy projects. It also offers some management functions, but they are less complete than Scrapyd's own API, so it is recommended only for deployment.
Attention:
Recent versions of scrapyd-client provide both the scrapyd-deploy and scrapyd-client commands; older versions may provide only the scrapyd-deploy command.
Installation
source activate scrapy
pip install scrapyd
pip install scrapyd-client
Deployment
1. Modify scrapy.cfg
Edit scrapy.cfg in the project root directory:
cd scrapy/douban
vim scrapy.cfg
# test is the deploy alias
[deploy:test]
url = http://10.11.2.102:6800/
# project name
project = douban
# username and password for web access
#username =
#password =
# Start scrapyd
scrapyd
Once started, Scrapyd serves a web interface for monitoring the spiders. Because Scrapyd does not support user authentication, authentication can be set up through an Nginx proxy or by other means.
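To make the structure of the deploy section from step 1 concrete, here is a minimal Python sketch that reads deploy targets out of a scrapy.cfg-style text with the standard configparser module. The function name deploy_targets and the SAMPLE_CFG string are illustrative assumptions, not part of Scrapy or scrapyd-client; treating a bare [deploy] section as the "default" target is also an assumption about how scrapyd-client names it.

```python
from configparser import ConfigParser

# Sample text shaped like the deploy section in step 1 (illustrative only).
SAMPLE_CFG = """\
# test is the deploy alias
[deploy:test]
url = http://10.11.2.102:6800/
project = douban
#username =
#password =
"""


def deploy_targets(cfg_text):
    """Return {alias: options} for every [deploy] / [deploy:alias] section."""
    # interpolation=None keeps any '%' characters in values literal.
    parser = ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    targets = {}
    for section in parser.sections():
        if section == "deploy":
            alias = "default"  # assumption: unnamed section is the default target
        elif section.startswith("deploy:"):
            alias = section.split(":", 1)[1]
        else:
            continue  # e.g. [settings]; not a deploy target
        targets[alias] = dict(parser[section])
    return targets


if __name__ == "__main__":
    for alias, options in deploy_targets(SAMPLE_CFG).items():
        print(alias, options["url"], options.get("project"))
```

Lines starting with # are skipped by configparser, so the commented-out username and password options simply do not appear in the result.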
2. Deploy with scrapyd-deploy
# Deploy the Scrapy project
scrapyd-deploy test -p douban -v v1
where
test is the deploy alias
douban is the project name
v1 is the version number
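If the deployment is driven from a script, the same command line can be assembled in Python. This is only a sketch: deploy_command is a hypothetical helper, and it uses nothing beyond the alias, -p, and -v arguments shown above.

```python
import shlex


def deploy_command(target, project, version=None):
    """Build the scrapyd-deploy argument list for a deploy alias and project."""
    cmd = ["scrapyd-deploy", target, "-p", project]
    if version is not None:
        # Without -v, scrapyd-deploy chooses a version on its own.
        cmd += ["-v", version]
    return cmd


if __name__ == "__main__":
    cmd = deploy_command("test", "douban", "v1")
    print(shlex.join(cmd))  # scrapyd-deploy test -p douban -v v1
    # To actually deploy, run it from the project root, for example:
    # subprocess.run(cmd, check=True, cwd="scrapy/douban")
```

Passing the arguments as a list avoids shell quoting issues if a project or version name ever contains unusual characters.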
3. Managing Spiders
# List all projects
scrapyd-client -t http://10.11.2.102:6800 projects
or
curl http://10.11.2.102:6800/listprojects.json
{"status": "ok", "projects": ["default", "douban"], "node_name": "yanggd-qitianm4650-d089"}
# List the spiders in a project
curl http://10.11.2.102:6800/listspiders.json?project=douban
{"status": "ok", "spiders": ["douban_login", "fanghua", "langyabang", "movieTop250", "movieTop250_crawlspider", "movieTop250_login_crawlspider", "tongcheng_pipeline"], "node_name": "yanggd-qitianm4650-d089"}
# List versions
curl http://10.11.2.102:6800/listversions.json?project=douban
{"status": "ok", "versions": ["1516115564", "1516199516", "1516265513", "v1"], "node_name": "yanggd-qitianm4650-d089"}
# Delete a version
curl http://10.11.2.102:6800/delversion.json -d "project=douban&version=1516115564"
{"status": "ok", "node_name": "yanggd-qitianm4650-d089"}
# Schedule a spider run
curl http://10.11.2.102:6800/schedule.json -d "project=douban&spider=tongcheng_pipeline&jobid=tongcheng_pipeline"
{"status": "ok", "jobid": "tongcheng_pipeline", "node_name": "yanggd-qitianm4650-d089"}
# Check the status of spider jobs
curl http://10.11.2.102:6800/listjobs.json?project=douban | python -m json.tool
{"status": "ok", "running": [{"start_time": "2018-01-22 19:45:14.376731", "pid": 28067, "id": "tongcheng_pipeline", "spider": "tongcheng_pipeline"}], "finished": [], "pending": [], "node_name": "yanggd-qitianm4650-d089"}
# Stop a spider
curl http://10.11.2.102:6800/cancel.json -d "project=douban&job=tongcheng_pipeline"
{"status": "ok", "prevstate": null, "node_name": "yanggd-qitianm4650-d089"}
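The curl calls above can also be made from Python. The following is a minimal sketch using only the standard library; build_request, call, and job_state are hypothetical helper names, and the split into GET and POST endpoints simply mirrors how the curl commands above are written (plain URL versus -d form data).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoints that the curl examples call with -d (form data, i.e. POST).
POST_ENDPOINTS = {"schedule", "cancel", "delversion"}


def build_request(base_url, endpoint, **params):
    """Build (but do not send) a request for a Scrapyd JSON endpoint."""
    url = f"{base_url.rstrip('/')}/{endpoint}.json"
    query = urlencode(params)
    if endpoint in POST_ENDPOINTS:
        # Form-encoded body, like curl -d "project=...&spider=...".
        return Request(url, data=query.encode("utf-8"), method="POST")
    if query:
        url = f"{url}?{query}"
    return Request(url, method="GET")


def call(base_url, endpoint, **params):
    """Send the request and decode the JSON reply (needs a reachable Scrapyd)."""
    with urlopen(build_request(base_url, endpoint, **params), timeout=10) as resp:
        return json.load(resp)


def job_state(listjobs_reply, job_id):
    """Return 'pending', 'running' or 'finished' for job_id, or None if absent."""
    for state in ("pending", "running", "finished"):
        if any(job.get("id") == job_id for job in listjobs_reply.get(state, [])):
            return state
    return None


if __name__ == "__main__":
    base = "http://10.11.2.102:6800"
    call(base, "schedule", project="douban",
         spider="tongcheng_pipeline", jobid="tongcheng_pipeline")
    print(job_state(call(base, "listjobs", project="douban"), "tongcheng_pipeline"))
```

Keeping build_request separate from call makes it possible to inspect the URL and body without a running Scrapyd instance.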