Managing Scrapy spiders with Scrapyd and scrapyd-client

Source: Internet
Author: User
Tags: curl, json, versions
Introduction

Scrapyd is a daemon that runs Scrapy crawlers as a service. It exposes an HTTP/JSON API for deploying, deleting, starting, and stopping crawlers. Scrapyd can manage multiple projects, and each project can have multiple versions, but only the latest version is used to run spiders.
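
The HTTP/JSON command mode can be illustrated with a minimal Python sketch. It only builds requests and sends nothing. The helper name build_request is hypothetical, the server address is the one used later in this article, and the split between GET for listing endpoints and form-encoded POST for state-changing endpoints follows the curl examples in section 3.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the Scrapyd address used in this article's examples.
BASE_URL = "http://10.11.2.102:6800"


def build_request(endpoint, params=None, post=False, base_url=BASE_URL):
    """Build (but do not send) a request for a Scrapyd JSON endpoint."""
    url = f"{base_url.rstrip('/')}/{endpoint}"
    encoded = urlencode(params or {})
    if post:
        # State-changing endpoints (schedule, cancel, delversion) take form data.
        return Request(url, data=encoded.encode("ascii"), method="POST")
    if encoded:
        url = f"{url}?{encoded}"
    return Request(url, method="GET")


list_req = build_request("listspiders.json", {"project": "douban"})
run_req = build_request(
    "schedule.json",
    {"project": "douban", "spider": "tongcheng_pipeline"},
    post=True,
)
print(list_req.get_method(), list_req.full_url)
print(run_req.get_method(), run_req.full_url, run_req.data)
```

Passing either object to urllib.request.urlopen would send it to a running Scrapyd instance.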

scrapyd-client is a tool dedicated to deploying Scrapy crawlers. It also offers some management functions, but they are less complete than Scrapyd's own API, so it is recommended only for deployment.

Attention:
Newer versions of scrapyd-client provide both the scrapyd-deploy and scrapyd-client commands; older versions may provide only scrapyd-deploy.

Installation

source activate scrapy
pip install scrapyd
pip install scrapyd-client
Deployment

1. Modify scrapy.cfg
Edit scrapy.cfg in the project root directory.

cd scrapy/douban
vim scrapy.cfg
# test is the deploy alias
[deploy:test]
url = http://10.11.2.102:6800/
# project name
project = douban
# username and password for accessing the web service
#username =
#password =


# start scrapyd
scrapyd
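
To make the structure of the deploy section concrete, here is a minimal sketch that reads deploy targets from scrapy.cfg text with Python's standard configparser module. The function name read_deploy_targets and the embedded sample text are illustrative assumptions; the alias test and the project douban follow the deploy command used in the next step.

```python
import configparser

# Sample scrapy.cfg content, assumed to mirror the settings described above.
SAMPLE_CFG = """
[deploy:test]
url = http://10.11.2.102:6800/
project = douban
"""


def read_deploy_targets(cfg_text):
    """Return {alias: settings} for every [deploy:<alias>] section."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    targets = {}
    for section in parser.sections():
        if section.startswith("deploy:"):
            # Everything after "deploy:" is the alias passed to scrapyd-deploy.
            alias = section.split(":", 1)[1]
            targets[alias] = dict(parser[section])
    return targets


targets = read_deploy_targets(SAMPLE_CFG)
print(targets)
```

A check like this can confirm that the alias given on the command line actually exists in the file before deploying.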

Once running, Scrapyd starts a web interface (at http://10.11.2.102:6800/ in this example) for monitoring spiders. Scrapyd as described here has no built-in user authentication, so authentication can be added by placing a reverse proxy such as Nginx in front of it, or by other means.
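
If a reverse proxy that enforces HTTP Basic authentication is placed in front of Scrapyd, clients have to send credentials with each request. The sketch below shows one way to do that from Python. It assumes the proxy uses Basic authentication, the helper name with_basic_auth is hypothetical, and the credentials are placeholders. The request is only built, not sent.

```python
import base64
from urllib.request import Request


def with_basic_auth(url, username, password):
    """Build (but do not send) a GET request carrying a Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return Request(url, headers={"Authorization": f"Basic {token}"})


# Placeholder credentials, for illustration only.
req = with_basic_auth("http://10.11.2.102:6800/listprojects.json", "user", "secret")
print(req.get_header("Authorization"))
```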

2. Deploy with scrapyd-deploy

# deploy the Scrapy project
scrapyd-deploy test -p douban -v v1

where:
test is the deploy alias
douban is the project name
v1 is the version number
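
When deployment is scripted, the same command can be assembled from its three parts. This is a minimal sketch; the function name build_deploy_command is hypothetical, and it only builds the argument list without running anything.

```python
import shlex


def build_deploy_command(target, project, version=None):
    """Return the scrapyd-deploy argument list for an alias, project, and optional version."""
    cmd = ["scrapyd-deploy", target, "-p", project]
    if version is not None:
        cmd += ["-v", version]
    return cmd


cmd = build_deploy_command("test", "douban", "v1")
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run(cmd, check=True) from the project root directory, assuming scrapyd-deploy is installed and on the PATH.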

3. Managing Spiders

# list all projects
scrapyd-client -t http://10.11.2.102:6800 projects
or
curl http://10.11.2.102:6800/listprojects.json
{"status": "ok", "projects": ["default", "douban"], "node_name": "yanggd-qitianm4650-d089"}

# list spiders
curl http://10.11.2.102:6800/listspiders.json?project=douban
{"status": "ok", "spiders": ["douban_login", "fanghua", "langyabang", "movieTop250", "movietop250_crawlspider", "movietop250_login_crawlspider", "tongcheng_pipeline"], "node_name": "yanggd-qitianm4650-d089"}

# list versions
curl http://10.11.2.102:6800/listversions.json?project=douban
{"status": "ok", "versions": ["1516115564", "1516199516", "1516265513", "v1"], "node_name": "yanggd-qitianm4650-d089"}

# delete a version
curl http://10.11.2.102:6800/delversion.json -d "project=douban&version=1516115564"
{"status": "ok", "node_name": "yanggd-qitianm4650-d089"}

# schedule a spider run
curl http://10.11.2.102:6800/schedule.json -d "project=douban&spider=tongcheng_pipeline&jobid=tongcheng_pipeline"
{"status": "ok", "jobid": "tongcheng_pipeline", "node_name": "yanggd-qitianm4650-d089"}

# check the status of spider jobs
curl http://10.11.2.102:6800/listjobs.json?project=douban | python -m json.tool
{"status": "ok", "running": [{"start_time": "2018-01-22 19:45:14.376731", "pid": 28067, "id": "tongcheng_pipeline", "spider": "tongcheng_pipeline"}], "finished": [], "pending": [], "node_name": "yanggd-qitianm4650-d089"}

# stop a spider
curl http://10.11.2.102:6800/cancel.json -d "project=douban&job=tongcheng_pipeline"
{"status": "ok", "prevstate": null, "node_name": "yanggd-qitianm4650-d089"}
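
The JSON returned by listjobs.json is easy to process in a script. The following sketch groups job ids by state. The function name summarize_jobs is hypothetical, and the sample text is a cleaned-up copy of the response shown in this section, not live output.

```python
import json

# Sample listjobs.json response, following the output shown in this article.
SAMPLE = """
{"status": "ok",
 "running": [{"start_time": "2018-01-22 19:45:14.376731", "pid": 28067,
              "id": "tongcheng_pipeline", "spider": "tongcheng_pipeline"}],
 "finished": [], "pending": [],
 "node_name": "yanggd-qitianm4650-d089"}
"""


def summarize_jobs(response_text):
    """Map each job state to the list of job ids currently in that state."""
    payload = json.loads(response_text)
    if payload.get("status") != "ok":
        raise RuntimeError(f"Scrapyd reported an error: {payload}")
    return {
        state: [job["id"] for job in payload.get(state, [])]
        for state in ("pending", "running", "finished")
    }


summary = summarize_jobs(SAMPLE)
print(summary)
```

A monitoring script could call this on each poll and, for example, stop waiting once the job id it scheduled appears under finished.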
