I found something fun to play with: Scrapinghub. I tried out its Scrapy Cloud service because it is free. Its biggest advantage is that you can monitor the crawler through a visual dashboard. Here is a short record of how to use it.
Register an account & create a new Scrapy Cloud project
Register an account on the Scrapinghub website.
After logging in, create a project. Under the new project, open Code & Deploys to find the API key and the project ID.
Deploy your project
$ pip install shub
Log in and enter your API key:
$ shub login
Enter your API key from https://dash.scrapinghub.com/account/apikey
API key: Xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Validating API key...
API key is OK, you are logged in now.
Deploy, passing the project ID:
$ shub deploy ProjectID
Packing version ed6b3b8-master
Deploying to Scrapy Cloud project "76180"
{"status": "ok", "project": 76180, "version": "ed6b3b8-master", "spiders": 1}
Run your spiders at: https://dash.scrapinghub.com/p/76180/
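The JSON line in the deploy output is the server's response to the upload. If you script deployments, a minimal sketch like the one below could check it. It assumes the response carries the fields seen in the sample output (`status`, `project`, `version`, `spiders`); the function name `check_deploy_response` is hypothetical, not part of shub.

```python
import json


def check_deploy_response(line: str) -> dict:
    """Parse the JSON line printed by `shub deploy`; raise if status is not ok.

    Assumes the field names from the sample output:
    status, project, version, spiders.
    """
    resp = json.loads(line)
    if resp.get("status") != "ok":
        raise RuntimeError(f"deploy failed: {resp}")
    # Keep only the fields that describe what was deployed.
    return {
        "project": resp["project"],
        "version": resp["version"],
        "spiders": resp["spiders"],
    }


sample = '{"status": "ok", "project": 76180, "version": "ed6b3b8-master", "spiders": 1}'
print(check_deploy_response(sample))
# {'project': 76180, 'version': 'ed6b3b8-master', 'spiders': 1}
```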
Schedule your spider
In your project panel, click Run Spider to start the crawler, or start it from the command line:
$ shub schedule zhidao
Spider zhidao scheduled, job ID: 76153/2/2
Watch the log on the command line:
shub log -f 2/2
or print items as they are being scraped:
shub items -f 2/2
or watch it running in Scrapinghub's web interface:
https://dash.scrapinghub.com/p/76153/job/2/3
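The web-interface link in that output follows a fixed layout, /p/<project>/job/<spider>/<job>, built from the full job key. A small sketch that rebuilds it from a key such as 76153/2/2 is below; the helper name `job_url` is hypothetical, and since the dashboard host has changed over the years, the base URL is a parameter rather than something to rely on.

```python
def job_url(job_key: str, base: str = "https://dash.scrapinghub.com") -> str:
    """Build the dashboard URL for a full job key "project/spider/job".

    The path layout is copied from the shub output; `base` is an
    assumption, because the dashboard host may have moved.
    """
    project, spider, job = job_key.split("/")  # ValueError if not 3 parts
    return f"{base}/p/{project}/job/{spider}/{job}"


print(job_url("76153/2/2"))
# https://dash.scrapinghub.com/p/76153/job/2/2
```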
See the latest log and items
Job ID format: 2/2, 2/1, ...
$ shub log jobid
$ shub items jobid
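A short job ID such as 2/2 is a spider/job pair inside the default project, while the full key printed after scheduling is project/spider/job (76153/2/2). A minimal sketch of that expansion follows, assuming the default project number is known; `normalize_job_id` and its `default_project` parameter are hypothetical names for illustration, not shub internals.

```python
def normalize_job_id(job_id: str, default_project: int) -> str:
    """Expand a short job ID "spider/job" into "project/spider/job".

    Full IDs (three numeric parts) are returned unchanged.
    `default_project` is a hypothetical stand-in for the project shub uses.
    """
    parts = job_id.strip().split("/")
    if not all(p.isdigit() for p in parts):
        raise ValueError(f"not a job ID: {job_id!r}")
    if len(parts) == 2:
        # Short form: prepend the default project.
        parts.insert(0, str(default_project))
    elif len(parts) != 3:
        raise ValueError(f"not a job ID: {job_id!r}")
    return "/".join(parts)


print(normalize_job_id("2/2", 76153))        # 76153/2/2
print(normalize_job_id("76153/2/1", 76153))  # 76153/2/1
```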
Or view the results in the dashboard
The dashboard also lets you monitor crawler jobs in real time: the number of requests sent, the number of items scraped, logs and error messages, run time, and so on are all visible at a glance.
Save Items
Distributed crawler
Scrapy Cloud also offers distributed crawling as an option, which is, of course, a paid feature.
Crawlera
Crawlera provides an anti-ban mechanism: it manages IPs, user agents, cookies, and other settings so that the crawler does not get banned. See the billing page for pricing.
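In a Scrapy project, Crawlera was typically enabled through the scrapy-crawlera downloader middleware. Below is a minimal settings.py fragment, assuming that plugin is installed; the API key is a placeholder. Crawlera has since been renamed (Zyte Smart Proxy Manager), so the current plugin and setting names may differ from these.

```python
# settings.py (fragment): route a Scrapy project's requests through Crawlera.
# Assumes the scrapy-crawlera plugin is installed (pip install scrapy-crawlera).
# The key below is a placeholder, not a real key.

DOWNLOADER_MIDDLEWARES = {
    # Middleware that sends each request via the Crawlera proxy.
    "scrapy_crawlera.CrawleraMiddleware": 610,
}

CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = "<your Crawlera API key>"
```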
Complete code
Reference links:
http://doc.scrapinghub.com/scrapy-cloud.html#deploying-a-scrapy-spider
Original address: http://www.shuang0420.com/2016/06/15/%E7%88%AC%E8%99%AB%E6%80%BB%E7%BB%93-%E4%B8%89-scrapinghub/