Crawler Summary (iii) -- Cloud Scrapy


I found something fun: Scrapinghub. I decided to try out cloud Scrapy a bit, since it's free. Its biggest advantage is that you can monitor your crawler visually. Here is a quick record of how to use it.
Register an account & create a new Scrapy Cloud project

Register an account on the Scrapinghub website.
After logging in, create a project; then, under the new project, open Code & Deploys and note the API key and project ID.
Deploy your project

$ pip install shub

Log in and enter your API key

$ shub login
Enter your API key from https://dash.scrapinghub.com/account/apikey
API key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Validating API key...
API key is OK, you are logged in now.

Deploy and enter the Project ID

$ shub deploy projectid
Packing version ed6b3b8-master
Deploying to Scrapy Cloud project "76180"
{"status": "ok", "project": 76180, "version": "ed6b3b8-master", "spiders": 1}
Run your spiders at: https://dash.scrapinghub.com/p/76180/
Schedule your spider

Select "Run Spider" in your project's panel to launch the crawler, or start it from the command line.

$ shub schedule zhidao
Spider zhidao scheduled, job ID: 76153/2/2
Watch the log on the command line:
    shub log -f 2/2
or print items as they are being scraped:
    shub items -f 2/2
or watch it running in Scrapinghub's web interface:
    https://dash.scrapinghub.com/p/76153/job/2/3

See the latest log and items
Job ID format: 2/2, 2/1, ...

$ shub log jobid
$ shub items jobid
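`shub items` emits results in JSON Lines format: one JSON object per line. As a small self-contained sketch, this is how such a dump (saved to a file) could be loaded back in Python; the `title` field is just an assumed example:

```python
import json

def load_items(path):
    """Read a JSON Lines items dump (one JSON object per line) into a list of dicts."""
    items = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                items.append(json.loads(line))
    return items

# Write a tiny example dump, then load it back:
with open("items.jl", "w", encoding="utf-8") as f:
    f.write('{"title": "example question"}\n{"title": "another one"}\n')

print(len(load_items("items.jl")))  # 2
```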

Or view the results in the Dashboard

Through the Dashboard you can also monitor a crawler job in real time: the number of requests issued, the number of items crawled, logs and errors, execution time, and so on, all at a glance.

Save Items

Distributed crawler

Cloud Scrapy also offers a distributed crawling option, which is, of course, a paid feature.

Crawlera

Crawlera provides a ban-prevention mechanism: by managing IPs, user agents, cookies, and other settings, it keeps the crawler from being banned; see billing for pricing.
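For Scrapy projects, Crawlera is typically enabled through the scrapy-crawlera middleware package. A minimal settings.py fragment might look like the following; the API key is a placeholder, and this is a sketch of the package's documented setup rather than anything from the original post:

```python
# settings.py fragment (sketch): route requests through Crawlera
# via the scrapy-crawlera downloader middleware.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_crawlera.CrawleraMiddleware": 610,  # documented default priority
}
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = "<your-crawlera-apikey>"  # placeholder, not a real key
```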

Complete code

Reference Links:
http://doc.scrapinghub.com/scrapy-cloud.html#deploying-a-scrapy-spider

Original address: http://www.shuang0420.com/2016/06/15/%E7%88%AC%E8%99%AB%E6%80%BB%E7%BB%93-%E4%B8%89-scrapinghub/
