Scrapyd is a tool for deploying and running Scrapy projects. With it, you can upload a finished Scrapy project to a cloud host and control its execution through an HTTP API.
Since Scrapyd deployments almost always target Linux hosts, the installation steps in this section are written for Linux.
1. Related links
- GitHub: https://github.com/scrapy/scrapyd
- PyPI: https://pypi.python.org/pypi/scrapyd
- Official documentation: https://scrapyd.readthedocs.io
2. pip installation
Installation via pip is recommended here; the command is as follows:
pip3 install scrapyd
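To confirm that the package is importable, a quick check like the one below can be run. This is a minimal sketch; it assumes the install above succeeded and that the scrapyd package exposes a __version__ attribute (recent releases do):

# Confirm that Scrapyd installed correctly by importing it and printing its version
import scrapyd
print(scrapyd.__version__)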
3. Configuration
After installation, you need to create a configuration file at /etc/scrapyd/scrapyd.conf; Scrapyd reads this file when it runs.
Since Scrapyd 1.2, this file is no longer created automatically, so we have to add it ourselves.
First, create the directory and the file by executing the following commands:
sudo mkdir /etc/scrapyd
sudo vi /etc/scrapyd/scrapyd.conf
Then write the following content:
[scrapyd]
eggs_dir          = eggs
logs_dir          = logs
items_dir         =
jobs_to_keep      = 5
dbs_dir           = dbs
max_proc          = 0
max_proc_per_cpu  = 10
finished_to_keep  = 100
poll_interval     = 5.0
bind_address      = 0.0.0.0
http_port         = 6800
debug             = off
runner            = scrapyd.runner
application       = scrapyd.app.application
launcher          = scrapyd.launcher.Launcher
webroot           = scrapyd.website.Root

[services]
schedule.json     = scrapyd.webservice.Schedule
cancel.json       = scrapyd.webservice.Cancel
addversion.json   = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json  = scrapyd.webservice.ListSpiders
delproject.json   = scrapyd.webservice.DeleteProject
delversion.json   = scrapyd.webservice.DeleteVersion
listjobs.json     = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
The contents of the configuration file are explained in the official documentation at https://scrapyd.readthedocs.io/en/stable/config.html#example-configuration-file. The version here modifies two settings. One is max_proc_per_cpu: the official default is 4, meaning each CPU on a host runs at most four Scrapy tasks, and it is increased here to 10. The other is bind_address: the default is the local address 127.0.0.1, and it is changed to 0.0.0.0 so that the service can be accessed from the external network.
4. Running in the background
Scrapyd is a pure Python project and can be run by calling the program directly. To keep it running in the background, on Linux and macOS you can use the following command:
(scrapyd > /dev/null &)
Scrapyd will then keep running in the background, with the console output discarded. If you want to log the output instead, you can change the redirection target, for example:
(scrapyd > ~/scrapyd.log &)
Of course, you can also use screen, tmux, Supervisor, or other tools to daemonize the process.
Once it is running, you can visit the web UI on port 6800 in a browser to see the currently running Scrapyd tasks, their logs, and so on.
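Besides the web UI, the service can be checked programmatically. For instance, the daemonstatus.json endpoint registered in the configuration file above reports the current task counts. Below is a minimal sketch, assuming the requests library is installed and Scrapyd is listening locally on the default port 6800:

# Query Scrapyd's daemonstatus.json endpoint to confirm the service is up
import requests

resp = requests.get('http://127.0.0.1:6800/daemonstatus.json')
# Typical response: {'status': 'ok', 'pending': 0, 'running': 0, 'finished': 0, ...}
print(resp.json())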
Of course, a better way to run Scrapyd is with the Supervisor daemon; if you are interested, see http://supervisord.org/.
In addition, Scrapyd also supports Docker; later we will show how to build and run a Scrapyd Docker image.
5. Access authentication
With the configuration above, Scrapyd and its interfaces are open to the public. If you want to add access authentication, you can set up a reverse proxy with Nginx, which requires installing the Nginx server first.
Ubuntu is used as an example here, with the following installation command:
sudo apt-get install nginx
Then modify the Nginx configuration file nginx.conf, adding the following configuration:
http {
    server {
        listen 6801;
        location / {
            proxy_pass http://127.0.0.1:6800/;
            auth_basic "Restricted";
            auth_basic_user_file /etc/nginx/conf.d/.htpasswd;
        }
    }
}
The username and password file used here is placed in the /etc/nginx/conf.d directory, and we need to create it with the htpasswd command. For example, to create a file with the username admin, run the following command:
htpasswd -c .htpasswd admin
We will then be prompted to enter the password twice, after which the password file is generated. Its contents now look like this:
cat .htpasswd
admin:5ZBXQR0RCQWBC
After the configuration is complete, restart the Nginx service with the following command:
sudo nginx -s reload
Access authentication for Scrapyd is now successfully configured.
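From now on, requests to the proxied port must carry the credentials, otherwise Nginx rejects them with a 401 response. Below is a minimal sketch of an authenticated API call using the requests library; 'admin' matches the htpasswd user created above, and 'your-password' is a placeholder for whatever was entered at the prompt:

# Call the Nginx-proxied Scrapyd API on port 6801 with HTTP Basic authentication
import requests

resp = requests.get('http://127.0.0.1:6801/daemonstatus.json',
                    auth=('admin', 'your-password'))  # replace with the real password
print(resp.status_code, resp.json())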