Sesame HTTP: Installation of Scrapyd

Source: Internet
Author: User
Tags: nginx, server

Scrapyd is a tool for deploying and running Scrapy projects. With it, you can upload your Scrapy projects to a cloud host and control their operation through its API.
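As a quick preview of what that API control looks like, here is a minimal sketch using Python's requests library. It assumes Scrapyd is already installed and running on the default port 6800 (as set up later in this section), and the project name "myproject" and spider name "myspider" are placeholders for whatever you have deployed.

# A minimal sketch of controlling Scrapyd through its JSON API.
# Assumes Scrapyd is listening on the default port 6800 and that a project
# named "myproject" with a spider "myspider" is already deployed
# (both names are placeholders).
import requests

SCRAPYD_URL = "http://localhost:6800"

# Schedule a spider run; Scrapyd returns a job id on success.
resp = requests.post(
    f"{SCRAPYD_URL}/schedule.json",
    data={"project": "myproject", "spider": "myspider"},
)
print(resp.json())  # e.g. {"status": "ok", "jobid": "..."}

# List the projects known to this Scrapyd instance.
print(requests.get(f"{SCRAPYD_URL}/listprojects.json").json())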

Since Scrapyd is used to deploy Scrapy projects and is generally run on a Linux host, the installation in this section targets Linux hosts.

1. Related links
    • GitHub: https://github.com/scrapy/scrapyd
    • PyPI: https://pypi.python.org/pypi/scrapyd
    • Official documentation: https://scrapyd.readthedocs.io
2. pip installation

Installation via pip is recommended here; the command is as follows:

pip3 install scrapyd
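As a quick sanity check (a sketch, not part of the official steps), you can confirm the installed version from Python 3.8+ using the standard library:

# Quick check that Scrapyd was installed by pip3 (a sketch, not an official step).
from importlib.metadata import version

print(version("scrapyd"))  # prints the installed Scrapyd version, e.g. "1.2.x"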
3. Configuration

After installation, you need to create a new configuration file, /etc/scrapyd/scrapyd.conf; Scrapyd will read this file when it runs.

Since Scrapyd 1.2, this file is no longer created automatically, so we need to add it ourselves.

First, create the file by executing the following commands:

sudo mkdir /etc/scrapyd
sudo vi /etc/scrapyd/scrapyd.conf

Then write the following content:

[scrapyd]
eggs_dir          = eggs
logs_dir          = logs
items_dir         =
jobs_to_keep      = 5
dbs_dir           = dbs
max_proc          = 0
max_proc_per_cpu  = 10
finished_to_keep  = 100
poll_interval     = 5.0
bind_address      = 0.0.0.0
http_port         = 6800
debug             = off
runner            = scrapyd.runner
application       = scrapyd.app.application
launcher          = scrapyd.launcher.Launcher
webroot           = scrapyd.website.Root

[services]
schedule.json     = scrapyd.webservice.Schedule
cancel.json       = scrapyd.webservice.Cancel
addversion.json   = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json  = scrapyd.webservice.ListSpiders
delproject.json   = scrapyd.webservice.DeleteProject
delversion.json   = scrapyd.webservice.DeleteVersion
listjobs.json     = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus

The contents of the configuration file are explained in the official documentation at https://scrapyd.readthedocs.io/en/stable/config.html#example-configuration-file. The configuration here has been modified in two places. One is max_proc_per_cpu, whose official default is 4, meaning each CPU of a host runs at most 4 Scrapy tasks; it is increased to 10 here. The other is bind_address, whose default is the local address 127.0.0.1; it is changed to 0.0.0.0 so that the service can be accessed from the external network.
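If you want to double-check the values Scrapyd will pick up, the file is standard INI format and can be read with Python's configparser. This is only a sketch and assumes the file lives at /etc/scrapyd/scrapyd.conf as created above.

# A sketch that reads back the scrapyd.conf created above and prints the
# two values we changed from the official defaults.
from configparser import ConfigParser

config = ConfigParser()
config.read("/etc/scrapyd/scrapyd.conf")

print(config.get("scrapyd", "max_proc_per_cpu"))  # expected: 10
print(config.get("scrapyd", "bind_address"))      # expected: 0.0.0.0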

4. Running in the background

Scrapyd is a pure Python project, so it can be run by invoking it directly. To keep the program running in the background, on Linux and Mac you can use the following command:

(scrapyd > /dev/null &)

Scrapyd will then keep running in the background, and the console output is simply discarded. Of course, if you want to keep a log of the output, you can change the output target, for example:

(scrapyd > ~/scrapyd.log &)

Of course, you can also use tools such as screen, tmux, or Supervisor to daemonize the process.

Once it is running, you can visit the web UI on port 6800 in your browser to see the current Scrapyd tasks, logs, and so on.
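Besides the web UI, the same information can be pulled from the JSON endpoints listed in the configuration above. The snippet below is a sketch that assumes Scrapyd is reachable at port 6800 on the local host; "myproject" is again a placeholder project name.

# A sketch that queries the running Scrapyd instance instead of the web UI.
# Assumes Scrapyd is listening on the default port 6800 of the local host.
import requests

SCRAPYD_URL = "http://localhost:6800"

# Overall daemon status: counts of pending, running, and finished jobs.
print(requests.get(f"{SCRAPYD_URL}/daemonstatus.json").json())

# Per-project job listing; "myproject" is a placeholder project name.
print(requests.get(f"{SCRAPYD_URL}/listjobs.json",
                   params={"project": "myproject"}).json())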

Of course, a better way to run Scrapyd is as a Supervisor-managed daemon; if you are interested, you can refer to http://supervisord.org/.

In addition, Scrapyd supports Docker; later we will show how to build and run a Scrapyd Docker image.

5. Access authentication

After the configuration above is complete, Scrapyd and its interfaces are publicly accessible. If you want to add access authentication, you can use Nginx as a reverse proxy, which requires installing the Nginx server first.

Ubuntu is taken as an example here; the installation command is as follows:

sudo apt-get install nginx

Then modify the Nginx configuration file nginx.conf and add the following configuration:

http {
    server {
        listen 6801;
        location / {
            proxy_pass http://127.0.0.1:6800/;
            auth_basic "Restricted";
            auth_basic_user_file /etc/nginx/conf.d/.htpasswd;
        }
    }
}

The username and password configuration used here is placed in the /etc/nginx/conf.d directory, and we need to create it with the htpasswd command. For example, to create a file for the user admin, the command is as follows:

htpasswd -c .htpasswd admin

We will then be prompted to enter the password twice, after which the password file is generated. Now look at the contents of this file:

cat .htpasswd
admin:5ZBXQR0RCQWBC

After the configuration is complete, restart the Nginx service by running the following command:

sudo nginx -s reload

With this, Scrapyd access authentication has been successfully configured.
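To confirm that the reverse proxy actually enforces the credentials, you can request the proxied port with and without HTTP basic auth. The sketch below assumes Nginx listens on port 6801 as configured above; replace "admin" and "password" with the credentials you created with htpasswd.

# A sketch that verifies Nginx basic auth in front of Scrapyd.
# Assumes the reverse proxy listens on port 6801 as configured above;
# "admin" / "password" are placeholders for the htpasswd credentials.
import requests

PROXY_URL = "http://127.0.0.1:6801"

# Without credentials Nginx should reject the request.
print(requests.get(f"{PROXY_URL}/daemonstatus.json").status_code)  # expected: 401

# With valid credentials the request is forwarded to Scrapyd.
resp = requests.get(f"{PROXY_URL}/daemonstatus.json", auth=("admin", "password"))
print(resp.status_code, resp.json())  # expected: 200 and the daemon status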
