Sesame HTTP: Installation of Scrapy-Splash

Source: Internet
Author: User

Scrapy-Splash is a JavaScript rendering tool for Scrapy. This section describes how to install it.

Scrapy-Splash is installed in two parts. The first is the Splash service, which is installed through Docker. Once it is installed, a Splash service is started, and we can use its interface to load JavaScript-rendered pages. The second is the Scrapy-Splash Python library. Once it is installed, the Splash service can be used from within Scrapy.

1. Related Links
  • GitHub: https://github.com/scrapy-plugins/scrapy-splash
  • PyPI: https://pypi.python.org/pypi/scrapy-splash
  • Instruction: https://github.com/scrapy-plugins/scrapy-splash#configuration
  • Splash official documentation: http://splash.readthedocs.io
2. Install Splash

Scrapy-Splash uses the HTTP API of Splash for page rendering, so we need to install Splash to provide the rendering service. Here we install it with Docker. Before that, make sure that Docker itself is correctly installed.

The installation command is as follows:

docker run -p 8050:8050 scrapinghub/splash

After the installation is complete, output similar to the following is displayed:

2017-07-03 08:53:28+0000 [-] Log opened.
2017-07-03 08:53:28.447291 [-] Splash version: 3.0
2017-07-03 08:53:28.452698 [-] Qt 5.9.1, PyQt 5.9, WebKit 602.1, sip 4.19.3, Twisted 16.1.1, Lua 5.2
2017-07-03 08:53:28.453120 [-] Python 3.5.2 (default, Nov 17 2016, 17:05:23) [GCC 5.4.0 20160609]
2017-07-03 08:53:28.453676 [-] Open files limit: 1048576
2017-07-03 08:53:28.454258 [-] Can't bump open files limit
2017-07-03 08:53:28.571306 [-] Xvfb is started: ['Xvfb', ':1599197258', '-screen', '0', '1024x768x24', '-nolisten', 'tcp']
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
2017-07-03 08:53:29.041973 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles
2017-07-03 08:53:29.315445 [-] verbosity=1
2017-07-03 08:53:29.315629 [-] slots=50
2017-07-03 08:53:29.315712 [-] argument_cache_max_entries=500
2017-07-03 08:53:29.316564 [-] Web UI: enabled, Lua: enabled (sandbox: enabled)
2017-07-03 08:53:29.317614 [-] Site starting on 8050
2017-07-03 08:53:29.317801 [-] Starting factory <twisted.web.server.Site object at 0x7ffaa4a98cf8>

This shows that Splash is running on port 8050. Open http://localhost:8050 to view the Splash homepage, as shown in Figure 1.

Figure 1: Splash running page
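
Besides opening the homepage in a browser, the service can be exercised through its HTTP API. The following is a minimal sketch, assuming Splash is listening on localhost:8050 as started by the command above. It uses Splash's render.html endpoint with its url and wait arguments; the helper names render_url and fetch_rendered_html are made up for this illustration and are not part of Splash or Scrapy-Splash.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Splash was started locally with the docker command above.
SPLASH_URL = "http://localhost:8050"


def render_url(target, wait=0.5, splash_url=SPLASH_URL):
    """Build a request URL for Splash's render.html endpoint.

    `target` is the page to render; `wait` is how many seconds Splash
    should wait after the page loads before returning the HTML.
    """
    query = urlencode({"url": target, "wait": wait})
    return f"{splash_url}/render.html?{query}"


def fetch_rendered_html(target, wait=0.5, timeout=30):
    """Ask Splash to load `target` and return the rendered HTML as text."""
    with urlopen(render_url(target, wait), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling fetch_rendered_html("https://example.com") while the container is running should return the page's HTML after its JavaScript has executed. If the call fails with a connection error, the Splash service is most likely not reachable on that port.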

Of course, Splash can also be installed directly on a remote server, where it should run in daemon mode. The command is as follows:

docker run -d -p 8050:8050 scrapinghub/splash

The extra -d parameter here makes the Docker container run in daemon (detached) mode. In this way, the Splash service is not terminated when the connection to the remote server is interrupted.
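
In daemon mode the startup log is not printed to the terminal, so it can help to check from a script that the port is accepting connections. The sketch below uses only the Python standard library; the function name wait_for_port and its default timeout values are assumptions chosen for this example, and the host should be replaced with the address of your own server.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds.

    Retries every `interval` seconds and gives up after `timeout` seconds,
    returning False if the port never accepted a connection.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For example, wait_for_port("localhost", 8050) can be called right after the docker run -d command. A successful TCP connection only shows that something is listening on the port; it does not confirm that the listener is Splash.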

3. Scrapy-Splash Installation

After Splash is installed successfully, install the Scrapy-Splash Python library. The command is as follows:

pip3 install scrapy-splash

After the command finishes, the library is installed.
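
One way to confirm the installation is to query the installed package metadata from Python. This is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata) and the same interpreter that pip3 installed into; installed_version is a helper name invented for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# "scrapy-splash" is the distribution name used in the pip3 command above.
version = installed_version("scrapy-splash")
if version is None:
    print("scrapy-splash is not installed for this interpreter")
else:
    print("scrapy-splash", version, "is installed")
```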
