First Experience Using Splash with a Python Crawler

Source: Internet
Author: User
Tags: docker run

What is Splash:

Splash is a JavaScript rendering service: a lightweight browser with an HTTP API. It is implemented in Python, using both Twisted and QT. The Twisted QT reactor makes the service asynchronous, so it can take advantage of WebKit concurrency.
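Because rendering is exposed over HTTP, a crawler can request a rendered page with an ordinary GET. The following is a minimal sketch, not part of the original article. It assumes a Splash instance listening at http://localhost:8050 and uses Splash's render.html endpoint with its url and wait parameters. The function names and the example.com target are illustrative only.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_render_url(target_url, splash_base="http://localhost:8050", wait=0.5):
    """Build a render.html request URL for a Splash instance.

    splash_base is an assumption: adjust it to wherever Splash is running.
    """
    # urlencode percent-escapes the target URL so it survives as a query value.
    query = urlencode({"url": target_url, "wait": wait})
    return f"{splash_base.rstrip('/')}/render.html?{query}"


def fetch_rendered_html(target_url, splash_base="http://localhost:8050",
                        wait=0.5, timeout=30):
    """Return the HTML of target_url after Splash has executed its JavaScript."""
    request_url = build_render_url(target_url, splash_base, wait)
    with urlopen(request_url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Only builds the URL, so this runs even without a Splash instance.
    print(build_render_url("https://example.com"))
```

Calling fetch_rendered_html("https://example.com") needs a reachable Splash service. The wait value gives the page's scripts some time to run before the HTML is returned.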

Why use Splash:

Splash makes a web crawler more effective. Many current web pages build their content through JavaScript interaction, so simply downloading the page source cannot handle JavaScript-generated pages or AJAX-loaded content. One alternative is to analyze the requests the page makes and reproduce those data requests locally, but this is relatively complex; for pages that use a specific timestamp algorithm the analysis is harder still, and the approach is inefficient. Another alternative is to drive a real browser to simulate page actions, but a browser used this way does not meet the needs of asynchronous, large-scale crawling. Splash addresses these problems: it is a page rendering server that returns the rendered page, which makes crawling easy and the setup easy to scale.

Installation conditions:

Docker for Windows requires 64-bit Windows 10 Pro or above, or the Education edition.

Installation:

First, download Docker for Windows from the Docker website using the link below, then install it.

Official website Download: https://store.docker.com/editions/community/docker-ce-desktop-windows

  

After the installation package has finished downloading, run it as administrator.

  

View information:

# docker info

# docker version

To view the started container

  

Download the Splash image with Docker

# docker pull scrapinghub/splash

Start the Splash service

# Start the Splash service and provide it over HTTP, HTTPS, and telnet
# Usually only HTTP is used, so publishing port 8050 alone is enough
# Splash will run at 0.0.0.0 on ports 8050 (HTTP), 8051 (HTTPS) and 5023 (telnet)
docker run -p 5023:5023 -p 8050:8050 -p 8051:8051 scrapinghub/splash
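Once the container is running, a crawler may want to confirm that the service answers before sending render requests. The following is a minimal sketch, not part of the original article. It assumes the HTTP port 8050 is published on localhost and relies on Splash's /_ping endpoint, which replies with a JSON object whose status field is "ok". The function name splash_is_up is hypothetical.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def splash_is_up(base_url="http://localhost:8050", timeout=3):
    """Return True if a Splash instance at base_url answers its ping endpoint."""
    try:
        with urlopen(f"{base_url.rstrip('/')}/_ping", timeout=timeout) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
    except (URLError, OSError, ValueError):
        # Connection refused, timeout, malformed URL, or a non-JSON reply.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


if __name__ == "__main__":
    print("Splash reachable:", splash_is_up())
```

If this returns False right after docker run, the container may still be starting, or port 8050 may not have been published.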

  

  

Reference Link: https://www.jianshu.com/p/4052926bc12c
