What is Splash:
Splash is a JavaScript rendering service: a lightweight browser with an HTTP API. It is implemented in Python using Twisted and Qt. The Twisted/Qt event loop makes the service fully asynchronous, so it can render many pages concurrently with WebKit.
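Because Splash exposes rendering through an HTTP API, any HTTP client can drive it. The following is a minimal sketch, assuming a Splash instance on `localhost` port 8050. It builds a GET request for Splash's `render.html` endpoint, which returns the page's HTML after its JavaScript has run. The helper names `render_html_url` and `fetch_rendered` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a Splash instance is listening locally on its default HTTP port.
SPLASH_URL = "http://localhost:8050"


def render_html_url(target: str, wait: float = 0.5, splash: str = SPLASH_URL) -> str:
    """Build a GET URL for Splash's render.html endpoint.

    Splash loads `target`, waits `wait` seconds so scripts can run,
    and responds with the resulting HTML.
    """
    query = urlencode({"url": target, "wait": wait})
    return f"{splash}/render.html?{query}"


def fetch_rendered(target: str, wait: float = 0.5) -> str:
    """Fetch the rendered HTML (needs a running Splash instance)."""
    with urlopen(render_html_url(target, wait), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

With the container running, `fetch_rendered("https://example.com")` should return the HTML as it looks after rendering, not the raw server response.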
Why use Splash:
Many web pages now rely on JavaScript for interaction, so simply downloading the page is not enough to crawl content that is generated by JavaScript or loaded via AJAX. Analyzing the underlying requests and reproducing them locally is the alternative, but it is relatively complex. It becomes especially difficult and inefficient when a page signs its requests with a specific timestamp algorithm. Driving a real browser to simulate page actions also works, but such browsers are not built for asynchronous, large-scale crawling. This is where Splash is useful: it is a page rendering server that returns the fully rendered page, which makes pages easy to crawl and the crawler easy to scale.
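When a page needs real interaction, such as waiting for AJAX content to arrive, Splash can also run a Lua script through its `execute` endpoint. The sketch below assumes a local instance on port 8050. It POSTs a small script that opens the page, waits, and returns the rendered HTML. Splash exposes extra JSON fields such as `url` and `wait` to the script as `args`. The script, the default wait time, and the helper names are illustrative assumptions, not the author's code.

```python
import json
from urllib.request import Request, urlopen

# Illustrative Lua script: open the page, give JavaScript/AJAX time to run,
# then return the rendered HTML.
LUA_SOURCE = """
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(args.wait))
  return splash:html()
end
"""


def build_execute_request(target, wait=2.0, splash="http://localhost:8050"):
    """Build a JSON POST request for Splash's /execute endpoint."""
    payload = {"lua_source": LUA_SOURCE, "url": target, "wait": wait}
    return Request(
        f"{splash}/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def fetch_via_lua(target, wait=2.0):
    """Send the request and return the HTML (needs a running Splash instance)."""
    with urlopen(build_execute_request(target, wait), timeout=60) as resp:
        return resp.read().decode("utf-8")
```

Putting the page logic in a script keeps each crawl step (navigate, wait, extract) on the Splash side, so the crawler only deals with finished HTML.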
Prerequisites:
Splash is distributed as a Docker image, so Docker must be installed first. On Windows, Docker Desktop requires Windows 10 64-bit Pro (or a higher edition) or the Education edition.
Installation:
First, download Docker for Windows from the Docker website using the link below and install it.
Official download page: https://store.docker.com/editions/community/docker-ce-desktop-windows
Once the installer has downloaded, run it as administrator.
Check that Docker is installed:
#docker info
#docker version
To view running containers:
#docker ps
Download (pull) the Splash image:
#docker pull scrapinghub/splash
Start the Splash service:
Splash serves over HTTP, HTTPS and telnet, listening on 0.0.0.0 at ports 8050 (HTTP), 8051 (HTTPS) and 5023 (telnet):
#docker run -p 5023:5023 -p 8050:8050 -p 8051:8051 scrapinghub/splash
HTTP is usually all you need, so publishing only port 8050 is fine:
#docker run -p 8050:8050 scrapinghub/splash
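After `docker run`, it helps to confirm that the container actually answers on port 8050 before starting a crawl. This minimal sketch assumes a local instance and relies on Splash's `_ping` endpoint, which we assume replies with JSON whose `status` field is `"ok"`. `splash_is_up` is a hypothetical helper name.

```python
import json
from urllib.request import urlopen


def splash_is_up(host="localhost", port=8050, timeout=2.0):
    """Return True if a Splash instance answers on its _ping endpoint."""
    try:
        with urlopen(f"http://{host}:{port}/_ping", timeout=timeout) as resp:
            data = json.loads(resp.read())
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error (URLError is an OSError),
        # or a reply that is not valid JSON: treat the service as not up.
        return False
    return isinstance(data, dict) and data.get("status") == "ok"
```

A crawler could call `splash_is_up()` once at startup and exit with a clear message if the container is not running.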
Reference Link: https://www.jianshu.com/p/4052926bc12c
Python crawler: a first experience using Splash