Environment configuration:
http://splash.readthedocs.io/en/stable/install.html
pip install scrapy-splash
docker pull scrapinghub/splash
docker run -p 8050:8050 scrapinghub/splash
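Once the container is up, Splash listens on port 8050 and exposes an HTTP rendering API. As a quick sanity check, you can build a `render.html` request URL from Python (a minimal sketch, assuming the default local endpoint; `render_url` is a helper name introduced here, not part of any library):

```python
from urllib.parse import urlencode

SPLASH_URL = "http://localhost:8050"  # port published by `docker run -p 8050:8050`

def render_url(target, wait=0.5):
    # Build the Splash render.html endpoint URL for a target page;
    # fetching it returns the page HTML after JavaScript has run.
    query = urlencode({"url": target, "wait": wait})
    return f"{SPLASH_URL}/render.html?{query}"

print(render_url("http://example.com"))
```

Opening the printed URL in a browser (or fetching it with any HTTP client) should return the rendered HTML, which confirms the container is reachable before wiring Splash into Scrapy.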
----
settings.py:

    SPLASH_URL = 'http://localhost:8050'

    DOWNLOADER_MIDDLEWARES = {
        'scrapy_splash.SplashCookiesMiddleware': 723,
        'scrapy_splash.SplashMiddleware': 725,
        'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    }

    SPIDER_MIDDLEWARES = {
        'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
    }

    DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

    HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'

Spider example:

    import scrapy
    from scrapy_splash import SplashRequest

    class MySpider(scrapy.Spider):
        start_urls = ["http://example.com", "http://example.com/foo"]

        def start_requests(self):
            for url in self.start_urls:
                yield SplashRequest(url, self.parse, args={'wait': 0.5})

        def parse(self, response):
            # response.body is the result of a render.html call; it
            # contains HTML processed by a browser.
            # ...
            pass
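Under the hood, each SplashRequest is turned into an HTTP request to the Splash server that carries the target URL plus the `args` dict. The stdlib-only sketch below is illustrative of that idea, not scrapy-splash's actual internals; `splash_payload` is a hypothetical helper, and the field names follow Splash's render API (`url`, `wait`, `lua_source`):

```python
import json

def splash_payload(url, wait=0.5, lua_source=None):
    # Assemble a JSON body similar to what scrapy-splash sends to Splash;
    # 'wait' pauses rendering so the page's JavaScript can finish.
    payload = {"url": url, "wait": wait}
    if lua_source is not None:
        # With Splash's 'execute' endpoint, a Lua script drives the browser.
        payload["lua_source"] = lua_source
    return json.dumps(payload)

print(splash_payload("http://example.com"))
```

This is why `args={'wait': 0.5}` in the spider matters: without the wait, Splash may return the HTML before the page's JavaScript has populated the DOM.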
Reference links: https://germey.gitbooks.io/python3webspider/content/7.2-Splash%E7%9A%84%E4%BD%BF%E7%94%A8.html
http://blog.csdn.net/qq_23849183/article/details/51287935
http://ae.yyuap.com/pages/viewpage.action?pageId=919763
Scrapy framework combined with Splash for parsing JS: environment configuration