This article shows how to abort downloads of overly large pages when scraping with Python's Scrapy. It is shared for your reference; the details are as follows.
Add the following line to settings.py, replacing myproject with your own project name:

DOWNLOADER_HTTPCLIENTFACTORY = 'myproject.downloader.LimitSizeHTTPClientFactory'
Then define the custom module that limits the size of downloaded pages (e.g. in myproject/downloader.py). It subclasses Scrapy's page getter, inspects the Content-Length response header, and drops the connection if the declared size exceeds the limit:

from scrapy.core.downloader.webclient import ScrapyHTTPClientFactory, ScrapyHTTPPageGetter

MAX_RESPONSE_SIZE = 1048576  # 1 MB

class LimitSizePageGetter(ScrapyHTTPPageGetter):

    def handleHeader(self, key, value):
        ScrapyHTTPPageGetter.handleHeader(self, key, value)
        if key.lower() == 'content-length' and int(value) > MAX_RESPONSE_SIZE:
            # Abort the download as soon as the server declares an oversized body
            self.connectionLost('oversized')

class LimitSizeHTTPClientFactory(ScrapyHTTPClientFactory):
    protocol = LimitSizePageGetter
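To see the core check in isolation (outside Scrapy and Twisted), the Content-Length test boils down to a simple comparison. The helper below, `is_oversized`, is a hypothetical illustration written for this article, not part of Scrapy's API:

```python
MAX_RESPONSE_SIZE = 1048576  # 1 MB, matching the limit used above

def is_oversized(headers):
    """Return True if a Content-Length header declares a body over the limit."""
    for key, value in headers.items():
        # Header names are case-insensitive, so normalize before comparing
        if key.lower() == 'content-length' and int(value) > MAX_RESPONSE_SIZE:
            return True
    return False

print(is_oversized({'Content-Length': '2097152'}))  # 2 MB body -> True
print(is_oversized({'Content-Length': '512'}))      # small body -> False
```

Note that this only guards against servers that declare their body size; responses without a Content-Length header (e.g. chunked transfers) would pass through unchecked.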
I hope this article helps you with your Python programming.