When crawling, you sometimes need to download file data in addition to the pages themselves, and pushing those downloads onto multiple threads makes the program run faster. Scrapy's extension mechanism lets you plug such a multi-threaded downloader in as an extension module; the pyScrapyDownUtils toolkit linked at the end of this post provides one.
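Before wiring the toolkit into Scrapy, it helps to see the core idea on its own. The sketch below shows multi-threaded downloading of (url, filepath) pairs with only the standard library; `download_all` and its injectable `fetch` callable are hypothetical names for illustration, not part of the toolkit.

```python
from concurrent.futures import ThreadPoolExecutor

def download_all(pairs, fetch, max_workers=4):
    """Download every (url, filepath) pair concurrently.

    `fetch` is any callable taking (url, filepath) and saving the file;
    it is injected so the sketch stays testable without network access.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch, url, path) for url, path in pairs]
        # .result() re-raises any exception raised in a worker thread
        return [f.result() for f in futures]

# Usage with a stub fetcher; a real one could call urllib.request.urlretrieve.
results = download_all(
    [("http://example.com/a.bin", "/tmp/a.bin"),
     ("http://example.com/b.bin", "/tmp/b.bin")],
    fetch=lambda url, path: path,
)
```

Because `submit` returns futures in order, results come back in the same order as the input pairs even though the downloads run concurrently.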
Add `custom_settings` to your spider:

```python
class MyTestSpider(scrapy.Spider):
    name = "mytest"
    custom_settings = {
        # register the toolkit's extension
        "EXTENSIONS": {
            "mymidtest.mydownutils.extension.SpiderOpenCloseLogging": 500,
        },
        # enable the extension
        "MYEXT_ENABLED": True,
    }
```
Here `mymidtest.mydownutils.extension.SpiderOpenCloseLogging` is the import path of the extension class in the `mydownutils` package under the project.
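The post does not show the extension class itself, but its name matches the `SpiderOpenCloseLogging` example in Scrapy's own extensions documentation, which hooks the spider's open and close signals. The sketch below follows that documented pattern; it is an assumption about how the toolkit's class is structured, not its actual source.

```python
class SpiderOpenCloseLogging:
    """Lifecycle extension modeled on the example in Scrapy's docs."""

    @classmethod
    def from_crawler(cls, crawler):
        # Imported lazily only to keep this sketch self-contained.
        from scrapy import signals
        from scrapy.exceptions import NotConfigured

        # Honor the MYEXT_ENABLED switch from custom_settings.
        if not crawler.settings.getbool("MYEXT_ENABLED"):
            raise NotConfigured
        ext = cls()
        crawler.signals.connect(ext.spider_opened, signal=signals.spider_opened)
        crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
        return ext

    def spider_opened(self, spider):
        # A download extension would start its worker threads here.
        spider.logger.info("opened spider %s", spider.name)

    def spider_closed(self, spider):
        # ...and wait for pending downloads to finish here.
        spider.logger.info("closed spider %s", spider.name)
```

The `500` in the `EXTENSIONS` setting is the extension's order value, and `raise NotConfigured` is Scrapy's documented way for an extension to disable itself when its switch is off.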
In the spider's `__init__` function, add:

```python
def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    # ... other initialization ...
    self.myredis = OperateRedis(self.name)
    self.redis = self.myredis.get_instent()
```

(`OperateRedis` and `get_instent` are the toolkit's own names; the assignment targets are reconstructed from the `add_url_filepath` call used below.)
Then, wherever you want to queue a file for download, call:

```python
self.myredis.add_url_filepath(self.redis, url, filepath_all)
```

where `url` is the URL to download and `filepath_all` is the full local path where the file will be stored.
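In practice `filepath_all` is often derived from the URL itself. The helper below is one hypothetical way to do that with the standard library; `build_filepath` is not part of the toolkit.

```python
import os
from urllib.parse import urlparse

def build_filepath(url, base_dir="downloads"):
    # Use the last path segment of the URL as the local file name,
    # falling back to a default when the URL ends with "/".
    name = os.path.basename(urlparse(url).path) or "index.html"
    return os.path.join(base_dir, name)

filepath_all = build_filepath("https://example.com/files/report.pdf")
```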
Once the toolkit is configured this way, the queued files are downloaded.
Toolkit address (GitHub): https://github.com/sea1234/pyScrapyDownUtils