No. 347, Building a search engine with a Python distributed crawler, Scrapy explained: randomly replacing the User-Agent (browser user agent) via a downloader middleware
Downloader middleware introduction
The downloader middleware is a framework of hooks into Scrapy's request/response processing. It is a light, low-level system for changing Scrapy requests and responses. In other words, the middleware sits between the request and the response and can modify both of them globally.
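As a minimal sketch of that idea, a downloader middleware can be written as a plain class with process_request and process_response hooks. The class name TagRequestMiddleware, the X-Crawler header, and the FakeRequest/FakeResponse stand-ins are hypothetical and only exist so the example runs without Scrapy installed; in a real project Scrapy passes its own Request and Response objects.

```python
class FakeRequest:
    """Hypothetical stand-in for a Scrapy request: only url and headers."""

    def __init__(self, url, headers=None):
        self.url = url
        self.headers = dict(headers or {})


class FakeResponse:
    """Hypothetical stand-in for a Scrapy response: only url and status."""

    def __init__(self, url, status=200):
        self.url = url
        self.status = status


class TagRequestMiddleware:
    """Illustrative downloader middleware (the name is an assumption)."""

    def process_request(self, request, spider):
        # Called for each outgoing request; modify it in place.
        request.headers["X-Crawler"] = "demo"
        # Returning None lets processing of this request continue.
        return None

    def process_response(self, request, response, spider):
        # Called for each response on its way back to the spider.
        # Returning the response lets processing continue.
        return response


mw = TagRequestMiddleware()
req = FakeRequest("http://example.com")
mw.process_request(req, spider=None)
resp = mw.process_response(req, FakeResponse(req.url), spider=None)
print(req.headers, resp.status)
```

Because every request and response passes through these two hooks, a single class like this is enough to change headers for the whole crawl.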
The UserAgentMiddleware class in useragent.py, under downloadermiddlewares in the Scrapy source code
From the source we can see that the default User-Agent sent with requests is Scrapy, which a site can easily recognize and use to block the crawler.
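The behavior described above can be paraphrased in a few lines. This is a simplified sketch, not the verbatim Scrapy source, and details may differ between versions; SimplifiedUserAgentMiddleware and FakeRequest are hypothetical names used so the example runs on its own.

```python
class FakeRequest:
    """Hypothetical stand-in for a Scrapy request: only a headers dict."""

    def __init__(self, headers=None):
        self.headers = dict(headers or {})


class SimplifiedUserAgentMiddleware:
    """Simplified paraphrase of the built-in middleware's behavior."""

    def __init__(self, user_agent="Scrapy"):
        self.user_agent = user_agent

    def process_request(self, request, spider):
        # Only fill in the header when the request does not already have one.
        if self.user_agent:
            request.headers.setdefault("User-Agent", self.user_agent)


default_mw = SimplifiedUserAgentMiddleware()

plain = FakeRequest()
default_mw.process_request(plain, spider=None)
print(plain.headers["User-Agent"])

custom = FakeRequest({"User-Agent": "Mozilla/5.0"})
default_mw.process_request(custom, spider=None)
print(custom.headers["User-Agent"])
```

The setdefault call is the important part: a request that already carries a User-Agent keeps it, while a request without one is labeled as Scrapy.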
Randomly replacing the User-Agent (browser user agent) in the request headers with a downloader middleware
Step one: in the settings.py configuration file, enable the middleware registration DOWNLOADER_MIDDLEWARES = {}
Set the default UserAgentMiddleware to None to disable it, or give it a larger order value than our custom middleware, so that our custom middleware sets the User-Agent before the default one runs.
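The ordering rule can be sketched in plain Python: entries set to None are dropped, and the remaining middlewares handle each request in ascending order of their number. This is only an illustration of the rule, not Scrapy's own code. The value 500 for the built-in middleware is what recent Scrapy versions use and should be treated as an assumption, and adc.middlewares.RandomUserAgentMiddleware is a hypothetical class name.

```python
# Assumed excerpt of Scrapy's built-in defaults (one entry only).
BASE = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 500,
}

# Option 1: disable the built-in middleware entirely.
DISABLE_DEFAULT = {
    "adc.middlewares.RandomUserAgentMiddleware": 543,  # hypothetical name
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}

# Option 2: keep the built-in middleware but push it behind ours.
DEFAULT_LAST = {
    "adc.middlewares.RandomUserAgentMiddleware": 543,  # hypothetical name
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 900,
}


def effective_order(base, project):
    """Return enabled middleware paths in the order they see a request."""
    merged = {**base, **project}  # project settings override the defaults
    enabled = {path: order for path, order in merged.items() if order is not None}
    return sorted(enabled, key=enabled.get)


print(effective_order(BASE, DISABLE_DEFAULT))
print(effective_order(BASE, DEFAULT_LAST))
```

With either option the custom middleware is the first to touch the User-Agent header.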
settings.py configuration file
# Enable or disable downloader middlewares
# See http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html
DOWNLOADER_MIDDLEWARES = {  # enable the middleware registration
    # 'adc.middlewares.MyCustomDownloaderMiddleware': 543,
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,  # set the default UserAgentMiddleware to None
}
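With the default middleware disabled, the custom middleware that picks a random User-Agent might look like the sketch below. The class name RandomUserAgentMiddleware, the USER_AGENTS pool, and the FakeRequest stand-in are all hypothetical; the User-Agent strings are illustrative examples and should be replaced with a list of your own.

```python
import random

# Illustrative pool of browser User-Agent strings (an assumption).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 "
    "(KHTML, like Gecko) Version/10.1.1 Safari/603.2.4",
]


class RandomUserAgentMiddleware:
    """Sketch of a middleware that sets a random User-Agent per request."""

    def __init__(self, user_agents=USER_AGENTS):
        self.user_agents = list(user_agents)

    def process_request(self, request, spider):
        # Overwrite the header with a randomly chosen browser string.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        # Returning None lets processing of this request continue.
        return None


class FakeRequest:
    """Hypothetical stand-in for a Scrapy request: only a headers dict."""

    def __init__(self):
        self.headers = {}


random_mw = RandomUserAgentMiddleware()
demo_request = FakeRequest()
random_mw.process_request(demo_request, spider=None)
print(demo_request.headers["User-Agent"])
```

To use a class like this in the project, it would be placed in the project's middlewares.py and registered in DOWNLOADER_MIDDLEWARES under its import path, for example 'adc.middlewares.RandomUserAgentMiddleware': 543.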