No. 347, Python distributed crawler to build a search engine with Scrapy: randomly replacing the browser User-Agent via downloader middleware

Source: Internet
Author: User

Introduction to downloader middleware
The downloader middleware is a framework of hooks into Scrapy's request/response processing. It is a light, low-level system for globally altering Scrapy's requests and responses. Because it sits between the outgoing request and the incoming response, a middleware can modify every request before it is downloaded and every response before it reaches the spider.
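To make the two hooks concrete, here is a minimal sketch of a pass-through downloader middleware. A Scrapy downloader middleware is a plain class: Scrapy calls process_request(request, spider) before a request is downloaded and process_response(request, response, spider) after it comes back. The class name PassThroughMiddleware and the Accept-Language header are hypothetical choices for illustration, and StubRequest is a stand-in for scrapy.Request so the demo runs without Scrapy installed.

```python
class PassThroughMiddleware:
    """Hypothetical downloader middleware showing the two main hooks."""

    def process_request(self, request, spider):
        # Runs before the request is downloaded; may modify it in place.
        request.headers['Accept-Language'] = 'en'
        # Returning None lets Scrapy continue handling the request normally.
        return None

    def process_response(self, request, response, spider):
        # Runs after the download; must return a Response or a Request.
        # Here the response is passed back unchanged.
        return response


class StubRequest:
    """Stand-in for scrapy.Request, used only so this demo runs without Scrapy."""

    def __init__(self, url):
        self.url = url
        self.headers = {}


mw = PassThroughMiddleware()
req = StubRequest('https://example.com')
result = mw.process_request(req, spider=None)
print(result, req.headers)  # None {'Accept-Language': 'en'}
```

In a real project, request.headers is Scrapy's own case-insensitive Headers object rather than a plain dict, but item assignment works the same way.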

In the Scrapy source code, the default User-Agent is handled by the UserAgentMiddleware class in scrapy/downloadermiddlewares/useragent.py.

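From the source we can see that, by default, requests are sent with a User-Agent that identifies them as Scrapy (the USER_AGENT setting defaults to a string of the form Scrapy/VERSION (+https://scrapy.org)). A site can therefore easily recognize the crawler and block it.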
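As a rough illustration, the sketch below re-creates in simplified form what the built-in middleware does: it only fills in the User-Agent header when the request does not already carry one (a setdefault). This is not Scrapy's actual source; SimplifiedUserAgentMiddleware and StubRequest are hypothetical names, and the stub stands in for scrapy.Request.

```python
class SimplifiedUserAgentMiddleware:
    """Simplified re-creation of the default User-Agent behaviour (not Scrapy's code)."""

    def __init__(self, user_agent='Scrapy'):
        # In Scrapy this value comes from the USER_AGENT setting.
        self.user_agent = user_agent

    def process_request(self, request, spider):
        if self.user_agent:
            # setdefault: only sets the header if the request has none yet.
            request.headers.setdefault('User-Agent', self.user_agent)


class StubRequest:
    """Stand-in for scrapy.Request, used only for this demo."""

    def __init__(self, headers=None):
        self.headers = dict(headers or {})


mw = SimplifiedUserAgentMiddleware()
plain = StubRequest()
custom = StubRequest({'User-Agent': 'Mozilla/5.0'})
mw.process_request(plain, spider=None)
mw.process_request(custom, spider=None)
print(plain.headers['User-Agent'])   # Scrapy
print(custom.headers['User-Agent'])  # Mozilla/5.0
```

Any request that reaches this middleware without a User-Agent ends up advertising itself as Scrapy, which is exactly what makes the crawler easy to spot.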

Randomly replacing the User-Agent request header with a downloader middleware

Step 1: in the settings.py configuration file, enable middleware registration by uncommenting the DOWNLOADER_MIDDLEWARES = {} dictionary.

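Set the built-in UserAgentMiddleware to None, which disables it, or give it a larger order number than our custom middleware. process_request() is called in increasing order number, so our middleware then runs first; and because the built-in middleware only sets the User-Agent when none is present, the value set by our middleware is kept.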
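The ordering argument can be checked with a small simulation. This is a toy model, not Scrapy's engine: each "request" is just a headers dict, both middlewares use setdefault, and run_chain calls process_request in increasing order number while skipping entries set to None. The value 500 is the built-in UserAgentMiddleware's order in Scrapy's DOWNLOADER_MIDDLEWARES_BASE and 543 is the number from the project template; 900 is an arbitrary "larger" value, and all class and function names are hypothetical.

```python
class DefaultUserAgent:
    """Simplified stand-in for the built-in UserAgentMiddleware."""

    def process_request(self, headers, spider):
        headers.setdefault('User-Agent', 'Scrapy')


class CustomUserAgent:
    """Hypothetical custom middleware that also uses setdefault."""

    def process_request(self, headers, spider):
        headers.setdefault('User-Agent', 'Mozilla/5.0 (custom)')


def run_chain(config):
    """Call process_request in increasing order number, skipping None entries."""
    headers = {}
    enabled = [(order, mw) for mw, order in config.items() if order is not None]
    for _, mw in sorted(enabled, key=lambda pair: pair[0]):
        mw.process_request(headers, spider=None)
    return headers['User-Agent']


default, custom = DefaultUserAgent(), CustomUserAgent()
print(run_chain({default: 500, custom: 543}))   # Scrapy
print(run_chain({default: None, custom: 543}))  # Mozilla/5.0 (custom)
print(run_chain({default: 900, custom: 543}))   # Mozilla/5.0 (custom)
```

If the custom middleware assigns the header directly instead of using setdefault, it wins in either position; the ordering only matters when it, like the built-in one, leaves an existing value alone.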

settings.py configuration file

# Enable or disable downloader middlewares
# See http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html
DOWNLOADER_MIDDLEWARES = {  # Open (register) the middleware
    # 'adc.middlewares.MyCustomDownloaderMiddleware': 543,
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,  # Set the default UserAgentMiddleware to None
}
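The custom middleware that actually rotates the header could look like the sketch below. It assumes the project module adc.middlewares from the settings above; the class name RandomUserAgentMiddleware, the USER_AGENT_LIST setting, and the sample browser strings are all hypothetical and should be replaced with your own. crawler.settings.getlist() is Scrapy's settings accessor and returns an empty list when the setting is absent, so the built-in pool is used as a fallback. StubRequest exists only so the demo runs without Scrapy.

```python
import random


class RandomUserAgentMiddleware:
    """Hypothetical downloader middleware: a random User-Agent per request.

    Register it in settings.py, e.g.
        'adc.middlewares.RandomUserAgentMiddleware': 543,
    """

    # Assumed sample pool; substitute your own, up-to-date browser strings.
    DEFAULT_USER_AGENTS = [
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/120.0 Safari/537.36',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 '
        '(KHTML, like Gecko) Version/17.0 Safari/605.1.15',
        'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0',
    ]

    def __init__(self, user_agents=None):
        self.user_agents = list(user_agents or self.DEFAULT_USER_AGENTS)

    @classmethod
    def from_crawler(cls, crawler):
        # USER_AGENT_LIST is a hypothetical setting; empty list -> default pool.
        return cls(crawler.settings.getlist('USER_AGENT_LIST'))

    def process_request(self, request, spider):
        # Assign (rather than setdefault) so this value replaces any earlier one.
        request.headers['User-Agent'] = random.choice(self.user_agents)
        return None  # continue normal processing of the request


class StubRequest:
    """Stand-in for scrapy.Request, used only for this demo."""

    def __init__(self):
        self.headers = {}


mw = RandomUserAgentMiddleware()
req = StubRequest()
mw.process_request(req, spider=None)
print(req.headers['User-Agent'] in mw.user_agents)  # True
```

Because the header is assigned rather than defaulted, this sketch overrides the built-in value whatever its order number; disabling the built-in middleware simply avoids doing the work twice.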
