This article shows how to randomly assign a user-agent to each request when collecting data with Scrapy. It is shared for your reference; the details are as follows:
Rotating the user-agent on every request helps prevent a site from blocking the Scrapy spider based on its User-Agent header.
First, add the following to the settings.py file to replace the default user-agent handling middleware.
Copy the code as follows:
DOWNLOADER_MIDDLEWARES = {
    'scraper.random_user_agent.RandomUserAgentMiddleware': 400,
    'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
}
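The middleware below imports user_agent_list from the project's settings, but the article does not show that list. A minimal sketch of what it could look like in settings.py, assuming placeholder user-agent strings (use real, current ones in practice):

```python
# settings.py -- a sample pool of user-agent strings for the middleware
# to choose from. These example strings are illustrative, not exhaustive.
user_agent_list = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/603.3.8 '
    '(KHTML, like Gecko) Version/10.1.2 Safari/603.3.8',
    'Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0',
]
```

The longer and more varied this list, the harder it is for a site to fingerprint the spider by its User-Agent header.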
Next, define the custom user-agent processing middleware in random_user_agent.py inside the project.
Copy the code as follows:
from scraper.settings import user_agent_list

import random
from scrapy import log

class RandomUserAgentMiddleware(object):
    def process_request(self, request, spider):
        # Pick a random user-agent from the pool for this request.
        ua = random.choice(user_agent_list)
        if ua:
            request.headers.setdefault('User-Agent', ua)
            # log.msg('>>>> UA %s' % request.headers)
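The core logic of process_request can be exercised outside Scrapy. The sketch below, which assumes a hypothetical FakeRequest stand-in for scrapy.Request and a sample user_agent_list, shows that each request ends up with one of the pooled user-agents:

```python
import random

# Hypothetical stand-in for the project's settings list.
user_agent_list = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Mozilla/5.0 (X11; Linux x86_64)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)',
]

class FakeRequest:
    """Minimal stand-in for scrapy.Request: just enough to hold headers."""
    def __init__(self):
        self.headers = {}

def set_random_user_agent(request):
    # Same logic as the middleware's process_request.
    ua = random.choice(user_agent_list)
    if ua:
        request.headers.setdefault('User-Agent', ua)

requests = [FakeRequest() for _ in range(5)]
for r in requests:
    set_random_user_agent(r)

assigned = [r.headers['User-Agent'] for r in requests]
print(assigned)
```

Because setdefault is used, a request that already carries a User-Agent header keeps it; only requests without one get a randomly chosen value.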
I hope this article will help you with your Python programming.