When writing a web crawler, some sites have anti-crawling measures, so you may run into the error shown above (HTTP Error 403: Forbidden).
The error can occur in two places:
1. When making the request with requests
requests.get(url) returns a 403.
Workaround:
headers = {
    'User-Agent': '...',   # copy the value from the browser's request headers
    'Cookie': '...'
}
requests.get(url, headers=headers)
The result should now be 200 and the request works normally. The purpose of adding headers is to imitate a real browser, so the server believes a human is operating it.
The User-Agent and Cookie values can be viewed in the browser's developer tools under the page's network requests; different pages may require different cookies.
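As a fuller illustration, here is a minimal, self-contained sketch of the same idea. The URL and header values below are placeholders, not values from the original post; replace them with the real ones copied from your browser.

import requests

url = 'https://example.com/page'  # placeholder target page

# Header values copied from the browser's developer tools (placeholders here)
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Cookie': 'sessionid=...; other=...'
}

response = requests.get(url, headers=headers)
print(response.status_code)  # should now be 200 instead of 403
print(response.text[:200])   # first part of the page, as a quick sanity check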
2. When downloading something with urlretrieve
Workaround:
import urllib.request

opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36')]
urllib.request.install_opener(opener)

url = ""    # the file's URL
local = ""  # the local path to save it to
urllib.request.urlretrieve(url, local)
The principle is not entirely clear to me; I found this solution on StackOverflow, and it works correctly.
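If you would rather not install a global opener, an equivalent sketch (assuming only the User-Agent check is the problem, which the original post does not confirm) is to pass a Request object with the header set directly to urlopen and write the response to a file yourself; the url and local values are placeholders:

import urllib.request
import shutil

url = ''    # the file's URL (placeholder)
local = ''  # where to save it (placeholder)

# Attach the User-Agent to this one request instead of installing a global opener
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36'})
with urllib.request.urlopen(req) as resp, open(local, 'wb') as f:
    shutil.copyfileobj(resp, f)  # stream the download to disk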