Reposted from http://www.2cto.com/kf/201309/242273.html; thanks to the original author.
This exception occurs because opening a URL directly with urllib2.urlopen sends the server only a bare request for the page.
The server receives no information about the browser, operating system, hardware platform, or other client details that normally accompany a request, and requests lacking such information are often treated as abnormal access, such as crawlers.
To block this kind of access, some websites check the User-Agent in the request (it identifies the hardware platform, operating system, application software, and user preferences).
If the User-Agent is missing or does not look like a normal browser, the request is rejected.
A viable solution is to include User-Agent information in the request.
Here is a working example:
url = 'TestURL'  # replace TestURL with the real URL
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0'}
req = urllib2.Request(url, headers=headers)
htmlcode = urllib2.urlopen(req).read()
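In Python 3, urllib2 was merged into urllib.request, but the same fix applies. The sketch below (using the placeholder URL http://example.com/, an assumption, not a URL from the original post) builds a Request with a User-Agent header and verifies the header is attached before sending:

```python
import urllib.request

# Placeholder URL; replace with the page you actually need to fetch.
url = 'http://example.com/'

# Same browser-style User-Agent as in the urllib2 example above.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) '
                  'Gecko/20100101 Firefox/36.0'
}

# Attach the headers when constructing the Request object.
req = urllib.request.Request(url, headers=headers)

# urllib.request stores header names capitalized, hence 'User-agent'.
print(req.get_header('User-agent'))

# Uncomment to actually perform the request:
# htmlcode = urllib.request.urlopen(req).read()
```

Constructing the Request first (rather than passing the URL straight to urlopen) is what allows the extra headers to be sent with the request.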
[Python] urllib2.HTTPError: HTTP Error 403: Forbidden