"HTTP Error 403: Forbidden" in Python 3.x

Source: Internet
Author: User

Problem: `urllib.request.urlopen()` is often used to fetch a web page so that its source code can be analyzed. For some websites, however, the call fails with an "HTTP Error 403: Forbidden" exception. For example, executing the following statement:

```python
urllib.request.urlopen("http://blog.csdn.net/eric_sunah/article/details/11099295")
```

raises this exception:

```
  File "D:\Python32\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "D:\Python32\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "D:\Python32\lib\urllib\request.py", line 513, in error
    return self._call_chain(*args)
  File "D:\Python32\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "D:\Python32\lib\urllib\request.py", line 595, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
```

Analysis: The exception occurs because a request sent with `urllib.request.urlopen` does not look like one sent by a web browser. The server cannot tell which browser, operating system or hardware platform sent it, and requests without this information usually come from automated clients such as crawlers. To block such access, some websites check the `User-Agent` request header, which identifies the client software (typically the browser name and version, the operating system and the platform). By default, urllib sends a `User-Agent` of the form `Python-urllib/3.x`, which marks the request as coming from a script. If the `User-Agent` is missing or looks unusual, such a site rejects the request, as in the error above. The fix is therefore to add a browser-like `User-Agent` to the request, which is straightforward in Python 3.x.
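To confirm what urllib sends when you do not set a `User-Agent` yourself, you can inspect a default opener. This is a small check added for illustration: `build_opener()` returns the same kind of `OpenerDirector` that `urlopen()` builds internally when no custom opener is installed, and its `addheaders` list holds the headers attached to every request. The version number in the output depends on your Python installation.

```python
import urllib.request

# A default opener, equivalent to the one urlopen() creates when no custom
# opener has been installed with urllib.request.install_opener().
opener = urllib.request.build_opener()

# addheaders lists the headers this opener adds to every request it sends.
default_headers = dict(opener.addheaders)
print(default_headers)  # e.g. {'User-agent': 'Python-urllib/3.11'}
```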
The code is as follows:

```python
import urllib.request

# The page from the example above.
chaper_url = "http://blog.csdn.net/eric_sunah/article/details/11099295"

# Without the headers below, urlopen() fails with
# urllib.error.HTTPError: HTTP Error 403: Forbidden, mainly because the site
# blocks crawlers. Sending a browser's User-Agent disguises the request as
# browser access. To see the exact headers a browser sends, you can use
# Firefox's FireBug plugin.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
req = urllib.request.Request(url=chaper_url, headers=headers)
html = urllib.request.urlopen(req).read()
```

After you replace the plain `urllib.request.urlopen(...).read()` call with the code above, you can access the problematic page normally.
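The effect of the header can be reproduced without relying on an external site. The sketch below is an illustration under stated assumptions, not a description of how any particular site filters requests: it starts a toy local server (the `UserAgentFilter` handler, `BROWSER_UA` constant and `demo()` function are hypothetical names made up for this example) that answers 403 whenever the `User-Agent` is missing or starts with `Python-urllib`. It then requests the server once with urllib's defaults and once with the Firefox `User-Agent` used above. An empty `ProxyHandler` keeps any proxy configured in the environment from intercepting the local request.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Browser User-Agent taken from the code above.
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0"


class UserAgentFilter(BaseHTTPRequestHandler):
    """Hypothetical handler: rejects empty or urllib-default User-Agents with 403."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")  # header lookup is case-insensitive
        if not agent or agent.startswith("Python-urllib"):
            self.send_error(403, "Forbidden")
            return
        body = b"<html>ok</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the per-request log lines


def demo():
    """Request the toy server without and with a browser User-Agent."""
    server = HTTPServer(("127.0.0.1", 0), UserAgentFilter)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_port
    # Same default handlers as urlopen(), but ignoring environment proxies.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    results = {}
    try:
        try:
            opener.open(url)  # sends the default Python-urllib User-Agent
        except urllib.error.HTTPError as err:
            results["plain"] = err.code
            err.close()
        req = urllib.request.Request(url=url, headers={"User-Agent": BROWSER_UA})
        with opener.open(req) as resp:
            results["with_ua"] = (resp.status, resp.read())
    finally:
        server.shutdown()
        server.server_close()
    return results


outcome = demo()
print(outcome)
```

Running the script prints `{'plain': 403, 'with_ua': (200, b'<html>ok</html>')}`: the default request is rejected, while the same request carrying the browser `User-Agent` succeeds.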


