URLError exception handling for the Python crawler

Last Update:2016-06-10 Source: Internet

Author: User

This section covers URLError and HTTPError and how to handle them.

1.URLError

First, the possible causes of a URLError:

There is no network connection, that is, the machine cannot reach the internet
The specific server cannot be reached
The server does not exist

In code, we wrap the call in a try-except statement and catch the corresponding exception. Here is an example:

import urllib2

request = urllib2.Request('http://www.xxxxx.com')
try:
    urllib2.urlopen(request)
except urllib2.URLError as e:
    # reason describes why the request failed
    print e.reason

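We used the urlopen method to access a nonexistent URL, and the result was as follows:

[Errno 11004] getaddrinfo failed

The error number is 11004 and the message is getaddrinfo failed, meaning the host name could not be resolved. The number is platform-specific (11004 is a Windows socket error code).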
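The example above targets Python 2's urllib2. As a rough Python 3 sketch of the same pattern (in Python 3 these names live in urllib.request and urllib.error), the function below returns the reason instead of printing it. The names `fetch_reason` and `offline_opener` and the `opener` parameter are hypothetical, added only so the sketch can run without a network connection.

```python
import urllib.error
import urllib.request


def fetch_reason(url, opener=urllib.request.urlopen):
    """Return None on success, or the URLError reason on failure.

    `opener` is a hypothetical injection point; by default the real
    urlopen is used.
    """
    try:
        response = opener(url)
    except urllib.error.URLError as e:
        return e.reason
    response.close()
    return None


def offline_opener(url):
    # Stand-in for urlopen that simulates a failed DNS lookup.
    raise urllib.error.URLError("getaddrinfo failed")


print(fetch_reason("http://www.xxxxx.com", opener=offline_opener))
```

With the real urlopen, `reason` is often an exception object such as a socket error rather than a plain string, so printing it gives output similar to the line shown earlier.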

2.HTTPError

HTTPError is a subclass of URLError. When you make a request with the urlopen method, the server replies with a response object that includes a numeric "status code". For some statuses, such as a "redirect" that requires fetching the document from a different address, urllib2 handles the response for you.

For responses it cannot handle, urlopen raises an HTTPError carrying the corresponding status. An HTTP status code indicates the status of the response returned over the HTTP protocol. The status codes are summarized below:

100: Continue. The client should continue sending the remainder of the request, or ignore this response if the request has already been completed.
101: Switching Protocols. After sending the empty line that ends this response, the server switches to the protocols named in the Upgrade header. The switch should be made only when the new protocol is more advantageous.
102: Processing. A status code added by WebDAV (RFC 2518); it indicates that processing is still in progress.
200: OK. The request succeeded. Handling: read the response content and process it.
201: Created. The request is complete and a new resource was created; the URI of the new resource is given in the response entity. Handling: a crawler will not encounter this.
202: Accepted. The request has been accepted, but processing has not finished. Handling: block and wait.
204: No Content. The server fulfilled the request but has no new information to return. A user agent does not need to update its document view. Handling: discard.
300: Multiple Choices. HTTP/1.0 applications do not use this code directly; it serves only as the default interpretation of 3xx responses. Several versions of the requested resource are available. Handling: process further if the program can, otherwise discard.
301: Moved Permanently. The requested resource has been assigned a permanent URL and should be accessed through that URL in future. Handling: redirect to the assigned URL.
302: Found. The requested resource temporarily resides at a different URL. Handling: redirect to the temporary URL.
304: Not Modified. The requested resource has not been updated. Handling: discard.
400: Bad Request. Handling: discard.
401: Unauthorized. Handling: discard.
403: Forbidden. Handling: discard.
404: Not Found. Handling: discard.
500: Internal Server Error. The server encountered an unexpected condition that prevented it from completing the request. In general, this happens when the server-side code has a bug.
501: Not Implemented. The server does not support a feature required by the request, for example when it does not recognize the request method and cannot support it for any resource.
502: Bad Gateway. A server acting as a gateway or proxy received an invalid response from the upstream server while trying to fulfil the request.
503: Service Unavailable. The server cannot currently process the request because of temporary maintenance or overload. The condition is temporary and service resumes after some time.
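A crawler could encode the handling modes from the list above as a lookup table. This is only a minimal sketch: the names `HANDLING` and `handling_for` and the short mode labels are hypothetical, and codes for which the list gives no handling mode (100 to 102, 500 to 503) or no action (201) are left out on purpose.

```python
# Handling modes copied from the status code list; anything not listed
# there falls back to a caller-supplied default.
HANDLING = {
    200: "process",
    202: "wait",
    204: "discard",
    300: "process or discard",
    301: "redirect",
    302: "redirect",
    304: "discard",
    400: "discard",
    401: "discard",
    403: "discard",
    404: "discard",
}


def handling_for(code, default="unspecified"):
    """Return the handling mode for an HTTP status code."""
    return HANDLING.get(code, default)


for code in (200, 302, 404, 503):
    print(code, handling_for(code))
```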

An HTTPError instance has a code attribute, which holds the status code sent by the server.
Because urllib2 handles redirects for you (codes in the 300 range), and codes in the 100-299 range indicate success, you will usually only see error codes in the 400-599 range.

Here is an example that catches HTTPError. The exception has a code attribute holding the status code, and we also print its reason attribute, which is inherited from the parent class URLError.

import urllib2

req = urllib2.Request('http://blog.csdn.net/cqcre')
try:
    urllib2.urlopen(req)
except urllib2.HTTPError as e:
    print e.code    # HTTP status code sent by the server
    print e.reason  # attribute inherited from URLError

The result is as follows:

403
Forbidden

The error code is 403 and the reason is Forbidden, which means the server refused the request.

HTTPError's parent class is URLError. As a general rule, the except clause for a parent class should come after the except clause for its subclass: if the subclass clause does not match, the parent class clause can still catch the exception. The code above can therefore be rewritten as follows:

import urllib2

req = urllib2.Request('http://blog.csdn.net/cqcre')
try:
    urllib2.urlopen(req)
except urllib2.HTTPError as e:   # subclass first
    print e.code
except urllib2.URLError as e:    # parent class second
    print e.reason
else:
    print "OK"

If an HTTPError is caught, its code is printed and the URLError clause is skipped. If the exception is not an HTTPError, the URLError clause catches it and prints the reason for the error.
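The ordering rule can be checked without touching the network by raising the exceptions directly. The following is a sketch for Python 3, where the classes live in urllib.error; the `classify` helper and the example.invalid URL are hypothetical and no request is actually sent.

```python
import io
import urllib.error


def classify(exc):
    """Raise `exc` and report which except clause catches it."""
    try:
        raise exc
    except urllib.error.HTTPError as e:   # subclass first
        return "HTTPError code=%d" % e.code
    except urllib.error.URLError as e:    # parent class second
        return "URLError reason=%s" % e.reason


# Exceptions built by hand; the arguments are (url, code, msg, hdrs, fp).
http_exc = urllib.error.HTTPError(
    "http://example.invalid/", 403, "Forbidden", None, io.BytesIO(b""))
url_exc = urllib.error.URLError("getaddrinfo failed")

print(classify(http_exc))
print(classify(url_exc))
```

If the two except clauses were swapped, the URLError clause would catch both exceptions and the HTTPError clause would never run.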

Alternatively, you can catch only URLError and use hasattr to check which attributes the exception has before reading them. The code is rewritten as follows:

import urllib2

req = urllib2.Request('http://blog.csdn.net/cqcre')
try:
    urllib2.urlopen(req)
except urllib2.URLError as e:
    # HTTPError has a code attribute; a plain URLError may not
    if hasattr(e, "code"):
        print e.code
    if hasattr(e, "reason"):
        print e.reason
else:
    print "OK"

Checking for each attribute first avoids an error when printing an attribute that the exception does not have.
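The hasattr check can be tried on hand-built exceptions as well. This is a Python 3 sketch under the same assumptions as before: `describe` is a hypothetical helper, the example.invalid URL is a placeholder, and nothing is fetched.

```python
import io
import urllib.error


def describe(e):
    """List whichever of code and reason the exception actually has."""
    parts = []
    if hasattr(e, "code"):
        parts.append("code=%s" % e.code)
    if hasattr(e, "reason"):
        parts.append("reason=%s" % e.reason)
    return ", ".join(parts)


http_exc = urllib.error.HTTPError(
    "http://example.invalid/", 403, "Forbidden", None, io.BytesIO(b""))
url_exc = urllib.error.URLError("getaddrinfo failed")

print(describe(http_exc))
print(describe(url_exc))
```

A plain URLError carries only reason, so the first branch is skipped for it.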

That concludes the introduction to URLError and HTTPError and the corresponding error handling methods.



This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
