URLError exception handling for the Python crawler

Source: Internet
Author: User
This section mainly covers URLError and HTTPError, and how to handle them.

1.URLError

First, the possible causes of a URLError:

    • The network is not connected, that is, the machine has no internet access
    • A specific server cannot be reached
    • The server does not exist

In code, we wrap the call in a try-except statement to catch the corresponding exception. Here is an example:

    import urllib2

    # Build a request for a host name that does not exist
    request = urllib2.Request('http://www.xxxxx.com')
    try:
        urllib2.urlopen(request)
    except urllib2.URLError as e:
        # reason describes why the request failed
        print e.reason

We used the urlopen method to request a URL that does not exist. The result is as follows:

[Errno 11004] getaddrinfo failed

The output shows error number 11004 with the message "getaddrinfo failed", that is, the host name could not be resolved.
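The example above targets Python 2's urllib2. As a rough sketch of the same idea, assuming Python 3 (where the equivalent names live in urllib.request and urllib.error), the error number and message can be read separately from e.reason. The probe function, its opener parameter, and failing_opener are hypothetical names introduced here so the failure can be simulated without network access.

```python
import socket
from urllib.error import URLError
from urllib.request import urlopen


def probe(url, opener=urlopen):
    """Return None on success, or (errno, message) describing the URLError."""
    try:
        opener(url)
    except URLError as e:
        reason = e.reason
        # reason is often an OSError such as socket.gaierror,
        # but it may also be a plain string
        if isinstance(reason, OSError):
            return reason.errno, reason.strerror
        return None, str(reason)
    return None


def failing_opener(url):
    # Hypothetical stand-in that simulates a failed host name lookup
    raise URLError(socket.gaierror(11004, "getaddrinfo failed"))


print(probe("http://www.xxxxx.com", opener=failing_opener))
```

Calling probe(url) without the opener argument would use the real urlopen and perform an actual network request.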

2.HTTPError

HTTPError is a subclass of URLError. When you make a request with the urlopen method, the server returns a response object that contains a numeric status code. For example, if the response is a redirect and the document must be fetched from a different address, urllib2 handles it for you.

For responses it cannot handle, urlopen raises an HTTPError carrying the corresponding status. The HTTP status code indicates the state of the response returned over the HTTP protocol. The status codes are summarized as follows:

  • 100: Continue. The client should continue sending the remainder of the request, or ignore this response if the request has already been completed.
  • 101: Switching Protocols. After sending the final empty line of this response, the server switches to the protocols named in the Upgrade header. This should be done only when switching to the new protocol is more beneficial.
  • 102: Processing. A status code added by the WebDAV extension (RFC 2518), indicating that processing is still in progress.
  • 200: The request succeeded. Handling: get the response content and process it
  • 201: The request completed and a new resource was created. The URI of the new resource is available in the response entity. Handling: a crawler will not encounter this
  • 202: The request was accepted, but processing has not completed. Handling: block and wait
  • 204: The server fulfilled the request, but there is no new information to return. A user agent does not need to update its document view. Handling: discard
  • 300: Not used directly by HTTP/1.0 applications; it only serves as the default interpretation of 3XX responses. Several versions of the requested resource are available. Handling: process further if the program can, otherwise discard
  • 301: The requested resource has been assigned a permanent URL, and future access should go through that URL. Handling: redirect to the assigned URL
  • 302: The requested resource temporarily resides at a different URL. Handling: redirect to the temporary URL
  • 304: The requested resource has not been modified. Handling: discard
  • 400: Bad request. Handling: discard
  • 401: Unauthorized. Handling: discard
  • 403: Forbidden. Handling: discard
  • 404: Not found. Handling: discard
  • 500: Internal server error. The server encountered an unexpected condition that prevented it from completing the request. This generally happens when the server-side code has a bug.
  • 501: Not implemented. The server does not support a feature required by the current request, for example when it does not recognize the request method and cannot support it for any resource.
  • 502: Bad gateway. A server acting as a gateway or proxy received an invalid response from the upstream server while trying to fulfill the request.
  • 503: Service unavailable. The server is currently unable to handle the request because of temporary maintenance or overload. The condition is temporary and service will resume after some time.
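The handling modes in the list above can be kept in a small lookup table. The following is only a sketch, assuming Python 3 and its standard http.HTTPStatus enum for the official phrases. HANDLING and handling_for are hypothetical names, and codes for which the list states no handling mode are reported as "unspecified".

```python
from http import HTTPStatus

# Handling modes copied from the list above; codes for which the list
# gives no handling mode are deliberately left out
HANDLING = {
    200: "process",
    201: "not encountered by crawlers",
    202: "block and wait",
    204: "discard",
    300: "process if possible, otherwise discard",
    301: "redirect",
    302: "redirect",
    304: "discard",
    400: "discard",
    401: "discard",
    403: "discard",
    404: "discard",
}


def handling_for(code):
    """Return (standard phrase, handling mode) for a status code."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for unknown codes
    return phrase, HANDLING.get(code, "unspecified")


for code in (200, 302, 404, 503):
    print(code, handling_for(code))
```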

An HTTPError instance has a code attribute, which is the error number sent by the server.
Because urllib2 handles redirects for you, codes starting with 3 are dealt with automatically, and codes in the 100-299 range indicate success, so you will normally only see error numbers in the 400-599 range.

The following example catches an HTTPError. The exception has a code attribute holding the error code, and we also print its reason attribute, which is inherited from the parent class URLError.

    import urllib2

    req = urllib2.Request('http://blog.csdn.net/cqcre')
    try:
        urllib2.urlopen(req)
    except urllib2.HTTPError as e:
        # code is the HTTP status number, reason its text
        print e.code
        print e.reason

The result is as follows:

403
Forbidden

The error code is 403 and the reason is Forbidden, which means the server refused the request.
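For readers on Python 3, a comparable sketch is shown here, assuming the urllib.error module. The status_of function and forbidden_opener are hypothetical names; the latter builds an HTTPError by hand so the 403 case can be reproduced without contacting the server.

```python
import io
from urllib.error import HTTPError
from urllib.request import urlopen


def status_of(url, opener=urlopen):
    """Return (code, reason) if the server answers with an error status."""
    try:
        opener(url)
    except HTTPError as e:
        return e.code, e.reason
    return None


def forbidden_opener(url):
    # Hypothetical stand-in for a server that rejects the request with 403
    raise HTTPError(url, 403, "Forbidden", None, io.BytesIO())


print(status_of("http://blog.csdn.net/cqcre", opener=forbidden_opener))
```

Note that status_of only catches HTTPError, so a plain URLError (for example, an unreachable host) would still propagate to the caller.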

We know that HTTPError's parent class is URLError. By convention, the except clause for the parent class should be written after the one for the subclass, so that anything the subclass clause does not catch can still be caught by the parent clause. The code above can therefore be rewritten like this:

    import urllib2

    req = urllib2.Request('http://blog.csdn.net/cqcre')
    try:
        urllib2.urlopen(req)
    except urllib2.HTTPError as e:
        # The subclass clause comes first
        print e.code
    except urllib2.URLError as e:
        # Any other URLError ends up here
        print e.reason
    else:
        print "OK"

If an HTTPError is caught, its code is printed and the URLError clause is skipped. If the exception is not an HTTPError, the URLError clause catches it and prints the reason for the error.
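Why the order of the clauses matters can be seen in a small experiment. This is a sketch assuming Python 3's urllib.error; subclass_first and parent_first are hypothetical helpers that raise a given exception and report which clause handled it.

```python
import io
from urllib.error import HTTPError, URLError


def subclass_first(exc):
    try:
        raise exc
    except HTTPError as e:
        return "HTTPError branch: %s" % e.code
    except URLError as e:
        return "URLError branch: %s" % e.reason


def parent_first(exc):
    try:
        raise exc
    except URLError as e:
        # HTTPError is a URLError, so it is caught here
        # and the clause below never runs
        return "URLError branch: %s" % e.reason
    except HTTPError as e:
        return "HTTPError branch: %s" % e.code


http_error = HTTPError("http://blog.csdn.net/cqcre", 403, "Forbidden",
                       None, io.BytesIO())
print(subclass_first(http_error))
print(parent_first(http_error))
print(subclass_first(URLError("unreachable")))
```

With the parent clause first, the status code is never reported, because the HTTPError is already consumed by the URLError clause.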

Alternatively, you can check for the attributes in advance with hasattr. The code is rewritten as follows:

    import urllib2

    req = urllib2.Request('http://blog.csdn.net/cqcre')
    try:
        urllib2.urlopen(req)
    except urllib2.URLError as e:
        # Only HTTPError carries a code attribute
        if hasattr(e, "code"):
            print e.code
        if hasattr(e, "reason"):
            print e.reason
    else:
        print "OK"

Checking the attributes of the exception first avoids an error when printing an attribute that does not exist.
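The attribute check can also be isolated in a helper. The following is a minimal sketch, assuming Python 3's urllib.error; describe is a hypothetical name, and the exceptions are constructed by hand purely for illustration.

```python
import io
from urllib.error import HTTPError, URLError


def describe(exc):
    """Collect whichever of code and reason the exception actually carries."""
    details = {}
    if hasattr(exc, "code"):
        details["code"] = exc.code
    if hasattr(exc, "reason"):
        details["reason"] = exc.reason
    return details


# A plain URLError has a reason but no code
print(describe(URLError("timed out")))
# An HTTPError has both
print(describe(HTTPError("http://blog.csdn.net/cqcre", 403, "Forbidden",
                         None, io.BytesIO())))
```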

That concludes the introduction to URLError and HTTPError and the corresponding error handling.
