Writing a web crawler in Python (iii): Handling of exceptions and classification of HTTP status codes


First, let's look at how to handle HTTP exceptions.

When urlopen cannot handle a response, it raises a URLError.

As is usual with Python APIs, built-in exceptions such as ValueError and TypeError may also be raised.

HTTPError is a subclass of URLError that is raised in the specific case of HTTP URLs.
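You can check this hierarchy in Python 3, where urllib2 was split into urllib.request and urllib.error. The sketch below is a minimal illustration and does not come from the original article. The helper name classify_failure is hypothetical. Because HTTPError is a subclass of URLError, its except clause must come first. A malformed URL escapes as a plain ValueError before any network activity happens.

```python
import urllib.error
import urllib.request


def classify_failure(url):
    """Open url and report which kind of exception, if any, was raised."""
    try:
        with urllib.request.urlopen(url, timeout=5):
            return "ok"
    except urllib.error.HTTPError as e:
        # Must come before URLError: every HTTPError is also a URLError
        return f"HTTPError {e.code}"
    except urllib.error.URLError as e:
        # Network-level failures: no route, unknown host, refused connection
        return f"URLError: {e.reason}"
    except ValueError as e:
        # Built-in exceptions can escape too, e.g. for a malformed URL
        return f"ValueError: {e}"


print(issubclass(urllib.error.HTTPError, urllib.error.URLError))  # True
print(classify_failure("not a url"))  # ValueError: unknown url type: 'not a url'
```

If the two urllib clauses were swapped, the URLError branch would catch every HTTPError and the HTTPError branch would never run.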

1. URLError

Typically, URLError is raised when there is no network connection (no route to the specified server) or when the specified server does not exist.

In this case, the exception has a reason attribute that explains the failure. It may be a plain message string or another exception instance. For a network failure it is usually the underlying socket error, which contains an error number and an error message.
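Here is a small Python 3 sketch showing how to pull the error number and message out of reason. The helper split_reason is hypothetical. The sketch builds the URLError by hand from a socket.gaierror, so it runs without a network; in real code the exception would come from urlopen. The helper also handles the case where reason is a plain string.

```python
import socket
import urllib.error


def split_reason(err):
    """Return (error number, message) from a URLError's reason attribute."""
    reason = err.reason
    if isinstance(reason, OSError):
        # Socket errors such as gaierror are OSError subclasses with errno/strerror
        return reason.errno, reason.strerror
    # Otherwise reason is a plain message string with no error number
    return None, str(reason)


# Recreate the article's failure by hand, so no network access is needed
err = urllib.error.URLError(socket.gaierror(11001, "getaddrinfo failed"))
print(err.reason)         # [Errno 11001] getaddrinfo failed
print(split_reason(err))  # (11001, 'getaddrinfo failed')
print(split_reason(urllib.error.URLError("no host given")))  # (None, 'no host given')
```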

Let's create urllib2_test06.py to see this exception handling in action:

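# urllib2_test06.py (Python 2): print why a request failed
import urllib2

req = urllib2.Request('http://www.baibai.com')

try:
    urllib2.urlopen(req)
except urllib2.URLError as e:
    # e.reason describes the failure, e.g. a DNS lookup error
    print(e.reason)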

Press F5 to run it and see what is printed:

[Errno 11001] getaddrinfo failed

In other words, the error number is 11001 and the message is "getaddrinfo failed". (11001 is a Windows socket error code; on other platforms the number and message differ.)
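On Python 3, you can run the same experiment with urllib.request and urllib.error. This is a sketch, not a drop-in copy of the script above. fetch_error_reason is a hypothetical helper, and the URL uses the reserved .invalid top-level domain instead of www.baibai.com so that the lookup is guaranteed to fail. The printed message varies by platform.

```python
import urllib.error
import urllib.request


def fetch_error_reason(url, timeout=5.0):
    """Try to open url; return the error reason as text, or None on success."""
    req = urllib.request.Request(url)
    try:
        with urllib.request.urlopen(req, timeout=timeout):
            return None  # the request succeeded, so there is nothing to report
    except urllib.error.URLError as e:
        # reason is a message string or the underlying exception (e.g. socket.gaierror)
        return str(e.reason)


# Hypothetical host under the reserved .invalid TLD, so the lookup must fail
reason = fetch_error_reason("http://www.baibai.invalid/")
print(reason)
```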

2. HTTPError

Every HTTP response from the server contains a numeric status code.

Sometimes the status code indicates that the server could not fulfil the request. The default handlers deal with some of these responses for you.

For example, if the response is a "redirect" asking the client to fetch the document from a different URL, urllib2 follows it for you.
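You can watch this redirect handling happen on a local machine. The following Python 3 sketch starts a throwaway server. RedirectingHandler, the /old and /new paths, and follow_redirect are all hypothetical names. The /old path answers with 302 and a Location header pointing at /new. The default handlers follow the redirect, so the caller only sees the final 200 response and its URL. The empty ProxyHandler stops any proxy settings in the environment from intercepting the local request.

```python
import http.server
import threading
import urllib.request


class RedirectingHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local server: /old redirects to /new, which returns a document."""

    def do_GET(self):
        if self.path == "/old":
            # 302 Found with a relative Location header
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"document at the new address"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def follow_redirect():
    """Request /old and return (status, final URL, body) after redirects."""
    server = http.server.HTTPServer(("127.0.0.1", 0), RedirectingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base = f"http://127.0.0.1:{server.server_address[1]}"
        # An empty ProxyHandler stops proxy environment variables from interfering
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(base + "/old", timeout=5) as resp:
            return resp.status, resp.geturl(), resp.read()
    finally:
        server.shutdown()
        server.server_close()


status, final_url, body = follow_redirect()
print(status, final_url)  # prints 200 and a URL ending in /new
```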

For the responses urlopen cannot handle, it raises an HTTPError.

Typical errors include 404 (page not found), 403 (request forbidden), and 401 (authentication required).
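The status code of such an error is available on the exception as code. The following Python 3 sketch runs against a local test server. ErrorHandler, its path-to-status table, and fetch_statuses are hypothetical. The server simply answers every request with send_error, so urlopen raises HTTPError each time.

```python
import http.server
import threading
import urllib.error
import urllib.request


class ErrorHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local server that answers every GET with an error status."""

    # Hypothetical paths; anything not listed is treated as missing (404)
    STATUS_BY_PATH = {"/secret": 403, "/private": 401}

    def do_GET(self):
        self.send_error(self.STATUS_BY_PATH.get(self.path, 404))

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch_statuses(paths):
    """Request each path from a local ErrorHandler server; map path -> status."""
    server = http.server.HTTPServer(("127.0.0.1", 0), ErrorHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    # An empty ProxyHandler stops proxy environment variables from interfering
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    codes = {}
    try:
        for path in paths:
            try:
                with opener.open(base + path, timeout=5) as resp:
                    codes[path] = resp.status
            except urllib.error.HTTPError as e:
                # urlopen could not handle the response, so it raised HTTPError
                codes[path] = e.code
    finally:
        server.shutdown()
        server.server_close()
    return codes


print(fetch_statuses(["/missing", "/secret", "/private"]))
# {'/missing': 404, '/secret': 403, '/private': 401}
```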

An HTTP status code indicates the outcome of a request, as reported in the server's response.

For example, when a client sends a request to the server and the requested resource is retrieved successfully, the status code is 200, indicating a successful response.

If the requested resource does not exist, a 404 error is usually returned.
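The first digit of a status code determines its class, so classifying codes takes only a few lines. This Python 3 sketch uses the standard http.HTTPStatus enum for the reason phrases. status_class is a hypothetical helper, and the class names follow RFC 9110, section 15.

```python
from http import HTTPStatus

# Classes defined by the first digit of the code (RFC 9110, section 15)
STATUS_CLASSES = {
    1: "informational",
    2: "successful",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Return the class of an HTTP status code, based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (200, 302, 401, 403, 404, 500):
    # HTTPStatus supplies the standard reason phrase for each known code
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```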
