URLError exception handling in a Python crawler

Last Update:2018-07-24 Source: Internet

Author: User


1.URLError

A URLError can be raised for the following reasons:

(1) There is no network connection, that is, the computer cannot reach the Internet

(2) The connection to a specific server fails

(3) The server does not exist

import urllib.request
import urllib.error

# Placeholder host name that is not expected to resolve
request = urllib.request.Request('http://www.xxxxxx.com')
try:
    urllib.request.urlopen(request)
except urllib.error.URLError as e:
    # reason describes why the request failed
    print(e.reason)

D:\Anaconda3\python.exe d:/dazhongdianping/position.py
[Errno 11004] getaddrinfo failed

The output shows error number 11004 with the message getaddrinfo failed, which means the host name could not be resolved to an address.
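The reason attribute is not always a plain string: for socket-level failures it is usually an OSError subclass such as socket.gaierror, which carries the error number and the message separately. A minimal sketch of pulling both out, using a hand-built exception rather than a real network call (the helper name describe_url_error is hypothetical, not part of urllib):

```python
import socket
import urllib.error


def describe_url_error(exc):
    """Return (errno, message) for a URLError; errno is None for string reasons."""
    reason = exc.reason
    if isinstance(reason, OSError):
        # Socket-level failures keep the numeric code in .errno
        return reason.errno, reason.strerror
    return None, str(reason)


# Simulate the failure shown above without touching the network
simulated = urllib.error.URLError(socket.gaierror(11004, "getaddrinfo failed"))
print(describe_url_error(simulated))
```

Splitting the number from the message makes it easier for a crawler to decide, for example, whether a failure is a DNS problem worth retrying later.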

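Mistakes that are easy to make when writing this code: forgetting to import the library modules first, and overlooking the differences between Python 2 and Python 3 (in Python 2 these names live in urllib2 rather than urllib.request and urllib.error).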
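If the same script has to run under both versions, one common approach is to try the Python 3 imports first and fall back to the Python 2 module. This is only a sketch of that idea, not something the original article does:

```python
try:
    # Python 3 layout
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError, URLError
except ImportError:
    # Python 2 layout: the same names live in urllib2
    from urllib2 import Request, urlopen, HTTPError, URLError

# Either way, HTTPError is a subclass of URLError
print(issubclass(HTTPError, URLError))
```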

2.HTTPError

HTTPError is a subclass of URLError. When you make a request with urlopen, the server returns a response object that contains a numeric "status code". For some responses, for example a redirect that requires fetching the document from another address, urllib handles the situation for you.

For responses it cannot handle, urlopen raises an HTTPError carrying the corresponding status code. The HTTP status code represents the state of the response returned over the HTTP protocol. The status codes are summarized below:

100: Continue. The client should continue sending the remainder of the request, or ignore this response if the request has already been completed.

101: Switching Protocols. After sending the final blank line of this response, the server switches to the protocols defined in the Upgrade header. This should be done only when switching to the new protocol is advantageous.

102: Processing. A status code added by the WebDAV extension (RFC 2518), indicating that processing is still in progress.

200: OK. The request succeeded. Handling: get the content of the response and process it.

201: Created. The request was completed and a new resource was created; the URI of the new resource is available in the response entity. Handling: a crawler will not encounter this.

202: Accepted. The request has been received, but processing is not yet complete. Handling: block and wait.

204: No Content. The server has fulfilled the request but returned no new information. If the client is a user agent, it does not need to update its document view. Handling: discard.

300: Multiple Choices. This status code is not used directly by HTTP/1.0 applications; it serves only as the default interpretation of 3XX responses. Several versions of the requested resource are available. Handling: process further if the program can, otherwise discard.

301: Moved Permanently. The requested resource has been assigned a permanent URL and should be accessed through that URL in the future. Handling: redirect to the assigned URL.

302: Found. The requested resource temporarily resides at a different URL. Handling: redirect to the temporary URL.

304: Not Modified. The requested resource has not been updated. Handling: discard.

400: Bad Request. Handling: discard.

401: Unauthorized. Handling: discard.

403: Forbidden. Handling: discard.

404: Not Found. Handling: discard.

500: Internal Server Error. The server encountered an unexpected condition that prevented it from completing the request. This usually happens when the server-side source code contains an error.

501: Not Implemented. The server does not support a feature required by the current request, for example when it does not recognize the request method and cannot support it for any resource.

502: Bad Gateway. A server acting as a gateway or proxy received an invalid response from the upstream server while trying to fulfil the request.

503: Service Unavailable. The server is currently unable to process the request because of temporary maintenance or overload. The condition is temporary and service will be restored after a while.

An HTTPError instance is created with a code attribute, which is the error number sent by the server. Because urllib handles redirects for you, codes beginning with 3 are processed automatically, and numbers in the 100-299 range indicate success, so normally you will only see error numbers in the 400-599 range.
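The ranges described above can be captured in a small helper that a crawler might use to decide how to treat a code. This is an illustrative sketch; the function name classify_status and the category labels are assumptions, not part of urllib:

```python
def classify_status(code):
    """Map an HTTP status code to its class, following the ranges listed above."""
    if 100 <= code < 200:
        return "informational"
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    raise ValueError("not an HTTP status code: %r" % code)


for code in (200, 302, 403, 503):
    print(code, classify_status(code))
```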

Let's write an example to see this in practice. The exception caught is an HTTPError, which carries a code attribute (the error code); we also print the reason attribute, which it inherits from its parent class URLError.

import urllib.request
import urllib.error

request = urllib.request.Request('http://blog.csdn.net/cqcre')
try:
    urllib.request.urlopen(request)
except urllib.error.HTTPError as e:
    print(e.code)    # numeric status code sent by the server
    print(e.reason)  # short text describing the status

D:\Anaconda3\python.exe d:/dazhongdianping/position.py
403
Forbidden

Result analysis: the error code is 403 and the reason is Forbidden, which means the server refuses access.

We know that HTTPError's parent class is URLError. As a rule of programming practice, the except clause for the parent class should be written after the one for the subclass: if the subclass clause does not catch the exception, the parent class clause can.

import urllib.request
import urllib.error

request = urllib.request.Request('http://blog.csdn.net/cqcre')
try:
    urllib.request.urlopen(request)
except urllib.error.HTTPError as e:
    # Subclass first: the server answered with an error status
    print(e.code)
except urllib.error.URLError as e:
    # Parent class second: the request never got a response
    print(e.reason)
else:
    print('OK')

Results:
403

If an HTTPError is caught, its code is printed and the URLError clause is skipped. If the exception is not an HTTPError, the URLError clause catches it and prints the reason.

Alternatively, hasattr can be used to check for an attribute before reading it. The code can be rewritten as follows:
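The dispatch order can be checked without a network connection by raising hand-built exceptions. In this sketch the helper which_handler and the example.invalid URL are hypothetical; the HTTPError constructor arguments are url, code, msg, hdrs and fp:

```python
import io
import urllib.error
from email.message import Message


def which_handler(exc):
    """Raise exc and report which except clause caught it."""
    try:
        raise exc
    except urllib.error.HTTPError as e:
        # Subclass clause comes first, so HTTP errors stop here
        return "HTTPError: %d" % e.code
    except urllib.error.URLError as e:
        # Everything else derived from URLError ends up here
        return "URLError: %s" % e.reason


http_exc = urllib.error.HTTPError(
    "http://example.invalid/", 403, "Forbidden", Message(), io.BytesIO()
)
url_exc = urllib.error.URLError("getaddrinfo failed")

print(which_handler(http_exc))
print(which_handler(url_exc))
```

If the two except clauses were swapped, the URLError clause would catch both exceptions and the HTTPError clause would never run.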

import urllib.request
import urllib.error

request = urllib.request.Request('http://blog.csdn.net/cqcre')
try:
    urllib.request.urlopen(request)
except urllib.error.URLError as e:
    # Check that the attribute exists before printing it
    if hasattr(e, 'reason'):
        print(e.reason)
else:
    print('OK')

Checking the exception for the attribute first avoids an error when trying to print an attribute that does not exist.
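The same check can be applied to both code and reason inside a single handler, since only HTTPError carries code. A minimal sketch with simulated exceptions (the helper name summarize and the example.invalid URL are hypothetical):

```python
import io
import urllib.error
from email.message import Message


def summarize(exc):
    """Describe an exception using only the attributes it actually has."""
    parts = []
    if hasattr(exc, "code"):
        parts.append("code=%s" % exc.code)
    if hasattr(exc, "reason"):
        parts.append("reason=%s" % exc.reason)
    return ", ".join(parts)


http_exc = urllib.error.HTTPError(
    "http://example.invalid/", 403, "Forbidden", Message(), io.BytesIO()
)
url_exc = urllib.error.URLError("timed out")

print(summarize(http_exc))
print(summarize(url_exc))
```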

Part of this content is reproduced from: Static search » Python crawler (5): URLError exception handling

