First, the problem of handling HTTP exceptions.
When urlopen cannot handle a response, it raises URLError.
As usual with Python APIs, built-in exceptions such as ValueError and TypeError may also be raised.
HTTPError is a subclass of URLError, raised in the specific case of HTTP URLs.
1.URLError
Typically, URLError is raised when there is no network connection (no route to the specified server), or when the specified server does not exist.
In this case, the exception has a "reason" attribute, which is a tuple (think of it as an immutable array) containing an error number and an error message.
Let's create urllib2_test06.py to try out the exception handling:
    import urllib2

    # This host does not resolve, so urlopen raises URLError.
    req = urllib2.Request('http://www.baibai.com')

    try:
        urllib2.urlopen(req)
    except urllib2.URLError as e:
        print e.reason
Press F5 to run it, and you will see the printed output:
[Errno 11001] getaddrinfo failed
In other words, the error number is 11001 and the error message is "getaddrinfo failed".
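As a rough sketch of reading the "reason" attribute without a network connection, the snippet below builds a URLError by hand. It assumes Python 3, where the urllib2 exceptions live in urllib.error; the helper name describe_reason is hypothetical. Note that in Python 3 the reason is usually an OSError instance (or a plain string) rather than a tuple.

```python
from urllib.error import URLError  # Python 3 home of urllib2.URLError


def describe_reason(exc):
    """Return (error_number, message) for a URLError.

    Hypothetical helper: in Python 3 the reason is usually an OSError
    carrying errno/strerror, but it may also be a plain string.
    """
    reason = exc.reason
    if isinstance(reason, OSError):
        return reason.errno, reason.strerror
    return None, str(reason)


# Simulate the failed DNS lookup from the example above (no network needed).
simulated = URLError(OSError(11001, 'getaddrinfo failed'))
number, message = describe_reason(simulated)
print(number, message)
```

Splitting the number from the message this way makes it easier for a crawler to log or branch on specific failures.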
2.HTTPError
Every HTTP response from the server contains a numeric "status code".
Sometimes the status code indicates that the server cannot fulfill the request. The default handlers will deal with some of these responses for you.
For example, if the response is a "redirect" that asks the client to fetch the document from a different address, urllib2 will handle it for you.
For responses it cannot handle, urlopen raises an HTTPError.
Typical errors include "404" (page not found), "403" (request forbidden), and "401" (authentication required).
The HTTP status code represents the status of the response returned by the HTTP protocol.
For example, the client sends a request to the server, and if it succeeds in obtaining the requested resource, the returned status code is 200, indicating a successful response.
If the requested resource does not exist, a 404 error is typically returned.
HTTP status codes are three-digit integers, usually divided into five classes according to their first digit (1 through 5):
------------------------------------------------------------------------------------------------
200: The request succeeded. Handling: get the response content and process it.
201: The request completed and a new resource was created; the URI of the new resource is available in the response entity. Handling: a crawler will not encounter this.
202: The request was accepted, but processing has not completed. Handling: block and wait.
204: The server fulfilled the request but has no new information to return. If the client is a user agent, it does not need to update its document view. Handling: discard.
300: Not used directly by HTTP/1.0 applications; it serves only as the default interpretation of 3XX responses. Multiple versions of the requested resource are available. Handling: process further if the program can, otherwise discard.
301: The requested resource has been assigned a permanent URL, through which it should be accessed in the future. Handling: redirect to the assigned URL.
302: The requested resource temporarily resides at a different URL. Handling: redirect to the temporary URL.
304: The requested resource has not been modified. Handling: discard.
400: Bad request. Handling: discard.
401: Unauthorized. Handling: discard.
403: Forbidden. Handling: discard.
404: Not found. Handling: discard.
5XX: Status codes beginning with "5" indicate that the server detected an error on its own side and cannot continue processing the request. Handling: discard.
------------------------------------------------------------------------------------------------
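The classification and the handling column in the table above could be sketched roughly as follows. The function names status_class and crawler_action and the action labels are hypothetical, and treating every unlisted code as "discard" is an assumption of this sketch rather than something the table states.

```python
def status_class(code):
    """Map a three-digit HTTP status code to its class by first digit."""
    classes = {
        1: 'informational',
        2: 'success',
        3: 'redirection',
        4: 'client error',
        5: 'server error',
    }
    if not 100 <= code <= 599:
        raise ValueError('not an HTTP status code: %r' % (code,))
    return classes[code // 100]


# Handling policy copied from the table above (labels are illustrative).
ACTIONS = {
    200: 'process',
    202: 'wait',
    204: 'discard',
    301: 'redirect',
    302: 'redirect',
    304: 'discard',
}


def crawler_action(code):
    """Return the action a crawler might take for a status code."""
    # Assumption: anything not listed (including 4XX and 5XX) is discarded.
    return ACTIONS.get(code, 'discard')


for code in (200, 302, 404, 503):
    print(code, status_class(code), crawler_action(code))
```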
An HTTPError instance has an integer "code" attribute, which is the error number sent by the server.
Error Codes
Because the default handlers take care of redirects (codes in the 300 range), and codes in the 100-299 range indicate success, you will usually only see error codes in the 400-599 range.
BaseHTTPServer.BaseHTTPRequestHandler.responses is a useful dictionary of response codes that shows all the response codes used by the HTTP protocol.
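A short sketch of looking codes up in that dictionary, assuming Python 3, where the BaseHTTPServer module was renamed http.server. Each entry maps a status code to a pair of a short phrase and a longer description.

```python
from http.server import BaseHTTPRequestHandler  # Python 3 location

# Each value is a (short phrase, longer description) pair.
responses = BaseHTTPRequestHandler.responses

for code in (200, 403, 404):
    phrase, description = responses[code]
    print(code, phrase)
```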
When an error occurs, the server responds with an HTTP error code and an error page.
You can use the HTTPError instance as the response object for the returned page.
This means that, in addition to the code attribute, it also has the read, geturl, and info methods.
Let's create urllib2_test07.py to try it out:
    import urllib2

    req = urllib2.Request('http://bbs.csdn.net/callmewhy')

    try:
        urllib2.urlopen(req)
    # Catch HTTPError specifically: a plain URLError has no code attribute.
    except urllib2.HTTPError as e:
        print e.code
        # print e.read()
Press F5 and you will see the error code 404 printed, meaning the page could not be found.
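To see how an HTTPError doubles as a response object without making a real request, the sketch below constructs one by hand. It assumes Python 3 (urllib.error instead of urllib2); the URL and the page body are placeholders invented for illustration.

```python
import io
from urllib.error import HTTPError  # Python 3 home of urllib2.HTTPError

# Build an HTTPError by hand to stand in for a real 404 response.
# The URL and body are placeholders, not a real page.
body = io.BytesIO(b'<html>page not found</html>')
error = HTTPError('http://example.com/missing', 404, 'Not Found', {}, body)

page = error.read()      # body of the error page, as bytes

print(error.code)        # integer status code sent by the server
print(error.geturl())    # URL the error refers to
print(page)
```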
3.Wrapping
So if you want to be prepared for HTTPError or URLError, there are two basic approaches. The second is recommended.
Let's create urllib2_test08.py to demonstrate the first approach:
    from urllib2 import Request, urlopen, URLError, HTTPError

    req = Request('http://bbs.csdn.net/callmewhy')

    try:
        response = urlopen(req)
    except HTTPError as e:
        # The server was reached but returned an error status.
        print "The server couldn't fulfill the request."
        print 'Error code: ', e.code
    except URLError as e:
        # The server could not be reached at all.
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
    else:
        print 'No exception was raised.'
        # everything is fine
As in other languages, the try block catches the exception and its contents are printed.
One thing to note here is that except HTTPError must come first; otherwise except URLError will also catch the HTTPError.
This is because HTTPError is a subclass of URLError, so if URLError comes first it catches every URLError (including HTTPError).
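The effect of clause ordering can be checked without any network access. This is only a sketch, assuming Python 3 names (urllib.error); the two function names and the hand-built exception are hypothetical.

```python
from urllib.error import HTTPError, URLError  # Python 3 names


def wrong_order(exc):
    """URLError listed first: it swallows HTTPError as well."""
    try:
        raise exc
    except URLError:
        return 'URLError clause'
    except HTTPError:  # never reached, HTTPError is caught above
        return 'HTTPError clause'


def right_order(exc):
    """HTTPError listed first: each exception reaches its own clause."""
    try:
        raise exc
    except HTTPError:
        return 'HTTPError clause'
    except URLError:
        return 'URLError clause'


# A hand-built 404 standing in for a real server response.
http_error = HTTPError('http://example.com/missing', 404, 'Not Found', {}, None)

print(issubclass(HTTPError, URLError))
print(wrong_order(http_error))
print(right_order(http_error))
```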
Let's create urllib2_test09.py to demonstrate the second approach:
    from urllib2 import Request, urlopen, URLError, HTTPError

    req = Request('http://bbs.csdn.net/callmewhy')

    try:
        response = urlopen(req)
    except URLError as e:
        # One except clause; tell the two cases apart by attribute.
        if hasattr(e, 'code'):
            print "The server couldn't fulfill the request."
            print 'Error code: ', e.code
        elif hasattr(e, 'reason'):
            print 'We failed to reach a server.'
            print 'Reason: ', e.reason
    else:
        print 'No exception was raised.'
        # everything is fine
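The attribute checks in the second approach can be exercised offline as well. The sketch below assumes Python 3 (urllib.error); the helper name report is hypothetical. In Python 3 an HTTPError exposes a reason attribute too, so testing for code first is what keeps the two cases apart.

```python
from urllib.error import HTTPError, URLError  # Python 3 names


def report(exc):
    """Describe a URLError using attribute checks, as in the second approach."""
    if hasattr(exc, 'code'):
        # Only HTTPError carries a status code, so test for it first.
        return ["The server couldn't fulfill the request.",
                'Error code: %d' % exc.code]
    if hasattr(exc, 'reason'):
        return ['We failed to reach a server.',
                'Reason: %s' % (exc.reason,)]
    return ['Unknown error.']


# Hand-built exceptions standing in for real failures.
for exc in (HTTPError('http://example.com/missing', 404, 'Not Found', {}, None),
            URLError('no route to host')):
    print('\n'.join(report(exc)))
```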
That concludes [Python] web crawler (iii): exception handling and HTTP status code classification.