URLError
Causes of URLError:
- The network is not connected (i.e. the machine cannot reach the internet)
- The server does not exist
We usually wrap the call in a try-except statement and catch the corresponding exception. Let's try it first:
    import urllib2

    request = urllib2.Request('http://www.wujiadong.com')
    try:
        urllib2.urlopen(request)
    except urllib2.URLError as e:
        # reason describes why the server could not be reached
        print(e.reason)
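On Python 3, urllib2 was split into urllib.request and urllib.error. A minimal sketch of the same try-except under that assumption follows. The helper name try_open, its opener parameter, and the stand-in function unreachable are hypothetical; they exist only so the example can be exercised without a network connection.

```python
from urllib.error import URLError
from urllib.request import Request, urlopen


def try_open(url, opener=urlopen):
    """Return None on success, or the failure reason if URLError is raised.

    `opener` is a hypothetical hook (it defaults to urlopen) so that a
    stand-in can be injected instead of hitting the network.
    """
    request = Request(url)
    try:
        opener(request)
    except URLError as e:
        return e.reason
    return None


def unreachable(request):
    # Stand-in opener that simulates a host that cannot be reached.
    raise URLError('no route to host')


print(try_open('http://www.wujiadong.com', opener=unreachable))
```

Calling try_open(url) without the opener argument would perform a real request with urlopen.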
HTTPError
When you make a request with the urlopen method, the server returns a response object, which contains
a numeric "status code". For example, if the response is a "redirect" that requires the client to fetch the document from another address,
urllib2 will handle it for you. For responses it cannot handle, urlopen raises an HTTPError.
Typical errors include "404" (page not found), "403" (request forbidden), and "401" (authentication required).
The HTTP status code represents the status of the response returned by the HTTP protocol.
For example, the client sends a request to the server, and if it succeeds in obtaining the requested resource, the returned status code is 200, indicating a successful response.
If the requested resource does not exist, a 404 error is typically returned.
HTTP status codes are divided into 5 classes. Each code is a 3-digit integer whose first digit, 1 through 5, identifies the class:
- 100: Continue. The client should continue sending the remainder of the request, or ignore this response if the request has already been completed.
- 101: Switching protocols. After sending the final empty line of this response, the server switches to the protocols defined in the Upgrade message header. This should be done only when switching to the new protocol is more beneficial.
- 102: Processing. An extended status code from WebDAV (RFC 2518), indicating that processing is still in progress.
- 200: The request succeeded. Handling: get the content of the response and process it.
- 201: The request completed and a new resource was created. The URI of the newly created resource is available in the response entity. Handling: a crawler will not encounter this.
- 202: The request was accepted, but processing has not completed. Handling: block and wait.
- 204: The server fulfilled the request but has no new information to return. If the client is a user agent, it does not need to update its document view. Handling: discard.
- 300: This status code is not used directly by HTTP/1.0 applications; it serves only as the default interpretation of 3XX responses. Multiple versions of the requested resource are available. Handling: process further if the program can handle it, otherwise discard.
- 301: The requested resource has been assigned a permanent URL, through which it should be accessed in the future. Handling: redirect to the assigned URL.
- 302: The requested resource temporarily resides at a different URL. Handling: redirect to the temporary URL.
- 304: The requested resource has not been modified. Handling: discard.
- 400: Bad request. Handling: discard.
- 401: Unauthorized. Handling: discard.
- 403: Forbidden. Handling: discard.
- 404: Not found. Handling: discard.
- 500: Internal server error. The server encountered an unexpected condition that prevented it from completing the request. In general, this happens when the server-side source code has a bug.
- 501: The server does not support a feature required to fulfill the request, for example when it does not recognize the request method and cannot support it for any resource.
- 502: Bad gateway. A server acting as a gateway or proxy received an invalid response from the upstream server while trying to fulfill the request.
- 503: Service unavailable. Because of temporary maintenance or overload, the server cannot currently process the request. The condition is temporary, and service resumes after some time.
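The five classes can be told apart by the first digit of the code. A small sketch of that rule, written for Python 3, is shown here. The function name status_class and the class labels are illustrative choices rather than anything from the posts above; http.HTTPStatus is the standard-library enum that supplies the usual reason phrase for each code.

```python
from http import HTTPStatus

# The five classes, keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: 'informational',
    2: 'success',
    3: 'redirection',
    4: 'client error',
    5: 'server error',
}


def status_class(code):
    """Return the class label for a 3-digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError('not an HTTP status code: %r' % code)
    # Integer division by 100 leaves only the first digit.
    return STATUS_CLASSES[code // 100]


for code in (200, 302, 404, 503):
    # HTTPStatus(code).phrase is the standard reason phrase for the code.
    print(code, status_class(code), HTTPStatus(code).phrase)
```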
An HTTPError instance carries an integer 'code' attribute, which is the error number sent by the server. Because the default handlers take care of redirects (codes in the 300 range), and codes in the 100-299 range indicate success, you will usually only see error codes in the 400-599 range.
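To look at these attributes without depending on a live server, an HTTPError can be built by hand. This is a sketch for Python 3 (urllib.error), not the urllib2 code used in these notes; the URL, the 404 code, and the body bytes are made-up values for illustration.

```python
import io
from email.message import Message
from urllib.error import HTTPError, URLError

# Build an HTTPError directly instead of triggering one with a request.
# Positional arguments: url, code, msg, hdrs, fp.
error = HTTPError(
    'http://example.com/missing',   # hypothetical URL
    404,
    'Not Found',
    Message(),                      # empty header block
    io.BytesIO(b'page body'),       # stand-in for the response body
)

print(error.code)                        # the integer status code
print(error.reason)                      # the reason phrase passed as msg
print(error.read())                      # the body of the error page
print(issubclass(HTTPError, URLError))   # HTTPError is a kind of URLError
```

Because HTTPError is a subclass of URLError, an except clause for URLError also catches HTTP errors.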
    import urllib2

    request = urllib2.Request('http://bbs.csdn.net/callmewhy')
    try:
        urllib2.urlopen(request)
    except urllib2.HTTPError as e:
        # HTTPError is the URLError subclass that carries the status code
        print(e.code)
        # print(e.reason)
        # print(e.read())
The error code is 403 and the reason is Forbidden, which means the server refused access.
Method One: use hasattr to check which attributes the exception has before handling it
    from urllib2 import Request, urlopen, URLError, HTTPError

    request = Request('http://blog.csdn.net/cqcre')
    try:
        response = urlopen(request)
    except URLError as e:
        # An HTTPError has a code; a plain URLError only has a reason
        if hasattr(e, 'code'):
            print('The server couldn\'t fulfill the request')
            print('Error code: %s' % e.code)
        elif hasattr(e, 'reason'):
            print('We failed to reach a server')
            print('Reason: %s' % e.reason)
    else:
        print('No exception was raised')
        # everything is OK
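Since HTTPError is a subclass of URLError, the same distinction can also be drawn with two except clauses instead of hasattr, as long as HTTPError is listed first. The sketch below uses Python 3 module names (urllib.request, urllib.error). The helper check, its opener parameter, and the stand-ins forbidden and offline are hypothetical and only simulate the two failure modes so that no network access is needed.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def check(url, opener=urlopen):
    """Return a short description of what happened when opening `url`."""
    try:
        opener(Request(url))
    except HTTPError as e:
        # Must come first: HTTPError is a subclass of URLError.
        return "server couldn't fulfill the request, code %d" % e.code
    except URLError as e:
        return 'failed to reach a server, reason: %s' % e.reason
    else:
        return 'no exception was raised'


def forbidden(request):
    # Stand-in opener simulating a server that answers 403.
    raise HTTPError(request.full_url, 403, 'Forbidden', None, None)


def offline(request):
    # Stand-in opener simulating a connection failure.
    raise URLError('network is unreachable')


print(check('http://blog.csdn.net/cqcre', opener=forbidden))
print(check('http://blog.csdn.net/cqcre', opener=offline))
```

If the two except clauses were swapped, the URLError clause would catch HTTP errors as well and the HTTPError clause would never run.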
Method Two:
This post is a set of study notes on crawlers. It is based mostly on the following two blog posts, with a few changes:
http://blog.csdn.net/pleasecallmewhy/article/details/8923725
http://cuiqingcai.com/961.html
Python Notes-Crawler 3