You may already be familiar with making URL-based HTTP requests using the urllib2 module. Here we take a closer look at urllib2's HTTP exception handling, starting with a summary of how the urllib2 module is used in Python network programming:
I. Basic applications
    import urllib2

    url = r'http://www.baidu.com'
    html = urllib2.urlopen(url).read()
    print html
The client communicates with the server through requests and responses. The client first sends a request to the server and then receives the response the server returns.
urllib2 provides a Request class: you construct a Request object first and then send it with the urllib2.urlopen method.
    import urllib2

    url = r'http://www.baidu.com'
    req = urllib2.Request(url)
    html = urllib2.urlopen(req).read()
    print html
In the preceding example,
    req = urllib2.Request(url)
instantiates a Request object, and
    urllib2.urlopen(req)
then opens the page.
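As a small sketch of the point above, a Request object can be built and inspected without sending anything over the network. The try/except import is an assumption added here so that the snippet also runs on Python 3, where urllib2 became urllib.request; the article itself targets Python 2.

```python
try:
    import urllib2                    # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2  # assumption: fallback for Python 3

url = 'http://www.baidu.com'
req = urllib2.Request(url)            # nothing is sent yet

full_url = req.get_full_url()         # the URL the request will be sent to
method = req.get_method()             # 'GET', because no data was attached
print(full_url)
print(method)
```

Only urllib2.urlopen(req) would actually contact the server; until then the Request is just a description of what to send.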
Note that when a Request object is instantiated, the url argument is required, and there are several other parameters with default values.
Of these, data and headers are the most commonly used. Both are often needed for websites that require a login before you can browse them.
    import urllib
    import urllib2

    url = 'http://www.baidu.com/'
    values = {'name': 'Michael Foord',
              'location': 'Northampton',
              'language': 'Python'}
    data = urllib.urlencode(values)   # encode the dict as form data
    req = urllib2.Request(url, data)  # passing data makes this a POST
    response = urllib2.urlopen(req)
    the_page = response.read()
    print the_page
This example POSTs a few form fields to Baidu. An error page comes back, which is expected: visiting Baidu does not require posting anything, so the POST itself is what goes wrong.
Likewise, if Baidu cannot find the requested page, an error is reported.
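The example above passes only data. The following sketch shows both data and headers being attached to a Request, again without sending it. The User-Agent string is a made-up value for illustration, and the import fallback is an assumption so the snippet also runs on Python 3.

```python
try:
    import urllib2
    from urllib import urlencode
except ImportError:                    # assumption: fallback for Python 3
    import urllib.request as urllib2
    from urllib.parse import urlencode

url = 'http://www.baidu.com/'
values = {'name': 'Michael Foord', 'location': 'Northampton'}
# .encode() is a no-op for ASCII text on Python 2 and gives bytes on Python 3
data = urlencode(values).encode('ascii')

# hypothetical header value, for illustration only
headers = {'User-Agent': 'Mozilla/5.0 (compatible; example-client)'}

req = urllib2.Request(url, data, headers)
method = req.get_method()              # 'POST', because data is attached
# Request stores header names capitalized, so the lookup key is 'User-agent'
agent = req.get_header('User-agent')
print(method)
print(agent)
```

Attaching data is what switches the request from GET to POST; the headers dictionary is sent along with it unchanged.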
Of course, that was a POST. The same data can also be sent with the GET method by slightly modifying the code above.
Baidu runs queries through http://www.baidu.com/s?wd=XXX, so we need to urlencode the dictionary {'wd': 'XXX'}.
    # coding: utf-8
    import urllib
    import urllib2

    url = 'http://www.baidu.com/s'
    values = {'wd': 'Yang yanxing'}
    data = urllib.urlencode(values)
    print data
    url2 = url + '?' + data   # append the encoded query string for a GET
    response = urllib2.urlopen(url2)
    the_page = response.read()
    print the_page
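To see why the dictionary has to go through urlencode, here is a minimal sketch of what the encoding does to spaces and reserved characters. The search term is an arbitrary example, and the import fallback is an assumption for running on Python 3.

```python
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # assumption: fallback for Python 3

base = 'http://www.baidu.com/s'
# arbitrary search term containing a space and an ampersand
query = urlencode({'wd': 'python urllib2 a&b'})
full_url = base + '?' + query

print(query)      # wd=python+urllib2+a%26b
print(full_url)
```

Spaces become "+" and the "&" inside the value becomes "%26", so it cannot be mistaken for a separator between query parameters.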
The following example shows how to use cookies by simulating a login to Renren and then displaying the homepage. We start from the example in the Python documentation and adapt it to do what we want.
    import cookielib, urllib2

    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    r = opener.open("http://example.com/")

Adapted for the Renren login:

    # coding: utf-8
    import urllib2, urllib
    import cookielib

    url = r'http://www.renren.com/ajaxlogin'
    email = 'xxx'     # placeholder: your account email
    password = 'xxx'  # placeholder ("pass" is a reserved word in Python)

    # create a cookie container cj
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    # encode the data to be POSTed
    data = urllib.urlencode({"email": email, "password": password})
    r = opener.open(url, data)
    print cj
If cj prints its cookies, the login page was reached. Whether the login actually succeeded is not yet visible; to check, request http://www.renren.com/home.
The code above needs some explanation; it took me a long time to understand it myself.
r = opener.open(url,data)
Why call open on the opener object instead of calling urllib2.urlopen? With a small change we can use urllib2.urlopen as well. The opener is created by urllib2.build_opener, but you can think of it this way: building an opener does not install it, so urllib2.urlopen still knows nothing about it or its cookie handler. To make urllib2.urlopen use the opener, first "install" it with urllib2.install_opener(opener). After installation, you can work through urllib2.urlopen directly.
    # coding: utf-8
    import urllib2, urllib
    import cookielib

    url = r'http://www.renren.com/ajaxlogin'
    email = 'xxx'     # placeholder: your account email
    password = 'xxx'  # placeholder ("pass" is a reserved word in Python)

    # create a cookie container cj
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    urllib2.install_opener(opener)
    # encode the data to be POSTed
    data = urllib.urlencode({"email": email, "password": password})
    # r = opener.open(url, data)
    # Without the urllib2.install_opener call above, you would have to
    # use opener.open as in the commented-out line.
    r = urllib2.urlopen(url, data)
    html = urllib2.urlopen('http://www.renren.com/home').read()
    print html
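The difference between building and installing an opener can be sketched without touching the network. The snippet below only builds the opener, checks that the cookie handler is part of it, and installs it. The import fallbacks are an assumption so that it also runs on Python 3, where cookielib is named http.cookiejar.

```python
try:
    import urllib2
    import cookielib
except ImportError:                      # assumption: fallback for Python 3
    import urllib.request as urllib2
    import http.cookiejar as cookielib

cj = cookielib.CookieJar()               # empty cookie container
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# The opener carries our cookie handler next to the default handlers.
has_cookie_handler = any(
    isinstance(h, urllib2.HTTPCookieProcessor) for h in opener.handlers
)
print(has_cookie_handler)

# From here on, urllib2.urlopen() is routed through this opener.
urllib2.install_opener(opener)
print(len(cj))                           # 0: no response has set a cookie yet
```

Cookies set by later responses end up in cj, whether the request is made through opener.open or, after installation, through urllib2.urlopen.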
Similarly, urllib2 also has proxy-related handlers. The basic idea is the same as above.
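A rough sketch of the proxy case, following the same build-an-opener pattern. The proxy address 127.0.0.1:8080 is a made-up placeholder, not something from this article, and no request is sent. The import fallback is an assumption for Python 3.

```python
try:
    import urllib2
except ImportError:                    # assumption: fallback for Python 3
    import urllib.request as urllib2

# hypothetical proxy address, for illustration only
proxy = urllib2.ProxyHandler({'http': 'http://127.0.0.1:8080'})
opener = urllib2.build_opener(proxy)

uses_proxy = any(isinstance(h, urllib2.ProxyHandler) for h in opener.handlers)
print(uses_proxy)
print(proxy.proxies)

# opener.open(url) would go through the proxy; calling
# urllib2.install_opener(opener) would make urllib2.urlopen do the same.
```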
II. Exception handling
When urlopen () cannot handle the response, the URLError exception occurs. An HTTPError exception is a subclass of URLError. it is caused only when an http url is accessed.
1. URLError exception
URLError is usually raised because there is no network connection (no route to the target server) or because the target server does not exist. In this case the exception object has a reason attribute, a tuple of (error code, error message).
    #!/usr/bin/env python
    # coding=utf-8
    import urllib2

    url = "http://www.baidu.com/"
    try:
        response = urllib2.urlopen(url)
    except urllib2.URLError, e:
        print e.reason
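The relationship between the two exception classes, and the reason attribute, can be checked offline. In this sketch the URLError is raised by hand with a made-up reason string rather than by a real network failure. The import fallback is an assumption for Python 3.

```python
try:
    import urllib2
except ImportError:                    # assumption: fallback for Python 3
    import urllib.request as urllib2

# HTTPError is a subclass of URLError, so "except URLError" catches both.
is_subclass = issubclass(urllib2.HTTPError, urllib2.URLError)
print(is_subclass)

try:
    # simulated failure; a real one would be raised by urllib2.urlopen()
    raise urllib2.URLError('simulated: no route to host')
except urllib2.URLError as e:
    reason = e.reason                  # whatever was passed as the reason
    print(reason)
```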
2. HTTPError
Every HTTP response from the server carries a status code. Some status codes mean that the server could not fulfill the request. The default handlers take care of some of these for us (for example, if the response is a redirect, urllib2 automatically fetches the redirected page). For the status codes that urllib2 cannot handle, urlopen raises an HTTPError; typical examples are 404 and 401.
An HTTPError instance has an integer code attribute that holds the error status code returned by the server.
The default handlers of the urllib2 module handle redirects (status codes in the 300 range), and status codes in the 100-299 range indicate success. Therefore, the status codes that can cause an HTTPError exception are in the 400-599 range.
When an error occurs, the server returns an HTTP error code together with an error page. The HTTPError instance can be used as the response for that page: besides the code attribute, it also has methods such as read, geturl, and info.
    #!/usr/bin/env python
    # coding=utf-8
    import urllib2

    url = "http://cs.scu.edu.cn/~duanlei"
    try:
        response = urllib2.urlopen(url)
    except urllib2.HTTPError, e:
        print e.code
        print e.read()
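To illustrate that an HTTPError behaves like a response object, the sketch below constructs one by hand instead of waiting for a server to return an error. The URL, the 404 code and the page body are all invented for the example. The import fallback is an assumption for Python 3.

```python
import io

try:
    import urllib2
except ImportError:                    # assumption: fallback for Python 3
    import urllib.request as urllib2

url = 'http://example.invalid/missing'           # made-up URL
fake_page = io.BytesIO(b'<html>not found</html>')  # made-up error page

try:
    # arguments: url, status code, message, headers, body file object
    raise urllib2.HTTPError(url, 404, 'Not Found', {}, fake_page)
except urllib2.HTTPError as e:
    code = e.code              # integer status code
    body = e.read()            # the error page, read like a normal response
    error_url = e.geturl()     # the URL that produced the error
    print(code)
    print(body)
    print(error_url)
```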
3. Summary
To handle both URLError and HTTPError in your code, you can write it in either of the following two ways:
Method 1:

    #!/usr/bin/env python
    # coding=utf-8
    import urllib2

    url = "xxxxxx"  # URL
    try:
        response = urllib2.urlopen(url)
    except urllib2.HTTPError, e:  # HTTPError must be placed before URLError
        print "The server couldn't fulfill the request"
        print "Error code:", e.code
        print "Return content:", e.read()
    except urllib2.URLError, e:
        print "Failed to reach the server"
        print "The reason:", e.reason
    else:
        # something you should do (runs only if no exception was raised)
        pass

Method 2:

    #!/usr/bin/env python
    # coding=utf-8
    import urllib2

    url = "http://xxx"  # URL
    try:
        response = urllib2.urlopen(url)
    except urllib2.URLError, e:
        # Check "code" first: in later Python 2.7 releases HTTPError also
        # has a reason attribute, so testing "reason" first would hide it.
        if hasattr(e, "code"):
            print "The server couldn't fulfill the request"
            print "Error code:", e.code
            print "Return content:", e.read()
        elif hasattr(e, "reason"):
            print "Failed to reach the server"
            print "The reason:", e.reason
        else:
            pass  # handle other exceptions
In comparison, the second exception handling method is better.
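The handling pattern can also be wrapped in a small helper. This is only a sketch: the function name fetch and its open_func parameter are hypothetical, introduced so that the error paths can be exercised with simulated failures instead of a real server. The import fallback is an assumption for Python 3.

```python
import io

try:
    import urllib2
except ImportError:                    # assumption: fallback for Python 3
    import urllib.request as urllib2


def fetch(url, open_func=None):
    """Return a (status, payload) pair instead of raising.

    open_func is a hypothetical hook that defaults to urllib2.urlopen.
    """
    if open_func is None:
        open_func = urllib2.urlopen
    try:
        response = open_func(url)
    except urllib2.HTTPError as e:     # must come before URLError
        return ('http_error', e.code)
    except urllib2.URLError as e:
        return ('url_error', e.reason)
    else:
        return ('ok', response.read())


# Simulated failures, so the example runs without network access.
def fake_unreachable(url):
    raise urllib2.URLError('simulated: no route to host')


def fake_not_found(url):
    raise urllib2.HTTPError(url, 404, 'Not Found', {}, io.BytesIO(b''))


print(fetch('http://example.invalid/', fake_unreachable))
print(fetch('http://example.invalid/', fake_not_found))
```

Calling fetch(url) with no second argument would use urllib2.urlopen and return ('ok', page_content) on success.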