Summary of the usage of the urllib2 module in Python network programming

Last Update:2017-05-14 Source: Internet

Author: User


You may already be familiar with making URL-based HTTP requests through the urllib2 module. This article summarizes the usage of urllib2 in Python network programming and then takes a closer look at its HTTP exception handling.

I. Basic applications

    import urllib2

    url = r'http://www.baidu.com'
    html = urllib2.urlopen(url).read()
    print html

The client communicates with the server through requests and responses: the client first sends a request to the server and then receives the response that the server returns.

urllib2 provides a Request class that lets you construct a request object first and then send it through the urllib2.urlopen method.

    import urllib2

    url = r'http://www.baidu.com'
    req = urllib2.Request(url)
    html = urllib2.urlopen(req).read()
    print html

In the preceding example, the line

    req = urllib2.Request(url)

instantiates a Request object, and

    urllib2.urlopen(req)

then opens the webpage.
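To make the two steps concrete, here is a minimal sketch that builds a Request and inspects it without sending anything. The fallback import of urllib.request is an assumption added so the sketch also runs on Python 3, where urllib2 was merged into urllib.request; it is not part of the original example.

```python
try:
    import urllib2  # Python 2
except ImportError:
    # Assumption: on Python 3, urllib.request plays the role of urllib2
    import urllib.request as urllib2

url = 'http://www.baidu.com'
req = urllib2.Request(url)

# Nothing has been sent yet; the object only describes the request.
full_url = req.get_full_url()
method = req.get_method()  # GET, because no data was attached

print(full_url)
print(method)
```

Only passing req to urllib2.urlopen would actually contact the server.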

Note that when the Request object is instantiated, the url argument is required, and the remaining parameters have default values.

The most commonly used of these are data and headers. Both are often needed for websites that require you to log in before browsing.

    import urllib
    import urllib2

    url = 'http://www.baidu.com/'
    values = {'name': 'Michael Foord',
              'location': 'Northampton',
              'language': 'Python'}
    data = urllib.urlencode(values)
    req = urllib2.Request(url, data)
    response = urllib2.urlopen(req)
    the_page = response.read()
    print the_page

This example POSTs several pieces of data to Baidu. An error page comes back, which is expected: accessing Baidu does not require posting any information, and posting it anyway produces an error.

Likewise, if Baidu cannot find the requested page, an error is reported.
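The effect of the data and headers parameters can be checked without sending the request. The sketch below is only an illustration: the User-Agent value is a hypothetical placeholder, and the Python 3 fallback imports are an assumption added so the code runs on both major versions.

```python
try:
    import urllib2  # Python 2
    from urllib import urlencode
except ImportError:
    # Assumption: Python 3 equivalents of the Python 2 modules
    import urllib.request as urllib2
    from urllib.parse import urlencode

url = 'http://www.baidu.com/'
values = {'name': 'Michael Foord'}
data = urlencode(values).encode('ascii')   # Python 3 requires bytes here
headers = {'User-Agent': 'Mozilla/5.0'}    # hypothetical header value

req = urllib2.Request(url, data, headers)

# Attaching data switches the request method from GET to POST.
method = req.get_method()
# Request normalizes header names with str.capitalize(), hence 'User-agent'.
agent = req.get_header('User-agent')

print(method)
print(agent)
```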

The data above is sent with POST, but the same approach also works for GET with a slight change to the code.

Baidu runs queries through URLs of the form http://www.baidu.com/s?wd=XXX, so we need to urlencode the dictionary {'wd': 'XXX'}.

    # coding: utf-8
    import urllib
    import urllib2

    url = 'http://www.baidu.com/s'
    values = {'wd': 'Yang yanxing'}
    data = urllib.urlencode(values)
    print data
    url2 = url + '?' + data
    response = urllib2.urlopen(url2)
    the_page = response.read()
    print the_page
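The URL-building step on its own looks like this. It is a small sketch with a made-up search term; the Python 3 fallback import is an assumption and not part of the original code.

```python
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3 (assumption)

base = 'http://www.baidu.com/s'
query = urlencode({'wd': 'python urllib2'})  # hypothetical search term

# urlencode escapes the value; a space becomes '+'.
full_url = base + '?' + query
print(full_url)  # http://www.baidu.com/s?wd=python+urllib2
```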

The following example shows how to use cookies by simulating a login to Renren and then displaying the home page content. Below is the example from the documentation, followed by a version adapted to do what we want.

    import cookielib, urllib2

    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    r = opener.open("http://example.com/")

    # coding: utf-8
    import urllib2, urllib
    import cookielib

    url = r'http://www.renren.com/ajaxlogin'
    email = 'user@example.com'    # placeholder: use your own account
    password = 'your-password'    # placeholder ("pass" is a reserved word)

    # create a CookieJar container
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    # encode the data to be POSTed
    data = urllib.urlencode({"email": email, "password": password})
    r = opener.open(url, data)
    print cj

If cj prints with cookies in it, the login page was reached. Whether the login actually succeeded is not yet visible; to check, request http://www.renren.com/home.

One line in the code above needs explaining; it took me a while to understand it myself.

r = opener.open(url,data)

Why call open on the opener object instead of using urllib2.urlopen directly? With a small change, urllib2.urlopen works too. The opener is created by urllib2.build_opener, but at that point it has only been built, not installed, so urllib2 itself does not yet use its handlers. To make urllib2.urlopen behave like the opener, first "install" it with urllib2.install_opener(opener); after that, you can work through urllib2 directly.

    # coding: utf-8
    import urllib2, urllib
    import cookielib

    url = r'http://www.renren.com/ajaxlogin'
    email = 'user@example.com'    # placeholder: use your own account
    password = 'your-password'    # placeholder ("pass" is a reserved word)

    # create a CookieJar container
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    urllib2.install_opener(opener)
    # encode the data to be POSTed
    data = urllib.urlencode({"email": email, "password": password})
    # Without the urllib2.install_opener call above, you would have to write:
    # r = opener.open(url, data)
    r = urllib2.urlopen(url, data)
    html = urllib2.urlopen('http://www.renren.com/home').read()
    print html
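The relationship between building and installing an opener can be sketched without touching the network. This is only an illustration: it inspects the opener's handlers list to show that the cookie processor lives on the opener object, and the Python 3 fallback imports are an assumption added for portability.

```python
try:
    import urllib2    # Python 2
    import cookielib
except ImportError:
    # Assumption: Python 3 equivalents of the Python 2 modules
    import urllib.request as urllib2
    import http.cookiejar as cookielib

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# The cookie handling belongs to this opener object, not to the module.
has_cookie_handler = any(
    isinstance(h, urllib2.HTTPCookieProcessor) for h in opener.handlers
)
print(has_cookie_handler)

# After installing, plain urllib2.urlopen(...) calls go through `opener`
# and therefore share the same cookie jar.
urllib2.install_opener(opener)
print(len(cj))  # the jar is still empty because no request has been made
```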

Similarly, urllib2 also provides proxy-related handlers. The basic idea is the same.
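As a rough sketch of the proxy case, urllib2.ProxyHandler can be passed to build_opener in the same way as the cookie processor. The proxy address below is a hypothetical placeholder, and the Python 3 fallback import is an assumption.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 (assumption)

# Hypothetical local proxy address, for illustration only
proxy = urllib2.ProxyHandler({'http': 'http://127.0.0.1:8080'})
opener = urllib2.build_opener(proxy)

uses_proxy = any(isinstance(h, urllib2.ProxyHandler) for h in opener.handlers)
print(uses_proxy)

# opener.open(url) would now route http:// requests through the proxy;
# urllib2.install_opener(opener) would do the same for urllib2.urlopen.
```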

II. Exception handling

When urlopen() cannot handle a response, it raises a URLError exception. HTTPError is a subclass of URLError that is raised only when an HTTP URL is accessed.
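The subclass relationship matters when telling the two exceptions apart. The sketch below uses a hypothetical helper named describe and hand-built exception objects, so no network access is needed; the Python 3 fallback import is an assumption.

```python
try:
    from urllib2 import URLError, HTTPError       # Python 2
except ImportError:
    from urllib.error import URLError, HTTPError  # Python 3 (assumption)


def describe(exc):
    """Hypothetical helper: classify an exception raised by urlopen."""
    # Test the subclass first, because every HTTPError is also a URLError.
    if isinstance(exc, HTTPError):
        return 'http error %d' % exc.code
    if isinstance(exc, URLError):
        return 'url error: %s' % exc.reason
    return 'other'


is_subclass = issubclass(HTTPError, URLError)
print(is_subclass)

# Exceptions constructed by hand purely for illustration
http_exc = HTTPError('http://example.com/', 404, 'Not Found', None, None)
url_exc = URLError('no route to host')
print(describe(http_exc))
print(describe(url_exc))
```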

1. URLError exception

Generally, URLError is raised because there is no network connection (no route to the target server) or because the target server does not exist. In this case, the exception object has a reason attribute, typically a tuple of (error code, error message).

    #!/usr/bin/env python
    # coding=utf-8
    import urllib2

    url = "http://www.baidu.com/"
    try:
        response = urllib2.urlopen(url)
    except urllib2.URLError, e:
        print e.reason
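Because www.baidu.com is normally reachable, the example above rarely triggers the exception. One way to see a URLError without depending on the network is to connect to a local port that nothing listens on. This is a sketch under that assumption; the port-probing trick, the empty ProxyHandler (used to bypass any system proxy), and the Python 3 fallback imports are all additions for illustration.

```python
import socket

try:
    import urllib2                    # Python 2
    from urllib2 import URLError
except ImportError:
    # Assumption: Python 3 equivalents of the Python 2 modules
    import urllib.request as urllib2
    from urllib.error import URLError

# Find a local port with no listener: bind, read the number, then close.
probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
probe.bind(('127.0.0.1', 0))
port = probe.getsockname()[1]
probe.close()

# An empty ProxyHandler keeps system proxy settings out of the demo.
opener = urllib2.build_opener(urllib2.ProxyHandler({}))

reason = None
try:
    opener.open('http://127.0.0.1:%d/' % port, timeout=5)
except URLError as e:
    # The connection is refused, so urlopen never receives a response.
    reason = e.reason

print(repr(reason))
```

The exact form of reason depends on the Python version and platform, so it is printed here rather than compared with a fixed value.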

2. HTTPError
Each HTTP response returned from the server has a status code. Some status codes indicate that the server cannot fulfill the request. The default handlers deal with some of these for us (for example, when the response is a redirect, urllib2 automatically retrieves the content from the redirected page). For status codes that urllib2 cannot handle, urlopen raises an HTTPError exception; typical examples are 404 and 401.
An HTTPError instance has an integer code attribute, which is the error status code returned by the server.
The default handlers of urllib2 process redirects (status codes in the 300 range), and status codes in the 100-299 range indicate success. Therefore, the status codes that cause an HTTPError exception are in the 400-599 range.
When an error occurs, the server returns an HTTP error code and an error page. You can use the HTTPError instance as the response for that page, which means that besides the code attribute it also has methods such as read, geturl, and info.

    #!/usr/bin/env python
    # coding=utf-8
    import urllib2

    url = "http://cs.scu.edu.cn/~duanlei"
    try:
        response = urllib2.urlopen(url)
    except urllib2.HTTPError, e:
        print e.code
        print e.read()
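To try this without relying on an external site, the sketch below starts a throwaway local server that always answers 404 and then reads the code and body from the HTTPError. The NotFoundHandler class, the local server, and the Python 3 fallback imports are all assumptions made for the demonstration and are not part of the original example.

```python
import threading

try:
    import urllib2                                  # Python 2
    from urllib2 import HTTPError
    from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
except ImportError:
    # Assumption: Python 3 equivalents of the Python 2 modules
    import urllib.request as urllib2
    from urllib.error import HTTPError
    from http.server import HTTPServer, BaseHTTPRequestHandler


class NotFoundHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a 404 page."""

    def do_GET(self):
        body = b'page not found'
        self.send_response(404)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default request logging


# Port 0 asks the operating system for any free port.
server = HTTPServer(('127.0.0.1', 0), NotFoundHandler)
thread = threading.Thread(target=server.serve_forever)
thread.daemon = True
thread.start()

url = 'http://127.0.0.1:%d/missing' % server.server_address[1]
# An empty ProxyHandler keeps system proxy settings out of the demo.
opener = urllib2.build_opener(urllib2.ProxyHandler({}))

code, body = None, None
try:
    opener.open(url)
except HTTPError as e:
    # The exception doubles as the response object for the error page.
    code, body = e.code, e.read()
finally:
    server.shutdown()
    server.server_close()

print(code)
print(body)
```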

3. Summary
To handle both URLError and HTTPError in your code, there are two approaches:

    #!/usr/bin/env python
    # coding=utf-8
    import urllib2

    url = "xxxxxx"  # URL
    try:
        response = urllib2.urlopen(url)
    except urllib2.HTTPError, e:  # HTTPError must be placed before URLError
        print "The server couldn't fulfill the request"
        print "Error code:", e.code
        print "Return content:", e.read()
    except urllib2.URLError, e:
        print "Failed to reach the server"
        print "The reason:", e.reason
    else:
        # no exception was raised; do whatever you need with the response
        pass

    #!/usr/bin/env python
    # coding=utf-8
    import urllib2

    url = "http://xxx"  # URL
    try:
        response = urllib2.urlopen(url)
    except urllib2.URLError, e:
        # Check code first: an HTTPError may also expose a reason attribute
        if hasattr(e, "code"):
            print "The server couldn't fulfill the request"
            print "Error code:", e.code
            print "Return content:", e.read()
        elif hasattr(e, "reason"):
            print "Failed to reach the server"
            print "The reason:", e.reason
        else:
            pass  # handle other exceptions

In comparison, the second exception handling method is better.

This article is an English version of an article originally published in Chinese on aliyun.com.
