The Urllib2 module in Python

Source: Internet
Author: User
Tags http request header parameters object object urlencode

With Python's Urllib2 module, you can easily simulate the behavior of a user accessing a webpage.

This is a simple record of your own learning process.


First, Urlopen function


Urlopen (URL, data=none)--Basic usage is the same as original
Urllib. Pass the URL and optionally data to post to an HTTP URL, and
Get a File-like object back. One difference is so you can also pass
A Request instance instead of URL. Raises a urlerror (subclass of
IOERROR); For HTTP errors, raises a httperror, which can also be
Treated as a valid response.

Its basic usage is the same as the usage in the Urllib library. The notes for Urlopen in Urllib are as follows:


Urlopen (URL, data=none, Proxies=none)
Create a File-like object for the specified URL to read from.


But unlike Urllib, the first parameter URL of the Urlopen function in URLLIB2 can be a request instance.


1. Basic usage

Example:

Usage of the Urlopen function in the #等同urllib in [a]: Response = Urllib2.urlopen (' http://www.baidu.com ') in []: Response.read () # URLLIB2 Usage of the request instance in []: request = Urllib2. Request (' http://www.baidu.com ') in [+]: Response = Urllib2.urlopen (Request) in [+]: Response.read ()

I still like the second way of use here. After all, an HTTP request must be requested first, before the response can exist. In this way, the idea of programming is quite clear. The code reads clearly as well.


2. Analog POST Request

All of the above-simulated requests are all get-way requests, so what if we need to simulate a post-mode request?

Check the Help for request (URLLIB2. Request), its __init__ constructor is declared like this

__init__ (self, URL, Data=none, headers={}, Origin_req_host=none, Unverifiable=false)

From the declaration, the post data can be put into data, and we can also set the HTTP request header parameters via headers


Example:

Import Urllibimport Urllib2 values = {}values[' username '] = "God" values[' password '] = "XXXX" data = Urllib.urlencode (value s) # uses the UrlEncode method URL in the Urllib library = "Http://xxxx.xxxxx/login" request = Urllib2. Request (url,data) response = Urllib2.urlopen (request) print response.read ()


You can change your URLs, username and password for specific scenarios.


3. Set the HTTP request header

Try the headers parameter to modify some information about the HTTP request header. Make a slight change in the previous example

Import Urllibimport Urllib2 values = {}values[' username '] = "God" values[' password '] = "XXXX" data = Urllib.urlencode (value s) url = "Http://xxxx.xxxxx/login" headers = {' user-agent ': ' ozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:37.0) gecko/20100101 firefox/37.0 ', ' content-type ': ' text/html; Charset=utf-8 ', ' Referer ': ' http://www.baidu.com/'}request = urllib2. Request (url,data,headers) response = Urllib2.urlopen (request) print response.read ()


More header information can be found through the F12 function provided by the browser.


4. Set Request Timeout

Many times there are various reasons that may lead to your request for various waits. It's time to test your patience, but this can be done by setting a timeout in the urlopen to get rid of the long-time, unresponsive requests we can't tolerate.

Urlopen (URL, data=none, Timeout=<object object>)

One thing to note when using timeout is that if you do not have data, then you must show the pass parameters.

Example:

Import urllib2urllib2.urlopen (' http://www.baidu.com ', data,10) urllib2.urlopen (' http://www.baidu.com ', timeout=10)


Second, opener (Openerdirector)


The Openerdirector manages a collection of Handler objects that does
All the actual work. Each Handler implements a particular protocol or
Option. The Openerdirector is a composite object that invokes the
Handlers needed to open the requested URL. For example, the
HttpHandler performs HTTP GET and POST requests and deals with
Non-error returns. The Httpredirecthandler automatically deals with
HTTP 301, 302, 303 and 307 redirect errors, and the Httpdigestauthhandler
Deals with Digest authentication


What do you use it for? Manages a series of handler objects. I understand that, in fact, when we use Urlopen, there is already a default handler. It's only transparent to us. We can use this handler to make get/post requests, but what if we want to do something else? If we want to set up an agent to do something and so on all non-get/post can handle it well. Then we need to change the handler. This is the time to use opener, which is what opener can do.


1, set up the agent

Import Urllib2proxy_handler = Urllib2. Proxyhandler ({"http": ' http://11.11.11.11:8080 '}) opener = Urllib2.build_opener (Proxy_handler) Urllib2.install_ Opener (opener) response = Urllib2.urlopen (' http://xxx.xxx.xxxx ') response.read ()


2. Open the Debug log function for HTTP and HTTPS

Import Urllib2httphandler = Urllib2. HttpHandler (debuglevel=1) Httpshandler = Urllib2. Httpshandler (debuglevel=1) opener = Urllib2.build_opener (HttpHandler, Httpshandler) Urllib2.install_opener (opener) Response = Urllib2.urlopen (' http://www.baidu.com ')

3. Processing cookie information in conjunction with Cookielib

First of all, a simple look at cookielib this module, the function is very powerful. It's better to study carefully.

Here we only study opener related, temporarily skip Cookielib module

Import Urllib2import Cookielibcookie = Cookielib. Cookiejar () cookiehandler=urllib2. Httpcookieprocessor (cookie) opener = Urllib2.build_opener (Cookiehandler) Urllib2.install_opener (opener) Response = Urllib2.urlopen (' http://www.baidu.com ') for item in cookie:print ' CookieName = ' +item.name print ' cookievalue = ' +it Em.value


Third, exception handling Urlerror and Httperror

Httperror is a sub-class of Urlerror


Urlerror
Httperror (Urlerror, Urllib.addinfourl)

Import Urllib2 req = Urllib2. Request (' Http://www.baidu.com/mmmaa ') try:urllib2.urlopen (req) except URLLIB2. Httperror, E:if hasattr (E, "code"): Print e.codeexcept urllib2. Urlerror, E:if hasattr (E, "Reason"): Print E.reasonelse:print "OK"


This article is from the "Learning Notes" blog, so be sure to keep this source http://unixman.blog.51cto.com/10163040/1654727

The Urllib2 module in Python

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.