The urllib2 Module in Python

Last updated: 2015-05-24 Source: Internet


Tags: python urllib2 openurl opener timeout

Python's urllib2 module makes it easy to simulate a user visiting web pages.

These are brief notes from my own learning process.

I. The urlopen function

    urlopen(url, data=None) -- Basic usage is the same as original
    urllib. pass the url and optionally data to post to an HTTP URL, and
    get a file-like object back. One difference is that you can also pass
    a Request instance instead of URL. Raises a URLError (subclass of
    IOError); for HTTP errors, raises an HTTPError, which can also be
    treated as a valid response.

Its basic usage is the same as in the urllib library. The docstring for urlopen in urllib reads:

urlopen(url, data=None, proxies=None)
Create a file-like object for the specified URL to read from.

Unlike urllib, however, the first argument (url) of urlopen in urllib2 can also be a Request instance.

1. Basic usage

Example:

    import urllib2

    # Same usage as urlopen in urllib
    response = urllib2.urlopen('http://www.baidu.com')
    response.read()

    # Using a Request instance with urllib2
    request = urllib2.Request('http://www.baidu.com')
    response = urllib2.urlopen(request)
    response.read()

I still much prefer the second style. After all, an HTTP exchange starts with a request, and only then can there be a response. Writing it this way keeps the programming logic clear, and the code is easy to read.
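One practical benefit of the Request style is that the request can be inspected before anything is sent. The following is a minimal sketch of that idea, not part of the original notes. The `describe` helper is a hypothetical name, and the try/except import is an assumption added so the sketch also runs on Python 3, where urllib2 was split into urllib.request and urllib.error.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 (compatibility shim, an assumption)


def describe(request):
    """Return (method, url) for a Request object without sending it."""
    return request.get_method(), request.get_full_url()


request = urllib2.Request('http://www.baidu.com')
method, url = describe(request)
print(method + ' ' + url)
```

No network access happens here; only `urllib2.urlopen(request)` would actually send the request.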

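2. Simulating a POST request

All of the requests simulated above are GET requests. What if we need to simulate a POST request?

Looking at help(urllib2.Request), we find that its __init__ constructor is declared as follows: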

__init__(self, url, data=None, headers={}, origin_req_host=None, unverifiable=False)

From the declaration, the POST data can be passed in data, and HTTP request headers can be set through headers.

Example:

    import urllib
    import urllib2

    values = {}
    values['username'] = "God"
    values['password'] = "XXXX"
    data = urllib.urlencode(values)  # uses the urlencode function from the urllib library
    url = "http://xxxx.xxxxx/login"
    request = urllib2.Request(url, data)
    response = urllib2.urlopen(request)
    print(response.read())

Replace url, username, and password to suit your own scenario.

3. Setting HTTP request headers

Next, let's use the headers parameter to modify some of the HTTP request header information, with a small change to the previous example:
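It may help to see that passing data is what turns the request into a POST. The sketch below builds the request without sending it and checks the method. The `build_login_request` helper and the `example.invalid` host are hypothetical, and the import shim and the bytes conversion are assumptions added so the sketch also runs on Python 3.

```python
try:
    from urllib import urlencode            # Python 2
    import urllib2
except ImportError:
    from urllib.parse import urlencode      # Python 3
    import urllib.request as urllib2


def build_login_request(url, username, password):
    """Build (but do not send) a form-encoded POST request."""
    # A list of pairs keeps the field order predictable.
    body = urlencode([('username', username), ('password', password)])
    if not isinstance(body, bytes):
        body = body.encode('ascii')          # Python 3 requires a bytes body
    return urllib2.Request(url, body)


get_request = urllib2.Request('http://example.invalid/login')
post_request = build_login_request('http://example.invalid/login', 'God', 'XXXX')

print(get_request.get_method())    # no data, so the method is GET
print(post_request.get_method())   # data present, so the method is POST
```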

    import urllib
    import urllib2

    values = {}
    values['username'] = "God"
    values['password'] = "XXXX"
    data = urllib.urlencode(values)
    url = "http://xxxx.xxxxx/login"
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:37.0) Gecko/20100101 Firefox/37.0',
        # The body is urlencoded form data, so declare it as such
        'Content-Type': 'application/x-www-form-urlencoded',
        'Referer': 'http://www.baidu.com/',
    }
    request = urllib2.Request(url, data, headers)
    response = urllib2.urlopen(request)
    print(response.read())

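You can find more header fields using the browser's developer tools (F12).

4. Setting a request timeout

Often, for all sorts of reasons, a request may just hang. Rather than testing your patience, you can set the timeout in urlopen to kill requests that go unanswered for longer than you are willing to wait.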
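One detail worth knowing when reading headers back from a Request: the names are stored in capitalized form (first letter upper case, the rest lower case), so `User-Agent` becomes `User-agent`. The sketch below shows this without sending anything. The `example.invalid` host is a placeholder, and the import shim is an assumption added for Python 3.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 (compatibility shim, an assumption)

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:37.0) Gecko/20100101 Firefox/37.0',
    'Referer': 'http://www.baidu.com/',
}
# Placeholder URL; the request is only built, never sent.
request = urllib2.Request('http://example.invalid/login', headers=headers)

# Look headers up using the capitalized form of the name.
user_agent = request.get_header('User-agent')
referer = request.get_header('Referer')

print(user_agent)
print(sorted(request.header_items()))
```

Looking up `request.get_header('User-Agent')` with the original spelling returns None, which is an easy mistake to make when checking what will be sent.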

urlopen(url, data=None, timeout=<object object>)

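One thing to note when using timeout: if you have no data to send, you must pass timeout explicitly as a keyword argument, because the second positional argument is data.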

Example:

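    import urllib2

    data = None  # or a urlencoded string for a POST request
    urllib2.urlopen('http://www.baidu.com', data, 10)    # timeout as the third positional argument
    urllib2.urlopen('http://www.baidu.com', timeout=10)  # no data: pass timeout by keyword

II. opener (OpenerDirector)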
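To see the timeout take effect without depending on a slow website, one option is to point urlopen at a local socket that accepts connections but never answers. This is only a sketch under that assumption. The `fetch_with_timeout` helper is a hypothetical name, the 0.5 second limit is arbitrary, and the import shim is added for Python 3.

```python
import socket

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 (compatibility shim, an assumption)


def fetch_with_timeout(url, seconds):
    """Return the response body, or None if the request fails or times out."""
    try:
        return urllib2.urlopen(url, timeout=seconds).read()
    except (urllib2.URLError, socket.timeout):
        # Connection problems are wrapped in URLError; a timeout while
        # waiting for the response can surface as socket.timeout.
        return None


# A local listener that never replies, standing in for an unresponsive server.
silent = socket.socket()
silent.bind(('127.0.0.1', 0))
silent.listen(1)
url = 'http://127.0.0.1:%d/' % silent.getsockname()[1]

result = fetch_with_timeout(url, 0.5)
silent.close()
print(result)
```

Catching both exception types is deliberate: depending on where the wait happens, the timeout may or may not be wrapped in URLError.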

    The OpenerDirector manages a collection of Handler objects that do
    all the actual work. Each Handler implements a particular protocol or
    option. The OpenerDirector is a composite object that invokes the
    Handlers needed to open the requested URL. For example, the
    HTTPHandler performs HTTP GET and POST requests and deals with
    non-error returns. The HTTPRedirectHandler automatically deals with
    HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler
    deals with digest authentication

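What is it for? It manages a collection of handler objects. The way I understand it, a default set of handlers already exists whenever we call urlopen; it is simply transparent to us. Those defaults cover GET/POST requests, but what if we want to do something else, such as routing requests through a proxy, that plain GET/POST handling does not cover? Then we need to swap in our own handlers, and that is exactly what an opener is for.

1. Setting a proxy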

    import urllib2

    proxy_handler = urllib2.ProxyHandler({"http": 'http://11.11.11.11:8080'})
    opener = urllib2.build_opener(proxy_handler)
    urllib2.install_opener(opener)
    response = urllib2.urlopen('http://xxx.xxx.xxxx')
    response.read()

2. Enabling debug logging for HTTP and HTTPS
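install_opener changes the opener used by every later urlopen call in the process. If the proxy should apply only to some requests, an alternative is to keep the opener in a variable and call its open method directly. The sketch below builds such an opener and checks its configuration without sending anything. The `make_proxy_opener` helper is a hypothetical name, the check reads the undocumented `handlers` attribute, and the import shim is an assumption added for Python 3.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 (compatibility shim, an assumption)


def make_proxy_opener(proxy_url):
    """Return an opener that routes http requests through proxy_url."""
    handler = urllib2.ProxyHandler({'http': proxy_url})
    return urllib2.build_opener(handler)


opener = make_proxy_opener('http://11.11.11.11:8080')

# Our handler replaces the default ProxyHandler instead of being added next to it.
proxy_handlers = [h for h in opener.handlers if isinstance(h, urllib2.ProxyHandler)]
print(proxy_handlers[0].proxies)

# opener.open(some_url) would now go through the proxy, while
# urllib2.urlopen(some_url) keeps using the global default opener.
```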

    import urllib2

    httpHandler = urllib2.HTTPHandler(debuglevel=1)
    httpsHandler = urllib2.HTTPSHandler(debuglevel=1)
    opener = urllib2.build_opener(httpHandler, httpsHandler)
    urllib2.install_opener(opener)
    response = urllib2.urlopen('http://www.baidu.com')

3. Handling cookies with cookielib

First, get a basic understanding of the cookielib module. It is quite powerful and worth studying carefully.

Here we only look at the opener-related part and skip the cookielib module itself for now.

    import urllib2
    import cookielib

    cookie = cookielib.CookieJar()
    cookieHandler = urllib2.HTTPCookieProcessor(cookie)
    opener = urllib2.build_opener(cookieHandler)
    urllib2.install_opener(opener)
    response = urllib2.urlopen('http://www.baidu.com')
    for item in cookie:
        print('CookieName = ' + item.name)
        print('CookieValue = ' + item.value)

III. Exception handling: URLError and HTTPError

HTTPError is a subclass of URLError.
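The same cookie flow can be tried without relying on an external site by running a throwaway HTTP server on the local machine. The sketch below is only an illustration under that assumption. The `CookieSetter` class and the `session=abc123` cookie are made up for the example, the empty ProxyHandler is there to bypass any system proxy for the local request, and the import shim is added for Python 3, where cookielib is named http.cookiejar.

```python
import threading

try:
    import urllib2                                     # Python 2
    import cookielib
    from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
except ImportError:
    import urllib.request as urllib2                   # Python 3
    import http.cookiejar as cookielib
    from http.server import HTTPServer, BaseHTTPRequestHandler


class CookieSetter(BaseHTTPRequestHandler):
    """Hypothetical test server: answers every GET with one cookie."""

    def do_GET(self):
        self.send_response(200)
        self.send_header('Set-Cookie', 'session=abc123')
        self.send_header('Content-Length', '0')
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(('127.0.0.1', 0), CookieSetter)   # port 0: pick a free port
worker = threading.Thread(target=server.handle_request)  # serve exactly one request
worker.start()

jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar),
                              urllib2.ProxyHandler({}))
opener.open('http://127.0.0.1:%d/' % server.server_port).close()

worker.join()
server.server_close()

cookies = dict((c.name, c.value) for c in jar)
print(cookies)
```

Because the jar is attached to the opener, any further request made through the same opener would send the stored cookie back automatically.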

URLError
HTTPError(URLError, urllib.addinfourl)

    import urllib2

    req = urllib2.Request('http://www.baidu.com/mmmaa')
    try:
        urllib2.urlopen(req)
    except urllib2.HTTPError as e:   # catch the subclass first
        if hasattr(e, "code"):
            print(e.code)
    except urllib2.URLError as e:
        if hasattr(e, "reason"):
            print(e.reason)
    else:
        print("OK")

This article comes from the “學習筆記” (Study Notes) blog; please retain this source: http://unixman.blog.51cto.com/10163040/1654727
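The order of the except clauses matters because every HTTPError is also a URLError; if URLError were caught first, the HTTPError branch would never run. The sketch below shows the relationship using exception objects constructed by hand, so no network is needed. The `describe_failure` helper and the error values are hypothetical, and the import shim is an assumption added for Python 3.

```python
import io

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 (compatibility shim, an assumption)


def describe_failure(exc):
    """Summarize an exception raised by urlopen."""
    # Test for the subclass first, for the same reason as in the except chain.
    if isinstance(exc, urllib2.HTTPError):
        return 'HTTP error %d' % exc.code
    if isinstance(exc, urllib2.URLError):
        return 'URL error: %s' % exc.reason
    return 'unexpected: %r' % (exc,)


# Hand-built exceptions standing in for what urlopen would raise.
not_found = urllib2.HTTPError('http://www.baidu.com/mmmaa', 404,
                              'Not Found', {}, io.BytesIO())
unreachable = urllib2.URLError('connection refused')

print(issubclass(urllib2.HTTPError, urllib2.URLError))
print(describe_failure(not_found))
print(describe_failure(unreachable))
```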
