Life Is Short: Python's urllib, urllib2, and requests

Last updated: 2017-09-25 Source: Internet

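In Python, URL requests involve three modules: urllib, urllib2, and requests. urllib and urllib2 are the HTTP access libraries built into the Python 2 standard library, while requests is a third-party library that has to be installed separately. In practice requests is usually the most convenient of the three.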

urllib and urllib2

Both urllib and urllib2 deal with URL requests, but they provide different functionality. The usual way to make a request with urllib2 is:

import urllib2

response = urllib2.urlopen('http://www.baidu.com')

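The argument can be either a URL or a Request object. Passing a Request object lets you set the headers of the request, for example to present the client as a browser when the target site inspects incoming requests. urllib.urlopen accepts only a URL, which is one of the differences between the two modules: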

import urllib2

url = 'http://www.baidu.com'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

request = urllib2.Request(url, headers={'User-Agent': user_agent})
response = urllib2.urlopen(request)

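On the other hand, some functions in urllib were never added to urllib2, so urllib is still needed from time to time. I don't fully understand all of these yet and will look into them as I run into them. For example, URL encoding helpers such as urllib.urlencode and urllib.quote exist only in urllib, which is another difference between the two.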
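As one illustration of helpers that live in urllib rather than urllib2, here is a minimal sketch of URL encoding. The parameter name `wd` and the path are made up for the example. The fallback import is only there so the sketch also runs on Python 3, where these functions moved to urllib.parse.

```python
try:
    from urllib import quote, urlencode  # Python 2, as used in this article
except ImportError:
    from urllib.parse import quote, urlencode  # Python 3 location

# Encode a query parameter; 'wd' is just an illustrative parameter name.
query = urlencode({'wd': 'python urllib'})

# Percent-encode a path that contains a space ('/' is left untouched).
path = quote('/docs/url lib')

print(query)
print(path)
```

The encoded query string can then be appended to a URL or passed as the data of a POST request.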

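The urllib2.urlopen function also has a couple of commonly used parameters: data and timeout. timeout is the limit, in seconds, for blocking operations such as the connection attempt. data and the Request object are described below under the Request class.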

The Request class takes five parameters: url, data, headers, origin_req_host, and unverifiable.

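url needs no explanation: it is the URL we want to request.
data is the additional data to submit to the server, or None if there is none. When data is supplied, the request is sent as a POST. The data must be encoded in the standard application/x-www-form-urlencoded format (for example with urllib.urlencode) before it is passed to the Request object.
headers holds the request headers as a dictionary. They tell the server about the request: browser information, operating system, cookies, accepted response formats, caching, whether compression is supported, and so on. Some sites with anti-crawler measures check what kind of client is making the request, so we need to present ourselves as a browser rather than send a bare request, as with the User-Agent in the code above.
origin_req_host is the request-host of the origin transaction, as defined by RFC 2965. It defaults to cookielib.request_host(self). This is the host name or IP address of the original request that was initiated by the user. For example, if the request is for an image in an HTML document, this should be the request-host of the request for the page containing the image.
unverifiable indicates whether the request is unverifiable, also as defined by RFC 2965. It defaults to False. An unverifiable request is one whose URL the user did not have the option to approve.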
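The effect of the data and headers parameters can be checked without sending anything over the network. The following is a minimal sketch, assuming a made-up URL and form field; `build_post_request` is a hypothetical helper, not part of urllib2. The fallback import lets it run on Python 3 as well, where urllib2 became urllib.request.

```python
try:
    import urllib2 as urlrequest          # Python 2, as used in this article
    from urllib import urlencode
except ImportError:
    import urllib.request as urlrequest   # Python 3 location
    from urllib.parse import urlencode


def build_post_request(url, form, user_agent):
    """Build (but do not send) a Request carrying form data and a User-Agent."""
    body = urlencode(form)
    if not isinstance(body, bytes):
        body = body.encode('ascii')       # Python 3 requires a bytes body
    return urlrequest.Request(url, data=body,
                              headers={'User-Agent': user_agent})


user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
# The URL and the form field are illustrative assumptions.
request = build_post_request('http://www.example.com/login',
                             {'name': 'alice'}, user_agent)

print(request.get_method())               # POST, because data was supplied
print(request.get_header('User-agent'))   # header names are stored capitalized
```

Passing this object to urlopen would then send the POST; without data, get_method() reports GET instead.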

Not every request returns a successful page. When a request fails, the error needs to be detected and handled.

import urllib2

try:
    response = urllib2.urlopen('http://www.baidu.com')
except urllib2.HTTPError as e:
    print e.code
    print e.reason
except urllib2.URLError as e:
    print e.reason
else:
    response.read()
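The order of the except clauses in the code above matters: HTTPError is a subclass of URLError, so it has to be caught first or the URLError clause would swallow it. The sketch below shows this offline by raising hand-built exceptions instead of making real requests. `classify`, the URL, and the error messages are assumptions for illustration only.

```python
try:
    from urllib2 import HTTPError, URLError        # Python 2
except ImportError:
    from urllib.error import HTTPError, URLError   # Python 3 location


def classify(error):
    """Raise `error` and report which except clause caught it."""
    try:
        raise error
    except HTTPError as e:    # must come first: HTTPError subclasses URLError
        return ('HTTPError', e.code, e.reason)
    except URLError as e:     # no HTTP response at all (DNS failure, refused...)
        return ('URLError', None, e.reason)


# Hand-built exceptions standing in for what urlopen would raise.
http_error = HTTPError('http://www.example.com/missing', 404, 'Not Found',
                       None, None)
url_error = URLError('connection refused')

print(classify(http_error))
print(classify(url_error))
```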

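When an error is raised, we can catch the exception to inspect its cause and to get the status code of the request. The getcode() method of a response also returns the status code. For reference, the table of status codes: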

# Table mapping response codes to messages; entries have the
# form {code: (shortmessage, longmessage)}.
responses = {
    100: ('Continue', 'Request received, please continue'),
    101: ('Switching Protocols',
          'Switching to new protocol; obey Upgrade header'),

    200: ('OK', 'Request fulfilled, document follows'),
    201: ('Created', 'Document created, URL follows'),
    202: ('Accepted',
          'Request accepted, processing continues off-line'),
    203: ('Non-Authoritative Information', 'Request fulfilled from cache'),
    204: ('No Content', 'Request fulfilled, nothing follows'),
    205: ('Reset Content', 'Clear input form for further input.'),
    206: ('Partial Content', 'Partial content follows.'),

    300: ('Multiple Choices',
          'Object has several resources -- see URI list'),
    301: ('Moved Permanently', 'Object moved permanently -- see URI list'),
    302: ('Found', 'Object moved temporarily -- see URI list'),
    303: ('See Other', 'Object moved -- see Method and URL list'),
    304: ('Not Modified',
          'Document has not changed since given time'),
    305: ('Use Proxy',
          'You must use proxy specified in Location to access this '
          'resource.'),
    307: ('Temporary Redirect',
          'Object moved temporarily -- see URI list'),

    400: ('Bad Request',
          'Bad request syntax or unsupported method'),
    401: ('Unauthorized',
          'No permission -- see authorization schemes'),
    402: ('Payment Required',
          'No payment -- see charging schemes'),
    403: ('Forbidden',
          'Request forbidden -- authorization will not help'),
    404: ('Not Found', 'Nothing matches the given URI'),
    405: ('Method Not Allowed',
          'Specified method is invalid for this server.'),
    406: ('Not Acceptable', 'URI not available in preferred format.'),
    407: ('Proxy Authentication Required', 'You must authenticate with '
          'this proxy before proceeding.'),
    408: ('Request Timeout', 'Request timed out; try again later.'),
    409: ('Conflict', 'Request conflict.'),
    410: ('Gone',
          'URI no longer exists and has been permanently removed.'),
    411: ('Length Required', 'Client must specify Content-Length.'),
    412: ('Precondition Failed', 'Precondition in headers is false.'),
    413: ('Request Entity Too Large', 'Entity is too large.'),
    414: ('Request-URI Too Long', 'URI is too long.'),
    415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
    416: ('Requested Range Not Satisfiable',
          'Cannot satisfy request range.'),
    417: ('Expectation Failed',
          'Expect condition could not be satisfied.'),

    500: ('Internal Server Error', 'Server got itself in trouble'),
    501: ('Not Implemented',
          'Server does not support this operation'),
    502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
    503: ('Service Unavailable',
          'The server cannot process the request due to a high load'),
    504: ('Gateway Timeout',
          'The gateway server did not receive a timely response'),
    505: ('HTTP Version Not Supported', 'Cannot fulfill request.'),
    }
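A table in this {code: (shortmessage, longmessage)} form is available at runtime as BaseHTTPRequestHandler.responses, so it does not need to be copied into your own code. A minimal sketch of looking up a status code follows; `describe_status` is a hypothetical helper name, and the 'Unknown' fallback is an assumption of this example.

```python
try:
    from BaseHTTPServer import BaseHTTPRequestHandler   # Python 2
except ImportError:
    from http.server import BaseHTTPRequestHandler      # Python 3 location


def describe_status(code):
    """Return 'code shortmessage' for an HTTP status code."""
    # Each entry is a (shortmessage, longmessage) tuple keyed by status code.
    short, _long = BaseHTTPRequestHandler.responses.get(code, ('Unknown', ''))
    return '%d %s' % (code, short)


print(describe_status(200))
print(describe_status(404))
```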

Requests

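requests is built on urllib3 and covers everything urllib2 offers. Requests supports HTTP keep-alive and connection pooling, session persistence through cookies, file uploads, automatic detection of the response content encoding, and automatic encoding of internationalized URLs and POST data.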

GET request

import requests

response = requests.get('http://www.baidu.com')
print response.text

POST request

response = requests.post('http://api.baidu.com', data={'data': 'value'})

Custom headers

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
response = requests.get('http://www.baidu.com', headers={'User-Agent': user_agent})
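To see what requests would actually send for a call with data and custom headers, a request can be prepared without being sent. This is a minimal offline sketch that assumes the requests package is installed; it reuses the URL and form field from the snippets above.

```python
import requests

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

# Build the request object, then prepare it; nothing is sent over the network.
request = requests.Request('POST', 'http://api.baidu.com',
                           data={'data': 'value'},
                           headers={'User-Agent': user_agent})
prepared = request.prepare()

print(prepared.method)
print(prepared.headers['user-agent'])     # header lookup ignores case
print(prepared.headers['Content-Type'])   # set automatically for form data
print(prepared.body)                      # the form-encoded POST body
```

A prepared request like this can later be sent with the send() method of a requests.Session.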

Working with the data in the returned response:

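r.status_code  # response status code
r.raw  # the raw response body, i.e. the underlying urllib3 response object; request with stream=True and read it with r.raw.read()
r.content  # the response body as bytes; gzip and deflate compression are decoded automatically
r.text  # the response body as a string, decoded automatically using the character encoding from the response headers
r.headers  # the server's response headers stored as a dictionary object; this dictionary is special in that its keys are case-insensitive, and r.headers.get() returns None when a key does not exist
Special methods:
r.json()  # the JSON decoder built into Requests
r.raise_for_status()  # raises an exception for a failed request (a 4xx or 5xx response)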
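The behaviour of r.headers and r.raise_for_status() can be tried without a network connection. The sketch below assumes the requests package is installed and fills in objects by hand instead of making real requests. `raises_http_error` is a hypothetical helper and the header values are made up.

```python
import requests
from requests.structures import CaseInsensitiveDict

# r.headers is a CaseInsensitiveDict; build one by hand to show its behaviour.
headers = CaseInsensitiveDict({'Content-Type': 'text/html'})
print(headers['content-type'])     # key lookup ignores case
print(headers.get('X-Missing'))    # .get() gives None for an absent key


def raises_http_error(status_code):
    """Return True if raise_for_status() raises for this status code."""
    response = requests.Response()     # empty response, filled in by hand
    response.status_code = status_code
    try:
        response.raise_for_status()
    except requests.HTTPError:
        return True
    return False


for code in (200, 302, 404, 500):
    print('%d %s' % (code, raises_http_error(code)))
```

Note that a redirect status such as 302 does not raise here: only 4xx and 5xx responses count as failures.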

----------------To be continued---------------
