Python Study Notes 22 (the urllib module)


The urllib module differs between Python 3 and Python 2. This article assumes Python 3.

1. Using urlopen
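    import urllib.request

    urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

    # url: the page to fetch
    # data: the data to submit via POST. Defaults to None, in which case a GET request is sent; if data is given, the request is a POST
    # timeout: how long to wait, in seconds, before the request times out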
    import urllib.request

    response = urllib.request.urlopen('http://www.baidu.com')
    # response.read() returns bytes, so call decode() to convert it to str
    print(response.read().decode('utf-8'))
    # POST request
    import urllib.parse
    import urllib.request

    # urlencode() returns a str, but the request body must be bytes
    data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf8')
    response = urllib.request.urlopen('http://httpbin.org/post', data=data)
    print(response.read())
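The encoding step in the POST example can be tried without any network access. The following is a minimal sketch; `encode_form` is a hypothetical helper name introduced here, and the field values are made up for illustration. It shows that `urlencode` percent-encodes non-ASCII text and turns spaces into `+`, and that `parse_qs` reverses the process.

```python
from urllib.parse import parse_qs, urlencode


def encode_form(fields):
    """Turn a dict of form fields into a bytes body suitable for urlopen(data=...)."""
    # urlencode() produces a str such as 'a=1&b=2'; encode() turns it into bytes.
    return urlencode(fields).encode("utf-8")


# Illustrative field values (assumptions, not from the original notes).
body = encode_form({"word": "hello world", "lang": "中文"})
print(body)

# parse_qs() decodes the body back into a dict of lists.
print(parse_qs(body.decode("utf-8")))
```

Each value comes back from `parse_qs` as a list because a form field may legally appear more than once in a query string.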
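    # Setting a timeout
    import urllib.request

    # timeout is in seconds; with a value this small the call will most likely
    # raise a timeout error instead of returning a response
    response = urllib.request.urlopen('http://httpbin.org/get', timeout=0.1)
    print(response.read())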
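A very short timeout usually ends in an exception, so in practice the call is wrapped in a try/except. The sketch below is one possible way to do that; `fetch_with_timeout` is a hypothetical helper name, and returning `None` on timeout is an arbitrary choice made for illustration. A timeout while connecting arrives wrapped in `URLError`, whereas a timeout while waiting for the response is raised directly as `socket.timeout`, so both cases are handled.

```python
import socket
import urllib.error
import urllib.request


def fetch_with_timeout(url, timeout):
    """Return the response body as bytes, or None if the request timed out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as e:
        # A timeout during connection setup is wrapped in URLError;
        # the underlying exception is available as e.reason.
        if isinstance(e.reason, socket.timeout):
            return None
        raise
    except socket.timeout:
        # A timeout while waiting for the response is raised directly.
        return None
```

A call such as `fetch_with_timeout('http://httpbin.org/get', 0.1)` would then return `None` instead of crashing when the server is too slow.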
2. Using Request
    # GET request
    import urllib.request

    request = urllib.request.Request('https://python.org')
    response = urllib.request.urlopen(request)
    print(response.read().decode('utf-8'))

    # POST request
    from urllib import request, parse

    url = 'http://httpbin.org/post'
    headers = {
        'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
        'Host': 'httpbin.org'
    }
    # Named form_data rather than dict to avoid shadowing the built-in
    form_data = {
        'name': 'Germey'
    }
    data = bytes(parse.urlencode(form_data), encoding='utf8')
    req = request.Request(url=url, data=data, headers=headers, method='POST')
    response = request.urlopen(req)
    print(response.read().decode('utf-8'))
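A `Request` object can be inspected before it is sent, which is a convenient way to check the method, body, and headers without touching the network. This is a small sketch; `build_request` and the `example-client/0.1` User-Agent string are hypothetical names chosen for illustration. It also shows that when `method` is omitted, the method is inferred from whether `data` is present.

```python
from urllib import parse, request


def build_request(url, form=None, user_agent="example-client/0.1"):
    """Build a Request; it becomes a POST only when form data is supplied."""
    data = parse.urlencode(form).encode("utf-8") if form else None
    return request.Request(url, data=data, headers={"User-Agent": user_agent})


get_req = build_request("http://httpbin.org/get")
post_req = build_request("http://httpbin.org/post", {"name": "Germey"})

print(get_req.get_method())    # no data, so the method defaults to GET
print(post_req.get_method())   # data present, so the method defaults to POST
print(post_req.data)
# Request stores header names capitalized, hence "User-agent" here.
print(post_req.get_header("User-agent"))
```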
3. Proxies
    import urllib.request

    # Map each URL scheme to the proxy that should handle it
    proxy_handler = urllib.request.ProxyHandler({
        'http': 'http://127.0.0.1:9743',
        'https': 'https://127.0.0.1:9743'
    })
    opener = urllib.request.build_opener(proxy_handler)
    response = opener.open('http://httpbin.org/get')
    print(response.read().decode('utf-8'))
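To see what `ProxyHandler` actually changes, it can help to point it at a stand-in proxy running locally. The sketch below is only an illustration under stated assumptions: `RecordingProxy`, `seen_targets`, and `fetch_through_local_proxy` are hypothetical names, the "proxy" does not forward anything and simply answers every request itself, and it assumes no `no_proxy` environment setting bypasses the proxy for the target host. It shows that, for plain HTTP, the client connects to the proxy and sends the full target URL in the request line.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_targets = []  # request targets received by the stand-in proxy


class RecordingProxy(BaseHTTPRequestHandler):
    """A fake proxy: records the requested target and returns a fixed body."""

    def do_GET(self):
        seen_targets.append(self.path)
        body = b"via proxy"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_through_local_proxy(url):
    """Start the fake proxy on a free port and fetch url through it."""
    server = HTTPServer(("127.0.0.1", 0), RecordingProxy)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        proxy_url = "http://127.0.0.1:%d" % server.server_port
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": proxy_url}))
        with opener.open(url, timeout=5) as response:
            return response.read()
    finally:
        server.shutdown()
        server.server_close()
```

Calling `fetch_through_local_proxy("http://example.invalid/get")` never resolves `example.invalid`, because the connection goes to the proxy; afterwards `seen_targets` holds the full URL that the proxy was asked for.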
4. Cookies
    # Get cookies
    import http.cookiejar
    import urllib.request

    cookie = http.cookiejar.CookieJar()
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)
    response = opener.open('http://www.baidu.com')
    for item in cookie:
        print(item.name + "=" + item.value)

    # Get cookies and save them to a file.
    # There are two file formats; load a file with the same format it was saved in.

    # Format 1: Mozilla
    import http.cookiejar
    import urllib.request

    filename = "cookie.txt"
    cookie = http.cookiejar.MozillaCookieJar(filename)
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)
    response = opener.open('http://www.baidu.com')
    cookie.save(ignore_discard=True, ignore_expires=True)

    # Format 2: LWP
    import http.cookiejar
    import urllib.request

    filename = 'cookie.txt'
    cookie = http.cookiejar.LWPCookieJar(filename)
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)
    response = opener.open('http://www.baidu.com')
    cookie.save(ignore_discard=True, ignore_expires=True)

    # Load cookies saved in format 2, then request the URL with them
    import http.cookiejar
    import urllib.request

    cookie = http.cookiejar.LWPCookieJar()
    cookie.load('cookie.txt', ignore_discard=True, ignore_expires=True)
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)
    response = opener.open('http://www.baidu.com')
    print(response.read().decode('utf-8'))
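The rule about reading a cookie file with the same format it was saved in can be checked offline. The following sketch builds a cookie by hand instead of receiving one from a server; `make_cookie` and `save_then_load` are hypothetical helper names, and the cookie name, value, and domain are made-up values. Saving and loading with the same jar class round-trips the cookie, while mixing the two classes raises `http.cookiejar.LoadError`.

```python
import http.cookiejar
import os
import tempfile


def make_cookie(name, value, domain):
    """Build a session cookie by hand (normally a server response sets these)."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def save_then_load(save_class, load_class, cookies):
    """Save cookies with save_class, reload with load_class, return (name, value) pairs."""
    fd, filename = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        jar = save_class(filename)
        for cookie in cookies:
            jar.set_cookie(cookie)
        # Session cookies are marked discard, so ignore_discard=True is needed to keep them.
        jar.save(ignore_discard=True, ignore_expires=True)

        loaded = load_class()
        loaded.load(filename, ignore_discard=True, ignore_expires=True)
        return sorted((c.name, c.value) for c in loaded)
    finally:
        os.remove(filename)


cookies = [make_cookie("session", "abc123", "example.com")]
print(save_then_load(http.cookiejar.LWPCookieJar, http.cookiejar.LWPCookieJar, cookies))
print(save_then_load(http.cookiejar.MozillaCookieJar, http.cookiejar.MozillaCookieJar, cookies))
```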
5. Exception handling
    # urllib.error has two error classes, URLError and HTTPError.
    # HTTPError is a subclass of URLError, so catch the more specific class
    # first and the more general class after it.
    from urllib import request, error

    try:
        response = request.urlopen('http://cuiqingcai.com/index.htm')
    except error.HTTPError as e:
        print(e.reason, e.code, e.headers, sep='\n')
    except error.URLError as e:
        print(e.reason)
    else:
        print('Request Successfully')
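The reason for catching `HTTPError` before `URLError` can be demonstrated without making a request, by raising the exceptions by hand. This is a minimal sketch; `describe` is a hypothetical helper name, and the URL, status code, and messages are made-up values. If the two `except` clauses were swapped, the `URLError` branch would also swallow every `HTTPError`, and the status code would never be reported.

```python
import email.message
import io
import urllib.error


def describe(exc):
    """Raise exc and report which handler caught it."""
    try:
        raise exc
    except urllib.error.HTTPError as e:
        # Specific case first: the server answered, but with an error status.
        return "HTTP error %d: %s" % (e.code, e.reason)
    except urllib.error.URLError as e:
        # General case: no usable response at all (DNS failure, refused connection, ...).
        return "URL error: %s" % e.reason


# Hand-built exceptions standing in for what urlopen() would raise.
not_found = urllib.error.HTTPError(
    "http://example.invalid/missing", 404, "Not Found",
    email.message.Message(), io.BytesIO(b""))
unreachable = urllib.error.URLError("connection refused")

print(issubclass(urllib.error.HTTPError, urllib.error.URLError))
print(describe(not_found))
print(describe(unreachable))
```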

 
