Python Crawler: Basic urllib Examples

Source: Internet
Uploaded by: User

Tags: urllib, crawler basics

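Environment: Python 3.5.2 and urllib3 1.22. Note that the examples below import only the standard-library urllib package (urllib.request, urllib.parse, urllib.error); urllib3 is a separate third-party package.

Download and installation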

wget https://www.python.org/ftp/python/3.5.2/Python-3.5.2.tgz

tar -zxf Python-3.5.2.tgz

cd Python-3.5.2/

./configure --prefix=/usr/local/python

make && make install

mv /usr/bin/python /usr/bin/python275

ln -s /usr/local/python/bin/python3 /usr/bin/python

wget https://files.pythonhosted.org/packages/ee/11/7c59620aceedcc1ef65e156cc5ce5a24ef87be4107c2b74458464e437a5d/urllib3-1.22.tar.gz

tar zxf urllib3-1.22.tar.gz 

cd urllib3-1.22/

python setup.py install


Simulating a browser

Adding headers, method 1: build_opener()

    import urllib.request

    url = "http://www.baidu.com"
    # A (name, value) tuple, the form expected by opener.addheaders
    headers = ("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36")
    opener = urllib.request.build_opener()
    opener.addheaders = [headers]
    data = opener.open(url).read()
    # Save the raw response bytes to disk
    with open("/home/urllib/test/1.html", "wb") as fl:
        fl.write(data)

Adding headers, method 2: add_header()

    import urllib.request

    url = "http://www.baidu.com"
    req = urllib.request.Request(url)
    # Attach the header to this single request
    req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36")
    data = urllib.request.urlopen(req).read()
    with open("/home/urllib/test/2.html", "wb") as fl:
        fl.write(data)
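
Both methods attach a User-Agent header to the outgoing request. As a minimal sketch, assuming you only want to check which headers would be sent (nothing is fetched here), the header can also be passed through the `headers` argument of `urllib.request.Request` and inspected afterwards. The helper name `make_request` and the shortened User-Agent string are illustrative and do not come from the original examples.

```python
import urllib.request

# Illustrative, shortened User-Agent string (assumption; use your browser's real value)
USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"


def make_request(url, user_agent=USER_AGENT):
    """Build a Request carrying a User-Agent header without sending it."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = make_request("http://www.baidu.com")
# Request stores header names in capitalized form, i.e. "User-agent"
print(req.get_header("User-agent"))
print(req.header_items())
```

Passing the resulting object to `urllib.request.urlopen(req)` would then send the request with that header, as in method 2.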


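Setting a timeout

Timeout example

    import urllib.request

    # Request the page 99 times; with timeout=1, any response slower
    # than one second raises an exception instead of blocking
    for i in range(1, 100):
        try:
            file = urllib.request.urlopen("http://www.baidu.com", timeout=1)
            data = file.read()
            print(len(data))
        except Exception as e:
            print("Exception occurred ----> " + str(e))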
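
The loop above only reports timeouts. One possible extension, sketched here under the assumption that a failed request should simply be retried a fixed number of times, wraps the call in a helper. The name `fetch_with_retries` and its `opener` parameter are hypothetical; `opener` exists only so the function can be exercised without network access.

```python
import socket
import urllib.error
import urllib.request


def fetch_with_retries(url, retries=3, timeout=1, opener=urllib.request.urlopen):
    """Return the response body, retrying on timeouts and URL errors.

    `opener` is any callable with the same (url, timeout=...) interface
    as urllib.request.urlopen; it is a hypothetical hook for testing.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except (socket.timeout, urllib.error.URLError) as e:
            last_error = e
            print("attempt %d failed: %s" % (attempt, e))
    # Every attempt failed: re-raise the most recent error
    raise last_error
```

Calling `fetch_with_retries("http://www.baidu.com", retries=3, timeout=1)` would make up to three attempts before giving up with the last exception.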


HTTP GET request, example 1

GET request

    import urllib.request

    keywd = "hello"
    # The search keyword is appended to the URL as the wd query parameter
    url = "http://www.baidu.com/s?wd=" + keywd
    req = urllib.request.Request(url)
    req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36")
    data = urllib.request.urlopen(req).read()
    with open("/home/urllib/test/3.html", "wb") as fl:
        fl.write(data)

HTTP GET request, example 2

GET request (with URL encoding)

    import urllib.parse
    import urllib.request

    keywd = "中國"
    url = "http://www.baidu.com/s?wd="
    # Non-ASCII characters must be percent-encoded before they go into a URL
    key_code = urllib.parse.quote(keywd)
    url_all = url + key_code
    req = urllib.request.Request(url_all)
    req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36")
    data = urllib.request.urlopen(req).read()
    with open("/home/urllib/test/4.html", "wb") as fl:
        fl.write(data)
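
When a query has several parameters, quoting each value by hand gets tedious. As a small sketch of an alternative, `urllib.parse.urlencode` can build the whole query string from a dict. The `base_url` and `params` names are illustrative; no request is sent.

```python
import urllib.parse

base_url = "http://www.baidu.com/s"
params = {"wd": "中國"}

# urlencode percent-encodes each value as UTF-8 and joins the pairs with "&"
query = urllib.parse.urlencode(params)
url_all = base_url + "?" + query
print(url_all)

# Parsing the query string recovers the original keyword
print(urllib.parse.parse_qs(query))
```

The resulting `url_all` can be passed to `urllib.request.Request` in the same way as in example 2.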


HTTP POST request

POST request

    import urllib.parse
    import urllib.request

    url = "http://www.baidu.com/mypost/"
    # Form fields are URL-encoded, then converted to bytes for the request body
    postdata = urllib.parse.urlencode({"user": "testname", "passwd": "123456"}).encode("utf-8")
    # Supplying a body makes urllib send a POST instead of a GET
    req = urllib.request.Request(url, postdata)
    req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36")
    data = urllib.request.urlopen(req).read()
    with open("/home/urllib/test/5.html", "wb") as fl:
        fl.write(data)
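
To see how urllib decides between GET and POST, the following sketch builds two `Request` objects for the same URL and inspects them without sending anything. The variable names `get_req` and `post_req` are illustrative.

```python
import urllib.parse
import urllib.request

url = "http://www.baidu.com/mypost/"
postdata = urllib.parse.urlencode({"user": "testname", "passwd": "123456"}).encode("utf-8")

get_req = urllib.request.Request(url)             # no body
post_req = urllib.request.Request(url, postdata)  # body supplied

# Without a body the method defaults to GET; with a body it becomes POST
print(get_req.get_method())
print(post_req.get_method())

# The body is the URL-encoded form data as bytes
print(post_req.data)
```

This is why the POST example only needs to pass `postdata` as the second argument: no explicit method has to be set.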


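Using a proxy server

    import urllib.request

    def use_proxy(proxy_addr, url):
        # Route HTTP requests through the given proxy
        proxy = urllib.request.ProxyHandler({'http': proxy_addr})
        opener = urllib.request.build_opener(proxy, urllib.request.HTTPHandler)
        # Make this opener the default for later urlopen() calls
        urllib.request.install_opener(opener)
        data = urllib.request.urlopen(url).read().decode('utf-8')
        return data

    proxy_addr = "201.25.210.23:7623"  # replace with a proxy that is actually reachable
    url = "http://www.baidu.com"
    data = use_proxy(proxy_addr, url)
    # data is already decoded to str, so the file is opened in text mode
    with open("/home/urllib/test/6.html", "w", encoding="utf-8") as fl:
        fl.write(data)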
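
`install_opener` changes the default opener for the whole process, so every later `urlopen` call goes through the proxy. If that is not wanted, one option is to keep the opener local and call its `open` method directly. The sketch below assumes that approach; `build_proxy_opener` and `fetch_via_proxy` are hypothetical helper names, and the code after them only inspects the opener without contacting the proxy.

```python
import urllib.request


def build_proxy_opener(proxy_addr):
    """Return an opener that sends HTTP traffic through proxy_addr."""
    proxy = urllib.request.ProxyHandler({"http": proxy_addr})
    return urllib.request.build_opener(proxy)


def fetch_via_proxy(proxy_addr, url, timeout=5):
    """Fetch url through the proxy and return the raw response bytes."""
    opener = build_proxy_opener(proxy_addr)
    with opener.open(url, timeout=timeout) as response:
        return response.read()


# Inspect the opener: it should contain exactly one ProxyHandler with our mapping
opener = build_proxy_opener("201.25.210.23:7623")
proxy_handlers = [h for h in opener.handlers
                  if isinstance(h, urllib.request.ProxyHandler)]
print(proxy_handlers[0].proxies)
```

A call such as `fetch_via_proxy("201.25.210.23:7623", "http://www.baidu.com")` would then use the proxy for that request only, provided the proxy address is reachable.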


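Enabling the debug log

    import urllib.request

    url = "http://www.baidu.com"
    # debuglevel=1 prints the HTTP/HTTPS request and response headers
    httpd = urllib.request.HTTPHandler(debuglevel=1)
    httpsd = urllib.request.HTTPSHandler(debuglevel=1)
    # Build the opener from both handlers and install it as the default
    opener = urllib.request.build_opener(httpd, httpsd)
    urllib.request.install_opener(opener)
    data = urllib.request.urlopen(url).read()
    with open("/home/urllib/test/7.html", "wb") as fl:
        fl.write(data)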


URLError exception handling

Handling URLError

    import urllib.request
    import urllib.error

    try:
        urllib.request.urlopen("http://blog.csdn.net")
    except urllib.error.URLError as e:
        print(e.reason)

Handling HTTPError

    import urllib.request
    import urllib.error

    try:
        urllib.request.urlopen("http://blog.csdn.net")
    except urllib.error.HTTPError as e:
        print(e.code)
        print(e.reason)

Combining both

    import urllib.request
    import urllib.error

    try:
        urllib.request.urlopen("http://blog.csdn.net")
    # HTTPError is a subclass of URLError, so it must be caught first
    except urllib.error.HTTPError as e:
        print(e.code)
        print(e.reason)
    except urllib.error.URLError as e:
        print(e.reason)

Recommended approach:

    import urllib.request
    import urllib.error

    try:
        urllib.request.urlopen("http://blog.csdn.net")
    except urllib.error.URLError as e:
        # Only HTTPError carries a status code; both carry a reason
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
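
The `hasattr` checks work because `HTTPError` is a subclass of `URLError` that adds a `code` attribute. The sketch below demonstrates this offline by constructing the two exception types by hand instead of triggering them over the network. The helper `describe_error` and the sample error values are illustrative assumptions.

```python
import urllib.error


def describe_error(e):
    """Summarize a URLError or HTTPError as a single line of text."""
    parts = []
    if hasattr(e, "code"):
        parts.append("code=%s" % e.code)
    if hasattr(e, "reason"):
        parts.append("reason=%s" % e.reason)
    return ", ".join(parts)


# Hand-built sample errors (assumed values, not real server responses)
http_err = urllib.error.HTTPError("http://blog.csdn.net", 403, "Forbidden", None, None)
url_err = urllib.error.URLError("connection refused")

print(describe_error(http_err))  # code=403, reason=Forbidden
print(describe_error(url_err))   # reason=connection refused
print(isinstance(http_err, urllib.error.URLError))  # True
```

In a real crawler the same helper could be called from a single `except urllib.error.URLError as e:` clause.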


These examples are for reference only.

