Environment: Python 3.5.2, urllib3 1.22
Download and installation
wget https://www.python.org/ftp/python/3.5.2/Python-3.5.2.tgz
tar zxf Python-3.5.2.tgz
cd Python-3.5.2/
./configure --prefix=/usr/local/python
make && make install
mv /usr/bin/python /usr/bin/python275
ln -s /usr/local/python/bin/python3 /usr/bin/python
wget https://files.pythonhosted.org/packages/ee/11/7c59620aceedcc1ef65e156cc5ce5a24ef87be4107c2b74458464e437a5d/urllib3-1.22.tar.gz
tar zxf urllib3-1.22.tar.gz
cd urllib3-1.22/
python setup.py install
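After the steps above, a quick sanity check confirms the new interpreter is Python 3 and that the standard-library urllib.request module (used throughout the examples below) imports cleanly. This is a minimal sketch; the exact version string depends on your build:

```python
import sys
import urllib.request

# Confirm the interpreter is Python 3 (this guide targets 3.5.2)
print(sys.version.split()[0])

# urllib.request ships with the standard library, so a successful
# import is enough to confirm it is available
print(urllib.request.__name__)
```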
Browser Emulation Example
Add headers, method one: build_opener()

import urllib.request

url = "http://www.baidu.com"
headers = ("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36")
opener = urllib.request.build_opener()
opener.addheaders = [headers]
data = opener.open(url).read()
fl = open("/home/urllib/test/1.html", "wb")
fl.write(data)
fl.close()

Add headers, method two: add_header()

import urllib.request

url = "http://www.baidu.com"
req = urllib.request.Request(url)
req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36")
data = urllib.request.urlopen(req).read()
fl = open("/home/urllib/test/2.html", "wb")
fl.write(data)
fl.close()
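The second technique can be checked without touching the network: a Request object stores headers added through add_header(), and urllib normalizes the header-name capitalization. A small offline sketch:

```python
import urllib.request

req = urllib.request.Request("http://www.baidu.com")
req.add_header("User-Agent", "Mozilla/5.0")

# urllib capitalizes only the first letter of the header name,
# so the stored key is "User-agent"
print(req.get_header("User-agent"))
print(req.has_header("User-agent"))
```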
Setting a timeout

import urllib.request

for i in range(1, 100):
    try:
        file = urllib.request.urlopen("http://www.baidu.com", timeout=1)
        data = file.read()
        print(len(data))
    except Exception as e:
        print("exception----> " + str(e))
HTTP protocol GET request one
GET request

import urllib.request

keywd = "hello"
url = "http://www.baidu.com/s?wd=" + keywd
req = urllib.request.Request(url)
req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36")
data = urllib.request.urlopen(req).read()
fl = open("/home/urllib/test/3.html", "wb")
fl.write(data)
fl.close()
HTTP protocol GET request two
GET request (URL-encoding the keyword)

import urllib.request

keywd = "中国"  # a non-ASCII keyword must be percent-encoded before it can go in a URL
url = "http://www.baidu.com/s?wd="
key_code = urllib.request.quote(keywd)
url_all = url + key_code
req = urllib.request.Request(url_all)
req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36")
data = urllib.request.urlopen(req).read()
fl = open("/home/urllib/test/4.html", "wb")
fl.write(data)
fl.close()
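The quoting step can be verified offline: quote() (available from urllib.parse, and re-exported by urllib.request) encodes non-ASCII characters as UTF-8 and then percent-escapes each byte, while plain ASCII letters pass through unchanged:

```python
import urllib.parse

# "中国" is UTF-8 encoded to six bytes, each percent-escaped
print(urllib.parse.quote("中国"))

# ASCII letters are left as-is
print(urllib.parse.quote("hello"))
```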
HTTP protocol POST request
POST request

import urllib.request
import urllib.parse

url = "http://www.baidu.com/mypost/"
postdata = urllib.parse.urlencode({"user": "testname", "passwd": "123456"}).encode("utf-8")
req = urllib.request.Request(url, postdata)
req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36")
data = urllib.request.urlopen(req).read()
fl = open("/home/urllib/test/5.html", "wb")
fl.write(data)
fl.close()
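The body-building step works without a network connection: urlencode() turns the dict into an application/x-www-form-urlencoded string, and .encode() converts it to the bytes that urlopen() requires for a POST body:

```python
import urllib.parse

# Dict -> form-encoded string (key=value pairs joined with &)
postdata = urllib.parse.urlencode({"user": "testname", "passwd": "123456"})
print(postdata)

# POST data must be bytes, hence the explicit encode step
print(postdata.encode("utf-8"))
```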
Using a proxy server
def use_proxy(proxy_addr, url):
    import urllib.request
    proxy = urllib.request.ProxyHandler({"http": proxy_addr})
    opener = urllib.request.build_opener(proxy, urllib.request.HTTPHandler)
    urllib.request.install_opener(opener)
    data = urllib.request.urlopen(url).read().decode("utf-8")
    return data

proxy_addr = "201.25.210.23:7623"
url = "http://www.baidu.com"
data = use_proxy(proxy_addr, url)
fl = open("/home/urllib/test/6.html", "w")  # data was decoded to str, so open in text mode
fl.write(data)
fl.close()
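The handler chain itself can be built and inspected without a working proxy; build_opener() just returns an OpenerDirector, and no connection is attempted until open() is called. The proxy address below is the same placeholder used in the example, not a real server:

```python
import urllib.request

# Placeholder proxy address from the example above; not a live proxy
proxy = urllib.request.ProxyHandler({"http": "201.25.210.23:7623"})
opener = urllib.request.build_opener(proxy, urllib.request.HTTPHandler)

# build_opener() wires the handlers into an OpenerDirector
print(type(opener).__name__)
```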
Enabling DebugLog

import urllib.request

url = "http://www.baidu.com"
httpd = urllib.request.HTTPHandler(debuglevel=1)
httpsd = urllib.request.HTTPSHandler(debuglevel=1)
opener = urllib.request.build_opener(httpd, httpsd)
urllib.request.install_opener(opener)
data = urllib.request.urlopen(url).read()
fl = open("/home/urllib/test/7.html", "wb")
fl.write(data)
fl.close()
Urlerror Exception Handling
URLError handling:

import urllib.request
import urllib.error

try:
    urllib.request.urlopen("http://blog.csdn.net")
except urllib.error.URLError as e:
    print(e.reason)

HTTPError handling:

import urllib.request
import urllib.error

try:
    urllib.request.urlopen("http://blog.csdn.net")
except urllib.error.HTTPError as e:
    print(e.code)
    print(e.reason)

Using both together (HTTPError is a subclass of URLError, so it must be caught first):

import urllib.request
import urllib.error

try:
    urllib.request.urlopen("http://blog.csdn.net")
except urllib.error.HTTPError as e:
    print(e.code)
    print(e.reason)
except urllib.error.URLError as e:
    print(e.reason)

Recommended approach:

import urllib.request
import urllib.error

try:
    urllib.request.urlopen("http://blog.csdn.net")
except urllib.error.URLError as e:
    if hasattr(e, "code"):
        print(e.code)
    if hasattr(e, "reason"):
        print(e.reason)
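The hasattr() pattern in the recommended approach works because HTTPError carries an HTTP status code while a plain URLError does not. This can be demonstrated offline by constructing the exception directly (no request involved):

```python
import urllib.error

# A plain URLError has a reason but no HTTP status code
e = urllib.error.URLError("connection refused")
print(hasattr(e, "reason"))
print(hasattr(e, "code"))
```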
These examples are for reference only.
Python crawler: urllib basic examples