urllib is Python's standard package for fetching URLs (Uniform Resource Locators); we can use it to fetch remote data. Below are some examples of using urllib with custom headers, proxies, timeouts, authentication, and exception handling. Let's take a look.
Ten ways to fetch web resources with Python 3
1. The simplest way
import urllib.request

response = urllib.request.urlopen('http://python.org/')
html = response.read()
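As a small variation (my addition, not part of the original example), urlopen can be used as a context manager so the connection is closed for you, and the response object exposes the status code and headers:

import urllib.request

# The with-statement closes the underlying connection automatically
with urllib.request.urlopen('http://python.org/') as response:
    print(response.status)                     # HTTP status code, e.g. 200
    print(response.getheader('Content-Type'))  # look up a response header
    html = response.read()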
2. Use Request
import urllib.request

req = urllib.request.Request('http://python.org/')
response = urllib.request.urlopen(req)
the_page = response.read()
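Request works the same way for a plain GET with query parameters; a quick sketch (the /search path and the q parameter are made up for illustration):

import urllib.parse
import urllib.request

# For GET, the encoded parameters go into the URL itself, not the body
params = urllib.parse.urlencode({'q': 'urllib'})
req = urllib.request.Request('http://python.org/search?' + params)
response = urllib.request.urlopen(req)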
3. Send data
#!/usr/bin/env python3
import urllib.parse
import urllib.request

url = 'http://localhost/login.php'
values = {'act': 'Login',
          'login[email]': '[email protected]',
          'login[password]': '123456'}
# POST data must be bytes in Python 3, so encode the urlencoded string
data = urllib.parse.urlencode(values).encode('utf-8')
req = urllib.request.Request(url, data)
req.add_header('Referer', 'http://www.python.org/')
response = urllib.request.urlopen(req)
the_page = response.read()
print(the_page.decode("utf8"))
4. Send data and headers
#!/usr/bin/env python3
import urllib.parse
import urllib.request

url = 'http://localhost/login.php'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'act': 'Login',
          'login[email]': '[email protected]',
          'login[password]': '123456'}
headers = {'User-Agent': user_agent}
data = urllib.parse.urlencode(values).encode('utf-8')
req = urllib.request.Request(url, data, headers)
response = urllib.request.urlopen(req)
the_page = response.read()
print(the_page.decode("utf8"))
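If the server expects a JSON body instead of form data, the same Request pattern applies; a sketch assuming a hypothetical endpoint that accepts JSON:

#!/usr/bin/env python3
import json
import urllib.request

url = 'http://localhost/login.php'  # hypothetical endpoint accepting JSON
payload = json.dumps({'act': 'Login'}).encode('utf-8')
headers = {'Content-Type': 'application/json'}
req = urllib.request.Request(url, payload, headers)
response = urllib.request.urlopen(req)
print(response.read().decode('utf8'))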
5. HTTP Error
#!/usr/bin/env python3
import urllib.request
import urllib.error

req = urllib.request.Request('http://www.111cn.net/')
try:
    urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
    print(e.code)
    print(e.read().decode("utf8"))
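Besides the status code and body, an HTTPError also carries the reason phrase and the server's response headers, which is handy for debugging; a short sketch:

import urllib.request
import urllib.error

try:
    urllib.request.urlopen('http://www.111cn.net/')
except urllib.error.HTTPError as e:
    print(e.code, e.reason)  # e.g. 404 Not Found
    print(e.headers)         # the server's response headers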
6. Exception Handling 1
#!/usr/bin/env python3
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

req = Request("http://www.111cn.net/")
try:
    response = urlopen(req)
except HTTPError as e:
    # HTTPError is a subclass of URLError, so it must be caught first
    print("The server couldn't fulfill the request.")
    print('Error code:', e.code)
except URLError as e:
    print('We failed to reach a server.')
    print('Reason:', e.reason)
else:
    print("good!")
    print(response.read().decode("utf8"))
7. Exception Handling 2
#!/usr/bin/env python3
from urllib.request import Request, urlopen
from urllib.error import URLError

req = Request("http://www.111cn.net/")
try:
    response = urlopen(req)
except URLError as e:
    if hasattr(e, 'reason'):
        print('We failed to reach a server.')
        print('Reason:', e.reason)
    elif hasattr(e, 'code'):
        print("The server couldn't fulfill the request.")
        print('Error code:', e.code)
else:
    print("good!")
    print(response.read().decode("utf8"))
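Either approach is easy to wrap in a reusable helper; a minimal sketch (the fetch_page name is my own, not from the original):

#!/usr/bin/env python3
from urllib.request import Request, urlopen
from urllib.error import URLError

def fetch_page(url):
    # Return the decoded page body, or None if the request fails
    try:
        response = urlopen(Request(url))
    except URLError as e:
        print('Request failed:', getattr(e, 'reason', e))
        return None
    return response.read().decode('utf8')

page = fetch_page("http://www.111cn.net/")
if page is not None:
    print(page)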
8. HTTP Authentication
#!/usr/bin/env python3
import urllib.request

# Create a password manager
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

# Add the username and password.
# If we knew the realm, we could use it instead of None.
top_level_url = "https://www.111cn.net/"
password_mgr.add_password(None, top_level_url, 'rekfan', 'xxxxxx')

handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# Create an "opener" (OpenerDirector instance)
opener = urllib.request.build_opener(handler)

# Use the opener to fetch a URL
a_url = "https://www.111cn.net/"
x = opener.open(a_url)
print(x.read())

# Install the opener.
# Now all calls to urllib.request.urlopen use our opener.
urllib.request.install_opener(opener)
a = urllib.request.urlopen(a_url).read().decode('utf8')
print(a)
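Note that the handler above only sends the credentials after the server responds with a 401 challenge. If you already know the server wants Basic auth, you can send the Authorization header up front; a sketch using the same placeholder credentials:

import base64
import urllib.request

a_url = "https://www.111cn.net/"
# Basic auth is just "user:password" base64-encoded in a request header
credentials = base64.b64encode(b'rekfan:xxxxxx').decode('ascii')
req = urllib.request.Request(a_url)
req.add_header('Authorization', 'Basic ' + credentials)
print(urllib.request.urlopen(req).read().decode('utf8'))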
9. Using a proxy
#!/usr/bin/env python3
import urllib.request

# ProxyHandler maps URL schemes ('http', 'https', ...) to proxy URLs;
# urllib has no native SOCKS support, so route through an HTTP proxy
proxy_support = urllib.request.ProxyHandler({'http': 'http://localhost:1080'})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)
a = urllib.request.urlopen("http://www.111cn.net").read().decode("utf8")
print(a)
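Calling ProxyHandler() with no argument makes urllib read proxy settings from environment variables such as http_proxy and https_proxy instead:

import urllib.request

# No explicit mapping: proxies are picked up from the environment
# (http_proxy, https_proxy, ...)
proxy_support = urllib.request.ProxyHandler()
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)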
10. Timeout
#!/usr/bin/env python3
import socket
import urllib.request

# timeout in seconds
timeout = 2
socket.setdefaulttimeout(timeout)

# This call to urllib.request.urlopen now uses the default timeout
# we have set in the socket module
req = urllib.request.Request('http://www.111cn.net/')
a = urllib.request.urlopen(req).read()
print(a)
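Rather than changing the process-wide socket default, urlopen also takes a per-call timeout argument, which leaves other network code untouched; a sketch:

import urllib.request
import urllib.error

try:
    # The timeout applies only to this request, in seconds
    a = urllib.request.urlopen('http://www.111cn.net/', timeout=2).read()
    print(a)
except urllib.error.URLError as e:
    print('Request failed or timed out:', e.reason)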