This section covers network programming in Python.
One
The main classes and functions live in the request.py module of the urllib package, which also supports SSL-encrypted access.
Let's take a look at the main classes and functions, starting with the source code.
def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
            *, cafile=None, capath=None, cadefault=False):
    global _opener
    if cafile or capath or cadefault:
        if not _have_ssl:
            raise ValueError('SSL support not available')
        context = ssl._create_stdlib_context(cert_reqs=ssl.CERT_REQUIRED,
                                             cafile=cafile,
                                             capath=capath)
        https_handler = HTTPSHandler(context=context, check_hostname=True)
        opener = build_opener(https_handler)
    elif _opener is None:
        _opener = opener = build_opener()
    else:
        opener = _opener
    return opener.open(url, data, timeout)
The urlopen function can be used directly for web access; the key argument to pass is the URL you want to fetch.
import urllib.request

if __name__ == '__main__':
    print('main Thread Run: ', __name__)
    responsedata = urllib.request.urlopen('http://www.baidu.com/robots.txt')
    strdata = responsedata.read()
    strshow = strdata.decode('utf-8')
    if False:
        print(responsedata.geturl())
    if False:
        print(responsedata.info())
    else:
        print(responsedata.__sizeof__())
    print(strshow)
    responsedata.close()
    print('\nmain Thread Exit: ', __name__)
Note that in the code above the response body is UTF-8 encoded, so it must be decoded with the same encoding.
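The read() call returns raw bytes, and decode() turns them into a string. A minimal offline sketch of that step (the sample bytes below stand in for a real response body; a more robust variant, shown in the comment, would ask the response object itself for its charset):

```python
# Hypothetical, for a real response object from urlopen():
#   charset = responsedata.info().get_content_charset() or 'utf-8'
#   text = responsedata.read().decode(charset)
sample = b'User-agent: *\nDisallow: /'   # stand-in for responsedata.read()
text = sample.decode('utf-8')
print(text.splitlines()[0])
```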
The results are as follows
Main Thread Run:  __main__
32
User-agent: Baiduspider
Disallow: /baidu
Disallow: /s?
Disallow: /ulink?
Disallow: /link?

User-agent: Googlebot
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?

(... the same Disallow block repeats for msnbot, Baiduspider-image, YoudaoBot, Sogou web spider, Sogou inst spider, Sogou spider2, Sogou blog, Sogou News Spider, Sogou Orion spider, ChinasoSpider, Sosospider, yisouspider, and EasouSpider ...)

User-agent: *
Disallow: /

Main Thread Exit:  __main__
Two
The urlretrieve function lets you pass a URL directly, read the page content, and store it as a local file.
The function's return value is a pair of two elements: the first is the local file name, and the second is
the HTTP response headers returned by the web server.
def urlretrieve(url, filename=None, reporthook=None, data=None):
    """Retrieve a URL to a temporary location on disk.
Code testing
import urllib.request

if __name__ == '__main__':
    print('main Thread Run: ', __name__)
    data = urllib.request.urlretrieve('http://www.baidu.com/robots.txt',
                                      'robots.txt')
    print('--filename--: ', data[0])
    print('--response--: ', data[1])
    print('\nmain Thread Exit: ', __name__)
Results:
Main Thread Run:  __main__
--filename--:  robots.txt
--response--:  Date: Mon, Sep 08:08:05 GMT
Server: Apache
P3P: CP=" OTI DSP COR IVA OUR IND COM "
Set-Cookie: BAIDUID=4FB847BEE916A0F72ABC5093271CD2BC:FG=1; expires=Tue, 22-Sep-15 08:08:05 GMT; max-age=31536000; path=/; domain=.baidu.com; version=1
Last-Modified: Thu, 07:10:38 GMT
ETag: "91e-4fe5e56791780"
Accept-Ranges: bytes
Content-Length: 2334
Vary: Accept-Encoding,User-Agent
Connection: close
Content-Type: text/plain
Main Thread Exit:  __main__
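Because urlretrieve goes through urlopen internally, it also works with file:// URLs, which makes it easy to try out without a network connection. A small sketch under that assumption (all paths here are temporary and illustrative):

```python
import os
import tempfile
import urllib.request

# Create a small local file to act as the "remote" resource.
src = os.path.join(tempfile.mkdtemp(), 'source.txt')
with open(src, 'w') as f:
    f.write('User-agent: *\n')

# Retrieve it through a file:// URL into a second local file.
dst = src + '.copy'
filename, headers = urllib.request.urlretrieve(
    'file://' + urllib.request.pathname2url(src), dst)

print(filename)                   # first element: the local file name
print(headers['Content-Length'])  # second element behaves like a mapping
```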
Three
The request_host function extracts the host address contained in a URL; its only parameter is a Request object instance.
The Request class will be introduced later.
Here is the function's source code:
def request_host(request):
    """Return request-host, as defined by RFC 2965.

    Variation from RFC: returned value is lowercased, for convenient
    comparison.
    """
    url = request.full_url
    host = urlparse(url)[1]
    if host == "":
        host = request.get_header("Host", "")

    # remove port, if present
    host = _cut_port_re.sub("", host, 1)
    return host.lower()
Test Code:
import urllib.request

if __name__ == '__main__':
    print('main Thread Run: ', __name__)
    req = urllib.request.Request('http://www.baidu.com/robots.txt')
    host = urllib.request.request_host(req)
    print(host)
    print('\nmain Thread Exit: ', __name__)
Results:
Main Thread Run:  __main__
www.baidu.com

Main Thread Exit:  __main__
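The docstring mentions two normalizations: the port is stripped and the host is lowercased. A quick offline check with a made-up URL (no network traffic happens, since Request only prepares the call):

```python
import urllib.request

# Mixed-case host plus an explicit port, chosen to exercise both rules.
req = urllib.request.Request('http://WWW.Example.COM:8080/path')
host = urllib.request.request_host(req)
print(host)  # port removed, host lowercased
```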
Four
The module's main class, the Request class, is described below. Note the capital R.
First look at the source code
class Request:

    def __init__(self, url, data=None, headers={},
                 origin_req_host=None, unverifiable=False,
                 method=None):
        self.full_url = url
        self.headers = {}
        self.unredirected_hdrs = {}
        self._data = None
        self.data = data
        self._tunnel_host = None
        for key, value in headers.items():
            self.add_header(key, value)
        if origin_req_host is None:
            origin_req_host = request_host(self)
        self.origin_req_host = origin_req_host
        self.unverifiable = unverifiable
        if method:
            self.method = method

    @property
    def full_url(self):
        if self.fragment:
            return '{}#{}'.format(self._full_url, self.fragment)
        return self._full_url

    @full_url.setter
    def full_url(self, url):
        # unwrap('<URL:type://host/path>') --> 'type://host/path'
        self._full_url = unwrap(url)
        self._full_url, self.fragment = splittag(self._full_url)
        self._parse()

    @full_url.deleter
    def full_url(self):
        self._full_url = None
        self.fragment = None
        self.selector = ''

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, data):
        if data != self._data:
            self._data = data
            # issue 16464
            # if we change data we need to remove content-length header
            # (cause it's most probably calculated for previous value)
            if self.has_header("Content-length"):
                self.remove_header("Content-length")

    @data.deleter
    def data(self):
        self.data = None

    def _parse(self):
        self.type, rest = splittype(self._full_url)
        if self.type is None:
            raise ValueError("unknown url type: %r" % self.full_url)
        self.host, self.selector = splithost(rest)
        if self.host:
            self.host = unquote(self.host)

    def get_method(self):
        """Return a string indicating the HTTP request method."""
        default_method = "POST" if self.data is not None else "GET"
        return getattr(self, 'method', default_method)

    def get_full_url(self):
        return self.full_url

    def set_proxy(self, host, type):
        if self.type == 'https' and not self._tunnel_host:
            self._tunnel_host = self.host
        else:
            self.type = type
            self.selector = self.full_url
        self.host = host

    def has_proxy(self):
        return self.selector == self.full_url

    def add_header(self, key, val):
        # useful for something like authentication
        self.headers[key.capitalize()] = val

    def add_unredirected_header(self, key, val):
        # will not be added to a redirected request
        self.unredirected_hdrs[key.capitalize()] = val

    def has_header(self, header_name):
        return (header_name in self.headers or
                header_name in self.unredirected_hdrs)

    def get_header(self, header_name, default=None):
        return self.headers.get(
            header_name,
            self.unredirected_hdrs.get(header_name, default))

    def remove_header(self, header_name):
        self.headers.pop(header_name, None)
        self.unredirected_hdrs.pop(header_name, None)

    def header_items(self):
        hdrs = self.unredirected_hdrs.copy()
        hdrs.update(self.headers)
        return list(hdrs.items())
The class constructor:
def __init__(self, url, data=None, headers={},
             origin_req_host=None, unverifiable=False,
             method=None):
Note the key parameters: url is the address you want to access, and data is the body you want to send with a POST.
headers holds the header fields you need to include in the HTTP request header.
method selects the GET or POST method explicitly.
If data is supplied, the default method is POST; otherwise it is GET.
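A quick offline check of how get_method() chooses the verb (the request body bytes below are made up for illustration; no request is actually sent):

```python
import urllib.request

req_get = urllib.request.Request('http://www.baidu.com/robots.txt')
req_post = urllib.request.Request('http://www.baidu.com/robots.txt',
                                  data=b'wd=python')  # sample POST body
print(req_get.get_method())   # no data attached
print(req_post.get_method())  # data attached
```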
req = urllib.request.Request('http://www.baidu.com/robots.txt')
This creates an object instance of the Request class.
For example, to add a User-Agent field to headers:
user_agent = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; '
                            'rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
req = urllib.request.Request(url='http://www.baidu.com/robots.txt',
                             headers=user_agent)
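One subtlety worth knowing when adding headers this way: as the class source shows, add_header stores keys via str.capitalize(), so only the first letter stays upper-case and lookups must use that stored form. A sketch with an illustrative header value:

```python
import urllib.request

req = urllib.request.Request('http://www.baidu.com/robots.txt')
req.add_header('User-Agent', 'Mozilla/5.0')  # stored as 'User-agent'

print(req.has_header('User-agent'))  # stored (capitalized) form
print(req.has_header('User-Agent'))  # original spelling is NOT found
print(req.get_header('User-agent'))
```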
Modify the time-out period
import socket
socket.setdefaulttimeout(10)  # 10 s
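Note that setdefaulttimeout changes a process-wide default for every new socket. A sketch verifying the setting took effect; as an alternative, urlopen also accepts a per-call timeout argument, which avoids the global state:

```python
import socket

socket.setdefaulttimeout(10)       # affects every new socket in the process
print(socket.getdefaulttimeout())  # read the setting back

# Per-call alternative (not executed here, needs a network connection):
# urllib.request.urlopen('http://www.baidu.com/robots.txt', timeout=10)
```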
Five
The following describes the use of proxies.
The proxy and its address information must be configured before invoking the web access service.
The following example code is used:
import socket
import urllib.request

socket.setdefaulttimeout(10)  # 10 s

if __name__ == '__main__':
    print('main Thread Run: ', __name__)
    proxy = urllib.request.ProxyHandler({'http': 'http://www.baidu.com:8080'})
    opener = urllib.request.build_opener(proxy, urllib.request.HTTPHandler)
    urllib.request.install_opener(opener)
    content = urllib.request.urlopen('http://www.baidu.com/robots.txt').read()
    print('\nmain Thread Exit: ', __name__)
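A few ProxyHandler variants worth knowing, sketched without any network access (the proxy address below is a placeholder, not a working proxy):

```python
import urllib.request

# Explicit mapping of scheme -> proxy URL.
explicit = urllib.request.ProxyHandler({'http': 'http://127.0.0.1:8080'})

# An empty dict disables proxying entirely.
no_proxy = urllib.request.ProxyHandler({})

# No argument: proxies are read from the environment (http_proxy, etc.).
env_based = urllib.request.ProxyHandler()

print(explicit.proxies)
```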
Six: error and exception handling
Python's network services report errors through exceptions, handled mainly with try and except statement blocks.
Remember one important point: preferably wrap each statement that can raise in its own try block,
so you know exactly which call failed.
Example:
import urllib.request
from urllib.error import HTTPError, URLError

try:
    requrl = urllib.request.Request(url='http://www.baidu.com/robots.txt',
                                    headers=user_agent)  # user_agent as above
except HTTPError:
    print('urllib.error.HTTPError')
except URLError:
    print('urllib.error.URLError')
except OSError:
    print('urllib.error.OSError')

try:
    responsedata = urllib.request.urlopen(requrl)
except HTTPError:
    print('urllib.error.HTTPError')
except URLError:
    print('urllib.error.URLError')
except OSError:
    print('urllib.error.OSError')

try:
    pagedata = responsedata.read()
except HTTPError:
    responsedata.close()
    print('urllib.error.HTTPError')
except URLError:
    responsedata.close()
    print('urllib.error.URLError')
except OSError:
    print('urllib.error.OSError')

print(pagedata)
responsedata.close()
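The order of the except clauses matters: HTTPError subclasses URLError, and URLError subclasses OSError, so the most specific handler must come first or it would never run. This can be checked directly:

```python
from urllib.error import HTTPError, URLError

# Verify the exception hierarchy that dictates handler ordering.
print(issubclass(HTTPError, URLError))
print(issubclass(URLError, OSError))
```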
Seven: notes
These are some of the basic web-access functions and classes; there are many other methods and functions that can do the same things,
and you can call whichever suits your own needs. One more thing to remember: I am just a beginner, and I am organizing these notes here for other newcomers and for my own later review.
Python Personal Learning Notes (Four)