Python Personal Learning Note four

Source: Internet
Author: User
Tags: rfc

This section focuses on network-related knowledge in the Python language.


One

The main classes and functions live in the request.py module of the urllib package, which supports SSL-encrypted access.

Let's take a look at the main classes and functions, starting with the source code of urlopen.

def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
            *, cafile=None, capath=None, cadefault=False):
    global _opener
    if cafile or capath or cadefault:
        if not _have_ssl:
            raise ValueError('SSL support not available')
        context = ssl._create_stdlib_context(cert_reqs=ssl.CERT_REQUIRED,
                                             cafile=cafile,
                                             capath=capath)
        https_handler = HTTPSHandler(context=context, check_hostname=True)
        opener = build_opener(https_handler)
    elif _opener is None:
        _opener = opener = build_opener()
    else:
        opener = _opener
    return opener.open(url, data, timeout)


The urlopen function can be used directly for web access; the key parameter to pass is url, the specific URL to fetch.

import urllib.request

if __name__ == '__main__':
    print('main Thread Run: ', __name__)
    responsedata = urllib.request.urlopen('http://www.baidu.com/robots.txt')
    strdata = responsedata.read()
    strshow = strdata.decode('utf-8')
    if False:
        print(responsedata.geturl())
    if False:
        print(responsedata.info())
    else:
        print(responsedata.__sizeof__())
        print(strshow)
    responsedata.close()
    print('\nmain Thread Exit: ', __name__)

Note that the code above decodes with UTF-8, matching the encoding the page was sent in.

The results are as follows

main Thread Run:  __main__
32
User-agent: Baiduspider
Disallow: /baidu
Disallow: /s?
Disallow: /ulink?
Disallow: /link?

User-agent: Googlebot
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?

(... analogous blocks for MSNBot, Baiduspider-image, YoudaoBot, the Sogou spiders, ChinasoSpider, Sosospider, YisouSpider and EasouSpider omitted ...)

User-agent: *
Disallow: /

main Thread Exit:  __main__
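Since the response object returned by urlopen has the same interface no matter which handler served it, its methods (read, geturl, info) can be explored offline with a file:// URL. This is only an illustrative sketch; the file name and contents are made up for the example.

```python
import pathlib
import tempfile
import urllib.request

# Write a small local file to stand in for a remote resource
# (hypothetical content, shaped like a robots.txt).
tmp = pathlib.Path(tempfile.gettempdir()) / 'fake_robots.txt'
tmp.write_text('User-agent: *\nDisallow: /\n', encoding='utf-8')

# urlopen accepts file:// URLs and returns the same response interface.
with urllib.request.urlopen(tmp.as_uri()) as response:
    body = response.read().decode('utf-8')
    fetched_url = response.geturl()

print(fetched_url)
print(body)
tmp.unlink()  # clean up the temporary file
```

The same read/decode/close pattern from the Baidu example applies unchanged.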

Two

The function urlretrieve lets you pass a URL directly, read the web page content, and store it as a local file.

The return value is a tuple of two items: the first is the local storage file name, and the second is the HTTP response headers returned by the web server.

def urlretrieve(url, filename=None, reporthook=None, data=None):
    """
    Retrieve a URL into a temporary location on disk.

Code testing


import urllib.request

if __name__ == '__main__':
    print('main Thread Run: ', __name__)
    data = urllib.request.urlretrieve('http://www.baidu.com/robots.txt', 'robots.txt')
    print('--filename--: ', data[0])
    print('--response--: ', data[1])
    print('\nmain Thread Exit: ', __name__)

Results:


main Thread Run:  __main__
--filename--:  robots.txt
--response--:  Date: Mon, Sep 08:08:05 GMT
Server: Apache
P3P: CP=" OTI DSP COR IVA OUR IND COM "
Set-Cookie: BAIDUID=4FB847BEE916A0F72ABC5093271CD2BC:FG=1; expires=Tue, 22-Sep-15 08:08:05 GMT; max-age=31536000; path=/; domain=.baidu.com; version=1
Last-Modified: Thu, 07:10:38 GMT
ETag: "91e-4fe5e56791780"
Accept-Ranges: bytes
Content-Length: 2334
Vary: Accept-Encoding,User-Agent
Connection: close
Content-Type: text/plain

main Thread Exit:  __main__
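urlretrieve also accepts an optional reporthook callback, called as reporthook(block_number, block_size, total_size) while data arrives, which is handy for a simple progress display. The sketch below stays runnable offline by retrieving a made-up local file via a file:// URL; the file name and contents are illustrative.

```python
import pathlib
import tempfile
import urllib.request

progress = []  # collect (block_number, block_size, total_size) tuples

def hook(block_num, block_size, total_size):
    # a real program might print a percentage here
    progress.append((block_num, block_size, total_size))

# A small local file standing in for the remote resource.
src = pathlib.Path(tempfile.gettempdir()) / 'retrieve_src.txt'
src.write_text('User-agent: *\nDisallow: /\n', encoding='utf-8')
dst = str(src) + '.copy'

filename, headers = urllib.request.urlretrieve(src.as_uri(), dst, hook)
print(filename)             # the local path the data was saved to
print(len(progress) > 0)    # the hook fired at least once

pathlib.Path(dst).unlink()
src.unlink()
```

Unpacking the returned tuple into filename and headers is usually clearer than indexing data[0] and data[1].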


Three

The function request_host extracts the host address contained in the URL; its only parameter is a Request object instance.

The Request object will be introduced later.

Here's a look at the function source code

def request_host(request):
    """Return request-host, as defined by RFC 2965.

    Variation from RFC: returned value is lowercased, for convenient
    comparison.

    """
    url = request.full_url
    host = urlparse(url)[1]
    if host == "":
        host = request.get_header("Host", "")

    # remove port, if present
    host = _cut_port_re.sub("", host, 1)
    return host.lower()

Test Code

import urllib.request

if __name__ == '__main__':
    print('main Thread Run: ', __name__)
    req = urllib.request.Request('http://www.baidu.com/robots.txt')
    host = urllib.request.request_host(req)
    print(host)
    print('\nmain Thread Exit: ', __name__)

Results:


main Thread Run:  __main__
www.baidu.com

main Thread Exit:  __main__

Four

The module's main class, the Request class, is described below. Note that it is spelled with a capital R.

First look at the source code

class Request:

    def __init__(self, url, data=None, headers={},
                 origin_req_host=None, unverifiable=False,
                 method=None):
        self.full_url = url
        self.headers = {}
        self.unredirected_hdrs = {}
        self._data = None
        self.data = data
        self._tunnel_host = None
        for key, value in headers.items():
            self.add_header(key, value)
        if origin_req_host is None:
            origin_req_host = request_host(self)
        self.origin_req_host = origin_req_host
        self.unverifiable = unverifiable
        if method:
            self.method = method

    @property
    def full_url(self):
        if self.fragment:
            return '{}#{}'.format(self._full_url, self.fragment)
        return self._full_url

    @full_url.setter
    def full_url(self, url):
        # unwrap('<URL:type://host/path>') --> 'type://host/path'
        self._full_url = unwrap(url)
        self._full_url, self.fragment = splittag(self._full_url)
        self._parse()

    @full_url.deleter
    def full_url(self):
        self._full_url = None
        self.fragment = None
        self.selector = ''

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, data):
        if data != self._data:
            self._data = data
            # issue 16464
            # if we change data we need to remove content-length header
            # (cause it's most probably calculated for previous value)
            if self.has_header("Content-length"):
                self.remove_header("Content-length")

    @data.deleter
    def data(self):
        self.data = None

    def _parse(self):
        self.type, rest = splittype(self._full_url)
        if self.type is None:
            raise ValueError("unknown url type: %r" % self.full_url)
        self.host, self.selector = splithost(rest)
        if self.host:
            self.host = unquote(self.host)

    def get_method(self):
        """Return a string indicating the HTTP request method."""
        default_method = "POST" if self.data is not None else "GET"
        return getattr(self, 'method', default_method)

    def get_full_url(self):
        return self.full_url

    def set_proxy(self, host, type):
        if self.type == 'https' and not self._tunnel_host:
            self._tunnel_host = self.host
        else:
            self.type = type
            self.selector = self.full_url
        self.host = host

    def has_proxy(self):
        return self.selector == self.full_url

    def add_header(self, key, val):
        # useful for something like authentication
        self.headers[key.capitalize()] = val

    def add_unredirected_header(self, key, val):
        # will not be added to a redirected request
        self.unredirected_hdrs[key.capitalize()] = val

    def has_header(self, header_name):
        return (header_name in self.headers or
                header_name in self.unredirected_hdrs)

    def get_header(self, header_name, default=None):
        return self.headers.get(
            header_name,
            self.unredirected_hdrs.get(header_name, default))

    def remove_header(self, header_name):
        self.headers.pop(header_name, None)
        self.unredirected_hdrs.pop(header_name, None)

    def header_items(self):
        hdrs = self.unredirected_hdrs.copy()
        hdrs.update(self.headers)
        return list(hdrs.items())

The constructor of the class:


def __init__(self, url, data=None, headers={},
             origin_req_host=None, unverifiable=False,
             method=None):
Note the key parameters: url is the URL you want to access, and data is the body you want to send in a POST.

headers holds the header fields you need to include in the HTTP request header.

method selects the GET or POST method explicitly.

If method is not set, the default is GET; it switches to POST as soon as data is supplied (see get_method in the source above).
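A quick check of how get_method() chooses the HTTP verb: GET when no data is given, POST once data is supplied, and an explicit method= argument overriding both. The URL is the article's example and is never actually fetched.

```python
import urllib.request

url = 'http://www.baidu.com/robots.txt'  # never actually fetched here

req_default = urllib.request.Request(url)                     # no data, no method
req_with_data = urllib.request.Request(url, data=b'key=value')  # data implies POST
req_explicit = urllib.request.Request(url, method='HEAD')       # method= wins

print(req_default.get_method())    # GET
print(req_with_data.get_method())  # POST
print(req_explicit.get_method())   # HEAD
```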


req = urllib.request.Request('http://www.baidu.com/robots.txt')

This creates an object instance of the Request class.

For example, to add a User-Agent header through the headers field:


user_agent = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
req = urllib.request.Request(url='http://www.baidu.com/robots.txt', headers=user_agent)
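Headers can also be attached after construction with add_header(). A small sketch (the User-Agent value is illustrative), which also shows a quirk visible in the class source above: keys are stored via str.capitalize(), so 'User-Agent' is kept as 'User-agent'.

```python
import urllib.request

req = urllib.request.Request('http://www.baidu.com/robots.txt')
req.add_header('User-Agent', 'Mozilla/5.0')  # illustrative value

# add_header() stores the key capitalized, i.e. as 'User-agent'
stored = req.has_header('User-agent')
print(stored)                        # True
print(req.get_header('User-agent'))  # Mozilla/5.0
```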

Modify the time-out period

import socket

socket.setdefaulttimeout(10)  # 10 s
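A sketch contrasting the two timeout mechanisms: the module-wide socket default, and urlopen's own per-call timeout argument, which overrides the default for that single request. No request is actually sent here.

```python
import socket
import urllib.request

socket.setdefaulttimeout(10)        # applies to every new socket
current = socket.getdefaulttimeout()
print(current)                      # 10.0

# per-call alternative (illustrative, not executed here):
# urllib.request.urlopen('http://www.baidu.com/robots.txt', timeout=5)

socket.setdefaulttimeout(None)      # restore the default: block indefinitely
```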

Five

The following describes the use of proxies

Proxy configuration and the related address information must be set up before invoking the web access service.

The following code shows how:

import socket
import urllib.request

socket.setdefaulttimeout(10)  # 10 s

if __name__ == '__main__':
    print('main Thread Run: ', __name__)
    proxy = urllib.request.ProxyHandler({'http': 'http://www.baidu.com:8080'})
    opener = urllib.request.build_opener(proxy, urllib.request.HTTPHandler)
    urllib.request.install_opener(opener)
    content = urllib.request.urlopen('http://www.baidu.com/robots.txt').read()
    print('\nmain Thread Exit: ', __name__)
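One design note: install_opener() makes the proxy-aware opener the process-wide default for every later urlopen() call. To keep the proxy scoped to a few requests, the opener object can be used directly instead. A sketch (the proxy address is illustrative and never contacted):

```python
import urllib.request

# illustrative proxy address; nothing is actually sent to it here
proxy = urllib.request.ProxyHandler({'http': 'http://127.0.0.1:8080'})
opener = urllib.request.build_opener(proxy, urllib.request.HTTPHandler)

# the built opener carries the ProxyHandler in its handler chain
has_proxy_handler = any(isinstance(h, urllib.request.ProxyHandler)
                        for h in opener.handlers)
print(has_proxy_handler)  # True

# scoped use, without the global side effect of install_opener():
# content = opener.open('http://www.baidu.com/robots.txt').read()
```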

Six: Error and exception handling

Python's network services handle exceptions with related functions and classes.

The main tools are try and except statement blocks. Remember one important point:

in Python exception handling, it is best to give each line of code that can throw its own catch.


Example:


import urllib.request
from urllib.error import HTTPError, URLError

user_agent = {'User-Agent': 'Mozilla/5.0'}  # header dict as in the earlier example

try:
    requrl = urllib.request.Request(url='http://www.baidu.com/robots.txt',
                                    headers=user_agent)
except HTTPError:
    print('urllib.error.HTTPError')
except URLError:
    print('urllib.error.URLError')
except OSError:
    print('urllib.error.OSError')

try:
    responsedata = urllib.request.urlopen(requrl)
except HTTPError:
    print('urllib.error.HTTPError')
except URLError:
    print('urllib.error.URLError')
except OSError:
    print('urllib.error.OSError')

try:
    pagedata = responsedata.read()
except HTTPError:
    responsedata.close()
    print('urllib.error.HTTPError')
except URLError:
    responsedata.close()
    print('urllib.error.URLError')
except OSError:
    print('urllib.error.OSError')

print(pagedata)
responsedata.close()
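The ordering of the except clauses matters because of the exception hierarchy: HTTPError subclasses URLError, which in turn subclasses OSError, so the most specific handler must come first. A quick check, plus a look at the extra attributes an HTTPError carries (constructed by hand here for illustration; no request is made):

```python
import urllib.error

# the hierarchy behind the except ordering
print(issubclass(urllib.error.HTTPError, urllib.error.URLError))  # True
print(issubclass(urllib.error.URLError, OSError))                 # True

# an HTTPError carries the server's status code and reason
err = urllib.error.HTTPError('http://www.baidu.com/robots.txt',
                             404, 'Not Found', hdrs=None, fp=None)
print(err.code, err.reason)  # 404 Not Found
```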



Seven: Notes

These are some of the basic web-access functions and classes; there are many other methods and functions that can do the same work.

Call whichever suits your own wishes and needs. Remember that I am just learning the basics and sorting my notes here, for the convenience of other beginners and for my own later review.

