Using urllib2 to Fetch Internet Resources in Python

Source: Internet
Author: User
Tags: html, form, rfc, urlencode

urllib2 is a Python module for fetching URLs. It offers a simple interface for fetching URLs using a variety of different protocols, and it also offers a somewhat more complex interface for handling common situations such as basic authentication, cookies, and proxies. These are provided through handler and opener objects.
urllib2 supports fetching URLs for many "URL schemes" (identified by the string before ":" in the URL; for example, "ftp" is the scheme of "ftp://python.org/"), using their associated network protocols (e.g. FTP, HTTP). This tutorial focuses on the most widely used case, HTTP.
For straightforward situations urlopen is very easy to use, but when you encounter errors or exceptions while opening HTTP URLs, you will need some understanding of the HyperText Transfer Protocol (HTTP). The most authoritative reference on HTTP is, of course, RFC 2616 (http://rfc.net/rfc2616.html), which is a technical document and not easy to read. The purpose of this HOWTO is to show how to use urllib2, with enough detail about HTTP to help you understand what is going on; it is not a replacement for the urllib2 reference documentation, but a supplement to it.
Get URLs

The simplest use of urllib2 is shown below:


import urllib2
response = urllib2.urlopen('http://python.org/')
html = response.read()

Many uses of urllib2 will be this simple (remember that instead of "http:" the URL could just as well begin with "ftp:", "file:", and so on). However, this tutorial teaches the more complex cases, concentrating on HTTP.
HTTP is based on a request-and-response mechanism: the client makes requests and the server sends responses. urllib2 mirrors this with a Request object that represents the HTTP request you are making. In its simplest form, you create a Request object specifying the URL you want to fetch; calling urlopen with this Request object returns a response object for the URL requested. The response is a file-like object, which means you can, for example, call .read() on it:


import urllib2
req = urllib2.Request('http://www.jb51.net')
response = urllib2.urlopen(req)
the_page = response.read()

Note that urllib2 makes use of the same Request interface to handle all URL schemes. For example, you can make an FTP request like this:


req = urllib2.Request('ftp://example.com/')

In the case of HTTP, there are two extra things that Request objects allow you to do. First, you can pass data to be sent to the server. Second, you can pass extra information ("metadata") about the data or about the request itself to the server; this information is sent as HTTP "headers".
Let's look at each of these in turn.
Data
Sometimes you want to send data to a URL (often the URL refers to a CGI (Common Gateway Interface) script or another web application). With HTTP, this is usually done with what is known as a POST request; this is what your browser does when you submit an HTML form you have filled in. Not all POSTs come from forms, though: you can use POST to transmit arbitrary data to your own application. In the common case of HTML forms, the data needs to be encoded in a standard way and then passed to the Request object as the data argument. The encoding is done using a function from urllib, not urllib2:


import urllib
import urllib2

url = 'http://www.jb51.net'
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}

data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()

Note that other encodings are sometimes required (for example, for file upload from HTML forms; see the HTML specification, http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13, for a detailed description of form submission).
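urllib and urllib2 will not build a multipart/form-data body for you, so a file upload has to be encoded by hand. The following is only a minimal sketch: the upload URL and field name are assumed for illustration, and the boundary string is arbitrary:

import urllib2

# An arbitrary boundary string that must not occur in the payload.
boundary = '----FormBoundary1234'

# Hand-built multipart/form-data body for a single (hypothetical) file field.
body = ('--%s\r\n'
        'Content-Disposition: form-data; name="file"; filename="hello.txt"\r\n'
        'Content-Type: text/plain\r\n'
        '\r\n'
        'Hello, world!\r\n'
        '--%s--\r\n') % (boundary, boundary)

# Passing data makes this a POST; override the default content type.
req = urllib2.Request('http://www.example.com/upload', body)
req.add_header('Content-Type', 'multipart/form-data; boundary=%s' % boundary)
response = urllib2.urlopen(req)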
If you do not pass the data argument, urllib2 uses a GET request. One way in which GET and POST requests differ is that POST requests often have "side effects": they change the state of the system in some way (for example, by placing an order with the website to have something delivered to your door). Though the HTTP standard makes it clear that POSTs are intended to always cause side effects and GET requests never to, nothing prevents a GET request from having side effects, nor a POST request from having none. Data can also be passed in an HTTP GET request by encoding it into the URL itself.
You can see this in the following example:


>>> import urllib2
>>> import urllib
>>> data = {}
>>> data['name'] = 'Somebody Here'
>>> data['location'] = 'Northampton'
>>> data['language'] = 'Python'
>>> url_values = urllib.urlencode(data)
>>> print url_values
name=Somebody+Here&language=Python&location=Northampton
>>> url = 'http://www.jb51.net'
>>> full_url = url + '?' + url_values
>>> data = urllib2.urlopen(full_url)

Headers
We'll discuss one particular HTTP header here, to illustrate how to add headers to your HTTP request.
Some websites dislike being browsed by programs (as opposed to humans), or send different versions to different browsers. By default urllib2 identifies itself as Python-urllib/x.y (where x and y are the major and minor version numbers of the Python release, e.g. Python-urllib/2.5), which may confuse the site, or just plain not work. The way a browser identifies itself is through the User-Agent header; when you create a Request object, you can pass it a dictionary of headers. The following example makes the same request as above, but identifies itself as a version of Internet Explorer:


import urllib
import urllib2

url = 'http://www.jb51.net'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}
headers = {'User-Agent': user_agent}

data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()

The response also has two very useful methods, info and geturl, covered in the section "info and geturl" below. First, let's look at what happens when things go wrong.
Handling Exceptions
When urlopen is unable to handle a response, it raises URLError (though, as usual with Python APIs, built-in exceptions such as ValueError or TypeError may also be raised).
HTTPError is the subclass of URLError raised in the specific case of HTTP URLs.
URLError
Often, URLError is raised because there is no network connection (no route to the specified server), or because the specified server doesn't exist. In this case, the exception raised has a 'reason' attribute, which is a tuple containing an error code and a text error message.
For example:


>>> req = urllib2.Request('http://www.jb51.net')
>>> try: urllib2.urlopen(req)
>>> except urllib2.URLError, e:
>>>     print e.reason
>>>
(4, 'getaddrinfo failed')

HTTPError
Every HTTP response from the server contains a numeric "status code". Sometimes the status code indicates that the server was unable to fulfil the request. The default handlers will deal with some of these responses for you (for example, if the response is a "redirection" requesting that the client fetch the document from a different URL, urllib2 will handle that for you). For those it can't handle, urlopen raises an HTTPError. Typical errors include "404" (page not found), "403" (request forbidden), and "401" (authentication required).
See section 10 of RFC 2616 for a reference on all the HTTP error codes.
The HTTPError instance raised has an integer 'code' attribute, which corresponds to the error code sent by the server.
Error Codes
Because the default handlers handle redirects (codes in the 300 range), and codes in the 100-299 range indicate success, you will usually only see error codes in the 400-599 range.
BaseHTTPServer.BaseHTTPRequestHandler.responses is a useful dictionary of response codes, showing all the response codes used by RFC 2616. (The full table is not reproduced here.)
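For example, you can look up the meaning of a status code in that dictionary. A quick sketch (the descriptive strings are those shipped with Python 2 and may vary between releases):

>>> import BaseHTTPServer
>>> BaseHTTPServer.BaseHTTPRequestHandler.responses[404]
('Not Found', 'Nothing matches the given URI')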
When an error is raised, the server responds by returning an HTTP error code and an error page. You can use the HTTPError instance as a response for the page returned: it has the code attribute described above, and it also has read, geturl, and info methods. For example:

>>> req = urllib2.Request('http://www.python.org/fish.html')
>>> try:
>>>     urllib2.urlopen(req)
>>> except urllib2.HTTPError, e:
>>>     print e.code
>>>     print e.read()
>>>

The output looks like this:

404
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
 "http://www.w3.org/TR/html4/loose.dtd">
<?xml-stylesheet href="./css/ht2html.css"
 type="text/css"?>
...etc...

Wrapping It Up
So if you want to be prepared for HTTPError or URLError, there are two basic approaches. I prefer the second one.
The first one:


from urllib2 import Request, urlopen, URLError, HTTPError

req = Request(someurl)
try:
    response = urlopen(req)
except HTTPError, e:
    print 'The server couldn\'t fulfill the request.'
    print 'Error code: ', e.code
except URLError, e:
    print 'We failed to reach a server.'
    print 'Reason: ', e.reason
else:
    # everything is fine
    pass

Note: except HTTPError must come first, otherwise except URLError will also catch an HTTPError.
The second one:


from urllib2 import Request, urlopen, URLError

req = Request(someurl)
try:
    response = urlopen(req)
except URLError, e:
    if hasattr(e, 'reason'):
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
    elif hasattr(e, 'code'):
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
else:
    # everything is fine
    pass

info and geturl
The response returned by urlopen (or the HTTPError instance) has two very useful methods, info() and geturl().
geturl - this returns the real URL of the page fetched. It is useful because urlopen (or the opener object used) may have followed a redirect, so the URL of the page fetched may not be the same as the URL requested.
info - this returns a dictionary-like object describing the page fetched, particularly the headers sent by the server. It is currently an httplib.HTTPMessage instance. Typical headers include 'Content-length', 'Content-type', and so on. See the Quick Reference to HTTP Headers (http://www.cs.tut.fi/~jkorpela/http.html) for a useful listing of HTTP headers with brief explanations of their meaning and use.
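As a quick sketch of both methods, an interactive session might look something like this (the exact headers and the final URL depend, of course, on the server contacted):

>>> import urllib2
>>> response = urllib2.urlopen('http://python.org/')
>>> response.geturl()   # the URL actually fetched, after any redirects
'http://python.org/'
>>> print response.info()   # the headers the server sent
Date: ...
Content-Type: text/html
...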
Openers and handlers
When you fetch a URL you use an opener (an instance of the perhaps confusingly-named urllib2.OpenerDirector). Normally we have been using the default opener, via urlopen, but you can create custom openers. Openers use handlers, and all the "heavy lifting" is done by the handlers: each handler knows how to open URLs for a particular URL scheme (HTTP, FTP, etc.), or how to handle some aspect of URL opening, such as HTTP redirections or HTTP cookies.
You will want to create openers if you want to fetch URLs with specific handlers installed, for example to get an opener that handles cookies, or one that does not handle redirections.
To create an opener, instantiate an OpenerDirector and then call .add_handler(some_handler_instance) repeatedly.
Alternatively, you can use build_opener, a convenience function for creating opener objects with a single function call. build_opener adds several handlers by default, but provides a quick way to add more and/or override the default handlers.
Other kinds of handlers you might want can deal with proxies, authentication, and other common but slightly specialized situations.
install_opener can be used to make an opener object the (global) default opener. This means that calls to urlopen will use the opener you have installed.
Opener objects have an open method, which can be used directly to fetch URLs in the same way as the urlopen function; there is normally no need to call install_opener, except as a convenience.
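As a minimal sketch of this pattern, here is an opener with cookie handling added via the standard HTTPCookieProcessor handler (the URL is an assumed example):

import cookielib
import urllib2

# Build an opener that, in addition to the default handlers,
# stores and resends cookies across requests.
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# Use the opener directly...
response = opener.open('http://www.example.com/')

# ...or install it globally so that plain urlopen uses it from now on.
urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.example.com/')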
