Before we start, let's look at two methods in urllib2: info() and geturl().
The response object returned by urlopen (or an HTTPError instance) has two very useful methods: info() and geturl().
1. geturl():
This returns the real URL of the page that was fetched. It is useful because urlopen (or the opener object in use) may have followed a redirect, so the URL you end up with may differ from the URL you requested.
Take a short link from Renren as an example.
Let's create urllib2_test10.py to compare the original URL with the URL it redirects to:
from urllib2 import Request, urlopen, URLError, HTTPError
old_url = 'http://rrurl.cn/b1UZuP'
req = Request(old_url)
response = urlopen(req)
print('Old URL: ' + old_url)
print('Real URL: ' + response.geturl())
After running it, you can see the URL that the short link actually redirects to.
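The same behaviour can be reproduced without depending on an external short link. The following is a minimal sketch, not part of the original article: it starts a throwaway local server whose /old path redirects to /new, then compares the requested URL with geturl(). The handler class, the paths, and the try/except import (so the sketch also runs on Python 3, where urllib2 became urllib.request) are all assumptions made for illustration.

```python
import threading

try:  # Python 2, as used in the article
    import urllib2
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
except ImportError:  # Python 3: urllib2 became urllib.request
    import urllib.request as urllib2
    from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old redirects to /new."""

    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'final page'
            self.send_response(200)
            self.send_header('Content-Type', 'text/plain')
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Port 0 asks the operating system for any free port.
server = HTTPServer(('127.0.0.1', 0), RedirectingHandler)
worker = threading.Thread(target=server.serve_forever)
worker.daemon = True
worker.start()

# An empty ProxyHandler keeps the request away from any configured proxy.
opener = urllib2.build_opener(urllib2.ProxyHandler({}))
old_url = 'http://127.0.0.1:%d/old' % server.server_port
response = opener.open(old_url)
real_url = response.geturl()
response.close()

server.shutdown()
server.server_close()

print('Old URL:  ' + old_url)
print('Real URL: ' + real_url)
```

The two printed URLs differ only in their path, because the redirect was followed before the response was returned.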
2. info():
This returns a dictionary-like object that describes the page that was fetched, in particular the headers sent by the server. It is currently an httplib.HTTPMessage instance.
Typical headers include "Content-length", "Content-type", and so on.
Let's create urllib2_test11.py to try out info():
from urllib2 import Request, urlopen, URLError, HTTPError
old_url = 'http://www.baidu.com'
req = Request(old_url)
response = urlopen(req)
print('Info():')
print(response.info())
Running it prints the header information for the page.
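Individual headers can also be read from the object that info() returns. The sketch below is an illustration under stated assumptions: to stay offline it opens a temporary file through a file: URL, for which the library builds the Content-type and Content-length headers itself rather than receiving them from a server. The temporary file and the Python 3 import fallback are not part of the original article.

```python
import os
import tempfile

try:  # Python 2, as used in the article
    import urllib2
    from urllib import pathname2url
except ImportError:  # Python 3
    import urllib.request as urllib2
    from urllib.request import pathname2url

# Write a small throwaway file so the example does not need a network.
handle, path = tempfile.mkstemp(suffix='.txt')
os.write(handle, b'hello')
os.close(handle)

response = urllib2.urlopen('file:' + pathname2url(path))
headers = response.info()

# Header lookup is case-insensitive.
content_type = headers.get('Content-Type')
content_length = headers.get('Content-Length')
response.close()
os.remove(path)

print('Content-Type:   %s' % content_type)
print('Content-Length: %s' % content_length)
```

With an http: URL the same get() calls return whatever the server actually sent.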
Next come two important concepts in urllib2: openers and handlers.
1. Openers:
When you fetch a URL you use an opener (an instance of urllib2.OpenerDirector).
Normally we use the default opener, via urlopen,
but you can also create custom openers.
2. Handlers:
Openers use handlers. All the "heavy lifting" is done by the handlers.
Each handler knows how to open URLs for a particular protocol, or how to handle one aspect of URL opening,
such as HTTP redirection or HTTP cookies.
You will want to create an opener if you need to fetch URLs with specific handlers installed, for example an opener that handles cookies or an opener that does not follow redirects.
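A rough sketch of those two cases follows, using the build_opener function that is described just below. The NoRedirectHandler class is hypothetical and written only for this illustration, and the try/except import is an assumption added so the code also runs on Python 3.

```python
try:  # Python 2, as used in the article
    import urllib2
    import cookielib
except ImportError:  # Python 3
    import urllib.request as urllib2
    import http.cookiejar as cookielib


class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    """Hypothetical handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means no follow-up request is built, so the
        # 3xx response ends up being raised as an HTTPError instead.
        return None


# Opener that stores cookies from responses and sends them back later.
cookie_jar = cookielib.CookieJar()
cookie_opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))

# Opener whose redirect handling is replaced by the subclass above.
no_redirect_opener = urllib2.build_opener(NoRedirectHandler())

for name, opener in (('cookies', cookie_opener),
                     ('no redirect', no_redirect_opener)):
    names = ', '.join(h.__class__.__name__ for h in opener.handlers)
    print(name + ': ' + names)
```

Because NoRedirectHandler subclasses HTTPRedirectHandler, build_opener uses it in place of the default redirect handler rather than alongside it.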
To create an opener, you can instantiate an OpenerDirector
and then call .add_handler(some_handler_instance) for each handler you need.
Alternatively, you can use build_opener, a convenience function that creates an opener object with a single function call.
build_opener adds several handlers by default, but also provides a quick way to add more handlers or override the default ones.
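The two routes can be compared side by side. This is a minimal sketch, assuming an arbitrary choice of four HTTP-related handlers for the manual opener; the Python 3 import fallback is also an addition for illustration.

```python
try:  # Python 2, as used in the article
    import urllib2
except ImportError:  # Python 3
    import urllib.request as urllib2

# Manual route: start empty and add each handler yourself.
manual_opener = urllib2.OpenerDirector()
for handler_class in (urllib2.HTTPHandler,
                      urllib2.HTTPRedirectHandler,
                      urllib2.HTTPDefaultErrorHandler,
                      urllib2.HTTPErrorProcessor):
    manual_opener.add_handler(handler_class())

# Convenience route: one call, default handlers included.
built_opener = urllib2.build_opener()

manual_names = [h.__class__.__name__ for h in manual_opener.handlers]
built_names = [h.__class__.__name__ for h in built_opener.handlers]
print('manual: %d handlers' % len(manual_names))
print('built:  %d handlers' % len(built_names))
```

The manually built opener only knows what it was given, so it cannot open file: or ftp: URLs, while the one from build_opener can.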
Other handlers you might want can deal with proxies, authentication, and other common but slightly specialised situations.
install_opener can be used to make an opener object the (global) default opener. This means that calls to urlopen will use the opener you have installed.
Opener objects have an open method.
It can be called directly to fetch URLs in the same way as the urlopen function, so there is usually no need to call install_opener, except as a convenience.
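The difference between calling open on an opener and installing it globally can be shown offline. In this sketch the 'echo:' scheme and the EchoHandler class are invented purely for the demonstration: the handler returns the text after 'echo:' as the response body. The Python 3 import fallback is likewise an assumption.

```python
import email
import io

try:  # Python 2, as used in the article
    import urllib2
    from urllib import addinfourl
except ImportError:  # Python 3
    import urllib.request as urllib2
    from urllib.response import addinfourl


class EchoHandler(urllib2.BaseHandler):
    """Hypothetical handler for a made-up 'echo:' scheme (no network)."""

    def echo_open(self, req):
        # Everything after 'echo:' becomes the response body.
        text = req.get_full_url().split(':', 1)[1]
        headers = email.message_from_string('Content-Type: text/plain\n\n')
        return addinfourl(io.BytesIO(text.encode('utf-8')), headers,
                          req.get_full_url())


opener = urllib2.build_opener(EchoHandler())

# 1. The opener's own open() works without installing anything.
direct_body = opener.open('echo:hello').read()

# 2. The default opener does not know the scheme yet.
try:
    urllib2.urlopen('echo:hello')
    default_knows_echo = True
except urllib2.URLError:
    default_knows_echo = False

# 3. After install_opener, plain urlopen() goes through our opener.
urllib2.install_opener(opener)
installed_body = urllib2.urlopen('echo:hello').read()
```

Note that install_opener changes process-wide state, which is one reason to prefer calling opener.open directly in library code.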
With those two concepts covered, let's look at basic authentication, which uses the openers and handlers described above.
Basic Authentication
To demonstrate creating and installing a handler, we will use HTTPBasicAuthHandler.
When authentication is required, the server sends a header (along with the 401 error code) requesting authentication. The header specifies the authentication scheme and a 'realm', and looks like this: WWW-Authenticate: SCHEME realm="REALM".
For example:
WWW-Authenticate: Basic realm="cPanel Users"
The client should then retry the request, including the correct name and password for the realm in a request header.
This is "basic authentication". To simplify the process, we can create an HTTPBasicAuthHandler instance and an opener that uses this handler.
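For reference, the header that the client sends when it retries can be built by hand. The sketch below shows roughly what the handler does on your behalf, under the assumption of placeholder credentials and an example.com URL; nothing is sent over the network, and the Python 3 import fallback is an addition for illustration.

```python
import base64

try:  # Python 2, as used in the article
    import urllib2
except ImportError:  # Python 3
    import urllib.request as urllib2

username, password = 'why', '1223'  # placeholder credentials

# Basic auth: base64("username:password"), prefixed with the scheme name.
credentials = ('%s:%s' % (username, password)).encode('utf-8')
token = base64.b64encode(credentials).decode('ascii')
auth_value = 'Basic ' + token

# Attach the header to a request object; nothing is sent here.
req = urllib2.Request('http://example.com/foo/')
req.add_header('Authorization', auth_value)

print(req.get_header('Authorization'))
```

Base64 is an encoding, not encryption, so basic authentication only protects the password when the connection itself is encrypted.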
HTTPBasicAuthHandler uses an object called a password manager to handle the mapping of URLs and realms to user names and passwords.
If you know what the realm is (from the authentication header sent by the server), you can use HTTPPasswordMgr.
Often you do not care what the realm is. In that case, it is convenient to use HTTPPasswordMgrWithDefaultRealm.
This lets you specify a default user name and password for a URL.
They are used whenever you have not provided a different combination for a specific realm.
We indicate this by passing None as the realm argument to add_password.
The top-level URL is the first URL that requires authentication. URLs "deeper" than the URL you pass to .add_password() will match as well.
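The lookup rules above can be checked directly on the password manager, without contacting any server. In this sketch the 'Admin Area' realm, its credentials, and the URLs under example.com are hypothetical values chosen for illustration; the Python 3 import fallback is also an assumption.

```python
try:  # Python 2, as used in the article
    import urllib2
except ImportError:  # Python 3
    import urllib.request as urllib2

password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
top_level_url = 'http://example.com/foo/'

# realm=None registers the default credentials for this URL.
password_mgr.add_password(None, top_level_url, 'why', '1223')
# Hypothetical realm-specific credentials for the same URL.
password_mgr.add_password('Admin Area', top_level_url, 'admin', 'secret')

deeper_url = 'http://example.com/foo/bar/page.html'
other_url = 'http://example.com/other/'

# Unknown realm, deeper URL: falls back to the default credentials.
default_match = password_mgr.find_user_password('Some Realm', deeper_url)
# Known realm: the realm-specific credentials win.
realm_match = password_mgr.find_user_password('Admin Area', deeper_url)
# URL outside the registered prefix: nothing matches.
no_match = password_mgr.find_user_password('Some Realm', other_url)

print(default_match)
print(realm_match)
print(no_match)
```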
With all that said, let's use an example to illustrate the points above.
Let's create urllib2_test12.py to try out basic authentication:
# -*- coding: utf-8 -*-
import urllib2
# Create a password manager
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
# Add the user name and password
top_level_url = "http://example.com/foo/"
# If we knew the realm, we could use it instead of None:
# password_mgr.add_password(None, top_level_url, username, password)
password_mgr.add_password(None, top_level_url, 'why', '1223')
# Create a new handler
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
# Create an "opener" (an OpenerDirector instance)
opener = urllib2.build_opener(handler)
a_url = 'http://www.baidu.com/'
# Use the opener to fetch a URL
opener.open(a_url)
# Install the opener.
# From now on, all calls to urllib2.urlopen will use our opener.
urllib2.install_opener(opener)
Note: in the example above we only passed our HTTPBasicAuthHandler to build_opener.
By default, openers already have the handlers for normal situations: ProxyHandler, UnknownHandler, HTTPHandler, HTTPDefaultErrorHandler, HTTPRedirectHandler, FTPHandler, FileHandler, HTTPErrorProcessor.
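You can inspect which handlers an opener actually ended up with through its handlers list. The sketch below is only an illustration; the exact list can vary with the Python version and environment (for example, ProxyHandler may be absent when no proxy settings are detected), and the Python 3 import fallback is an assumption.

```python
try:  # Python 2, as used in the article
    import urllib2
except ImportError:  # Python 3
    import urllib.request as urllib2

password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(handler)

# Our handler sits alongside the defaults added by build_opener.
handler_names = sorted(h.__class__.__name__ for h in opener.handlers)
for name in handler_names:
    print(name)
```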
The top_level_url in the code can be a full URL (including the "http:" scheme, the host name, and optionally the port number),
for example: http://example.com/.
It can also be an "authority" (that is, the host name, optionally including the port number),
for example: "example.com" or "example.com:8080".
The latter includes the port number.
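Both forms can be tried against a plain HTTPPasswordMgr. This is a small sketch under stated assumptions: the 'Docs' realm, the second set of credentials, and the test URLs are hypothetical, and the Python 3 import fallback is added for illustration.

```python
try:  # Python 2, as used in the article
    import urllib2
except ImportError:  # Python 3
    import urllib.request as urllib2

password_mgr = urllib2.HTTPPasswordMgr()

# Full URL form, registered for a hypothetical realm name.
password_mgr.add_password('Docs', 'http://example.com/', 'why', '1223')
# Authority form with an explicit port.
password_mgr.add_password('Docs', 'example.com:8080', 'porter', 'secret')

url_match = password_mgr.find_user_password(
    'Docs', 'http://example.com/a/b.html')
port_match = password_mgr.find_user_password(
    'Docs', 'http://example.com:8080/x')
wrong_port = password_mgr.find_user_password(
    'Docs', 'http://example.com:9090/x')

print(url_match)
print(port_match)
print(wrong_port)
```

The port is part of the match, so credentials registered for example.com:8080 are not offered to a server on another port.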