With Python's urllib2 module, you can easily simulate a user accessing a webpage.
This is a brief record of my own learning process.
First, the urlopen function
    urlopen(url, data=None) -- Basic usage is the same as original
    urllib.  pass the url and optionally data to post to an HTTP URL, and
    get a file-like object back.  One difference is that you can also pass
    a Request instance instead of URL.  Raises a URLError (subclass of
    IOError); for HTTP errors, raises an HTTPError, which can also be
    treated as a valid response.
Its basic usage is the same as in the urllib library. The notes for urlopen in urllib are as follows:
    urlopen(url, data=None, proxies=None)
    Create a file-like object for the specified URL to read from.
Unlike in urllib, however, the first parameter of urlopen in urllib2 can also be a Request instance.
1. Basic usage
Example:
    import urllib2

    # Same usage as the urlopen function in urllib
    response = urllib2.urlopen('http://www.baidu.com')
    response.read()

    # urllib2 usage with a Request instance
    request = urllib2.Request('http://www.baidu.com')
    response = urllib2.urlopen(request)
    response.read()
I prefer the second form. After all, an HTTP exchange starts with a request, and only then can a response exist. Written this way, the logic of the program is clear and the code reads clearly as well.
2. Simulating a POST request
All of the requests simulated above are GET requests, so what if we need to simulate a POST request?
Check the help for Request (urllib2.Request); its __init__ constructor is declared like this:
    __init__(self, url, data=None, headers={}, origin_req_host=None, unverifiable=False)
From the declaration, the POST data goes into data, and we can also set HTTP request header fields via headers.
Example:
    import urllib
    import urllib2

    values = {}
    values['username'] = "God"
    values['password'] = "XXXX"
    data = urllib.urlencode(values)  # uses the urlencode function from the urllib library
    url = "http://xxxx.xxxxx/login"
    request = urllib2.Request(url, data)
    response = urllib2.urlopen(request)
    print response.read()
Change the URL, username and password to suit your specific scenario.
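To check what such a request would send without touching the network, the Request object can be inspected directly. This is only a minimal sketch: the try/except import is my own addition so that the snippet also runs on Python 3 (where urllib2 became urllib.request), and the example.invalid URL is a placeholder.

```python
try:                                     # Python 2
    import urllib2
    from urllib import urlencode
except ImportError:                      # Python 3 fallback (an assumption, not in the original notes)
    import urllib.request as urllib2
    from urllib.parse import urlencode

values = {'username': 'God', 'password': 'XXXX'}
data = urlencode(values).encode('ascii')  # POST body as bytes

get_request = urllib2.Request('http://example.invalid/login')
post_request = urllib2.Request('http://example.invalid/login', data)

# Supplying data is what turns the request into a POST
print(get_request.get_method())   # GET
print(post_request.get_method())  # POST
```

Nothing is sent here; the request only goes out once it is passed to urlopen.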
3. Setting the HTTP request headers
Use the headers parameter to modify fields in the HTTP request header. Here is the previous example with a slight change:
    import urllib
    import urllib2

    values = {}
    values['username'] = "God"
    values['password'] = "XXXX"
    data = urllib.urlencode(values)
    url = "http://xxxx.xxxxx/login"
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:37.0) Gecko/20100101 Firefox/37.0',
        # the body is urlencoded form data, so declare it as such
        'Content-Type': 'application/x-www-form-urlencoded',
        'Referer': 'http://www.baidu.com/',
    }
    request = urllib2.Request(url, data, headers)
    response = urllib2.urlopen(request)
    print response.read()
More header fields can be found with the developer tools that the browser opens on F12.
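Headers can also be added one at a time with add_header, and the result can be checked before sending. The sketch below is only illustrative: the shortened User-Agent value, the Accept-Language header and the example.invalid URL are placeholders of mine, and the import fallback is there so it also runs on Python 3.

```python
try:                                     # Python 2
    import urllib2
except ImportError:                      # Python 3 fallback (an assumption, not in the original notes)
    import urllib.request as urllib2

headers = {'User-Agent': 'Mozilla/5.0', 'Referer': 'http://www.baidu.com/'}
request = urllib2.Request('http://example.invalid/login', headers=headers)
request.add_header('Accept-Language', 'en')

# Request stores header names via str.capitalize(), e.g. 'User-agent'
for name, value in sorted(request.header_items()):
    print('%s: %s' % (name, value))
```

Because of that normalization, look headers up as request.get_header('User-agent') rather than 'User-Agent'.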
4. Setting a request timeout
Requests can hang for all sorts of reasons, which tests your patience. Setting a timeout (in seconds) in urlopen gets rid of the long-running, unresponsive requests we cannot tolerate.
    urlopen(url, data=None, timeout=<object object>)
One thing to note when using timeout: if you are not passing data, you must pass timeout explicitly as a keyword argument.
Example:
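    import urllib
    import urllib2

    # With POST data, timeout can be passed positionally as the third argument
    data = urllib.urlencode({'username': 'God'})
    urllib2.urlopen('http://www.baidu.com', data, 10)

    # Without data, name the timeout parameter explicitly
    urllib2.urlopen('http://www.baidu.com', timeout=10)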
Second, opener (OpenerDirector)
    The OpenerDirector manages a collection of Handler objects that do
    all the actual work.  Each Handler implements a particular protocol or
    option.  The OpenerDirector is a composite object that invokes the
    Handlers needed to open the requested URL.  For example, the
    HTTPHandler performs HTTP GET and POST requests and deals with
    non-error returns.  The HTTPRedirectHandler automatically deals with
    HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler
    deals with digest authentication.
What is it for? It manages a series of handler objects. As I understand it, when we call urlopen a default opener with its own handlers is already in use; it is simply transparent to us. Those default handlers are enough for plain GET/POST requests, but what if we want something more, such as sending requests through a proxy? Then we need to change the handlers, and that is exactly what building our own opener lets us do.
1. Setting up a proxy
    import urllib2

    proxy_handler = urllib2.ProxyHandler({"http": 'http://11.11.11.11:8080'})
    opener = urllib2.build_opener(proxy_handler)
    urllib2.install_opener(opener)
    response = urllib2.urlopen('http://xxx.xxx.xxxx')
    response.read()
2. Enabling debug logging for HTTP and HTTPS
    import urllib2

    http_handler = urllib2.HTTPHandler(debuglevel=1)
    https_handler = urllib2.HTTPSHandler(debuglevel=1)
    opener = urllib2.build_opener(http_handler, https_handler)
    urllib2.install_opener(opener)
    response = urllib2.urlopen('http://www.baidu.com')
3. Handling cookies together with cookielib
A quick look at the cookielib module shows that it is very powerful and worth studying carefully.
Here we only look at the opener-related part and skip the rest of cookielib for now.
    import urllib2
    import cookielib

    cookie = cookielib.CookieJar()
    cookie_handler = urllib2.HTTPCookieProcessor(cookie)
    opener = urllib2.build_opener(cookie_handler)
    urllib2.install_opener(opener)
    response = urllib2.urlopen('http://www.baidu.com')
    for item in cookie:
        print 'CookieName = ' + item.name
        print 'CookieValue = ' + item.value
Third, exception handling: URLError and HTTPError
HTTPError is a subclass of URLError:
    URLError
        HTTPError (URLError, urllib.addinfourl)
    import urllib2

    req = urllib2.Request('http://www.baidu.com/mmmaa')
    try:
        urllib2.urlopen(req)
    except urllib2.HTTPError as e:   # the more specific subclass comes first
        if hasattr(e, "code"):
            print e.code
    except urllib2.URLError as e:
        if hasattr(e, "reason"):
            print e.reason
    else:
        print "OK"
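Because HTTPError is a subclass of URLError, the order of the except clauses matters: if URLError came first, it would also catch every HTTPError. The sketch below shows this without any network access by building the exceptions by hand. The describe helper, the example.invalid URL and the error messages are hypothetical names of mine, and the import fallback is there so it also runs on Python 3.

```python
import io

try:                                     # Python 2
    from urllib2 import HTTPError, URLError
except ImportError:                      # Python 3 fallback (an assumption, not in the original notes)
    from urllib.error import HTTPError, URLError


def describe(exc):
    """Raise exc and report which except branch caught it (illustrative helper)."""
    try:
        raise exc
    except HTTPError as e:               # must come first: an HTTPError is also a URLError
        return 'HTTP error %d' % e.code
    except URLError as e:
        return 'URL error: %s' % e.reason


# Hand-built exceptions stand in for what urlopen would raise
not_found = HTTPError('http://example.invalid/mmmaa', 404, 'Not Found', {}, io.BytesIO(b''))

print(issubclass(HTTPError, URLError))           # True
print(describe(not_found))                       # HTTP error 404
print(describe(URLError('connection refused')))  # URL error: connection refused
```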
This article is from the "Learning Notes" blog; please keep this source attribution: http://unixman.blog.51cto.com/10163040/1654727