The Python standard library contains many useful utility classes, but its documentation often leaves out the details of how to use them in practice. urllib2, the HTTP client library, is one example. Here is a summary of some urllib2 usage details.
1. Proxy settings
2. Timeout settings
3. Adding a specific header to the HTTP request
4. Redirect
5. Cookies
6. Using the HTTP PUT and DELETE methods
7. Getting the HTTP return code
8. Debug Log
Proxy settings
By default, urllib2 uses the http_proxy environment variable to configure the HTTP proxy. To control the proxy explicitly in your program, independent of environment variables, you can do the following:
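import urllib2
enable_proxy = True
# Handler that routes HTTP traffic through an explicit proxy
proxy_handler = urllib2.ProxyHandler({"http": 'http://some-proxy.com:8080'})
# Handler with an empty mapping: no proxy, and the environment is ignored
null_proxy_handler = urllib2.ProxyHandler({})
if enable_proxy:
    opener = urllib2.build_opener(proxy_handler)
else:
    opener = urllib2.build_opener(null_proxy_handler)
# Make this opener the one used by the global urllib2.urlopen()
urllib2.install_opener(opener)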
One detail to note: urllib2.install_opener() sets urllib2's global opener. This is convenient for later calls, but it rules out finer-grained control, such as using two different proxy settings in the same program. A better practice is to leave the global setting alone: skip install_opener and call the opener's open method directly instead of the global urlopen function.
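The per-opener approach can be sketched as follows. This is only an illustration under stated assumptions: both proxy addresses are placeholders, proxies_of is a hypothetical helper added for inspection, and the try/except import is an addition so that the snippet also runs on Python 3, where urllib2 was merged into urllib.request.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent (portability assumption)

# Placeholder proxy addresses; substitute real ones.
opener_a = urllib2.build_opener(
    urllib2.ProxyHandler({'http': 'http://proxy-a.example.com:8080'}))
opener_b = urllib2.build_opener(
    urllib2.ProxyHandler({'http': 'http://proxy-b.example.com:8080'}))


def proxies_of(opener):
    """Return the proxy mapping configured on an opener (hypothetical helper)."""
    for handler in opener.handlers:
        if isinstance(handler, urllib2.ProxyHandler):
            return handler.proxies
    return {}


# Each opener carries its own proxy; install_opener() is never called,
# so the global urlopen() keeps its default behaviour.
# response_a = opener_a.open('http://www.google.com')
# response_b = opener_b.open('http://www.google.com')
```

The two open() calls are left commented out because they need a reachable proxy; the point is that each opener keeps its own configuration.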
Timeout settings
In older versions of Python, the urllib2 API does not expose a timeout setting. The only way to set a timeout is to change the socket module's global default:
import urllib2
import socket
socket.setdefaulttimeout(10)  # time out after 10 seconds
urllib2.socket.setdefaulttimeout(10)  # another way to do the same thing
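Because this default is global to the process, it affects every socket created afterwards, not only those opened by urllib2. One way to limit its reach is to restore the previous value when done. The sketch below assumes a hypothetical helper named global_socket_timeout; it is not part of the standard library, and it is not safe if other threads create sockets at the same time.

```python
import contextlib
import socket


@contextlib.contextmanager
def global_socket_timeout(seconds):
    """Temporarily set the process-wide default socket timeout."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        socket.setdefaulttimeout(previous)  # restore the earlier value


with global_socket_timeout(10):
    # Sockets created inside this block, including those urllib2 opens,
    # default to a 10 second timeout.
    timeout_inside = socket.getdefaulttimeout()
```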
Since Python 2.6, the timeout can be set directly through the timeout parameter of urllib2.urlopen():
import urllib2
response = urllib2.urlopen('http://www.google.com', timeout=10)
Adding a specific header to the HTTP request
To add a header, use a Request object:
import urllib2
uri = 'http://www.example.com/'  # placeholder URL; replace with the real target
request = urllib2.Request(uri)
request.add_header('User-Agent', 'fake-client')
response = urllib2.urlopen(request)
Some headers deserve special attention because the server checks them:
User-Agent: some servers or proxies use this value to determine whether the request was made by a browser.
Content-Type: when using a REST interface, the server checks this value to decide how to parse the HTTP body. Common values are:
application/xml: used for XML RPC calls, such as RESTful/SOAP calls
application/json: used for JSON RPC calls
application/x-www-form-urlencoded: used when a browser submits a web form
When using a RESTful or SOAP service provided by a server, a wrong Content-Type setting can cause the server to reject the request.
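As an illustration of setting Content-Type for a JSON body, here is a minimal sketch. The URL and payload are placeholders, and the try/except import is an addition so the snippet also runs on Python 3. The request is only built, not sent.

```python
import json

try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent (portability assumption)

payload = {'name': 'example'}  # hypothetical data
body = json.dumps(payload).encode('utf-8')

# Placeholder URL; supplying data makes urllib2 issue a POST.
request = urllib2.Request('http://api.example.com/items', data=body)
request.add_header('Content-Type', 'application/json')

# Sending the request would then be:
# response = urllib2.urlopen(request)
```

Note that add_header() stores the header name in capitalized form, so it is looked up as 'Content-type' when calling request.get_header().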
Redirect
By default, urllib2 automatically follows redirects for HTTP 3xx return codes, with no manual configuration needed. To detect whether a redirect has occurred, check whether the URL of the response differs from the URL of the request:
import urllib2
url = 'http://www.google.cn'
response = urllib2.urlopen(url)
# A redirect happened if the final URL is not the one that was requested
redirected = response.geturl() != url
If you do not want redirects to be followed automatically, you can either use the lower-level httplib library or customize the HTTPRedirectHandler class:
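import urllib2
class RedirectHandler(urllib2.HTTPRedirectHandler):
    # Returning None means the redirect is not followed. urllib2 then falls
    # back to its default error handling, so opener.open() raises
    # urllib2.HTTPError carrying the 301 or 302 code.
    def http_error_301(self, req, fp, code, msg, headers):
        pass
    def http_error_302(self, req, fp, code, msg, headers):
        pass
opener = urllib2.build_opener(RedirectHandler)
opener.open('http://www.google.cn')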
Cookies
urllib2 can also handle cookies automatically once an HTTPCookieProcessor is added to the opener. To read the value of a particular cookie, you can do this:
import urllib2
import cookielib
# The CookieJar collects cookies set by responses fetched through this opener
cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
response = opener.open('http://www.google.com')
for item in cookie:
    if item.name == 'some_cookie_item_name':
        print(item.value)
Using the HTTP PUT and DELETE methods
urllib2 only supports the HTTP GET and POST methods; to use HTTP PUT or DELETE, you would normally have to fall back on the lower-level httplib library. Even so, urllib2 can be made to send a PUT or DELETE request in the following way:
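import urllib2
uri = 'http://www.example.com/resource'  # placeholder URL
data = 'payload'  # placeholder request body
request = urllib2.Request(uri, data=data)
# Override the method that urllib2 consults when choosing the HTTP verb
request.get_method = lambda: 'PUT'  # or 'DELETE'
response = urllib2.urlopen(request)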
Although this is a hack, it works without problems in practice.
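The same idea can be packaged as a small Request subclass so the override does not have to be repeated for each request. This is a sketch under stated assumptions: MethodRequest and its http_method parameter are hypothetical names, not part of urllib2, the URLs are placeholders, and the try/except import is an addition for Python 3 compatibility.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent (portability assumption)


class MethodRequest(urllib2.Request):
    """A Request whose HTTP method can be chosen explicitly (hypothetical helper)."""

    def __init__(self, url, data=None, headers=None, http_method=None):
        urllib2.Request.__init__(self, url, data, headers or {})
        self._http_method = http_method

    def get_method(self):
        # Fall back to urllib2's own GET/POST choice when no method was given.
        if self._http_method is not None:
            return self._http_method
        return urllib2.Request.get_method(self)


put_request = MethodRequest('http://api.example.com/items/1',
                            data=b'{}', http_method='PUT')
delete_request = MethodRequest('http://api.example.com/items/1',
                               http_method='DELETE')
# Sending works as usual: urllib2.urlopen(put_request)
```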
Getting the HTTP return code
For a successful response such as 200 OK, the HTTP return code can be obtained with the getcode() method of the response object returned by urlopen. For error return codes, however, urlopen raises an exception, and you need to check the code attribute of the exception object:
import urllib2
try:
    response = urllib2.urlopen('http://restrict.web.com')
except urllib2.HTTPError as e:
    # The status code of the failed request is carried by the exception
    print(e.code)
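Both cases can be folded into one helper that returns the status code either way. The sketch below makes some assumptions: status_code is a hypothetical function name, its open_func parameter is an addition that lets a specific opener's open method be passed in, and the try/except import is there for Python 3 compatibility.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent (portability assumption)


def status_code(url, open_func=None):
    """Return the HTTP status code for url, for success and error responses alike."""
    open_func = open_func or urllib2.urlopen
    try:
        return open_func(url).getcode()
    except urllib2.HTTPError as e:
        # Error responses arrive as exceptions that carry the code.
        return e.code

# Example call (requires network access):
# print(status_code('http://restrict.web.com'))
```

Connection failures that never produce an HTTP response, such as DNS errors, raise URLError instead and are deliberately not caught here.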
Debug Log
When using urllib2, you can turn on the debug log as follows. The contents of the packets sent and received are then printed to the screen, which is convenient for debugging and can sometimes save you the work of capturing packets:
import urllib2
# debuglevel=1 makes the underlying HTTP connection print its traffic
http_handler = urllib2.HTTPHandler(debuglevel=1)
https_handler = urllib2.HTTPSHandler(debuglevel=1)
opener = urllib2.build_opener(http_handler, https_handler)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.google.com')