The Python standard library contains many useful utility classes, but the standard library documentation does not always make the details of their use clear. urllib2, the HTTP client library, is one example. Here is a summary of some urllib2 usage details.
1. Proxy settings
2. Timeout settings
3. Add a specific Header to the HTTP Request
4. Redirect
5. Cookies
6. Using the HTTP PUT and DELETE methods
7. Getting the HTTP return code
8. Debug Log
Proxy settings
By default, urllib2 uses the environment variable http_proxy to set the HTTP proxy. If you want to control the proxy explicitly in the program, without being affected by environment variables, you can use the following method:
The code is as follows:
import urllib2
enable_proxy = True
proxy_handler = urllib2.ProxyHandler({'http': 'http://some-proxy.com:8080'})
# An empty mapping means "use no proxy", regardless of environment variables
null_proxy_handler = urllib2.ProxyHandler({})
if enable_proxy:
    opener = urllib2.build_opener(proxy_handler)
else:
    opener = urllib2.build_opener(null_proxy_handler)
# Make this opener the one used by urllib2.urlopen()
urllib2.install_opener(opener)
One detail to note here is that urllib2.install_opener() sets urllib2's global opener. This is convenient for later use, but it does not allow finer-grained control, such as using two different proxy settings in one program. A better practice is not to change the global setting with install_opener, but to call the opener's open method directly instead of the global urlopen method.
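The advice above can be sketched as follows: two independent openers, neither installed globally. The helper names make_opener and proxy_settings are hypothetical, and the import fallback is an assumption added so that the sketch also runs on Python 3, where urllib2's functionality lives in urllib.request.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 equivalent


def make_opener(proxy_url=None):
    """Return an opener that uses proxy_url for HTTP, or no proxy at all."""
    proxies = {'http': proxy_url} if proxy_url else {}
    return urllib2.build_opener(urllib2.ProxyHandler(proxies))


def proxy_settings(opener):
    """Return the proxy mapping an opener uses ({} when it has none)."""
    for handler in opener.handlers:
        if isinstance(handler, urllib2.ProxyHandler):
            return handler.proxies
    return {}


# Two openers with different proxy settings can coexist in one program
proxied_opener = make_opener('http://some-proxy.com:8080')
direct_opener = make_opener()

# Each request then goes through the opener of your choice, for example:
# response = proxied_opener.open('http://www.google.com')
```

Because install_opener is never called, urllib2.urlopen() keeps its default behavior and each opener carries its own proxy configuration.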
Timeout settings
In older versions of Python, the urllib2 API does not expose a timeout setting. To set a timeout, you can only change the socket's global timeout value.
The code is as follows:
import urllib2
import socket
socket.setdefaulttimeout(10)  # time out after 10 seconds
urllib2.socket.setdefaulttimeout(10)  # another way
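Since the socket default is shared by the whole process, one possible precaution is to restore the previous value once the request is done. The following is only a sketch of that idea; the default_timeout helper is hypothetical and not part of urllib2 or socket.

```python
import socket
from contextlib import contextmanager


@contextmanager
def default_timeout(seconds):
    """Temporarily set the global socket timeout, then restore the old one."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        socket.setdefaulttimeout(previous)


before = socket.getdefaulttimeout()
observed = []
with default_timeout(10):
    # urllib2.urlopen(...) calls made here would use the 10 second timeout
    observed.append(socket.getdefaulttimeout())
observed.append(socket.getdefaulttimeout())
```

Inside the with block the global timeout is 10 seconds; afterwards it is back to whatever it was before.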
Starting with Python 2.6, the timeout can be set directly through the timeout parameter of urllib2.urlopen().
The code is as follows:
import urllib2
response = urllib2.urlopen('http://www.google.com', timeout=10)
Add a specific Header to the HTTP Request
To add a header, you need to use a Request object:
The code is as follows:
import urllib2
# uri holds the target URL and is defined elsewhere
request = urllib2.Request(uri)
request.add_header('User-Agent', 'fake-client')
response = urllib2.urlopen(request)
Pay special attention to the following headers, which servers check:
User-Agent: some servers or proxies use this value to determine whether a request was made by a browser
Content-Type: when using a REST interface, the server checks this value to determine how the content in the HTTP body should be parsed. The common values are:
application/xml: used for XML RPC calls, such as RESTful/SOAP calls
application/json: used for JSON RPC calls
application/x-www-form-urlencoded: used when a browser submits a Web form
When using a RESTful or SOAP service provided by a server, a wrong Content-Type setting causes the server to reject the request.
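As an illustration of the headers above, here is a minimal sketch that builds a JSON request without sending it. The URL and the payload are placeholders, and the import fallback is an assumption so that the sketch also runs on Python 3 (urllib.request).

```python
import json

try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 equivalent

# Hypothetical endpoint and payload, for illustration only
payload = json.dumps({'name': 'example'}).encode('utf-8')
request = urllib2.Request('http://www.example.com/api', data=payload)
request.add_header('User-Agent', 'fake-client')
request.add_header('Content-Type', 'application/json')

# Request normalizes header names with str.capitalize(), so they are
# looked up as 'User-agent' and 'Content-type'
user_agent = request.get_header('User-agent')
content_type = request.get_header('Content-type')

# response = urllib2.urlopen(request)  # would send the request
```

Supplying data makes the request a POST, and the Content-Type header tells the server to parse the body as JSON.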
Redirect
By default, urllib2 automatically follows redirects for HTTP 3XX return codes, without manual configuration. To detect whether a redirect occurred, just check whether the URL of the Response and the URL of the Request are the same.
The code is as follows:
import urllib2
response = urllib2.urlopen('http://www.google.cn')
# A redirect happened if the final URL differs from the requested one
redirected = response.geturl() != 'http://www.google.cn'
If you do not want redirects to be followed automatically, you can either use the lower-level httplib library or customize the HTTPRedirectHandler class.
The code is as follows:
import urllib2
class RedirectHandler(urllib2.HTTPRedirectHandler):
    # Returning None instead of following the redirect leaves the response
    # to the default error handler, which raises HTTPError
    def http_error_301(self, req, fp, code, msg, headers):
        pass
    def http_error_302(self, req, fp, code, msg, headers):
        pass
opener = urllib2.build_opener(RedirectHandler)
opener.open('http://www.google.cn')
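A variation on the handler above, offered only as a sketch: overriding redirect_request to return None declines every redirect the handler sees, and the resulting HTTPError can be caught to read the redirect target. The names NoRedirectHandler and open_without_redirect are hypothetical, and the import fallback is an assumption so that the sketch also runs on Python 3.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 equivalent


class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    """Redirect handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # None means "do not follow"; urllib2 then raises HTTPError
        return None


def open_without_redirect(url, timeout=10):
    """Return (status code, Location header or None) without redirecting."""
    opener = urllib2.build_opener(NoRedirectHandler)
    try:
        response = opener.open(url, timeout=timeout)
    except urllib2.HTTPError as e:
        if 300 <= e.code < 400:
            return e.code, e.info().get('Location')
        raise
    return response.getcode(), None


# Example call (needs network access):
# code, location = open_without_redirect('http://www.google.cn')
```

Passing the subclass to build_opener replaces the default HTTPRedirectHandler, so no other handler in the opener follows redirects.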
Cookies
urllib2 also handles cookies automatically once an HTTPCookieProcessor is added to the opener. If you need to get the value of a particular cookie, you can do this:
The code is as follows:
import urllib2
import cookielib
cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
response = opener.open('http://www.google.com')
# The jar now holds the cookies set by the server
for item in cookie:
    if item.name == 'some_cookie_item_name':
        print(item.value)
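The lookup loop above can be wrapped in a small helper. This is only a sketch: cookie_value is a hypothetical name, and the import fallback is an assumption so that it also runs on Python 3, where cookielib became http.cookiejar.

```python
try:
    import cookielib  # Python 2
    import urllib2
except ImportError:
    import http.cookiejar as cookielib  # assumption: Python 3 equivalents
    import urllib.request as urllib2


def cookie_value(jar, name):
    """Return the value of the first cookie called name, or None."""
    for item in jar:
        if item.name == name:
            return item.value
    return None


cookie_jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))

# After a request, the jar can be queried by cookie name, for example:
# opener.open('http://www.google.com')
# print(cookie_value(cookie_jar, 'some_cookie_item_name'))
```

Returning None for a missing cookie lets the caller distinguish "not set" from an empty value.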
Using the HTTP PUT and DELETE methods
urllib2 only supports the HTTP GET and POST methods. To use HTTP PUT and DELETE, you would normally have to use the lower-level httplib library. Nonetheless, urllib2 can be made to send a PUT or DELETE request in the following way:
The code is as follows:
import urllib2
# uri and data are defined elsewhere: the target URL and the request body
request = urllib2.Request(uri, data=data)
request.get_method = lambda: 'PUT'  # or 'DELETE'
response = urllib2.urlopen(request)
Although this approach is a hack, it causes no problems in practical use.
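The same hack can be packaged as a Request subclass, so that the method is chosen when the request is created. This is a sketch only: MethodRequest and its http_method parameter are hypothetical names, and the import fallback is an assumption so that it also runs on Python 3.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 equivalent


class MethodRequest(urllib2.Request):
    """Request whose HTTP method can be chosen explicitly."""

    def __init__(self, url, http_method=None, **kwargs):
        urllib2.Request.__init__(self, url, **kwargs)
        self._http_method = http_method

    def get_method(self):
        # Fall back to the usual GET/POST choice when no method was given
        return self._http_method or urllib2.Request.get_method(self)


# Hypothetical URLs and body, for illustration only
put_request = MethodRequest('http://www.example.com/item/1',
                            http_method='PUT', data=b'new content')
delete_request = MethodRequest('http://www.example.com/item/1',
                               http_method='DELETE')

# response = urllib2.urlopen(put_request)  # would send the PUT request
```

Requests created without http_method behave exactly like plain urllib2.Request objects.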
Getting the HTTP return code
For 200 OK, the HTTP return code can be obtained by calling the getcode() method of the response object returned by urlopen. For other return codes, however, urlopen raises an exception. In that case, check the code attribute of the exception object:
The code is as follows:
import urllib2
try:
    response = urllib2.urlopen('http://restrict.web.com')
except urllib2.HTTPError as e:
    # The HTTP status is available on the exception object
    print(e.code)
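Both cases can be combined into one helper that always returns the status code. This is a sketch under assumptions: http_status and its open_func parameter are hypothetical, with open_func there so that an opener's open method can be passed instead of the global urlopen; the import fallback lets it run on Python 3 as well.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 equivalent


def http_status(url, open_func=urllib2.urlopen):
    """Return the HTTP status code for url, whether or not it is an error."""
    try:
        response = open_func(url)
    except urllib2.HTTPError as e:
        # Error statuses arrive as exceptions carrying the code
        return e.code
    return response.getcode()


# Example calls (need network access):
# print(http_status('http://restrict.web.com'))
# print(http_status('http://www.google.com', open_func=some_opener.open))
```

Other failures, such as an unreachable host, still propagate as exceptions because they carry no HTTP status.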
Debug Log
When using urllib2, the debug log can be turned on as follows. The content sent and received is then printed to the screen, which makes debugging easier and can sometimes save the work of capturing packets.
The code is as follows:
import urllib2
httpHandler = urllib2.HTTPHandler(debuglevel=1)
httpsHandler = urllib2.HTTPSHandler(debuglevel=1)
opener = urllib2.build_opener(httpHandler, httpsHandler)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.google.com')
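In line with the earlier note about install_opener, the debug handlers can also be kept in a dedicated opener so that only selected requests are logged. A minimal sketch, assuming a hypothetical build_debug_opener helper and an import fallback for Python 3:

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 equivalent


def build_debug_opener(debuglevel=1):
    """Return an opener that prints HTTP and HTTPS traffic when used."""
    return urllib2.build_opener(
        urllib2.HTTPHandler(debuglevel=debuglevel),
        urllib2.HTTPSHandler(debuglevel=debuglevel),
    )


debug_opener = build_debug_opener()

# Only requests made through debug_opener are logged, for example:
# response = debug_opener.open('http://www.google.com')
```

Requests made through urllib2.urlopen() stay quiet, because the global opener is left unchanged.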