Summary of urllib2 usage details in the Python standard library
The Python standard library contains many practical utility modules, but the standard library documentation does not always describe their use in detail; the HTTP client library urllib2 is one example. This article summarizes the details of using urllib2.
1. Proxy settings
2. Timeout settings
3. Adding a specific header to an HTTP request
4. Redirects
5. Cookies
6. Using the HTTP PUT and DELETE methods
7. Getting the HTTP status code
8. Debug logging
Proxy settings
By default, urllib2 takes its HTTP proxy from the http_proxy environment variable. To control the proxy explicitly in the program, without being affected by environment variables, you can do the following:
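import urllib2

enable_proxy = True
# Handler that sends HTTP traffic through an explicit proxy
proxy_handler = urllib2.ProxyHandler({"http": "http://some-proxy.com:100"})
# An empty mapping means "no proxy", whatever http_proxy says
null_proxy_handler = urllib2.ProxyHandler({})

if enable_proxy:
    opener = urllib2.build_opener(proxy_handler)
else:
    opener = urllib2.build_opener(null_proxy_handler)

# Make this opener the global default used by urllib2.urlopen()
urllib2.install_opener(opener)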
Note that urllib2.install_opener() sets urllib2's global opener. This makes later calls convenient, because every urllib2.urlopen() call uses it, but it does not allow finer-grained control, for example using two different proxy settings in one program. In that case it is better to call each opener's open() method directly instead of the global urlopen().
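The per-opener approach can be sketched as follows. This is a minimal illustration for Python 3, where urllib2's handler classes live in urllib.request; the proxy addresses are hypothetical placeholders, and the sketch only builds the openers and reads back their proxy settings, so no traffic is sent.

```python
import urllib.request

# Hypothetical proxy addresses used only for illustration.
PROXY_A = {"http": "http://proxy-a.example.com:3128"}
PROXY_B = {"http": "http://proxy-b.example.com:8080"}


def make_opener(proxies):
    """Build a private opener.

    An empty dict registers no proxy at all, so http_proxy is ignored.
    """
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


def proxies_of(opener):
    """Return the proxy mapping of the opener's active ProxyHandler, or None."""
    for handler in opener.handlers:
        if isinstance(handler, urllib.request.ProxyHandler):
            return handler.proxies
    return None


opener_a = make_opener(PROXY_A)
opener_b = make_opener(PROXY_B)
direct = make_opener({})

# Each opener is used through its own open() method, for example
#     opener_a.open("http://www.example.com/", timeout=10)
# so the global opener behind urlopen() is never modified.
```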
Timeout settings
In older versions of Python (before 2.6), the urllib2 API does not expose a timeout setting. To set a timeout, you can only change the global default timeout for sockets:
import urllib2
import socket

socket.setdefaulttimeout(10)  # time out after 10 seconds
urllib2.socket.setdefaulttimeout(10)  # another way: the same socket module, reached through urllib2
Since Python 2.6, the timeout can be set directly through the timeout parameter of urllib2.urlopen():
import urllib2

response = urllib2.urlopen('http://www.google.com', timeout=10)  # seconds
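To watch the timeout parameter take effect without relying on an outside site, the sketch below starts a deliberately slow local test server (a hypothetical setup that waits one second before answering) and fetches from it with a short and a generous timeout. It is written for Python 3's urllib.request, urllib2's successor. Depending on when the timeout fires, urllib reports it either as a bare socket.timeout (while waiting for the response) or as a URLError wrapping one (while connecting or sending), so the helper checks both.

```python
import http.server
import socket
import threading
import time
import urllib.error
import urllib.request


class SlowHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local test server that waits one second before answering."""

    def do_GET(self):
        time.sleep(1.0)
        try:
            self.send_response(200)
            self.send_header("Content-Length", "2")
            self.end_headers()
            self.wfile.write(b"ok")
        except OSError:
            pass  # the client may already have hung up

    def log_message(self, *args):
        pass  # keep the console quiet


def fetch_with_timeout(opener, url, seconds):
    """Return the response body, or None if the server does not answer in time."""
    try:
        with opener.open(url, timeout=seconds) as response:
            return response.read()
    except socket.timeout:
        return None  # timed out while waiting for the status line
    except urllib.error.URLError as err:
        if isinstance(err.reason, socket.timeout):
            return None  # timed out while connecting or sending
        raise


server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port
# An empty ProxyHandler stops an http_proxy variable from rerouting this local call.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

too_short = fetch_with_timeout(opener, url, 0.2)  # gives up after 0.2 s
long_enough = fetch_with_timeout(opener, url, 5)  # outlasts the 1 s delay

server.shutdown()
server.server_close()
```

On Python 2.6 and later, the same helper logic applies to urllib2.urlopen(url, timeout=...), with urllib2.URLError in place of urllib.error.URLError.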
Adding a specific header to an HTTP request
To add a header to a request, you need to use a Request object:
import urllib2

uri = 'http://www.example.com/'  # hypothetical URL
request = urllib2.Request(uri)
request.add_header('User-Agent', 'fake-client')
response = urllib2.urlopen(request)
Pay special attention to the following headers, which servers often check:
User-Agent: Some servers or proxies use this value to decide whether a request was sent by a browser.
Content-Type: When a REST interface is called, the server checks this value to decide how to parse the HTTP body. Common values include:
application/xml: used for XML RPC calls, such as RESTful/SOAP calls
application/json: used for JSON RPC calls
application/x-www-form-urlencoded: used when a browser submits a web form
When calling RESTful or SOAP services provided by a server, an incorrect Content-Type may cause the server to reject the request.
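As an illustration of setting these headers together, the sketch below prepares a JSON POST request without sending it. It uses Python 3's urllib.request (urllib2's successor), and the endpoint URL and payload are hypothetical. Request stores header names through str.capitalize(), so "Content-Type" is read back as "Content-type".

```python
import json
import urllib.request

# Hypothetical REST endpoint; nothing is sent in this sketch.
API_URL = "http://api.example.com/items"

payload = json.dumps({"name": "widget", "qty": 3}).encode("utf-8")
request = urllib.request.Request(
    API_URL,
    data=payload,  # supplying a body makes this a POST request
    headers={
        "User-Agent": "fake-client",
        "Content-Type": "application/json",  # tells the server to parse JSON
    },
)

# Header names were normalized with str.capitalize() when stored.
content_type = request.get_header("Content-type")
user_agent = request.get_header("User-agent")
method = request.get_method()
# urllib.request.urlopen(request) would actually send it.
```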
Redirects
By default, urllib2 automatically follows HTTP 3xx redirects (301, 302, 303 and 307) with no configuration needed. To check whether a redirect occurred, compare the URL of the response with the URL that was requested:
import urllib2

response = urllib2.urlopen('http://www.google.cn')
redirected = response.geturl() != 'http://www.google.cn'  # True if a redirect happened
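If you do not want redirects to be followed automatically, you can either use the lower-level httplib library or customize the HTTPRedirectHandler class. In the handler below, the 301 and 302 methods return None, so urllib2 does not follow the redirect and raises an HTTPError carrying the 3xx code instead: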
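import urllib2

class RedirectHandler(urllib2.HTTPRedirectHandler):
    # Returning None means "do not follow"; urllib2 then raises HTTPError.
    # 303 and 307 are still followed unless they are overridden as well.
    def http_error_301(self, req, fp, code, msg, headers):
        pass

    def http_error_302(self, req, fp, code, msg, headers):
        pass

opener = urllib2.build_opener(RedirectHandler)
opener.open('http://www.google.cn')  # raises urllib2.HTTPError on a 301/302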
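To see both behaviours without depending on an outside site, the sketch below uses a hypothetical local test server that answers /old with a 302 pointing at /new. It is written for Python 3's urllib.request, and instead of overriding http_error_301/302 it overrides redirect_request(), the hook HTTPRedirectHandler calls to build the follow-up request. That is a swapped-in variant of the handler above: returning None from it also makes the opener raise HTTPError, and it applies to every redirect code the handler knows, including 303 and 307.

```python
import http.server
import threading
import urllib.error
import urllib.request


class Handler(http.server.BaseHTTPRequestHandler):
    """Hypothetical test server: /old redirects to /new, anything else is a page."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"new page"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Decline every redirect; the opener then raises HTTPError for the 3xx."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
old_url = "http://127.0.0.1:%d/old" % server.server_port

# Default handlers: the redirect is followed, so geturl() differs from old_url.
# The empty ProxyHandler keeps an http_proxy variable from rerouting localhost.
following = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with following.open(old_url, timeout=10) as response:
    final_url = response.geturl()
redirected = final_url != old_url

# Custom handler: the 302 surfaces as an exception instead of being followed.
strict = urllib.request.build_opener(urllib.request.ProxyHandler({}), NoRedirectHandler)
try:
    strict.open(old_url, timeout=10).close()
    status, location = None, None
except urllib.error.HTTPError as err:
    status, location = err.code, err.headers.get("Location")
    err.close()

server.shutdown()
server.server_close()
```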
Cookies
urllib2 handles cookies automatically once an HTTPCookieProcessor is added to the opener (the default opener does not include one). To read the value of a cookie, you can do this:
import urllib2
import cookielib

cookie = cookielib.CookieJar()  # in-memory cookie store
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
response = opener.open('http://www.google.com')
for item in cookie:
    if item.name == 'some_cookie_item_name':
        print item.value
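The sketch below shows the cookie processor both storing a cookie and sending it back on the next request. It is written for Python 3, where cookielib became http.cookiejar and urllib2's handlers moved to urllib.request, and it talks to a hypothetical local test server that sets a made-up session_id cookie and echoes back whatever Cookie header it receives.

```python
import http.cookiejar
import http.server
import threading
import urllib.request


class Handler(http.server.BaseHTTPRequestHandler):
    """Hypothetical test server: sets a cookie and echoes the Cookie header."""

    def do_GET(self):
        body = (self.headers.get("Cookie") or "").encode("ascii")
        self.send_response(200)
        self.send_header("Set-Cookie", "session_id=abc123")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),          # ignore http_proxy for localhost
    urllib.request.HTTPCookieProcessor(jar),  # stores and replays cookies
)

with opener.open(base + "/login", timeout=10) as response:
    first_body = response.read()  # no cookie sent yet

# Look a cookie up by name.
session = None
for item in jar:
    if item.name == "session_id":
        session = item.value

# The jar attaches the stored cookie to the next request automatically.
with opener.open(base + "/profile", timeout=10) as response:
    second_body = response.read()

server.shutdown()
server.server_close()
```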
Using the HTTP PUT and DELETE methods
urllib2 only sends HTTP GET and POST requests (POST when a data argument is given). For PUT and DELETE you would normally have to drop down to the lower-level httplib library. Even so, urllib2 can be made to send a PUT or DELETE request as follows:
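import urllib2

uri = 'http://www.example.com/resource'  # hypothetical URL
data = 'request body'  # hypothetical payload
request = urllib2.Request(uri, data=data)
request.get_method = lambda: 'PUT'  # or 'DELETE'; HTTP method names are upper case
response = urllib2.urlopen(request)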
Although this is a hack, it works without problems in practice.
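The hack can be checked without contacting a server, because Request.get_method() is what decides which verb goes on the request line. The sketch below targets Python 3's urllib.request, whose Request also accepts a method argument (since Python 3.3) that makes the override unnecessary; the URL and body are hypothetical.

```python
import urllib.request

# Hypothetical resource and body; nothing is sent in this sketch.
uri = "http://api.example.com/items/42"
data = b'{"qty": 5}'

# The override from the urllib2 example: replace get_method on this one instance.
put_request = urllib.request.Request(uri, data=data)
put_request.get_method = lambda: "PUT"

# Python 3.3+ alternative: pass the verb explicitly.
delete_request = urllib.request.Request(uri, method="DELETE")

# Without either, a request with a body defaults to POST.
default_request = urllib.request.Request(uri, data=data)

# urllib.request.urlopen(put_request) would send a PUT request for /items/42.
```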
Getting the HTTP status code
For a successful response such as 200 OK, the getcode() method of the response object returned by urlopen gives the HTTP status code. For error codes such as 404 or 500, however, urlopen raises urllib2.HTTPError, and you need to read the code attribute of the exception object instead:
import urllib2

try:
    response = urllib2.urlopen('http://restrict.web.com')
except urllib2.HTTPError as e:
    print e.code  # the HTTP status code, e.g. 404
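Because both a normal response and an HTTPError carry the status code, the two cases can be folded into one helper. The sketch below is written for Python 3 (urllib.request and urllib.error) and uses a hypothetical local test server that returns 200 for / and 404 for anything else.

```python
import http.server
import threading
import urllib.error
import urllib.request


class Handler(http.server.BaseHTTPRequestHandler):
    """Hypothetical test server: 200 for the root path, 404 elsewhere."""

    def do_GET(self):
        self.send_response(200 if self.path == "/" else 404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the console quiet


def status_of(opener, url):
    """Return the HTTP status code whether or not urllib treats it as an error."""
    try:
        with opener.open(url, timeout=10) as response:
            return response.getcode()
    except urllib.error.HTTPError as err:
        err.close()
        return err.code  # 4xx/5xx (and unhandled 3xx) arrive as exceptions


server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port
# The empty ProxyHandler keeps an http_proxy variable from rerouting localhost.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

ok_code = status_of(opener, base + "/")
missing_code = status_of(opener, base + "/missing")

server.shutdown()
server.server_close()
```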
Debug logging
When using urllib2, you can turn on the debug log as follows, so that the data sent and the response status line and headers received are printed to the screen. This makes debugging easier and can sometimes save you from capturing packets.
import urllib2

# debuglevel=1 makes the underlying httplib connections print what they send and receive
httpHandler = urllib2.HTTPHandler(debuglevel=1)
httpsHandler = urllib2.HTTPSHandler(debuglevel=1)
opener = urllib2.build_opener(httpHandler, httpsHandler)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.google.com')
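The trace is produced by the underlying connection class (httplib in Python 2, http.client in Python 3), which prints it to standard output. The sketch below, written for Python 3's urllib.request against a hypothetical local test server, captures that output instead of letting it reach the screen. It enables only the HTTP handler; HTTPSHandler accepts the same debuglevel argument.

```python
import contextlib
import http.server
import io
import threading
import urllib.request


class Handler(http.server.BaseHTTPRequestHandler):
    """Hypothetical test server that always answers with a short body."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep the server's own log out of the captured trace


server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),           # ignore http_proxy for localhost
    urllib.request.HTTPHandler(debuglevel=1),  # replaces the default HTTPHandler
)

# http.client writes the trace to stdout, so it can be redirected into a buffer.
trace = io.StringIO()
with contextlib.redirect_stdout(trace):
    with opener.open(url, timeout=10) as response:
        body = response.read()

server.shutdown()
server.server_close()
log = trace.getvalue()  # contains "send: ..." and "reply: ..." lines
```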