Reprinted from Tao Road | Usage details of the Python standard library urllib2
The Python standard library contains many useful utility classes, but the standard library documentation does not always make the details of their use clear. urllib2, the HTTP client library, is one example. Here is a summary of some details of using the urllib2 library.
- 1 Proxy Settings
- 2 Timeout Settings
- 3 Adding a specific Header to the HTTP Request
- 4 Redirect
- 5 Cookies
- 6 Using the HTTP PUT and DELETE methods
- 7 Getting the return code for HTTP
- 8 Debug Log
1 Proxy Settings
By default, urllib2 uses the environment variable http_proxy to set the HTTP proxy. If you want to control the proxy explicitly in the program, unaffected by environment variables, you can use the following method:
    import urllib2

    enable_proxy = True
    proxy_handler = urllib2.ProxyHandler({"http": 'http://some-proxy.com:8080'})
    null_proxy_handler = urllib2.ProxyHandler({})  # empty mapping: no proxy

    if enable_proxy:
        opener = urllib2.build_opener(proxy_handler)
    else:
        opener = urllib2.build_opener(null_proxy_handler)

    # Make this opener the one used by urllib2.urlopen()
    urllib2.install_opener(opener)
One detail to note here is that urllib2.install_opener() sets urllib2's global opener. This is convenient for later use, but it rules out finer-grained control, such as using two different proxy settings in one program. A better practice is to leave the global settings alone and call the opener's open method directly instead of the global urlopen function.
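As a rough sketch of the advice above, two openers can hold different proxy settings side by side without touching the global opener. The proxy address is the placeholder from the earlier example, the `fetch` helper is a hypothetical name introduced here, and the fallback import is only an assumption that lets the sketch also run on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2
except ImportError:
    # Assumption: on Python 3 the same classes live in urllib.request.
    import urllib.request as urllib2

# Placeholder proxy address, not a real server.
proxy_handler = urllib2.ProxyHandler({'http': 'http://some-proxy.com:8080'})
# An empty mapping means "no proxy", ignoring the environment variables.
null_proxy_handler = urllib2.ProxyHandler({})

# Each opener keeps its own settings; install_opener() is never called.
proxied_opener = urllib2.build_opener(proxy_handler)
direct_opener = urllib2.build_opener(null_proxy_handler)


def fetch(opener, url, timeout=10):
    """Hypothetical helper: fetch url through the given opener."""
    response = opener.open(url, timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()
```

Calling fetch(proxied_opener, url) and fetch(direct_opener, url) in the same program then uses two different proxy settings, while urllib2.urlopen() keeps its default behavior.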
2 Timeout Settings
In older versions of Python, the urllib2 API did not expose a timeout setting. The only way to set a timeout was to change the global default timeout of the socket module.
    import urllib2
    import socket

    socket.setdefaulttimeout(10)          # time out after 10 seconds
    urllib2.socket.setdefaulttimeout(10)  # another way to do the same thing
Since Python 2.6, the timeout can be set directly through the timeout parameter of urllib2.urlopen().
    import urllib2

    response = urllib2.urlopen('http://www.google.com', timeout=10)
3 Adding a specific Header to the HTTP Request
To add a header, you need to use a Request object:
    import urllib2

    # uri is a placeholder for the target URL
    request = urllib2.Request(uri)
    request.add_header('User-Agent', 'fake-client')
    response = urllib2.urlopen(request)
Pay special attention to certain headers, because the server side checks them.
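A short sketch of setting and reading back such headers follows. User-Agent and Content-Type are used here only as common examples of headers a server may inspect; the URL, the JSON body, and the header values are made up for illustration, and the fallback import is an assumption to keep the sketch runnable on Python 3. Note that the Request object stores header names in capitalized form (`Content-type`), which matters when reading them back.

```python
import json

try:
    import urllib2
except ImportError:
    # Assumption: on Python 3 the same classes live in urllib.request.
    import urllib.request as urllib2

# Hypothetical endpoint and payload, for illustration only.
body = json.dumps({'name': 'value'}).encode('utf-8')
request = urllib2.Request('http://www.example.com/api', data=body)

request.add_header('User-Agent', 'fake-client')
request.add_header('Content-Type', 'application/json')

# add_header() normalizes names with str.capitalize(), so look them
# up as 'User-agent' and 'Content-type'.
user_agent = request.get_header('User-agent')
content_type = request.get_header('Content-type')

# Supplying data turns the request into a POST.
method = request.get_method()

# response = urllib2.urlopen(request)  # would send the request
```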
4 Redirect
By default, urllib2 automatically follows redirects for 3xx HTTP return codes, with no manual configuration needed. To detect whether a redirect has occurred, just check whether the URL of the response differs from the URL of the request.
    import urllib2

    url = 'http://www.google.cn'
    response = urllib2.urlopen(url)
    # A redirect occurred if the final URL differs from the requested one
    redirected = response.geturl() != url
If you do not want redirects to be followed automatically, you can either use the lower-level httplib library or define a custom HTTPRedirectHandler subclass.
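    import urllib2

    class RedirectHandler(urllib2.HTTPRedirectHandler):
        # Returning None leaves the 301/302 response unhandled
        def http_error_301(self, req, fp, code, msg, headers):
            pass
        def http_error_302(self, req, fp, code, msg, headers):
            pass

    opener = urllib2.build_opener(RedirectHandler)
    try:
        opener.open('http://www.google.cn')
    except urllib2.HTTPError as e:
        # The unfollowed redirect surfaces as an HTTPError
        print(e.code)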
5 Cookies
urllib2 can also handle cookies automatically, once an HTTPCookieProcessor backed by a CookieJar is added to the opener. If you need to get the value of a particular cookie entry, you can do this:
    import urllib2
    import cookielib

    cookie = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
    response = opener.open('http://www.google.com')
    for item in cookie:
        if item.name == 'some_cookie_item_name':
            print(item.value)
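If several cookie values are needed, the loop above can be folded into a small helper. This is a sketch only: `cookies_as_dict` is a hypothetical name, and the fallback imports are an assumption so that it also runs on Python 3, where cookielib became http.cookiejar.

```python
try:
    import urllib2
    import cookielib
except ImportError:
    # Assumption: Python 3 names for the same modules.
    import urllib.request as urllib2
    import http.cookiejar as cookielib


def cookies_as_dict(jar):
    """Hypothetical helper: map cookie names to values for a CookieJar."""
    return dict((item.name, item.value) for item in jar)


cookie_jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))

# After opener.open(...) the jar holds whatever cookies the server set:
# response = opener.open('http://www.google.com')
# print(cookies_as_dict(cookie_jar).get('some_cookie_item_name'))
```

If the same cookie name is stored for more than one domain or path, the dictionary keeps only the last one iterated, so iterate over the jar directly when that distinction matters.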
6 Using the HTTP PUT and DELETE methods
urllib2 only supports the HTTP GET and POST methods out of the box. To use HTTP PUT or DELETE, you would normally have to fall back on the lower-level httplib library. Even so, urllib2 can be made to send HTTP PUT or DELETE requests in the following way:
    import urllib2

    # uri and data are placeholders for the target URL and the request body
    request = urllib2.Request(uri, data=data)
    request.get_method = lambda: 'PUT'  # or 'DELETE'
    response = urllib2.urlopen(request)
Although this approach is a hack, it causes no problems in practical use.
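The same override can also be written as a small Request subclass, which avoids patching each instance. This is a sketch, assuming a hypothetical class name `MethodRequest` and an example URL; the fallback import is only there so the sketch also runs on Python 3.

```python
try:
    import urllib2
except ImportError:
    # Assumption: on Python 3 the same classes live in urllib.request.
    import urllib.request as urllib2


class MethodRequest(urllib2.Request):
    """Hypothetical Request subclass that reports a caller-chosen method."""

    def __init__(self, url, http_method, data=None, headers=None):
        urllib2.Request.__init__(self, url, data, headers or {})
        self._http_method = http_method

    def get_method(self):
        # urllib2 asks the request for its verb through get_method().
        return self._http_method


# Example URL and body, for illustration only.
put_request = MethodRequest('http://www.example.com/item/1', 'PUT', data=b'x=1')
delete_request = MethodRequest('http://www.example.com/item/1', 'DELETE')

# response = urllib2.urlopen(put_request)  # would send an HTTP PUT
```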
7 Getting the return code for HTTP
For 200 OK, the HTTP return code can be obtained by calling the getcode() method of the response object returned by urlopen. For error return codes such as 404, however, urlopen raises an exception. In that case, check the code attribute of the exception object:
    import urllib2

    try:
        response = urllib2.urlopen('http://restrict.web.com')
    except urllib2.HTTPError as e:
        print(e.code)
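Both cases can be wrapped in one helper that always returns the numeric code. A minimal sketch: `status_code` and its `open_func` parameter are hypothetical names (the parameter only makes it possible to pass another opener's open method), and the fallback import is an assumption for Python 3.

```python
try:
    import urllib2
except ImportError:
    # Assumption: on Python 3 urlopen and HTTPError are reachable here.
    import urllib.request as urllib2


def status_code(url, open_func=urllib2.urlopen):
    """Hypothetical helper: return the HTTP status for url.

    Successful responses report their code through getcode();
    error responses surface as HTTPError, whose code attribute
    holds the status.
    """
    try:
        response = open_func(url)
    except urllib2.HTTPError as e:
        return e.code
    try:
        return response.getcode()
    finally:
        response.close()


# print(status_code('http://www.google.com'))
```

Connection failures raise urllib2.URLError, which carries no HTTP code, so they are deliberately left uncaught here.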
8 Debug Log
When using urllib2, you can turn on the debug log in the following way. The content sent and received is then printed to the screen, which makes debugging easier and can, to some extent, save the work of capturing packets.
    import urllib2

    http_handler = urllib2.HTTPHandler(debuglevel=1)
    https_handler = urllib2.HTTPSHandler(debuglevel=1)
    opener = urllib2.build_opener(http_handler, https_handler)

    urllib2.install_opener(opener)
    response = urllib2.urlopen('http://www.google.com')