A summary of some usage details of the Python standard library urllib2

Source: Internet
Author: User
The Python standard library contains many useful utility classes, but the standard library documentation does not always make the details of their use clear. urllib2, the HTTP client library, is one example. Here is a summary of some urllib2 usage details.

1. Proxy settings
2. Timeout settings
3. Add a specific Header to the HTTP Request
4. Redirect
5. Cookies
6. PUT and DELETE methods using HTTP
7. Get the return code for HTTP
8. Debug Log

Proxy settings

By default, urllib2 uses the environment variable http_proxy to set the HTTP proxy. If you want to control the proxy explicitly in the program, without being affected by environment variables, you can use the following method.

The code is as follows:


import urllib2

enable_proxy = True
proxy_handler = urllib2.ProxyHandler({'http': 'http://some-proxy.com:8080'})
null_proxy_handler = urllib2.ProxyHandler({})  # empty dict: no proxy at all

if enable_proxy:
    opener = urllib2.build_opener(proxy_handler)
else:
    opener = urllib2.build_opener(null_proxy_handler)

urllib2.install_opener(opener)


One detail to note here is that urllib2.install_opener() sets urllib2's global opener. This is convenient for later calls, but it does not allow finer-grained control, such as using two different proxy settings in one program. A better practice is to leave the global settings alone and call the opener's open method directly instead of the global urlopen method.
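
As a rough sketch of that practice, the snippet below keeps two openers side by side and never touches the global one. The proxy address and the fetch helper are hypothetical names chosen for illustration, and the try/except import is only there so that the same code also loads on Python 3, where these classes live in urllib.request.

```python
try:
    import urllib2  # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same classes

# Hypothetical proxy address, for illustration only.
PROXY_URL = "http://some-proxy.com:8080"

# Each opener carries its own proxy configuration; nothing is installed globally.
proxied_opener = urllib2.build_opener(urllib2.ProxyHandler({"http": PROXY_URL}))
direct_opener = urllib2.build_opener(urllib2.ProxyHandler({}))


def fetch(opener, url, timeout=10):
    """Open url through the given opener and return the response body."""
    response = opener.open(url, timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()
```

With this layout, fetch(proxied_opener, url) and fetch(direct_opener, url) can be mixed freely in one program, because neither call depends on urllib2.install_opener().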

Timeout settings

In older versions of Python (before 2.6), the urllib2 API does not expose a timeout setting. To set a timeout, you can only change the global default timeout of the socket module.

The code is as follows:


import urllib2
import socket

socket.setdefaulttimeout(10)  # time out after 10 seconds
urllib2.socket.setdefaulttimeout(10)  # another way to do the same thing
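
Because this default applies to every socket created afterwards in the process, it can be worth restoring the previous value once the request is done. The following is a minimal sketch of that idea; default_socket_timeout is a hypothetical helper name, not part of urllib2 or socket, and the approach is not safe if other threads open sockets at the same time.

```python
import socket
from contextlib import contextmanager


@contextmanager
def default_socket_timeout(seconds):
    """Set the global socket timeout for the duration of a with-block."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        # Put back whatever was configured before (None means "no timeout").
        socket.setdefaulttimeout(previous)
```

A call such as urllib2.urlopen(url) placed inside "with default_socket_timeout(10):" then uses the 10 second limit, and the earlier default is back in place when the block exits.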


Since Python 2.6, the timeout can be set directly through the timeout parameter of urllib2.urlopen().

The code is as follows:


import urllib2

response = urllib2.urlopen('http://www.google.com', timeout=10)


Add a specific Header to the HTTP Request

To add a header, you need to use a Request object:

The code is as follows:


import urllib2

# uri is the URL to request
request = urllib2.Request(uri)
request.add_header('User-Agent', 'fake-client')
response = urllib2.urlopen(request)


Pay special attention to the following headers, because servers check them.

User-Agent: some servers or proxies use this value to determine whether a request was made by a browser.

Content-Type: when a REST interface is used, the server checks this value to determine how the content of the HTTP body should be parsed. Common values are:

application/xml: used in XML RPC calls, such as RESTful/SOAP calls
application/json: used in JSON RPC calls
application/x-www-form-urlencoded: used when a browser submits a web form

When you use a RESTful or SOAP service provided by a server, a wrong Content-Type setting causes the server to reject the request.
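
As an illustration, the sketch below prepares a request with a JSON body and a matching Content-Type. The endpoint URL and the payload are made up for the example, and the actual network call is left commented out. The try/except import only exists so that the snippet also loads on Python 3.

```python
import json

try:
    import urllib2  # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same classes

# Hypothetical endpoint and payload, for illustration only.
url = "http://api.example.com/items"
body = json.dumps({"name": "demo"}).encode("utf-8")

request = urllib2.Request(url, data=body)
# Tell the server how to parse the body.
request.add_header("Content-Type", "application/json")
request.add_header("User-Agent", "fake-client")

# Supplying data makes urllib2 send the request as a POST.
# response = urllib2.urlopen(request)
```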

Redirect

By default, urllib2 automatically follows redirects for HTTP 3XX return codes, without any manual configuration. To detect whether a redirect has occurred, just check whether the URL of the response differs from the URL of the request.

The code is as follows:


import urllib2

response = urllib2.urlopen('http://www.google.cn')
# A redirect happened if the final URL is not the one we asked for
redirected = response.geturl() != 'http://www.google.cn'

If you do not want redirects to be followed automatically, you can either use the lower-level httplib library or customize the HTTPRedirectHandler class.

The code is as follows:


import urllib2

class RedirectHandler(urllib2.HTTPRedirectHandler):
    # Returning None means "not handled", so the redirect is not followed
    def http_error_301(self, req, fp, code, msg, headers):
        pass

    def http_error_302(self, req, fp, code, msg, headers):
        pass

opener = urllib2.build_opener(RedirectHandler)
opener.open('http://www.google.cn')

Cookies

urllib2 can also handle cookies automatically, once an HTTPCookieProcessor is added to the opener. If you need to get the value of a particular cookie, you can do this:

The code is as follows:


import urllib2
import cookielib

cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
response = opener.open('http://www.google.com')
# The jar now holds whatever cookies the server set
for item in cookie:
    if item.name == 'some_cookie_item_name':
        print(item.value)


PUT and DELETE methods using HTTP

urllib2 directly supports only the GET and POST methods of HTTP. To use HTTP PUT and DELETE, you would normally have to use the lower-level httplib library. Nonetheless, urllib2 can be made to send a PUT or DELETE request in the following way:

The code is as follows:


import urllib2

# uri is the URL to request, data is the request body
request = urllib2.Request(uri, data=data)
request.get_method = lambda: 'PUT'  # or 'DELETE'
response = urllib2.urlopen(request)


Although this approach is a hack, it causes no problems in practical use.
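
If patching get_method on every request object feels too ad hoc, the same idea can be wrapped in a small subclass. This is only a sketch: MethodRequest is a hypothetical name, not part of urllib2, and the URLs and payload are placeholders. The try/except import only exists so that the snippet also loads on Python 3.

```python
try:
    import urllib2  # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same classes


class MethodRequest(urllib2.Request):
    """Request whose HTTP verb is chosen by the caller (hypothetical helper)."""

    def __init__(self, url, http_method, **kwargs):
        urllib2.Request.__init__(self, url, **kwargs)
        self._http_method = http_method

    def get_method(self):
        # urllib2 asks the request object which verb to send.
        return self._http_method


# Hypothetical URLs and payload, for illustration only.
put_request = MethodRequest("http://api.example.com/items/1", "PUT",
                            data=b'{"name": "demo"}')
delete_request = MethodRequest("http://api.example.com/items/1", "DELETE")
# response = urllib2.urlopen(put_request)
```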

Get the return code for HTTP

For 200 OK, the HTTP return code can be obtained from the getcode() method of the response object returned by urlopen. For other return codes, however, urlopen raises an exception. In that case, check the code attribute of the exception object:

The code is as follows:


import urllib2

try:
    response = urllib2.urlopen('http://restrict.web.com')
except urllib2.HTTPError as e:
    # The status code of the failed request is carried by the exception
    print(e.code)
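
The two cases can be folded into one helper, sketched below. http_status and its open_func parameter are hypothetical names introduced for this example. The try/except import only exists so that the snippet also loads on Python 3.

```python
try:
    import urllib2  # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same classes


def http_status(url, open_func=urllib2.urlopen):
    """Return the HTTP status code for url, for success and error replies."""
    try:
        response = open_func(url)
    except urllib2.HTTPError as e:
        # Error responses such as 403 or 404 arrive as exceptions.
        return e.code
    try:
        return response.getcode()
    finally:
        response.close()
```

Passing an opener's open method as open_func keeps the helper usable with a custom opener. Failures that are not HTTP responses at all, such as an unreachable host, still propagate as exceptions.
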

Debug Log

When using urllib2, you can turn on the debug log as follows. The data sent and received is then printed to the screen, which makes debugging easier and sometimes saves the work of capturing packets.

The code is as follows:


import urllib2

http_handler = urllib2.HTTPHandler(debuglevel=1)
https_handler = urllib2.HTTPSHandler(debuglevel=1)
opener = urllib2.build_opener(http_handler, https_handler)

urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.google.com')
