A summary of some usage details of the Python standard library urllib2

Last Update:2016-06-06 Source: Internet

Author: User

The Python standard library contains many useful utility classes, but the standard library documentation does not always describe the details of how to use them clearly. One example is urllib2, the HTTP client library. Here is a summary of some usage details of urllib2.

1. Proxy settings
2. Timeout settings
3. Add a specific Header to the HTTP Request
4. Redirect
5. Cookie
6. Using the HTTP PUT and DELETE methods
7. Get the HTTP return code
8. Debug Log

Proxy settings

By default, urllib2 uses the environment variable http_proxy to set the HTTP proxy. If you want to control the proxy explicitly in the program, without being affected by environment variables, you can use the following method.

The code is as follows:

import urllib2

enable_proxy = True
proxy_handler = urllib2.ProxyHandler({"http": "http://some-proxy.com:8080"})
# An empty mapping means no proxy, regardless of environment variables
null_proxy_handler = urllib2.ProxyHandler({})

if enable_proxy:
    opener = urllib2.build_opener(proxy_handler)
else:
    opener = urllib2.build_opener(null_proxy_handler)

# Make this opener the one used by urllib2.urlopen()
urllib2.install_opener(opener)

One detail to note is that urllib2.install_opener() sets urllib2's global opener. This is convenient for later use, but it prevents finer-grained control, such as using two different proxy settings in one program. A better practice is to leave the global settings alone and call the opener's open method directly instead of the global urlopen method.
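
The advice above about avoiding the global opener could be sketched roughly as follows. The helper names make_opener and configured_proxies are hypothetical, introduced only for illustration, and the try/except import is an assumption that lets the same sketch load on Python 3, where the module is called urllib.request.

```python
try:
    import urllib2                      # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2    # Python 3 equivalent (assumption)


def make_opener(proxy_url=None):
    """Return a private opener; the global opener is left untouched."""
    proxies = {"http": proxy_url} if proxy_url else {}
    return urllib2.build_opener(urllib2.ProxyHandler(proxies))


def configured_proxies(opener):
    """Collect the proxy mapping of every ProxyHandler registered on an opener."""
    found = {}
    for handler in opener.handlers:
        if isinstance(handler, urllib2.ProxyHandler):
            found.update(handler.proxies)
    return found


# Two openers with different proxy settings living side by side
proxied_opener = make_opener("http://some-proxy.com:8080")
direct_opener = make_opener()

# Each request would then go through its own opener, for example:
#   proxied_opener.open("http://www.google.com")
#   direct_opener.open("http://www.google.com")
```

Because install_opener is never called, code elsewhere in the program that uses urllib2.urlopen keeps its default behavior.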

Timeout setting

Before Python 2.6, the urllib2 API did not expose a timeout setting. To set a timeout, you could only change the global timeout value of the socket module.

The code is as follows:

import urllib2
import socket

socket.setdefaulttimeout(10)  # time out after 10 seconds
urllib2.socket.setdefaulttimeout(10)  # another way, same global setting
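
Because this setting is global, it affects every socket created afterwards in the process. One possible way to limit its reach, shown here only as a sketch, is to restore the previous value when done. The context manager name default_timeout is hypothetical and not part of any library.

```python
import socket
from contextlib import contextmanager


@contextmanager
def default_timeout(seconds):
    """Temporarily change the process-wide socket timeout, then restore it."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        socket.setdefaulttimeout(previous)


before = socket.getdefaulttimeout()
with default_timeout(10):
    # Sockets created here (including those opened by urllib2) use 10 seconds
    inside = socket.getdefaulttimeout()
after = socket.getdefaulttimeout()
```

Other threads creating sockets while the block runs would still see the temporary value, so this is a mitigation rather than true per-request control.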

Since Python 2.6, the timeout can be set directly through the timeout parameter of urllib2.urlopen().

The code is as follows:

import urllib2
response = urllib2.urlopen('http://www.google.com', timeout=10)

Add a specific Header to the HTTP Request

To add a header, you need to use a Request object:

The code is as follows:

import urllib2

# uri is the target URL, assumed to be defined elsewhere
request = urllib2.Request(uri)
request.add_header('User-Agent', 'fake-client')
response = urllib2.urlopen(request)

Pay special attention to the following headers, which servers check:

User-Agent: Some servers or proxies use this value to determine whether a request was made by a browser.

Content-Type: When using a REST interface, the server checks this value to determine how to parse the content of the HTTP body. Common values are:

application/xml: used in XML RPC calls, such as RESTful/SOAP calls
application/json: used in JSON RPC calls
application/x-www-form-urlencoded: used when a browser submits a Web form

When using RESTful or SOAP services provided by a server, a wrong Content-Type setting causes the server to reject the request.
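
As an illustration of setting Content-Type for a JSON body, a minimal sketch might look like this. The function name build_json_request, the URL http://example.com/api/items, and the payload are all hypothetical, and the try/except import is an assumption that lets the sketch load on Python 3 as well.

```python
import json

try:
    import urllib2                      # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2    # Python 3 equivalent (assumption)


def build_json_request(url, payload):
    """Build a request whose body is JSON and whose Content-Type says so."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib2.Request(url, data=body)
    request.add_header("Content-Type", "application/json")
    return request


# Hypothetical endpoint and payload, for illustration only
request = build_json_request("http://example.com/api/items", {"name": "demo"})
# Sending it would be: response = urllib2.urlopen(request)
```

Supplying a body makes urllib2 send the request as a POST. Note that add_header normalizes the header name, so it is stored as Content-type.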

Redirect

By default, urllib2 automatically follows redirects for HTTP 3xx return codes, with no manual configuration needed. To detect whether a redirect occurred, check whether the URL of the response matches the URL of the request.

The code is as follows:

import urllib2
response = urllib2.urlopen('http://www.google.cn')
# A redirect happened if the final URL differs from the requested one
redirected = response.geturl() != 'http://www.google.cn'

If you do not want redirects to be followed automatically, you can either use the lower-level httplib library or customize the HTTPRedirectHandler class.

The code is as follows:

import urllib2

class RedirectHandler(urllib2.HTTPRedirectHandler):
    # Returning None means the 301/302 response is not followed
    def http_error_301(self, req, fp, code, msg, headers):
        pass
    def http_error_302(self, req, fp, code, msg, headers):
        pass

opener = urllib2.build_opener(RedirectHandler)
opener.open('http://www.google.cn')
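
When a handler declines a redirect like this, urllib2 falls back to its default error handling and the 3xx response surfaces as an HTTPError. A variant sketch that declines every redirect code through redirect_request, and reads the target from the Location header, might look like the following. The names NoRedirectHandler and open_without_redirect are hypothetical, and the try/except import is an assumption for Python 3 compatibility.

```python
try:
    import urllib2                      # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2    # Python 3 equivalent (assumption)


class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    """Decline every redirect instead of following it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib2 this handler will not redirect
        return None


no_redirect_opener = urllib2.build_opener(NoRedirectHandler)


def open_without_redirect(url):
    """Return (status_code, redirect_target); the target is None if not redirected."""
    try:
        response = no_redirect_opener.open(url)
    except urllib2.HTTPError as e:
        if 300 <= e.code < 400:
            return e.code, e.info().get("Location")
        raise
    return response.getcode(), None
```

Calling open_without_redirect('http://www.google.cn') would then report the redirect instead of following it; the actual result depends on how the server responds.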

Cookies

urllib2 also handles cookies automatically. If you need to get the value of a particular cookie, you can do this:

The code is as follows:

import urllib2
import cookielib

cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
response = opener.open('http://www.google.com')
# The jar now holds the cookies set by the server
for item in cookie:
    if item.name == 'some_cookie_item_name':
        print(item.value)
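
The lookup loop can be wrapped in a small helper, sketched below. The function name get_cookie_value is hypothetical, and the cookie named session with value abc123 is inserted by hand purely as a stand-in for one that a server would set, so that the sketch runs without network access. The try/except import is an assumption for Python 3 compatibility.

```python
try:
    import cookielib                    # Python 2, as used in this article
    import urllib2
except ImportError:
    import http.cookiejar as cookielib  # Python 3 equivalents (assumption)
    import urllib.request as urllib2


def get_cookie_value(jar, name, default=None):
    """Return the value of the first cookie called `name`, or `default`."""
    for item in jar:
        if item.name == name:
            return item.value
    return default


jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
# After opener.open(...), the jar would hold whatever cookies the server set.

# Stand-in for a server-set cookie (hypothetical name, value and domain)
jar.set_cookie(cookielib.Cookie(
    version=0, name="session", value="abc123",
    port=None, port_specified=False,
    domain="example.com", domain_specified=True, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={}))

session_value = get_cookie_value(jar, "session")
```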

Using the HTTP PUT and DELETE methods

urllib2 directly supports only the HTTP GET and POST methods. To use HTTP PUT and DELETE, you would normally have to use the lower-level httplib library. Nonetheless, urllib2 can be made to send a PUT or DELETE request in the following way:

The code is as follows:

import urllib2

# uri and data are assumed to be defined elsewhere
request = urllib2.Request(uri, data=data)
request.get_method = lambda: 'PUT'  # or 'DELETE'
response = urllib2.urlopen(request)

Although this approach is a hack, it causes no problems in practical use.
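
The same idea can be expressed as a subclass rather than by patching an instance. This is only an alternative sketch. The class name MethodRequest and the URL http://example.com/items/1 are hypothetical, and the try/except import is an assumption for Python 3 compatibility.

```python
try:
    import urllib2                      # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2    # Python 3 equivalent (assumption)


class MethodRequest(urllib2.Request):
    """A Request whose HTTP method is fixed when it is constructed."""

    def __init__(self, url, method, data=None, headers=None):
        self._method_override = method
        urllib2.Request.__init__(self, url, data, headers or {})

    def get_method(self):
        # urllib2 calls get_method() to decide which HTTP verb to send
        return self._method_override


# Hypothetical resource URL, for illustration only
put_request = MethodRequest("http://example.com/items/1", "PUT", data=b"name=demo")
delete_request = MethodRequest("http://example.com/items/1", "DELETE")
# Sending would be: response = urllib2.urlopen(put_request)
```

A subclass keeps the override in one place, which may be easier to reuse than assigning a lambda to each request.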

Get the HTTP return code

For 200 OK (and other successful responses), the HTTP return code can be obtained by calling the getcode() method of the response object returned by urlopen. For error return codes such as 4xx and 5xx, however, urlopen raises an exception. In that case, check the code attribute of the exception object:

The code is as follows:

import urllib2

try:
    response = urllib2.urlopen('http://restrict.web.com')
except urllib2.HTTPError as e:
    # The status code of the failed request
    print(e.code)
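
Both cases can be folded into one helper that always returns the status code, sketched below. The names fetch_status and always_forbidden are hypothetical. The second function is only a stand-in for urlopen that simulates a 403 response, so that the sketch runs without network access. The try/except import is an assumption for Python 3 compatibility.

```python
try:
    import urllib2                      # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2    # Python 3 equivalent (assumption)


def fetch_status(url, open_func=urllib2.urlopen):
    """Return the HTTP status code whether or not the request succeeded."""
    try:
        response = open_func(url)
    except urllib2.HTTPError as e:
        return e.code             # error responses such as 403, 404, 500
    return response.getcode()     # successful responses such as 200


def always_forbidden(url):
    """Hypothetical stand-in for urlopen: simulates a 403 response."""
    raise urllib2.HTTPError(url, 403, "Forbidden", {}, None)


status = fetch_status("http://restrict.web.com", open_func=always_forbidden)
```

Failures that never produce an HTTP response, such as a refused connection, are not HTTPError instances and would still propagate from this helper.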

Debug Log

When using urllib2, you can turn on the debug log as follows, so that the content sent and received is printed to the screen. This makes debugging easier and can sometimes save you the work of capturing packets.

The code is as follows:

import urllib2

# debuglevel=1 prints the HTTP traffic to standard output
http_handler = urllib2.HTTPHandler(debuglevel=1)
https_handler = urllib2.HTTPSHandler(debuglevel=1)
opener = urllib2.build_opener(http_handler, https_handler)

urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.google.com')



This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
