Summary of the usage details of the Python standard library urllib2

Last Update: 2015-03-17 Source: Internet

Author: User

The Python standard library contains many practical utility classes, but the standard library documentation does not always describe the details of their use clearly. The HTTP client library urllib2 is one example. This article summarizes some usage details of urllib2.

1. Proxy Settings
2. Timeout settings
3. Add a specific Header to the HTTP Request
4. Redirect
5. Cookie
6. Use the PUT and DELETE methods of HTTP
7. Get the HTTP return code
8. Debug Log

Proxy Settings

By default, urllib2 uses the environment variable http_proxy to set the HTTP proxy. To control the proxy explicitly in the program, without being affected by environment variables, you can use the following approach:

The code is as follows:
import urllib2

enable_proxy = True
proxy_handler = urllib2.ProxyHandler({"http": "http://some-proxy.com:100"})
null_proxy_handler = urllib2.ProxyHandler({})  # empty mapping: use no proxy at all

if enable_proxy:
    opener = urllib2.build_opener(proxy_handler)
else:
    opener = urllib2.build_opener(null_proxy_handler)

urllib2.install_opener(opener)

Note that urllib2.install_opener() sets the global opener of urllib2. This makes later calls convenient, but it does not allow finer-grained control, for example when a program needs two different proxy settings. A better approach is to skip install_opener and call the opener's open method directly instead of the global urlopen function.
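
As a minimal sketch of the per-opener approach, the following builds two openers with different proxies and installs neither globally. The helper name make_proxy_opener and the proxy addresses are hypothetical, and the try/except import is an assumption added only so the sketch also runs on Python 3, where the same names live in urllib.request.

```python
try:
    import urllib2  # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 fallback with the same names


def make_proxy_opener(proxy_url):
    """Return an opener that sends HTTP requests through proxy_url."""
    handler = urllib2.ProxyHandler({"http": proxy_url})
    return urllib2.build_opener(handler)


# Hypothetical proxy addresses, for illustration only.
opener_a = make_proxy_opener("http://proxy-a.example.com:8080")
opener_b = make_proxy_opener("http://proxy-b.example.com:8080")

# Each opener keeps its own proxy; the global urlopen is left untouched.
# response_a = opener_a.open("http://www.example.com/")
# response_b = opener_b.open("http://www.example.com/")
```

Because nothing is installed globally, other code in the same process that calls urllib2.urlopen is not affected by either proxy.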

Timeout settings

In older versions of Python, the urllib2 API does not expose a timeout setting. To set a timeout, you can only change the global default timeout of the socket module.

The code is as follows:
import urllib2
import socket

socket.setdefaulttimeout(10)          # time out after 10 seconds
urllib2.socket.setdefaulttimeout(10)  # another way to do the same thing
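
Because the default socket timeout is process-wide, it affects every socket created afterwards, not only those used by urllib2. One way to limit the side effect, sketched here with a hypothetical helper named call_with_default_timeout, is to set the value only around a single call and restore the previous value afterwards:

```python
import socket


def call_with_default_timeout(seconds, func, *args, **kwargs):
    """Run func with a temporary global socket timeout, then restore the old one."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        return func(*args, **kwargs)
    finally:
        # Restore the earlier value even if func raised an exception.
        socket.setdefaulttimeout(previous)


before = socket.getdefaulttimeout()
# Here the wrapped call just reports the timeout that is active inside the call;
# in real use it would be something like urllib2.urlopen with a URL argument.
inside = call_with_default_timeout(10, socket.getdefaulttimeout)
after = socket.getdefaulttimeout()
```

This is still not safe when several threads open sockets at the same time, since the setting is shared by the whole process.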

Since Python 2.6, the timeout can be set directly through the timeout parameter of urllib2.urlopen.

The code is as follows:
import urllib2

response = urllib2.urlopen('http://www.google.com', timeout=10)

Add a specific Header to the HTTP Request

To add a header, you must use a Request object:
The code is as follows:
import urllib2

# uri is the target URL, assumed to be defined elsewhere
request = urllib2.Request(uri)
request.add_header('User-Agent', 'fake-client')
response = urllib2.urlopen(request)

Pay special attention to the following headers, because servers often check them.

User-Agent: Some servers or proxies use this value to determine whether the request was sent by a browser.

Content-Type: When a REST interface is used, the server checks this value to determine how to parse the content of the HTTP body. Common values include:

application/xml: used for XML RPC calls, such as RESTful/SOAP calls
application/json: used for JSON RPC calls
application/x-www-form-urlencoded: used when a browser submits a web form

When using a RESTful or SOAP service provided by a server, an incorrect Content-Type setting may cause the server to reject the request.
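
As an illustration of setting Content-Type for a JSON call, the sketch below builds a POST request without sending it. The helper name build_json_request, the endpoint, and the payload are hypothetical, and the try/except import is an assumption added so the sketch also runs on Python 3.

```python
import json

try:
    import urllib2  # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 fallback with the same names


def build_json_request(url, payload):
    """Build (but do not send) a POST request that carries a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib2.Request(url, data=body)  # a request with data is sent as POST
    request.add_header("Content-Type", "application/json")
    request.add_header("User-Agent", "fake-client")
    return request


# Hypothetical endpoint and payload, for illustration only.
request = build_json_request("http://api.example.com/items", {"name": "demo"})
# response = urllib2.urlopen(request)
```

urllib2 stores header names in capitalized form, so the header is looked up as "Content-type". HTTP header names are case-insensitive, so this does not change what the server sees.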

Redirect

By default, urllib2 automatically follows redirects for HTTP 3XX return codes, with no manual configuration. To check whether a redirect occurred, you only need to check whether the response URL differs from the request URL.

The code is as follows:
import urllib2

response = urllib2.urlopen('http://www.google.cn')
# A redirect occurred if the final URL differs from the requested one
redirected = response.geturl() != 'http://www.google.cn'

If you do not want automatic redirects, you can either use the lower-level httplib library or define a custom HTTPRedirectHandler subclass.

The code is as follows:
import urllib2

class RedirectHandler(urllib2.HTTPRedirectHandler):
    # Returning nothing means the redirect is not followed;
    # urllib2 then raises HTTPError for the 301 or 302 response.
    def http_error_301(self, req, fp, code, msg, headers):
        pass
    def http_error_302(self, req, fp, code, msg, headers):
        pass

opener = urllib2.build_opener(RedirectHandler)
opener.open('http://www.google.cn')
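
Since a declined redirect surfaces as an HTTPError, the caller usually wants to catch it and read the status code and the Location header. The following is a rough sketch of that idea, not a definitive implementation: the names NoRedirectHandler and open_without_redirect are hypothetical, the handler also covers 303 and 307 as an extension of the listing above, and the try/except import is an assumption added so the sketch also runs on Python 3.

```python
try:
    import urllib2  # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 fallback with the same names


class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    """Decline to follow 301, 302, 303 and 307 responses."""

    def _ignore(self, req, fp, code, msg, headers):
        return None  # no result: urllib2 falls back to raising HTTPError

    http_error_301 = http_error_302 = http_error_303 = http_error_307 = _ignore


# Passing the subclass replaces the default HTTPRedirectHandler in this opener.
opener = urllib2.build_opener(NoRedirectHandler)


def open_without_redirect(url):
    """Return (status_code, location); location is None unless a redirect was declined."""
    try:
        response = opener.open(url)
        return response.getcode(), None
    except urllib2.HTTPError as e:
        return e.code, e.hdrs.get("Location")
```

A call such as open_without_redirect('http://www.google.cn') needs network access, so its result is not shown here.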

Cookie

urllib2 handles cookies automatically once an HTTPCookieProcessor is added to the opener. To obtain the value of a particular cookie, you can do this:

The code is as follows:
import urllib2
import cookielib

cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
response = opener.open('http://www.google.com')
for item in cookie:
    if item.name == 'some_cookie_item_name':
        print item.value
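
When several cookie values are needed, it can be handier to turn the jar into a dictionary. The sketch below uses a hypothetical helper named cookies_as_dict and fills the jar by hand with a made-up cookie so that it runs offline; with a real request the jar would be filled by HTTPCookieProcessor as shown above. The try/except import is an assumption added so the sketch also runs on Python 3.

```python
try:
    import cookielib  # Python 2, as used in this article
except ImportError:
    import http.cookiejar as cookielib  # assumption: Python 3 fallback


def cookies_as_dict(jar):
    """Map cookie names to values for every cookie currently in the jar."""
    return dict((item.name, item.value) for item in jar)


# Offline demonstration with a hand-built, made-up cookie.
jar = cookielib.CookieJar()
jar.set_cookie(cookielib.Cookie(
    version=0, name="session", value="abc123",
    port=None, port_specified=False,
    domain="www.example.com", domain_specified=False, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={}))

values = cookies_as_dict(jar)
```

Cookies with the same name from different domains or paths would overwrite each other in the dictionary, so this shortcut only fits simple cases.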

Use the PUT and DELETE methods of HTTP

urllib2 only supports the HTTP GET and POST methods. To use the HTTP PUT and DELETE methods, you would normally have to fall back to the lower-level httplib library. Even so, urllib2 can be made to send a PUT or DELETE request in the following way:

The code is as follows:
import urllib2

# uri and data are assumed to be defined elsewhere
request = urllib2.Request(uri, data=data)
request.get_method = lambda: 'PUT'  # or 'DELETE'
response = urllib2.urlopen(request)

Although this is a hack, it causes no problems in practice.
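
If the hack is needed in several places, one option is to wrap it in a small Request subclass. This is only a sketch: the class name MethodRequest and the URLs are hypothetical, and the try/except import is an assumption added so the sketch also runs on Python 3.

```python
try:
    import urllib2  # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 fallback with the same names


class MethodRequest(urllib2.Request):
    """Request whose HTTP method can be chosen explicitly (hypothetical helper)."""

    def __init__(self, *args, **kwargs):
        # Take the method keyword out before calling the base constructor.
        self._method = kwargs.pop("method", None)
        urllib2.Request.__init__(self, *args, **kwargs)

    def get_method(self):
        if self._method is not None:
            return self._method
        # Fall back to the default rule: POST with data, GET without.
        return urllib2.Request.get_method(self)


# Hypothetical URLs, for illustration only; nothing is sent here.
put_request = MethodRequest("http://api.example.com/items/1", data=b"{}", method="PUT")
delete_request = MethodRequest("http://api.example.com/items/1", method="DELETE")
plain_request = MethodRequest("http://api.example.com/items/1")
# response = urllib2.urlopen(put_request)
```

The subclass relies on the same get_method hook as the lambda version, so it behaves the same way when the request is opened.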

Get the HTTP return code

For a successful response such as 200 OK, you can obtain the HTTP return code from the getcode() method of the response object returned by urlopen. For other return codes, however, urlopen raises an exception. In that case, check the code attribute of the exception object:

The code is as follows:
import urllib2

try:
    response = urllib2.urlopen('http://restrict.web.com')
except urllib2.HTTPError, e:
    print e.code

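
The two cases can be folded into one helper that returns the status code either way. This is a sketch under stated assumptions: status_code and fake_urlopen are hypothetical names, fake_urlopen stands in for urllib2.urlopen only so the example runs without network access, and the try/except import is an assumption added so the sketch also runs on Python 3.

```python
try:
    import urllib2  # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 fallback with the same names


def status_code(open_func, url):
    """Return the HTTP status code whether or not open_func raises HTTPError."""
    try:
        return open_func(url).getcode()
    except urllib2.HTTPError as e:
        return e.code


def fake_urlopen(url):
    """Offline stand-in for urllib2.urlopen that always fails with 404."""
    raise urllib2.HTTPError(url, 404, "Not Found", None, None)


code = status_code(fake_urlopen, "http://restrict.web.com")
# With a real connection: status_code(urllib2.urlopen, "http://restrict.web.com")
```

Other failures, such as a refused connection, raise URLError rather than HTTPError and are deliberately not caught here.
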
Debug Log

When using urllib2, you can turn on the debug log as follows, so that the content of the packets sent and received is printed to the screen. This is convenient for debugging and can sometimes save you from capturing packets.

The code is as follows:
import urllib2

# debuglevel=1 prints the HTTP traffic to standard output
httpHandler = urllib2.HTTPHandler(debuglevel=1)
httpsHandler = urllib2.HTTPSHandler(debuglevel=1)
opener = urllib2.build_opener(httpHandler, httpsHandler)

urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.google.com')

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
