Summary of the usage details of the Python standard library urllib2

Source: Internet
Author: User

The Python standard library contains many practical utility classes, but the standard library documentation does not always describe their usage in detail. The HTTP client library urllib2 is one example. This article summarizes some usage details of urllib2.

1. Proxy Settings
2. Timeout settings
3. Add a specific Header to the HTTP Request
4. Redirect
5. Cookie
6. Use the PUT and DELETE methods of HTTP
7. Get the HTTP return code
8. Debug Log

Proxy Settings

By default, urllib2 reads the http_proxy environment variable to configure its HTTP proxy. To control the proxy explicitly in the program, independent of environment variables, you can use the following approach:

import urllib2

enable_proxy = True
proxy_handler = urllib2.ProxyHandler({"http": 'http://some-proxy.com:100'})
null_proxy_handler = urllib2.ProxyHandler({})

if enable_proxy:
    opener = urllib2.build_opener(proxy_handler)
else:
    opener = urllib2.build_opener(null_proxy_handler)

# Make this opener the one used by urllib2.urlopen()
urllib2.install_opener(opener)

Note that urllib2.install_opener() sets the global opener of urllib2. This is convenient for later calls, but it does not allow finer-grained control, for example when the program needs two different proxy settings. A better approach is to skip install_opener() and call the opener's open method directly instead of the global urlopen function.
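
As a minimal sketch of that finer-grained approach, the snippet below builds two independent openers, each with its own ProxyHandler, and never touches the global opener. The proxy addresses, the target URL, and the fetch helper are hypothetical placeholders. The import fallback is only there so the sketch also runs on Python 3, where the same API lives in urllib.request.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same API

# Hypothetical proxy addresses, used only for illustration.
proxy_a = urllib2.ProxyHandler({'http': 'http://proxy-a.example.com:3128'})
proxy_b = urllib2.ProxyHandler({'http': 'http://proxy-b.example.com:3128'})

# Each opener keeps its own proxy configuration. install_opener() is never
# called, so urllib2.urlopen() is not affected.
opener_a = urllib2.build_opener(proxy_a)
opener_b = urllib2.build_opener(proxy_b)


def fetch(opener, url, timeout=10):
    """Open url through the given opener and return the response body."""
    response = opener.open(url, timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()

# Usage (needs reachable proxies, so it is left commented out):
# body_a = fetch(opener_a, 'http://www.example.com/')
# body_b = fetch(opener_b, 'http://www.example.com/')
```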

Timeout settings

In older versions of Python, the urllib2 API does not expose a timeout setting. To set a timeout, you can only change the global default timeout of the socket module.

import urllib2
import socket

socket.setdefaulttimeout(10)  # time out after 10 seconds
urllib2.socket.setdefaulttimeout(10)  # another way to do the same thing

Starting with Python 2.6, the timeout can be set directly through the timeout parameter of urllib2.urlopen.

import urllib2

response = urllib2.urlopen('http://www.google.com', timeout=10)

Add a specific Header to the HTTP Request

To add a header, you must use a Request object:

import urllib2

# uri is assumed to hold the target URL
request = urllib2.Request(uri)
request.add_header('User-Agent', 'fake-client')
response = urllib2.urlopen(request)

Pay special attention to the following headers, because servers check them.

User-Agent: Some servers or proxies use this value to determine whether the request was sent by a browser.

Content-Type: When a REST interface is used, the server checks this value to determine how to parse the content of the HTTP body. Common values include:

application/xml: used for XML RPC calls, such as RESTful/SOAP calls
application/json: used for JSON RPC calls
application/x-www-form-urlencoded: used when a browser submits a web form

When using a RESTful or SOAP service provided by a server, an incorrect Content-Type setting may cause the server to reject the request.
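
As an illustration of setting Content-Type for a JSON body, the following sketch builds a request without sending it. The endpoint URL and the payload are hypothetical, and the import fallback only exists so the sketch also runs on Python 3.

```python
import json

try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same API

# Hypothetical endpoint and payload, used only for illustration.
payload = json.dumps({'name': 'example'}).encode('utf-8')

request = urllib2.Request('http://api.example.com/items', data=payload)
request.add_header('Content-Type', 'application/json')
request.add_header('User-Agent', 'fake-client')

# Supplying data turns the request into a POST.
# response = urllib2.urlopen(request)  # left commented out: needs a real server
```

Note that add_header stores header names in capitalized form, so the header set above is read back with request.get_header('Content-type').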

Redirect

By default, urllib2 automatically follows redirects for HTTP 3XX return codes, with no manual configuration needed. To check whether a redirect occurred, compare the response URL with the request URL.

import urllib2

response = urllib2.urlopen('http://www.google.cn')
# True if the final URL differs from the requested one
redirected = response.geturl() != 'http://www.google.cn'

If you do not want automatic redirects, you can either use the lower-level httplib library or define a custom HTTPRedirectHandler subclass.

import urllib2

class RedirectHandler(urllib2.HTTPRedirectHandler):
    # Returning None leaves the 301/302 unhandled, so it is not followed
    # and open() raises HTTPError instead
    def http_error_301(self, req, fp, code, msg, headers):
        pass

    def http_error_302(self, req, fp, code, msg, headers):
        pass

opener = urllib2.build_opener(RedirectHandler)
opener.open('http://www.google.cn')
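
A variation on the same idea, offered as a sketch rather than the only way to do it, is to override redirect_request so that every 3XX code is declined in one place, and then to catch the resulting HTTPError to inspect the redirect response. The names NoRedirectHandler and open_without_redirect are hypothetical. The import fallback only exists so the sketch also runs on Python 3.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same API


class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    """Hypothetical handler that declines every redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "do not follow"; the 3XX response then
        # surfaces to the caller as an HTTPError.
        return None


def open_without_redirect(url, timeout=10):
    """Return the 2XX response, or the 3XX response itself if redirected."""
    opener = urllib2.build_opener(NoRedirectHandler)
    try:
        return opener.open(url, timeout=timeout)
    except urllib2.HTTPError as e:
        if 300 <= e.code < 400:
            return e  # HTTPError can also be used like a response object
        raise

# Usage (needs network access, so it is left commented out):
# response = open_without_redirect('http://www.google.cn')
# print(response.getcode())
# print(response.info().get('Location'))
```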

Cookie

urllib2 can also process cookies automatically once an HTTPCookieProcessor is added to the opener. To obtain the value of a particular cookie, you can do this:

import urllib2
import cookielib

cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
response = opener.open('http://www.google.com')
for item in cookie:
    if item.name == 'some_cookie_item_name':
        print(item.value)
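
When several cookie values are needed, looping for each name becomes repetitive. The sketch below collects the jar into a plain dictionary. The helper name cookies_as_dict is hypothetical, and the import fallback only exists so the sketch also runs on Python 3.

```python
try:
    import cookielib  # Python 2
    import urllib2
except ImportError:
    import http.cookiejar as cookielib  # Python 3 equivalents
    import urllib.request as urllib2


def cookies_as_dict(jar):
    """Map cookie names to values; on duplicate names the last one wins."""
    return dict((cookie.name, cookie.value) for cookie in jar)


jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))

# Requests made through this opener store and resend cookies via jar.
# opener.open('http://www.google.com')  # left commented out: needs network
# print(cookies_as_dict(jar))
```

Because the dictionary is keyed by name only, cookies with the same name from different domains or paths overwrite each other. Iterate over the jar directly when that distinction matters.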

Use the PUT and DELETE methods of HTTP

urllib2 only supports the HTTP GET and POST methods directly. To use HTTP PUT and DELETE, you would normally have to use the lower-level httplib library. Even so, urllib2 can be made to send a PUT or DELETE request in the following way:

import urllib2

# uri and data are assumed to be defined elsewhere
request = urllib2.Request(uri, data=data)
request.get_method = lambda: 'PUT'  # or 'DELETE'
response = urllib2.urlopen(request)

Although this approach is a hack, it works without problems in practice.
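
The same trick can be packaged as a small Request subclass so that the override is not repeated at every call site. This is only a sketch: the class name MethodRequest and the example URLs are hypothetical, and the import fallback only exists so the code also runs on Python 3.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same API


class MethodRequest(urllib2.Request):
    """Hypothetical helper: a Request whose HTTP verb is set explicitly."""

    def __init__(self, url, method, data=None, headers=None):
        urllib2.Request.__init__(self, url, data=data, headers=headers or {})
        self._forced_method = method.upper()

    def get_method(self):
        # urllib2 calls get_method() to decide which verb to send.
        return self._forced_method


put_request = MethodRequest('http://api.example.com/items/1', 'PUT', data=b'{}')
delete_request = MethodRequest('http://api.example.com/items/1', 'DELETE')

# response = urllib2.urlopen(put_request)  # left commented out: needs a server
```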

Get the HTTP return code

For a 200 OK response, call the getcode() method of the response object returned by urlopen to obtain the HTTP return code. For error return codes such as 4XX or 5XX, however, urlopen raises an exception, and you need to check the code attribute of the exception object:

import urllib2

try:
    response = urllib2.urlopen('http://restrict.web.com')
except urllib2.HTTPError as e:
    print(e.code)
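
Both cases can be folded into one helper that always returns the status code. The sketch below assumes a hypothetical function name, get_status, and an optional opener argument. The import fallback only exists so the code also runs on Python 3.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same API


def get_status(url, opener=None, timeout=10):
    """Return the HTTP status code for url, whether or not HTTPError is raised."""
    open_func = opener.open if opener is not None else urllib2.urlopen
    try:
        response = open_func(url, timeout=timeout)
    except urllib2.HTTPError as e:
        return e.code  # error responses carry the code on the exception
    try:
        return response.getcode()
    finally:
        response.close()

# Usage (needs network access, so it is left commented out):
# print(get_status('http://restrict.web.com'))
```

Connection failures such as DNS errors raise URLError rather than HTTPError and have no status code, so the helper deliberately lets them propagate.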

Debug Log

When using urllib2, you can turn on the debug log as shown below, so that the content of sent and received packets is printed to the screen. This is convenient for debugging and can sometimes save you from having to capture packets.

import urllib2

httpHandler = urllib2.HTTPHandler(debuglevel=1)
httpsHandler = urllib2.HTTPSHandler(debuglevel=1)
opener = urllib2.build_opener(httpHandler, httpsHandler)

urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.google.com')
