Detailed usage instructions for Python's urllib2 library

Last Update: 2014-10-11 Source: Internet

Author: User


Newcomers to our technical group keep asking questions about urllib, urllib2 and cookielib. So I'm summarizing the answers here, to avoid wasting time answering the same questions over and over again.

This is a tutorial. If you already know urllib2 and cookielib, feel free to skip this article.

Let's start with a piece of code:

# Cookies
import urllib2
import cookielib

cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
request = urllib2.Request(url='http://www.baidu.com/')
request.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)')
response = opener.open(request)
for item in cookie:
    print(item.value)

Many students ask why they should use a Request object when they could pass the URL directly to opener.open(). I wrote the code this way to lay out the common steps for constructing a request with urllib2.

Basics: the common steps for constructing a request with urllib2 (refer to the code above):

1. Handler

Handlers are passed to urllib2.build_opener(handler). The commonly used handlers provided by the library are:

urllib2.HTTPHandler(): opens URLs over HTTP
urllib2.CacheFTPHandler(): FTP handler that keeps FTP connections open (persistent)
urllib2.FileHandler(): opens local files
urllib2.FTPHandler(): opens URLs over FTP
urllib2.HTTPBasicAuthHandler(): handles HTTP Basic authentication
urllib2.HTTPCookieProcessor(): handles HTTP cookies
urllib2.HTTPDefaultErrorHandler(): handles HTTP errors by raising HTTPError exceptions
urllib2.HTTPDigestAuthHandler(): handles HTTP Digest authentication
urllib2.HTTPRedirectHandler(): handles HTTP redirects
urllib2.HTTPSHandler(): opens URLs over secure HTTP (HTTPS)
urllib2.ProxyHandler(): routes requests through a proxy
urllib2.ProxyBasicAuthHandler: handles Basic proxy authentication
urllib2.ProxyDigestAuthHandler: handles Digest proxy authentication
urllib2.UnknownHandler: handles all URLs with an unknown scheme
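The sketch below is a minimal way to see what build_opener() does with a handler you pass in. It adds a cookie handler and then lists the handlers the opener ends up holding. It assumes Python 2's urllib2 and cookielib. As an assumption beyond this article, it falls back to Python 3's urllib.request and http.cookiejar, which kept these class names. Nothing is fetched.

```python
try:
    import urllib2 as request          # Python 2: the module this article covers
    import cookielib as cookiejar
except ImportError:                    # assumption: Python 3 fallback with the same class names
    import urllib.request as request
    import http.cookiejar as cookiejar

jar = cookiejar.CookieJar()

# build_opener() starts from the default handlers (HTTPHandler, FileHandler,
# HTTPRedirectHandler, ...) and adds the handlers passed in.
opener = request.build_opener(request.HTTPCookieProcessor(jar))

# __class__ also works for Python 2's old-style handler classes, unlike type()
handler_names = sorted(h.__class__.__name__ for h in opener.handlers)
print(handler_names)
```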

2. Request

request = urllib2.Request(url='')

request.add_data(data): sets the request body. For an HTTP request this changes the method to POST. Note that this method does not append to data set earlier; it replaces it.
request.add_header(key, val): key is the header name and val is the header value; both are strings.
request.add_unredirected_header(key, val): same as above, but the header is not added to redirected requests.
request.set_proxy(host, type): prepares the request to go through a proxy server, replacing the original host with host and the original request type with type.
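You can try the Request methods above without sending anything. This sketch builds two requests against a placeholder URL and inspects their method, headers and proxy host. The URL http://www.example.com/, the credentials and the proxy address are assumptions, not from the article. It passes data= to the constructor instead of calling add_data(), because add_data() does not exist in Python 3, which the sketch falls back to.

```python
try:
    import urllib2 as request          # Python 2
except ImportError:                    # assumption: Python 3 fallback
    import urllib.request as request

# Hypothetical URLs: these requests are only built, never sent.
get_req = request.Request('http://www.example.com/')
print(get_req.get_method())            # GET: no body

# add_data() is Python 2 only; passing data= works in both versions.
post_req = request.Request('http://www.example.com/', data=b'a=1&b=2')
print(post_req.get_method())           # POST: a body turns the request into POST

post_req.add_header('User-Agent', 'fake-client')
post_req.add_unredirected_header('Authorization', 'Basic dXNlcjpwYXNz')  # hypothetical credentials
print(post_req.headers)                # names are stored capitalized, e.g. 'User-agent'
print(post_req.unredirected_hdrs)      # kept out of redirected requests

# set_proxy() points the request at a (hypothetical) proxy host
post_req.set_proxy('proxy.example.com:8080', 'http')
print(post_req.host)
```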

3. Opener

The basic urlopen() function does not support authentication, cookies, or other advanced HTTP features. To use them, you must create your own custom opener object with the build_opener() function.

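There are usually two ways to build a custom opener. A bare OpenerDirector starts with no handlers, so you have to add every handler you need yourself. build_opener() starts from the default handlers and adds the ones you pass in, replacing any default of the same type: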

opener = urllib2.OpenerDirector()
opener.add_handler(handler)

opener = urllib2.build_opener(handler)

Either opener can then be installed globally:

urllib2.install_opener(opener)

install_opener(opener) installs opener as the global URL opener used by urlopen(), so any later call to urlopen() goes through that opener. The opener is typically one created by build_opener().
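This sketch contrasts the two ways of building an opener and shows the effect of install_opener(). To stay offline, it opens a temporary local file through a file:// URL, which assumes a POSIX-style path. The Python 3 fallback import is also an assumption beyond the article.

```python
import os
import tempfile

try:
    import urllib2 as request          # Python 2: the module this article covers
except ImportError:                    # assumption: Python 3 fallback with the same names
    import urllib.request as request

# A throwaway local file keeps the example offline (POSIX-style path assumed).
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'hello from a local file')
url = 'file://' + path

# Way 1: a bare OpenerDirector has no handlers until you add them.
bare = request.OpenerDirector()
bare.add_handler(request.FileHandler())
resp = bare.open(url)
content = resp.read()
resp.close()

# Way 2: build_opener() already includes FileHandler among its defaults.
built = request.build_opener()
request.install_opener(built)          # later urlopen() calls go through `built`
resp = request.urlopen(url)
content_via_urlopen = resp.read()
resp.close()

os.remove(path)
print(content == content_via_urlopen)
```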

4. content_stream

content_stream = opener.open(request)

5. content_stream.read()

These five steps give you code similar to the example at the beginning of this article, which is the basic usage pattern of urllib2. You could also wrap the five steps in a class, but that goes beyond this brief introduction.
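If you do want a class, one possible sketch follows. The class name Fetcher and its fetch() method are hypothetical. The usage lines read a temporary local file so that no network is needed (POSIX-style path assumed), and the Python 3 fallback import is an assumption.

```python
import os
import tempfile

try:
    import urllib2 as request          # Python 2
    import cookielib as cookiejar
except ImportError:                    # assumption: Python 3 fallback
    import urllib.request as request
    import http.cookiejar as cookiejar


class Fetcher(object):
    """Hypothetical class wrapping the five steps described above."""

    def __init__(self, user_agent='fake-client'):
        self.cookies = cookiejar.CookieJar()
        # Steps 1 and 3: a handler, and an opener built from it
        self.opener = request.build_opener(request.HTTPCookieProcessor(self.cookies))
        self.user_agent = user_agent

    def fetch(self, url, data=None):
        # Step 2: the Request object, plus a header
        req = request.Request(url, data=data)
        req.add_header('User-Agent', self.user_agent)
        # Step 4: open the content stream
        stream = self.opener.open(req)
        try:
            # Step 5: read it
            return stream.read()
        finally:
            stream.close()


# Usage against a local file so no network is needed (POSIX-style path assumed).
fd, path = tempfile.mkstemp()
os.write(fd, b'five steps')
os.close(fd)
body = Fetcher().fetch('file://' + path)
os.remove(path)
print(body)
```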

Besides the urlopen() function, the urllib2 module lets you customize an opener to access web pages.
Note, however, that the urlretrieve() function lives in the urllib module and does not exist in urllib2. Even so, urllib2 is rarely used without urllib, because POST data has to be encoded with the urllib.urlencode() function.
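Here is a short sketch of encoding POST data with urlencode() before attaching it to a Request. The form fields and the URL are hypothetical placeholders, and the request is only built, not sent. The Python 3 fallback, where urlencode() moved to urllib.parse, is an assumption beyond the article.

```python
try:
    import urllib2 as request          # Python 2
    from urllib import urlencode       # Python 2: urlencode lives in urllib, not urllib2
except ImportError:                    # assumption: Python 3 fallback
    import urllib.request as request
    from urllib.parse import urlencode

form = [('user', 'alice'), ('q', 'hello world')]   # hypothetical fields; a list keeps the order
body = urlencode(form)                              # spaces become '+'
req = request.Request('http://www.example.com/login',   # hypothetical URL, never opened
                      data=body.encode('ascii'))
req.add_header('Content-Type', 'application/x-www-form-urlencoded')
print(body, req.get_method())
```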

Advanced: more urllib2 usage details:

1. Proxy settings

import urllib2

enable_proxy = True
proxy_handler = urllib2.ProxyHandler({'http': 'http://some-proxy.com:8080'})
null_proxy_handler = urllib2.ProxyHandler({})
if enable_proxy:
    opener = urllib2.build_opener(proxy_handler)
else:
    opener = urllib2.build_opener(null_proxy_handler)
urllib2.install_opener(opener)

Note: urllib2.install_opener() sets urllib2's global opener. This makes later calls convenient, but it gives up finer control, for example when a program needs two different proxy settings. A better approach is to leave the global setting alone (no install_opener()) and call the opener's open() method directly instead of the global urlopen().
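Following that advice, here is a sketch with two openers that coexist without touching the global opener. The proxy address is the placeholder from the snippet above. proxy_settings() is a hypothetical helper that only inspects each opener's ProxyHandler, so nothing is sent. ProxyHandler({}) still replaces the default, environment-based ProxyHandler. Because it defines no per-scheme methods, the opener does not keep it in its handler list.

```python
try:
    import urllib2 as request          # Python 2
except ImportError:                    # assumption: Python 3 fallback
    import urllib.request as request

# Two independent openers; neither is installed globally.
proxied = request.build_opener(
    request.ProxyHandler({'http': 'http://some-proxy.com:8080'}))  # placeholder proxy
direct = request.build_opener(request.ProxyHandler({}))             # no proxy at all


def proxy_settings(opener):
    """Hypothetical helper: the proxy mapping an opener will use, {} if none."""
    for h in opener.handlers:
        if isinstance(h, request.ProxyHandler):
            return h.proxies
    return {}


print(proxy_settings(proxied))
print(proxy_settings(direct))
# In real code: proxied.open(url) and direct.open(url) side by side,
# while the global urlopen() keeps its default behaviour.
```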

2. Timeout setting

# Python < 2.6
import urllib2
import socket
socket.setdefaulttimeout(10)  # 10 seconds
# or, equivalently:
urllib2.socket.setdefaulttimeout(10)

# Python >= 2.6
import urllib2
response = urllib2.urlopen('http://www.google.com', timeout=10)
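The sketch below shows how a timeout can be handled. It assumes Python 2.6 or later, or Python 3 via the fallback import, and the helper name fetch_or_none() is hypothetical. To make the timeout reproducible offline, it points at a local socket that accepts connections but never answers. In that case the timeout surfaces as socket.timeout while waiting for the response. A timeout during the connect/send phase would instead be wrapped in URLError.

```python
import socket

try:
    import urllib2 as request          # Python 2.6+
except ImportError:                    # assumption: Python 3 fallback
    import urllib.request as request


def fetch_or_none(url, seconds):
    """Hypothetical helper: return the body, or None if `url` does not answer in time."""
    opener = request.build_opener(request.ProxyHandler({}))  # ignore proxy env vars
    try:
        resp = opener.open(url, timeout=seconds)
    except socket.timeout:             # timed out while waiting for the response
        return None
    except request.URLError as e:      # timed out while connecting/sending: wrapped
        if isinstance(e.reason, socket.timeout):
            return None
        raise
    try:
        return resp.read()
    finally:
        resp.close()


# A local socket that accepts connections (via the OS backlog) but never replies.
silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent.bind(('127.0.0.1', 0))
silent.listen(1)
try:
    result = fetch_or_none('http://127.0.0.1:%d/' % silent.getsockname()[1], 0.5)
finally:
    silent.close()

print(result)
```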

3. Add a specific header to the HTTP request

To add a header, you need to use a Request object:

import urllib2

request = urllib2.Request(url)
request.add_header('User-Agent', 'fake-client')
response = urllib2.urlopen(request)

Pay special attention to the following headers, which servers commonly check:
User-Agent: some servers or proxies use this value to decide whether the request was made by a browser.
Content-Type: when calling a REST interface, the server checks this value to decide how to parse the content of the HTTP body. Common values are:
application/xml: used in XML RPC calls, such as RESTful/SOAP calls
application/json: used in JSON RPC calls
application/x-www-form-urlencoded: used when a browser submits a web form
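Here is a sketch of a JSON request with the matching Content-Type header. The endpoint URL and the payload are hypothetical. The request is only constructed, so the example just checks what would be sent. add_header() stores names capitalized, so the header is looked up as 'Content-type'.

```python
import json

try:
    import urllib2 as request          # Python 2
except ImportError:                    # assumption: Python 3 fallback
    import urllib.request as request

payload = json.dumps({'name': 'demo', 'count': 3}, sort_keys=True)   # hypothetical payload
req = request.Request('http://api.example.com/items',   # hypothetical REST endpoint, not opened
                      data=payload.encode('utf-8'))
req.add_header('Content-Type', 'application/json')      # tells the server to parse the body as JSON
req.add_header('User-Agent', 'fake-client')

print(req.get_method(), req.get_header('Content-type'))
```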

4. Redirect

By default, urllib2 automatically follows redirects for HTTP 3xx status codes, with no manual configuration. To detect whether a redirect happened, just compare the URL of the response (response.geturl()) with the URL of the request.

import urllib2

response = urllib2.urlopen('http://www.g.cn')
redirected = response.geturl() == 'http://www.google.cn'

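If you do not want redirects to be followed automatically, you can either use the lower-level httplib library or customize the HTTPRedirectHandler class. With the handler below, http_error_301() and http_error_302() return None. As a result, opener.open() does not follow the redirect and instead raises an HTTPError carrying the 3xx status code.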

import urllib2

class RedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_301(self, req, fp, code, msg, headers):
        pass

    def http_error_302(self, req, fp, code, msg, headers):
        pass

opener = urllib2.build_opener(RedirectHandler)
opener.open('http://www.google.cn')
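This sketch exercises both behaviours against a throwaway local server. The server, its /old and /new paths, and the Python 3 fallback imports are assumptions for testing. The custom handler uses the same approach as the class above, with http_error_301/302 returning None. When redirects are followed, geturl() differs from the requested URL. When they are disabled, the 302 comes back as an HTTPError whose headers still carry Location. ProxyHandler({}) keeps environment proxy settings from intercepting the local requests.

```python
import threading

try:
    import urllib2 as request          # Python 2
    from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
except ImportError:                    # assumption: Python 3 fallback
    import urllib.request as request
    from http.server import HTTPServer, BaseHTTPRequestHandler


class Handler(BaseHTTPRequestHandler):
    """Throwaway local server: /old redirects to /new, /new answers 200."""
    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)
            self.send_header('Location', '/new')
            self.end_headers()
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'new page')

    def log_message(self, *args):      # keep the console quiet
        pass


class RedirectHandler(request.HTTPRedirectHandler):
    """Same idea as the class above: returning None means 'do not follow'."""
    def http_error_301(self, req, fp, code, msg, headers):
        return None

    http_error_302 = http_error_301


server = HTTPServer(('127.0.0.1', 0), Handler)
thread = threading.Thread(target=server.serve_forever)
thread.daemon = True
thread.start()
old_url = 'http://127.0.0.1:%d/old' % server.server_address[1]

try:
    # Default opener: the 302 is followed and geturl() shows the final URL.
    follow = request.build_opener(request.ProxyHandler({}))
    resp = follow.open(old_url)
    final_url = resp.geturl()
    resp.close()
    redirected = final_url != old_url

    # Custom handler: the 302 is not followed and surfaces as HTTPError.
    no_follow = request.build_opener(request.ProxyHandler({}), RedirectHandler)
    status = location = None
    try:
        no_follow.open(old_url)
    except request.HTTPError as e:
        status, location = e.code, e.info().get('Location')
finally:
    server.shutdown()
    server.server_close()

print(final_url, redirected, status, location)
```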

5. Cookies

urllib2 also handles cookies automatically. If you need to read a cookie's value, you can do it like this:

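import urllib2
import cookielib

cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
response = opener.open('http://www.google.cn')
for item in cookie:
    print(item.value)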
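This sketch makes the cookie round trip visible. It uses a local test server that sets a cookie on /login and echoes back whatever Cookie header it receives; the server, the cookie name and value, and the Python 3 fallback are all assumptions. The jar is filled from the first response and replayed automatically on the second request.

```python
import threading

try:
    import urllib2 as request          # Python 2
    import cookielib as cookiejar
    from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
except ImportError:                    # assumption: Python 3 fallback
    import urllib.request as request
    import http.cookiejar as cookiejar
    from http.server import HTTPServer, BaseHTTPRequestHandler


class Handler(BaseHTTPRequestHandler):
    """Throwaway local server: /login sets a cookie, every path echoes the Cookie header."""
    def do_GET(self):
        self.send_response(200)
        if self.path == '/login':
            self.send_header('Set-Cookie', 'session=abc123; Path=/')   # hypothetical cookie
        self.end_headers()
        self.wfile.write((self.headers.get('Cookie') or '').encode('ascii'))

    def log_message(self, *args):      # keep the console quiet
        pass


server = HTTPServer(('127.0.0.1', 0), Handler)
thread = threading.Thread(target=server.serve_forever)
thread.daemon = True
thread.start()
base = 'http://127.0.0.1:%d' % server.server_address[1]

try:
    jar = cookiejar.CookieJar()
    opener = request.build_opener(request.ProxyHandler({}),
                                  request.HTTPCookieProcessor(jar))
    opener.open(base + '/login').close()             # server sets the cookie
    cookies = dict((item.name, item.value) for item in jar)
    resp = opener.open(base + '/profile')            # jar sends it back automatically
    echoed = resp.read()
    resp.close()
finally:
    server.shutdown()
    server.server_close()

print(cookies, echoed)
```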

6. Using the HTTP PUT and DELETE methods

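urllib2 only supports the HTTP GET and POST methods. To use HTTP PUT and DELETE properly, you would normally have to use the lower-level httplib library. Even so, you can make urllib2 send a PUT or DELETE request by overriding get_method() on the request: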

import urllib2

request = urllib2.Request(url, data=data)
request.get_method = lambda: 'PUT'  # or 'DELETE'
response = urllib2.urlopen(request)
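This sketch checks the get_method() override end to end against a local test server that simply echoes the method it received. The server, the /resource path, the helper send(), and the Python 3 fallback imports are assumptions for illustration.

```python
import threading

try:
    import urllib2 as request          # Python 2
    from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
except ImportError:                    # assumption: Python 3 fallback
    import urllib.request as request
    from http.server import HTTPServer, BaseHTTPRequestHandler


class EchoMethod(BaseHTTPRequestHandler):
    """Throwaway local server: answers with the HTTP method it received."""
    def _echo(self):
        length = int(self.headers.get('Content-Length') or 0)
        self.rfile.read(length)            # drain the request body, if any
        self.send_response(200)
        self.end_headers()
        self.wfile.write(self.command.encode('ascii'))

    do_PUT = do_DELETE = _echo

    def log_message(self, *args):          # keep the console quiet
        pass


server = HTTPServer(('127.0.0.1', 0), EchoMethod)
thread = threading.Thread(target=server.serve_forever)
thread.daemon = True
thread.start()
url = 'http://127.0.0.1:%d/resource' % server.server_address[1]
opener = request.build_opener(request.ProxyHandler({}))


def send(method, url, data=None):
    """Hypothetical helper: issue `method` by overriding Request.get_method."""
    req = request.Request(url, data=data)
    req.get_method = lambda: method        # replaces the GET/POST decision
    resp = opener.open(req)
    try:
        return resp.read()
    finally:
        resp.close()


try:
    put_reply = send('PUT', url, data=b'name=demo')
    delete_reply = send('DELETE', url)
finally:
    server.shutdown()
    server.server_close()

print(put_reply, delete_reply)
```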

7. Getting the HTTP status code

For 200 OK, you can get the HTTP status code by calling getcode() on the response object returned by urlopen(). For error status codes, urlopen() raises an HTTPError exception instead. In that case, check the code attribute of the exception object.

import urllib2

try:
    response = urllib2.urlopen('http://www.google.cn')
except urllib2.HTTPError as e:
    print(e.code)
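This sketch shows a helper that returns the status code whether urllib2 hands back a response or raises HTTPError. The local test server (200 on /ok, 404 elsewhere), the helper name status_of(), and the Python 3 fallback imports are hypothetical.

```python
import threading

try:
    import urllib2 as request          # Python 2
    from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
except ImportError:                    # assumption: Python 3 fallback
    import urllib.request as request
    from http.server import HTTPServer, BaseHTTPRequestHandler


class StatusServer(BaseHTTPRequestHandler):
    """Throwaway local server: 200 on /ok, 404 everywhere else."""
    def do_GET(self):
        self.send_response(200 if self.path == '/ok' else 404)
        self.end_headers()

    def log_message(self, *args):          # keep the console quiet
        pass


server = HTTPServer(('127.0.0.1', 0), StatusServer)
thread = threading.Thread(target=server.serve_forever)
thread.daemon = True
thread.start()
base = 'http://127.0.0.1:%d' % server.server_address[1]
opener = request.build_opener(request.ProxyHandler({}))


def status_of(url):
    """Hypothetical helper: return the status code for a GET of url."""
    try:
        resp = opener.open(url)
    except request.HTTPError as e:
        return e.code                      # error statuses arrive as exceptions
    try:
        return resp.getcode()              # 2xx arrive as ordinary responses
    finally:
        resp.close()


try:
    ok_code = status_of(base + '/ok')
    missing_code = status_of(base + '/missing')
finally:
    server.shutdown()
    server.server_close()

print(ok_code, missing_code)
```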

8. Debug log

When using urllib2, you can turn on the debug log as follows, so that the data sent and received is printed on the screen:

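import urllib2

httpHandler = urllib2.HTTPHandler(debuglevel=1)
httpsHandler = urllib2.HTTPSHandler(debuglevel=1)
opener = urllib2.build_opener(httpHandler, httpsHandler)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.google.cn')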


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
