Basic usage of Python's urllib2 package

Source: Internet
Author: User
Tags soap

1. urllib2.urlopen(request)

    import urllib
    import urllib2

    url = "http://www.baidu.com"  # the URL may also use another scheme, e.g. ftp
    values = {'name': 'Michael Foord', 'location': 'Northampton', 'language': 'Python'}
    data = urllib.urlencode(values)
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    request = urllib2.Request(url, data, headers)
    # A header can also be set this way: request.add_header('User-Agent', 'fake-client')
    response = urllib2.urlopen(request)
    html = response.read()

urllib2's urlopen() function is the most basic way to open a URL. It takes a request argument, which is an ordinary Request object (a plain URL string also works). The Request can carry the URL, data (data sent to the server, such as the contents of an HTML form), and headers (some servers reject bot requests that arrive without the usual headers).

Finally, the fetched page has to be read with the response object's read() method. Printing the response object itself only shows the object and its memory address.
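
As a rough sketch of how the parts of a request fit together, the snippet below builds a Request object and inspects it without contacting the network. The try/except import is an assumption added so that the sketch also runs on Python 3, where urllib2 was merged into urllib.request; the form values are shortened from the example above.

```python
try:  # Python 2
    import urllib2
    from urllib import urlencode
except ImportError:  # Python 3 (assumption: same API under urllib.request)
    import urllib.request as urllib2
    from urllib.parse import urlencode

values = {'name': 'Michael Foord', 'language': 'Python'}
# urlencode() returns text; encode it so the body is bytes on Python 3 as well
data = urlencode(values).encode('ascii')
headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

request = urllib2.Request('http://www.baidu.com', data, headers)

# Supplying data switches the request from GET to POST
method = request.get_method()
# Header names are stored capitalized, e.g. 'User-agent'
agent = request.get_header('User-agent')
```

Nothing is sent until the request is passed to urllib2.urlopen(), so this is a convenient way to check what a request will contain.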

2. Creating opener objects for cookies and other HTTP features

2.1 Cookie Processing

The urlopen() function does not support authentication, cookies, or other advanced HTTP features. To support these features, you must use the build_opener() function to create your own custom opener object.


To manage HTTP cookies, create an opener object with an HTTPCookieProcessor handler added. By default, HTTPCookieProcessor uses a CookieJar object; passing a different type of CookieJar object as the HTTPCookieProcessor argument supports different kinds of cookie handling.

    import cookielib
    import urllib2

    mcj = cookielib.MozillaCookieJar("cookies.txt")
    cookie_handler = urllib2.HTTPCookieProcessor(mcj)
    opener = urllib2.build_opener(cookie_handler)
    u = opener.open("http://www.baidu.com")
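
The relationship between the jar and the opener can be sketched offline as follows. This is a minimal illustration, not the only way to do it: the cookie_names helper is hypothetical, the in-memory CookieJar is swapped in for MozillaCookieJar so no file is needed, and the import shim is an assumption that lets the code run on Python 3 as well.

```python
try:  # Python 2
    import urllib2
    import cookielib
except ImportError:  # Python 3 (assumption: same classes under new module names)
    import urllib.request as urllib2
    import http.cookiejar as cookielib

# An in-memory jar; MozillaCookieJar("cookies.txt") would add load()/save() to a file
jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))


def cookie_names(jar):
    """Hypothetical helper: return the names of all cookies held in the jar."""
    return sorted(cookie.name for cookie in jar)


# No request has been made yet, so the jar is still empty
names_before = cookie_names(jar)
```

After a call such as opener.open(url), any cookies set by the server should show up when the jar is iterated again, and they are sent back on later requests made through the same opener.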


2.2 Authentication

    import urllib2

    password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
    top_level_url = "http://www.163.com/"
    # username and password are the caller's own credentials
    password_mgr.add_password(None, top_level_url, username, password)
    handler = urllib2.HTTPBasicAuthHandler(password_mgr)
    opener = urllib2.build_opener(handler)
    urllib2.install_opener(opener)
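
To see what the password manager does with a None realm, the sketch below registers placeholder credentials and looks them up for two URLs, without sending any request. The names 'alice' and 'secret', the realm string, and the lookup URLs are illustrative assumptions; the import shim is added so the code also runs on Python 3.

```python
try:  # Python 2
    import urllib2
except ImportError:  # Python 3 (assumption: same API under urllib.request)
    import urllib.request as urllib2

password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
top_level_url = "http://www.163.com/"
# 'alice' and 'secret' are placeholder credentials for illustration only
password_mgr.add_password(None, top_level_url, 'alice', 'secret')

handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(handler)  # deliberately not installed globally

# A None realm acts as a catch-all: any realm under the registered URL matches
found = password_mgr.find_user_password('Some Realm', 'http://www.163.com/news/index.html')
# A URL on a different host has no credentials registered
other = password_mgr.find_user_password('Some Realm', 'http://example.com/')
```

Here found holds the registered pair and other is (None, None), which is how the handler decides whether it can answer a server's authentication challenge.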


2.3 Proxies

urllib2 detects proxy settings automatically; by default it takes the HTTP proxy from the http_proxy environment variable.

    import urllib2

    enable_proxy = True
    proxy_handler = urllib2.ProxyHandler({"http": "http://some-proxy.com:8080"})
    null_proxy_handler = urllib2.ProxyHandler({})
    if enable_proxy:
        opener = urllib2.build_opener(proxy_handler)
    else:
        opener = urllib2.build_opener(null_proxy_handler)
    urllib2.install_opener(opener)

One detail to note: urllib2.install_opener() sets urllib2's global opener. That is convenient for later calls, but it rules out finer-grained control, such as using two different proxy settings in one program. It is better practice to leave the global settings alone and call the opener's open() method directly instead of the global urlopen() function.
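
A minimal sketch of that advice: two independent openers with different proxy settings, neither installed globally. The proxies_of helper is hypothetical and only inspects the openers; the proxy address is the one used in the example above, and the import shim is an assumption for Python 3 compatibility.

```python
try:  # Python 2
    import urllib2
except ImportError:  # Python 3 (assumption: same API under urllib.request)
    import urllib.request as urllib2

# One opener goes through the proxy, the other bypasses proxies entirely
proxied = urllib2.build_opener(
    urllib2.ProxyHandler({'http': 'http://some-proxy.com:8080'}))
direct = urllib2.build_opener(urllib2.ProxyHandler({}))


def proxies_of(opener):
    """Hypothetical helper: collect the proxy mappings active on an opener."""
    found = {}
    for handler in opener.handlers:
        if isinstance(handler, urllib2.ProxyHandler):
            found.update(handler.proxies)
    return found


proxied_settings = proxies_of(proxied)
direct_settings = proxies_of(direct)
```

Calling proxied.open(url) or direct.open(url) then picks the route per request, and the global urlopen() keeps its default behavior.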


2.4 Timeout setting

In older versions (before Python 2.6), the urllib2 API did not expose a timeout setting. The only way to set a timeout was to change the socket module's global default timeout.

    import urllib2
    import socket

    socket.setdefaulttimeout(10)  # time out after 10 seconds
    urllib2.socket.setdefaulttimeout(10)  # another way to do the same thing

Since Python 2.6, the timeout can be set directly through the timeout parameter of urllib2.urlopen().

    import urllib2

    response = urllib2.urlopen('http://www.google.com', timeout=10)


2.5 Setting the Header

Headers can be passed when the Request is created, as in the basic urlopen() usage:

    request = urllib2.Request(url, data, headers)

They can also be set after the Request object has been created:

    import urllib2

    request = urllib2.Request(uri)
    request.add_header('User-Agent', 'fake-client')
    response = urllib2.urlopen(request)

Some headers deserve particular attention, because the server checks them:

  • User-Agent: some servers or proxies check this value to decide whether the request was initiated by a browser.

  • Content-Type: when a REST interface is used, the server checks this value to decide how the content of the HTTP body should be parsed. Common values:

        • application/xml: used when the body is XML

        • application/json: used when the body is JSON

        • application/x-www-form-urlencoded: used when a browser submits a web form

When calling a RESTful or SOAP service provided by a server, a wrong Content-Type setting causes the server to reject the request.
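
As an illustration of setting Content-Type for a JSON body, the sketch below builds such a request and reads the headers back without sending it. The endpoint URL and the payload are hypothetical, and the import shim is an assumption for Python 3 compatibility.

```python
import json

try:  # Python 2
    import urllib2
except ImportError:  # Python 3 (assumption: same API under urllib.request)
    import urllib.request as urllib2

# Serialize the body and encode it so it is bytes on Python 3 as well
payload = json.dumps({'name': 'Michael Foord'}).encode('utf-8')

# 'http://example.com/api' is a hypothetical endpoint used only for illustration
request = urllib2.Request('http://example.com/api', payload,
                          {'Content-Type': 'application/json'})
request.add_header('User-Agent', 'fake-client')

# Header names are stored capitalized, so look them up as 'Content-type'
content_type = request.get_header('Content-type')
has_agent = request.has_header('User-agent')
```

If the header were omitted, urllib2 would fall back to application/x-www-form-urlencoded for requests with a body, which a JSON service is likely to reject.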


2.6 Redirects

By default urllib2 follows redirects automatically for 3xx HTTP return codes, with no manual configuration. To detect whether a redirect happened, check whether the URL of the response matches the URL of the request.

    import urllib2

    response = urllib2.urlopen('http://www.google.cn')
    # a redirect happened if the final URL differs from the requested one
    whether_redirected = response.geturl() != 'http://www.google.cn'

If you do not want redirects to be followed automatically, you can either use the lower-level httplib library or define a custom HTTPRedirectHandler subclass.

    import urllib2

    class RedirectHandler(urllib2.HTTPRedirectHandler):
        def http_error_301(self, req, fp, code, msg, headers):
            pass

        def http_error_302(self, req, fp, code, msg, headers):
            pass

    opener = urllib2.build_opener(RedirectHandler)
    opener.open('http://www . [CN]')
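
A variant of the same idea, sketched under assumptions: instead of overriding the per-code methods, the hypothetical NoRedirectHandler below overrides redirect_request and returns None, which covers every 3xx code the base class handles. The import shim is an assumption for Python 3 compatibility, and no request is sent here.

```python
try:  # Python 2
    import urllib2
except ImportError:  # Python 3 (assumption: same API under urllib.request)
    import urllib.request as urllib2


class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    """Hypothetical handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib2 not to build a follow-up request
        return None


# Passing the subclass replaces the default HTTPRedirectHandler
opener = urllib2.build_opener(NoRedirectHandler)
redirect_handlers = [h for h in opener.handlers
                     if isinstance(h, urllib2.HTTPRedirectHandler)]
```

With this opener, a 3xx response should surface as an HTTPError from opener.open(), so the status code and the Location header can be read from the exception.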


2.7 Using the HTTP PUT and DELETE methods

urllib2 only supports the HTTP GET and POST methods out of the box; for PUT and DELETE you would normally have to use the lower-level httplib library. Nonetheless, urllib2 can be made to issue PUT or DELETE requests in the following way:

    import urllib2

    request = urllib2.Request(uri, data=data)
    request.get_method = lambda: 'PUT'  # or 'DELETE'
    response = urllib2.urlopen(request)

Although this approach is a hack, it causes no problems in practical use.
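
The same trick can be packaged more tidily as a small Request subclass. This is only a sketch: MethodRequest and its method argument are hypothetical names, the URLs are placeholders, and the import shim is an assumption for Python 3 compatibility.

```python
try:  # Python 2
    import urllib2
except ImportError:  # Python 3 (assumption: same API under urllib.request)
    import urllib.request as urllib2


class MethodRequest(urllib2.Request):
    """Hypothetical Request subclass that can force the HTTP method."""

    def __init__(self, url, data=None, headers=None, method=None):
        urllib2.Request.__init__(self, url, data, headers or {})
        self._forced_method = method

    def get_method(self):
        # Fall back to the usual GET/POST choice when no method was forced
        return self._forced_method or urllib2.Request.get_method(self)


put_request = MethodRequest('http://example.com/item/1', b'payload', method='PUT')
delete_request = MethodRequest('http://example.com/item/1', method='DELETE')
plain_request = MethodRequest('http://example.com/item/1')
```

Each of these can be passed to urllib2.urlopen() or to an opener's open() method like any other Request.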


2.8 Getting an HTTP return code

For 200 OK, the HTTP return code can be obtained by calling the getcode() method of the response object returned by urlopen(). For other return codes, however, urlopen() raises an exception, and the code has to be read from the exception object's code attribute:

    import urllib2

    try:
        response = urllib2.urlopen('http://restrict.web.com')
    except urllib2.HTTPError as e:
        print e.code
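
Both cases can be folded into one helper, sketched below under assumptions: fetch_status and its open_func parameter are hypothetical names, with open_func added so the function can be tried with a stand-in instead of a live server. The import shim is an assumption for Python 3 compatibility.

```python
try:  # Python 2
    import urllib2
except ImportError:  # Python 3 (assumption: same API under urllib.request)
    import urllib.request as urllib2


def fetch_status(url, open_func=urllib2.urlopen):
    """Hypothetical helper: return the HTTP status code for a URL.

    open_func defaults to urllib2.urlopen; an opener's open method
    (or a stand-in for testing) can be passed instead.
    """
    try:
        response = open_func(url)
    except urllib2.HTTPError as e:
        # Error responses arrive as exceptions carrying the code
        return e.code
    # Successful responses expose the code through getcode()
    return response.getcode()
```

Calling fetch_status('http://restrict.web.com') would perform a real request. Note that connection failures raise URLError rather than HTTPError and are not caught here.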


2.9 Debug Log

When using urllib2, the debug log can be turned on as follows, so that the data sent and received is printed to the screen. This is convenient for debugging and can sometimes save the work of capturing packets.

    import urllib2

    http_handler = urllib2.HTTPHandler(debuglevel=1)
    https_handler = urllib2.HTTPSHandler(debuglevel=1)
    opener = urllib2.build_opener(http_handler, https_handler)
    urllib2.install_opener(opener)
    response = urllib2.urlopen('http://www.google.com')


This article is from the "Keep_study_zh" blog, make sure to keep this source http://zhkpsty.blog.51cto.com/9013616/1692504
