1. urllib2.urlopen(request)
import urllib
import urllib2

url = "http://www.baidu.com"  # the URL may also use another protocol, e.g. ftp
values = {'name': 'Michael Foord', 'location': 'Northampton', 'language': 'Python'}
data = urllib.urlencode(values)
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}
request = urllib2.Request(url, data, headers)
# The header can also be set this way: request.add_header('User-Agent', 'fake-client')
response = urllib2.urlopen(request)
html = response.read()
urllib2's urlopen() function is the most basic way to open a URL. It takes one argument, request, which is an ordinary Request object. The Request can carry the URL, data (data sent to the server, for example common form data), and headers (some servers reject bot requests that do not include headers).
To get the fetched page, call the read() method of the response object; otherwise you only have the response object itself, and printing it shows nothing more than its memory address.
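As a minimal sketch of the description above, the following builds a Request object and inspects it without sending anything over the network. The try/except import is an addition so that the snippet also runs on Python 3, where urllib2 became urllib.request; the form values are the ones used in the example above.

```python
try:
    import urllib2                      # Python 2
    from urllib import urlencode
except ImportError:
    import urllib.request as urllib2    # Python 3 equivalent of urllib2
    from urllib.parse import urlencode

values = {'name': 'Michael Foord', 'location': 'Northampton', 'language': 'Python'}
# Encode the form data; bytes are required on Python 3 and harmless on Python 2.
data = urlencode(values).encode('ascii')
headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

request = urllib2.Request('http://www.baidu.com', data, headers)

# Supplying data turns the request into a POST; without data it would be a GET.
method = request.get_method()
# Header names are stored in capitalized form, so the key is 'User-agent'.
agent = request.get_header('User-agent')
```

Nothing is transmitted until the request is passed to urlopen(), so this is a convenient way to check what will be sent.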
2. Creating opener objects for cookies and other HTTP features
2.1 Cookie Processing
The urlopen() function does not support authentication, cookies, or other advanced HTTP features. To support these features, you must use the build_opener() function to create your own custom opener object.
To manage HTTP cookies, create an opener object that includes an HTTPCookieProcessor handler. By default, HTTPCookieProcessor uses a CookieJar object; passing a different type of CookieJar object as the HTTPCookieProcessor argument gives you different kinds of cookie handling.
import cookielib
import urllib2

mcj = cookielib.MozillaCookieJar("cookies.txt")
cookiehand = urllib2.HTTPCookieProcessor(mcj)
opener = urllib2.build_opener(cookiehand)
u = opener.open("http://www.baidu.com")
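The same idea can be sketched with a plain in-memory CookieJar. The helper name build_cookie_opener is hypothetical, and the try/except import is only there so the snippet also runs on Python 3, where the modules are named urllib.request and http.cookiejar.

```python
try:
    import urllib2
    import cookielib
except ImportError:
    import urllib.request as urllib2     # Python 3 equivalent of urllib2
    import http.cookiejar as cookielib   # Python 3 equivalent of cookielib


def build_cookie_opener():
    """Return an opener that stores cookies in memory, plus its jar."""
    jar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = build_cookie_opener()
# The jar starts empty; cookies set by servers appear in it after opener.open() calls.
cookie_count = len(jar)
```

Keeping a reference to the jar makes it possible to iterate over the stored cookies after each request made through the opener.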
2.2 Authentication
import urllib2

password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
top_level_url = "http://www.163.com/"
# username and password must hold the credentials for top_level_url
password_mgr.add_password(None, top_level_url, username, password)
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)
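To see how the password manager matches credentials to URLs, here is a small sketch that registers credentials and looks them up again without making a request. The user name 'alice' and password 'secret' are placeholder values, not taken from the article.

```python
try:
    import urllib2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent of urllib2

password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
top_level_url = "http://www.163.com/"
# Realm None means "use these credentials whatever realm the server announces".
password_mgr.add_password(None, top_level_url, 'alice', 'secret')

# Any URL below top_level_url resolves to the same credentials.
credentials = password_mgr.find_user_password('some realm',
                                              'http://www.163.com/some/page')
# A different host has no credentials registered.
other = password_mgr.find_user_password('some realm', 'http://example.com/')

opener = urllib2.build_opener(urllib2.HTTPBasicAuthHandler(password_mgr))
```

The opener only sends the credentials after the server answers with a 401 challenge for a matching URL.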
2.3 Proxies
By default, urllib2 detects proxy settings automatically and uses the environment variable http_proxy to set the HTTP proxy.
import urllib2

enable_proxy = True
proxy_handler = urllib2.ProxyHandler({"http": 'http://some-proxy.com:8080'})
null_proxy_handler = urllib2.ProxyHandler({})
if enable_proxy:
    opener = urllib2.build_opener(proxy_handler)
else:
    opener = urllib2.build_opener(null_proxy_handler)
urllib2.install_opener(opener)
One detail to note: urllib2.install_opener() sets urllib2's global opener. This is convenient for later calls, but it rules out finer-grained control, such as using two different proxy settings in the same program. It is better practice not to change the global setting with install_opener(), and instead to call the opener's open() method directly rather than the global urlopen() function.
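The advice about avoiding the global opener can be sketched as follows: two independent openers, one proxied and one direct, with neither installed globally. The helper proxies_of is a hypothetical name used only to inspect the result.

```python
try:
    import urllib2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent of urllib2

# One opener goes through the proxy, the other ignores all proxy settings.
proxied_opener = urllib2.build_opener(
    urllib2.ProxyHandler({'http': 'http://some-proxy.com:8080'}))
direct_opener = urllib2.build_opener(urllib2.ProxyHandler({}))


def proxies_of(opener):
    """Return the proxy mapping of the opener's active ProxyHandler, if any."""
    for handler in opener.handlers:
        if isinstance(handler, urllib2.ProxyHandler):
            return handler.proxies
    return None


proxied_settings = proxies_of(proxied_opener)
direct_settings = proxies_of(direct_opener)
```

Each request then chooses its route explicitly, for example proxied_opener.open(url) or direct_opener.open(url), and urllib2.urlopen() keeps its default behavior.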
2.4 Timeout setting
In older versions of Python, the urllib2 API did not expose a timeout setting; the only way to set a timeout was to change the socket module's global default timeout.
import urllib2
import socket

socket.setdefaulttimeout(10)  # time out after 10 seconds
urllib2.socket.setdefaulttimeout(10)  # another way to do the same thing
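Because the socket default is global, it affects every socket created afterwards. One way to limit the side effect, sketched here with a hypothetical helper named default_timeout, is to restore the previous value once the work is done.

```python
import socket
from contextlib import contextmanager


@contextmanager
def default_timeout(seconds):
    """Temporarily change the global socket timeout, then restore it."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        socket.setdefaulttimeout(previous)


before = socket.getdefaulttimeout()
with default_timeout(10):
    # Sockets created inside this block time out after 10 seconds.
    inside = socket.getdefaulttimeout()
after = socket.getdefaulttimeout()
```

Any urlopen() call made inside the with block picks up the temporary timeout.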
Starting with Python 2.6, the timeout can be set directly through the timeout parameter of urllib2.urlopen().
import urllib2

response = urllib2.urlopen('http://www.google.com', timeout=10)
2.5 Setting headers
Basic usage, as in the urlopen() example in section 1:
request = urllib2.Request(url, data, headers)
You can also set headers on the Request object after it has been created:
import urllib2

request = urllib2.Request(uri)
request.add_header('User-Agent', 'fake-client')
response = urllib2.urlopen(request)
Pay special attention to the following headers, because servers check them:
User-Agent: some servers or proxies check this value to determine whether the request was initiated by a browser.
Content-Type: when using a REST interface, the server checks this value to determine how the content of the HTTP body should be parsed.
When calling a RESTful or SOAP service provided by a server, an incorrect Content-Type setting causes the server to refuse the request.
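As an illustration of the Content-Type point, here is a sketch that prepares a JSON request body. The helper name json_request, the URL http://example.com/api/items, and the payload are all hypothetical; the request is only built, not sent.

```python
import json

try:
    import urllib2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent of urllib2


def json_request(url, payload):
    """Build a POST request whose body is JSON and is labeled as such."""
    body = json.dumps(payload).encode('utf-8')
    request = urllib2.Request(url, body)
    # Without this header the body would be labeled as form data by default.
    request.add_header('Content-Type', 'application/json')
    return request


request = json_request('http://example.com/api/items', {'name': 'demo'})
content_type = request.get_header('Content-type')
```

A SOAP call would follow the same pattern with an XML body and the content type that the service expects.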
2.6 Redirects
By default, urllib2 follows redirects automatically for 3xx HTTP return codes, with no manual configuration needed. To detect whether a redirect occurred, check whether the URL of the response matches the URL of the request.
import urllib2

response = urllib2.urlopen('http://www.google.cn')
# A redirect occurred if the final URL differs from the requested one.
whether_redirected = response.geturl() != 'http://www.google.cn'
If you do not want redirects to be followed automatically, you can either use the lower-level httplib library or define a custom HTTPRedirectHandler class.
import urllib2

class RedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_301(self, req, fp, code, msg, headers):
        pass

    def http_error_302(self, req, fp, code, msg, headers):
        pass

opener = urllib2.build_opener(RedirectHandler)
opener.open('http://www . [CN]
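A variation on the same idea, offered as a sketch rather than as the author's method, is to override redirect_request() and return None. As far as the standard handler chain goes, the 3xx response should then surface as an HTTPError whose code and Location header can be inspected. The class name NoRedirectHandler is hypothetical.

```python
try:
    import urllib2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent of urllib2


class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    """Refuse to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib2 that this handler will not redirect.
        return None


opener = urllib2.build_opener(NoRedirectHandler)
# build_opener() drops the default redirect handler when a subclass is supplied.
redirect_handlers = [type(h).__name__ for h in opener.handlers
                     if isinstance(h, urllib2.HTTPRedirectHandler)]
```

Calling opener.open() on a redirecting URL should then be wrapped in a try/except for urllib2.HTTPError.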
2.7 Using the HTTP PUT and DELETE methods
urllib2 only supports the HTTP GET and POST methods; to use HTTP PUT and DELETE, you would normally have to drop down to the lower-level httplib library. Nonetheless, urllib2 can be made to issue HTTP PUT or DELETE requests in the following way:
import urllib2

request = urllib2.Request(uri, data=data)
request.get_method = lambda: 'PUT'  # or 'DELETE'
response = urllib2.urlopen(request)
Although this approach is a hack, it causes no problems in practical use.
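The same hack can be packaged as a small Request subclass so that the method is chosen at construction time. This is a sketch: the class name MethodRequest and the URL http://example.com/resource are hypothetical.

```python
try:
    import urllib2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent of urllib2


class MethodRequest(urllib2.Request):
    """A Request that reports a caller-chosen HTTP method."""

    def __init__(self, url, http_method, data=None, headers=None):
        self._forced_method = http_method
        urllib2.Request.__init__(self, url, data, headers or {})

    def get_method(self):
        # urllib2 asks the request for its method when sending it.
        return self._forced_method


put_request = MethodRequest('http://example.com/resource', 'PUT', data=b'payload')
delete_request = MethodRequest('http://example.com/resource', 'DELETE')
```

Either object can be passed to urlopen() or to an opener's open() method like any other Request.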
2.8 Getting an HTTP return code
For 200 OK, the HTTP return code can be obtained from the getcode() method of the response object returned by urlopen(). For error return codes such as 4xx and 5xx, however, urlopen() raises an exception. In that case, check the code attribute of the exception object:
import urllib2

try:
    response = urllib2.urlopen('http://restrict.web.com')
except urllib2.HTTPError as e:
    print(e.code)
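Both cases can be folded into one helper that always returns the status code. This is a sketch: fetch_status and its open_func parameter are hypothetical names, and the two fake openers below exist only to exercise the helper without touching the network.

```python
import io

try:
    import urllib2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent of urllib2


def fetch_status(url, open_func=urllib2.urlopen):
    """Return the HTTP status code for url, whether or not it is an error."""
    try:
        response = open_func(url)
    except urllib2.HTTPError as error:
        return error.code
    return response.getcode()


class FakeResponse(object):
    """Stand-in for a successful response object."""

    def getcode(self):
        return 200


def fake_ok(url):
    return FakeResponse()


def fake_not_found(url):
    # Mimic what urlopen() does when the server answers 404.
    raise urllib2.HTTPError(url, 404, 'Not Found', {}, io.BytesIO(b''))


ok_status = fetch_status('http://example.com/', fake_ok)
missing_status = fetch_status('http://example.com/missing', fake_not_found)
```

In real use, open_func is left at its default or set to the open() method of a custom opener.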
2.9 Debug Log
When using urllib2, you can turn on the debug log as shown below. The data sent and received is then printed to the screen, which is convenient for debugging and can sometimes save you from having to capture packets.
import urllib2

http_handler = urllib2.HTTPHandler(debuglevel=1)
https_handler = urllib2.HTTPSHandler(debuglevel=1)
opener = urllib2.build_opener(http_handler, https_handler)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.google.com')
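If the global opener should stay untouched, as recommended in section 2.3, the debug handlers can live in a dedicated opener instead. The helper name build_debug_opener is hypothetical, and the hasattr check is an added precaution for Python builds without SSL support.

```python
try:
    import urllib2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent of urllib2


def build_debug_opener(level=1):
    """Return an opener that prints the HTTP traffic it sends and receives."""
    handlers = [urllib2.HTTPHandler(debuglevel=level)]
    # HTTPSHandler only exists when Python was built with SSL support.
    if hasattr(urllib2, 'HTTPSHandler'):
        handlers.append(urllib2.HTTPSHandler(debuglevel=level))
    return urllib2.build_opener(*handlers)


debug_opener = build_debug_opener()
http_handlers = [h for h in debug_opener.handlers
                 if isinstance(h, urllib2.HTTPHandler)]
```

Only requests made through debug_opener.open() are logged; plain urlopen() calls stay quiet.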
This article is from the "Keep_study_zh" blog; please keep this source link when reposting: http://zhkpsty.blog.51cto.com/9013616/1692504
Basic usage of Python's urllib2 package