This post continues the series on Python's web modules with urllib2, a module that grew out of urllib and offers a higher-level interface than urllib.
1 Introduction to urllib2
urllib2 is a library that ships with Python for accessing web pages and local files.
Compared with urllib, the notable differences are:
1) urllib2 can accept an instance of the Request class, which lets you set the headers of a URL request; urllib accepts only a URL. This means you cannot disguise the User-Agent string through urllib's urlopen().
2) urllib provides the urlencode method for encoding the data to be sent, and urllib2 does not. This is why urllib is often used together with urllib2.
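As a small illustration of the second point, here is a sketch of what urlencode produces. The field names and values are made up for the example, and the import fallback is only an assumption so the snippet also runs on Python 3, where the function lives in urllib.parse.

```python
try:  # Python 2, the version this article targets
    from urllib import urlencode
except ImportError:  # Python 3 moved it to urllib.parse
    from urllib.parse import urlencode

# A list of pairs (rather than a dict) keeps the field order fixed.
fields = [('name', '51cto'), ('q', 'python urllib2')]
encoded = urlencode(fields)
print(encoded)  # name=51cto&q=python+urllib2
```

The resulting string is what gets passed as the data argument of a urllib2 request.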
2 Common urllib2 methods
2.1 urllib2.urlopen
urlopen() is the simplest way to make a request: it opens a URL and returns a file-like object, which you then use to read the returned content.
urllib2.urlopen(url[, data][, timeout])
Parameters:
url: either a string containing a URL or an instance of the urllib2.Request class.
data: the encoded data to POST (typically encoded with urllib.urlencode()). Without the data parameter the request is a GET; with it, the request is a POST.
timeout: an optional timeout in seconds that limits how long the request may block. If it is not set, the global default timeout is used. The timeout only takes effect for HTTP, HTTPS and FTP connections.
Suppose urlopen() returns the file-like object u. It supports the following common methods:
u.read([nbytes]) reads nbytes of data as a byte string
u.readline() reads a single line of text as a byte string
u.readlines() reads all input lines and returns them as a list
u.close() closes the connection
u.getcode() returns the integer HTTP response code, such as 200 on success or 404 when the file is not found
u.geturl() returns the actual URL of the returned data, taking any redirects into account
u.info() returns a mapping object with the meta-information associated with the URL. For HTTP, it contains the HTTP headers of the server response. For FTP, the returned headers include 'Content-length'. For local files, the returned headers include the 'Content-length' and 'Content-type' fields.
Note:
The file-like object u operates in binary mode. If you need to process the response data as text, decode it with the codecs module or a similar method.
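The methods and the binary-mode note above can be tried without a network connection by opening a local file through a file: URL. The following is only a sketch: the temporary file and its contents are invented for the example, and the import fallback is an assumption so that the same code also runs on Python 3, where these names live in urllib.request.

```python
import os
import tempfile

try:  # Python 2, as used in this article
    from urllib2 import urlopen
    from urllib import pathname2url
except ImportError:  # Python 3 moved both into urllib.request
    from urllib.request import urlopen, pathname2url

# Create a small local file so the example needs no network access.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'first line\nsecond line\nthird line\n')

u = urlopen('file:' + pathname2url(path))
first = u.readline()                      # one line, as a byte string
rest = u.readlines()                      # the remaining lines, as a list
length = u.info().get('Content-length')   # header reported for local files
u.close()
os.remove(path)

text = first.decode('utf-8')              # decode bytes before treating them as text
```

Because read(), readline() and readlines() all advance the same stream, each call only sees what the previous calls have not consumed yet.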
Example code:
>>> import urllib2
>>> res = urllib2.urlopen('http://www.51cto.com')
>>> res.read()
... (a bunch of source code)
>>> res.readline()
'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\r\n'
>>> res.readlines()
... (a bunch of source code, as a list)
>>> res.info()
2.2 urllib2.Request
Creates a new Request instance.
Request(url[, data][, headers][, origin_req_host][, unverifiable])
For a relatively simple request, the url argument of urlopen() is just a URL string. If more complex operations are required, such as modifying the HTTP headers, you can create a Request instance and pass it as the url argument.
Parameters:
url: the URL as a string.
data: the data submitted along with the URL (such as the data to POST). Note that providing the data parameter changes the HTTP request from 'GET' to 'POST'.
headers: a dictionary of key-value pairs representing the HTTP headers to be submitted.
origin_req_host: typically the name of the host that originated the request.
unverifiable: set to True if the request is for a URL that cannot be verified (usually a URL that the user did not enter directly, such as the URL of an image embedded in a page).
Assuming a Request instance r, the following are some of its more important methods:
r.add_data(data) adds data to the request. If the request is an HTTP request, the method changes to 'POST'. The data is submitted to the specified URL. Note that this method does not append to any previously set data; it replaces the previous data with the current data.
r.add_header(key, val) adds a header to the request. key is the header name and val is the header value; both parameters are strings.
r.add_unredirected_header(key, val) works like add_header, but the header is not added to redirected requests.
r.set_proxy(host, type) prepares the request for connecting to a proxy server. It replaces the original host with host and the original request type with type.
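A Request can be built and inspected without sending it, which makes the effect of the data parameter and of the header methods easy to see. This is a minimal sketch under a few assumptions: www.example.com is a placeholder that is never contacted, the X-Demo header is hypothetical, and the import fallback only exists so the snippet also runs on Python 3. add_data() is left out because it was removed from Python 3.

```python
try:  # Python 2, as used in this article
    from urllib2 import Request
    from urllib import urlencode
except ImportError:  # Python 3 equivalents
    from urllib.request import Request
    from urllib.parse import urlencode

url = 'http://www.example.com/'  # placeholder URL, never contacted here

plain = Request(url)
plain.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)')
# Hypothetical header that should not be repeated on redirected requests.
plain.add_unredirected_header('X-Demo', 'first hop only')

# encode() keeps the body a byte string, which Python 3 requires.
body = urlencode({'name': '51cto'}).encode('ascii')
with_data = Request(url, body)

methods = (plain.get_method(), with_data.get_method())  # ('GET', 'POST')
agent = plain.get_header('User-agent')  # header names are stored capitalized
```

Nothing is sent until the Request is handed to urlopen(), so this kind of inspection is a cheap way to check what a request will look like.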
Example code:
1) Submitting data to a web page:
>>> import urllib
>>> import urllib2
>>> url = 'http://www.51cto.com'
>>> info = {'name': '51cto', 'location': '51cto'}
>>> # info must be encoded into a format urllib2 understands; urllib is used for that here
>>> data = urllib.urlencode(info)
>>> data
'name=51cto&location=51cto'
>>> request = urllib2.Request(url, data)
>>> response = urllib2.urlopen(request)
>>> the_page = response.read()
2) Modifying the request headers:
Sometimes the program is correct, yet the server refuses your access. Why? The problem lies in the header information of the request. Some servers are picky and do not like being accessed by programs. In that case you need to disguise your program as a browser when making the request. How the request identifies itself is carried in the headers.
When using a REST interface, the server checks the Content-Type field to determine how the content of the HTTP body should be parsed.
>>> import urllib
>>> import urllib2
>>> url = 'http://www.51cto.com'
>>> # put user_agent into the header information
>>> user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
>>> values = {'name': '51cto', 'location': '51cto', 'language': 'Python'}
>>> headers = {'User-Agent': user_agent}
>>> data = urllib.urlencode(values)
>>> req = urllib2.Request(url, data, headers)
>>> response = urllib2.urlopen(req)
>>> the_page = response.read()
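For the REST case mentioned above, the Content-Type header can be set in the same way as User-Agent. The sketch below only builds the request and does not send it. The endpoint http://www.example.com/api and the JSON payload are assumptions made up for illustration, and the import fallback is there so the code also runs on Python 3.

```python
import json

try:  # Python 2, as used in this article
    from urllib2 import Request
except ImportError:  # Python 3 equivalent
    from urllib.request import Request

# Hypothetical REST endpoint; nothing is sent in this example.
api_url = 'http://www.example.com/api'

# Serialize the body as JSON and tell the server how to parse it.
payload = json.dumps({'name': '51cto', 'language': 'Python'}).encode('utf-8')
req = Request(api_url, payload, {'Content-Type': 'application/json'})

method = req.get_method()                      # 'POST', because a body is present
content_type = req.get_header('Content-type')  # header names are stored capitalized
```

Passing req to urlopen() would then submit the JSON body with the declared content type.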
2.3 Exception Handling
urlopen raises URLError when it cannot handle a response.
There are two exception classes: urllib2.URLError and urllib2.HTTPError.
HTTPError is a subclass of URLError that is raised in specific cases for HTTP URLs.
URLError:
Usually, URLError is raised because there is no network connection (no route to the specified server) or because the specified server does not exist. In this case, the exception carries a reason attribute, a tuple containing an error code and a text error message.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib2

# note the extra "m" in the host name (.comm)
req = urllib2.Request('http://www.51cto.comm')
try:
    urllib2.urlopen(req)
except urllib2.URLError as e:
    print e
    print e.reason
Results:
<urlopen error [Errno 11004] getaddrinfo failed>
[Errno 11004] getaddrinfo failed
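Because HTTPError is a subclass of URLError, a handler that wants to tell the two apart has to catch HTTPError first. The following sketch shows that ordering without touching the network: classify, raise_404 and raise_dns are hypothetical helpers written for this example, and the exceptions are raised by hand instead of coming from urlopen(). The import fallback is an assumption so the code also runs on Python 3.

```python
try:  # Python 2, as used in this article
    from urllib2 import URLError, HTTPError
except ImportError:  # Python 3 moved them to urllib.error
    from urllib.error import URLError, HTTPError


def classify(fetch):
    """Call fetch() and report which kind of failure, if any, occurred."""
    try:
        fetch()
    except HTTPError as e:   # must come first: HTTPError is also a URLError
        return 'http error %d' % e.code
    except URLError as e:
        return 'url error: %s' % (e.reason,)
    return 'ok'


def raise_404():
    # Simulates a server answering with 404 for a placeholder URL.
    raise HTTPError('http://www.example.com/missing', 404, 'Not Found', None, None)


def raise_dns():
    # Simulates a failed host name lookup.
    raise URLError('getaddrinfo failed')


results = [classify(raise_404), classify(raise_dns), classify(lambda: None)]
```

In real code, fetch would be something like lambda: urllib2.urlopen(req), and e.code gives the HTTP status the server returned.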
The above covers only simple usage of urllib2. If you want to dig deeper:
http://zhuoqiang.me/python-urllib2-usage.html
The difference between urllib and urllib2:
http://www.cnblogs.com/yuxc/archive/2011/08/01/2124073.html
This article is from the blog "a struggling small operation"; please be sure to keep this source attribution: http://yucanghai.blog.51cto.com/5260262/1697135
Python Web Module Learning: urllib2