Python's urllib and urllib2 modules

Last Update:2017-11-07 Source: Internet

Author: User

Tags session id urlencode

Python's urllib and urllib2 modules do all the work associated with requesting URLs, but they provide different functionality. The two most significant differences between them are as follows:

urllib2 can accept a Request object, which lets you set the headers of a URL request, whereas urllib accepts only a URL. This means that with urllib alone you cannot spoof your User-Agent string, and so on.
urllib provides the urlencode method, which is used to generate a GET query string; urllib2 has no such function. This is why urllib and urllib2 are often used together.
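
As a small illustration of the urlencode point above, the sketch below builds a GET URL from a list of parameters. The helper name build_get_url and the example.com address are hypothetical, and the Python 3 import fallback is an assumption added so the snippet also runs on newer interpreters.

```python
try:
    from urllib import urlencode  # Python 2, as used in this article
except ImportError:
    from urllib.parse import urlencode  # Python 3 location of the same function


def build_get_url(base_url, params):
    """Append params to base_url as an encoded GET query string.

    params is a list of (key, value) pairs so the order stays predictable.
    """
    return base_url + '?' + urlencode(params)


if __name__ == '__main__':
    # Spaces and '&' inside values are escaped by urlencode.
    print(build_get_url('http://www.example.com/search',
                        [('name', 'qiangzi'), ('q', 'a b&c')]))
```

The resulting string can be passed straight to urllib2.urlopen, which is the usual reason the two modules appear side by side.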

Common methods

urllib2.urlopen(url[, data][, timeout])
The urlopen method is the simplest and most commonly used method of the urllib2 module. It opens a URL, where the url parameter can be either a string or a Request object. The Request object and the data parameter are described under the Request class below; data has the same meaning in both places.

The optional timeout parameter sets, in seconds, how long blocking operations such as the connection attempt may take.

import urllib2

response = urllib2.urlopen('http://python.org/')
html = response.read()

urlopen can also be given a Request object that explicitly describes the URL you want to fetch. Calling urlopen returns a response object for the requested URL. The response is similar to a file object, so you can call its read() method; other uses of the return value of urlopen are covered further below.

import urllib2

req = urllib2.Request('http://python.org/')
# Passing data turns the request into a POST.
response = urllib2.urlopen(req, data='abc')
the_page = response.read()

urllib2.Request(url[, data][, headers][, origin_req_host][, unverifiable])

The Request class is an abstraction of a URL request.
The example above instantiates urllib2.Request with only a URL, but the class accepts other parameters as well.
data is the body sent with the request. If data is not None, the request is a POST; otherwise it is a GET.
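
The GET/POST rule above can be checked without touching the network, because a Request object reports the method it would use. This is a minimal sketch: the helper name method_for and the example.com URL are hypothetical, and the Python 3 fallback import is an assumption for readers on newer interpreters.

```python
try:
    import urllib2  # Python 2, as used in this article
    from urllib import urlencode
except ImportError:
    import urllib.request as urllib2  # Python 3 home of Request
    from urllib.parse import urlencode


def method_for(url, values=None):
    """Build a Request and return the HTTP method it would be sent with."""
    data = None
    if values is not None:
        # Encode to bytes so the same body is also acceptable on Python 3.
        data = urlencode(values).encode('ascii')
    request = urllib2.Request(url, data)
    return request.get_method()


if __name__ == '__main__':
    print(method_for('http://www.example.com/'))                       # no data
    print(method_for('http://www.example.com/', {'name': 'qiangzi'}))  # with data
```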

import urllib
import urllib2

url = 'http://www.baidu.com'
values = {'name': 'qiangzi',
          'age': 27,
          'id': 1}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()

headers is a dictionary. It can be passed directly as a parameter when the Request is created, or each key and value can be added afterwards by calling the add_header() method. The User-Agent header, which identifies the browser, is often spoofed, because some HTTP servers only accept requests that come from common browsers rather than scripts, or return different versions of a page to different browsers. For example, the Mozilla Firefox browser may identify itself as "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11". By default, urllib2 identifies itself as Python-urllib/x.y, where x and y are the major and minor version numbers of the Python release; in Python 2.6, for example, the default User-Agent string of urllib2 is "Python-urllib/2.6". The following example differs from the one above in that it adds headers to the request, imitating a request submitted by the IE browser.

import urllib
import urllib2

url = 'http://www.baidu.com'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'name': 'qiangzi',
          'age': 27,
          'id': 1}
headers = {'User-Agent': user_agent}
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()

A header can also be added after the Request has been created:
req.add_header('Referer', 'http://www.python.org/')
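
Both ways of setting headers can be inspected offline before any request is sent. The sketch below is illustrative only: build_request is a hypothetical helper and the example.com URL is a placeholder. One detail worth knowing is that Request stores header names in capitalized form, so 'User-Agent' is looked up as 'User-agent'.

```python
try:
    import urllib2  # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback


def build_request(url, user_agent, referer):
    """Create a Request with one header from the constructor and one added later."""
    request = urllib2.Request(url, headers={'User-Agent': user_agent})
    request.add_header('Referer', referer)
    return request


if __name__ == '__main__':
    req = build_request('http://www.example.com/',
                        'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
                        'http://www.python.org/')
    # header_items() lists every header the request currently carries.
    for name, value in sorted(req.header_items()):
        print('%s: %s' % (name, value))
```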
3) Other common methods
geturl() - Returns the URL of the resource that was actually retrieved. It is usually used to check for a redirect: if the returned URL equals the requested one, for example "http://www.python.org/", no redirect took place. If the request was redirected, the final URL may be needed, for example when downloading.
getcode() - Returns the HTTP status code of the response, for example 200 for a successful request. See the appendix for the meaning of specific codes.
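
To see geturl() and getcode() in action without relying on an external site, the following sketch starts a throwaway local server whose /old path redirects to /new. Everything about the server (the RedirectHandler class, the paths, the response body) is hypothetical scaffolding for the demonstration, and the opener is built with an empty ProxyHandler only so that a system proxy cannot interfere with the local address.

```python
import threading

try:  # Python 2 module names, as used in this article
    import urllib2
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
except ImportError:  # Python 3 equivalents
    import urllib.request as urllib2
    from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: /old redirects to /new, anything else returns 200."""

    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'hello'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch_through_redirect():
    """Request /old and return (requested URL, final URL, status code)."""
    server = HTTPServer(('127.0.0.1', 0), RedirectHandler)
    worker = threading.Thread(target=server.serve_forever)
    worker.daemon = True
    worker.start()
    try:
        start_url = 'http://127.0.0.1:%d/old' % server.server_address[1]
        opener = urllib2.build_opener(urllib2.ProxyHandler({}))
        response = opener.open(start_url, timeout=5)
        response.read()
        return start_url, response.geturl(), response.getcode()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == '__main__':
    requested, final, code = fetch_through_redirect()
    print('redirected: %s' % (requested != final))
    print('status code: %d' % code)
```

Comparing the requested URL with geturl() is the redirect check described above; the status code reported is that of the final response, not of the intermediate 302.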

4) HTTPCookieProcessor
Many sites have resources that are available only after the user logs in.
Once we have logged in, we can access other protected resources without entering the account name and password again. How does the website do this?
In general, after the user logs in, the server creates a session for that user. The session corresponds to the user's profile, and that profile represents the user.
How, then, does the server know which user a request belongs to? At login, the server asks the browser to store a cookie whose value is the session ID. Every subsequent request carries that cookie, and the server identifies the user by matching the session ID in the cookie against the session IDs it holds.

Opener
When we call urllib2.urlopen(url), urllib2 actually creates a default opener object inside the urlopen function and then calls opener.open().
However, the default opener does not support cookies.
So we need to create a new opener that does support cookies. urllib2 provides HTTPCookieProcessor for this purpose.

Creating an HTTPCookieProcessor requires a container for storing cookies.
Python provides such containers in the cookielib module:
CookieJar, FileCookieJar, MozillaCookieJar / LWPCookieJar

import cookielib
import urllib2

cookies = cookielib.CookieJar()
cookieHandler = urllib2.HTTPCookieProcessor(cookiejar=cookies)
opener = urllib2.build_opener(cookieHandler)
request = urllib2.Request("http://www.baidu.com")
# Use the cookie-aware opener; plain urllib2.urlopen() would bypass it
# unless the opener has been installed with install_opener().
opener.open(request)
for cookie in cookies:
    print cookie.name, cookie.value

The code above shows that urllib2 does extract the cookies from the response for us. But how do you save them in a file?
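
One possible answer to the question above is to use a file-backed jar such as LWPCookieJar, which has save() and load() methods. The sketch below is an assumption about how this might be done, not something the original article shows: the cookie is created by hand (make_cookie is a hypothetical helper) so that the example runs without a network, whereas in real use the jar would be passed to HTTPCookieProcessor and filled from server responses.

```python
import os
import tempfile
import time

try:
    import cookielib  # Python 2, as used in this article
except ImportError:
    import http.cookiejar as cookielib  # Python 3 name of the same module


def make_cookie(name, value, domain):
    """Build a persistent cookie by hand (stands in for one set by a server)."""
    return cookielib.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False,
        expires=int(time.time()) + 3600,  # one hour from now
        discard=False,                    # not a session-only cookie
        comment=None, comment_url=None, rest={})


def save_and_reload(path):
    """Save a jar to path, load it into a fresh jar, and return its contents."""
    jar = cookielib.LWPCookieJar(path)
    jar.set_cookie(make_cookie('sessionid', 'abc123', 'example.com'))
    jar.save()

    reloaded = cookielib.LWPCookieJar(path)
    reloaded.load()
    return dict((cookie.name, cookie.value) for cookie in reloaded)


if __name__ == '__main__':
    cookie_file = os.path.join(tempfile.mkdtemp(), 'cookies.txt')
    print(save_and_reload(cookie_file))
```

By default save() skips cookies marked to be discarded at the end of the session; passing ignore_discard=True keeps them as well.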

urllib2.install_opener(opener)
This sets the global opener of urllib2, so that subsequent calls to urllib2.urlopen() go through it.
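
The session mechanism and the opener described above can be tried end to end against a local stand-in server. This is a minimal sketch under stated assumptions: SessionHandler and its sessionid=abc123 cookie are hypothetical, and the empty ProxyHandler is there only to keep a system proxy away from the local address. The first request receives the cookie; the second sends it back automatically because the installed opener carries an HTTPCookieProcessor.

```python
import threading

try:  # Python 2 module names, as used in this article
    import cookielib
    import urllib2
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
except ImportError:  # Python 3 equivalents
    import http.cookiejar as cookielib
    import urllib.request as urllib2
    from http.server import BaseHTTPRequestHandler, HTTPServer


class SessionHandler(BaseHTTPRequestHandler):
    """Hypothetical server: issues a session cookie, then echoes it back."""

    def do_GET(self):
        received = self.headers.get('Cookie') or ''
        body = received.encode('ascii')
        self.send_response(200)
        if not received:
            # First visit: ask the client to store a session ID.
            self.send_header('Set-Cookie', 'sessionid=abc123; Path=/')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def session_round_trip():
    """Return (first body, second body, cookies stored in the jar)."""
    server = HTTPServer(('127.0.0.1', 0), SessionHandler)
    worker = threading.Thread(target=server.serve_forever)
    worker.daemon = True
    worker.start()
    url = 'http://127.0.0.1:%d/' % server.server_address[1]

    cookies = cookielib.CookieJar()
    opener = urllib2.build_opener(
        urllib2.ProxyHandler({}),  # skip any system proxy for this local URL
        urllib2.HTTPCookieProcessor(cookies))
    urllib2.install_opener(opener)  # urlopen() now goes through this opener
    try:
        first = urllib2.urlopen(url, timeout=5).read()
        second = urllib2.urlopen(url, timeout=5).read()
    finally:
        server.shutdown()
        server.server_close()
    return first, second, [(c.name, c.value) for c in cookies]


if __name__ == '__main__':
    first, second, stored = session_round_trip()
    print('first response body: %r' % first)
    print('second response body: %r' % second)
    print('cookies in jar: %r' % stored)
```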

Finally, I'll explain the JSON package.
json.dumps encodes a Python object as a JSON string.
json.loads decodes a JSON string into a Python object.
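
A short round trip shows the two functions together. The helper name round_trip is hypothetical, and sort_keys=True is used only to make the encoded string predictable.

```python
import json


def round_trip(obj):
    """Encode obj as a JSON string, decode it again, and return both."""
    encoded = json.dumps(obj, sort_keys=True)
    decoded = json.loads(encoded)
    return encoded, decoded


if __name__ == '__main__':
    text, value = round_trip({'name': 'qiangzi', 'age': 27, 'id': 1})
    print(text)
    print(value == {'name': 'qiangzi', 'age': 27, 'id': 1})
```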

Author: Qiangzi
Link: http://www.jianshu.com/p/1416ccc99979
Source: Jianshu
Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
