Python urllib module details and examples

Last Update: 2013-12-04 Source: Internet

Let's start with an example. It fetches the HTML of the Google homepage and prints it to the console:
import urllib
print urllib.urlopen('http://www.google.com').read()
# Don't be surprised: the whole program really is just two lines of code.
urllib.urlopen(url[, data[, proxies]]):
Creates a file-like object that represents a remote URL; you can then work with this object just like a local file to read the remote data. The url parameter is the location of the remote data, usually a URL. The data parameter holds data to submit to the URL with a POST request (web developers will know the two ways of submitting data, POST and GET; if you don't, don't worry too much, since this parameter is rarely used). The proxies parameter sets the proxy to use (we won't cover proxies here; interested readers can consult the urllib section of the Python manual). urlopen returns a file-like object that provides the following methods:

read(), readline(), readlines(), fileno(), close(): used exactly like the corresponding methods of a file object;
info(): returns an httplib.HTTPMessage object containing the headers returned by the remote server;
getcode(): returns the HTTP status code. For an HTTP request, 200 means the request completed successfully and 404 means the URL was not found;
geturl(): returns the URL of the page actually retrieved (this can differ from the requested URL if a redirect was followed);
Let's extend the example above. Running it will help you get a feel for urllib:

import urllib

google = urllib.urlopen('http://www.google.com')
print 'HTTP header:\n', google.info()
print 'HTTP status:', google.getcode()
print 'url:', google.geturl()
for line in google:  # iterate just like over a local file
    print line,
google.close()
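The example above needs network access to Google. As a minimal offline sketch, assuming you only want to see the file-like interface at work, the same calls can be made against a local file opened through a file: URL. The helper names inspect_local_url and demo_local_file are hypothetical, and the try/except import is an added assumption so the sketch runs on Python 2 (where these functions live in urllib) as well as Python 3 (where they moved to urllib.request).

```python
import os
import tempfile

try:  # Python 3: urlopen and pathname2url moved to urllib.request
    from urllib.request import urlopen, pathname2url
except ImportError:  # Python 2: both live directly in urllib
    from urllib import urlopen, pathname2url


def inspect_local_url(path):
    """Open a local file through urlopen() and use its file-like methods."""
    url = 'file:' + pathname2url(os.path.abspath(path))
    page = urlopen(url)
    try:
        headers = page.info()    # header object, like google.info() above
        first = page.readline()  # behaves like file.readline()
        rest = page.read()       # behaves like file.read()
        return page.geturl(), headers, first, rest
    finally:
        page.close()


def demo_local_file():
    """Write a small temporary file, inspect it via a file: URL, then delete it."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'wb') as f:
        f.write(b'line one\nline two\n')
    try:
        return inspect_local_url(path)
    finally:
        os.remove(path)


if __name__ == '__main__':
    url, headers, first, rest = demo_local_file()
    print(url)
    print(headers['Content-Length'])  # size of the local file in bytes
    print(first)
    print(rest)
```

For a local file there is no HTTP exchange, so the headers are synthesized by urllib from the file's size, type and modification time.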
urllib.urlretrieve(url[, filename[, reporthook[, data]]]):
The urlretrieve method downloads remote data directly to a local file. The filename parameter gives the local path to save to (if it is omitted, urllib creates a temporary file to hold the data). The reporthook parameter is a callback function that is called once when the connection to the server is established and again after each data block is transferred; it can be used to display the current download progress. The data parameter holds data to POST to the server. The method returns a two-element tuple (filename, headers): filename is the local path the data was saved to, and headers is the server's response header. The following example downloads the HTML of the Sina homepage, saves it to D:\sina.html, and displays the download progress.

import urllib

def cbk(a, b, c):
    '''Callback function
    @a: number of data blocks downloaded so far
    @b: size of a data block
    @c: size of the remote file (-1 if the server did not report it)
    '''
    if c <= 0:  # size unknown: no percentage can be computed
        return
    per = 100.0 * a * b / c
    if per > 100:
        per = 100
    print '%.2f%%' % per

url = 'http://www.sina.com.cn'
local = r'd:\sina.html'
urllib.urlretrieve(url, local, cbk)
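The Sina example needs a network connection and a D: drive. As a minimal offline sketch, assuming a local file: URL is an acceptable stand-in for a web page, the following copies a temporary file with urlretrieve and records the progress percentages reported to the hook instead of printing them. The names make_progress_hook, copy_with_progress and demo_retrieve are hypothetical, and the try/except import is an added assumption so the code runs on Python 2 (urllib) and Python 3 (urllib.request).

```python
import os
import shutil
import tempfile

try:  # Python 3: urlretrieve and pathname2url moved to urllib.request
    from urllib.request import urlretrieve, pathname2url
except ImportError:  # Python 2: both live directly in urllib
    from urllib import urlretrieve, pathname2url


def make_progress_hook(log):
    """Build a reporthook(blocks, block_size, total_size) that records percentages."""
    def hook(blocks, block_size, total_size):
        if total_size <= 0:
            return  # size unknown (-1): no percentage can be computed
        percent = min(100.0, 100.0 * blocks * block_size / total_size)
        log.append(percent)
    return hook


def copy_with_progress(src_path, dest_path):
    """Fetch a local file through urlretrieve(); return its result plus the progress log."""
    url = 'file:' + pathname2url(os.path.abspath(src_path))
    log = []
    filename, headers = urlretrieve(url, dest_path, make_progress_hook(log))
    return filename, headers, log


def demo_retrieve(size=20000):
    """Create a throwaway source file, copy it with progress reporting, then clean up."""
    workdir = tempfile.mkdtemp()
    try:
        src = os.path.join(workdir, 'source.html')
        dest = os.path.join(workdir, 'copy.html')
        with open(src, 'wb') as f:
            f.write(b'x' * size)
        filename, headers, log = copy_with_progress(src, dest)
        with open(dest, 'rb') as f:
            copied = f.read()
        return filename == dest, len(copied), log
    finally:
        shutil.rmtree(workdir)


if __name__ == '__main__':
    same_name, copied_len, log = demo_retrieve()
    print(same_name)
    print(copied_len)
    for percent in log:
        print('%.2f%%' % percent)
```

In CPython, urlretrieve reads in 8192-byte blocks, so a 20,000-byte source triggers the hook four times (once before the first block and once after each of three blocks); the last call would exceed 100%, which is why the percentage is capped.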
The two methods described above are the most commonly used ones in urllib. Internally, they use the URLopener or FancyURLopener class to fetch remote data. As users of urllib we rarely need these two classes directly, so I won't go into them here. If you are interested in how urllib is implemented, or want urllib to support more protocols, these two classes are worth studying. The urllib section of the Python manual also lists the module's known restrictions and shortcomings; consult it if you want to learn more.

urllib also provides helper functions for encoding and decoding URLs. Some characters are not allowed in a URL, and others have special meanings there. For example, when data is submitted with GET, strings of the form key=value are appended to the URL, so a value that itself contains '=' must be encoded; when the server receives the parameters, it decodes them to restore the original data. This is where the following helper functions come in:

urllib.quote(string[, safe]): encodes a string. The safe parameter lists characters that should not be encoded (by default '/');
urllib.unquote(string): decodes a string encoded with quote;
urllib.quote_plus(string[, safe]): like urllib.quote, but replaces spaces with '+', whereas quote replaces them with '%20';
urllib.unquote_plus(string): decodes a string, also turning '+' back into spaces;
urllib.urlencode(query[, doseq]): converts a dict, or a list of two-element tuples, into URL query parameters. For example, the dictionary {'name': 'dark-bull', 'age': 200} becomes "name=dark-bull&age=200" (the order of the pairs follows the dict's iteration order);
urllib.pathname2url(path): converts a local file path to the path component of a URL;
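A short sketch of these helpers in action, showing the expected output of each call. The sample strings are illustrative, and the try/except import is an added assumption so the code runs on Python 2 (where the helpers live in urllib) as well as Python 3 (where they moved to urllib.parse).

```python
try:  # Python 3: the helpers moved to urllib.parse
    from urllib.parse import quote, unquote, quote_plus, unquote_plus, urlencode
except ImportError:  # Python 2: they live directly in urllib
    from urllib import quote, unquote, quote_plus, unquote_plus, urlencode

value = 'a b&c=d'

print(quote(value))                     # a%20b%26c%3Dd
print(quote_plus(value))                # a+b%26c%3Dd
print(quote('/docs/my file'))           # /docs/my%20file  ('/' is safe by default)
print(quote('/docs/my file', safe=''))  # %2Fdocs%2Fmy%20file
print(unquote('a%20b%26c%3Dd'))         # a b&c=d
print(unquote_plus('a+b%26c%3Dd'))      # a b&c=d

# A list of pairs keeps the parameter order fixed on every Python version
print(urlencode([('name', 'dark-bull'), ('age', 200)]))  # name=dark-bull&age=200
# An '=' inside a value is escaped, so it cannot be confused with the separator
print(urlencode([('q', 'a=b')]))                         # q=a%3Db
# doseq=True expands a sequence value into repeated keys
print(urlencode({'tag': ['x', 'y']}, doseq=True))        # tag=x&tag=y
```

Passing a list of pairs rather than a dict is a design choice for the sketch: it makes the output order predictable, which matters when the query string is compared or cached.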

Author: lmh12506

This article is an English version of an article originally written in Chinese on aliyun.com.
