Let's take a look at the urlretrieve() function provided by the urllib module. The urlretrieve() method downloads remote data directly to a local file.
In module urllib: urlretrieve(url, filename=None, reporthook=None, data=None)
- The parameter filename specifies the local path to save to (if it is not given, urllib generates a temporary file to hold the data).
- The parameter reporthook is a callback function, triggered when the connection to the server is established and again each time a block of data arrives; we can use this callback to display the current download progress.
- The parameter data is the data to POST to the server. The method returns a two-element tuple (filename, headers), where filename is the local path and headers is the server's response header.
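To make the return value concrete, here is a small sketch using the Python 3 location of the function, urllib.request.urlretrieve(); a local file:// URL stands in for a remote one so the sketch runs without network access (the file names are illustrative).

```python
import os
import tempfile
import urllib.request  # in Python 3, urlretrieve() lives in urllib.request

# Prepare a small local file and fetch it through a file:// URL,
# so the sketch needs no network access.
src = os.path.join(tempfile.mkdtemp(), 'source.txt')
with open(src, 'w') as f:
    f.write('hello')

dest = src + '.copy'
filename, headers = urllib.request.urlretrieve('file://' + src, dest)

print(filename)                   # the local path the data was saved to
print(headers['Content-Length'])  # size taken from the response headers
```

Here filename is the path we passed in, and headers behaves like a mapping of response header fields.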
Here's an example illustrating the use of this method: it fetches Google's HTML page, saves it to a d:/google.html file, and displays the download progress.
```python
import urllib

def cbk(a, b, c):
    """callback function
    @a: number of data blocks already downloaded
    @b: size of a data block
    @c: size of the remote file
    """
    per = 100.0 * a * b / c
    if per > 100:
        per = 100
    print '%.2f%%' % per

url = 'http://www.google.com'
local = 'd://google.html'
urllib.urlretrieve(url, local, cbk)
```
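For reference, a Python 3 sketch of the same idea: urlretrieve() has moved to urllib.request, and print is a function. A throwaway local HTTP server stands in for www.google.com here, an assumption made only so the sketch can run offline.

```python
import functools
import http.server
import os
import tempfile
import threading
import urllib.request

# Serve a tiny page from a temporary directory on a random free port,
# standing in for the remote site.
docroot = tempfile.mkdtemp()
with open(os.path.join(docroot, 'index.html'), 'w') as f:
    f.write('<html>hello</html>')

handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=docroot)
server = http.server.ThreadingHTTPServer(('127.0.0.1', 0), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

def cbk(blocks, block_size, total_size):
    """Same progress callback as above, in Python 3 syntax."""
    per = 100.0 * blocks * block_size / total_size
    if per > 100:
        per = 100
    print('%.2f%%' % per)

url = 'http://127.0.0.1:%d/index.html' % server.server_port
local = os.path.join(docroot, 'copy.html')
urllib.request.urlretrieve(url, local, cbk)
server.shutdown()
```

Apart from the module path and the print function, the call is identical to the Python 2 version.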
Here is another urlretrieve() instance that downloads a file and shows the download progress.
```python
#!/usr/bin/python
# encoding: utf-8
import urllib
import os

def schedule(a, b, c):
    """
    a: number of data blocks already downloaded
    b: size of a data block
    c: size of the remote file
    """
    per = 100.0 * a * b / c
    if per > 100:
        per = 100
    print '%.2f%%' % per

url = 'http://www.python.org/ftp/python/2.7.5/Python-2.7.5.tar.bz2'
#local = url.split('/')[-1]
local = os.path.join('/data/software', 'python-2.7.5.tar.bz2')
urllib.urlretrieve(url, local, schedule)

###### output ######
# 0.00%
# 0.07%
# 0.13%
# 0.20%
# ....
# 99.94%
# 100.00%
```
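The callback fires once when the connection is made and then once per block read (CPython reads in 8 KB blocks), which is why the percentage climbs in steps. Below is a sketch with a 100 KB file, again via a file:// URL and Python 3's urllib.request so it runs offline; the file names are illustrative.

```python
import os
import tempfile
import urllib.request

progress = []

def schedule(blocks, block_size, total_size):
    per = 100.0 * blocks * block_size / total_size
    if per > 100:
        per = 100
    progress.append(per)
    print('%.2f%%' % per)

# A 100 KB local file downloaded via file:// so the sketch is offline;
# with an http:// URL the call would be identical.
src = os.path.join(tempfile.mkdtemp(), 'big.bin')
with open(src, 'wb') as f:
    f.write(b'x' * 100 * 1024)

local = src + '.downloaded'
urllib.request.urlretrieve('file://' + src, local, schedule)
print(len(progress))  # one call on connect, then one per block read
```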
As the exercises above show, urlopen() makes it easy to fetch a remote HTML page, which Python can then parse to extract and match the data we need; urlretrieve() then downloads that data locally. Remote URLs with access restrictions or connection limits can be reached through a proxy, and if the remote data volume is too large for a single-threaded download, multi-threaded downloading can be used. This is, in essence, the legendary web crawler.
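A minimal sketch of that fetch-parse-download loop (the regex and file names are illustrative; a local file:// page stands in for a real site so the sketch runs offline):

```python
import os
import re
import tempfile
import urllib.request

workdir = tempfile.mkdtemp()

# A "remote" resource and a page linking to it, both local for the sketch.
target = os.path.join(workdir, 'data.txt')
with open(target, 'w') as f:
    f.write('payload')
page = os.path.join(workdir, 'index.html')
with open(page, 'w') as f:
    f.write('<a href="file://%s">data</a>' % target)

# Step 1: urlopen() fetches the page's HTML.
html = urllib.request.urlopen('file://' + page).read().decode('utf-8')

# Step 2: match the links we want (a crude regex, for illustration only).
links = re.findall(r'href="([^"]+)"', html)

# Step 3: urlretrieve() saves each linked resource locally.
for i, link in enumerate(links):
    urllib.request.urlretrieve(link, os.path.join(workdir, 'download_%d' % i))
```

A real crawler would add an HTML parser, a URL queue, politeness delays, and error handling, but the three steps stay the same.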