Using Linux wget to resume interrupted downloads and download entire directories


wget Introduction

wget can follow links in HTML pages and download them to create a local copy of a remote site, completely recreating the original site's directory structure. This is often referred to as a "recursive download." During a recursive download, wget obeys the Robot Exclusion Standard (/robots.txt). While downloading, wget can also convert the links in the saved pages to point to the local files, which makes offline browsing easier.
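As a minimal sketch of such a recursive download (the URL is a placeholder), the following fetches everything below a starting directory and rewrites links for offline browsing:

# Recurse (-r), never ascend to the parent directory (-np), and
# convert links in the saved pages to point at local files (-k):
wget -r -np -k http://example.com/docs/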

wget is very stable and adapts well to very narrow bandwidth and unstable networks. If a download fails because of a network problem, wget keeps retrying until the entire file has been retrieved. If the server interrupts the download, wget reconnects and resumes from where it stopped. This is useful for downloading large files from servers that limit connection time.


Simple Example


wget -t 0 -T 120 -np -c -r ftp://xxx:xxx@xxx.com/xxx
Here -np ("no parent") means wget will not ascend to the parent directory, so only the specified directory and its subdirectories are downloaded;
-c resumes the download from the breakpoint; downloading the directory and its subdirectories is actually accomplished by -r.
An easy-to-use shorthand for mirroring is the -m option, namely:
wget -m -np ftp://xxx:xxx@xxx.com/xxx
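For reference, -m (--mirror) is shorthand for several options turned on together; per the wget manual, the two commands below are equivalent (same placeholder URL as above):

# -m enables recursion (-r), timestamping (-N), unlimited recursion
# depth (-l inf), and keeps FTP directory listings:
wget -m -np ftp://xxx:xxx@xxx.com/xxx
wget -r -N -l inf --no-remove-listing -np ftp://xxx:xxx@xxx.com/xxx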

Examples

Download the front page of 192.168.1.168 and display download information:
wget -d http://192.168.1.168
Download the front page of 192.168.1.168 without displaying any information:
wget -q http://192.168.1.168
Download all files whose links are listed in filelist.txt:
wget -i filelist.txt

Download to a specified directory:
wget -P /tmp ftp://user:passwd@url/file
This downloads the file into the /tmp directory. wget is a command-line download tool that Linux users reach for almost every day. Below are a few useful wget tips that will let you use it more efficiently and flexibly.

* $ wget -r -np -nd http://example.com/packages/ downloads all files in the packages directory on the http://example.com site. Here -np means do not traverse the parent directory, and -nd means do not recreate the directory structure on the local machine.

* $ wget -r -np -nd --accept=iso http://example.com/centos-5/i386/ is similar to the previous command, but adds the --accept=iso option, which tells wget to download only the files in the i386 directory with the .iso file-name extension. You can also specify multiple extensions, separated by commas.

* $ wget -i filename.txt is often used for bulk downloads: put all the addresses that need to be downloaded into filename.txt, and wget automatically downloads them all for you.

* $ wget -c http://example.com/really-big-file.iso The function of the -c option specified here is resuming the download from the breakpoint.

* $ wget -m -k (-H) http://www.example.com/ mirrors a Web site, with wget converting the links for local browsing. If images on the site are hosted on a different site, add the -H option; a fuller mirroring command is sketched below.
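As referenced in the last item, a fuller mirroring invocation might look like this (URLs and domain names are placeholders):

# Mirror the site (-m), convert links (-k), and fetch page
# requisites such as images and stylesheets (-p):
wget -m -k -p http://www.example.com/
# If embedded images live on other hosts, add -H, ideally limited
# with -D to a whitelist of domains:
wget -m -k -p -H -D example.com,images.example.com http://www.example.com/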


Below we dig into the heart of wget and look at some of its thoughtful features that we rarely touch in everyday use.


1. The -t option
That is, --tries=number, which sets the number of retries for a wget download. Setting it to 0 (the digit zero) or inf means unlimited retries. The default is 20 retries.
However, wget does not blindly retry in every circumstance: on fatal errors such as "connection refused" or "not found", wget exits immediately and does not retry.
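For example, the following (placeholder URL) retries indefinitely while still giving up on an individual stalled read after 30 seconds:

# -t 0 retries forever; -T 30 sets a 30-second network timeout so
# a single stalled read does not hang the download:
wget -t 0 -T 30 http://example.com/big-file.iso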
2. The -o option
That is, --output-file=logfile: the messages wget would normally print to the terminal while running are written to the specified logfile instead.
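For instance (placeholder URL), combining -o with -b runs the download in the background while keeping a log to check later:

# -b backgrounds the download; -o sends all progress messages to
# wget.log instead of the terminal:
wget -b -o wget.log http://example.com/big-file.iso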
3. The -O option
That is, --output-document=file: the contents of all files wget downloads are written (concatenated) into the specified file, and the original file names are not created. Using -O when downloading a single file also avoids wget's default behavior of writing to a ".1"-suffixed file when a file with the same name already exists.
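Two small sketches with placeholder URLs:

# Save under a name of your choosing instead of the remote name:
wget -O latest.tar.gz http://example.com/pkg-1.2.3.tar.gz
# With several URLs, -O concatenates all of them into one file:
wget -O combined.html http://example.com/a.html http://example.com/b.html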
4. The -N option
That is, --timestamping: turns on the timestamp mechanism, so wget downloads only those remote files whose timestamps are newer than the local copies.
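For example (placeholder URL):

# Download data.csv only if the remote copy is newer than the
# local one; otherwise wget leaves the local file untouched:
wget -N http://example.com/data.csv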
5. The -nc option
That is, the --no-clobber option.
When the same file is downloaded multiple times into the same directory, how wget handles it depends on several important options, including -nc.
Depending on those options, the file may be overwritten, rewritten, or protected.
When you download the same file multiple times without -N, -nc, or -r, wget automatically appends a ".1" suffix to the file name on the second download, a ".2" suffix on the third, and so on.
With the -nc option, however, wget does not use the ".1"/".2" policy; instead it refuses to download the same file again (even if the remote contents have been updated). This feature is meant for pages that are linked to many times during a crawl, so -nc avoids downloading them repeatedly.
When you use wget with -r but without -N or -nc, and a file of the same name already exists locally while the remote file has been updated, wget overwrites the old file in the current directory; -nc prevents it from doing so. (When the remote file has no newer modification time, wget refuses to download it.)
When you use wget with -N, whether a file of the same name is downloaded again depends entirely on the timestamps and sizes of the remote and local files. The -nc option may not be combined with -N: if you specify both, you get the error "Can't timestamp and not clobber old files at the same time."
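A quick illustration of the default ".1" suffix policy versus -nc (placeholder URL):

wget http://example.com/index.html      # saved as index.html
wget http://example.com/index.html      # saved as index.html.1
wget -nc http://example.com/index.html  # skipped: index.html exists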
6. The -c option
That is, the --continue option: this is the famous "resume from breakpoint". No matter which download tool you used to download the first half of a file, you can use wget to continue downloading it. For example:
wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z
If the current directory already contains a file named ls-lR.Z, wget assumes it is a partially downloaded file, takes the size of the local file, and asks the server to continue the download from the corresponding offset in the remote file.
You will find that this resume strategy has a hidden pitfall: if the beginning of the remote file has been modified, wget has no way of noticing when it resumes; it simply keeps downloading from the offset equal to the bytes already transferred. So after resuming with -c, be sure to perform an MD5 check.
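For instance, a resumed download followed by a checksum verification might look like this (the file and its .md5 companion are placeholders):

# Resume the partial download from the local file's current size:
wget -c http://example.com/really-big-file.iso
# Verify against a published checksum, since -c cannot detect that
# the beginning of the remote file was modified:
md5sum -c really-big-file.iso.md5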
7. The --limit-rate=amount option
This option limits the download speed to amount bytes per second; the units k and m can also be used, for example --limit-rate=20k limits the speed to 20KB/s.
Note that wget implements this by sleeping for a specific period after each network read, so that the average read speed drops to the limit; this strategy eventually brings the TCP transfer speed down to around the limit value. Consequently, when transferring very small files, the rate limit may not take effect.
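For example (placeholder URL):

# Cap the transfer at roughly 200KB/s so the download does not
# saturate the link:
wget --limit-rate=200k http://example.com/big-file.iso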
8. The -w option
That is, the --wait=seconds option, which sets the number of seconds wget waits between any two requests. This option is useful for reducing the load on the remote server. You can give a plain number of seconds, or append m for minutes, h for hours, or d for days.
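For example, combined with the -i batch mode shown earlier:

# Pause 2 seconds between requests to reduce server load while
# fetching every URL listed in filelist.txt:
wget -w 2 -i filelist.txt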
9. The --waitretry=seconds option
Sets the maximum number of seconds to wait between retries of a failed request. wget uses a linearly increasing wait: if you set 10 seconds, it waits 1 second after the first failed attempt, 2 seconds after the second, and so on until the wait reaches 10 seconds. So by the time the last retry starts, 1+2+...+10 = 55 seconds of waiting have elapsed.
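For example (placeholder URL):

# Allow 10 retries, with the wait between them growing linearly
# from 1 second up to a 10-second ceiling:
wget -t 10 --waitretry=10 http://example.com/file.tar.gz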
