Use wget to download files

While browsing the Internet today I looked into the usage of wget, this powerful network tool; my notes are listed below:

A collection of wget usage tips
wget is invoked in the form:
wget [argument list] URL
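
For example, a minimal invocation (the URL is a placeholder) that fetches a single file into the current directory:

wget http://place.your.url/here/file.tar.gz
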
I. First, let's introduce the main parameters of wget:
· -b: run wget in the background; log output is written to the file "wget-log" in the current directory;
· -t [number of times]: the number of attempts to make when wget cannot establish a connection to the server. For example, "-t 120" means try up to 120 times. When this is set to "0", wget retries indefinitely until the connection succeeds, which is very useful when the remote server suddenly goes down or the network suddenly breaks: wget can keep trying and then continue downloading the files that have not yet come through;
· -c: resume an interrupted download. This is also a very useful setting, particularly when downloading a large file: if the transfer is accidentally interrupted, the resumed connection picks up where the last one left off rather than starting from scratch. This requires the remote server to also support resuming; generally speaking, web/FTP servers based on Unix/Linux support it;
· -T [number of seconds]: timeout, i.e. how long to wait for the remote server to respond before disconnecting and starting the next attempt. For example, "-T 120" means that if the remote server has sent no data after 120 seconds, the connection is retried. If the network is fast, this can be set shorter; on a slow network, longer. It should generally not exceed 900 and usually not be less than 60; around 120 is usually appropriate;
· -w [number of seconds]: how many seconds to wait between two attempts, e.g. "-w 100" means wait 100 seconds between attempts;
· -Y on/off: connect with/without a proxy server;
· -Q [bytes]: limit the total size of the downloaded files, e.g. "-Q2k" means no more than 2K bytes, "-Q3m" means at most 3M bytes; if nothing is added after the number, it is in bytes, e.g. "-Q200" means at most 200 bytes;
· -nd: do not recreate the directory structure; all files downloaded from the server's specified directories are placed together in the current directory;
· -x: the opposite of "-nd": create the complete directory structure. For example, "wget -x http://www.gnu.org" creates a "www.gnu.org" subdirectory in the current directory and then builds the server's actual directory hierarchy level by level until all the files have been fetched;
· -nH: do not create a directory named after the target host's domain name; the target host's directory structure is placed directly under the current directory;
· --http-user=username
· --http-passwd=password: if the web server requires a username and password, use these two options to set them;
· --proxy-user=username
· --proxy-passwd=password: if the proxy server requires a username and password, use these two options;
· -r: recursive download; recreate the server-side directory structure on the local machine;
· -l [depth]: the depth of the remote directory structure to download, e.g. "-l 5" downloads directories and files at a depth less than or equal to 5;
· -m: the site-mirroring option. If you want to mirror a site, use this option and wget will automatically set the other options appropriate for mirroring;
· -np: download only the contents of the specified directory and its subdirectories on the target site. This is also a very useful option: suppose someone's personal home page links to other people's home pages, and we only want to download that one person's pages; without this option, the crawl could even pull in the entire site, which is clearly not what we usually want. A combined example follows this list.
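
To illustrate how these parameters combine, here is a minimal sketch (the URL is a placeholder) that retries indefinitely, resumes any partial download, and waits 30 seconds between attempts:

wget -t 0 -c -w 30 http://place.your.url/here/file.tar.gz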

II. How to set the proxy server used by wget
wget can read many settings from the per-user configuration file ".wgetrc"; here we mainly use this file to set up a proxy server. Whichever user is logged in, the ".wgetrc" file in that user's home directory is the one that takes effect. For example, if the "root" user wants to use ".wgetrc" to set up a proxy server, then "/root/.wgetrc" takes effect. Below are the contents of a ".wgetrc" file, which you can use as a reference for writing your own:
http_proxy = 111.111.111.111:8080
ftp_proxy = 111.111.111.111:8080
These two lines say that the proxy server's IP address is 111.111.111.111 and its port is 8080. The first line specifies the proxy server used for the HTTP protocol, the second the proxy server used for the FTP protocol.
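
A minimal sketch of creating this file from the shell (the proxy address is a placeholder; substitute your own):

cat >> ~/.wgetrc << 'EOF'
http_proxy = 111.111.111.111:8080
ftp_proxy = 111.111.111.111:8080
EOF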

How do I use the school's SOCKS proxy?
Add the following to /usr/local/etc/wgetrc or ~/.wgetrc:

http_proxy = 202.119.24.35:8080
ftp_proxy = 202.119.24.35:8080
proxy-user = user
proxy-passwd = password
use_proxy = on
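
With these settings in place, ordinary invocations are routed through the proxy, for example (placeholder URL):

wget http://place.your.url/here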
Parameters:

$ wget --help

GNU Wget 1.9.1, a non-interactive network file download tool.
Usage: wget [options]... [URL]...

Arguments that are mandatory for long options are mandatory for short options too.

Startup:

-V, --version          display the version of Wget and exit.
-h, --help             print this help.
-b, --background       go to the background after startup.
-e, --execute=COMMAND  execute a ".wgetrc"-style command.

Logging and input files:

-o, --output-file=FILE    write log messages to the specified file.
-a, --append-output=FILE  append log messages to the end of the specified file.
-d, --debug               print debug output.
-q, --quiet               quiet mode (no output).
-v, --verbose             verbose output mode (the default).
-nv, --non-verbose        turn off verbose mode, without entering quiet mode.
-i, --input-file=FILE     download the URLs found in the specified file.
-F, --force-html          treat the input file as HTML.
-B, --base=URL            prepend the specified URL to relative links when using the -F -i FILE options.

Download:

-t, --tries=NUMBER          set the number of retries (0 means unlimited).
--retry-connrefused         retry even if the connection is refused.
-O, --output-document=FILE  write the data to this file.
-nc, --no-clobber           do not overwrite existing files and do not write new files by appending ".#" (# is a number) to the file name.
-c, --continue              resume downloading a partially-downloaded file.
--progress=TYPE             choose how the download progress is displayed.
-N, --timestamping          do not re-retrieve a file unless the remote copy is newer.
-S, --server-response       display the server response messages.
--spider                    do not download anything.
-T, --timeout=SECONDS       set the timeout (in seconds) for reading data.
-w, --wait=SECONDS          number of seconds to wait between files.
--waitretry=SECONDS         wait between retries (from 1 second up to the specified number of seconds).
--random-wait               wait a random interval between files (ranging from 0 to 2*wait seconds).
-Y, --proxy=on/off          turn the proxy server on or off.
-Q, --quota=SIZE            set the quota for the amount of data to receive.
--bind-address=ADDRESS      connect using the specified address (hostname or IP) of the local machine.
--limit-rate=RATE           limit the download rate.
--dns-cache=off             disable caching of DNS lookups.
--restrict-file-names=OS    restrict the characters in file names to those allowed by the specified OS (operating system).

Directories:

-nd, --no-directories          do not create directories.
-x, --force-directories        force the creation of directories.
-nH, --no-host-directories     do not create a directory named after the remote host.
-P, --directory-prefix=PREFIX  save files under the directory with the specified name.
--cut-dirs=NUMBER              ignore the specified number of directory levels in the remote directory.

HTTP options:

--http-user=USER        set the HTTP user name.
--http-passwd=PASS      set the HTTP user password.
-C, --cache=on/off      (dis)allow server-cached data (allowed by default).
-E, --html-extension    save all files with MIME type text/html with an ".html" extension.
--ignore-length         ignore the "Content-Length" header field.
--header=STRING         add the specified string to the request headers.
--proxy-user=USER       set the proxy server user name.
--proxy-passwd=PASS     set the proxy server user password.
--referer=URL           include a "Referer: URL" header in the HTTP request.
-s, --save-headers      save the HTTP headers to the file.
-U, --user-agent=AGENT  identify as AGENT rather than Wget/VERSION.
--no-http-keep-alive    disable HTTP keep-alive (persistent connections).
--cookies=off           disable cookies.
--load-cookies=FILE     load cookies from the specified file before the session begins.
--save-cookies=FILE     save cookies to the specified file after the session ends.
--post-data=STRING      use the POST method to send the specified string.
--post-file=FILE        use the POST method to send the contents of the specified file.

HTTPS (SSL) options:

--sslcertfile=FILE    optional client certificate.
--sslcertkey=KEYFILE  optional key file for this certificate.
--egd-file=FILE       file name of the EGD socket.
--sslcadir=DIR        directory where the CA hash list is stored.
--sslcafile=FILE      file containing a bundle of CAs.
--sslcerttype=0/1     client certificate type: 0=PEM (default), 1=ASN1 (DER).
--sslcheckcert=0/1    check the server's certificate against the provided CA.
--sslprotocol=0-3     choose the SSL protocol: 0=automatic selection, 1=SSLv2, 2=SSLv3, 3=TLSv1.

FTP options:

-nr, --dont-remove-listing  do not delete the ".listing" files.
-g, --glob=on/off           turn file-name globbing (wildcard expansion) on or off.
--passive-ftp               use the passive transfer mode.
--retr-symlinks             in recursive mode, download the files pointed to by symbolic links (directories excepted).

Recursive download:

-r, --recursive         recursive download.
-l, --level=NUMBER      maximum recursion depth (inf or 0 means infinite).
--delete-after          delete the downloaded files locally after retrieval.
-k, --convert-links     convert absolute links to relative links.
-K, --backup-converted  before converting file X, back it up as X.orig.
-m, --mirror            equivalent to the -r -N -l inf -nr options.
-p, --page-requisites   download all the files needed to display the complete web page, such as images.
--strict-comments       turn on strict (SGML) handling of HTML comments.

Accept/reject options for recursive download:

-A, --accept=LIST               comma-separated list of accepted file patterns.
-R, --reject=LIST               comma-separated list of rejected file patterns.
-D, --domains=LIST              comma-separated list of accepted domains.
--exclude-domains=LIST          comma-separated list of rejected domains.
--follow-ftp                    follow FTP links in HTML files.
--follow-tags=LIST              comma-separated list of HTML tags to follow.
-G, --ignore-tags=LIST          comma-separated list of HTML tags to ignore.
-H, --span-hosts                go to other hosts when recursing.
-L, --relative                  follow relative links only.
-I, --include-directories=LIST  list of directories to download.
-X, --exclude-directories=LIST  list of directories to exclude.
-np, --no-parent                do not ascend to the parent directory.



Examples: wget, a convenient network download tool
Source: http://www-900.ibm.com/cn/support/viewdoc/detail?DocId=2311073I23002

Network users sometimes need to download a batch of files, and sometimes even an entire site, or to create a mirror of a site. Windows users are familiar with site-download tools such as Teleport and WebZip; AIX can do the same with the wget tool. wget is a command-line tool for downloading network files or entire websites. It offers powerful features such as automatic retry, resuming interrupted downloads, and proxy server support, and it can completely replace an FTP client. wget is open-source software originally developed on Linux by Hrvoje Niksic and later ported to many platforms, including Windows.

Although wget is powerful, it is relatively simple to use. The basic syntax is: wget [parameter list] URL. The following specific examples illustrate how to use wget.

1. Downloading an entire HTTP or FTP site.

wget http://place.your.url/here

This command downloads the home page of http://place.your.url/here. Using -x forces the creation of a directory structure identical to the server's; if you use the -nd parameter instead, all content downloaded from the server is placed in the local current directory.

wget -r http://place.your.url/here

This command downloads all the directories and files on the server recursively, essentially downloading the entire site. It must be used carefully, because during the download every address the downloaded pages point to is downloaded as well; so if the site references other sites, the referenced sites are downloaded too. For this reason, this parameter alone is not commonly used. You can use the -l number parameter to limit the download depth; for example, to download only two levels, use -l 2, as shown below.
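
For instance, a minimal sketch (same placeholder URL) that recurses at most two levels deep:

wget -r -l 2 http://place.your.url/here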

If you want to make a mirror of a site, you can use the -m parameter, for example:

wget -m http://place.your.url/here

wget then automatically determines the parameters appropriate for mirroring the site. It will go to the server, read robots.txt, and follow its rules.

2. Resuming interrupted downloads.

When a file is particularly large or the network particularly slow, the connection is often cut off before the file finishes downloading; in that case you need to resume the download. wget's resume feature is automatic and requires only the -c parameter, for example:

wget -c http://the.url.of/incomplete/file

Resuming a download requires the server to support it. The -t parameter sets the number of retries: for example, to retry 100 times, write -t 100; if set to -t 0, wget retries indefinitely until the connection succeeds. The -T parameter sets the timeout: for example, -T 120 means that a connection that gets no response within 120 seconds is counted as timed out.
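
Combining these, a minimal sketch (placeholder URL) that resumes the file, retries up to 100 times, and times out after 120 seconds per read:

wget -c -t 100 -T 120 http://the.url.of/incomplete/file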

3. Bulk downloads.

If you have multiple files to download, you can create a file with one URL per line, for example a file named download.txt, and then use the command:

wget -i download.txt

This downloads every URL listed in download.txt. (If a line points to a file, the file is downloaded; if it points to a website, the home page is downloaded.)
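
A minimal sketch of preparing such a list (the URLs are placeholders):

cat > download.txt << 'EOF'
http://place.your.url/here/file1.tar.gz
http://place.your.url/here/file2.tar.gz
EOF
wget -i download.txt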

4. Selective downloads.

You can tell wget to download only one class of files, or to skip a class of files. For example:

wget -m --reject=gif http://target.web.site/subdirectory

downloads http://target.web.site/subdirectory but ignores GIF files. --accept=LIST gives the accepted file types; --reject=LIST the rejected file types.
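
Conversely, a minimal sketch that keeps only PDF files while mirroring (same placeholder site):

wget -m --accept=pdf http://target.web.site/subdirectory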

5. Passwords and authentication.

wget can only handle websites that restrict access by user name and password; two parameters are available, combined in the sketch after this list:

--http-user=USER    set the HTTP user
--http-passwd=PASS  set the HTTP password
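
A minimal sketch putting both on the command line (user name, password, and URL are placeholders):

wget --http-user=user --http-passwd=pass http://place.your.url/here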

For websites that require certificate-based authentication, you can only use other download tools, such as curl.

6. Downloading through a proxy server.

If a user's network access goes through a proxy server, you can have wget download files through the proxy. Create a .wgetrc file in the current user's home directory and set the proxy server in it:

http_proxy = 111.111.111.111:8080
ftp_proxy = 111.111.111.111:8080

These specify the HTTP proxy server and the FTP proxy server, respectively. If the proxy server requires a username and password, use these two parameters:

--proxy-user=USER   set the proxy user
--proxy-passwd=PASS set the proxy password

Use the -Y on/off parameter (--proxy=on/off) to turn the proxy on or off, as in the sketch below.
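
Putting it together, a minimal sketch (proxy credentials and URL are placeholders; the proxy address itself is assumed to be configured in .wgetrc as above):

wget -Y on --proxy-user=user --proxy-passwd=pass http://place.your.url/here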

wget has many other useful features waiting for users to explore.
