Linux Shell Script Programming: wget Command Usage Details

Source: Internet
Author: User
Tags: ftp, site

Getting started with Linux shell scripts -- wget command usage

wget is open-source software originally developed for Linux by Hrvoje Niksic and has since been ported to many platforms, including Windows. It has the following features:
(1) Support for resuming interrupted downloads. This was once the biggest selling point of NetAnts and FlashGet; wget offers the same capability, so users with unreliable network connections can download with confidence;
(2) Support for both FTP and HTTP downloads. Although most files can be fetched over HTTP, FTP is still required in some cases;
(3) Support for proxy servers. Systems with high security requirements are generally not exposed directly to the Internet, so proxy support is a must-have feature for a download tool;
(4) Easy to configure. Users accustomed to graphical interfaces may not be comfortable with the command line, but for configuration the command line actually has advantages: it saves many mouse clicks, and there is no worry about clicking the wrong thing;
(5) Small and completely free. The small size hardly matters now that hard disks are so large, but being completely free is worth considering; there is plenty of so-called free software on the network, but the advertisements bundled with it are not something we enjoy;

Although wget is powerful, it is relatively simple to use. The basic syntax is: wget [parameter list] URL. The following examples describe how to use wget.
1. Download the entire HTTP or FTP site.
wget http://place.your.url/here
This command downloads the home page at http://place.your.url/here. The -x option forces creation of the same directory hierarchy locally as on the server; the -nd option instead places all downloaded files into a single local directory.
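For example, as a minimal sketch using the placeholder URL from this article:

# keep the server's directory structure locally
wget -x http://place.your.url/here

# flatten everything into the current directory instead
wget -nd http://place.your.url/here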

wget -r http://place.your.url/here
This command downloads all directories and files on the server recursively; in essence, it downloads the entire website. Use it with caution, because during the download every address the fetched pages point to is downloaded in the same way. If the site references other websites, those referenced sites will be downloaded as well! For this reason, this option is rarely used on its own. You can use the -l NUMBER option to limit the recursion depth; for example, to download only two levels deep, use -l 2.
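A sketch of a depth-limited recursive download (the -np option, listed in the summary table at the end of this article, prevents wget from ascending to parent directories):

# recurse at most two levels deep, staying below the starting point
wget -r -l 2 -np http://place.your.url/here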

If you want to make a mirror of a site, use the -m option, for example: wget -m http://place.your.url/here
In this case, wget automatically chooses the appropriate options for mirroring a site. It will also fetch robots.txt from the server and obey the rules it contains.
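A sketch of a typical mirroring run; -k (convert links for local browsing) is described in the summary table at the end of this article:

# mirror the site and rewrite links so the copy can be browsed offline
wget -m -k http://place.your.url/here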

2. Resumable download.
When a file is very large or the network is very slow, the connection is often cut off before the download finishes. In this case you need to resume the transfer. Resuming with wget is automatic; you only need the -c option, for example:
wget -c http://the.url.of/incomplete/file
Resuming also requires the server to support it. The -t option sets the number of retries; for example, to retry 100 times, write -t 100, and -t 0 means retry indefinitely until the connection succeeds. The -T option sets the timeout; for example, -T 120 means a connection attempt is considered failed after 120 seconds.
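Putting these options together, a sketch for a large download over an unreliable link, using the placeholder URL above:

# resume if interrupted, retry up to 100 times, treat a connection as failed after 120 seconds
wget -c -t 100 -T 120 http://the.url.of/incomplete/file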

3. Batch download.
To download multiple files, you can create a file with one URL per line, for example a file named download.txt, and then run: wget -i download.txt
This downloads every URL listed in download.txt. (If a line points to a file, the file is downloaded; if it points to a website, the home page is downloaded.)
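As a sketch, with hypothetical URLs standing in for your own list:

# one URL per line; these addresses are only examples
cat > download.txt <<'EOF'
http://place.your.url/here/file1.tar.gz
ftp://ftp.example.org/pub/file2.zip
EOF

# fetch everything in the list
wget -i download.txt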

4. Selective download.
You can specify that wget only downloads one type of files, or does not download any files. For example:
wget -m --reject=gif http://target.web.site/subdirectory
This downloads http://target.web.site/subdirectory but skips GIF files. --accept=LIST specifies the file types to accept, and --reject=LIST specifies the file types to reject.
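For instance, a sketch that keeps only PDF and PostScript files from the same placeholder site:

# recursive download restricted to the listed extensions
wget -r --accept=pdf,ps http://target.web.site/subdirectory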

5. Password and authentication.
wget can only handle websites that restrict access with a user name and password. Two options are available:
--http-user=USER: set the HTTP user name
--http-passwd=PASS: set the HTTP password
For websites that require certificate-based authentication, you have to use other download tools, such as curl.
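A sketch with made-up credentials:

# user name and password here are placeholders
wget --http-user=myname --http-passwd=mypassword http://place.your.url/here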

6. Use the proxy server for download.
If your network access goes through a proxy server, you can let wget download files through the proxy. Create a .wgetrc file in the current user's home directory and set the proxy server in it:
http_proxy = 111.111.111.111:8080
ftp_proxy = 111.111.111.111:8080
These set the HTTP proxy server and the FTP proxy server respectively. If the proxy server requires a password, use these two options:
--proxy-user=USER: set the proxy user name.
--proxy-passwd=PASS: set the proxy password.
Use the --proxy=on/off option to turn the proxy on or off.
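As a sketch, using the proxy address from the example above and made-up credentials:

# ~/.wgetrc
http_proxy = 111.111.111.111:8080
ftp_proxy = 111.111.111.111:8080

# on the command line, enable the proxy and pass the (placeholder) credentials
wget --proxy=on --proxy-user=myname --proxy-passwd=mypassword http://place.your.url/here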
wget has many other useful features waiting to be discovered by users.

Option summary table:

● Startup:
-V, --version: display the version of wget and exit
-h, --help: print syntax help
-b, --background: go to background after startup
-e, --execute=COMMAND: execute a `.wgetrc'-style command; for the wgetrc format see /etc/wgetrc or ~/.wgetrc (a site's robots.txt can be bypassed with -e robots=off)

● Logging and input file:
-o, --output-file=FILE: write log messages to FILE
-a, --append-output=FILE: append log messages to FILE
-d, --debug: print debug output
-q, --quiet: quiet mode (no output)
-v, --verbose: verbose mode (this is the default)
-nv, --non-verbose: turn off verbosity, without being quiet
-i, --input-file=FILE: download the URLs found in FILE
-F, --force-html: treat the input file as HTML
-B, --base=URL: use URL as the prefix for relative links in the file given by -F -i
--sslcertfile=FILE: optional client certificate
--sslcertkey=KEYFILE: optional keyfile for the client certificate
--egd-file=FILE: file name of the EGD socket

● Download:
--bind-address=ADDRESS: bind to the local address (host name or IP; useful when the host has several)
-t, --tries=NUMBER: set the maximum number of connection attempts (0 means no limit)
-O, --output-document=FILE: write documents to FILE
-nc, --no-clobber: don't overwrite existing files or use .# suffixes
-c, --continue: resume getting a partially downloaded file
--progress=TYPE: select the progress gauge type
-N, --timestamping: don't re-retrieve files unless they are newer than the local copy
-S, --server-response: print the server response
--spider: don't download anything
-T, --timeout=SECONDS: set the read timeout to SECONDS
-w, --wait=SECONDS: wait SECONDS between retrievals
--waitretry=SECONDS: wait 1..SECONDS between retries of a retrieval
--random-wait: wait from 0 to 2*WAIT seconds between retrievals
-Y, --proxy=on/off: turn the proxy on or off
-Q, --quota=NUMBER: set the download quota
--limit-rate=RATE: limit the download rate

● Directories:
-nd, --no-directories: don't create directories
-x, --force-directories: force creation of directories
-nH, --no-host-directories: don't create host directories
-P, --directory-prefix=PREFIX: save files to PREFIX/...
--cut-dirs=NUMBER: ignore NUMBER remote directory components
Example: wget -q -N -x -nH --timeout=10 --tries=3 -i "newlists"

● HTTP options:
--http-user=USER: set the HTTP user name to USER
--http-passwd=PASS: set the HTTP password to PASS
-C, --cache=on/off: allow/disallow server-side caching of data (normally allowed)
-E, --html-extension: save all text/html documents with the .html extension
--ignore-length: ignore the 'Content-Length' header field
--header=STRING: insert STRING among the headers
--proxy-user=USER: set the proxy user name to USER
--proxy-passwd=PASS: set the proxy password to PASS
--referer=URL: include a 'Referer: URL' header in the HTTP request
-s, --save-headers: save the HTTP headers to the file
-U, --user-agent=AGENT: identify as AGENT instead of Wget/VERSION
--no-http-keep-alive: disable HTTP keep-alive (persistent connections)
--cookies=off: don't use cookies
--load-cookies=FILE: load cookies from FILE before the session starts
--save-cookies=FILE: save cookies to FILE after the session ends

● FTP options:
-nr, --dont-remove-listing: don't remove '.listing' files
-g, --glob=on/off: turn file name globbing on or off
--passive-ftp: use the "passive" transfer mode (default)
--active-ftp: use the "active" transfer mode
--retr-symlinks: in recursive mode, retrieve linked-to files (not directories)

● Recursive download:
-r, --recursive: recursive download -- use with caution!
-l, --level=NUMBER: maximum recursion depth (inf or 0 means infinite)
--delete-after: delete files locally after downloading them
-k, --convert-links: convert non-relative links to relative ones
-K, --backup-converted: before converting file X, back it up as X.orig
-m, --mirror: shortcut for -r -N -l inf -nr
-p, --page-requisites: download all images etc. needed to display the HTML page

● Accept/reject during recursive download:
-A, --accept=LIST: comma-separated list of accepted extensions
-R, --reject=LIST: comma-separated list of rejected extensions
-D, --domains=LIST: comma-separated list of accepted domains
--exclude-domains=LIST: comma-separated list of rejected domains
--follow-ftp: follow FTP links from HTML documents
--follow-tags=LIST: comma-separated list of followed HTML tags
-G, --ignore-tags=LIST: comma-separated list of ignored HTML tags
-H, --span-hosts: go to foreign hosts when recursing
-L, --relative: follow relative links only
-I, --include-directories=LIST: list of allowed directories
-X, --exclude-directories=LIST: list of excluded directories
-np, --no-parent: don't ascend to the parent directory
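As a closing sketch, a small mirroring script that combines several of the options above; the URL, target directory, and rate limit are illustrative assumptions, not values from this article:

#!/bin/sh
# nightly-mirror.sh -- adjust the URL, prefix, and limits for your own site
wget --mirror \
     --convert-links \
     --page-requisites \
     --no-parent \
     --wait=1 \
     --limit-rate=200k \
     --directory-prefix=/var/mirror \
     http://place.your.url/here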

 
