wget Command Details (with specific examples)

Source: Internet
Author: User
Tags: mirror website

Note: The red text (in the original formatting) marks my personal comments on the most commonly used parameters.

    1. Introduction

wget is a command-line tool for downloading files under Linux. It is essential for Linux users, especially network administrators, who often need to download software or restore backups from a remote server to a local one. With shared virtual hosting, you can often only download files from the remote server to your own computer and then upload them to the server with an FTP client, which wastes time and effort. On a Linux VPS, wget downloads directly to the server, skipping the upload step entirely. wget is small but full-featured: it supports resuming interrupted downloads, FTP and HTTP downloads, and proxy servers, and it is easy to set up. Below we explain how to use wget through examples.


2. Common parameters explained


(1) Use wget to download a single file

The following example downloads a file from the network and saves it in the current directory:

wget http://cn.wordpress.org/wordpress-3.1-zh_CN.zip


A progress bar is displayed during the download, showing the percentage complete, the bytes downloaded so far, the current download speed, and the estimated time remaining.

(2) Use wget -O to download and save under a different file name

By default, wget names the saved file after the last part of the URL (everything after the final "/"):

The following example downloads a file and saves it under the unwieldy name download.php?id=1080:

wget http://www.centos.bz/download.php?id=1080

To solve this problem, we can use the -O parameter to specify a file name:

wget -O wordpress.zip http://www.centos.bz/download.php?id=1080

(3) Use wget --limit-rate to limit the download speed

When you run wget, it uses all available bandwidth by default. When you are downloading a large file and still need to download other files, it becomes necessary to limit the speed.

wget --limit-rate=300k http://cn.wordpress.org/wordpress-3.1-zh_CN.zip
(4) Use wget -c to resume an interrupted download

To resume the download of an interrupted file, use wget -c:

wget -c http://cn.wordpress.org/wordpress-3.1-zh_CN.zip

This is very helpful when the download of a large file is interrupted by network problems or other causes: we can continue the download instead of starting it over. Use the -c parameter whenever you need to resume an interrupted download.

(5) Use wget -b to download in the background

For very large files, we can use the -b parameter to download in the background.

wget -b http://cn.wordpress.org/wordpress-3.1-zh_CN.zip
Continuing in background, pid 1840.
Output will be written to 'wget-log'.

You can check the download progress with the following command:

tail -f wget-log

(6) Disguise the User-Agent for the download

Some websites may reject your download request because the User-Agent does not look like a browser. You can disguise it with the --user-agent parameter.

wget --user-agent="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.204 Safari/534.16" URL

(7) Use wget --spider to test a download link

When you plan a scheduled download, you should test beforehand whether the download link is valid. Add the --spider parameter to check.

wget --spider URL

If the download link is correct, it will show:

wget --spider URL
Spider mode enabled. Check if remote file exists.
HTTP request sent, awaiting response ... 200 OK
Length: unspecified [text/html]
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.

This ensures that the download can take place at the scheduled time. If you give a wrong link, the following error is displayed instead:

wget --spider URL
Spider mode enabled. Check if remote file exists.
HTTP request sent, awaiting response ... 404 Not Found
Remote file does not exist -- broken link!!!

You can use the --spider parameter in the following situations:

Checking before a scheduled download
Periodically checking whether a site is available
Checking a site's pages for dead links
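
For the first two situations, the check can be wrapped in a small shell function that acts on wget's exit status (a sketch; the function name and the placeholder URL are illustrative, not part of wget itself):

```shell
# check_link: succeed (exit 0) if the remote file exists,
# fail otherwise -- wget --spider exits non-zero on a broken link.
check_link() {
    wget --spider --quiet "$1"
}

# Sketch of a pre-flight check before a scheduled download
# (placeholder URL):
# check_link "http://example.com/backup.tar.gz" && echo "link OK"
```

In a cron job, the scheduled download command can then be guarded by this check so that a dead link aborts the job instead of producing a broken file.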

(8) Use wget --tries to increase the number of retries

A download can also fail when the network has problems or the file is very large. By default, wget retries the connection 20 times when downloading a file. If necessary, you can use --tries to increase the number of retries.

wget --tries=40 URL

(9) Use wget -i to download multiple files

First, save the download links to a file:

cat > filelist.txt
url1
url2
url3
url4

Then download using this file with the -i parameter:

wget -i filelist.txt
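
The interactive cat above can be replaced with a heredoc so the whole step works inside a script (the URLs below are placeholders):

```shell
# Build the URL list non-interactively with a heredoc.
cat > filelist.txt <<'EOF'
http://example.com/file1.zip
http://example.com/file2.zip
http://example.com/file3.zip
EOF

# Show the list that wget -i will read, one URL per line.
cat filelist.txt
```

With the list in place, wget -i filelist.txt fetches each URL in turn; combining it with -c resumes any entries that were interrupted.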

(10) Use wget --mirror to mirror a website

The following example downloads an entire website to the local machine:

wget --mirror -p --convert-links -P ./LOCAL URL

--mirror: turn on mirroring (recursive download with timestamping)
-p: download all the files needed to display the HTML pages properly
--convert-links: after the download, convert the links to point to the local files
-P ./LOCAL: save all files and directories under the specified local directory

(11) Use wget --reject to filter out a specified format

If you want to download a website but not its images, you can use the following command:

wget --reject=gif URL

(12) Use wget -o to write the download information to a log file

If you do not want the download information to appear in the terminal but in a log file instead, you can use the following command:

wget -o download.log URL

(13) Use wget -Q to limit the total download size

When you want the download to stop once more than 5 MB has been fetched, you can use the following command:

wget -Q5m -i filelist.txt

Note: this parameter has no effect on a single-file download; it is only valid for recursive downloads.

(14) Use wget -r -A to download files of a specified format

You can use this feature in the following situations:

Downloading all the images of a website
Downloading all the videos of a website
Downloading all the PDF files of a website

wget -r -A.pdf URL

(15) Use wget to download over FTP

You can use wget to download from FTP links.

To download using anonymous FTP:

wget FTP-URL

To download over FTP with username and password authentication:

wget --ftp-user=USERNAME --ftp-password=PASSWORD URL

(16) Passwords and authentication

wget can only handle websites that restrict access with a username/password, using two parameters:

--http-user=USER: set the HTTP user
--http-passwd=PASS: set the HTTP password

For example:

wget --http-user=USER --http-passwd=PASS URL

For sites that require certificate-based authentication, you can only use other download tools, such as curl.

(17) Download through a proxy server

If your network goes through a proxy server, you can let wget download files through it. To do so, create a .wgetrc file in the current user's home directory and set up the proxy servers in that file:

http_proxy = 111.111.111.111:8080
ftp_proxy = 111.111.111.111:8080

These set the proxy server for HTTP and for FTP, respectively. If the proxy server requires a password, use these two parameters:

--proxy-user=USER: set the proxy user
--proxy-passwd=PASS: set the proxy password

Use the --proxy=on/off parameter to turn the proxy on or off.
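
Putting the pieces together, a minimal .wgetrc might look like this (a sketch; the proxy address and port are placeholders):

```
# ~/.wgetrc -- route wget through a proxy (placeholder address/port)
use_proxy = on
http_proxy = http://111.111.111.111:8080/
ftp_proxy = http://111.111.111.111:8080/
```

With use_proxy = on in the file, every wget invocation by that user goes through the proxy unless --proxy=off is passed on the command line.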


wget has many more useful features waiting for users to dig into.


Appendix:

Command format:
wget [parameter list] [URL]

-V, --version: display the version number and exit
-h, --help: display help information
-e, --execute=COMMAND: execute a ".wgetrc"-style command

-o, --output-file=FILE: write output messages to FILE
-a, --append-output=FILE: append output messages to FILE
-d, --debug: display debug output
-q, --quiet: quiet mode, display no output
-i, --input-file=FILE: read the URLs from FILE

-t, --tries=NUMBER: number of download retries (0 = unlimited)
-O, --output-document=FILE: save the downloaded file under another name
-nc, --no-clobber: do not overwrite files that already exist
-N, --timestamping: only download files newer than the local copies
-T, --timeout=SECONDS: set the timeout period
-Y, --proxy=on/off: turn the proxy on or off

-nd, --no-directories: do not create directories
-x, --force-directories: force directories to be created

--http-user=USER: set the HTTP user
--http-passwd=PASS: set the HTTP password
--proxy-user=USER: set the proxy user
--proxy-passwd=PASS: set the proxy password

-r, --recursive: download the entire website or directory (use with caution)
-l, --level=NUMBER: maximum recursion depth

-A, --accept=LIST: list of accepted file types
-R, --reject=LIST: list of rejected file types
-D, --domains=LIST: list of accepted domains
--exclude-domains=LIST: list of rejected domains
-L, --relative: follow relative links only
--follow-ftp: follow FTP links from HTML documents
-H, --span-hosts: allow downloading from other hosts
-I, --include-directories=LIST: list of allowed directories
-X, --exclude-directories=LIST: list of rejected directories




                                                                                                                                                                                                      

