wget: a command-line download tool for Linux


wget is a command-line download utility that every Linux user will find useful. This article gives a full introduction to the Linux wget command and, we hope, will help you put it to good use.

I. Introduction to Linux wget

Wget is a command-line download tool for Linux. It is free software released under the GPL license. Linux wget supports the HTTP and FTP protocols, works through proxy servers, and can resume interrupted transfers. It can automatically recurse through remote directories, locate the files that match your criteria, and download them to the local disk; if required, it rewrites the hyperlinks in downloaded pages so that the mirror can be browsed locally. Because it has no interactive interface, Linux wget can run in the background and intercepts and ignores the HANGUP signal, so it keeps running after the user logs out. Linux wget is typically used to batch-download files from Internet sites or to create mirrors of remote websites.
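
For a first taste of these capabilities, here is a minimal sketch (the URL is a placeholder) of a plain download and of a download that keeps running after you log out:

  # simple one-off download
  wget http://place.your.url/here
  # start in the background; wget detaches itself and, because it ignores
  # the HANGUP signal, keeps running after you log out
  wget -b http://place.your.url/here
  # with -b, progress is written to wget-log in the current directory
  tail -f wget-log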

II. Examples

Download the home page of 192.168.1.168 and show the download details: wget -d http://192.168.1.168
Download the home page of 192.168.1.168 without showing any output: wget -q http://192.168.1.168
Download every file whose URL is listed in filelist.txt: wget -i filelist.txt

Download a file into the /tmp directory: wget -P /tmp ftp://user:passwd@url/file. Linux wget is a command-line download tool that Linux users reach for almost every day. The following sections describe some useful wget tricks that will help you use it more efficiently and flexibly.

* $ wget -r -np -nd http://example.com/packages/ downloads all files in the packages directory of http://example.com. -np stops wget from traversing the parent directory; -nd stops it from recreating the directory structure locally.

* $ wget -r -np -nd --accept=iso http://example.com/centos-5/i386/ is the same as the previous command, except that the added --accept=iso option tells wget to download only the files with the .iso extension in the i386 directory. You can specify several extensions, separated by commas.

* $ wget -i filename.txt is often used for batch downloads. Put the URLs of all the files you want into filename.txt, and wget will download every one of them for you.

* $ wget -c http://example.com/really-big-file.iso — here the -c option resumes an interrupted download.

* $ wget -m -k (-H) http://www.example.com/ mirrors a website; wget converts the links for local browsing. If the images on the site are hosted on another server, add the -H option.

III. Parameters

Code: $ wget --help
GNU Wget 1.9.1, a non-interactive network file download tool.
Usage: wget [OPTION]... [URL]...
Arguments that are mandatory for long options are also mandatory for the corresponding short options.

Startup:

 
 
  1. -V, --version             display the version of Wget and exit.
  2. -h, --help                print this help.
  3. -b, --background          go to the background after startup.
  4. -e, --execute=COMMAND     execute a `.wgetrc'-style command.

Logging and input file:

 
 
  1. -o, --output-file=FILE         write log messages to the specified file.
  2. -a, --append-output=FILE       append log messages to the end of the specified file.
  3. -d, --debug                    print debugging output.
  4. -q, --quiet                    quiet mode (no output).
  5. -v, --verbose                  verbose output mode (the default).
  6. -nv, --non-verbose             turn off verbose output without entering quiet mode.
  7. -i, --input-file=FILE          download the URLs found in the specified file.
  8. -F, --force-html               treat the input file as HTML.
  9. -B, --base=URL                 prepend URL to relative links when using the -F -i FILE options.
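
A short sketch of how these logging and input options combine (the file names are placeholders):

  # keep output brief, log it to download.log, and read the URLs from urls.txt
  wget -nv -o download.log -i urls.txt
  # treat a saved HTML page as the input list and resolve its relative
  # links against a base URL
  wget -F -i saved-page.html -B http://example.com/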

Download:

 
 
  1. -t, --tries=NUMBER             set the number of retries (0 means unlimited).
  2. --retry-connrefused            retry even if the connection is refused.
  3. -O, --output-document=FILE     write the downloaded data to the specified file.
  4. -nc, --no-clobber              do not overwrite existing files and do not write new copies by appending a .# number to the file name.
  5. -c, --continue                 resume getting a partially downloaded file.
  6. --progress=TYPE                select the style of the download progress indicator.
  7. -N, --timestamping             do not re-retrieve a file unless the remote copy is newer.
  8. -S, --server-response          print the server response headers.
  9. --spider                       do not download anything.
  10. -T, --timeout=SECONDS         set the read timeout in seconds.
  11. -w, --wait=SECONDS            wait the given number of seconds between retrievals of different files.
  12. --waitretry=SECONDS           wait between retries of a retrieval (from 1 second up to the specified number of seconds).
  13. --random-wait                 wait a random time between retrievals (from 0 to 2 * wait seconds).
  14. -Y, --proxy=on/off            turn the proxy on or off.
  15. -Q, --quota=SIZE              set the maximum amount of data to retrieve.
  16. --bind-address=ADDRESS        bind to the given address (host name or IP) of the local machine.
  17. --limit-rate=RATE             limit the download speed.
  18. --dns-cache=off               disable caching of DNS lookups.
  19. --restrict-file-names=OS      restrict the characters in file names to those allowed by the specified operating system.
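
For example, a polite, resumable batch download might combine several of these options (the URL and values are illustrative):

  # resume if possible, retry 10 times, wait 2 seconds between files,
  # and keep the transfer rate below 200 KB/s
  wget -c -t 10 -w 2 --limit-rate=200k http://place.your.url/here
  # only check that the remote file exists, without downloading it
  wget --spider http://place.your.url/here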

Directories:

 
 
  1. -nd, --no-directories             do not create directories.
  2. -x, --force-directories           force creation of directories.
  3. -nH, --no-host-directories        do not create a directory named after the remote host.
  4. -P, --directory-prefix=PREFIX     save files under the directory with the specified name.
  5. --cut-dirs=NUMBER                 ignore the specified number of directory components of the remote path.
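
A sketch of how the directory options change where downloaded files land (the URL and paths are illustrative):

  # the default recursive download recreates example.com/pub/docs/... locally
  wget -r http://example.com/pub/docs/
  # flatten everything into the current directory
  wget -r -nd http://example.com/pub/docs/
  # drop the host directory and the first two path components (pub/docs),
  # and save everything under /tmp instead of the current directory
  wget -r -nH --cut-dirs=2 -P /tmp http://example.com/pub/docs/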

HTTP options:

 
 
  1. --http-user=USER             set the HTTP user name.
  2. --http-passwd=PASS           set the HTTP password.
  3. -C, --cache=on/off           (dis)allow use of server-side cached data (allowed by default).
  4. -E, --html-extension         save all text/html documents with an .html extension.
  5. --ignore-length              ignore the Content-Length header field.
  6. --header=STRING              insert the specified string among the request headers.
  7. --proxy-user=USER            set the proxy user name.
  8. --proxy-passwd=PASS          set the proxy password.
  9. --referer=URL                include a "Referer: URL" header in the HTTP request.
  10. -s, --save-headers          save the HTTP headers to the file.
  11. -U, --user-agent=AGENT      identify as AGENT instead of Wget/VERSION.
  12. --no-http-keep-alive        disable HTTP keep-alive (persistent connections).
  13. --cookies=off               do not use cookies.
  14. --load-cookies=FILE         load cookies from the specified file before the session starts.
  15. --save-cookies=FILE         save cookies to the specified file after the session ends.
  16. --post-data=STRING          use the POST method to send the specified string.
  17. --post-file=FILE            use the POST method to send the contents of the specified file.
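
A few sketches of the HTTP options in use (the header values and URLs are illustrative):

  # add a custom header and a browser-like identity
  wget --header="Accept-Language: en" \
       --user-agent="Mozilla/4.0 (compatible; MSIE 5.0; Windows NT)" \
       http://place.your.url/here
  # submit a simple form with the POST method
  wget --post-data="user=foo&pass=bar" http://place.your.url/login
  # reuse cookies exported from a browser session
  wget --load-cookies=cookies.txt http://place.your.url/members-only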

HTTPS (SSL) Options:

 
 
  1. --sslcertfile=FILE       optional client certificate.
  2. --sslcertkey=KEYFILE     optional key file for this certificate.
  3. --egd-file=FILE          file name of the EGD socket.
  4. --sslcadir=DIR           directory where the CA hash list is kept.
  5. --sslcafile=FILE         file with a bundle of CAs.
  6. --sslcerttype=0/1        client certificate type: 0=PEM (default), 1=ASN1 (DER).
  7. --sslcheckcert=0/1       check the server certificate against the provided CAs.
  8. --sslprotocol=0-3        choose the SSL protocol: 0=automatic, 1=SSLv2, 2=SSLv3, 3=TLSv1.
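
These flags belong to the wget 1.9.x series documented above; later wget releases replaced them (for example with --no-check-certificate and --certificate). A sketch in the old spelling, with placeholder file names:

  # fetch an HTTPS page with a client certificate, skipping verification
  # of the server certificate (wget 1.9-style options)
  wget --sslcertfile=client.pem --sslcertkey=client.key \
       --sslcheckcert=0 https://secure.example.com/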

FTP options:

 
 
  1. -nr, --dont-remove-listing     do not remove the ".listing" files.
  2. -g, --glob=on/off              turn expansion of wildcards in file names on or off.
  3. --passive-ftp                  use the "passive" transfer mode.
  4. --retr-symlinks                in recursive mode, download the files pointed to by symbolic links (except links to directories).
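
For example (the server and file names are placeholders):

  # download all .iso files from an FTP directory; quote the wildcard so
  # the shell does not expand it and wget's own globbing handles it
  wget --glob=on "ftp://ftp.example.com/pub/*.iso"
  # force passive mode when a firewall blocks active FTP connections
  wget --passive-ftp ftp://ftp.example.com/pub/file.iso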

Recursive download:

 
 
  1. -r, --recursive               recursive download.
  2. -l, --level=NUMBER            maximum recursion depth (inf or 0 means unlimited).
  3. --delete-after                delete files locally after downloading them.
  4. -k, --convert-links           convert absolute links to relative links.
  5. -K, --backup-converted        before converting file X, back it up as X.orig.
  6. -m, --mirror                  shortcut equivalent to the -r -N -l inf -nr options.
  7. -p, --page-requisites         download all the files needed to display an HTML page completely, such as images and style sheets.
  8. --strict-comments             turn on strict (SGML) handling of HTML comments.
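
A typical mirroring sketch built from these options (the site is a placeholder):

  # mirror a site for offline reading: recurse, keep timestamps, fetch page
  # requisites, and rewrite links for local browsing (keeping .orig backups)
  wget -m -k -K -p http://www.example.com/
  # limit an ordinary recursive download to two levels of links
  wget -r -l 2 http://www.example.com/docs/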

Recursive accept/reject options:

 
 
  1. -A, --accept=LIST                  comma-separated list of accepted file name patterns.
  2. -R, --reject=LIST                  comma-separated list of rejected file name patterns.
  3. -D, --domains=LIST                 comma-separated list of accepted domains.
  4. --exclude-domains=LIST             comma-separated list of rejected domains.
  5. --follow-ftp                       follow FTP links from HTML documents.
  6. --follow-tags=LIST                 comma-separated list of HTML tags to follow.
  7. -G, --ignore-tags=LIST             comma-separated list of HTML tags to ignore.
  8. -H, --span-hosts                   go to other hosts when recursing.
  9. -L, --relative                     follow relative links only.
  10. -I, --include-directories=LIST    list of directories to download.
  11. -X, --exclude-directories=LIST    list of directories to exclude.
  12. -np, --no-parent                  do not ascend to the parent directory.
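
A sketch combining a few of these filters (the site and extensions are illustrative):

  # grab only .pdf and .ps files from one directory tree, without wandering
  # into the parent directory or onto other hosts
  wget -r -np -A pdf,ps http://example.com/papers/
  # the opposite: mirror a site but skip bulky archives
  wget -m -R zip,iso,gz http://example.com/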

IV. FAQ

A. How do I use wget? Most Linux distributions ship with wget as their standard download tool. For example, to mirror a subdirectory of a site while skipping GIF images: bash$ wget -m -L --reject=gif http://target.web.site/subdirectory

Linux wget also supports resuming interrupted transfers (the -c parameter); this, of course, requires support from the remote server: bash$ wget -c http://place.your.url/here

If you are worried that a dropped connection will spoil the download, you can limit the number of times wget retries: bash$ wget -t 5 http://place.your.url/here. This example gives up after five attempts. The parameter "-t inf" means never give up: wget keeps retrying without stopping.

B. What should I do about proxy servers? You can specify how to download through a proxy either with the http_proxy environment variable or in the .wgetrc configuration file. There is one catch: resuming a download through a proxy may fail several times. When a download is interrupted, the proxy server keeps a complete copy of the file in its cache, so when you run "wget -c" to fetch the remaining part, the proxy checks its cache, wrongly concludes that you have already downloaded the whole file, and returns an error. You can add a specific request header to make the proxy bypass its cache: bash$ wget -c --header="Pragma: no-cache" http://place.your.url/here
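
As mentioned, the proxy can be set either through environment variables or in the .wgetrc configuration file; a minimal sketch (the proxy host and port are placeholders):

  # per session: export the proxy variables before running wget
  export http_proxy=http://proxy.example.com:8080/
  export ftp_proxy=http://proxy.example.com:8080/
  wget http://place.your.url/here

  # or permanently, in ~/.wgetrc:
  #   use_proxy  = on
  #   http_proxy = http://proxy.example.com:8080/
  #   ftp_proxy  = http://proxy.example.com:8080/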

This "-header" parameter can be added using various numbers and methods. Through this, we can change some attributes of the web server or proxy server. Some sites do not provide file services for external connections. The content will be submitted only when other pages on the same site are used. In this case, you can add the "Referer:" parameter: bash $ wget-header = "Referer: http://coming.from.this/page" into wget-header =" User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt) "http://msie.only.url/here

C. How can I set the download time?
If you need to download large files over a connection shared with colleagues in your office, and you do not want them to suffer from a slowed-down network, you should avoid peak hours as much as possible. Of course, that does not mean you have to wait in the office until everyone has gone home, or remember to start the download from home after dinner. You can use at to schedule the work: bash$ at 23:00 (at replies "warning: commands will be executed using /bin/sh"), then at the at> prompt type wget http://place.your.url/here and press Ctrl-D, as laid out in the session below. This schedules the download for 23:00. Make sure the background daemon atd is running.
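
The same session laid out step by step (the time and URL are the article's own placeholders):

  bash$ at 23:00
  warning: commands will be executed using /bin/sh
  at> wget http://place.your.url/here
  at> <press Ctrl-D>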

D. How long does it take to download?
When you need to download a large amount of data and do not have enough bandwidth, you will often find that your scheduled download has still not finished when the next workday is about to begin.
Being a good colleague, you stop those tasks and get on with other work, and then have to run "wget -c" again and again to finish the download. Doing this by hand is far too tedious, so it is better to automate it with crontab. Create a plain text file named crontab.txt containing the following two lines: 0 23 * * 1-5 wget -c -N http://place.your.url/here and 0 6 * * 1-5 killall wget. A crontab file specifies tasks to be executed periodically: the first five columns declare when to execute the command, and the rest of each line tells crontab what to execute.

The two entries specify that wget starts downloading at 11 PM every day and that all wget downloads are stopped at 6 AM. The * in the third and fourth columns means the task applies to every day of every month. The fifth column specifies the days of the week on which to run the command; "1-5" means Monday to Friday. So the download starts at 11 PM on every workday, and at 6 AM any running wget task is stopped. To install the schedule, run: bash$ crontab crontab.txt.
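
The same schedule written out as crontab.txt, with the field meanings spelled out in comments (the URL is the article's placeholder):

  # minute  hour  day-of-month  month  day-of-week   command
  0         23    *             *      1-5           wget -c -N http://place.your.url/here
  0         6     *             *      1-5           killall wget
  # install the schedule with:  crontab crontab.txt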

The "-N" parameter in Linux wget will check the timestamp of the target file. If it matches, the download program will stop because it indicates that the entire file has been completely downloaded. You can use "crontab-r" to delete the schedule. I have used this method for many times. It is more practical to download many ISO image files by dialing the shared phone.

E. How to download dynamic web pages
Some web pages change several times a day. Technically speaking, the target is then no longer a single fixed file and has no definite length, so the "-c" parameter is meaningless. For example, the Linux Weekly News page (lwn.net), written in PHP and changed frequently: bash$ wget http://lwn.net/bigpage.php3

The network conditions in my office are often poor and cause a lot of trouble for my downloads, so I wrote a simple script that checks whether the dynamic page has been downloaded completely.

 
 
  #!/bin/bash
  # create the file if it is absent
  touch bigpage.php3
  # check whether we got the whole thing: keep trying until the page
  # contains its closing tag (the exact marker is an assumption; the
  # original pattern was lost from the article)
  while ! grep -qi '</html>' bigpage.php3
  do
      rm -f bigpage.php3
      # download LWN in one big page
      wget http://lwn.net/bigpage.php3
  done

This script keeps re-downloading the page until the closing tag (the "</html>" marker assumed in the script above) appears in it, which indicates that the file has been fetched completely.

F. What should I do about SSL and cookies?
If you want to fetch pages over SSL, the site address starts with "https://". For such sites you need another download tool called curl, which is easy to obtain. Some websites also force visitors to use cookies; in that case you must take the "Cookie:" value from the cookie the site has already given you and send it with the request, so that the download parameters are correct. For the cookie file formats used by lynx and Mozilla, do the following:
bash$ cookie=$(grep nytimes ~/.lynx_cookies | awk '{printf("%s=%s;", $6, $7)}') constructs the request cookie needed to download content from http://www.nytimes.com; of course, you must first have registered on that site with the same browser. w3m uses a different, more compact cookie file format: bash$ cookie=$(grep nytimes ~/.w3m/cookie | awk '{printf("%s=%s;", $2, $3)}')
Now you can download like this: bash$ wget --header="Cookie: $cookie" http://www.nytimes.com/reuters/technology/tech-tech-supercomput.html
Or with the curl tool: bash$ curl -v -b $cookie -o supercomp.html http://www.nytimes.com/reuters/technology/tech-tech-supercomput.htm

G. How do I create an address list?
So far we have downloaded single files or entire websites. Sometimes we need to download a large number of files that are linked from one web page, but there is no need to mirror the whole site; for example, we might want to fetch the first 20 of the 100 songs a page links to, in order. Note that the "--accept" and "--reject" parameters do not help here, because they only operate on file names. Instead, use "lynx -dump".
bash$ lynx -dump ftp://ftp.ssc.com/pub/lg/ | grep 'gz$' | tail -10 | awk '{print $2}' > urllist.txt
The list of target URLs ends up in urllist.txt. We can then write a simple bash script to download every file in that list automatically:
bash$ for x in $(cat urllist.txt)
> do
> wget $x
> done
In this way we successfully download the latest 10 issues from the Linux Gazette site (ftp://ftp.ssc.com/pub/lg).

H. How can I use more bandwidth?
If the file you are downloading is bandwidth-limited on the server side, your download will be slow because of that restriction. The following trick can shorten the download considerably; it requires curl and a remote file that is available from several mirror sites. For example, suppose you want to download the Mandrake 8.0 ISO from the following three addresses:
url1=http://ftp.eecs.umich.edu/pub/linux/mandrake/iso/Mandrake80-inst.iso
url2=http://ftp.rpmfind.net/linux/Mandrake/iso/Mandrake80-inst.iso
url3=http://ftp.wayne.edu/linux/mandrake/iso/Mandrake80-inst.iso
The file is 677281792 bytes long, so use curl's "--range" parameter to start three simultaneous downloads:
bash$ curl -r 0-199999999 -o mdk-iso.part1 $url1 &
bash$ curl -r 200000000-399999999 -o mdk-iso.part2 $url2 &
bash$ curl -r 400000000- -o mdk-iso.part3 $url3 &
This creates three background processes, each transferring a different part of the ISO image from a different server. The "-r" parameter specifies the byte range of the target file. When the three processes finish, join the parts with a simple cat command: cat mdk-iso.part? > mdk-80.iso (checking the md5 sum before burning the image is strongly recommended).
You can also run each curl process with the "--verbose" parameter in its own window to watch the transfer progress.

