wget -- a Web File Retrieval Tool in Linux

wget is a tool used in Linux to retrieve files from the World Wide Web. It is free software released under the GPL license; its author is Hrvoje Niksic. wget supports the HTTP and FTP protocols, as well as proxy servers and resumable downloads. It can automatically and recursively walk the directories of a remote host, locate files that match given criteria, and download them to the local hard disk; if necessary, wget rewrites the hyperlinks in downloaded pages so that the mirror can be browsed locally. Since it has no interactive interface, wget can run in the background: it intercepts and ignores the HANGUP signal, so it keeps running after the user logs out. wget is generally used to batch-download files from Internet sites or to create mirrors of remote web sites.
  
Syntax:
wget [options] [URL-list]
URL format description: URLs in the following formats can be used:
http://host[:port]/path
For example:
http://fly.cc.fer.hr/
ftp://ftp.xemacs.org/pub/xemacs/xemacs-19.14.tar.gz
ftp://username:password@host/dir/file
In the last form, the user name and password for the FTP host are supplied URL-encoded as part of the URL (of course, this information can also be provided via parameters).
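For example, a download with FTP credentials embedded in the URL might look like this (the host, user name, and password here are placeholders):
wget ftp://myuser:mypassword@ftp.example.com/pub/file.tar.gz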
  
Parameter description:
  
wget has many parameters, but most applications only need the following common ones:

-r recursive retrieval: for an HTTP host, wget first downloads the file specified by the URL, then (if that file is an HTML document) recursively downloads the files it references through hyperlinks, up to the recursion depth specified by the -l parameter. For an FTP host, this parameter means downloading all files in the directory specified by the URL, recursing in a way similar to the HTTP case.
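For example, to retrieve a page and everything it links to (example.com is a placeholder host):
wget -r http://example.com/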
  
-N timestamping: this parameter specifies that wget download only updated files; that is, files whose length and last-modification date match the local copy will not be downloaded again.
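For example, re-running the following (placeholder URL) downloads file.zip again only if the remote copy is newer than the local one:
wget -N http://example.com/file.zip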
  
-m mirror: equivalent to using the -r and -N parameters together.
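Given that equivalence, the following two command lines (placeholder host) do the same thing; note that newer wget versions also make -m imply infinite recursion depth:
wget -m http://example.com/
wget -r -N http://example.com/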
  
-l sets the recursion depth. The default is 5; -l1 is equivalent to no recursion, and -l0 means infinite recursion. Note that as the recursion depth increases, the number of files can grow exponentially.
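For example, to follow links at most two levels deep from the start page (placeholder host):
wget -r -l2 http://example.com/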
  
-t sets the number of retries. When the connection is interrupted (or times out), wget tries to reconnect. Specifying -t0 sets the number of retries to infinity.
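For example, to allow up to 10 retries for an unreliable connection (placeholder URL):
wget -t10 http://example.com/big.iso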
  
-c resumes a partial download. In fact, wget resumes interrupted transfers by default when it retries its own connection; this parameter only needs to be specified when part of a file has already been downloaded with another FTP tool and you want wget to finish the job.
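For example, to finish a file that another client has partially downloaded (placeholder URL):
wget -c ftp://ftp.example.com/pub/big.tar.gz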
  
Example:
wget -m -l4 -t0 http://oneweb.com.cn/
This creates a mirror with a recursion depth of 4 and an infinite number of retries (if there is a problem with the connection, wget will retry persistently until the task is complete). You can also add the -nH parameter to specify that no host-named subdirectory be created, so that the mirrored directory structure is created directly under the current directory.
  
Some other, less frequently used parameters are as follows:
-A acclist / -R rejlist:
These two parameters specify the file name extensions that wget accepts or rejects; multiple names are separated by commas. For example, if you do not want to download MPEG video files or .au audio files, you can use the following parameter:
-R mpg,mpeg,au
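In a complete command line this might look as follows (placeholder host):
wget -r -R mpg,mpeg,au http://example.com/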
  
Other parameters include:
-L follow relative links only: this parameter is useful for crawling a specific site, since it avoids spreading to other directories on the same host. For example, for a personal site at http://www.xys.org/~ppfl/, the command line:
wget -L http://www.xys.org/~ppfl/
retrieves only the personal site and does not touch the other directories on the host www.xys.org.
  
-k convert links: when saving HTML files, convert non-relative links to relative ones, so the saved copy can be browsed locally.
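For example, to download a set of pages and rewrite their links for local browsing (placeholder host):
wget -r -k http://example.com/docs/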
  
-X exclude the specified directories when downloading files from an FTP host.
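For example, to mirror an FTP directory while skipping its /pub/tmp subdirectory (placeholder host and path):
wget -r -X /pub/tmp ftp://ftp.example.com/pub/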
  
In addition, the following parameters control how much wget reports about its work:
-v makes wget output detailed information about its progress (this is the default behavior).
-q makes wget output no information at all.
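For example, to fetch a file silently from a script or cron job (placeholder URL):
wget -q http://example.com/status.txt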
  
If the links to the files we want to retrieve are stored in an HTML document (or an ordinary text document), we can let wget extract them from that file directly, without providing URL addresses on the command line. The parameter format is:
-I filename
The address file can be an HTML file or, for example, an ordinary text file containing a list of URLs to be downloaded.
We can use this to increase the download speed: split the addresses to be downloaded across several files, and then download the addresses listed in each file with an independent wget process.
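A minimal use of -i looks like this (urls.txt is a hypothetical text file with one URL per line):
wget -i urls.txt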
  
For other parameters, refer to the wget man page. The command is:
man wget