Bulk download of files in an HTTP directory using wget


Principle: first download the index.html of the directory page you want (the actual file name may differ).
Then use wget to download all the links contained in that file!

Example: wget -v -r -L -np -nH --tries=20 --timeout=40 --wait=5 http://mirrors.163.com/gentoo/distfiles/
Or more simply: wget -m http://mirrors.163.com/gentoo/distfiles/
Either way you get the index.html of the distfiles page. Its content, needless to say, contains links to all the source packages in the distfiles directory. Getting this file is easy, and the -m parameter alone would do, but because the file is large, the --tries, --timeout, and --wait parameters can be added to prevent timeout and wait problems.

wget -nc -B http://mirrors.163.com/gentoo/distfiles/ -F -nH --cut-dirs=3 -i index.html

Ok!!!

Later I decided to synchronize the download using Tom's mirror, but found that Tom does not allow browsing access to its Gentoo mirror pages, so of course there is no way to get an index.html for distfiles from it. So I used the index.html obtained from 163 instead: since it stores relative paths, all that is needed is to substitute Tom's distfiles directory for the 163 path, and the files listed in 163's index.html can then be downloaded from Tom!
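For example (a sketch only: the Tom mirror hostname below is a stand-in, since the actual one is not given here):

wget -nc -B http://mirrors.tom.com/gentoo/distfiles/ -F -nH --cut-dirs=3 -i index.html

Here -B points at Tom's distfiles directory while index.html is still the copy fetched from 163, so the relative links resolve against Tom's mirror.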

Parameter explanation:
-B: adds a path prefix (base URL) to the relative URLs in the specified input file
-nc: skip files that already exist when downloading
-nH: do not create a host name directory
-i: download all URLs listed in the file given after the -i parameter
-v: display verbose information
-F: force the input file to be treated as HTML
-r: recursive; crawl subdirectories of subdirectories
-L: follow relative links only
-np: do not ascend to the parent directory
--cut-dirs: cut the given number of directory components after the hostname

   Example:
                    No options        -> ftp.xemacs.org/pub/xemacs/
                    -nH               -> pub/xemacs/
                    -nH --cut-dirs=1  -> xemacs/
                    -nH --cut-dirs=2  -> .

                    --cut-dirs=1      -> ftp.xemacs.org/xemacs/
...

There are also plenty of other miscellaneous parameters, for things like directory creation, filtering, and so on.

wget options listed by category
* Startup
-V, --version: display the version of wget and exit
-h, --help: print help
-b, --background: go to background after startup
-e, --execute=COMMAND: execute a command in .wgetrc format; for the .wgetrc format see /etc/wgetrc or ~/.wgetrc
* Logging and input files
-o, --output-file=FILE: write log messages to FILE
-a, --append-output=FILE: append log messages to FILE
-d, --debug: print debug output
-q, --quiet: quiet mode (no output)
-v, --verbose: verbose mode (this is the default)
-nv, --non-verbose: turn off verbose mode, without being quiet
-i, --input-file=FILE: download the URLs found in FILE
-F, --force-html: treat the input file as HTML
-B, --base=URL: use URL as the prefix for relative links in the file given by -F -i
--sslcertfile=FILE: optional client certificate
--sslcertkey=KEYFILE: optional keyfile for the client certificate
--egd-file=FILE: file name of the EGD socket
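For example, a minimal sketch of the logging and input options (urls.txt is a hypothetical file listing one URL per line):

wget -i urls.txt -o download.log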
* Download
--bind-address=ADDRESS: bind to ADDRESS (hostname or IP) on the local machine (useful when the local machine has several IPs or names)
-t, --tries=NUMBER: set the maximum number of retries (0 means unlimited)
-O, --output-document=FILE: write documents to FILE
-nc, --no-clobber: do not overwrite existing files or use .# suffixes
-c, --continue: resume getting partially downloaded files
--progress=TYPE: select the progress indicator type
-N, --timestamping: do not re-download a file unless it is newer than the local copy
-S, --server-response: print the server response
--spider: do not download anything
-T, --timeout=SECONDS: set the timeout to SECONDS
-w, --wait=SECONDS: wait SECONDS between retrievals
--waitretry=SECONDS: wait 1...SECONDS between retries of a retrieval
--random-wait: wait 0...2*WAIT seconds between retrievals
-Y, --proxy=on/off: turn the proxy on or off
-Q, --quota=NUMBER: set the download quota
--limit-rate=RATE: limit the download rate
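A quick sketch combining several of the download options above (the package name is a placeholder):

wget -c -t 20 -T 40 -w 5 --limit-rate=200k http://mirrors.163.com/gentoo/distfiles/example.tar.bz2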
* Directories
-nd, --no-directories: do not create directories (wget creates a directory hierarchy by default)
-x, --force-directories: force creation of directories
-nH, --no-host-directories: do not create host directories
-P, --directory-prefix=PREFIX: save files to PREFIX/...
--cut-dirs=NUMBER: ignore NUMBER remote directory components
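For instance, a sketch that drops the host directory, strips the two remote components gentoo/distfiles, and saves everything under a local distfiles/ directory:

wget -r -np -nH --cut-dirs=2 -P distfiles/ http://mirrors.163.com/gentoo/distfiles/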
* HTTP options
--http-user=USER: set the HTTP user name to USER
--http-passwd=PASS: set the HTTP password to PASS
-C, --cache=on/off: allow/disallow server-side caching (normally allowed)
-E, --html-extension: save all text/html documents with the .html extension
--ignore-length: ignore the 'Content-Length' header field
--header=STRING: insert STRING among the headers
--proxy-user=USER: set the proxy user name to USER
--proxy-passwd=PASS: set the proxy password to PASS
--referer=URL: include a 'Referer: URL' header in the HTTP request
-s, --save-headers: save the HTTP headers to the file
-U, --user-agent=AGENT: identify as AGENT instead of Wget/VERSION
--no-http-keep-alive: disable HTTP keep-alive (persistent connections)
--cookies=off: do not use cookies
--load-cookies=FILE: load cookies from FILE before the session starts
--save-cookies=FILE: save cookies to FILE after the session ends
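As an illustration, a sketch that sends a custom user agent and referer (both values are placeholders):

wget -U "Mozilla/5.0" --referer=http://mirrors.163.com/ http://mirrors.163.com/gentoo/distfiles/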
* FTP options
-nr, --dont-remove-listing: do not remove the '.listing' files
-g, --glob=on/off: turn file name globbing on or off
--passive-ftp: use the passive transfer mode (the default)
--active-ftp: use the active transfer mode
--retr-symlinks: when recursing, retrieve linked-to files (not directories)
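A sketch of a passive-mode recursive FTP fetch (assuming the mirror also serves FTP, which is not confirmed here):

wget --passive-ftp -r -l 1 -np ftp://mirrors.163.com/gentoo/distfiles/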
* Recursive download
-r, --recursive: recursive download -- use with caution!
-l, --level=NUMBER: maximum recursion depth (inf or 0 for infinite)
-l1 (lowercase L, one): recursively download only the contents of the specified directory, not the next level down
--delete-after: delete files locally after the download finishes
-k, --convert-links: convert non-relative links to relative ones
-K, --backup-converted: before converting file X, back it up as X.orig
-m, --mirror: equivalent to -r -N -l inf -nr
-p, --page-requisites: download all files needed to display the HTML pages, such as images
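For example, a minimal mirroring sketch with link conversion and page requisites:

wget -m -k -p http://mirrors.163.com/gentoo/distfiles/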
* Recursive accept/reject
-A, --accept=LIST: comma-separated list of accepted extensions
-R, --reject=LIST: comma-separated list of rejected extensions
-D, --domains=LIST: comma-separated list of accepted domains
--exclude-domains=LIST: comma-separated list of rejected domains
--follow-ftp: follow FTP links from HTML documents
--follow-tags=LIST: comma-separated list of followed HTML tags
-G, --ignore-tags=LIST: comma-separated list of ignored HTML tags
-H, --span-hosts: go to foreign hosts when recursing
-L, --relative: follow relative links only
-I, --include-directories=LIST: list of allowed directories
-X, --exclude-directories=LIST: list of excluded directories
-np, --no-parent: do not ascend to the parent directory
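Tying several of these together, a sketch that recursively grabs only source tarballs from the distfiles directory:

wget -r -np -nH -A tar.bz2,tar.gz http://mirrors.163.com/gentoo/distfiles/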

Files published over the FTP protocol are relatively simple to handle: the -r parameter, or a wildcard such as *, is enough to achieve a fully recursive download!
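For instance (a sketch; whether the 163 mirror serves FTP is an assumption here):

wget 'ftp://mirrors.163.com/gentoo/distfiles/*.tar.bz2'

The quotes keep the shell from expanding the *, so that wget's own FTP globbing matches the remote files.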

