wget Command 2 (reprint)

Source: Internet
Author: User
Tags: create directory, file, url, ftp, site

Wget is a free tool that downloads files from the web non-interactively. It supports the HTTP, HTTPS, and FTP protocols and can retrieve files through HTTP proxies.

"Automatic download" means that wget can keep running in the background after the user logs out. You can log in to the system, start a wget download task, and log out; wget continues in the background until the task completes. This removes the hassle of staying logged in, which most browsers require when downloading large amounts of data.
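As a minimal sketch (the URL and file name are hypothetical): the -b option sends wget to the background immediately and writes progress to a log file, so you can log out right away.

# Start the download in the background; wget detaches and logs to wget-log by default
wget -b http://example.com/big-archive.tar.gz
# Check progress later with:
tail -f wget-log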
Wget can follow the links on HTML pages and download them to create a local copy of a remote server, completely recreating the directory structure of the original site. This is often called a "recursive download". While recursing, wget honors the Robots Exclusion Standard (/robots.txt). It can also rewrite links to point to the downloaded local files, for offline browsing.
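A sketch of such a recursive, offline-browsable download (hypothetical URL): -k performs the link conversion mentioned above, and -p also fetches images and other files needed to display the pages.

wget -r -k -p http://example.com/docs/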
Wget is very stable and copes well with narrow bandwidth and unstable networks. If a download fails because of network trouble, wget keeps retrying until the entire file has been retrieved. If the server interrupts the download, wget reconnects and continues from where it stopped. This is useful for downloading large files from servers that limit connection time.
Common uses of wget
Although wget is powerful, it is relatively simple to use.
The basic syntax is: wget [option list] "URL". The URL is enclosed in double quotes to avoid download errors caused by special characters in it.
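For instance, a URL containing ? or & must be quoted, or the shell interprets those characters itself (hypothetical URL):

# Without the quotes, the shell would treat & as its background operator
wget "http://example.com/search?q=linux&page=2"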
Here are some examples to illustrate the use of wget.
1. Download an entire HTTP or FTP site.
wget http://place.your.url/here
This command downloads the home page of http://place.your.url/here. Adding -x forces the creation of a local directory hierarchy identical to the server's; with the -nd option, everything downloaded from the server is placed in the local current directory instead.

wget -r http://place.your.url/here
This command downloads all directories and files on the server recursively, essentially downloading the entire site. Use it with caution: every address the downloaded site points to is downloaded as well, so if the site references other sites, the referenced sites are downloaded too! For this reason this option is rarely used on its own. You can limit the depth with -l NUMBER; for example, to download only two levels, use -l 2, as shown in the sketch below.
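# Follow links at most two levels deep from the start page
wget -r -l 2 http://place.your.url/here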

If you want to create a mirror site, use the -m option, for example: wget -m http://place.your.url/here
wget then automatically chooses the options appropriate for mirroring. It connects to the server, reads robots.txt, and follows its rules.

2. Resume interrupted downloads.
When a file is very large or the network is very slow, the connection is often cut before the file has finished downloading; at that point the download needs to be resumed. Resuming with wget is automatic and requires only the -c option, for example:
wget -c http://the.url.of/incomplete/file
Resuming requires the server to support it. The -t option sets the number of retries: to retry 100 times, write -t 100; -t 0 means retry indefinitely until the connection succeeds. The -T option sets the timeout: -T 120 means give up on an attempt if no connection is made within 120 seconds.
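Combining these options with -c gives a download that survives a flaky network; a sketch using the URL from the example above:

# Retry up to 100 times, give up on an attempt after 120 seconds,
# and resume each time from where the previous attempt stopped
wget -c -t 100 -T 120 http://the.url.of/incomplete/file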

3. Batch downloads.
If there are several files to download, you can create a file with one URL per line, say download.txt, and then run: wget -i download.txt
This downloads each URL listed in download.txt in turn. (If a line is a file, that file is downloaded; if it is a site, the site's front page is downloaded.)
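A minimal sketch of the whole workflow (the URLs are hypothetical):

# Build the list, one URL per line
cat > download.txt <<EOF
http://example.com/a.iso
ftp://example.org/pub/b.tar.gz
EOF
# Download everything on the list
wget -i download.txt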

4. Selective downloads.
You can tell wget to download only certain types of files, or to skip certain types. For example:
wget -m --reject=gif http://target.web.site/subdirectory
This downloads http://target.web.site/subdirectory but ignores GIF files. --accept=LIST accepts the listed file types; --reject=LIST rejects the listed file types.
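The inverse also works; a sketch that keeps only certain types (the extensions chosen are illustrative):

# Mirror the directory but keep only PDF and PostScript files
wget -m --accept=pdf,ps http://target.web.site/subdirectory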

5. Passwords and authentication.
wget can only handle sites that restrict access with a username/password, via two options:
--http-user=USER set the HTTP username
--http-passwd=PASS set the HTTP password
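Put together, a request for a password-protected page looks like this sketch (host and credentials are hypothetical; newer wget versions spell the second option --http-password):

wget --http-user=alice --http-passwd=secret http://example.com/protected/index.html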
For sites that require certificates for authentication, you will have to use another download tool, such as curl.

6. Downloading through a proxy server.
If your network access goes through a proxy server, you can let wget download files through it. Create a .wgetrc file in the current user's home directory and set the proxy servers in it:
http_proxy = 111.111.111.111:8080
ftp_proxy = 111.111.111.111:8080
These set the proxy server for HTTP and for FTP, respectively. If the proxy server requires a password, use these two options:
--proxy-user=USER set the proxy username
--proxy-passwd=PASS set the proxy password
Use the --proxy=on/off option to turn proxy use on or off.
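A combined sketch, with hypothetical proxy address and credentials:

# In ~/.wgetrc:
# http_proxy = 111.111.111.111:8080
# ftp_proxy = 111.111.111.111:8080
# Then download through the proxy, authenticating if required:
wget --proxy=on --proxy-user=alice --proxy-passwd=secret http://example.com/file.zip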
Wget has many other useful features that users can explore on their own.




wget usage format
Usage: wget [OPTION]... [URL]...
* Mirror a site with wget:
wget -r -p -np -k http://dsec.pku.edu.cn/~usr_name/
# or
wget -m http://dsec.pku.edu.cn/~usr_name/
* Resume a partially downloaded file on an unstable network, or download during idle hours
wget -t 0 -w 31 -c http://dsec.pku.edu.cn/BBC.avi -o down.log &
# or read the list of files to download from filelist.txt
wget -t 0 -w 31 -c -B ftp://dsec.pku.edu.cn/linuxsoft -i filelist.txt -o down.log &
The commands above can also be used to download during periods when the network is relatively idle. My usage: copy the URLs that are inconvenient to download in Mozilla, paste them into the file filelist.txt, and run the second command above before logging out in the evening.
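One way to schedule that second command for a quiet period is the at command; a sketch, assuming the at daemon is available (the time is illustrative):

# Queue the batch download to start at 23:00
echo 'wget -t 0 -w 31 -c -B ftp://dsec.pku.edu.cn/linuxsoft -i filelist.txt -o down.log' | at 23:00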
* Download through a proxy
wget -Y on -p -k https://sourceforge.net/projects/wvware/
The proxy can be set in an environment variable or in the wgetrc file:
# Set the proxy in an environment variable
export http_proxy=http://211.90.168.94:8080/
# Set the proxy in ~/.wgetrc
http_proxy = http://proxy.yoyodyne.com:18023/
ftp_proxy = http://proxy.yoyodyne.com:18023/


wget options by category
* Startup
-V, --version display the version of wget and exit
-h, --help print this help
-b, --background go to background after startup
-e, --execute=COMMAND execute a '.wgetrc'-style command; for the wgetrc format see /etc/wgetrc or ~/.wgetrc
* Logging and input files
-o, --output-file=FILE write log messages to FILE
-a, --append-output=FILE append log messages to FILE
-d, --debug print debug output
-q, --quiet quiet mode (no output)
-v, --verbose verbose mode (this is the default)
-nv, --non-verbose turn off verboseness, without being quiet
-i, --input-file=FILE download the URLs found in FILE
-F, --force-html treat the input file as HTML
-B, --base=URL use URL as the prefix for relative links in the file specified by -F -i
--sslcertfile=FILE optional client certificate
--sslcertkey=KEYFILE optional keyfile for this certificate
--egd-file=FILE file name of the EGD socket
* Download
--bind-address=ADDRESS bind to ADDRESS (hostname or IP) on the local host (useful when the local machine has several IPs or names)
-t, --tries=NUMBER set the maximum number of retries (0 means unlimited)
-O, --output-document=FILE write documents to FILE
-nc, --no-clobber don't clobber existing files or use .# suffixes
-c, --continue resume getting a partially-downloaded file
--progress=TYPE select the progress gauge type
-N, --timestamping don't re-retrieve files unless newer than the local copy
-S, --server-response print server responses
--spider don't download anything
-T, --timeout=SECONDS set the response timeout to SECONDS
-w, --wait=SECONDS wait SECONDS between retrievals
--waitretry=SECONDS wait 1...SECONDS between retries of a retrieval
--random-wait wait from 0...2*WAIT seconds between retrievals
-Y, --proxy=on/off turn the proxy on or off
-Q, --quota=NUMBER set the download quota
--limit-rate=RATE limit the download rate to RATE
* Directories
-nd, --no-directories don't create directories
-x, --force-directories force creation of directories
-nH, --no-host-directories don't create host directories
-P, --directory-prefix=PREFIX save files to PREFIX/...
--cut-dirs=NUMBER ignore NUMBER remote directory components
* HTTP Options
--http-user=USER set the HTTP username to USER
--http-passwd=PASS set the HTTP password to PASS
-C, --cache=on/off allow/disallow server-cached data (normally allowed)
-E, --html-extension save all text/html documents with the .html extension
--ignore-length ignore the 'Content-Length' header field
--header=STRING insert STRING among the headers
--proxy-user=USER set the proxy username to USER
--proxy-passwd=PASS set the proxy password to PASS
--referer=URL include a 'Referer: URL' header in the HTTP request
-s, --save-headers save the HTTP headers to the file
-U, --user-agent=AGENT identify as AGENT instead of Wget/VERSION
--no-http-keep-alive disable HTTP keep-alive (persistent connections)
--cookies=off don't use cookies
--load-cookies=FILE load cookies from FILE before the session
--save-cookies=FILE save cookies to FILE after the session
* FTP Options
-nr, --dont-remove-listing don't remove '.listing' files
-g, --glob=on/off turn file name globbing on or off
--passive-ftp use the "passive" transfer mode (the default)
--active-ftp use the "active" transfer mode
--retr-symlinks when recursing, retrieve the files symlinks point to (not directories)
* Recursive download
-r, --recursive recursive download -- use with care!
-l, --level=NUMBER maximum recursion depth (inf or 0 for unlimited)
--delete-after delete downloaded files locally after retrieval
-k, --convert-links convert non-relative links to relative ones
-K, --backup-converted before converting file X, back it up as X.orig
-m, --mirror equivalent to -r -N -l inf -nr
-p, --page-requisites download all files needed to display the HTML pages (such as images)
* What to include and exclude during recursive download (accept/reject)
-A, --accept=LIST comma-separated list of accepted extensions
-R, --reject=LIST comma-separated list of rejected extensions
-D, --domains=LIST comma-separated list of accepted domains
--exclude-domains=LIST comma-separated list of rejected domains
--follow-ftp follow FTP links from HTML documents
--follow-tags=LIST comma-separated list of HTML tags to follow
-G, --ignore-tags=LIST comma-separated list of HTML tags to ignore
-H, --span-hosts go to foreign hosts when recursing
-L, --relative follow relative links only
-I, --include-directories=LIST list of allowed directories
-X, --exclude-directories=LIST list of excluded directories
-np, --no-parent don't ascend to the parent directory
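Several of these options are typically combined. As a sketch (hypothetical URL), the following pulls one remote directory tree into the current directory, keeping only tarballs:

# Recurse below the starting directory only (-np), drop the host
# directory (-nH), strip two leading path components (--cut-dirs=2),
# and keep only .tar.gz files (-A)
wget -r -np -nH --cut-dirs=2 -A tar.gz http://example.com/pub/software/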
Problem
When downloading recursively, wget applies URL-encoding rules to non-ASCII characters (such as Chinese) in directory names when it creates the local directories. For example, "Skynet Firewall" (a Chinese directory name) is saved as "%CC%EC%CD%F8%B7%C0%BB%F0%C7%BD", which makes the local directory hard to read.
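Depending on the wget version, the --restrict-file-names option can relax this escaping; a sketch (hypothetical URL), for versions that support it:

# Ask wget not to percent-escape bytes in local file names
wget -r --restrict-file-names=nocontrol http://example.com/docs/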

