A complete guide to the Linux wget command


Name


wget - non-interactive web download tool



Synopsis


wget [options]... [URL]...



Description


GNU Wget is a free, non-interactive tool for downloading files from the Web. It supports the HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies. Because wget is non-interactive, it can work in the background while the user is not logged in, unlike most web browsers.
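As a minimal first example (the URL is a placeholder), a single file is fetched with:

wget http://example.com/file.tar.gz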



Options


Startup:


-V, --version displays the version of wget and exits.


-h, --help prints the help message and exits.


-b, --background goes to the background immediately after startup.


-e, --execute=command executes a command as if it were part of .wgetrc; .wgetrc is the file from which wget reads its initialization configuration parameters.
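As a hedged illustration (the URL is a placeholder; robots=off is a standard .wgetrc setting), a .wgetrc command can be passed on the command line like this:

wget -e robots=off http://example.com/dir/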



Logging and input file options:


-o, --output-file=file writes log messages to the specified file (the file is truncated first if it already has content).


-a, --append-output=file appends log messages to the end of the specified file (existing content is not truncated).


-d, --debug prints debug output (if no debug information appears, wget was compiled without debug support).


-q, --quiet quiet mode (no output).


-v, --verbose verbose output mode (the default).


-nv, --non-verbose turns off verbose mode without entering quiet mode.


-i, --input-file=file downloads the URLs found in the specified file. If URLs are given both on the command line and in a file, the command-line URLs are retrieved first. Unless --force-html is specified, the file is read as one URL per line.


-F, --force-html treats the input file as HTML, which allows link URLs to be read from an HTML file; relative links require either a "<base href="url">" tag in the HTML file or the --base=url option to set the base URL.


-B, --base=url when used with -F and -i file, prepends the specified URL to relative links so that relative paths can be resolved. For example, with --base='http://foo/bar/a.html', a link to '../baz/b.html' in the HTML file resolves to 'http://foo/baz/b.html'.
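A minimal sketch combining the input-file and logging options (urls.txt and download.log are hypothetical file names): download every URL listed in urls.txt while writing the log to a separate file:

wget -i urls.txt -o download.log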



Download options:


--bind-address=address makes connections using the specified address (host name or IP) of the local machine (typically used on hosts with multiple IP addresses).


-t, --tries=number sets the number of retries (0 means infinite; the default is 20 retries; wget does not retry after "connection refused" or "not found" (404) errors).


--retry-connrefused retries even if the connection is refused.


-O, --output-document=file writes the downloaded data to this file.


-nc, --no-clobber when the same file would be downloaded again into the same directory, keeps the file that already exists and does not write a new copy with a numeric suffix (.1, .2, ...) appended to the name.


-c, --continue resumes getting a partially downloaded file.


--progress=type selects how the download progress is displayed (for example, "dot" or "bar").


-N, --timestamping does not re-retrieve a file unless the remote copy is newer than the local one.


-S, --server-response displays the header responses sent by the HTTP or FTP server.


--spider does not download anything; wget behaves like a web spider, only checking that the pages are there.


-T, --timeout=seconds sets the read timeout, in seconds.


-w, --wait=seconds sets the number of seconds to wait between retrievals; recommended to reduce the load on the server.


--waitretry=seconds waits between retries of a failed download (increasing from 1 second up to the specified number of seconds).


--random-wait waits a random amount of time between retrievals (ranging from 0 to 2*wait seconds).


-Q, --quota=size sets a quota limiting the total amount of data received. The quota never affects a single-file download: with wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz, ls-lR.gz is downloaded completely regardless of the quota. It normally matters for recursive downloads or downloads from an input file.


--limit-rate=rate limits the download rate.


--dns-cache=off disables the caching of DNS lookups.


--restrict-file-names=value restricts the characters used in generated file names to those allowed by the specified OS (operating system); accepted values are unix, windows, nocontrol, ascii, lowercase, and uppercase.


--user=user

--password=password set the user name and password for both FTP and HTTP access. For FTP access they can be overridden with the --ftp-user and --ftp-password options; for HTTP access, with the --http-user and --http-password options.


-4, --inet4-only connects only to IPv4 addresses.


-6, --inet6-only connects only to IPv6 addresses.


--no-iri turns off support for internationalized URIs (IRIs); --iri turns it on. IRI support is enabled by default.
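A minimal sketch combining several of the download options above (the URL and file are placeholders): resume a partial download, retry up to 5 times, limit the rate to 200 KB/s, and wait 2 seconds between retrievals:

wget -c -t 5 --limit-rate=200k -w 2 http://example.com/big.iso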



Directory options:


-nd, --no-directories does not create a directory hierarchy; during a recursive download, all files are saved in the current directory.


-x, --force-directories forces the creation of a directory hierarchy, even where one would not otherwise be created; for example, wget -x http://fly.srk.fer.hr/robots.txt saves the download to fly.srk.fer.hr/robots.txt.


-nH, --no-host-directories does not create a directory named after the remote host; for example, wget -r http://fly.srk.fer.hr/ normally creates a local tree starting at fly.srk.fer.hr/, and this option disables that behavior.


--protocol-directories prepends the protocol name to the local directory structure; for example, wget -r http://host --protocol-directories saves files under http/host/...


-P, --directory-prefix=name saves files under the directory with the specified name.


--cut-dirs=number ignores the specified number of remote directory components.


Example (recursive download of ftp://ftp.xemacs.org/pub/xemacs/):


No options        -> ftp.xemacs.org/pub/xemacs/
-nH               -> pub/xemacs/
-nH --cut-dirs=1  -> xemacs/
-nH --cut-dirs=2  -> .
--cut-dirs=1      -> ftp.xemacs.org/xemacs/




HTTP options:


--default-page=filename selects the file name used as the default when a URL ends in "/"; the retrieved data is saved locally under that name (normally index.html).


-E, --adjust-extension with this option, downloading a URL such as http://site.com/article.cgi?25 saves the local file as article.cgi?25.html. In versions before wget 1.12 this option was named --html-extension (any file of MIME type text/html has the .html extension appended to its file name).


--http-user=user sets the HTTP user name.


--http-passwd=password sets the HTTP password.


--no-http-keep-alive disables HTTP keep-alive (persistent connections).


--no-cache disables server-side caching; in this case wget sends the Pragma: no-cache directive to the server when retrieving the file.


--no-cookies disables the cookie mechanism.


--load-cookies=file loads cookies from the specified file before the session begins.


--save-cookies=file saves cookies to the specified file after the session ends.


--ignore-length ignores the "Content-Length" header field.


--header=string adds the specified string to the HTTP request headers.

For example:

wget --header='Accept-Charset: iso-8859-2' \
     --header='Accept-Language: hr' \
     http://fly.srk.fer.hr/


--max-redirect=number sets the maximum number of redirections to follow; the default is 20.


--proxy-user=user sets the proxy server user name.


--proxy-passwd=password sets the proxy server password.


--referer=url includes a "Referer: url" header in the HTTP request.


--save-headers saves the HTTP headers into the downloaded file, preceding the actual contents.


-U, --user-agent=agent-string identifies as agent-string to the server instead of Wget/version.
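A minimal sketch sending a custom User-Agent and Referer (the values and URL are placeholders):

wget --user-agent='Mozilla/5.0' --referer=http://example.com/ http://example.com/page.html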


--post-data=string sends the specified string as POST data.

Example:

# Log in to the server. This only needs to be done once.
wget --save-cookies cookies.txt \
     --post-data 'user=foo&password=bar' \
     http://server.com/auth.php

# Now grab the page or pages we care about.
wget --load-cookies cookies.txt \
     -p http://server.com/interesting/article.php

Note: if the server uses session cookies to track user authentication, the example above will not work, because --save-cookies does not save session cookies by default, so cookies.txt would end up empty. To save session cookies to the file as well, combine --save-cookies with --keep-session-cookies.


--post-file=file reads the POST data from the specified file.




HTTPS (SSL) options:


--secure-protocol=value selects the SSL protocol; valid values are auto, SSLv2, SSLv3, and TLSv1.


--no-check-certificate skips checking the server's certificate.


--certificate=file uses the specified file as the optional client certificate.


--certificate-type=type specifies the type of the client certificate; valid types are PEM (the default) and DER (also known as ASN1).


--private-key=file reads the private key from the specified file.


--private-key-type=type sets the type of the private key; valid types are PEM (the default) and DER.


--ca-certificate=file uses the specified file as a bundle of certificate authority (CA) certificates; the type must be PEM. Without this option, wget looks for CA certificates at a system location that was chosen when OpenSSL was installed.


--ca-directory=directory specifies the directory containing CA certificates in PEM format; when there are many CA certificates, using --ca-directory is more efficient than --ca-certificate.


--random-file=file uses the specified file as a source of random data for seeding the pseudo-random number generator.


--egd-file=file specifies the file name of the EGD socket (EGD = Entropy Gathering Daemon).
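A minimal sketch of retrieving from a server whose certificate cannot be verified (the URL is a placeholder); skipping the certificate check is only advisable for testing:

wget --no-check-certificate https://self-signed.example.com/file.tar.gz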




FTP options:


--ftp-user=user

--ftp-password=password set the user name and password used to connect to the FTP server.


--no-remove-listing does not delete the temporary ".listing" files that wget generates; these files contain the raw directory listings received from the FTP server.


--no-glob turns off globbing (wildcard expansion) of FTP file names; globbing is on by default.


--no-passive-ftp does not use the "passive" transfer mode; passive mode is the default.


--retr-symlinks in recursive FTP mode, retrieves the files pointed to by symbolic links (directories are the exception).
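A minimal sketch of an authenticated FTP retrieval (the host, user name, and password are placeholders):

wget --ftp-user=anonymous --ftp-password=guest ftp://ftp.example.com/pub/file.txt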



Recursive download options:


-r, --recursive turns on recursive downloading.


-l, --level=number sets the maximum recursion depth (inf or 0 for infinite).


--delete-after tells wget to delete each downloaded file locally after the download completes.


-k, --convert-links converts absolute links in downloaded documents into relative links.


-K, --backup-converted before converting a file X, backs up the original as X.orig.


-m, --mirror is equivalent to the -r -N -l inf --no-remove-listing options.


-p, --page-requisites downloads all the files needed to display a complete HTML page, such as inline images and referenced stylesheets.


--strict-comments turns on strict (SGML) handling of HTML comments.
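A minimal sketch of mirroring part of a site for offline viewing (the URL is a placeholder), combining -m (mirror), -k (convert links), -p (page requisites), and -np (do not ascend to the parent directory):

wget -m -k -p -np http://example.com/docs/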



Recursive accept/reject options:


-A, --accept=list comma-separated list of accepted file-name patterns.


-R, --reject=list comma-separated list of rejected file-name patterns.


-D, --domains=list comma-separated list of accepted domains.


--exclude-domains=list comma-separated list of excluded domains.


--follow-ftp follows FTP links found in HTML documents.


--follow-tags=list comma-separated list of HTML tags to follow.


--ignore-tags=list comma-separated list of HTML tags to ignore.


--ignore-case matches files and directories case-insensitively.


-H, --span-hosts allows recursion to span to other hosts.


-L, --relative follows relative links only.


-I, --include-directories=list list of directories to download.


-X, --exclude-directories=list list of directories to exclude.


-np, --no-parent does not ascend to the parent directory.
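A minimal sketch of recursively fetching only PDF files below one directory (the URL is a placeholder):

wget -r -np -A '*.pdf' http://example.com/papers/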



Exit Status:


0 No problems occurred.

1 Generic error code.

2 Parse error (for example, when parsing command-line options or .wgetrc).

3 File I/O error.

4 Network failure.

5 SSL verification failure.

6 Username/password authentication failure.

7 Protocol error.

8 The server issued an error response (for example, 404 Not Found).
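A minimal shell sketch of acting on the exit status (the URL is a placeholder):

wget -q http://example.com/file.txt
status=$?
if [ $status -eq 8 ]; then
    echo "the server issued an error response (for example, 404)"
fi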



Related files:


/etc/wgetrc the default global initialization (startup) file for wget.


.wgetrc the user's initialization file.


This article is from the "Zheng Xiaoming Technology Blog"; reprinting is declined.
