Use of wget


1. Fetch a simple page

Basic wget usage:

wget http://domain.com/path/simple_page.html
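
If you prefer to save the page under a different local name, the -O option does that (the URL is the same placeholder as above):

wget -O saved_page.html http://domain.com/path/simple_page.html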

2. Add your own headers

Some websites or pages require extra authentication information, so you need to send additional HTTP headers. Usage:

wget --header="myheader: head_value" http://domain.com/path/page/need_header.php
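
--header can be repeated to send several headers in one request, for instance (the header names and values here are placeholders):

wget --header="myheader: head_value" --header="Referer: http://domain.com/" http://domain.com/path/page/need_header.php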

3. Disguise as a browser

Some websites, such as Facebook, check whether the request comes from a browser. If it does not look like a normal browser, they redirect to an "Incompatible Browser" error page. In that case wget has to be disguised as a browser (I'm Mozilla Firefox!):

wget --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)" http://domain.com/path/page/check_user_agent.php

4. Post data to a page

wget can not only request pages with GET but also POST data, which makes automatic registration and automatic login possible (except for pages with CAPTCHAs...).

wget --post-data="user=user1&pass=pass1&submit=login" http://domain.com/login.php

5. Access a page that requires login

Some pages can only be accessed after logging in, and a cookie must be sent along with the request. This is combined with the POST method above. The general flow is: POST the user name and password to log in, save the cookie, and then access the page with that cookie.

wget --post-data="user=user1&pass=pass1&submit=login" --save-cookies=cookie.txt --keep-session-cookies http://domain.com/login.php

wget --load-cookies=cookie.txt http://domain.com/path/page_need_login.php
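
The two steps can also be wrapped in a small shell script. This is only a sketch: the form fields user, pass and submit are placeholders that depend on the actual login form.

#!/bin/sh
# Step 1: log in, keep the session cookie, discard the response body
wget --post-data="user=user1&pass=pass1&submit=login" \
     --save-cookies=cookie.txt --keep-session-cookies \
     -O /dev/null http://domain.com/login.php
# Step 2: reuse the saved cookie to fetch the protected page
wget --load-cookies=cookie.txt http://domain.com/path/page_need_login.php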

wget tips:

1> Download all files in the packs directory of the http://www.linux.com site
$ wget -r -np -nd http://www.linux.com/packs/

-np: do not ascend to the parent directory.
-nd: do not re-create the directory structure locally.

2> Download an entire HTTP or FTP site

$ wget -r -x http://www.linux.com

-x: force creation of a local directory hierarchy matching the one on the server.

This command recursively downloads all directories and files on the server, i.e. the entire website. Be careful: every address the downloaded pages point to is also fetched, so if this website references other websites, the referenced websites will be downloaded as well!

Note: you can use the -l number parameter to limit the recursion depth. For example, to download only two levels, use -l 2.
Example: wget -r -x -l 2 http://www.linux.com

3> Selectively download only certain types of files

$ wget -r -np -nd --accept=iso http://www.linux.com/i386/

The --accept=iso option tells wget to download only files with the .iso extension in the i386 directory. You can also specify several extensions, separated by commas, as in the example below.
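
For example, to fetch both .iso and .img files in one pass (.img is just an illustrative second extension):

$ wget -r -np -nd --accept=iso,img http://www.linux.com/i386/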

4> Batch download
$ wget -i downloads.txt

If you have many files to download, put all their addresses into downloads.txt (one URL per line), and wget will fetch them all automatically. A sample file is shown below.
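
A downloads.txt might look like this (the URLs are placeholders):

http://www.linux.com/packs/file1.iso
http://www.linux.com/packs/file2.iso
ftp://ftp.example.com/pub/file3.tar.gz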

5> Resumable download

$ wget -c -t 100 -T 120 http://www.linux.com/big-file.iso

When the file is very large or the network is very slow, the connection is often cut before the download finishes; in that case you need to resume the transfer. Resuming is automatic in wget.

-c: resume a partially downloaded file.
-t: number of retries (for example, -t 100 retries 100 times; -t 0 retries indefinitely until the connection succeeds).
-T: timeout in seconds; for example, -T 120 gives up a connection attempt after 120 seconds.

6> Mirror a website
$ wget -m -k (-H) http://www.linux.com/
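
Here -m switches on the options suitable for mirroring, -k converts the links in the downloaded pages so they work locally, and the optional -H allows wget to span to other hosts. A variant that also pulls in the images and stylesheets each page needs (a suggested combination, not part of the original example) is:

$ wget -m -k -p http://www.linux.com/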

wget is open-source software that originated on Linux, written by Hrvoje Niksic and later ported to many platforms, including Windows. It has the following features:

(1) Resumable downloads. This used to be the biggest selling point of NetAnts and FlashGet; wget now offers it too, so users with unreliable connections can rest assured;
(2) Both FTP and HTTP downloads. Although most software can be fetched over HTTP, FTP downloads are still needed in some cases;
(3) Proxy server support. Systems with high security requirements are usually not exposed directly to the Internet, so proxy support is a must-have for a download tool;
(4) Easy to configure. Users accustomed to graphical interfaces may not be at home on the command line, but for configuration the command line actually has advantages: at the very least it spares you many mouse clicks, and you need not worry about clicking the wrong thing;
(5) Small and completely free. Small size hardly matters now that hard disks are so large, but completely free is worth considering, especially since much of the so-called freeware on the net comes with advertisements we could do without;

Although wget is powerful, it is quite simple to use. The basic syntax is: wget [option list] URL. The following examples show how to use it.

Note: wget URL downloads the file into the current directory. If a proxy is configured in the .wgetrc file, it is used by default.
1. Download an entire HTTP or FTP site.
wget http://place.your.url/here
This command downloads the home page of http://place.your.url/here. Adding -x forces the creation of a local directory hierarchy identical to the server's; with -nd, everything downloaded from the server is placed directly in the current local directory instead.

wget -r http://place.your.url/here

This command recursively downloads all directories and files on the server; in essence it downloads the whole website. Use it with caution, because every address the downloaded pages point to is also fetched, so if the site references other websites, those will be downloaded too! For this reason this parameter is rarely used on its own.
You can use the -l number parameter to limit the recursion depth; for example, to download only two levels, use -l 2.

If you want to make a mirror site, use the -m parameter, for example: wget -m http://place.your.url/here
wget then chooses suitable parameters for mirroring automatically, fetches robots.txt from the server, and obeys it.

2. Resumable download.
When the file is very large or the network is very slow, the connection is often cut before the download finishes; in that case you need to resume the transfer. Resuming is automatic in wget; you only need the -c parameter, for example:
wget -c http://the.url.of/incomplete/file
Resuming requires the server to support it. The -t parameter sets the number of retries: to retry 100 times, write -t 100; -t 0 retries indefinitely until the connection succeeds. The -T parameter sets the timeout: -T 120 gives up a connection attempt after 120 seconds.

3. Batch download.
To download several files, create a file with one URL per line, for example download.txt, and then run wget -i download.txt.
This downloads every URL listed in download.txt. (If a line points to a file, the file is downloaded; if it points to a website, the home page is downloaded.)

4. Selective download.
You can tell wget to download only certain types of files, or to skip certain types. For example:
wget -m --reject=gif http://target.web.site/subdirectory
This downloads http://target.web.site/subdirectory but skips GIF files. --accept=LIST lists the accepted file types; --reject=LIST lists the rejected file types.

5. Password and authentication.
For websites restricted by user name/password, wget provides two parameters:
--http-user=USER: set the HTTP user name
--http-passwd=PASS: set the HTTP password
Websites that require certificate authentication can only be handled with other download tools, such as curl. A basic-authentication example is sketched below.
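
For example, to fetch a page protected by a user name and password (user name, password and URL are placeholders; recent wget releases spell the second option --http-password):

wget --http-user=user1 --http-passwd=pass1 http://domain.com/protected/page.html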

6. Download through a proxy server.
If your network access goes through a proxy server, you can have wget download files through it. Create a .wgetrc file in the current user's home directory and set the proxy server in it:
http_proxy = 111.111.111.111:8080
ftp_proxy = 111.111.111.111:8080
These set the HTTP proxy server and the FTP proxy server respectively. If the proxy server requires a password, use these two parameters:
--proxy-user=USER: set the proxy user name
--proxy-passwd=PASS: set the proxy password
Use the --proxy=on/off parameter to enable or disable the proxy; a minimal .wgetrc sketch follows below.
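
A minimal .wgetrc for the proxy case might look like the sketch below; the address 111.111.111.111:8080 is the placeholder used above, and the account lines are only needed if your proxy asks for authentication.

# ~/.wgetrc (proxy settings sketch)
http_proxy = http://111.111.111.111:8080/
ftp_proxy = http://111.111.111.111:8080/
use_proxy = on
# only if the proxy requires a login (older releases spell this proxy_passwd)
proxy_user = user1
proxy_password = pass1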
wget has many more useful features waiting to be discovered by its users.

Appendix:

Command Format:
wget [option list] [target URL]

-V, --version: display the version number and exit;
-h, --help: display help information;
-e, --execute=COMMAND: execute a .wgetrc-style command;

-o, --output-file=FILE: write log messages to FILE;
-a, --append-output=FILE: append log messages to FILE;
-d, --debug: print debug output;
-q, --quiet: no output;
-i, --input-file=FILE: read URLs from FILE;

-t, --tries=NUMBER: number of retries (0 means unlimited);
-O, --output-document=FILE: save the downloaded document as FILE;
-nc, --no-clobber: do not overwrite existing files;
-N, --timestamping: download only files newer than the local copies;
-T, --timeout=SECONDS: set the timeout in seconds;
-Y, --proxy=on/off: enable or disable the proxy;

-nd, --no-directories: do not create directories;
-x, --force-directories: force creation of directories;

--http-user=USER: set the HTTP user name;
--http-passwd=PASS: set the HTTP password;
--proxy-user=USER: set the proxy user name;
--proxy-passwd=PASS: set the proxy password;

-r, --recursive: download recursively (entire websites and directories; use with caution);
-l, --level=NUMBER: maximum recursion depth;

-A, --accept=LIST: comma-separated list of accepted file extensions;
-R, --reject=LIST: comma-separated list of rejected file extensions;
-D, --domains=LIST: comma-separated list of accepted domains;
--exclude-domains=LIST: comma-separated list of rejected domains;
-L, --relative: follow relative links only;
--follow-ftp: follow FTP links found in HTML documents;
-H, --span-hosts: allow recursion onto foreign hosts;
-I, --include-directories=LIST: list of allowed directories;
-X, --exclude-directories=LIST: list of excluded directories;

Chinese file names are normally percent-encoded, but they come out readable when --cut-dirs is used:

wget -r -np -nH --cut-dirs=3 ftp://host/test/
  test.txt
wget -r -np -nH -nd ftp://host/test/
  %B4%FA%B8%D5.txt
wget "ftp://host/test/*"
  %B4%FA%B8%D5.txt

For some unknown reason, probably to avoid problems with special characters in file names, wget automatically passes the fetched file name through its built-in encode_string function. The patch in question therefore takes what encode_string produced (for example "%3A") and restores it with decode_string back to ":", applying this to both directory and file names; decode_string is also a built-in wget function.
