Downloading a whole website's documentation with wget (EXT)

Source: Internet
Author: User

wget is very powerful. If you want to download a whole set of documents from a website, typically API documentation, you can use the following command:
    wget -e robots=off -w 1 -x -np -p -m -k -t 1 -X /upload/ http://URL

To make the options of this command line more explicit, it can also be written as:

    wget --execute robots=off --wait=1 --force-directories --no-parent --page-requisites --mirror --convert-links --tries=1 --exclude-directories=/upload/ http://website

Some of the options used to copy a website with wget are briefly described below, as a memo.

-e command, --execute command
Executes an additional .wgetrc command. Just as vim keeps its configuration in the .vimrc file, wget stores its configuration in the .wgetrc file, and the commands in .wgetrc are executed before wget starts its work. A typical .wgetrc file can serve as a reference for the available commands. With the -e option the user can specify additional configuration commands without modifying the .wgetrc file. To pass several commands, use -e command1 -e command2 ... -e commandN. These commands are executed after all the commands in .wgetrc, so they override the same settings in .wgetrc.
robots=off is used here because wget by default obeys the site's robots.txt. If robots.txt contains "User-agent: *" and "Disallow: /", wget cannot mirror or download the directory; the -e robots=off parameter bypasses this restriction.

-w seconds, --wait=seconds
Waits the given number of seconds between page requests, so as not to put too much load on the site being mirrored.

-x, --force-directories
Creates the directory structure that corresponds to the mirrored site. For example, when http://www.example.com/robots.txt is downloaded, it is saved as www.example.com/robots.txt. The opposite option is -nd, --no-directories.

-np, --no-parent
Downloads only the files under the given directory, never the files in its parent directories, even if some pages link to files in a parent directory. This restriction is quite necessary: without it, you might only want to download the blog posts under www.example.com/blog/ and end up downloading the whole of www.example.com.

-p, --page-requisites
Downloads all the resources needed to display a page completely, including embedded images and CSS style sheets.

-m, --mirror
Turns on the options suitable for mirroring, such as recursive download of subdirectories to infinite depth.

-k, --convert-links
After the whole website has been downloaded, rewrites the links between HTML, CSS, images and other resources so that they all point to the downloaded local files, which makes the copy suitable for local browsing.

-t times, --tries=times
Specifies how many times wget tries to download a resource if the download fails. The wget default is 20. When downloading a website this can be set lower, first to reduce the download time and second to reduce the load on the mirrored site.

-X /some/dir, --exclude-directories=/some/dir
Specifies a directory that should not be downloaded. To exclude several directories, separate them with commas, for example -X /some/dir1,/some/dir2
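The options described above can be collected into a small wrapper script. The following is only a minimal sketch: the variable names, the default URL http://www.example.com/docs/ and the second excluded directory /some/dir are placeholders chosen for illustration, and the script prints the command instead of running it unless RUN=1 is set.

```shell
#!/bin/sh
set -eu

# Placeholder values (assumptions for illustration); override via environment.
site_url="${SITE_URL:-http://www.example.com/docs/}"
exclude_dirs="${EXCLUDE_DIRS:-/upload/,/some/dir}"   # comma-separated list for -X

# Keep each option as a separate word in the positional parameters:
#   -e robots=off  ignore robots.txt
#   -w 1           wait 1 second between requests
#   -x             recreate the site's directory structure
#   -np            never ascend to the parent directory
#   -p             fetch page requisites (images, CSS)
#   -m             mirror (recursive, infinite depth)
#   -k             convert links for local browsing
#   -t 1           try each resource only once
#   -X list        skip the listed directories
set -- -e robots=off -w 1 -x -np -p -m -k -t 1 -X "$exclude_dirs" "$site_url"

# Show the command that would run.
mirror_cmd="wget $*"
echo "$mirror_cmd"

# Only start the download when RUN=1 is set explicitly.
if [ "${RUN:-0}" = "1" ]; then
    wget "$@"
fi
```

Saved under a hypothetical name such as mirror.sh, running "sh mirror.sh" only prints the assembled command, while "RUN=1 SITE_URL=http://website sh mirror.sh" would start the actual download. Printing first makes it easy to check the exclude list before putting any load on the site.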
