1. Website Download:
wget is a command-line download tool; it can retrieve web pages as well as remote files.
Example: wget URL
wget https://zhidao.baidu.com/question/1818975931865141188.html
(1) You can specify multiple URLs to download at once:
wget URL1 URL2 URL3 ...
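For instance (the example.com paths below are placeholders, not taken from the original text):
wget http://example.com/file1.txt http://example.com/file2.txt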
(2) You can also use wget to download files from an FTP server.
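For example (the FTP server address and path here are placeholders):
wget ftp://ftp.example.com/pub/file.iso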
(3) wget command option descriptions:
-O: specifies the output file name; if a file with that name already exists, it is truncated and the downloaded content is written to it
-o: specifies a log file, so log messages are written there instead of being printed to stdout
wget https://zhidao.baidu.com/question/1818975931865141188.html -O myweb.html -o weblog    # running this command prints nothing to the terminal
(4) Because an unstable network connection may force the download to be interrupted, the number of retries can be given as a command option, so that if the download is interrupted, wget makes several more attempts before giving up:
wget -t 5 URL
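Per the wget manual, --tries (-t) also accepts 0 (or inf) for unlimited retries; a hedged variant combining this with resumption (-c, covered below) for very unreliable connections:
wget -t 0 -c URL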
(5) Limiting the download speed: restrict the maximum bandwidth the download task may occupy:
wget --limit-rate 20k http://example.com/file.iso
(6) Specifying a maximum download quota: once the quota is exhausted, the download stops, which avoids accidentally consuming too much disk space (the example below sets a 100 MB quota):
wget -Q 100m http://example.com/file1 http://example.com/file2
(7) Resuming an interrupted download: if a wget download is interrupted before it completes, it can be resumed from where it left off using the -c option:
wget -c URL
(8) Downloading with curl: by default curl does not write the downloaded data to a file but to standard output, so we must redirect the output to the desired file with the redirection operator:
curl https://zhidao.baidu.com/question/1818975931865141188.html > testweb.html
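As a supplementary note (standard curl behavior, not part of the original walkthrough), curl can also write straight to a file with its -o option:
curl -o testweb.html https://zhidao.baidu.com/question/1818975931865141188.html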
(9) Copying or mirroring an entire site: wget has options that, like a crawler, recursively collect all the URL links on a web page and download them one by one:
wget --mirror http://192.168.23.135/
Or, mirror the entire web site using the recursive options directly:
wget -r -N -l DEPTH http://192.168.23.135/
-r: traverse the web pages recursively
-N: use timestamping for the downloaded files
-l: specifies the depth of recursion; wget only descends the specified number of page levels
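A concrete invocation under the same assumptions (the depth value of 2 is only illustrative):
wget -r -N -l 2 http://192.168.23.135/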
(10) Accessing an HTTP or FTP page that requires authentication:
--user and --password supply the authentication credentials:
wget --user USERNAME --password PASSWORD ftp://192.168.23.21/
(11) Downloading web pages as formatted plain text: use the -dump option of the lynx command to save the web page as ASCII text in a file:
lynx -dump http://www.runoob.com/linux/linux-shell-io-redirections.html > webpage_as_text.txt
This command also lists all hyperlinks (<a href="link">) as plain text at the bottom of the output, under a "References" heading.
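If only the link list is needed, one hedged approach (the grep pattern is an illustrative assumption, not part of the original text) is to filter the dump for URLs:
lynx -dump http://www.runoob.com/linux/linux-shell-io-redirections.html | grep -oE 'https?://[^ ]+' > links.txt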