Use wget to download an entire website under Linux

Source: Internet
Author: User

Under Linux, wget can download an entire website, and links containing UTF-8-encoded Chinese characters are handled correctly as well.

In short, the command is:

    wget --restrict-file-names=ascii -m -c -nv -np -k -E -p -R exe,zip http://www.xxx.com

The options are explained as follows:

- --restrict-file-names=ascii: save file names in ASCII form (non-ASCII bytes are escaped), which avoids the trouble caused by UTF-8 file names. Note: the ascii value is only supported by wget 1.12 and later.
- -m: download the whole site. Short for --mirror, which is a shortcut for -N -r -l inf --no-remove-listing; see the description of each of those options for details.
- -c: continue (resume) partially downloaded files.
- -nv: non-verbose; do not print detailed download information.
- -np: don't ascend to the parent directory, so the downloaded pages never go outside the URL given at the end of the command (http://www.xxx.com). If you specify http://www.xxx.com/aaa instead, only pages under http://www.xxx.com/aaa are downloaded.
- -k: after the download completes, convert the links in the pages to local links, which makes offline browsing (or building a CHM file, etc.) easier.
- -E: save HTML/CSS files with the appropriate suffix. On some sites, files are generated dynamically on the server side, so a file can be CSS even though its name does not end in .css; -E corrects the suffix.
- -p: download all media files (pictures, audio, video, etc.) that the HTML/CSS files need. With -p, -np restricts only page files; without -p, the media files an HTML page needs are also subject to the -np restriction.
- -R: a comma-separated list of file suffixes to reject (not download).
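To show why file names saved with the ascii restriction end up full of percent signs, here is a rough Python model of the idea: every byte of the UTF-8 file name outside the printable ASCII range is written as % followed by two uppercase hexadecimal digits. This is a simplified sketch, not wget's exact algorithm (wget's real rules for control characters and for characters already escaped in the URL are more involved), and the function name escape_non_ascii is hypothetical.

```python
def escape_non_ascii(name: str) -> str:
    """Simplified model of wget's --restrict-file-names=ascii escaping.

    Hypothetical helper for illustration only: each byte of the UTF-8
    encoded name outside printable ASCII (0x20-0x7E) becomes %XX.
    """
    out = []
    for byte in name.encode("utf-8"):
        if 0x20 <= byte <= 0x7E:
            # Printable ASCII is kept as-is.
            out.append(chr(byte))
        else:
            # Everything else is escaped as % plus two uppercase hex digits.
            out.append("%%%02X" % byte)
    return "".join(out)


if __name__ == "__main__":
    print(escape_non_ascii("中文.html"))  # %E4%B8%AD%E6%96%87.html
```

The Chinese name 中文 takes three UTF-8 bytes per character, so it turns into six %XX escapes, which is the form the renaming script below is meant to undo.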

If the downloaded file names end up in a form like %A7, that is, a percent sign followed by two hexadecimal digits, you can rename the files with a Python program:

```python
#!/usr/bin/env python3
# rename.py: change the encoding of file and directory names.
import getopt
import os
import sys
import urllib.parse


class Renamer:
    def __init__(self, input_encoding, output_encoding, path, is_url):
        self.input_encoding = input_encoding
        self.output_encoding = output_encoding
        # Work on bytes paths so names in any encoding can be read and renamed.
        self.path = os.fsencode(path)
        self.is_url = is_url

    def start(self):
        self.rename_dir(self.path)

    def rename(self, root, name):
        try:
            if self.is_url:
                # Turn %XX escapes back into raw bytes: b"%E4%B8%AD" -> b"\xe4\xb8\xad".
                raw = urllib.parse.unquote_to_bytes(name)
            else:
                raw = name
            new = raw.decode(self.input_encoding).encode(self.output_encoding)
            if new != name:
                os.rename(os.path.join(root, name), os.path.join(root, new))
        except (UnicodeError, OSError) as err:
            # Leave names that cannot be converted as they are, but report them.
            print("skipped %r: %s" % (os.path.join(root, name), err), file=sys.stderr)

    def rename_dir(self, path):
        # Walk bottom-up: the contents of a directory are renamed before the
        # directory itself, so the paths os.walk yields stay valid.
        for root, dirs, files in os.walk(path, topdown=False):
            for f in files:
                self.rename(root, f)
            for d in dirs:
                self.rename(root, d)


def usage():
    print("""This program can change the encoding of file or directory names.
Usage: rename.py [option]...
Options:
  -h, --help                 show this document.
  -i, --input-encoding=ENC   set the original encoding, default is utf-8.
  -o, --output-encoding=ENC  set the output encoding, default is gbk.
  -p, --path=PATH            choose the path to process.
  -u, --is-url               treat names as URL-encoded (%XX) text.""")


def main(argv):
    input_encoding = "utf-8"
    output_encoding = "gbk"
    path = ""
    is_url = False
    try:
        opts, args = getopt.getopt(
            argv, "hi:o:p:u",
            ["help", "input-encoding=", "output-encoding=", "path=", "is-url"])
    except getopt.GetoptError:
        usage()
        sys.exit(2)
    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit()
        elif opt in ("-i", "--input-encoding"):
            input_encoding = arg
        elif opt in ("-o", "--output-encoding"):
            output_encoding = arg
        elif opt in ("-p", "--path"):
            path = arg
        elif opt in ("-u", "--is-url"):
            is_url = True
    if not path:
        # Nothing to do without a directory to process.
        usage()
        sys.exit(2)
    rn = Renamer(input_encoding, output_encoding, path, is_url)
    rn.start()


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it on the download directory:

    python3 rename.py -i utf-8 -o gbk -p <download directory> -u

The file renaming method is taken from http://blog.csdn.net/kowity/article/details/6899256
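The core of the renaming script is a two-step conversion: percent-decode the name back to raw bytes, then decode those bytes with the input encoding and re-encode them with the output encoding. The standalone function below isolates that step so it can be tried on a single name without touching the disk. It is a sketch assuming Python 3; convert_name is a hypothetical helper that mirrors what Renamer.rename does for one name, minus the os.rename call.

```python
import urllib.parse


def convert_name(name: str, input_encoding: str = "utf-8",
                 output_encoding: str = "gbk") -> bytes:
    """Hypothetical helper: new file name (bytes) for one %XX-escaped name."""
    # Step 1: "%E4%B8%AD" -> b"\xe4\xb8\xad", the original encoded bytes.
    raw = urllib.parse.unquote_to_bytes(name)
    # Step 2: interpret the bytes in the input encoding, re-encode them.
    return raw.decode(input_encoding).encode(output_encoding)


if __name__ == "__main__":
    print(convert_name("%E4%B8%AD%E6%96%87.html"))  # b'\xd6\xd0\xce\xc4.html'
```

Passing output_encoding="utf-8" (the equivalent of -o utf-8 for the script) simply restores the readable UTF-8 name instead of producing GBK bytes.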
