On Linux, wget can be used to download an entire website, and site links that contain UTF-8 encoded Chinese are handled correctly as well.
The method, in brief, is:
wget --restrict-file-names=ascii -m -c -nv -np -k -E -p -R exe,zip http://www.xxx.com
The parameters are explained as follows:
--restrict-file-names=ascii  Save file names in ASCII. This avoids the trouble of UTF-8 file names. (Note: the ascii value is only supported from version 1.12 on.)
-m  Download the whole site. Short for mirror, it is shorthand for -N -r -l inf --no-remove-listing; see the description of each of those options for details.
-c  Continue (resume) interrupted downloads.
-nv  Do not show detailed download information.
-np  Don't ascend to the parent directory. That is, the downloaded pages do not go beyond the range of the http://www.xxx.com given at the end of the command. Of course, if you specify http://www.xxx.com/aaa, all downloaded pages will be under http://www.xxx.com/aaa.
-k  After the download completes, convert the links in the page files to local links, which is convenient for offline browsing, building a CHM, and so on.
-E  When saving HTML/CSS files, use the appropriate file suffix. For example, on some sites certain files are generated dynamically on the server side; such a file may be CSS even though its suffix is not .css, and the -E option adjusts it.
-p  -np restricts which page files are fetched, and without -p the media files an HTML page needs would be restricted by -np as well. -p downloads all the media files (images, audio, video, etc.) that the HTML/CSS files require.
-R  Comma-separated list of file suffixes to reject.
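If the mirror is started from a script rather than typed by hand, the option list above can be assembled as an argument list. The following is only a minimal sketch in Python 3: build_wget_command and mirror are hypothetical helper names, not part of wget, and the sketch assumes wget is installed and on the PATH.

```python
import subprocess

def build_wget_command(url, reject=("exe", "zip")):
    """Return the mirroring command described above as an argument list.

    Hypothetical helper: the option set is the one listed in the text.
    """
    cmd = [
        "wget",
        "--restrict-file-names=ascii",  # save file names in ASCII
        "-m",   # mirror the whole site
        "-c",   # resume interrupted downloads
        "-nv",  # less verbose output
        "-np",  # do not ascend to the parent directory
        "-k",   # convert links for offline browsing
        "-E",   # adjust HTML/CSS file suffixes
        "-p",   # also fetch the media files that pages require
    ]
    if reject:
        # -R takes one comma-separated list of suffixes
        cmd += ["-R", ",".join(reject)]
    cmd.append(url)
    return cmd

def mirror(url, reject=("exe", "zip")):
    """Run wget and return its exit status (assumes wget is on the PATH)."""
    return subprocess.call(build_wget_command(url, reject))

print(" ".join(build_wget_command("http://www.xxx.com")))
```

Passing the arguments as a list avoids shell quoting problems when the URL contains characters such as & or ?.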
As for downloaded files whose names end up in a form like %A7 (a percent sign followed by a hexadecimal number), you can use a Python 2 program to rename them:
————————————————————————————————————
import os, urllib, sys, getopt

class Renamer:
    def __init__(self, input_encoding, output_encoding, path, is_url):
        self.input_encoding = input_encoding
        self.output_encoding = output_encoding
        self.path = path
        self.is_url = is_url

    def start(self):
        self.rename_dir(self.path)

    def rename(self, root, path):
        try:
            if self.is_url:
                # Undo the %XX escaping first, then convert the encoding.
                new = urllib.unquote(path).decode(self.input_encoding).encode(self.output_encoding)
            else:
                new = path.decode(self.input_encoding).encode(self.output_encoding)
            os.rename(os.path.join(root, path), os.path.join(root, new))
        except (UnicodeError, LookupError, OSError):
            # Skip names that cannot be converted or renamed.
            pass

    def rename_dir(self, path):
        # Walk bottom-up so that the contents of a directory are renamed
        # before the directory itself.
        for root, dirs, files in os.walk(path, topdown=False):
            for name in files + dirs:
                self.rename(root, name)

def usage():
    print '''This program can change encode of files or directories.
    Usage:   rename.py [option]...
    Options:
        -h, --help                      this document.
        -i, --input-encoding=ENC        set original encoding, default is UTF-8.
        -o, --output-encoding=ENC       set output encoding, default is GBK.
        -p, --path=PATH                 choose the path which to process.
        -u, --is-url                    whether as a URL
    '''

def main(argv):
    input_encoding = "utf-8"
    output_encoding = "gbk"
    path = ""
    is_url = False  # off by default; -u turns URL unquoting on

    try:
        opts, args = getopt.getopt(argv, "hi:o:p:u", ["help", "input-encoding=", "output-encoding=", "path=", "is-url"])
    except getopt.GetoptError:
        usage()
        sys.exit(2)
    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit()
        elif opt in ("-i", "--input-encoding"):
            input_encoding = arg
        elif opt in ("-o", "--output-encoding"):
            output_encoding = arg
        elif opt in ("-p", "--path"):
            path = arg
        elif opt in ("-u", "--is-url"):
            is_url = True

    rn = Renamer(input_encoding, output_encoding, path, is_url)
    rn.start()

if __name__ == '__main__':
    main(sys.argv[1:])
————————————————————————————————————
rename.py -i utf-8 -o gbk -p <download directory> -u
The file renaming method is from http://blog.csdn.net/kowity/article/details/6899256
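The script above relies on Python 2 features (urllib.unquote, the print statement, str.decode). For readers on Python 3, the same renaming idea can be sketched as follows. This is not the original author's code: decode_names is a hypothetical function name, and the sketch assumes the percent-encoded bytes are UTF-8. In Python 3, file names are text strings that the operating system encodes with its own file system encoding, so the sketch has no counterpart to the -o option.

```python
import os
from urllib.parse import unquote

def decode_names(top, encoding="utf-8"):
    """Rename percent-encoded files and directories under top.

    Returns a list of (old_name, new_name) pairs that were renamed.
    Hypothetical helper; assumes the %XX bytes are in the given encoding.
    """
    renamed = []
    # Bottom-up, so entries inside a directory are handled before
    # the directory itself is renamed.
    for root, dirs, files in os.walk(top, topdown=False):
        for name in files + dirs:
            try:
                # errors="strict" raises on invalid byte sequences
                # instead of silently substituting characters.
                new = unquote(name, encoding=encoding, errors="strict")
            except UnicodeDecodeError:
                continue
            # Skip unchanged names, names that would contain a path
            # separator (for example from %2F), and existing targets.
            if new == name or os.sep in new:
                continue
            if os.path.exists(os.path.join(root, new)):
                continue
            os.rename(os.path.join(root, name), os.path.join(root, new))
            renamed.append((name, new))
    return renamed
```

Calling decode_names on the download directory would, for example, turn a file named %E4%B8%AD%E6%96%87.html into 中文.html, provided the file system accepts that name.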
Use wget to download the entire website under Linux