Reposted from: http://blog.itpub.net/29867/viewspace-716088/
(Partially modified.)
wget --restrict-file-names=ascii -m -c -nv -np -k -E -p http://www.w3school.com.cn/
wget --restrict-file-names=ascii -m -c -nv -np -k -E -p http://scrapy-chs.readthedocs.org
The parameters mean the following:
--restrict-file-names=ascii  Save file names in ASCII. This avoids the trouble caused by UTF-8 file names. (Note: the ascii value is supported as of version 1.12.)
-m  Download the whole site. Short for --mirror, which is itself a shortcut for -r -N -l inf --no-remove-listing; see the description of each of those options for details.
-c  Resume interrupted downloads.
-nv  Do not print detailed download information.
-np  Do not ascend to the parent directory. The downloaded pages stay within the http://www.xxx.com given on the command line. If you specify http://www.xxx.com/aaa, every downloaded page will be under http://www.xxx.com/aaa.
-k  After the download finishes, convert the links in the pages to local links, which makes offline browsing and building a CHM easy.
-E  Save HTML/CSS files with the appropriate file suffix. On some sites, files are generated dynamically on the server, so a CSS file may not have a .css suffix; -E adjusts the suffix.
-p  -np restricts page files, and without -p the media files that a page needs are restricted by -np as well. With -p, wget downloads all the media files the HTML/CSS files require (images, audio, video, and so on).
-R  List of file suffixes to reject, separated by commas.
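The options above can also be assembled programmatically, which helps when the same mirror job is run against several sites. The following is only a sketch: build_wget_mirror_command is a hypothetical helper name, the reject suffixes are made-up examples, and the long option names are the spelled-out forms of the short flags listed above.

```python
import shlex


def build_wget_mirror_command(url, reject=None):
    """Return the wget argument list for mirroring `url` (hypothetical helper)."""
    cmd = [
        "wget",
        "--restrict-file-names=ascii",  # keep saved file names ASCII-only
        "--mirror",                     # -m: -r -N -l inf --no-remove-listing
        "--continue",                   # -c: resume partial downloads
        "--no-verbose",                 # -nv: terse output
        "--no-parent",                  # -np: stay below the start URL
        "--convert-links",              # -k: rewrite links for local browsing
        "--adjust-extension",           # -E: fix .html/.css suffixes
        "--page-requisites",            # -p: fetch images, CSS, and so on
    ]
    if reject:
        # -R: comma-separated list of suffixes to skip
        cmd.append("--reject=" + ",".join(reject))
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    # The suffixes here are illustrative only.
    cmd = build_wget_mirror_command("http://www.w3school.com.cn/", reject=["zip", "exe"])
    # Print a shell-safe command line; pass `cmd` to subprocess.run to execute it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Returning a list rather than a single string means the command can be handed to subprocess.run without shell quoting problems.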
Some downloaded files end up with names such as %A7, that is, a percent sign followed by a hexadecimal number. You can use a Python program to rename them:
————————————————————————————————————
# Python 2 script: it relies on urllib.unquote, byte-string decode/encode
# and the print statement.
import os, urllib, sys, getopt


class Renamer:
    input_encoding = ""
    output_encoding = ""
    path = ""
    is_url = False

    def __init__(self, input, output, path, is_url):
        self.input_encoding = input
        self.output_encoding = output
        self.path = path
        self.is_url = is_url

    def start(self):
        self.rename_dir(self.path)

    def rename(self, root, path):
        try:
            if self.is_url:
                # Undo the %XX escapes first, then convert the encoding.
                new = urllib.unquote(path).decode(self.input_encoding).encode(self.output_encoding)
            else:
                new = path.decode(self.input_encoding).encode(self.output_encoding)
            os.rename(os.path.join(root, path), os.path.join(root, new))
        except (OSError, UnicodeError):
            # Skip names that cannot be converted or renamed.
            pass

    def rename_dir(self, path):
        # Walk bottom-up so that a directory is renamed only after
        # everything inside it has been processed.
        for root, dirs, files in os.walk(path, topdown=False):
            for f in files:
                self.rename(root, f)
            for d in dirs:
                self.rename(root, d)


def usage():
    print '''This program changes the encoding of file and directory names.
Usage: rename.py [OPTION]...
Options:
    -h, --help                  this document.
    -i, --input-encoding=ENC    set original encoding, default is UTF-8.
    -o, --output-encoding=ENC   set output encoding, default is GBK.
    -p, --path=PATH             choose the path to process.
    -u, --is-url                treat names as URL-encoded
'''


def main(argv):
    input_encoding = "utf-8"
    output_encoding = "gbk"
    path = ""
    is_url = True  # already True by default, so -u does not change it

    try:
        opts, args = getopt.getopt(argv, "hi:o:p:u",
                                   ["help", "input-encoding=", "output-encoding=", "path=", "is-url"])
    except getopt.GetoptError:
        usage()
        sys.exit(2)

    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit()
        elif opt in ("-i", "--input-encoding"):
            input_encoding = arg
        elif opt in ("-o", "--output-encoding"):
            output_encoding = arg
        elif opt in ("-p", "--path"):
            path = arg
        elif opt in ("-u", "--is-url"):
            is_url = True

    rn = Renamer(input_encoding, output_encoding, path, is_url)
    rn.start()


if __name__ == '__main__':
    main(sys.argv[1:])
————————————————————————————————————
rename.py -i utf-8 -o gbk -p <specified download directory> -u
The file renaming method comes from http://blog.csdn.net/kowity/article/details/6899256
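The renaming script above is written for Python 2. For readers on Python 3, the percent-decoding step can be sketched as follows. This is not the original author's code: decode_name and rename_tree are hypothetical names, and the sketch assumes the escaped bytes are UTF-8 and leaves the choice of on-disk encoding to the operating system instead of taking an output encoding.

```python
import os
import tempfile
from urllib.parse import unquote


def decode_name(name, encoding="utf-8"):
    """Turn a percent-encoded name such as 'my%20docs' into readable text."""
    return unquote(name, encoding=encoding, errors="strict")


def rename_tree(top, encoding="utf-8"):
    """Percent-decode every file and directory name under `top`.

    Returns the number of entries that were renamed.
    """
    renamed = 0
    # Walk bottom-up so a directory is renamed only after its contents.
    for root, dirs, files in os.walk(top, topdown=False):
        for name in files + dirs:
            try:
                new = decode_name(name, encoding)
            except UnicodeDecodeError:
                continue  # escaped bytes are not valid in this encoding
            if new == name or "/" in new or os.sep in new:
                continue  # nothing to decode, or the result would contain a path separator
            src = os.path.join(root, name)
            dst = os.path.join(root, new)
            if os.path.exists(dst):
                continue  # never overwrite an existing entry
            os.rename(src, dst)
            renamed += 1
    return renamed


if __name__ == "__main__":
    # Self-contained demo in a throwaway directory.
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "my%20docs"))
        open(os.path.join(top, "my%20docs", "page%2B1.html"), "w").close()
        print(rename_tree(top))
        print(os.listdir(os.path.join(top, "my docs")))
```

The sketch skips any name it cannot decode or whose decoded form already exists, so it is safe to run more than once on the same directory.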
How to download an entire website with wget