How to download an entire website with wget

Source: Internet
Author: User

Transferred from: http://blog.itpub.net/29867/viewspace-716088/

(Part of the content has been modified.)

wget --restrict-file-names=ascii -m -c -nv -np -k -E -p http://www.w3school.com.cn/
wget --restrict-file-names=ascii -m -c -nv -np -k -E -p http://scrapy-chs.readthedocs.org

The parameters are interpreted as follows:

--restrict-file-names=ascii: save file names in ASCII form, escaping any bytes outside the ASCII range. This avoids the hassle of UTF-8 file names. (Note: the ascii value is supported as of version 1.12.)

-m: download the whole site. Short for --mirror, which is itself a shortcut for -r -N -l inf --no-remove-listing; see the explanation of each of those options for details.

-c: resume interrupted downloads by continuing partially downloaded files.

-nv: non-verbose; do not show detailed download information.

-np: do not ascend to the parent directory. That is, the downloaded pages do not go beyond the scope of the http://www.xxx.com given at the end of the command. Likewise, if you specify http://www.xxx.com/aaa, all pages will be under http://www.xxx.com/aaa.

-k: when the download is complete, convert the links in the page files to local links, which makes it easy to browse offline and to build a CHM file.

-E: when saving HTML/CSS files, use the appropriate file suffix. For example, on some websites certain files are generated dynamically on the server side; although they are CSS files, their suffix is not .css. The -E option adjusts this.

-p: -np restricts page files, and without -p the media files that the HTML needs would also be restricted by -np. With -p, all media files required by the HTML/CSS files (images, audio, video, etc.) are downloaded.

-R: a comma-separated list of file suffixes to reject, that is, not to download.
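
As a rough sketch, the command line above can also be assembled from Python so that each flag stays next to its meaning. The helper name `build_wget_command` and its `reject` parameter are hypothetical and chosen only for illustration; the wget options themselves are the ones explained above.

```python
# Flags from the list above, each paired with a short reminder.
MIRROR_FLAGS = [
    "--restrict-file-names=ascii",  # escape non-ASCII bytes in file names
    "-m",   # mirror: -r -N -l inf --no-remove-listing
    "-c",   # continue partially downloaded files
    "-nv",  # non-verbose output
    "-np",  # never ascend to the parent directory
    "-k",   # convert links for local browsing
    "-E",   # adjust file suffixes (.html, .css)
    "-p",   # download page requisites (images, CSS, ...)
]


def build_wget_command(url, reject=None):
    """Return the argument list for mirroring `url` (hypothetical helper)."""
    cmd = ["wget"] + MIRROR_FLAGS
    if reject:
        # -R takes a comma-separated list of suffixes to reject.
        cmd += ["-R", ",".join(reject)]
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    command = build_wget_command("http://www.w3school.com.cn/", reject=["zip", "exe"])
    print(" ".join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` to start the download, provided wget is installed and on the PATH.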

Some downloaded files end up with names such as %a7, that is, a percent sign followed by hexadecimal digits. You can use a Python program to rename them:

————————————————————————————————————

# Python 2 script: it relies on urllib.unquote and on str.decode()/encode().
import os, urllib, sys, getopt

class Renamer:
    input_encoding = ""
    output_encoding = ""
    path = ""
    is_url = False

    def __init__(self, input, output, path, is_url):
        self.input_encoding = input
        self.output_encoding = output
        self.path = path
        self.is_url = is_url

    def start(self):
        self.rename_dir(self.path)

    def rename(self, root, path):
        # Rename a single entry; entries that cannot be converted are skipped.
        try:
            if self.is_url:
                # Undo percent-escapes such as %a7 before re-encoding.
                new = urllib.unquote(path).decode(self.input_encoding).encode(self.output_encoding)
            else:
                new = path.decode(self.input_encoding).encode(self.output_encoding)
            os.rename(os.path.join(root, path), os.path.join(root, new))
        except (OSError, UnicodeError):
            pass

    def rename_dir(self, path):
        # os.walk already recurses. Walking bottom-up renames a directory
        # only after everything inside it has been processed.
        for root, dirs, files in os.walk(path, topdown=False):
            for name in files + dirs:
                self.rename(root, name)

def usage():
    print('''This program can change the encoding of file or directory names.
Usage: rename.py [OPTION]...
Options:
    -h, --help                 show this help text.
    -i, --input-encoding=ENC   set original encoding, default is UTF-8.
    -o, --output-encoding=ENC  set output encoding, default is GBK.
    -p, --path=PATH            choose the path to process.
    -u, --is-url               treat names as URL-encoded (percent-escaped).
''')

def main(argv):
    input_encoding = "utf-8"
    output_encoding = "gbk"
    path = ""
    is_url = False  # off unless -u is given, so that the flag has an effect

    try:
        opts, args = getopt.getopt(argv, "hi:o:p:u", ["help", "input-encoding=", "output-encoding=", "path=", "is-url"])
    except getopt.GetoptError:
        usage()
        sys.exit(2)

    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit()
        elif opt in ("-i", "--input-encoding"):
            input_encoding = arg
        elif opt in ("-o", "--output-encoding"):
            output_encoding = arg
        elif opt in ("-p", "--path"):
            path = arg
        elif opt in ("-u", "--is-url"):
            is_url = True

    rn = Renamer(input_encoding, output_encoding, path, is_url)
    rn.start()

if __name__ == '__main__':
    main(sys.argv[1:])

————————————————————————————————————

rename.py -i utf-8 -o gbk -p <specified download directory> -u

The file renaming method is from http://blog.csdn.net/kowity/article/details/6899256
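
The script above targets Python 2. On Python 3, file names are already Unicode strings, so a port mainly needs to undo the percent-escaping. The following is a minimal sketch under that assumption, not the original author's code: the function name `unquote_names` is hypothetical, and the escaped bytes are assumed to be UTF-8 unless another encoding is passed.

```python
import os
import tempfile
from urllib.parse import unquote


def unquote_names(top, encoding="utf-8"):
    """Rename percent-escaped files and directories under `top`.

    Returns a list of (old_path, new_path) pairs that were renamed.
    """
    renamed = []
    # Walk bottom-up so a directory is renamed only after its contents.
    for root, dirs, files in os.walk(top, topdown=False):
        for name in files + dirs:
            try:
                # errors="strict" raises instead of inserting U+FFFD.
                new_name = unquote(name, encoding=encoding, errors="strict")
            except UnicodeDecodeError:
                continue  # bytes are not valid in this encoding; leave as-is
            # Skip unchanged names and names that would escape `root`.
            if new_name == name or os.path.basename(new_name) != new_name:
                continue
            old_path = os.path.join(root, name)
            new_path = os.path.join(root, new_name)
            if os.path.exists(new_path):
                continue  # do not overwrite an existing entry
            os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed


if __name__ == "__main__":
    # Small demonstration in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "hello%20world.html"), "w").close()
        for old, new in unquote_names(tmp):
            print(os.path.basename(old), "->", os.path.basename(new))
```

Unlike the original script, this sketch has no output-encoding option: Python 3 leaves the on-disk encoding of file names to the operating system.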
