A Python Reptile applet

Source: Internet
Author: User

Cause

Late at night suddenly want to download a little ebook to expand the Kindle, just think of Python learning too shallow, what "decorator" Ah, "multi-threaded" Ah did not learn.

It is a classic and famous Python tutorial to think of the great God of Liao Xuefeng. Just want to find the wood has a pdf version of the download, the results did not find!! CSDN has an incomplete and cheated me out of a point!! Nima!!

Angry, ready to write a program directly to climb the Liu Xuefeng's tutorial, and then HTML into an ebook.

Process

The process is interesting, with a shallow python knowledge, a Python program, and a Python tutorial to learn about Python. Think of a little excitement ...

Sure enough, Python is very convenient, 50 rows or so OK. Direct Sticker Code:

#Coding:utf-8ImportUrllibdomain='http://www.liaoxuefeng.com'           #Liaoche's domain namePath = R'C:\Users\cyhhao2013\Desktop\temp\\'    #HTML to save the path#a header file for HTMLInput = open (r'C:\Users\cyhhao2013\Desktop\0.html','R') Head=Input.read ()#Open the Python tutorial main interfacef = Urllib.urlopen ("http://www.liaoxuefeng.com/wiki/001374738125095c955c1e6d8bb493182103fac9270762a000") Home=F.read () f.close ( )#Replace all spaces carriage return (so easy to get URLs)Geturl = Home.replace ("\ n","") Geturl= Geturl.replace (" ","")#get the string containing the URLList = Geturl.split (r'em; " ><ahref= "') [1:]#Obsessive Compulsive Disorder, make sure you add the first page to perfection.List.insert (0,'/wiki/001374738125095c955c1e6d8bb493182103fac9270762a000 ">')#start traversing the URL List forLiinchList:url= Li.split (r'">') [0] URL= domain + URL#Patchwork URLs    PrintURL F=urllib.urlopen (URL) HTML=F.read ()#get title in order to write file nametitle = Html.split ("<title>") [1] Title= Title.split ("-Liaoche's official website </title>") [0]#to turn the code, or add to the path is tragictitle = Title.decode ('Utf-8'). Replace ("/"," ")    #Intercept Texthtml = Html.split (r'<!--block main ---') [1] HTML= Html.split (r'') [0] HTML= Html.replace (r'src= "','src= "'+domain)#plus the head and tail make up the complete HTMLhtml = head + html+"</body>"    #Output FileOutput = open (path +"%d"% List.index (LI) + title +'. html','W') output.write (HTML) output.close ()

Life is short, I use Python!

At last

Attach HTML to epub ebook format online link: html.toepub.com

and Liaoche Tutorial: Link Liaoche git tutorial is also very very good oh ~

Add your own →_→ GitHub (the crawled HTML is on GitHub, too)

and personal blog: Http://blog.zhusun.in/cyhhao

Original from: A Python crawler applet

By:cyhhao http://blog.zhusun.in/cyhhao/

A Python Reptile applet

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.