Python Writing crawler Applet

Source: Internet
Author: User
Tags html header
causes

Late at night suddenly want to download a little ebook to expand the Kindle, just think of Python learning too shallow, what "decorator" Ah, "multi-threaded" Ah did not learn.
It is a classic and famous Python tutorial to think of the great God of Liao Xuefeng. Just want to find the wood has a pdf version of the download, the results did not find!! CSDN has an incomplete and cheated me out of a point!! Nima!!
Angry, ready to write a program directly to climb the Liu Xuefeng's tutorial, and then HTML into an ebook.

Process

The process is interesting, with a shallow python knowledge, a Python program, and a Python tutorial to learn about Python. Think of a little excitement ...
Sure enough, Python is very convenient, 50 rows or so OK. Direct Sticker Code:

# Coding:utf-8import Urllibdomain = ' http://www.liaoxuefeng.com ' #廖雪峰的域名path = R ' C:\Users\cyhhao2013\Desktop\temp\\ ' #html要保存的路径 # An HTML header file input = open (R ' C:\Users\cyhhao2013\Desktop\0.html ', ' r ') head = Input.read () # Open the Python tutorial main interface f = Urllib.urlopen ("http://www.liaoxuefeng.com/wiki/001374738125095c955c1e6d8bb493182103fac9270762a000") Home = F.read () F.close () # Replace all spaces carriage return (so easy to get URL) Geturl = Home.replace ("\ n", "") Geturl = Geturl.replace ("", "") # Gets the string containing the URL list = Geturl.split (R ' em; " > ') # Start traversing URLs listfor li in list:url = Li.split (R ' "> ') [0] url = domain + URL #拼凑url print url f = urllib.u Rlopen (URL) html = f.read () # Gets the title in order to write the filename title = Html.split ("<title>") [1] title = Title.split ("-Liaoche's official website</title>[0] # To turn the code, or add to the path is tragic title = Title.decode (' utf-8 '). Replace ("/", "") # truncate BODY HTML = Html.split (R ')
 ') [1] html = html.split (R '

Your support is the author of the most powerful power to write!

') [0] html = html.replace (R ' src= "', ' src=" ' + domain) # Plus head and tail make up the complete HTML HTML = head + html+ ""# Output File = open (path +"%d "% List.index (LI) + title + '. html ', ' W ') output.write (HTML) output.close ()

Life is short, I use Python!

The above mentioned is the whole content of this article, I hope you can like.

  • Contact Us

    The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

    If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

    A Free Trial That Lets You Build Big!

    Start building with 50+ products and up to 12 months usage for Elastic Compute Service

    • Sales Support

      1 on 1 presale consultation

    • After-Sales Support

      24/7 Technical Support 6 Free Tickets per Quarter Faster Response

    • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.