Python makes it easy to write a simple crawler that fetches the pages we want and saves them locally. The following describes how to implement one.
Motivation
Late at night, I suddenly wanted to download some ebooks to fill up my Kindle. Then it struck me that my Python was still too shallow: I had not even learned "decorators" or "multithreading".
I thought of Liao Xuefeng's Python tutorial, which is classic and famous. I just wanted to find a PDF to download, but I could not find one!! An incomplete copy on CSDN even cost me a download credit!! Damn!!
Angry, I decided to write a program to crawl Liao Xuefeng's tutorial directly and then convert the html into an ebook.
Process
The process was quite fun: I used my superficial Python knowledge to write a Python program that crawls a Python tutorial so that I can learn Python. A little exciting ......
Sure enough, Python is very convenient, and about 50 lines are enough. Here is the code:
# coding: utf-8
# Python 2 (urllib.urlopen and the print statement)
import urllib

domain = 'http://www.liaoxuefeng.com'  # Liao Xuefeng's domain name
path = 'C:\\Users\\cyhhao2013\\Desktop\\temp\\'  # directory where the html files are saved

# Read the html header from a local file
head_file = open(r'C:\Users\cyhhao2013\Desktop\0.html', 'r')
head = head_file.read()
head_file.close()

# Open the main page of the Python tutorial
f = urllib.urlopen("http://www.liaoxuefeng.com/wiki/001374738125095c955c1e6d8bb493182103fac9270762a000")
home = f.read()
f.close()

# Remove all newlines and spaces (this makes it easy to get the urls)
geturl = home.replace("\n", "")
geturl = geturl.replace(" ", "")

# Get the list of strings that contain the urls. With the spaces removed,
# each link reads ...em;"><ahref="/wiki/...">, so split just before the url
# and skip the first chunk, which is the text before the first link.
links = geturl.split(r'em;"><ahref="')[1:]

# Start to traverse the url list
for index, li in enumerate(links):
    url = li.split(r'">')[0]
    url = domain + url  # splice the url
    print url
    f = urllib.urlopen(url)
    html = f.read()
    f.close()

    # Get the title, which is used in the file name
    title = html.split("<title>")[1]
    title = title.split("-liao Xuefeng's official website")[0]

    # Decode it first, otherwise adding it to the path ends in tragedy
    title = title.decode('utf-8').replace("/", "")

    # Cut out the body; both marker strings must match the page text exactly
    html = html.split(r'...')[1]  # marker that opens the article body (not preserved here)
    html = html.split(r'your support is the greatest motivation for writing!')[0]
    html = html.replace(r'src="', 'src="' + domain)

    # Add the header and the tail to form a complete html document
    html = head + html + "</body></html>"

    # Write the output file
    output = open(path + "%d" % index + title + '.html', 'w')
    output.write(html)
    output.close()
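The string splitting in the listing above only works while the page markup stays exactly the same. As a rough sketch of an alternative (written for Python 3, while the listing itself is Python 2), the link-collecting and file-naming steps could be done with the standard library's html.parser and urllib.parse. The names LinkCollector, collect_links and output_name, the "/wiki/" prefix filter, and the sample markup are all made up for illustration and are not part of the original script.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

DOMAIN = "http://www.liaoxuefeng.com"  # same domain as in the listing above


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag whose path starts with a prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep each matching link once, in the order it appears on the page.
        if href and href.startswith(self.prefix) and href not in self.links:
            self.links.append(href)


def collect_links(page_html, prefix="/wiki/"):
    """Return absolute urls for the matching links, in page order."""
    parser = LinkCollector(prefix)
    parser.feed(page_html)
    return [urljoin(DOMAIN, href) for href in parser.links]


def output_name(index, title):
    """Build a file name such as '3Title.html', dropping '/' from the title."""
    return "%d%s.html" % (index, title.replace("/", ""))


# Hypothetical index markup, only to show the calls.
sample = ('<ul><li style="margin-left:1em;"><a href="/wiki/abc">Intro</a></li>'
          '<li><a href="/about">About</a></li></ul>')
print(collect_links(sample))
print(output_name(0, "Input/Output"))
```

Parsing the tags instead of splitting on a fixed string means the spaces and newlines no longer have to be stripped first, and small changes in the attributes around each link should not break the extraction.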
Life is short, I use Python!
The above is all the content of this article. I hope you will like it.