Usage: Enter the paginated thread address with the trailing page number removed, then set the start and end page numbers.
Function: Downloads every page in the given range and saves each one as an HTML file named with the current time.
Code:
# -*- coding: utf-8 -*-
#----------------------------
#   Program:  Baidu Tieba crawler
#   Date:     2015/03/28
#   Language: Python 2.7
#   Usage:    enter the paginated address with the trailing page number removed,
#             then set the start and end page numbers
#   Function: download every page in the range and save each as an HTML file
#             named with the current time
#----------------------------

import urllib2
import time

def baidu_tieba(url, start, end):
    for i in range(start, end):
        # Name each file with a timestamp plus the page number
        sName = time.strftime('%Y%m%d%H%M%S') + str(i) + '.html'
        print 'Downloading page ' + str(i) + ' and saving it as ' + sName + '......'
        f = open(sName, 'w+')
        m = urllib2.urlopen(url + str(i))
        n = m.read()
        f.write(n)
        f.close()
    print 'Download finished'

baiduurl = str(raw_input('Please enter the address of the post, without the number after pn >>\n'))
begin_page = int(raw_input('Please enter the starting page number >>\n'))
end_page = int(raw_input('Please enter the ending page number >>\n'))

baidu_tieba(baiduurl, begin_page, end_page)
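If you prefer to skip the interactive prompts, the function can also be called directly. The sketch below is only an illustration of the expected input: the thread URL is a made-up example of a paginated address with the trailing page number removed, not one taken from this article.

# A minimal sketch, assuming a paginated URL that ends in 'pn=' so the
# page number can simply be appended; the thread id here is hypothetical.
example_url = 'http://tieba.baidu.com/p/1234567890?pn='

# Download pages 1 through 4 (range() excludes the end value) and save
# each as a timestamped HTML file in the current directory.
baidu_tieba(example_url, 1, 5)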
That is the whole content of this article. I hope it helps you learn how to write a crawler in Python.