[Python] Web crawler (6): example code for a simple Baidu Tieba (Post Bar) crawler
# -*- coding: utf-8 -*-
# ---------------------------------------
# Program:   Baidu Tieba crawler
# Version:   0.1
# Author:    why
# Date:      2013-05-14
# Language:  Python 2.7
# Operation: Enter the post address with pagination, remove the trailing
#            page number, and set the start and end pages.
# Function:  Download every page in that range and store each as an HTML file.
# ---------------------------------------

import string, urllib2

# Define the download function
def baidu_tieba(url, begin_page, end_page):
    for i in range(begin_page, end_page + 1):
        sName = string.zfill(i, 5) + '.html'  # zero-pad the file name to five digits
        print 'Downloading page ' + str(i) + ' and storing it as ' + sName + '......'
        f = open(sName, 'w+')
        m = urllib2.urlopen(url + str(i)).read()
        f.write(m)
        f.close()

# -------- Enter the parameters here ------------------

# This is the address of a post in the Shandong University Baidu Tieba
# bdurl = 'http://tieba.baidu.com/p/2296017831?pn='
# iPostBegin = 1
# iPostEnd = 10

bdurl = str(raw_input(u'Enter the Tieba post address, without the number after pn=:\n'))
begin_page = int(raw_input(u'Enter the start page number:\n'))
end_page = int(raw_input(u'Enter the end page number:\n'))
# -------- Enter the parameters here ------------------

# Call the function
baidu_tieba(bdurl, begin_page, end_page)
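The script above targets Python 2.7, where `urllib2`, `raw_input`, and the `print` statement exist. As a rough sketch of how the same loop might look on Python 3, the version below swaps in `urllib.request.urlopen` and `str.zfill`. The `out_dir` and `fetcher` parameters and the `page_filename` helper are not in the original; they are assumptions added here so the loop can be exercised without touching the network.

```python
import os
from urllib.request import urlopen


def page_filename(page):
    """Return the zero-padded file name for a page number (five digits)."""
    return str(page).zfill(5) + '.html'


def fetch(url):
    """Download a URL and return the raw response bytes."""
    with urlopen(url) as response:
        return response.read()


def baidu_tieba(url, begin_page, end_page, out_dir='.', fetcher=fetch):
    """Save pages begin_page..end_page (inclusive) as HTML files in out_dir.

    out_dir and fetcher are hypothetical additions, not part of the
    original script; fetcher must take a URL and return bytes.
    """
    saved = []
    for i in range(begin_page, end_page + 1):
        name = page_filename(i)
        path = os.path.join(out_dir, name)
        print('Downloading page ' + str(i) + ' and storing it as ' + name + ' ...')
        # Write bytes directly so the page's own encoding is preserved.
        with open(path, 'wb') as f:
            f.write(fetcher(url + str(i)))
        saved.append(path)
    return saved
```

To use it interactively in the same way as the original, one could read `bdurl`, `begin_page`, and `end_page` with `input()` and then call `baidu_tieba(bdurl, begin_page, end_page)`.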