Scraping web page data with Python and saving it to Excel

Read the web pages with urllib, then write the Excel file with the xlwt package. (The code below is Python 2: it uses print statements, urllib.urlopen, and the old except syntax.)

import sys
import urllib
import datetime
from xlwt import Workbook


def FetchData():
    book = Workbook(encoding='gbk')     # needed when the scraped data contains Chinese
    sheet1 = book.add_sheet('Sheet 2')  # in-memory worksheet
    i = 0
    theday = datetime.date(2009, 12, 31)
    while i < 100:                      # scrape 100 pages; each URL contains a date
        i += 1
        theday = theday + datetime.timedelta(days=1)
        print theday
        theday_str = str(theday)
        sheet1.write(i, 0, theday_str)  # column 0: the date
        check_url = r'http://www.xxx.com/index?date=' + theday_str  # page URL
        try:
            checkfile = urllib.urlopen(check_url)   # open the page as a file-like object
        except Exception, e:
            print e
            return
        fs_encoding = sys.getfilesystemencoding()
        for line in checkfile:
            line = line.decode("UTF-8").encode(fs_encoding)  # the page is UTF-8 encoded
            date_west = getdata('date_west', line)           # extract the target field
            if date_west != False:
                sheet1.write(i, 1, date_west)                # column 1: extracted value
    book.save('simple.xls')             # save the Excel file
    print 'finish!'


def getdata(keywords, line):
    """If keywords appears in the line, return the text between '>' and '</'."""
    if keywords in line:
        start = line.find('>')
        end = line.find('</', start)
        data = line[start + 1:end]
        return data
    return False
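If you are on Python 3, a minimal sketch of the same flow might look like the following. It assumes xlwt (which still writes .xls files under Python 3) is installed and that the target pages are UTF-8 HTML; the URL http://www.xxx.com/index?date=... and the 'date_west' marker are placeholders carried over from the script above, not a real endpoint.

import datetime
import urllib.request
from xlwt import Workbook


def getdata(keywords, line):
    """Return the text between '>' and '</' if keywords appears in line, else None."""
    if keywords in line:
        start = line.find('>')
        end = line.find('</', start)
        return line[start + 1:end]
    return None


def fetch_data():
    book = Workbook(encoding='utf-8')
    sheet1 = book.add_sheet('Sheet 2')
    theday = datetime.date(2009, 12, 31)
    for i in range(1, 101):                       # 100 pages, one per day
        theday += datetime.timedelta(days=1)
        theday_str = str(theday)
        sheet1.write(i, 0, theday_str)            # column 0: the date
        check_url = 'http://www.xxx.com/index?date=' + theday_str
        try:
            with urllib.request.urlopen(check_url) as resp:
                for raw in resp:                  # iterate the response line by line
                    line = raw.decode('utf-8', errors='replace')
                    date_west = getdata('date_west', line)
                    if date_west is not None:
                        sheet1.write(i, 1, date_west)  # column 1: extracted value
        except Exception as e:
            print(e)
            return
    book.save('simple.xls')
    print('finish!')


if __name__ == '__main__':
    fetch_data()

The main differences from the Python 2 version are urllib.request.urlopen in place of urllib.urlopen and explicit decoding of the response bytes; if you need .xlsx output instead of .xls, a library such as openpyxl could replace xlwt.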

 
