Python Web Scraping: Parsing Books from Douban's New Book Express (新書速遞)

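1- Problem Description

  Scrape the book information (title, author, summary, and URL) from Douban's "New Book Express" page [1] and write the results to a txt file.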

2- Approach [2]

  Step1 Read the HTML

  Step2 Use XPath to traverse the elements and attributes
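The two steps can be tried offline before touching the real site. The sketch below is only an illustration: SAMPLE_HTML, its class name "books", the example.com links, and the helper extract_links are all hypothetical and stand in for a downloaded page. It parses the string with lxml (Step1) and then reads element text and an attribute through XPath (Step2).

```python
from lxml import html

# Hypothetical stand-in for a downloaded page; not Douban's real markup.
SAMPLE_HTML = """
<html><body>
  <ul class="books">
    <li><a href="https://example.com/book/1">First Book</a></li>
    <li><a href="https://example.com/book/2">Second Book</a></li>
  </ul>
</body></html>
"""


def extract_links(page_source):
    """Return (title, href) pairs found in the hypothetical book list."""
    # Step1: turn the HTML text into an element tree.
    tree = html.fromstring(page_source)
    # Step2: text() selects text nodes, @href selects attribute values.
    titles = tree.xpath('//ul[@class="books"]/li/a/text()')
    links = tree.xpath('//ul[@class="books"]/li/a/@href')
    return list(zip(titles, links))


if __name__ == '__main__':
    for title, link in extract_links(SAMPLE_HTML):
        print(title, link)
```

An XPath query in lxml always returns a list, even when only one node matches, so the results can be paired up or indexed directly.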

 

3- Tools

  Python, the lxml module, the requests module

 

4- Implementation

 

# -*- coding: utf-8 -*-
from lxml import html
import requests


page = requests.get('http://book.douban.com/latest?icn=index-latestbook-all')
tree = html.fromstring(page.text)

# If you have saved the HTML file locally, use the following instead
# page = open('/home/freyr/codeHouse/python/512.htm', 'r').read()
# tree = html.fromstring(page)

# Extract the book information
bookname = tree.xpath('//div[@class="detail-frame"]/h2/text()')    # title
author = tree.xpath('//div[@class="detail-frame"]/p[@class="color-gray"]/text()')    # author
info = tree.xpath('//div[@class="detail-frame"]/p[2]/text()')    # summary
url = tree.xpath('//ul[@class="cover-col-4 clearfix"]/li/a[@href]')    # URL

booknames = [x.strip() for x in bookname]
authors = [x.strip() for x in author]
infos = [x.strip() for x in info]
urls = [a.get('href') for a in url]    # read href by name rather than by attribute position

# Open the output as UTF-8 text so the Chinese characters are written correctly
with open('/home/freyr/codeHouse/python/dbBook.txt', 'w', encoding='utf-8') as f:
    for book, author, info, url in zip(booknames, authors, infos, urls):
        f.write('%s\n\n%s\n\n%s' % (book, author, info))
        f.write('\n\n%s\n' % url)
        f.write('\n\n-----------------------------------------\n\n\n')
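One thing to watch in the listing above: the four fields are collected as four independent lists and then paired by position with zip. If any book lacks one field, zip stops at the shortest list and later entries can be paired with the wrong book. A possible alternative, sketched here under assumptions, is to walk each li once and query it with relative XPath. SAMPLE_HTML is hypothetical markup that only mirrors the class names in the selectors above (the real page may differ), and first_text and parse_books are illustrative helper names.

```python
from lxml import html

# Hypothetical markup that mirrors the class names used by the selectors in
# this article; the real page structure may differ.
SAMPLE_HTML = """
<html><body>
<ul class="cover-col-4 clearfix">
  <li>
    <a href="https://example.com/subject/1/">cover</a>
    <div class="detail-frame">
      <h2>Book One</h2>
      <p class="color-gray">Author One / 2015</p>
      <p>Summary of book one.</p>
    </div>
  </li>
  <li>
    <a href="https://example.com/subject/2/">cover</a>
    <div class="detail-frame">
      <h2>Book Two</h2>
      <p class="color-gray">Author Two / 2015</p>
    </div>
  </li>
</ul>
</body></html>
"""


def first_text(node, path):
    """Return the first stripped match of a relative XPath, or ''."""
    matches = node.xpath(path)
    return matches[0].strip() if matches else ''


def parse_books(page_source):
    """Build one dict per <li>, so the fields of a book stay together."""
    tree = html.fromstring(page_source)
    books = []
    for li in tree.xpath('//ul[@class="cover-col-4 clearfix"]/li'):
        frame = './div[@class="detail-frame"]'
        books.append({
            'title': first_text(li, frame + '/h2/text()'),
            'author': first_text(li, frame + '/p[@class="color-gray"]/text()'),
            'info': first_text(li, frame + '/p[2]/text()'),
            'url': first_text(li, './a/@href'),
        })
    return books


if __name__ == '__main__':
    for book in parse_books(SAMPLE_HTML):
        print(book)
```

In the sample, the second book has no summary paragraph. The per-item version keeps that book with an empty summary, whereas the four-list version would produce only one summary and zip would drop the second book.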

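PS:   1. I have not properly started studying web scraping yet; this is just a quick note for now.

    2. The program involves character-encoding issues [3].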
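The note does not say what the encoding problem was, so the following is only a guess at a common cause: Chinese text becomes garbled when the page bytes are decoded with the wrong codec before parsing. A minimal sketch, assuming the page is UTF-8, is to give lxml the raw bytes together with an explicit encoding. RAW and title_from_bytes are hypothetical names; with requests, the bytes would come from page.content instead of page.text.

```python
from lxml import html

# Hypothetical page body: UTF-8 bytes with no charset declaration, the case
# where guessing the encoding is most likely to go wrong.
RAW = '<html><body><h2>新書速遞</h2></body></html>'.encode('utf-8')


def title_from_bytes(raw, encoding='utf-8'):
    """Parse raw bytes with an explicitly chosen encoding."""
    parser = html.HTMLParser(encoding=encoding)
    tree = html.fromstring(raw, parser=parser)
    return tree.xpath('//h2/text()')[0]


# Decoding with the wrong codec does not raise an error; it silently
# produces garbled characters.
garbled = RAW.decode('latin-1')

if __name__ == '__main__':
    print(title_from_bytes(RAW))
    print(repr(garbled))
```

Passing bytes plus a known encoding removes the guessing step. If the real encoding of the page is different, the encoding argument has to be changed to match.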

[1] Douban: New Book Express (新書速遞)

[2] lxml and Requests

[3] lxml: garbled Chinese characters (中文亂碼)
