How to Scrape Romance of the Three Kingdoms (三國演義) with Python


This scraping tutorial has four parts:

     1. Where to scrape from (where)

     2. What to scrape (what)

     3. How to scrape it (how)

     4. How to store the scraped data (save)

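I. Where to scrape from

Romance of the Three Kingdoms (三國演義) at http://www.shicimingju.com/book/sanguoyanyi.html, the URL used in the script in Part IV.

II. What to scrape

The full text of Romance of the Three Kingdoms.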

III. How to scrape it

Open a chapter page in Chrome and press F12 to open the developer tools. The chapter text sits in this node:

<div id="con" class="bookyuanjiao">

All that is needed is to locate this node and write its contents to an HTML file.

content = soup.find("div", {"class": "bookyuanjiao", "id": "con"})
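To try the lookup without touching the network, the following minimal sketch runs the same `find` call on an inline page. The sample HTML and the `extract_content` helper are made up for illustration; only the `div` with `id="con"` and `class="bookyuanjiao"` mirrors the node described above. It uses Python's built-in `html.parser` backend rather than `lxml`, purely to keep dependencies small.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: only the <div id="con" class="bookyuanjiao">
# node mirrors the structure described in the article.
SAMPLE_HTML = """
<html><head><title>Chapter 1_Sample</title></head>
<body>
  <div id="nav">site navigation</div>
  <div id="con" class="bookyuanjiao"><p>First paragraph.</p><p>Second paragraph.</p></div>
</body></html>
"""


def extract_content(html):
    """Return the chapter node, or None if the page has no such node."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("div", {"class": "bookyuanjiao", "id": "con"})


content = extract_content(SAMPLE_HTML)
paragraphs = [p.get_text() for p in content.find_all("p")]

# A page without the node yields None, so callers should check before using it.
missing = extract_content("<html><body><p>no chapter here</p></body></html>")
```

Because `find` returns `None` when the node is absent, a real crawl would probably want to skip or log such pages instead of writing the string "None" to disk.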

IV. How to store what was scraped

Take the extracted content, wrap it in an HTML page, and save the page to disk.
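The save step can be sketched on its own as follows. This is an illustration under stated assumptions, not the article's script: `save_chapter` and `safe_filename` are hypothetical helpers, the title and content are made-up values, and the file is written to a temporary folder. The head string matches the `0.html` file shown later in this article.

```python
import os
import re
import tempfile

# Same head as the 0.html file used by the article's script
HEAD = ('<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=utf-8"></head><body>')
TAIL = "</body></html>"


def safe_filename(title):
    """Hypothetical helper: replace characters Windows forbids in file names."""
    return re.sub(r'[\\/:*?"<>|]', "_", title).strip()


def save_chapter(folder, title, content_html):
    """Wrap content_html in HEAD/TAIL and write it to <folder>/<title>.html."""
    os.makedirs(folder, exist_ok=True)
    filename = os.path.join(folder, safe_filename(title) + ".html")
    # Write as UTF-8 so the bytes agree with the charset declared in HEAD.
    with open(filename, "w", encoding="utf-8") as output:
        output.write(HEAD + content_html + TAIL)
    return filename


# Made-up title and content, written to a temporary folder
out_dir = tempfile.mkdtemp()
saved = save_chapter(out_dir, "Chapter 1: Sample?", '<div id="con">text</div>')
with open(saved, encoding="utf-8") as f:
    page = f.read()
```

Building the path with `os.path.join` avoids hard-coding a Windows path separator, and sanitizing the title guards against page titles that contain characters such as `?` or `:`.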

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import urllib.request

from bs4 import BeautifulSoup as BS
from lxml import etree

# Folder that will hold one HTML file per chapter
sub_folder = os.path.join(os.getcwd(), "sanguoyanyi")
if not os.path.exists(sub_folder):
    os.mkdir(sub_folder)
path = sub_folder

# Custom HTML head that is prepended to every chapter (see 0.html)
with open('0.html', 'r', encoding='utf-8') as head_file:
    head = head_file.read()

domain = 'http://www.shicimingju.com/book/sanguoyanyi.html'
t = domain.find('.html')
# Site root, e.g. http://www.shicimingju.com
new_domain = '/'.join(domain.split("/")[:-2])
first_chapter_url = domain[:t] + "/" + str(1) + '.html'
print(first_chapter_url)

# Get the chapter URLs from the table of contents
resp = urllib.request.urlopen(domain)
html = resp.read()
soup = BS(html, 'lxml')
chapter_list = soup.find("div", {"class": "bookyuanjiao", "id": "mulu"})
sel = etree.HTML(str(chapter_list))
result = sel.xpath('//li/a/@href')

for each_link in result:
    # Strip a leading slash so the joined URL has no double slash
    each_chapter_link = new_domain + "/" + each_link.lstrip("/")
    print(each_chapter_link)
    resp = urllib.request.urlopen(each_chapter_link)
    html = resp.read()
    soup = BS(html, 'lxml')
    content = soup.find("div", {"class": "bookyuanjiao", "id": "con"})
    # Drop the site suffix from the page title to get the chapter title
    title = soup.title.text
    title = title.split('_《三國演義》_詩詞名句網')[0]
    html = head + str(content) + "</body></html>"
    filename = os.path.join(path, title + ".html")
    print(filename)
    # Write the chapter file as UTF-8, matching the charset in 0.html
    with open(filename, 'w', encoding='utf-8') as output:
        output.write(html)
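The script derives the site root by splitting the index URL on "/" and then concatenating each `href` onto it. As an alternative that the original does not use, the standard library's `urllib.parse.urljoin` resolves both root-relative and page-relative links against the index URL. The sketch below assumes made-up `href` values, since the article does not show what the table of contents actually contains; `absolute_links` is a hypothetical helper.

```python
from urllib.parse import urljoin

INDEX_URL = "http://www.shicimingju.com/book/sanguoyanyi.html"


def absolute_links(index_url, hrefs):
    """Resolve each href against the index page URL."""
    return [urljoin(index_url, href) for href in hrefs]


# Hypothetical href values: one root-relative, one relative to the index page
links = absolute_links(
    INDEX_URL,
    ["/book/sanguoyanyi/1.html", "sanguoyanyi/2.html"],
)
for link in links:
    print(link)
```

With `urljoin`, the result is the same well-formed URL whether or not the `href` starts with a slash, which string concatenation does not guarantee.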

The contents of 0.html are as follows:

<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>

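Summary

That is how to scrape Romance of the Three Kingdoms with Python. I hope it helps with learning Python; if you have questions, feel free to leave a comment.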
