Python Web Scraping Series: The Hottest Jokes on Qiushibaike

Last updated: 2018-03-29 Source: Internet

Uploader: User



1. Get the Qiushibaike URL

http://www.qiushibaike.com/hot/page/2/ (the trailing 2 is the page number, i.e. page 2)
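Since only the trailing number changes from page to page, the URLs can be generated. A minimal sketch, assuming the pattern above holds for every page of the hot list; page_url is a hypothetical helper name, and https is used as in the code in step 3.

```python
# Base URL of the "hot" list, as given in the article (assumed stable for all pages)
BASE_URL = "https://www.qiushibaike.com/hot/page/"


def page_url(page: int) -> str:
    """Return the URL of one page of the hot list (hypothetical helper; pages start at 1)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return f"{BASE_URL}{page}/"


# URLs for the first three pages
urls = [page_url(n) for n in range(1, 4)]
print(urls)
```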

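2. Analyze the page to find where the jokes are located; this takes a little CSS and HTML knowledge. Each joke's text sits in a div with the class "content".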
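The scraper in step 3 selects these elements with BeautifulSoup's find_all("div", {"class": "content"}), which matches any div whose class list includes "content". To show the idea without third-party packages, here is a sketch that swaps in the standard library's html.parser instead. The sample markup is invented and only assumes that each joke sits in such a div; the live page's structure may differ.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collect the text inside every <div class="content"> (stdlib stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a matching div; counts nested divs
        self.current = []   # text fragments of the div being read
        self.stories = []   # finished stories, one string per div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # a div nested inside a content div
        elif "content" in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.stories.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


# Hypothetical markup standing in for part of a listing page
sample = """
<div class="article">
  <div class="author">someone</div>
  <div class="content"><span>First joke</span></div>
  <div class="content"><span>Second <br/>joke</span></div>
</div>
"""
parser = ContentExtractor()
parser.feed(sample)
print(parser.stories)  # ['First joke', 'Second joke']
```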

3. Write the code

```python
import time
import urllib.request
from urllib.error import HTTPError, URLError

from bs4 import BeautifulSoup

# set_user_agent comes from the author's own publicHeaders module (not shown here);
# it presumably returns a headers dict containing a User-Agent
from 爬蟲.publicHeaders import set_user_agent


# Download pages 1..pagenum of the hot list
def download(pagenum):
    url = r'https://www.qiushibaike.com/hot/page/'

    # Open the output file once, so later pages don't overwrite earlier ones;
    # write UTF-8 explicitly, since the Windows default encoding can't represent every character
    with open(r'E:\JustForFun.txt', 'w', encoding='utf-8') as f:
        # Fetch page by page
        for i in range(1, pagenum + 1):
            # Build the URL for page i
            new_url = url + str(i)
            print(new_url)
            header = set_user_agent()
            while True:
                try:
                    req = urllib.request.Request(url=new_url, headers=header)
                    # A page sometimes never responds and the program hangs,
                    # so time out after 1 second and raise an exception instead
                    response = urllib.request.urlopen(req, timeout=1)
                    break
                # HTTPError is a subclass of URLError, so it must be caught first;
                # otherwise the URLError clause would swallow it
                except HTTPError as e:
                    print(e.code)
                except URLError as err:
                    print(err.reason)
                # After any error, pause 1.1 seconds and retry the same URL;
                # the loop exits once a request succeeds
                time.sleep(1.1)
            # The request succeeded: read the page content
            html = response.read().decode('utf-8')
            # Find every div whose class is "content"
            soup = BeautifulSoup(html, 'html.parser')
            contents = soup.find_all('div', {'class': 'content'})
            # Write each joke's text to the file
            for item in contents:
                f.write(item.get_text())


if __name__ == '__main__':
    download(2)
```
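The retry loop in step 3 never gives up, so a page that keeps answering 404 would stall the script indefinitely. Below is a sketch of a bounded alternative, assuming three attempts is an acceptable cap. fetch_with_retry and its opener parameter are hypothetical names introduced here; opener only exists so the logic can be exercised without network access, and defaults to urllib.request.urlopen.

```python
import socket
import time
import urllib.request
from urllib.error import HTTPError, URLError


def fetch_with_retry(url, headers=None, retries=3, delay=1.1, opener=urllib.request.urlopen):
    """Fetch url and return its body as text, giving up after `retries` attempts (hypothetical helper)."""
    req = urllib.request.Request(url, headers=headers or {})
    for attempt in range(1, retries + 1):
        try:
            with opener(req, timeout=1) as response:
                return response.read().decode("utf-8")
        # HTTPError subclasses URLError, so it has to come first to be reported separately
        except HTTPError as e:
            print(f"attempt {attempt}: HTTP {e.code}")
        # a timeout while reading the body surfaces as socket.timeout rather than URLError
        except (URLError, socket.timeout) as e:
            print(f"attempt {attempt}: {getattr(e, 'reason', e)}")
        if attempt < retries:
            time.sleep(delay)  # pause before the next attempt
    raise RuntimeError(f"giving up on {url} after {retries} attempts")
```

Inside download, the whole while loop could then be replaced by html = fetch_with_retry(new_url, headers=header), leaving the parsing code unchanged.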

4. Run the script; the extracted jokes are written to E:\JustForFun.txt.
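Two details decide whether that output file comes out right: the encoding and the file mode. On Windows, open() without an encoding typically uses a legacy code page such as GBK, which cannot encode characters like emoji; that is most likely the error the original try/except around get_text() was hiding. Opening with "w" inside the page loop would also keep only the last page. A small sketch of a saver that avoids both, assuming one joke per paragraph is a reasonable layout; save_jokes is a hypothetical helper, and a temporary file stands in for E:\JustForFun.txt.

```python
import os
import tempfile


def save_jokes(jokes, path):
    """Append jokes to path as UTF-8, each followed by a blank line (hypothetical helper)."""
    with open(path, "a", encoding="utf-8") as f:  # "a" keeps earlier pages
        for joke in jokes:
            f.write(joke.strip() + "\n\n")


# A temporary file stands in for E:\JustForFun.txt
path = os.path.join(tempfile.mkdtemp(), "JustForFun.txt")
save_jokes(["第一頁的段子 😂"], path)  # page 1 (GBK could not encode the emoji)
save_jokes(["第二頁的段子"], path)     # page 2 is appended, not overwriting page 1
with open(path, encoding="utf-8") as f:
    text = f.read()
print(text)
```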

