Qzone (QQ空間) Python Crawler (3): Final Chapter

Source: Internet
Uploader: User


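The code from the previous part has been tested and runs successfully. The next step is to wrap it in a loop that requests one page after another until every shuoshuo (Qzone status post) has been crawled.
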
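The loop described above advances an offset by the page size on every request. A minimal sketch of that paging arithmetic follows, assuming 1761 posts in total and 40 posts per request (the numbers used in this post). The helper name `page_offsets` is hypothetical and only illustrates the idea; it rounds up so that a partial last page is still requested.

```python
import math


def page_offsets(total_posts, page_size=40):
    """Return the start offset of every page needed to cover total_posts."""
    pages = math.ceil(total_posts / page_size)  # round up for a partial last page
    return [page * page_size for page in range(pages)]


offsets = page_offsets(1761)
print(len(offsets), offsets[0], offsets[-1])  # prints: 45 0 1760
```

Each offset would be substituted into the request URL in place of the paging value, with a pause between requests.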

Complete code:

import json
import math
import os
import shutil
import time

import requests

qq = 627911861

headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'zh-CN,zh;q=0.8',
    'cache-control': 'max-age=0',
    'cookie': 'xxxxxx',  # replace with the cookie of your own logged-in session
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Mobile Safari/537.36'
}

url_x = 'https://mobile.qzone.qq.com/list?qzonetoken=9d29961d6fbb88be6236636010e0d4fde43a5b77d57ef984938b5aa0cb695e28c258a4d86b8c02a545bbcce970ff&g_tk=1573033187&res_attach=att%3D'
url_y = '%26tl%3D1508257557&format=json&list_type=shuoshuo&action=0&res_uin=627911861&count=40'
numbers = 0      # paging offset used by the "view more" request
img_set = set()  # set of image URLs
word_count = 0   # counter for text posts
words = ""       # accumulated text posts
images = ""      # accumulated image URLs
# 1761 is presumably the total number of posts, fetched 40 per request;
# round up so the last partial page is not skipped.
page = math.ceil(1761 / 40)


for i in range(page):
    try:
        html = requests.get(url_x + str(numbers) + url_y, headers=headers).content
        data = json.loads(html)
        # print(data)

        for vFeed in data['data']['vFeeds']:
            # Collect the image URLs attached to this post, if any.
            if 'pic' in vFeed:
                for pic in vFeed['pic']['picdata']['pic']:
                    img_set.add(pic['photourl']['0']['url'])

            # Collect the text of this post, if any.
            if 'summary' in vFeed:
                # print(str(word_count) + '. ' + vFeed['summary']['summary'])
                words += str(word_count) + '. ' + vFeed['summary']['summary'] + '\r\n'
                word_count += 1
    except Exception as e:
        print('error:', e)

    numbers += 40
    time.sleep(10)  # pause between requests

try:
    with open(os.path.join(os.getcwd(), str(qq) + '.txt'), 'wb') as fo:
        fo.write(words.encode('utf-8'))
        print("Text posts written")

    with open(os.path.join(os.getcwd(), 'images_url'), 'wb') as foImg:
        for imgUrl in img_set:
            images += imgUrl + '\r\n'
        foImg.write(images.encode('utf-8'))
        print("Image URLs written")

except Exception as e:
    print('Failed to write data:', e)


if not img_set:
    print('No posts with images found')
else:
    image_path = os.path.join(os.getcwd(), 'images')
    if not os.path.exists(image_path):
        os.mkdir(image_path)
    x = 1
    for imgUrl in img_set:
        temp = os.path.join(image_path, '%s.jpg' % x)
        print('Downloading image %s' % x)
        try:
            r = requests.get(imgUrl, stream=True)
            if r.status_code == 200:
                with open(temp, 'wb') as f:
                    r.raw.decode_content = True
                    shutil.copyfileobj(r.raw, f)
        except Exception:
            print('Failed to download image: %s' % imgUrl)
        x += 1
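
The parsing step inside the loop can be tried without any network access. The sketch below pulls the same fields the script reads (`data` → `vFeeds`, `pic` → `picdata` → `pic` → `photourl` → `'0'` → `url`, and `summary` → `summary`) out of one decoded page. The function name `extract_posts` and the `sample` dictionary are hypothetical: the sample only mimics the keys the script accesses and is not real Qzone output.

```python
def extract_posts(data):
    """Collect post texts and image URLs from one decoded JSON page."""
    texts = []
    image_urls = set()
    for feed in data.get('data', {}).get('vFeeds', []):
        # Image URLs live under pic -> picdata -> pic -> photourl -> '0' -> url.
        pics = feed.get('pic', {}).get('picdata', {}).get('pic', [])
        for pic in pics:
            url = pic.get('photourl', {}).get('0', {}).get('url')
            if url:
                image_urls.add(url)
        # The post text lives under summary -> summary.
        summary = feed.get('summary', {}).get('summary')
        if summary is not None:
            texts.append(summary)
    return texts, image_urls


# Hypothetical sample shaped like the fields the script reads (not real API output).
sample = {
    'data': {
        'vFeeds': [
            {'summary': {'summary': 'first post'}},
            {
                'summary': {'summary': 'post with a picture'},
                'pic': {'picdata': {'pic': [
                    {'photourl': {'0': {'url': 'http://example.com/a.jpg'}}},
                ]}},
            },
            {},  # a feed with neither text nor pictures
        ]
    }
}

texts, image_urls = extract_posts(sample)
print(texts)
print(image_urls)
```

Using `.get()` with defaults means a feed that lacks a key is skipped quietly instead of raising `KeyError`, so one malformed entry would not abort the rest of the page.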

 
