Python Web Scraping (Part 5)


# Scraping street-snap (街拍) images from Toutiao


import json
import os
import re
from urllib import request

import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
}

# The street-snap search page (https://www.toutiao.com/search/?keyword=%E8%A1%97%E6%8B%8D)
# loads its results via AJAX. Each request returns 20 entries starting at `offset`,
# so stepping the offset walks through all result pages.
pag_all_url = 'https://www.toutiao.com/search_content/?offset={}&format=json&keyword=%E8%A1%97%E6%8B%8D&autoload=true&count=20&cur_tab=1&from=search_tab'

# Directory that receives the downloaded images.
os.makedirs('1', exist_ok=True)

i = 0
while True:
    full_pag_url = pag_all_url.format(i)
    i += 20  # advance only after building the URL so that offset 0 is not skipped
    pag_html = requests.get(full_pag_url, headers=headers).text
    # Turn the parsed JSON back into a string so it can be searched with a regex.
    pag_html_str = str(json.loads(pag_html))
    # Collect the id of every gallery page,
    # e.g. https://www.toutiao.com/a6590127156037157379/
    img_pag_id = re.findall(r"'item_source_url': '/group/(\d*)/',", pag_html_str)
    if not img_pag_id:
        break  # no more results, stop instead of looping forever

    for l in img_pag_id:
        # URL of the page that holds the images.
        full_url = 'https://www.toutiao.com/a{}'.format(l)
        html = requests.get(full_url, headers=headers).text
        # The gallery data is embedded in the page as: gallery: JSON.parse("..."),
        pattern = r'gallery: JSON\.parse\((.*)\),'
        ans1 = re.search(pattern, html)
        try:
            # Decode twice: first the quoted string literal, then the JSON inside it.
            ans1_str = json.loads(ans1[1])
            ans1_dic = json.loads(ans1_str)
            for q in ans1_dic['sub_images']:
                img_url = q['url']
                print(img_url)
                filename = '1/' + img_url.split('/')[-1] + '.jpg'
                request.urlretrieve(img_url, filename)
        except Exception:
            # Skip pages without a gallery (ans1 is None) or with unexpected data.
            continue
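The two least obvious steps of the script, building the paginated search URL and decoding the `gallery: JSON.parse(...)` payload twice, can be pulled out into small functions that are testable without any network access. The following is only a sketch under stated assumptions: the function names `build_search_url` and `extract_image_urls`, the constants, and the sample page fragment are hypothetical and not part of the original script, and it assumes the argument of `JSON.parse` is a string literal that is also valid JSON, as the original code does.

```python
import json
import re
from urllib.parse import urlencode

# Endpoint and pattern taken from the script above; the constant names are made up.
SEARCH_ENDPOINT = 'https://www.toutiao.com/search_content/'
GALLERY_PATTERN = re.compile(r'gallery: JSON\.parse\((.*)\),')


def build_search_url(offset, keyword='街拍', count=20):
    """Return the AJAX search URL for the result page starting at `offset`."""
    params = {
        'offset': offset,
        'format': 'json',
        'keyword': keyword,  # urlencode percent-encodes this as UTF-8
        'autoload': 'true',
        'count': count,
        'cur_tab': 1,
        'from': 'search_tab',
    }
    return SEARCH_ENDPOINT + '?' + urlencode(params)


def extract_image_urls(html):
    """Return the image URLs of a gallery page, or [] if no gallery is found."""
    match = GALLERY_PATTERN.search(html)
    if match is None:
        return []
    # The captured text is a quoted string literal that itself contains JSON,
    # so it is decoded twice: literal -> JSON text -> dict.
    gallery = json.loads(json.loads(match.group(1)))
    return [image['url'] for image in gallery.get('sub_images', [])]


# Hypothetical page fragment, for illustration only.
sample_gallery = {
    'sub_images': [
        {'url': 'http://example.com/a'},
        {'url': 'http://example.com/b'},
    ]
}
sample_html = 'gallery: JSON.parse(' + json.dumps(json.dumps(sample_gallery)) + '),\n'

print(build_search_url(0))
print(extract_image_urls(sample_html))
```

Returning an empty list when no gallery is present makes the "page has no gallery" case explicit, so the caller does not need a broad `try`/`except` around the parsing step.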

 
