Python tutorial: crawling all the image resources of the mzitu site (most detailed version)

Last updated: 2018-03-29 Source: Internet

Uploaded by: User


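Tags: python crawler

A more basic tutorial on crawling mzitu is linked here: [Crawling mzitu](68958267)

P.S. That tutorial only supports downloading a single album, not the whole site.

Now for the design behind my crawler code: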

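① When browsing the site, you will notice that the URL of every listing page is made up of the site's domain name + `page` + the page number, so we can visit the site's pages one by one.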

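② In the image list, hover over an image, right-click, and choose Inspect. The list turns out to be a `ul` wrapping a series of `li` elements, and the spot marked by the arrow is the address of each album, which lets us open the album itself. One listing page contains 24 albums, and we process it with BeautifulSoup.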

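③ Open an album, hover over the "40" in the pagination, and right-click. The number of images in the album is stored in one of the page's `span` elements (the script below reads the one at index 10). Image pages follow the same principle as ①: the album URL + the image number (for example, `/3` is the third image). To find the download link, the script reads the album title from the `h2` with class `main-title`, then looks up the `img` tag whose `alt` equals that title; the `src` attribute of that `img` is the download link.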
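The three steps above can be sketched offline as follows. This is only an illustration under stated assumptions: it uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without third-party packages, the helper names (`listing_page_url`, `PinsLinkParser`, `album_links`, `image_page_urls`) are hypothetical, and the sample HTML with album IDs 1001 and 1002 is invented to mimic the structure described in ②.

```python
from html.parser import HTMLParser

BASE_URL = 'http://www.mzitu.com/page/'  # same base URL as the script below


def listing_page_url(page):
    """Step 1: a listing page is the base URL plus the page number."""
    return BASE_URL + str(page)


class PinsLinkParser(HTMLParser):
    """Step 2: collect every href found inside <ul id="pins">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while we are inside the target list
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'ul' and (self.depth or attrs.get('id') == 'pins'):
            self.depth += 1
        elif tag == 'a' and self.depth and attrs.get('href'):
            self.links.append(attrs['href'])

    def handle_endtag(self, tag):
        if tag == 'ul' and self.depth:
            self.depth -= 1


def album_links(html):
    """Return the album URLs of one listing page, without duplicates."""
    parser = PinsLinkParser()
    parser.feed(html)
    return list(dict.fromkeys(parser.links))  # keeps first-seen order


def image_page_urls(album_url, pic_max_text):
    """Step 3: image pages are the album URL plus /1, /2, ... /pic_max."""
    return [album_url + '/' + str(i) for i in range(1, int(pic_max_text) + 1)]


# Hypothetical listing fragment; the album IDs 1001 and 1002 are invented.
sample = '''
<ul id="pins">
  <li><a href="http://www.mzitu.com/1001"><img alt="a"></a>
      <span><a href="http://www.mzitu.com/1001">a</a></span></li>
  <li><a href="http://www.mzitu.com/1002"><img alt="b"></a>
      <span><a href="http://www.mzitu.com/1002">b</a></span></li>
</ul>
'''
print(listing_page_url(11))   # http://www.mzitu.com/page/11
print(album_links(sample))    # ['http://www.mzitu.com/1001', 'http://www.mzitu.com/1002']
print(image_page_urls(album_links(sample)[0], '3')[-1])  # http://www.mzitu.com/1001/3
```

Deduplicating by `href` is an alternative to the odd/even counter used in the script: it does not depend on each `li` containing exactly two links.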

    from bs4 import BeautifulSoup
    import requests
    import os

    base_url = 'http://www.mzitu.com/page/'
    # Send a Referer header to get past the site's anti-hotlinking check
    header = {'Referer': 'http://www.mzitu.com'}

    # Crawl listing pages 11 through 19
    for x in range(11, 20):
        html_a = requests.get(base_url + str(x), headers=header)
        # Parse the listing page
        soup_a = BeautifulSoup(html_a.text, features='lxml')
        # Select the a tags, i.e. the spot marked by the arrow in step ②
        pages = soup_a.find('ul', {'id': 'pins'}).find_all('a')
        b = 1
        for y in pages:
            # Each li contains two a tags, so handle only every other one to skip duplicates
            if b % 2 != 0:
                # Open the album page and parse it
                html = requests.get(y['href'], headers=header)
                soup_b = BeautifulSoup(html.text, features='lxml')
                # Read the number of images in this album
                pic_max = soup_b.find_all('span')[10].text
                tittle = soup_b.find('h2', {'class': 'main-title'}).text
                # Create a directory for the album (exist_ok avoids an error on re-runs)
                os.makedirs('./img/' + str(tittle), exist_ok=True)
                # Loop over the album's image pages and download each image
                for i in range(1, int(pic_max) + 1):
                    href = y['href'] + '/' + str(i)
                    html2 = requests.get(href, headers=header)
                    soup2 = BeautifulSoup(html2.text, features='lxml')
                    pic_url = soup2.find('img', alt=tittle)
                    html_name = requests.get(pic_url['src'], headers=header, stream=True)
                    file_name = pic_url['src'].split(r'/')[-1]
                    with open('./img/' + str(tittle) + '/' + file_name, 'wb') as f:
                        # Write the response in 32-byte chunks
                        for chunk in html_name.iter_content(chunk_size=32):
                            f.write(chunk)
            b = b + 1

    print('ok')


The code above is my own original work.
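The script uses the album title directly as a directory name and takes the file name from the last segment of the image URL. Below is a minimal sketch of how those two steps could be made a little more defensive. The helper names `safe_dir_name` and `file_name_from_url` are hypothetical and not part of the original script, and the example URL is invented.

```python
import re
from urllib.parse import urlsplit


def safe_dir_name(title):
    """Replace characters that are invalid in common file systems."""
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', title).strip()
    return cleaned or 'untitled'


def file_name_from_url(url):
    """Take the last path segment of the URL, ignoring any query string."""
    return urlsplit(url).path.split('/')[-1]


print(safe_dir_name('a/b: c?'))   # a_b_ c_
print(file_name_from_url('http://example.com/2018/03/01a.jpg?x=1'))  # 01a.jpg
```

With helpers like these, the directory could be created as `os.makedirs('./img/' + safe_dir_name(tittle), exist_ok=True)`, so that a title containing a slash does not produce an unexpected nested path.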

Crawl results


This article was originally written in Chinese and published on aliyun.com.
