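I've had a lot of free time lately and have been browsing Moko (美空) to look at photos of pretty girls. It suddenly occurred to me: why not download all the photos to my local disk and look through them at leisure? So I ended up with the following code: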
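#-*- coding:utf-8 -*-
import urllib
import re
import os

# Earlier pattern that matched the ordinary src attribute:
#IMG_REG = re.compile('<img[^>]*?src[^>]*?=[\"\'][^"]*?[\'\"]')
# The active pattern reads the image URL from the src2 attribute instead.
IMG_REG = re.compile('<img[^>]*?src2=[\"\'][^"]*?[\'\"]')
# Captures (relative link, title) for each post on the listing page.
URL_REG = re.compile('<a href="(.*?)" title="(.*?)" hidefocus="true" target="_blank">')
LOCAL_DIR = 'c://tmp/pictrue/'

def cbk(a, b, c):
    # urlretrieve progress hook: a = blocks so far, b = block size, c = total size
    per = 100.0 * a * b / c
    if per > 100:
        per = 100
    print '%.2f%%' % per

def getPictrueFromOnePage(url, dirPath):
    # Download every image found on one post page into dirPath.
    page = urllib.urlopen(url)
    content = page.read()
    for match in IMG_REG.findall(content):
        print match
        # Keep everything from "http" onward and drop the closing quote.
        imgurl = match[match.index("http"):][:-1]
        filename = imgurl[imgurl.rindex("/") + 1:]
        print imgurl
        print filename
        local = dirPath + filename
        urllib.urlretrieve(imgurl, local, cbk)

def mainPorcess(url):
    # Walk the listing page and download each linked post into its own folder.
    content = urllib.urlopen(url).read()
    i = 0
    for matched in URL_REG.findall(content):
        i = i + 1
        subUrl = 'http://www.moko.cc' + matched[0]
        print subUrl
        # Folder named after the title attribute, re-encoded as GBK for Windows.
        path = LOCAL_DIR + matched[1].decode('utf-8').encode('gbk') + '\\'
        if not os.path.isdir(path):
            try:
                os.makedirs(path)
            except Exception:
                # Fall back to a numbered folder if the title cannot be used.
                path = LOCAL_DIR + str(i) + '\\'
                if not os.path.isdir(path):
                    os.makedirs(path)
        print path
        getPictrueFromOnePage(subUrl, path)

if __name__ == '__main__':
    mainPorcess('http://www.moko.cc/channels/post/23/1.html')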
The script automatically downloads the photos and creates a folder named after each model to store them in.
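The folder-per-model step is probably the most fragile part, since a title may contain characters that cannot appear in a folder name. Here is a minimal sketch of one way to tidy that up, written for Python 3 rather than the Python 2 of the listing above. The helper names `safe_folder_name` and `folder_for` are made up for illustration, and it is only an assumption that the characters Windows forbids are what makes the folder creation fail on real titles.

```python
import os
import re

# Characters that Windows does not allow in file or folder names.
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_folder_name(title, index):
    """Return a folder name derived from title, or str(index) as a fallback."""
    # Drop forbidden characters, then trim surrounding spaces and dots.
    cleaned = _INVALID_CHARS.sub("", title).strip(" .")
    return cleaned if cleaned else str(index)


def folder_for(base_dir, title, index):
    """Create the folder for one post if needed and return its path."""
    path = os.path.join(base_dir, safe_folder_name(title, index))
    os.makedirs(path, exist_ok=True)  # exist_ok needs Python 3.2 or later
    return path
```

With this, a title that cleans down to nothing falls back to the running number, much like the `str(i)` fallback in the script, but without relying on an exception to find out.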
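This program has a few shortcomings:
1. It can only grab photos from the second-level pages on Moko that are organized by model name.
2. It can only grab the current page; it cannot turn pages automatically.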
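For the second shortcoming, a rough sketch of automatic paging could look like the following. It assumes, without this having been verified against the site, that later listing pages follow the same pattern as the start URL with only the trailing number changing (1.html, 2.html, and so on), and that you already know roughly how many pages you want. `page_urls` is a hypothetical helper, not part of the script.

```python
import re

# Matches the trailing "<number>.html" of a listing URL.
_PAGE_RE = re.compile(r'(\d+)\.html$')


def page_urls(first_url, page_count):
    """Return page_count listing URLs, starting from the page in first_url."""
    m = _PAGE_RE.search(first_url)
    if m is None:
        raise ValueError("URL does not end in <number>.html: " + first_url)
    start = int(m.group(1))
    prefix = first_url[:m.start()]  # everything before the page number
    return [prefix + str(n) + ".html" for n in range(start, start + page_count)]


if __name__ == "__main__":
    for url in page_urls("http://www.moko.cc/channels/post/23/1.html", 3):
        print(url)  # each of these could be handed to mainPorcess in turn
```

Generating the URLs up front keeps the paging logic separate from the downloading, so the existing functions would not need to change.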
One last thing: Python really is incredibly convenient!!!