Scraping Luoji Siwei's daily audio (Python), by 伍雪穎
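I've always enjoyed Luoji Siwei (羅輯思維), but opening its WeChat official account every day is a hassle, and a single one-minute clip a day is not very satisfying. So today I decided to scrape all the episodes and merge them, so I can listen to everything in one sitting when I have time. Off we go: a quick search turned up a site that already hosts the mp3 files, so I scraped that (you could also scrape the official Luoji Siwei site, http://www.ljsw.cc, directly): http://www.ljsw.cc/forum-39-1.html

Extract the titles and mp3 URLs:

```python
# coding=utf-8
# Python 2: collect the title and mp3 URL of every episode into luoji.txt
import re
import urllib2

BASE_URL = 'http://www.ljsw.cc/'

def getHtmlCode(url):
    return urllib2.urlopen(url).read()

def getTitle(htmlString):
    # thread titles in the forum list are links with class "xst"
    # (closing </a> assumed; the regex in the original post looks truncated)
    regTitle = re.compile(r'xst">(.+?)</a>')
    return regTitle.findall(htmlString)

def getUrl(htmlString):
    # not shown in the original post; assumes Discuz-style thread links
    # such as <a href="thread-123-1-1.html" ... class="s xst">
    regUrl = re.compile(r'href="(thread-\d+-\d+-\d+\.html)"[^>]*class="s xst"')
    return [BASE_URL + u for u in regUrl.findall(htmlString)]

def getMp3Url(htmlString):
    # mp3 links on a thread page end with a single quote: ...mp3'
    regMp3 = re.compile(r"(http[^'\"\s]+?\.mp3)'")
    return regMp3.findall(htmlString)

def getLuojiContent(url, f):
    htmlCode = getHtmlCode(url)
    titles = getTitle(htmlCode)
    urls = getUrl(htmlCode)
    for title, threadUrl in zip(titles, urls):
        print title
        contents = getMp3Url(getHtmlCode(threadUrl))
        if len(contents) > 0:
            mp3Url = contents[0]
            print mp3Url
            # one line per episode: "<title>-<mp3 url>"
            f.write(title + '-' + mp3Url + '\n')

if __name__ == '__main__':
    with open('luoji.txt', 'w') as f:
        # list pages forum-39-1.html ... forum-39-37.html
        for i in range(1, 38):
            url = BASE_URL + 'forum-39-' + str(i) + '.html'
            try:
                getLuojiContent(url, f)
                print 'finished: ' + str(i)
            except Exception as e:
                print str(i) + ': error! ' + str(e)
```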
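On Python 3, where `urllib2` no longer exists, the same scraping step might be sketched with `urllib.request`. This is a hedged sketch, not the original code: `BASE_URL` and the 37-page count come from the script above, but the thread-link markup matched by `THREAD_RE` (Discuz-style `thread-123-1-1.html` links with class `s xst`) and the UTF-8 page encoding are assumptions. Matching the link and its title in one regex keeps each title paired with its own thread URL instead of relying on two separate lists lining up.

```python
import re
import urllib.request
from typing import List, Tuple

BASE_URL = "http://www.ljsw.cc/"
# Assumption: Discuz-style list markup, e.g.
# <a href="thread-123-1-1.html" onclick="atarget(this)" class="s xst">Title</a>
THREAD_RE = re.compile(r'<a href="(thread-\d+-\d+-\d+\.html)"[^>]*class="s xst">(.+?)</a>')
# An mp3 URL followed by a single quote; quotes and whitespace may not occur inside it
MP3_RE = re.compile(r"(http[^'\"\s]+?\.mp3)'")


def fetch(url: str) -> str:
    """Download a page and decode it (UTF-8 is an assumption)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_threads(html: str) -> List[Tuple[str, str]]:
    """Return (title, absolute thread URL) pairs from one forum list page."""
    return [(title, BASE_URL + href) for href, title in THREAD_RE.findall(html)]


def parse_mp3(html: str) -> List[str]:
    """Return every mp3 URL found on a thread page."""
    return MP3_RE.findall(html)


def scrape(pages: int = 37, out_path: str = "luoji.txt") -> None:
    """Write one "<title>-<mp3 url>" line per episode."""
    with open(out_path, "w", encoding="utf-8") as out:
        for i in range(1, pages + 1):
            try:
                for title, thread_url in parse_threads(fetch(f"{BASE_URL}forum-39-{i}.html")):
                    mp3s = parse_mp3(fetch(thread_url))
                    if mp3s:  # keep only threads that actually link an mp3
                        out.write(f"{title}-{mp3s[0]}\n")
                print(f"finished: {i}")
            except OSError as err:  # URLError and HTTPError are OSError subclasses
                print(f"{i}: error! {err}")
```

Calling `scrape()` writes `luoji.txt` in the same `<title>-<mp3 url>` format as the script above, so either version can feed the download step.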
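Download the mp3 files:

```python
# coding=utf-8
# Python 2: download every mp3 listed in luoji.txt and name it after its title
import os
import subprocess

# save into mp3/, the folder the merge step reads from
if not os.path.isdir('mp3'):
    os.makedirs('mp3')

for line in open('luoji.txt'):
    line = line.rstrip('\r\n')
    # each line is "<title>-<mp3 url>"; split where the URL starts
    pos = line.find('-http')
    if pos < 0:
        continue
    title = line[:pos]
    url = line[pos + 1:]
    # curl -O saves the file under the last component of the URL path
    subprocess.call(['curl', '-O', url])
    name = url.split('/')[-1]
    if os.path.exists(name):
        os.rename(name, os.path.join('mp3', title + '.mp3'))
```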
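A Python 3 variant of the download step could avoid shelling out to `curl` by streaming each file with `urllib.request.urlopen` and `shutil.copyfileobj`. This is a sketch under stated assumptions: `luoji.txt` holds UTF-8 `<title>-<mp3 url>` lines, and `parse_line`, `download`, and `download_all` are hypothetical helper names. Splitting at the first `-http` rather than the first `-` keeps titles that themselves contain hyphens intact.

```python
import os
import shutil
import urllib.request
from typing import Optional, Tuple


def parse_line(line: str) -> Optional[Tuple[str, str]]:
    """Split a "<title>-<mp3 url>" line into (title, url); None if no URL is found."""
    line = line.strip()
    pos = line.find("-http")
    if pos < 0:
        return None
    return line[:pos], line[pos + 1:]


def download(url: str, dest: str) -> None:
    """Stream url to dest without loading the whole file into memory."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)


def download_all(list_file: str = "luoji.txt", out_dir: str = "mp3") -> None:
    """Fetch every listed episode into out_dir as "<title>.mp3"."""
    os.makedirs(out_dir, exist_ok=True)
    with open(list_file, encoding="utf-8") as f:
        for line in f:
            parsed = parse_line(line)
            if parsed is None:
                continue
            title, url = parsed
            dest = os.path.join(out_dir, title + ".mp3")
            if not os.path.exists(dest):  # skip files fetched on an earlier run
                download(url, dest)
```

Because `download_all()` skips files that already exist in `mp3/`, an interrupted run can simply be restarted.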
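Merge the mp3 files:

```python
# coding=utf-8
# concatenate every mp3 in the mp3/ folder into a single file, luoji.mp3
from glob import iglob
import os
import shutil

PATH = r'mp3'

with open('luoji.mp3', 'wb') as destination:
    # sorted() so episodes are appended in file-name order
    for filename in sorted(iglob(os.path.join(PATH, '*.mp3'))):
        with open(filename, 'rb') as source:
            shutil.copyfileobj(source, destination)
```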
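Plain byte concatenation generally plays because MP3 audio is a stream of frames, each starting with a sync word, but every file's ID3 tags end up in the middle of the merged stream, where some players may show a glitch or a wrong duration. The sketch here drops a leading ID3v2 tag (a 10-byte header whose size field is syncsafe, plus a 10-byte footer when flag `0x10` is set) and a trailing 128-byte ID3v1 `TAG` block before appending. `strip_id3` and `merge_mp3` are hypothetical helpers; other metadata such as APE tags is not handled, and each file is read fully into memory, which is fine for one-minute clips.

```python
from pathlib import Path
from typing import Iterable


def strip_id3(data: bytes) -> bytes:
    """Remove a leading ID3v2 tag and a trailing ID3v1 tag, if present."""
    if len(data) >= 10 and data[:3] == b"ID3":
        flags = data[5]
        # syncsafe integer: 4 bytes, 7 significant bits each
        size = ((data[6] & 0x7F) << 21) | ((data[7] & 0x7F) << 14) \
            | ((data[8] & 0x7F) << 7) | (data[9] & 0x7F)
        footer = 10 if flags & 0x10 else 0
        data = data[10 + size + footer:]
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        data = data[:-128]  # ID3v1 is a fixed 128-byte block at the end
    return data


def merge_mp3(paths: Iterable[str], dest: str) -> int:
    """Append the tag-stripped audio of each file to dest; return the file count."""
    count = 0
    with open(dest, "wb") as out:
        for path in paths:
            out.write(strip_id3(Path(path).read_bytes()))
            count += 1
    return count
```

For example, `merge_mp3(sorted(glob.glob('mp3/*.mp3')), 'luoji.mp3')` would build the same `luoji.mp3` as the script above, minus the per-file tags.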
Done. Now the episodes can be played one at a time or all together as one merged file.
All the Python code: GitHub
All the mp3 files: link: http://pan.baidu.com/s/1nt5L7Pf password: 5mrg