Download Audio novels with multiple threads in Python

Last Update:2013-12-17 Source: Internet

Author: User


Seasoned veterans (the unmarried ones, at least) rent a place near the office, which saves a lot of time and spares them the hassle of commuting. Others, for one reason or another, have to travel between home and the office every day. Unfortunately, I am one of them. My home and my company are far apart, so I spend roughly a quarter of my working hours on buses. On top of that, Hangzhou has long been nicknamed China's Las Vegas (the "congestion city"), and in every traffic jam I imagine myself as a Transformer. Such a long stretch of idle time is intolerable for any programmer. Since the situation cannot be changed in the short term, the time might as well be put to good use.

So I bought a large-screen Note II for reading PDF files, and my ears should not sit idle either. Instead of listening to English, though, I listen to novels. I got used to listening to broadcasts back when I was a student, and I go through a lot of audio novels. Plenty are available online, but the download pages are a pain: to earn more traffic and ad clicks, these sites make you open at least two pages before you reach the real download link. To cut the overall download time, I wrote this small program, which makes it convenient to download audio novels (and, of course, other kinds of resources too).

To be clear, I am not out to crawl large amounts of data; this is just for entertainment and learning. So rather than aimlessly crawling every link on a site, the program works on one novel at a time. For example, to download the novel "Childhood", I find the novel's home page on the storytelling site I listen to and let the program download all of its mp3 audio. The details are in the code below; all of it lives in the module crawler5tps:

1. Set the start url and the directory for saving the file.

 
 
  
    #-*-coding:GBK-*-
    import urllib, urllib2
    import re, threading, os

    baseurl = 'http://www.5tps.com'  # base url
    down2path = 'E:/enovel/'         # saving path
    save2path = ''                   # saving file name (full path)

2. Parse the url of the download page from the start url

 
 
  
    def parseUrl(starturl):
        '''
        Parse out download page from start url.
        eg. we can get 'http://www.5tps.com/down/8297_52_00001.html' from 'http://www.5tps.com/html/8297.html'
        '''
        global save2path
        rDownloadUrl = re.compile(r".*?<A href='(/down/\w+\.html)'.*")  # find the link of download page
        # rTitle = re.compile("<TITLE>.{4}\s{1}(.*)\s{1}.*</TITLE>")
        # <TITLE> voice novels: full set of Liu Tao </TITLE>
        f = urllib2.urlopen(starturl)
        totalLine = f.readlines()

        # create the name of saving file: the title is the second
        # space-separated word on the fourth line of the page
        title = totalLine[3].split(" ")[1]
        if not os.path.exists(down2path + title):
            os.mkdir(down2path + title)
        save2path = down2path + title + "/"

        downUrlLine = [line for line in totalLine if rDownloadUrl.match(line)]
        downLoadUrl = []
        for dl in downUrlLine:
            # a line may hold several links: record the first match,
            # remove it from the line, and match again until none are left
            while True:
                m = rDownloadUrl.match(dl)
                if not m:
                    break
                downUrl = m.group(1)
                downLoadUrl.append(downUrl.strip())
                dl = dl.replace(downUrl, '')
        return downLoadUrl
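
The link-extraction step can be tried offline. The sketch below is a Python 3 variant (the listing above is Python 2) that applies the same kind of pattern with `re.findall` to an in-memory HTML snippet. The sample markup and the helper name `find_download_pages` are assumptions made for illustration; they are not taken from the real site.

```python
import re

# Same idea as rDownloadUrl in parseUrl: capture relative links
# such as /down/8297_52_00001.html
DOWN_PAGE_RE = re.compile(r"<a href='(/down/\w+\.html)'", re.IGNORECASE)


def find_download_pages(html):
    """Return every /down/*.html link found in the HTML text, in page order."""
    return DOWN_PAGE_RE.findall(html)


# Hypothetical markup, shaped after the example URL quoted in the docstring
sample_html = (
    "<li><A href='/down/8297_52_00001.html'>001</A></li>\n"
    "<li><A href='/down/8297_52_00002.html'>002</A></li>\n"
    "<li><A href='/html/8297.html'>home</A></li>\n"
)

links = find_download_pages(sample_html)
print(links)
```

Because `findall` collects every match in one pass, this variant does not need the match-then-replace loop used in the listing.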

3. Parse the real download link from the download page

 
 
  
    def getDownlaodLink(starturl):
        '''
        Find out the real download link from download page.
        eg. we can get the download link 'http://180j-d.ysts8.com:8000/documentary/childhood/001.mp3?\
        1251746750178x1356330062x1251747362932-3492f04cf54428055a110a176297d95a' from \
        'http://www.5tps.com/down/8297_52_00001.html'
        '''
        downUrl = []
        # link text to look for (shown here in translation; judging by the
        # variable name, the page itself presumably uses GBK-encoded text)
        gbk_ClickWord = 'click here to download'
        downloadUrl = parseUrl(starturl)
        rDownUrl = re.compile(r'<a href="(.*)"><font color="blue">' + gbk_ClickWord + '.*</a>')  # find the real download link
        for url in downloadUrl:
            realurl = baseurl + url
            print realurl
            for line in urllib2.urlopen(realurl).readlines():
                m = rDownUrl.match(line)
                if m:
                    downUrl.append(m.group(1))
        return downUrl
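
The second parsing step can be sketched offline in the same way. This is a Python 3 illustration, not the author's code: the sample page line, the example.com URL, and the helper names `absolute_url` and `find_real_link` are all hypothetical. It shows two things: joining a relative link to the base url, and pulling the real link out of a line of HTML.

```python
import re
from urllib.parse import urljoin

BASE_URL = "http://www.5tps.com"

# Link text to look for; the real page's wording may differ (assumption)
CLICK_WORD = "click here to download"
REAL_LINK_RE = re.compile(
    r'<a href="([^"]*)"><font color="blue">' + re.escape(CLICK_WORD)
)


def absolute_url(relative_link):
    """Turn a site-relative link into a full url."""
    return urljoin(BASE_URL, relative_link)


def find_real_link(page_html):
    """Return the first real download link in the text, or None if absent."""
    m = REAL_LINK_RE.search(page_html)
    return m.group(1) if m else None


# Hypothetical line from a download page
sample_page = (
    '<td><a href="http://example.com/audio/001.mp3?token=abc">'
    '<font color="blue">click here to download</font></a></td>'
)

print(absolute_url("/down/8297_52_00001.html"))
print(find_real_link(sample_page))
```

Note that `re.match` only succeeds when the pattern matches at the very start of the line, while `re.search` (used here) finds the link anywhere in the text.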

4. Define the download function

 
 
  
    def download(url, filename):
        ''' download mp3 file '''
        print url
        urllib.urlretrieve(url, filename)
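
In Python 3 the same call lives in `urllib.request`. The sketch below is a minimal Python 3 equivalent that can be run without network access: a local temporary file, addressed through a `file://` URL, stands in for the remote mp3. The file names and contents are made up for the demonstration.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlretrieve


def download(url, filename):
    """Fetch url and store it at filename; return the path that was written."""
    saved_path, _headers = urlretrieve(url, filename)
    return saved_path


# Offline demonstration: a local file plays the role of the remote mp3
workdir = tempfile.mkdtemp()
source = Path(workdir) / "source.mp3"
source.write_bytes(b"fake mp3 bytes")
target = os.path.join(workdir, "001.mp3")

saved = download(source.as_uri(), target)
print(saved)
```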

5. Create a Thread class for downloading files

 
 
  
    class DownloadThread(threading.Thread):
        ''' download thread class '''
        def __init__(self, func, savePath):
            threading.Thread.__init__(self)
            self.function = func      # despite the name, this holds the url to download
            self.savePath = savePath  # full path of the file to write

        def run(self):
            download(self.function, self.savePath)
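
Subclassing `threading.Thread` is one option; passing a `target` function is another. A minimal Python 3 sketch of the second form follows, assuming a stand-in `fake_download` function (hypothetical) that records its arguments instead of touching the network.

```python
import threading

results = []
results_lock = threading.Lock()


def fake_download(url, save_path):
    """Stand-in for download(): record the call instead of fetching anything."""
    with results_lock:
        results.append((url, save_path))


# One thread per file; the example.com urls are placeholders
threads = [
    threading.Thread(
        target=fake_download,
        args=("http://example.com/%03d.mp3" % n, "%d.mp3" % n),
    )
    for n in (1, 2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until both downloads have finished

print(sorted(results))
```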

6. Start download

 
 
  
    if __name__ == '__main__':
        starturl = 'http://www.5tps.com/html/8297.html'
        downUrl = getDownlaodLink(starturl)
        aliveThreadDict = {}     # alive thread
        downloadingUrlDict = {}  # downloading link

        i = 0
        while i < len(downUrl):
            # Note: apparently only three threads may download the same novel
            # at the same time, and the network can also interfere, so to make
            # sure every mp3 file is downloaded the thread count is set to 2
            while len(downloadingUrlDict) < 2 and i < len(downUrl):
                downloadingUrlDict[i] = i
                i += 1
            for urlIndex in downloadingUrlDict.values():
                if urlIndex not in aliveThreadDict.values():
                    # save each file as <index>.mp3 (assumed naming)
                    t = DownloadThread(downUrl[urlIndex], save2path + str(urlIndex + 1) + '.mp3')
                    t.start()
                    aliveThreadDict[t] = urlIndex
            for (th, urlIndex) in aliveThreadDict.items():
                if not th.is_alive():
                    del aliveThreadDict[th]            # delete the thread slot
                    del downloadingUrlDict[urlIndex]   # delete the url from url list needed to download

        # wait for the last threads before reporting completion
        for th in aliveThreadDict.keys():
            th.join()
        print 'completed Download work'
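
The hand-rolled slot bookkeeping can also be expressed with the standard library's thread pool. The following is a minimal Python 3 sketch, not the author's method: `fake_download` is a hypothetical stand-in that only sleeps, the example.com URLs replace the real links, and `max_workers=2` mirrors the two-thread limit chosen in the article.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 2
stats = {"active": 0, "peak": 0}  # tracks how many downloads run at once
stats_lock = threading.Lock()


def fake_download(url, save_path):
    """Stand-in for download(): sleep briefly and return the target path."""
    with stats_lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.05)
    with stats_lock:
        stats["active"] -= 1
    return save_path


urls = ["http://example.com/%03d.mp3" % n for n in range(1, 6)]

# The pool never runs more than MAX_WORKERS downloads at the same time,
# and leaving the with-block waits for all of them to finish
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    futures = [
        pool.submit(fake_download, url, "%d.mp3" % (index + 1))
        for index, url in enumerate(urls)
    ]
    saved = [future.result() for future in futures]

print(saved)
print(stats["peak"])
```

The pool queues the remaining urls on its own, so there is no polling loop and no index arithmetic to get wrong.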

With that, the script can get on with the downloading while I go back to coding other projects.

Once the files are downloaded, copy them to the Note II and you can listen to a novel while reading. Finally, the source code is attached to the original post.

Link: http://www.cnblogs.com/wuren/archive/2012/12/24/2831100.html

