Downloading Audio Novels with Multiple Threads in Python

Source: Internet
Author: User

Experienced veterans (the unmarried ones, at least) rent a place near the office, which saves a lot of time and spares them the commute. Others, for one reason or another, have to travel between home and the office every day. Unfortunately, I am one of them. My home is far from my company, so I usually spend about 1/4 of my work time on buses. On top of that, Hangzhou has long been nicknamed China's Las Vegas (the "congestion city"), and in every traffic jam all I can do is imagine myself as a Transformer. For a programmer, wasting that much time is intolerable. Since the situation cannot be changed in the short term, I might as well make good use of the time.

So I bought a large-screen Note II for reading PDF files, and my ears should not sit idle either. Instead of listening to English, though, I listen to novels. I have liked listening to broadcasts since my student days, so I need a lot of audio novels. There are plenty of resources on the Internet, but the download pages are a pain: to get more traffic and ad clicks, these websites make you open at least two pages before you reach the real download link. To cut the overall download time, I wrote this small program to make downloading audio novels convenient. (It can be used for other types of resources as well.)

First, let me make clear that I am not trying to crawl large amounts of information and data; this is only for entertainment and learning. The program therefore does not aimlessly crawl every link on a website. Instead, it works on one novel at a time: for example, to download the novel "Childhood", I find that novel's homepage on the website where I listen to audio novels and use the program to download all of its mp3 audio. The details are in the code below; all of the code lives in the module crawler5tps:

1. Set the base URL and the directory for saving the files.

 
 
    # -*- coding: GBK -*-
    import urllib, urllib2
    import re, threading, os

    baseurl = 'http://www.5tps.com'  # base url
    down2path = 'E:/enovel/'         # saving path
    save2path = ''                   # saving file name (full path)

2. Parse the download-page URLs from the start URL.

 
 
    def parseUrl(starturl):
        '''
        Parse out download page from start url.
        eg. we can get 'http://www.5tps.com/down/8297_52_00001.html' from 'http://www.5tps.com/html/8297.html'
        '''
        global save2path
        rDownloadUrl = re.compile(r".*?<A href='(/down/\w+\.html)'.*")  # find the link of download page
        # rTitle = re.compile("<TITLE>.{4}\s{1}(.*)\s{1}.*</TITLE>")
        # <TITLE>voice novels: full set of Liu Tao</TITLE>
        f = urllib2.urlopen(starturl)
        totalLine = f.readlines()
        # create the name of the saving directory: the title is taken as the
        # second space-separated token on the fourth line of the page
        title = totalLine[3].split(" ")[1]
        if not os.path.exists(down2path + title):
            os.mkdir(down2path + title)
        save2path = down2path + title + "/"
        downUrlLine = [line for line in totalLine if rDownloadUrl.match(line)]
        downLoadUrl = []
        for dl in downUrlLine:
            # a line may hold several links: take the first match, remove it, repeat
            while True:
                m = rDownloadUrl.match(dl)
                if not m:
                    break
                downUrl = m.group(1)
                downLoadUrl.append(downUrl.strip())
                dl = dl.replace(downUrl, '')
        return downLoadUrl
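The loop in parseUrl takes one link per pass and removes it from the line before matching again. A minimal sketch of the same extraction in a single pass with `re.findall` might look like the following. It is written for Python 3, which is an assumption (the listing above is Python 2), and `find_download_pages` and the sample HTML are hypothetical, made up for illustration only.

```python
import re

# Same link pattern as in parseUrl, but unanchored so findall can scan a whole line.
DOWN_LINK = re.compile(r"<A href='(/down/\w+\.html)'")


def find_download_pages(html_lines):
    """Return every /down/....html path found in the given lines, in order."""
    links = []
    for line in html_lines:
        links.extend(DOWN_LINK.findall(line))
    return links


# Hypothetical page fragment, not taken from the real site.
sample = [
    "<li><A href='/down/8297_52_00001.html'>001</A> "
    "<A href='/down/8297_52_00002.html'>002</A></li>",
    "<p>no links here</p>",
]
print(find_download_pages(sample))
```

Because `findall` returns all non-overlapping matches, the match-then-replace loop is not needed in this variant.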

3. Parse the real download link from the download page.

 
 
    def getDownlaodLink(starturl):
        '''
        Find out the real download link from download page.
        eg. we can get the download link 'http://180j-d.ysts8.com:8000/documentary/childhood/001.mp3?\
        1251746750178x1356330062x1251747362932-3492f04cf54428055a110a176297d95a' from \
        'http://www.5tps.com/down/8297_52_00001.html'
        '''
        downUrl = []
        gbk_ClickWord = 'click here to download'  # link text that marks the real download link
        downloadUrl = parseUrl(starturl)
        rDownUrl = re.compile('<a href="(.*)"><font color="blue">' + gbk_ClickWord + '.*</a>')  # find the real download link
        for url in downloadUrl:
            realurl = baseurl + url
            print realurl
            # scan the download page line by line for the real link
            for line in urllib2.urlopen(realurl).readlines():
                m = rDownUrl.match(line)
                if m:
                    downUrl.append(m.group(1))
        return downUrl
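To make the link-matching step concrete, here is a small offline sketch in Python 3 (an assumption; the listing above is Python 2). It deviates from the listing in two labeled ways: it uses `search` instead of `match` so that leading whitespace does not prevent a hit, and a non-greedy group so the capture stops at the first closing quote. `find_real_links`, the `example.com` URL, and the sample lines are hypothetical.

```python
import re

CLICK_WORD = 'click here to download'  # placeholder link text, as in the listing above

# Non-greedy group so the capture stops at the first closing quote.
REAL_LINK = re.compile(r'<a href="(.*?)"><font color="blue">' + re.escape(CLICK_WORD))


def find_real_links(html_lines):
    """Return the href of every line that contains the marked download anchor."""
    found = []
    for line in html_lines:
        m = REAL_LINK.search(line)  # search, not match: tolerate leading whitespace
        if m:
            found.append(m.group(1))
    return found


# Hypothetical download-page lines, not taken from the real site.
sample = [
    '  <a href="http://example.com/audio/001.mp3?key=abc">'
    '<font color="blue">click here to download</font></a>',
    '<a href="/html/8297.html">back</a>',
]
print(find_real_links(sample))
```

Only the first sample line carries the blue "click here to download" anchor, so only its href is collected.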

4. Define the download function

 
 
    def download(url, filename):
        ''' download mp3 file '''
        print url
        urllib.urlretrieve(url, filename)
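For readers on Python 3, the same call lives in `urllib.request`. The sketch below assumes Python 3 and, so that it runs without network access, downloads from a `file://` URL that points at a temporary local file standing in for a remote mp3. The file names and contents are made up for illustration.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def download(url, filename):
    """Fetch url and write it to filename; return the saved path."""
    saved_path, _headers = urllib.request.urlretrieve(url, filename)
    return saved_path


# Stand-in for a remote mp3: a small local file addressed by a file:// URL.
workdir = tempfile.mkdtemp()
source = Path(workdir) / 'source.mp3'
source.write_bytes(b'fake audio bytes')
target = os.path.join(workdir, '1.mp3')

download(source.as_uri(), target)
print(os.path.getsize(target))
```

With a real `http://` link the call is the same; only the URL changes.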

5. Create a Thread class for downloading files

 
 
    class DownloadThread(threading.Thread):
        ''' download thread class '''
        def __init__(self, func, savePath):
            threading.Thread.__init__(self)
            self.function = func      # note: despite its name, this holds the URL to download
            self.savePath = savePath  # full path of the file to save

        def run(self):
            download(self.function, self.savePath)
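Subclassing is one way to run a download in a thread; another is to pass the function and its arguments straight to `threading.Thread`. The following is a sketch of that alternative in Python 3 (an assumption), not the author's approach. `fake_download` is a hypothetical stand-in that records the job instead of touching the network.

```python
import threading

results = []
results_lock = threading.Lock()


def fake_download(url, filename):
    """Stand-in for download(): records the job instead of fetching anything."""
    with results_lock:
        results.append((url, filename))


# threading.Thread accepts the callable and its arguments directly.
t = threading.Thread(target=fake_download,
                     args=('http://example.com/001.mp3', '1.mp3'))
t.start()
t.join()  # wait until the worker has finished
print(results)
```

The subclass version is handy when the thread needs extra state or methods; for a single function call, `target=` keeps the code shorter.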

6. Start download

 
 
    if __name__ == '__main__':
        starturl = 'http://www.5tps.com/html/8297.html'
        downUrl = getDownlaodLink(starturl)
        aliveThreadDict = {}     # alive thread
        downloadingUrlDict = {}  # downloading link
        i = 0
        while i < len(downUrl):
            # Note: the site reportedly allows only three threads to download the
            # same novel at the same time, and the network is sometimes unreliable,
            # so the number of threads is set to 2 to make sure each mp3 file is
            # downloaded.
            # The bound check on i keeps the index inside downUrl.
            while len(downloadingUrlDict) < 2 and i < len(downUrl):
                downloadingUrlDict[i] = i
                i += 1
            for urlIndex in downloadingUrlDict.values():
                if urlIndex not in aliveThreadDict.values():
                    # file name pattern reconstructed from a garbled line:
                    # assumed to be <index + 1>.mp3
                    t = DownloadThread(downUrl[urlIndex], save2path + str(urlIndex + 1) + '.mp3')
                    t.start()
                    aliveThreadDict[t] = urlIndex
            for (th, urlIndex) in aliveThreadDict.items():
                if th.isAlive() is not True:
                    del aliveThreadDict[th]           # delete the thread slot
                    del downloadingUrlDict[urlIndex]  # delete the url from url list needed to download
        # wait for the last downloads before reporting completion
        for th in aliveThreadDict.keys():
            th.join()
        print 'completed Download work'
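The main loop above keeps at most two downloads running by tracking threads in dictionaries. In Python 3 the same limit can be expressed with `concurrent.futures.ThreadPoolExecutor`. This is a sketch of that alternative, not the author's method; `fake_download`, the `example.com` URLs, and the file names are hypothetical, and the sleep stands in for the actual transfer.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 2  # same limit as the main loop above
state = {'active': 0, 'peak': 0}
state_lock = threading.Lock()


def fake_download(url, filename):
    """Stand-in for download(): sleeps briefly and tracks concurrent calls."""
    with state_lock:
        state['active'] += 1
        state['peak'] = max(state['peak'], state['active'])
    time.sleep(0.05)  # pretend to transfer data
    with state_lock:
        state['active'] -= 1
    return filename


urls = ['http://example.com/%03d.mp3' % n for n in range(1, 6)]  # hypothetical links

# The pool never runs more than MAX_WORKERS jobs at once; leaving the
# with-block waits for every submitted job to finish.
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    futures = [pool.submit(fake_download, url, '%d.mp3' % (i + 1))
               for i, url in enumerate(urls)]
    saved = [f.result() for f in futures]

print(saved)
print('peak concurrent downloads:', state['peak'])
```

The pool handles queuing and cleanup, so no polling loop is needed to find finished threads.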

With that, I just let the program run on its own while I get back to coding other projects.

Once the downloads finish, I copy the files to the Note II and can listen to novels while reading. The source code is attached at the end.

Link: http://www.cnblogs.com/wuren/archive/2012/12/24/2831100.html
