Recently learning Python, using the version of python3.4, the development environment for Eclipse using the Pydev plugin. Just think http://www.dexiazai.com/?page_id=23 on the music good, decided to use Python bulk download down.
1. Music Address
After analysis, the address of the page embedded in the shrimp player is as follows, followed by a comma-delimited character for the music ID, such as the music address is http://www.xiami.com/song/2088578
<span style= "FONT-SIZE:14PX;" ><span style= "FONT-SIZE:14PX;" > <embed src= "http://www.xiami.com/widget/0_ 2088578,163414,603408,1769697211,1519594,1944911,1769482546,1771498226,2050705,1769213208,2957497, 1769779902,1769897948,1770077250,1771638197,109133,1769220090,3469026,2456779,3673869,385167,1769528393,1770130506,101458 2,3418745,1769554460, 1769692638,1279925,1769582513,136064,1769528375,1769455237,1769075782,2095209,1770618381,3427512,2108249,1771186364,20875 41,1769384565,1770432131, 2149137,2083819,1768911382,3429194,2089207,1770177060,1770427913,1769279049,2089339,2085205,3437055,3646041,2070983,20707 41,3619123,1770068122, 2082956,2071004,1768679,1769683697,3567557,109133,1769572701,2152946,3489617,1770292731_ 235_346_ff8719_494949_1/multiplayer.swf "type=" Application/x-shockwave-flash " width=" 235 "height=" "wmode" = "opaque" ></EMBED></SPAN></SPAN>
After analysis, the XML information of music can be queried in http://www.xiami.com/song/playlist/id/2088578/object_name/default/object_id/0, Where location is an encrypted source address that can be decrypted to get the correct address. The specific operation can be referred to the "Python Crawl Shrimp Music" this blog.
2. Get the ID of all music, form the list
<span style= "FONT-SIZE:14PX;" ><span style= "FONT-SIZE:14PX;" > dexiazai_url= "http://www.dexiazai.com/?page_id=23" req=urllib2. Request (Dexiazai_url, headers={ ' Connection ': ' keep-alive ', ' Accept ': ' text/html, application/xhtml+xml, * * * ', ' accept-language ': ' en-us,en;q=0.8,zh-hans-cn;q=0.5,zh-hans;q=0.3 ', ' user-agent ': ' mozilla/5.0 ( Windows NT 6.3; WOW64; trident/7.0; rv:11.0) like Gecko ' }) Response=urllib2.urlopen (req) content=response.read (). Decode (' Utf-8 ') Pattern=re.compile (' <embed.*?src= ' http://www.xiami.com/widget/0_ (. *?) /multiplayer.swf "', Re. S) Ids=re.search (pattern,content). Group (1) idarr=ids.split (",") </span></span>
3. Get the music name (plus serial number)
<span style= "FONT-SIZE:14PX;" ><span style= "FONT-SIZE:14PX;" > url= "http://www.xiami.com/song/" +str (Idarr[i]) print ("==================num:" +str (i) + "================ ======= ") print (URL) #获取歌词名 req=urllib2. Request (URL, headers={' Connection ': ' keep-alive ', ' Accept ': ' text/html, Application/xhtml+xml, */* ', ' Accept-language ': ' en-us,en;q=0.8,zh-hans-cn;q=0.5,zh-hans;q=0.3 ', ' user-agent ': ' mozilla/5.0 (Windows NT 6.3; WOW64; trident/7.0; rv:11.0) like Gecko '}) Rep=urllib2.urlopen (req) Cont=rep.read (). Decode (' Utf-8 ') Pat=re.compil E (' <div.*?id= ' title > (. *?)
5. Effect
6, use Cx_freeze packaging release exe
Because python3.4 in Py2exe or Pyinstaller release a bit of a problem (not supported), so with Cx_freeze release, Cx_freeze for Http://www.lfd.uci.edu/~gohlke/pythonlibs /#cx_freeze I downloaded the CYTHON?0.22?CP34?NONE?WIN32.WHL, is the python3.4 installation directory using py3.4 install D:\****\cython?0.22?cp34?none? WIN32.WHL installation.
And then in python3.4 's installation directory \lib\site-packages\cx_freeze\samples\ PYQT4 Copy the setup.py and edit the installation file name as the file to be published, and then execute the Python setup.py build command at the command line, generate the build folder with the executable file Xiami_download_ Dexiazai.exe
python3.4 Crawler Bulk Download music