QQ Music still has a lot of music, and sometimes I want to download a good song, but downloading from the web page means annoying logins and the like every time. So here comes a QQ Music crawler.
At least in my view, the most important thing for a for-loop crawler is finding the URL of the element you want to crawl. Let's start looking for it (don't laugh at me).
#Finding the URL:
This URL is not as easy to find as on other sites. It wore me out; the main problem is that there is a lot of data, and you have to pick the useful pieces out of it and finally combine them into the real music URL. While working on it yesterday, I sorted out a few intermediate URLs:
#url1: https://c.y.qq.com/soso/fcgi-bin/client_search_cp?&lossless=0&flag_qc=0&p=1&n=20&w=Rain Butterfly
#url2: https://c.y.qq.com/base/fcgi-bin/fcg_music_express_mobile3.fcg?&jsonpcallback=musicjsoncallback&cid=205361747&songmid=[songmid]&filename=C400[mid].m4a&guid=6612300644
#url3: http://dl.stream.qqmusic.qq.com/[filename]?vkey=[vkey] (where [vkey] is replaced with the string specific to each song)
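A search keyword that contains spaces or Chinese characters has to be percent-encoded before it goes into url1. The following is only a minimal sketch of one way to build url1; the helper name `build_search_url` is hypothetical, and reading `p` as the page number and `n` as the number of results per page is my assumption based on the values above.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://c.y.qq.com/soso/fcgi-bin/client_search_cp"

def build_search_url(word, page=1, per_page=20):
    """Return url1 for a keyword, with the keyword percent-encoded."""
    params = {
        "lossless": 0,
        "flag_qc": 0,
        "p": page,      # assumed to be the page number
        "n": per_page,  # assumed to be the number of results per page
        "w": word,      # the search keyword
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)

url1 = build_search_url("Rain Butterfly")
print(url1)
```

When the URL is fetched with requests, passing the same dictionary through its `params` argument would do the encoding as well.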
requests(url1)
From the search list we get the songmid and mid of each song (from my observation, both values are specific to each song). With these two values, the complete url2 can be filled in exactly.
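The search response arrives wrapped in a JSONP call, so the wrapper has to come off before the two ids can be read. Below is a rough sketch of that step. The helper names are hypothetical, and the sample string is made up in the shape the code in this post expects; it is not real API output.

```python
import json

def strip_jsonp(text):
    """Return the JSON payload inside a JSONP wrapper such as callback({...})."""
    start = text.index("(") + 1
    end = text.rindex(")")
    return text[start:end]

def extract_ids(search_text):
    """Return a list of (songmid, media_mid) pairs from a search response."""
    songs = json.loads(strip_jsonp(search_text))["data"]["song"]["list"]
    # Skip entries that lack either id instead of failing on them
    return [(s["songmid"], s["media_mid"]) for s in songs
            if "songmid" in s and "media_mid" in s]

# Made-up sample response, for illustration only
sample = ('callback({"data": {"song": {"list": ['
          '{"songmid": "AAA", "media_mid": "BBB"}, '
          '{"songname": "no ids here"}]}}})')
print(extract_ids(sample))
```

Cutting at the first `(` and the last `)` does not depend on the callback name, which is why I prefer it here to stripping a fixed set of characters.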
requests(url2)
This gives the vkey of each song in the search results, and from my observation the filename is C400 + mid + .m4a (this is what the code below uses). With that, the specific value of url3 is determined. url3 is the real URL of the music. Because I have not studied the other parameters of these URLs thoroughly enough, each run returns at most 20 music URLs. With these URLs, Tencent's music is yours to enjoy.
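To make the relationship between the ids, url2 and url3 easier to see, here is a small sketch that assembles both URLs. The function names are hypothetical; the parameter values (`cid`, `guid`, `uin`, `fromtag`) are simply the ones that appear in this post, and I do not know whether they are still accepted.

```python
from urllib.parse import urlencode

GUID = "6612300644"  # the guid value used throughout this post

def build_vkey_url(songmid, mid):
    """Return url2, the request whose response carries the vkey."""
    params = {
        "jsonpcallback": "musicjsoncallback",
        "cid": "205361747",
        "songmid": songmid,
        "filename": "C400" + mid + ".m4a",
        "guid": GUID,
    }
    return ("https://c.y.qq.com/base/fcgi-bin/fcg_music_express_mobile3.fcg?"
            + urlencode(params))

def build_stream_url(mid, vkey):
    """Return url3, the music URL, from the mid and the vkey."""
    filename = "C400" + mid + ".m4a"
    params = {"vkey": vkey, "guid": GUID, "uin": "0", "fromtag": "66"}
    return "http://dl.stream.qqmusic.qq.com/" + filename + "?" + urlencode(params)

# Placeholder ids, for illustration only
print(build_vkey_url("AAA", "BBB"))
print(build_stream_url("BBB", "VK"))
```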
#Code
Here is the code block that builds srcs:
import requests
import urllib.request
import json

word = 'Rain Butterfly'
# url1: search for the keyword
res1 = requests.get('https://c.y.qq.com/soso/fcgi-bin/client_search_cp?&t=0&aggr=1&cr=1&catZhida=1&lossless=0&flag_qc=0&p=1&n=20&w=' + word)
# The response is wrapped as callback(...); strip the wrapper before parsing
jm1 = json.loads(res1.text.strip('callback()[]'))
jm1 = jm1['data']['song']['list']
mids = []
songmids = []
srcs = []
songnames = []
singers = []
for j in jm1:
    try:
        # Read all four fields first so a missing key cannot leave the lists misaligned
        row = (j['media_mid'], j['songmid'], j['songname'], j['singer'][0]['name'])
    except (KeyError, IndexError):
        print('wrong')
        continue
    mids.append(row[0])
    songmids.append(row[1])
    songnames.append(row[2])
    singers.append(row[3])

for n in range(0, len(mids)):
    # url2: fetch the vkey for this song
    res2 = requests.get('https://c.y.qq.com/base/fcgi-bin/fcg_music_express_mobile3.fcg?&jsonpcallback=musicjsoncallback&cid=205361747&songmid=' + songmids[n] + '&filename=C400' + mids[n] + '.m4a&guid=6612300644')
    jm2 = json.loads(res2.text)
    vkey = jm2['data']['items'][0]['vkey']
    # url3: the real music URL
    srcs.append('http://dl.stream.qqmusic.qq.com/C400' + mids[n] + '.m4a?vkey=' + vkey + '&guid=6612300644&uin=0&fromtag=66')
#Download:
With srcs, downloading is naturally not a problem. Of course, once you have the singer and song name, you can also copy a src into the browser and download it there. You can also use Python for bulk downloading, which is nothing more than a loop, similar to the method we used earlier for downloading Sogou images: (my Python version: python3.3.3)
print('For ' + word + ' Start Download ...')
x = len(srcs)  # number of files downloaded successfully
for m in range(0, x):
    print(str(m) + ' ***** ' + songnames[m] + ' - ' + singers[m] + '.m4a ***** ' + 'Downloading ...')
    try:
        urllib.request.urlretrieve(srcs[m], 'd:/music/' + songnames[m] + ' - ' + singers[m] + '.m4a')
    except Exception:
        # Count the failure and move on to the next song
        x = x - 1
        print('Download wrong~')
print('For [' + word + '] Download complete ' + str(x) + ' files!')
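One thing that can make the loop above print 'Download wrong~' is a song or singer name containing characters that Windows does not allow in file names, or a d:/music directory that does not exist yet. The following is only a sketch of one way to guard against both; `safe_filename` and `target_path` are hypothetical helpers, not part of the code above.

```python
import os
import re

def safe_filename(songname, singer, ext=".m4a"):
    """Build 'songname - singer.m4a' with characters Windows forbids replaced by '_'."""
    name = songname + " - " + singer
    name = re.sub(r'[\\/:*?"<>|]', "_", name).strip()
    return name + ext

def target_path(directory, songname, singer):
    """Create the download directory if needed and return the full file path."""
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, safe_filename(songname, singer))

print(safe_filename("A/B: C?", "X"))
```

The result of `target_path('d:/music', songnames[m], singers[m])` could then be passed to `urllib.request.urlretrieve` in place of the hand-built path.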
Put the two pieces of code above in the same .py file and run it to download the music for the given keyword.
#Result:
The download starts ... Then look in the download directory:
The music has been downloaded successfully ...
That completes my write-up of the ideas behind, and the implementation of, the QQ Music URL crawler I made yesterday.
#Use:
Classmates who have built a nice music player shell should find this useful. I actually made this to serve my HTML-based music player. But I am now stuck on calling Python from JS. I am still looking into it; if anyone understands this, I hope you will tell me. Thank you very much!
Python crawls qqmusic music URLs and downloads in bulk