First, the goal: Download NetEase cloud music Popular song list
Second, the module used: Requests,multiprocessing,re.
Third, step:
(1) Page analysis: first open NetEase cloud music, select popular song list, you can see the following list of songs, and then open the Developer tool
so we need to request the URL is https://music.163.com/discover/playlist, and then use the Requests.get () method to request the page, for the returned results, with the regular expression to parse, Get the song list name and song ID, parse the regular expression as follows:
res = requests.get (URL, headers=headers)
data = Re.findall (' <a title= ' (. *?) "href="/playlist\?id= (\d+) "class=" MSK "></a>", Res.text)
(2) After getting the song single name and the song single ID, construct the URL of the song list, and then imitate the steps (1) to get the song name and song ID, parse the regular expression as follows:
Re.findall (R ' <a href= "/song\?id= (\d+)" > (. *?) </a> ', Res.text)
After you get the song ID, construct the URL of the song, and then use the Requests.get (). Content method to download the song, the URL of the song is constructed as follows:
"http://music.163.com/song/media/outer/url?id=%s"% (song ID)
(3) because the name of some songs can not be saved as a file name, so the use of try...except, for can not be saved as the name of the song, I choose to pass off = =
(4) because to download multiple songs, a single song has a lot of songs, so the use of the Multiprocessing module pool method, improve the efficiency of the program operation.
Iv. Specific Code
Because downloading all the songs will take a long time, so let's download the first three songs to try = =
1 ImportRequests2 ImportRe3 fromMultiprocessingImportPool4 5headers = {6 'Referer':'https://music.163.com/',7 "user-agent":"mozilla/5.0 (Windows NT 6.1; WOW64) applewebkit/537.36 (khtml, like Gecko) chrome/53.0.2785.89"8 "safari/537.36"9 }Ten One A defget_page (URL): -res = requests.get (URL, headers=headers) -data = Re.findall ('<a title= "(. *?)" href= "/playlist\?id= (\d+)" class= "MSK" ></a>', Res.text) the -Pool = Pool (processes=4) -Pool.map (Get_songs, Data[:3]) - Print("Download Complete! ") + - + defget_songs (data): APlaylist_url ="https://music.163.com/playlist?id=%s"% data[1] atres = Requests.get (Playlist_url, headers=headers) - forIinchRe.findall (R'<a href= "/song\?id= (\d+)" > (. *?) </a>', Res.text): -Download_url ="http://music.163.com/song/media/outer/url?id=%s"%I[0] - Try: -With open ('music/'+ i[1]+'. mp3','WB') as F: - F.write (Requests.get (download_url). Content) in exceptFilenotfounderror: - Pass to exceptOSError: + Pass - the * if __name__=='__main__': $Hot_url ="Https://music.163.com/discover/playlist/?order=hot"Panax NotoginsengGet_page (Hot_url)
V. Results of operation
"Python3 crawler" NetEase cloud Music song single Download