This article walks through a very simple Python crawler for downloading music from NetEase Cloud Music. It has some reference value, and I hope it helps anyone who needs it.
Goal
I happened to hear "Clouds into the Rain" by the landlord's cat and was instantly fascinated by the lazy voice and the youthful, student-like lyrics, so I kept their songs on repeat. I even went and binge-watched the anime "I am Jiangxiaobei" because of it, and I am really looking forward to the second season ...
I want to see you, even a quick glance on the parting ...
Okay, enough chatter. The main goal this time is to download the lyrics and audio of a singer's popular songs from NetEase Cloud Music, given the singer's ID, and save them to a local folder.
Configuration Basics
Python
Selenium (for setup, see: Selenium configuration)
Chrome browser (other browsers also work, with the corresponding changes)
Analysis
Anyone who has crawled the NetEase Cloud Music website knows that it has an anti-crawling mechanism: POST requests need certain parameters that can only be produced by simulating its encryption functions. To keep things simple enough for beginners to follow, here we use Selenium directly to simulate the login, and then use the interfaces to download the music and lyrics.
Experimental steps:
Given the singer ID, get the singer's list of popular songs, with song names and links, and save them to a CSV file;
Read the CSV file, extract the song ID from each song link, and then use the corresponding interface to download the music and lyrics;
Save the music and lyrics locally.
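The second step depends on pulling the song ID out of each song link. Below is a minimal sketch of one way to do that, assuming the links look like https://music.163.com/song?id=<number> (the link shape, the helper name extract_song_id, and the sample IDs are assumptions for illustration, not taken from the original code):

```python
from urllib.parse import urlparse, parse_qs


def extract_song_id(link):
    """Return the `id` query parameter of a song link as a string, or None."""
    parsed = urlparse(link)
    # Some links carry the path after a '#', e.g. https://music.163.com/#/song?id=42,
    # in which case the query string lives inside the fragment.
    query = parsed.query or urlparse(parsed.fragment).query
    ids = parse_qs(query).get("id")
    return ids[0] if ids else None


# Hypothetical sample link, for illustration only.
print(extract_song_id("https://music.163.com/song?id=123456"))
```

Parsing the query string instead of slicing the URL by position keeps the helper working even if extra parameters are added to the link.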
Python implementation
This section introduces several key functions ...
Get singer Info
With Selenium we do not need to inspect the page's network requests; we can extract the information directly from the page source. Looking at the source of the singer page, the information we need is inside an iframe, so we first need to switch to that iframe:
browser.switch_to.frame('contentFrame')
Looking further down, the song names and links we need are inside the tag with id="hotsong-list", and each row corresponds to a tr tag. So we get all the tr elements first and then iterate over them one by one.
data = browser.find_element_by_id("hotsong-list").find_elements_by_tag_name("tr")
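Note: the first call is find_element and the second is find_elements; the latter returns a list. These find_element_by_* helper methods were removed in Selenium 4.3, where the equivalent calls are find_element(By.ID, ...) and find_elements(By.TAG_NAME, ...).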
Next, parse the contents of each tr tag to get the song name and link. Both are inside the tag with class="txt": the link is an href attribute and the name is a title attribute, so both can be read directly with the get_attribute() function.
for i in range(len(data)):
    content = data[i].find_element_by_class_name("txt")
    href = content.find_element_by_tag_name("a").get_attribute("href")
    title = content.find_element_by_tag_name("b").get_attribute("title")
    song_info.append((title, href))
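The steps call for saving this list to a CSV file and reading it back later, which is not shown here. A minimal sketch using the standard csv module follows; it assumes song_info is a list of (title, href) tuples as built above, and the function names, the header row, and the file name are hypothetical:

```python
import csv


def save_song_info(song_info, csv_path):
    """Write (title, link) pairs to a CSV file with a header row."""
    # newline="" lets the csv module control line endings;
    # utf-8 keeps Chinese song titles intact.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "link"])
        writer.writerows(song_info)


def load_song_info(csv_path):
    """Read the CSV back into a list of (title, link) tuples."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return [(row[0], row[1]) for row in reader if len(row) >= 2]
```

For example, save_song_info(song_info, "songs.csv") followed later by load_song_info("songs.csv") returns the same pairs. The csv module quotes titles that contain commas, so they survive the round trip.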
Download lyrics
NetEase Cloud Music has an interface for getting lyrics. The link is: http://music.163.com/api/song ...
The number in the link is the song ID, so once we have the song ID we can download the lyrics directly from the link. The lyrics file is in json format, so we need to use the json package.
The lyrics obtained this way have a timeline at the start of each line, which needs to be removed with a regular expression. The complete code is as follows:
def get_lyric(self):
    url = 'http://music.163.com/api/song/lyric?' + 'id=' + str(self.song_id) + '&lv=1&kv=1&tv=-1'
    r = requests.get(url)
    json_obj = r.text
    j = json.loads(json_obj)
    lyric = j['lrc']['lyric']
    # use a regular expression to remove the timeline
    regex = re.compile(r'\[.*\]')
    final_lyric = re.sub(regex, '', lyric)
    return final_lyric
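To see what the timeline removal does without making a network request, here is a small offline sketch. It reuses the same regular expression and the same JSON layout as get_lyric (an lrc object holding a lyric string). The sample text is made up, and the fallback to an empty string when the lyric field is missing is an assumption about how some responses might look, not documented behaviour of the interface:

```python
import json
import re

# Same pattern as in get_lyric: a bracketed timestamp such as [00:01.00].
TIMELINE = re.compile(r'\[.*\]')


def lyric_from_json(text):
    """Extract the lyric from a JSON response body and strip the timeline."""
    data = json.loads(text)
    # Assumption: fall back to an empty string if the response has no lyric.
    raw = (data.get("lrc") or {}).get("lyric") or ""
    cleaned = TIMELINE.sub("", raw)
    # Drop blank lines left behind by timestamp-only rows.
    return "\n".join(line.strip() for line in cleaned.splitlines() if line.strip())


# Made-up sample response, for illustration only.
sample = json.dumps({"lrc": {"lyric": "[00:01.00]first line\n[00:05.50]second line\n"}})
print(lyric_from_json(sample))
```

Because the pattern is greedy, a lyric line that itself contains a closing bracket would lose everything up to that bracket; a narrower pattern may be worth considering if that ever matters.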
Download audio
NetEase Cloud Music also provides an interface for audio files. The link is: http://music.163.com/song/med ...
The number in the link is the song ID, so you can download the audio file directly based on the song ID. The complete code is as follows:
def get_mp3(self):
    url = 'http://music.163.com/song/media/outer/url?id=' + str(self.song_id) + '.mp3'
    try:
        print("Downloading: {0}".format(self.song_name))
        urllib.request.urlretrieve(url, '{0}/{1}.mp3'.format(self.path, self.song_name))
        print("Finish...")
    except Exception:
        # any network or file error is reported as a failed download
        print("Fail...")
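One practical detail when saving locally: the target folder has to exist, and a song name containing characters such as / or ? cannot be used as a file name as-is. The helper below is a hedged sketch of how the output path could be prepared before calling urlretrieve. The name build_mp3_path and the choice of replacement character are assumptions, not part of the original code:

```python
import os
import re


def build_mp3_path(folder, song_name):
    """Create the folder if needed and return a safe .mp3 path inside it."""
    # Replace characters that are not allowed in file names on common systems.
    safe_name = re.sub(r'[\\/:*?"<>|]', "_", song_name).strip() or "untitled"
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, safe_name + ".mp3")
```

Inside get_mp3, the second argument to urlretrieve could then be build_mp3_path(self.path, self.song_name) instead of the formatted string.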