Super simple python crawler netease cloud music download

Last Update:2018-08-29 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

This article brings you the content is about the super simple Python crawler netease cloud music Download, there is a certain reference value, the need for friends can refer to, I hope that you have some help.

Goal

The occasional chance to hear the landlord's cat "Clouds into the Rain", instantly fascinated by the lazy voice and students angry lyrics, and then continue to cycle to listen to their songs. Then also deliberately to brush the anime "I am Jiangxiaobei", good looking forward to the second season ...

I want to see you, even a quick glance on the parting ...

Okay, don't talk nonsense. The main goal of this time is to download the lyrics and audio of the singer's popular music according to the singer's ID in NetEase cloud, and save it to a local folder.

Configuration Basics

Python
Selenium (Configuration Method Reference: Selenium configuration)
Chrome browser (Others can also, need to make corresponding changes)

Analysis

If you crawl the website of NetEase Cloud Small partners should know that NetEase cloud has anti-crawling mechanism, post need to some information parameters of cryptographic function simulation. But here for the sake of simplicity, small white also can understand. Use selenium directly to simulate logins and then use the interface to download music and lyrics directly.

Experimental Steps :

Get the singer's popular songs list, song names and links, and save them to a CSV file according to the singer ID;
Read the CSV file, according to the song Link, extract the song ID, and then use the corresponding interface, download music and lyrics;
Save your music and lyrics locally.

Python implementation

This section will introduce several key functions ...

Get singer Info

The use of selenium we do not need to see the Web page request, directly from the source of the Web page to extract the corresponding information. View the singer page source code can be found, we need information within the IFRAME framework, so we first need to switch to the IFRAME:

Browser.switch_to.frame (' Contentframe ')

Keep looking down and find that the song names and links we need are in id="hotsong-list" the tag, and then each line corresponds to a tr label. So get all the tr content first and then iterate through the individual tr .

data = browser.find_element_by_id ("Hotsong-list"). Find_elements_by_tag_name ("tr")

Note: The previous one is, find_element and the find_elements latter one is, which returns a list.

The next is to parse tr the contents of a single tag, get the song name and link, you can find the two in the class="txt" tag, and the link is a href property, the name is a title property, can be directly obtained through the get_attribute() function.

For I in range (len data):    content = data[i].find_element_by_class_name ("txt")    href = content.find_element_by _tag_name ("a"). Get_attribute ("href")    title = Content.find_element_by_tag_name ("b"). Get_attribute ("title")    Song_info.append ((title, href))

Download lyrics

NetEase Cloud has an interface to get lyrics, the link is: Http://music.163.com/api/song ...

The number in the link is the song ID, so we have the song ID, you can download the lyrics directly from the link, the lyrics file is the json format, so we need to use the json package.

and directly get the lyrics, each line has a timeline, you need to use regular expressions to remove, the complete code is as follows:

def get_lyric (self):    url = ' http://music.163.com/api/song/lyric? ' + ' id= ' + str (self.song_id) + ' &lv=1&kv=1 &tv=-1 '    r = requests.get (URL)    json_obj = r.text    j = json.loads (json_obj)    lyric = j[' LRC ' [' Lyric ']    # using regular expressions to remove the timeline    regex = Re.compile (R ' \[.*\] ')    final_lyric = re.sub (Regex, ", lyric)    return Final_ Lyric

Download audio

NetEase Cloud also provides the interface of the audio file, the link is: http://music.163.com/song/med ...

The number in the link is the ID of the song, and you can download the audio file directly based on the song's ID. The complete code is as follows:

def Get_mp3 (self):    url = ' http://music.163.com/song/media/outer/url?id= ' + str (self.song_id) + '. mp3 '    try:        print ("Downloading: {0}". Format (self.song_name))        urllib.request.urlretrieve (URL, ' {0}/{1}.mp3 '. Format ( Self.path, Self.song_name)        print ("Finish ...")    except:        print ("Fail ...")

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Super simple python crawler netease cloud music download

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Super simple python crawler netease cloud music download

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support