A super simple Python crawler for downloading NetEase Cloud Music

Source: Internet
Author: User
This article walks through a super simple Python crawler for downloading songs from NetEase Cloud Music. It may be a useful reference for anyone who needs it, and I hope you find it helpful.

Goal

I happened to hear "Clouds into the Rain" by The Landlord's Cat and was instantly taken by the lazy voice and the youthful, student-like lyrics, so I kept their songs on repeat. I then went out of my way to watch the anime "I am Jiangxiaobei" as well, and I am really looking forward to the second season ...

I want to see you, even a quick glance on the parting ...

Okay, enough chatter. The main goal this time is to download the lyrics and audio of a singer's popular songs from NetEase Cloud Music, based on the singer's ID, and save them to a local folder.

Configuration Basics

    • Python

    • Selenium (for setup, see: Selenium configuration)

    • Chrome browser (other browsers also work, with corresponding changes)

Analysis

Anyone who has crawled the NetEase Cloud Music website probably knows that it has an anti-crawling mechanism: POST requests need certain parameters that can only be produced by simulating the site's encryption functions. For the sake of simplicity, and so that beginners can follow along, we use Selenium directly to simulate logging in, and then use the interfaces to download the music and lyrics directly.

Experimental steps:

    1. Get the singer's list of popular songs (song names and links) based on the singer ID, and save it to a CSV file;

    2. Read the CSV file, extract the song ID from each song link, and then use the corresponding interfaces to download the music and lyrics;

    3. Save the music and lyrics locally.
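Steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the author's code: the function names, the CSV column names, and the sample links are assumptions made for the example, and an in-memory buffer stands in for the CSV file.

```python
import csv
import io
import re


def song_id_from_link(href):
    """Return the numeric song ID found in a link such as '.../song?id=123', or None."""
    match = re.search(r"[?&]id=(\d+)", href)
    return match.group(1) if match else None


def write_songs(fileobj, song_info):
    """Write (title, href) pairs to a CSV file object with a header row."""
    writer = csv.writer(fileobj)
    writer.writerow(["title", "href"])
    writer.writerows(song_info)


def read_songs(fileobj):
    """Read the CSV back and return (title, song_id) pairs."""
    reader = csv.DictReader(fileobj)
    return [(row["title"], song_id_from_link(row["href"])) for row in reader]


# Hypothetical sample data; real titles and links come from the singer page.
songs = [
    ("Song A", "https://music.163.com/song?id=1001"),
    ("Song B, live", "https://music.163.com/song?id=1002"),
]

buffer = io.StringIO()  # stands in for a real CSV file
write_songs(buffer, songs)
buffer.seek(0)
loaded = read_songs(buffer)
print(loaded)
```

With a real file, the same functions can be called on a handle opened with open(path, "w", newline="", encoding="utf-8"), which is what the csv module documentation recommends.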

Python implementation

This section introduces several key functions ...

Get singer info

With Selenium we do not need to inspect the page's network requests; we can extract the information directly from the page source. Looking at the source of the singer page shows that the information we need is inside an iframe, so we first need to switch to that iframe:

browser.switch_to.frame('contentFrame')

Looking further down, the song names and links we need are inside the tag with id="hotsong-list", where each row corresponds to a tr tag. So we get all the tr elements first and then iterate over them one by one.

data = browser.find_element_by_id("hotsong-list").find_elements_by_tag_name("tr")

Note: the first call is find_element and the second is find_elements, which returns a list. These find_element_by_* helpers belong to older Selenium releases; Selenium 4 replaces them with find_element(By.ID, ...) and find_elements(By.TAG_NAME, ...).

Next, parse the contents of each tr tag to get the song name and link. Both are inside the class="txt" tag: the link is the href attribute of the a tag and the name is the title attribute of the b tag, and both can be read directly with the get_attribute() function.

song_info = []  # collected (title, href) pairs
for row in data:
    # Each row keeps the song name and link inside the element with class "txt"
    content = row.find_element_by_class_name("txt")
    href = content.find_element_by_tag_name("a").get_attribute("href")
    title = content.find_element_by_tag_name("b").get_attribute("title")
    song_info.append((title, href))
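To make the tr / class="txt" structure concrete without launching a browser, the sketch below runs the same extraction on a small hand-written HTML fragment, using Python's standard html.parser module in place of Selenium. The markup and the HotSongParser name are hypothetical and modeled only on the description above; the real page may be laid out differently.

```python
from html.parser import HTMLParser

# Hypothetical fragment mirroring the structure described in the text.
SAMPLE_HTML = """
<table id="hotsong-list">
  <tr><td><span class="txt"><a href="/song?id=1001"><b title="Song A">Song A</b></a></span></td></tr>
  <tr><td><span class="txt"><a href="/song?id=1002"><b title="Song B">Song B</b></a></span></td></tr>
</table>
"""


class HotSongParser(HTMLParser):
    """Collect (title, href) pairs from elements nested under class="txt"."""

    def __init__(self):
        super().__init__()
        self.in_txt = False       # True once a class="txt" element has been opened
        self.current_href = None  # href of the <a> seen inside the current "txt" element
        self.song_info = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("class") == "txt":
            self.in_txt = True
        elif self.in_txt and tag == "a":
            self.current_href = attrs.get("href")
        elif self.in_txt and tag == "b" and self.current_href is not None:
            # The <b> tag carries the song name in its title attribute
            self.song_info.append((attrs.get("title"), self.current_href))
            self.in_txt = False
            self.current_href = None


parser = HotSongParser()
parser.feed(SAMPLE_HTML)
print(parser.song_info)
```

Selenium is still needed for the live site, because the song list sits inside an iframe of the rendered page; the offline parser only illustrates which attributes are read.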

Download lyrics

NetEase Cloud Music has an interface for getting lyrics. The link is: http://music.163.com/api/song ...

The number in the link is the song ID, so once we have the song ID we can download the lyrics directly from the link. The lyrics come back in JSON format, so we need the json package.

Each line of the raw lyrics carries a timestamp, which has to be removed with a regular expression. The complete code is as follows:

import json
import re
import requests

def get_lyric(self):
    # Build the lyric interface URL from the song ID
    url = 'http://music.163.com/api/song/lyric?' + 'id=' + str(self.song_id) + '&lv=1&kv=1&tv=-1'
    r = requests.get(url)
    json_obj = r.text
    j = json.loads(json_obj)
    lyric = j['lrc']['lyric']
    # Use a regular expression to remove the timeline
    regex = re.compile(r'\[.*\]')
    final_lyric = re.sub(regex, '', lyric)
    return final_lyric
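The timestamp removal can be tried on its own, without calling the interface. The sketch below is an illustrative variant rather than the author's code: it uses a narrower pattern that matches only tags shaped like [mm:ss.xx], so other bracketed text in a lyric line is kept, and it drops lines that are empty once the tags are gone. The sample lyric text and the strip_timestamps name are made up for the example.

```python
import re

# Matches timestamp tags such as [00:12.34] or [01:02.345], but not other bracketed text.
TIMESTAMP = re.compile(r"\[\d+:\d+(?:\.\d+)?\]")


def strip_timestamps(lyric):
    """Remove timestamp tags from LRC-style text and drop lines left empty."""
    lines = (TIMESTAMP.sub("", line).strip() for line in lyric.splitlines())
    return "\n".join(line for line in lines if line)


# Hypothetical lyric text in the timestamped format described above.
sample = "[00:00.00]First line\n[00:05.12][00:31.40]Second line\n[00:10.50]\n"
print(strip_timestamps(sample))
```

The broader pattern \[.*\] used in get_lyric removes everything from the first opening bracket to the last closing bracket on a line, which is simpler but also deletes any bracketed words that are part of the lyrics.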

Download audio

NetEase Cloud Music also provides an interface for the audio file. The link is: http://music.163.com/song/med ...

The number in the link is the song ID, so the audio file can be downloaded directly based on the song ID. The complete code is as follows:

import urllib.request

def get_mp3(self):
    # Build the audio interface URL from the song ID
    url = 'http://music.163.com/song/media/outer/url?id=' + str(self.song_id) + '.mp3'
    try:
        print("Downloading: {0}".format(self.song_name))
        # Save the audio as <path>/<song_name>.mp3
        urllib.request.urlretrieve(url, '{0}/{1}.mp3'.format(self.path, self.song_name))
        print("Finish...")
    except Exception:
        print("Fail...")
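Because get_mp3 builds the file name straight from the song name, a title containing a character such as / or ? could produce an invalid path. The original does not show how this is handled, so the helper below is only an assumed precaution: safe_filename and target_path are hypothetical names, and the set of replaced characters is a conservative guess covering common file systems.

```python
import os
import re


def safe_filename(name, replacement="_"):
    """Replace characters that are commonly invalid in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]', replacement, name).strip()
    return cleaned or "untitled"


def target_path(folder, song_name, extension):
    """Build the local path for a song's audio or lyric file."""
    return os.path.join(folder, safe_filename(song_name) + extension)


# Hypothetical folder and song title.
print(target_path("music", "A/B: C?", ".mp3"))
```

The resulting path could then be passed to urllib.request.urlretrieve in place of the formatted string, and reused with a .txt extension when writing the lyrics.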