How to use Python web crawler to crawl the lyrics of NetEase cloud music

Source: Internet
Author: User
Tags python web crawler


A few days ago to share a small series of data visualization analysis, at the end of the text mentioned NetEase cloud music lyrics Crawl, today's small series to share NetEase cloud Music lyrics Crawl method.

The general idea of this article is as follows:

Find the correct URL, get the source code;

Use BS4 to parse source code, get song name and song ID;

Call NetEase Cloud Song API to get lyrics;

Write the lyrics to the file and save it locally.

The purpose of this article is to get the lyrics of NetEase cloud music and to save lyrics to local files. The whole is as follows:

Based on Python netease cloud music lyrics crawl

Lei's Song

This article takes the folk song God Lei as the data collection object, specially collects his song lyrics, other singers ' lyrics collection way can analogy, displays is "Chengdu" the lyrics.

Based on Python netease cloud music lyrics crawl

Lei Song---"Chengdu"

Generally speaking, the URL displayed on the page can be written in the program, after running the program can collect the source of the Web page we want. But in the NetEase Cloud Music website, this road does not work, because the URL in the webpage is a false URL, the real URL is no # number. Nonsense not much to say, directly on the code.


Based on Python netease cloud music lyrics crawl

Get Web page source code

This paper uses requests, BS4, JSON and re modules to capture NetEase cloud music lyrics, remember to add headers and anti-hotlinking referer in the program to simulate the browser, to prevent access denied by the site. Here the Get_html method is specifically used to obtain the source code, usually we also have to do exception handling, for a rainy day.

Get to the Web page source code, analysis source, found the name and ID of the song is very deep, find her 1100 degrees, found her in the source of 294 lines, hidden in <ul class= "f-hide" > tag, as shown:

Based on Python netease cloud music lyrics crawl

where the song name and ID exist

Next we use the beautiful soup to get the target information, directly on the code, such as:

Based on Python netease cloud music lyrics crawl

Get Song name and ID

It is important to note that when you get the ID, you need to slice the link, and the number is the ID of the song, and the song name is obtained through the Get_text () method, and finally the song name and ID one by one are returned with the zip function.

After getting the ID, you can go to the inside page to get the lyrics, but the URL is still not to force, such as:

Based on Python netease cloud music lyrics crawl

The URL of the lyrics

Although we can see the lyrics in black and white on the page, we can't get the lyrics information under the URL. Small series by grasping the package, found the lyrics of the URL, found that it is a POST request and a lot of data can not understand, in short, this URL is not for our effectiveness. What about the solution?

Mo Panic, small series found NetEase cloud Music API, as long as the song's ID on the API link can get to the lyrics, the code is as follows:

Based on Python netease cloud music lyrics crawl

Call NetEase Cloud API and parse lyrics

In the API, the lyrics information is loaded in JSON format, so you need to use JSON to parse it out, and with regular expressions to clean the lyrics, if you do not use regular expression to clean, get the original data as shown below (here with Lei's song "Chengdu" for example):

Based on Python netease cloud music lyrics crawl

Raw data

It is obvious that the lyrics are preceded by the time of the lyrics, and for us it is the impurity information, so we need to use regular expressions to match. Admittedly, regular expressions are not the only way, and small partners can also take slices or other methods for data cleansing, and not to repeat them here.

After you get the lyrics, write it to a file and deposit it into a local file with the following code:

Based on Python netease cloud music lyrics crawl

Write file and program body parts

Now as soon as we run the program and enter the singer's ID, the program will automatically capture the lyrics of the singer's songs and coexist locally. If the ID of lei in this example is 6731, after entering the number 6731, the lyrics of Lei will be crawled, as shown in:

Based on Python netease cloud music lyrics crawl

Program Run Results

Then we can find the generated lyrics text in the same directory as the script, and the lyrics are crawled down smoothly.

I believe that you have a certain understanding of NetEase cloud lyrics climbed, but easier said than down, small series suggest you do a personal knock code, in practice you will learn faster, learn more.

This article teaches you how to collect NetEase cloud lyrics, that NetEase Cloud song how to collect it? And listen to small tell ~ ~ ~

How to use Python web crawler to crawl the lyrics of NetEase cloud music

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.