[Deep Learning Study Notes] Recommending Music on Spotify with Deep Learning

Main content: Spotify is a music streaming service (similar to services like Kuwo Music) that provides personalized music recommendations and music consumption. The author combines deep learning with collaborative filtering for music recommendation.
Details:
1. Collaborative Filtering
Basic principle: if two users listen to similar sets of songs, their tastes are likely similar; if two songs are listened to by the same group of users, the two songs likely have similar styles. Disadvantages: (1) it fails to take advantage of the features (content information) of the song itself; (2) it cannot handle the "hierarchical" structure of items; for a song, this hierarchy shows up in the fact that the contributing factors are not equally important; (3) the cold-start problem: collaborative filtering relies on user behavior, so what do we do when a song or user has no interaction history yet?
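The item-item half of this principle can be sketched as follows: build a binary play matrix and compare song columns with cosine similarity. The data and function names here are illustrative, not from the article.

```python
import numpy as np

# Toy binary play matrix: rows = users, columns = songs.
# plays[u, s] = 1 if user u listened to song s. (Illustrative data.)
plays = np.array([
    [1, 1, 0, 0],   # users 0 and 1 listened to songs 0 and 1
    [1, 1, 0, 0],
    [0, 0, 1, 1],   # users 2 and 3 share a different taste
    [0, 0, 1, 1],
])

def item_similarity(plays):
    """Cosine similarity between the song columns of the play matrix."""
    norms = np.linalg.norm(plays, axis=0, keepdims=True)
    normalized = plays / np.clip(norms, 1e-12, None)
    return normalized.T @ normalized

sim = item_similarity(plays)
# Songs 0 and 1 are played by exactly the same users, so their
# similarity is 1; songs 0 and 2 share no listeners, so it is 0.
print(sim[0, 1])  # 1.0
print(sim[0, 2])  # 0.0
```

Note that nothing here looks at the songs' content, which is exactly the first disadvantage listed above.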
2. Content-based recommendation
Spotify tries to use the relationships (similarity) between pieces of content for recommendation. Content-based recommendation: in addition to user behavior information, a song's content information can also be used to compute similarity between items.
What the author wants to try here is computing the similarity between songs from the audio signals of the songs themselves. Deep learning is good at processing raw signals such as audio and images, so the author uses it to process a song's audio information and compute similarities between songs. (Personal note: a user's listening history can itself be treated as a "song"; its similarity to other songs can then be computed, which effectively measures the similarity between a user and a song and can serve as the basis for a recommendation to that user.)
3. Predicting listening preference with deep learning
A deep neural network maps the audio information of a song, along with other metadata (album, artist, etc.), into a low-dimensional latent space; the resulting low-dimensional vector represents the song.
The structure of the network is as follows (figure not reproduced here):

The leftmost layer is the audio input, followed by a convolutional network, followed by several ordinary fully connected layers. The author does not explain why this particular architecture was chosen. The last layer has 40 nodes, and the output is produced by something called vector_exp rather than softmax; judging by the name, the two are probably similar, perhaps just a different non-linear output function.
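The shape of this pipeline (convolution over a spectrogram, pooling over time, dense layers, a 40-dimensional output) can be sketched as a bare numpy forward pass. All layer sizes, the filter width, and the mel-band count are hypothetical; the article does not specify them, and real training would of course use a deep-learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def conv1d(x, w, b):
    """Valid 1-D convolution over time.
    x: (channels_in, time); w: (channels_out, channels_in, k); b: (channels_out,)."""
    c_out, c_in, k = w.shape
    t_out = x.shape[1] - k + 1
    out = np.empty((c_out, t_out))
    for t in range(t_out):
        out[:, t] = np.tensordot(w, x[:, t:t + k], axes=([1, 2], [0, 1])) + b
    return out

def forward(spectrogram, params):
    """Conv layer -> mean pool over time -> dense layer -> 40-dim song vector."""
    w1, b1, w2, b2, w3, b3 = params
    h = relu(conv1d(spectrogram, w1, b1))
    h = h.mean(axis=1)            # pool over the time axis
    h = relu(w2 @ h + b2)         # fully connected hidden layer
    return w3 @ h + b3            # 40-dimensional latent representation

# Hypothetical sizes: 64 mel bands, 100 time frames, 32 filters of width 4.
params = (rng.normal(0, 0.1, (32, 64, 4)), np.zeros(32),
          rng.normal(0, 0.1, (128, 32)), np.zeros(128),
          rng.normal(0, 0.1, (40, 128)), np.zeros(40))
song_vec = forward(rng.normal(size=(64, 100)), params)
print(song_vec.shape)  # (40,)
```

The final layer here is plain linear for simplicity, standing in for the vector_exp output mentioned above, whose exact form the article does not define.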
One issue with neural networks is that their input size is fixed, while song audio varies in length (both in duration and file size). How does one map variable-length raw input onto a fixed-length network input? This is the same problem encountered when applying neural networks in NLP. The author's approach is rough and simple: cut the audio into three-second clips and average over the clips to obtain a single vector for the whole song. The same trick handles a user who has listened to many songs: averaging over their history yields the user's vector.
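The clip-and-average step can be sketched as below. The function names are made up for illustration, and the stand-in `embed` is a placeholder for the trained network; one common variant (assumed here) averages the per-clip embeddings rather than the raw inputs.

```python
import numpy as np

def song_vector(audio_frames, embed_clip, clip_len):
    """Split a song into fixed-length clips, embed each, and average.
    audio_frames: (n_frames, n_features); a trailing partial clip is dropped."""
    n_clips = len(audio_frames) // clip_len
    clips = [audio_frames[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]
    return np.mean([embed_clip(c) for c in clips], axis=0)

# Stand-in embedding: the mean feature vector of the clip. A real system
# would run the trained network from the previous section here.
embed = lambda clip: clip.mean(axis=0)

audio = np.arange(12.0).reshape(6, 2)   # 6 frames, 2 features
vec = song_vector(audio, embed, clip_len=3)
print(vec)  # average of the two clip embeddings: [5. 6.]
```

A user vector falls out of the same function: feed it the song vectors of the listening history instead of audio clips.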
4. Training
Training uses the MSE criterion. Another question arises: deep learning here is supervised, so what is the target output of the network? That is, for a given song, what is the 40-dimensional ground-truth vector? In my understanding, the article uses the latent vector of the song produced by a standard collaborative filtering model with dimensionality reduction (the author mentions LDA). This raises a follow-up question: if the matrix factorization result serves as the target, why bother training a deep network at all? On the one hand, it solves the cold-start problem (predicting vectors for new songs); on the other hand, it learns a mapping from audio information to the matrix factorization space, so that, as the author mentioned at the beginning, the audio information itself can be exploited.
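The training objective can be made concrete with a minimal sketch: regress audio features onto collaborative-filtering latent factors under MSE. The data is synthetic and the model is linear purely for brevity; the article's actual model is the deep network described above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: audio features X for 200 songs, and 40-dim latent
# factors Y for the same songs produced by the collaborative filtering /
# matrix factorization step (random stand-ins here).
X = rng.normal(size=(200, 16))
true_W = rng.normal(size=(16, 40))
Y = X @ true_W                    # pretend these came from the CF model

# Fit a regressor to the CF targets with MSE and gradient descent.
W = np.zeros((16, 40))
lr = 0.05
for _ in range(500):
    pred = X @ W
    grad = 2 * X.T @ (pred - Y) / len(X)   # d(MSE)/dW
    W -= lr * grad

mse = np.mean((X @ W - Y) ** 2)
print(mse < 1e-3)  # True: the regressor has fit the CF targets
```

Once fitted, the model generalizes the CF latent space to songs with no listening history, which is exactly the cold-start benefit noted above.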
Once the network is trained, the latent representation (the 40-dimensional output) of a new song can be computed from its audio using the learned parameters. For a user, the same kind of 40-dimensional vector can be computed from their listening history. The angle between the two vectors then indicates whether the user is likely to be interested in the song.
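The final scoring step is just the cosine of the angle between the user vector and a candidate song vector. The vectors below are 3-dimensional stand-ins (the real ones are 40-dimensional), chosen only to illustrate the ranking.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between a user vector and a song vector."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical vectors; the user vector is the average of the vectors
# of the songs in the user's listening history.
history = np.array([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]])
user_vec = history.mean(axis=0)
candidate_a = np.array([0.9, 0.1, 0.0])   # similar to the history
candidate_b = np.array([0.0, 0.0, 1.0])   # orthogonal: unlikely match
print(cosine(user_vec, candidate_a) > cosine(user_vec, candidate_b))  # True
```

Ranking candidates by this score gives a recommendation list for the user.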
5. Other attempts
The author also made other attempts, including training separate 40-dimensional representations with different collaborative filtering models and then combining these vectors, and trying deeper networks with more layers.

6. What is it learning?
The author also analyzes what the network actually learns, illustrating the results with example audio clips.
Please credit the source when reprinting: http://blog.csdn.net/xceman1997/article/details/38475083
