Collaborative filtering-Music recommendations

Source: Internet
Author: User

I. Collaborative filtering algorithm

User-based collaborative filtering: the central problem of this algorithm is how to judge and quantify the similarity between two people. The idea is as follows.
Example:
There are three songs: "The Most Dazzling National Wind", "Sunny", and "Hero".
User A has favorited "The Most Dazzling National Wind", and always skips "Sunny" and "Hero" when they come up;
User B often plays "The Most Dazzling National Wind" on single-track repeat, plays "Sunny" when it comes up, and has blacklisted "Hero";
User C has blacklisted "The Most Dazzling National Wind", and has favorited both "Sunny" and "Hero".
We can all see that A and B have similar tastes, while C is very different from both.

So the question is: we say A and B are similar, but exactly how similar, and how do we quantify it?
We treat the three songs as the three dimensions of a three-dimensional space: "The Most Dazzling National Wind" is the x-axis, "Sunny" is the y-axis, and "Hero" is the z-axis. The degree of liking for each song is the coordinate along that dimension, and the degree of liking is quantified (for example: single-track repeat = 5, download = 4, favorite = 3, active play = 2, listen = 1, skip = -1, blacklist = -5).
Then each person's overall taste is a vector: A is (3, -1, -1), B is (5, 1, -5), and C is (-5, 3, 3).

We can use the cosine of the angle between two vectors to express their similarity. The cosine of a 0-degree angle (the two are exactly the same) is 1, and the cosine of a 180-degree angle (the two are exact opposites) is -1.
According to the cosine formula, the cosine of the angle = dot product of the vectors / (product of the vector lengths):

$$\cos\theta = \frac{x_1 x_2 + y_1 y_2 + z_1 z_2}{\sqrt{x_1^2 + y_1^2 + z_1^2}\,\sqrt{x_2^2 + y_2^2 + z_2^2}}$$

Computing this, the cosine of the angle between A and B is about 0.80, and between A and C about -0.97. The formula does not lie.

The above is the three-dimensional (three-song) case; the N-dimensional case with N songs works the same way.
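As a quick check of the arithmetic above, here is a minimal Python sketch of cosine similarity for vectors of any length. The function name `cosine_similarity` and the choice to return 0 for an all-zero vector are assumptions made for illustration, not part of the original description.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length taste vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: a user with no recorded taste is treated as unrelated.
        return 0.0
    return dot / (norm_u * norm_v)

# Taste vectors of the three users from the example.
user_a = (3, -1, -1)
user_b = (5, 1, -5)
user_c = (-5, 3, 3)

sim_ab = cosine_similarity(user_a, user_b)
sim_ac = cosine_similarity(user_a, user_c)
print(round(sim_ab, 2), round(sim_ac, 2))  # prints 0.8 -0.97
```

Because the function only zips over the two vectors, the same code covers the N-song case without changes.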

The key to this algorithm is finding people whose tastes are similar to mine. In practice, computing the preferences of all n users for all N songs is too expensive. Since we have already computed the similarities in the previous step, we can take only the K users whose similarity exceeds 0.9 and, for each song, sum similarity × preference. This gives a recommendation score for each song, which is the basic principle of the collaborative filtering algorithm.
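The neighbor selection and weighted sum described above might look like the following sketch. The user names and preference vectors are hypothetical data invented for illustration, and `recommend_scores` is an assumed helper name; only the 0.9 threshold and the "similarity × preference" sum come from the text.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend_scores(target, others, threshold=0.9, k=10):
    """Score every song for `target` using its most similar neighbors.

    target: list of preference scores, one per song (0 = no behavior yet)
    others: dict mapping user name -> list of preference scores
    """
    sims = {name: cosine_similarity(target, vec) for name, vec in others.items()}
    # Keep at most k neighbors whose similarity exceeds the threshold.
    neighbors = sorted(
        (name for name, s in sims.items() if s > threshold),
        key=lambda name: sims[name],
        reverse=True,
    )[:k]
    scores = [0.0] * len(target)
    for name in neighbors:
        for song, preference in enumerate(others[name]):
            scores[song] += sims[name] * preference
    return scores, neighbors

# Hypothetical data: four songs, the target user has only touched the first two.
me = [5, 3, 0, 0]
others = {
    "u1": [5, 3, 2, 0],   # similar taste
    "u2": [-5, 1, 0, 4],  # opposite taste, filtered out by the threshold
    "u3": [4, 3, 0, 1],   # similar taste
}
scores, neighbors = recommend_scores(me, others)
print(neighbors)  # prints ['u3', 'u1']
```

With this data the third song ends up with a higher score than the fourth, because the neighbor who played it did so with a stronger preference.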

Let us now consider which algorithm NetEase Cloud Music uses. In fact, Xiami Music is the more likely of the two to use this collaborative filtering algorithm, because Xiami has a "similar taste" feature that recommends friends with the same tastes as you (although, judging from the hints on the page, the matching is more likely based on the singers you follow, and it is not clear whether play history is used as well). Isn't this similar to the way we calculated user similarity above? NetEase Cloud Music is more likely to use the algorithm we discuss below.

Item-based collaborative filtering is more commonly used in shopping. Amazon invented it: "Customers who bought this item also bought XXXX." In shopping, a user ends up buying relatively few products, so this algorithm is simple to apply and its accuracy is high. In a music app, however, a user listens to a large number of songs, so with this algorithm the computation is heavy and accuracy is hard to guarantee.
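One simple way to picture "bought this, also bought" is a co-occurrence count over purchase baskets. The sketch below is only an illustration of that idea under assumed data: the baskets and item names are hypothetical, and real systems typically normalize the counts rather than using them raw.

```python
from collections import defaultdict
from itertools import combinations

def co_purchase_counts(baskets):
    """Count how often each pair of items appears in the same basket."""
    counts = defaultdict(lambda: defaultdict(int))
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def also_bought(item, counts, top_n=3):
    """Items most often bought together with `item` (ties broken by name)."""
    related = counts.get(item, {})
    return sorted(related, key=lambda other: (-related[other], other))[:top_n]

# Hypothetical purchase history.
baskets = [
    ["guitar", "strings", "tuner"],
    ["guitar", "strings"],
    ["guitar", "capo"],
    ["strings", "tuner"],
]
counts = co_purchase_counts(baskets)
print(also_bought("guitar", counts, top_n=2))  # prints ['strings', 'capo']
```

The pairwise loop also hints at why the text calls this costly for music: every pair of songs a user has played would have to be counted.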

II. Latent factor algorithm

The idea of this algorithm is as follows: each user has their own preferences. For example, user A likes elements such as a fresh, light style, guitar accompaniment, and Faye Wong (these are latent factors). If a song (item) has these elements, the song is recommended to that user; that is, the elements are used to connect users and music. Each person has different preferences for different elements, and each song contains a different mix of elements. We want to find two matrices:
First, the user-latent factor matrix Q, which represents each user's degree of preference for each element, where 1 means like and 0 means dislike. For example:

Second, the latent factor-music matrix P, which represents how much of each element each piece of music contains. For example, in the table below, music A is a fresh, light piece: its content of the "fresh" latent factor is 0.9, of the "heavy" factor 0.1, of the "elegant" factor 0.2 ...

Using these two matrices, we can compute Zhang San's liking for music A as: (Zhang San's preference for fresh) × (music A's fresh content) + (preference for heavy) × (music A's heavy content) + (preference for elegant) × (music A's elegant content) + ...


That is: 0.6*0.9 + 0.8*0.1 + 0.1*0.2 + 0.1*0.4 + 0.7*0 = 0.68

Computing this for every user and every song gives a matrix of scores from different users for different songs. (Note that the tilde here denotes the estimated score; later we will use R without the tilde to denote the actual score):

So from the four songs we recommend B, the highest-scoring one, to Zhang San; C, the highest-scoring one, to Li Si; and B to Wang Wu.
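The per-user, per-song calculation can be sketched as a plain matrix product. In the sketch below, Zhang San's preferences and music A's factor contents are the numbers from the worked sum above; the second user and songs B and C are hypothetical values added only so that the product has more than one entry.

```python
def estimate_scores(Q, P):
    """Multiply Q (users x factors) by P (factors x songs)."""
    n_factors, n_songs = len(P), len(P[0])
    return [
        [sum(row[f] * P[f][s] for f in range(n_factors)) for s in range(n_songs)]
        for row in Q
    ]

songs = ["A", "B", "C"]
users = ["Zhang San", "hypothetical user"]

# Rows: users. Columns: five latent factors.
Q = [
    [0.6, 0.8, 0.1, 0.1, 0.7],  # Zhang San (from the text)
    [0.1, 0.1, 0.9, 0.8, 0.2],  # made-up user
]
# Rows: latent factors. Columns: songs A, B, C.
# Column A is from the text; columns B and C are made up.
P = [
    [0.9, 0.1, 0.2],
    [0.1, 0.9, 0.1],
    [0.2, 0.1, 0.8],
    [0.4, 0.2, 0.7],
    [0.0, 0.8, 0.1],
]

R_est = estimate_scores(Q, P)
best = {
    user: songs[max(range(len(songs)), key=lambda s: row[s])]
    for user, row in zip(users, R_est)
}
print(round(R_est[0][0], 2))  # prints 0.68
print(best)  # prints {'Zhang San': 'B', 'hypothetical user': 'C'}
```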

Expressed in matrix form, the estimated score matrix is the product of the two matrices:

$$\tilde{R} = QP$$

So here is the question: how do we obtain these latent factors?

Obviously it is unrealistic to have users classify a huge amount of music and tell us their preferences; in practice, all we can get is user behavior data. We use the following quantification: single-track repeat = 5, share = 4, favorite = 3, active play = 2, listen = 1, skip = -2, blacklist = -5. From this analysis we obtain the actual score matrix R, so the input matrix looks roughly like this:
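As a rough illustration of how such a matrix might be assembled from a behavior log, consider the sketch below. The score table is the one given above; the event tuples, user and song identifiers, and the rule that a later event overwrites an earlier one for the same user and song are all assumptions made for the example.

```python
# Quantification from the text.
BEHAVIOR_SCORES = {
    "single_loop": 5,
    "share": 4,
    "favorite": 3,
    "active_play": 2,
    "listen": 1,
    "skip": -2,
    "blacklist": -5,
}

def build_score_matrix(events, users, songs):
    """Turn (user, song, behavior) events into a users x songs matrix.

    Cells with no recorded behavior stay None, which is what makes
    the matrix sparse.
    """
    user_index = {u: i for i, u in enumerate(users)}
    song_index = {s: j for j, s in enumerate(songs)}
    R = [[None] * len(songs) for _ in users]
    for user, song, behavior in events:
        # Assumption: the most recent behavior wins.
        R[user_index[user]][song_index[song]] = BEHAVIOR_SCORES[behavior]
    return R

# Hypothetical behavior log.
events = [
    ("u1", "s1", "single_loop"),
    ("u1", "s3", "skip"),
    ("u2", "s2", "favorite"),
    ("u2", "s2", "share"),
    ("u3", "s1", "blacklist"),
]
R = build_score_matrix(events, ["u1", "u2", "u3"], ["s1", "s2", "s3"])
print(R)  # prints [[5, None, -2], [None, 4, None], [-5, None, None]]
```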

In fact this is a very, very sparse matrix, because most users have listened to only a few songs out of the whole catalog. How can we use this matrix to find the latent factors? The main technique here is the UV decomposition of a matrix: the score matrix above is decomposed into two low-dimensional matrices, the product of Q and P is used to estimate the actual score matrix, and we want the estimated score matrix to be as close as possible to the actual one.
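One common way to carry out such a decomposition is stochastic gradient descent on the squared error over the observed entries, with a small regularization term. The text does not say which optimization method is used, so the sketch below is only one possibility; the score matrix, the number of factors, the learning rate, and the regularization weight are all hypothetical choices.

```python
import random

def predict(Q, P, u, s):
    return sum(Q[u][f] * P[f][s] for f in range(len(P)))

def factorize(R, n_factors=2, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Fit Q (users x factors) and P (factors x songs) to the known cells of R."""
    rng = random.Random(seed)
    n_users, n_songs = len(R), len(R[0])
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    P = [[rng.uniform(-0.1, 0.1) for _ in range(n_songs)] for _ in range(n_factors)]
    observed = [
        (u, s, R[u][s])
        for u in range(n_users)
        for s in range(n_songs)
        if R[u][s] is not None
    ]
    for _ in range(epochs):
        for u, s, r in observed:
            err = r - predict(Q, P, u, s)
            for f in range(n_factors):
                q, p = Q[u][f], P[f][s]
                # Gradient step on squared error plus L2 regularization.
                Q[u][f] += lr * (err * p - reg * q)
                P[f][s] += lr * (err * q - reg * p)
    return Q, P

def rmse(R, Q, P):
    """Root mean squared error over the observed cells only."""
    errors = [
        R[u][s] - predict(Q, P, u, s)
        for u in range(len(R))
        for s in range(len(R[0]))
        if R[u][s] is not None
    ]
    return (sum(e * e for e in errors) / len(errors)) ** 0.5

# Hypothetical sparse score matrix (None = no behavior recorded).
R = [
    [5, 3, None, -2],
    [4, None, 1, -5],
    [None, 1, 5, 4],
    [-5, None, 4, 3],
]
Q0, P0 = factorize(R, epochs=0)  # untrained starting point, same seed
Q, P = factorize(R)
print(rmse(R, Q0, P0), rmse(R, Q, P))  # error before and after training
```

Only the observed cells contribute to the error, so the missing cells are left free and can later be filled in by the product of Q and P.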

For example, the matrix we gave above can be decomposed into the following two matrices:

Multiplying the two matrices gives the estimated score matrix:

After removing the music the user has already listened to, we simply recommend the highest-scoring remaining music to the user (shown in red).
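The filtering step can be sketched as follows: for each user, keep only the songs with no recorded behavior and pick the ones with the highest estimated score. The two matrices and the song identifiers below are hypothetical, and `recommend_unheard` is an assumed helper name.

```python
def recommend_unheard(estimated, actual, songs, top_n=1):
    """For each user, return the top_n unheard songs by estimated score.

    estimated: users x songs matrix of estimated scores
    actual:    users x songs matrix of actual scores, None where unheard
    """
    recommendations = []
    for est_row, act_row in zip(estimated, actual):
        unheard = [j for j, score in enumerate(act_row) if score is None]
        unheard.sort(key=lambda j: est_row[j], reverse=True)
        recommendations.append([songs[j] for j in unheard[:top_n]])
    return recommendations

# Hypothetical matrices for two users and four songs.
songs = ["s1", "s2", "s3", "s4"]
estimated = [
    [4.8, 2.1, 3.9, -1.0],
    [-2.0, 3.5, 1.2, 4.1],
]
actual = [
    [5, None, None, -2],
    [None, 4, None, None],
]
print(recommend_unheard(estimated, actual, songs))  # prints [['s3'], ['s4']]
```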

In this example, user 7 and user 8 are strongly similar:

Looking at the recommendation results, each of them is recommended exactly the music that the other scored highly:

This should be the basic algorithm that NetEase Cloud Music uses. Looking at NetEase Cloud Music across its version iterations: at first, users chose the tags they liked on the initial page; now the new-user onboarding screen uses a quiz to test your basic music tastes. As can be seen, this is the latent factor algorithm establishing a user's baseline latent factors, which are then reinforced by the user's actions during subsequent use.

Once the basic recommendation algorithm is settled and users can get accurate recommended tracks, what should we do next? Most users of a recommendation feature do not need to hear the same type of song over and over. They do not want the recommendations to be all popular songs they have already heard, nor all songs of the same type, so a dedicated screening step is needed. In music recommendation, if you recommend only the most popular songs, then even if the user really likes them, there will be no surprise. For example, if a user usually listens to Jay Chou and the recommended list is all Jay Chou songs, these really are the user's favorites, but it is hard to feel surprised. For a music app to delight people, the recommended songs, besides being liked, should mostly be ones the user has not heard, or heard long ago and has forgotten the name of; only then is the user experience good enough.

QQ Music has also recently launched its own personalized recommendation feature. This is its recommendation algorithm engine:

But user feedback on it has not been very good. I think the main reasons are:
1. It is hard to get the right songs from recommendations based on singers, playlists, and the like. Singers and playlists are complex: a user may come to like a singer because of one song without liking all of that singer's songs. NetEase Cloud Music has this feature too, but it does not appear in the personalized recommendation module; instead it is placed in the "Dynamic" feature module, where users make further choices themselves.

2. The further screening of recommendation results is lacking. For example, because I like Japanese songs it recommends the "Green Onion" song. That is reasonable enough, but it makes it hard for the recommendation feature to achieve a delightful effect.
