How does NetEase Cloud Music's single-song recommendation algorithm work?

It's the "people who like this item also like XXX" algorithm that Amazon invented.
At its core is the mathematical "cosine of the angle between two vectors in a multidimensional space"; when I first learned it, I was genuinely amazed by this algorithm.
=============2014-12-01 Update =============================
Sorry, I got something wrong earlier; let me correct and expand.
"Product recommendation" (collaborative filtering) algorithms fall into two broad categories:
First, user-based: first find people similar to you, then see what they bought that you haven't. The most classical implementation of this kind is the "cosine of the angle between two vectors in multidimensional space" formula.
Second, item-based: directly build a similarity matrix between items. The most classical algorithm of this kind is Slope One. Amazon invented a brute-force simplification of the second kind: "people who bought this item also bought XXX".
Let's look at the first category. The biggest problem is how to judge and quantify the similarity between two people. The idea goes like this:
Example:
There are three songs: "The Most Dazzling Folk Style", "Sunny", and "Hero".
User A favorited "The Most Dazzling Folk Style" and always skips "Sunny" and "Hero";
User B often plays "The Most Dazzling Folk Style" on single-track repeat, lets "Sunny" play through, and has blacklisted "Hero";
User C blacklisted "The Most Dazzling Folk Style" and favorited both "Sunny" and "Hero".
We can all see that A and B have similar tastes, while C is very different from both.
So here's the question: if we say A and B are similar, exactly how similar? How do we quantify it?
Treat the three songs as the three dimensions of a three-dimensional space: "The Most Dazzling Folk Style" is the x-axis, "Sunny" the y-axis, "Hero" the z-axis, and the degree of liking for each song is the coordinate on that dimension.
Quantify the degree of liking (for example: single-track repeat = 5, share = 4, favorite = 3, actively play = 2, listen through = 1, skip = -1, blacklist = -5).
Then each person's overall taste is a vector: User A is (3, -1, -1), User B is (5, 1, -5), User C is (-5, 3, 3). (Sorry, I can't draw a 3-D picture here.)
We can use the cosine of the angle between two vectors to express their similarity: the cosine of a 0° angle (the two point the same way) is 1, and the cosine of a 180° angle (the two point in opposite directions) is -1.
By the cosine formula, the cosine of the angle = dot product / (product of the vector lengths) = (x1·x2 + y1·y2 + z1·z2) / (√(x1² + y1² + z1²) × √(x2² + y2² + z2²))
Plug in the numbers: the cosine between A and B comes out to about 0.80, and between A and C about -0.97. The formula really doesn't lie.
That was the three-dimensional (three-song) case; the n-dimensional (n-song) case works exactly the same way.
Suppose we pick 100 seed songs and compute the similarity between users. Then when we find that User A also loves "Little Apple", which User B has never heard, I think we all know what to recommend to B.
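As a concrete sketch of the quantification above, here is the three-song example in plain Python. The score scale and the three user vectors are taken straight from the text; the function and variable names are my own:

```python
import math

# Taste vectors over the three songs, using the scale from the text:
# repeat=5, share=4, favorite=3, play=2, listen=1, skip=-1, blacklist=-5
user_a = [3, -1, -1]   # favorited song 1, skips songs 2 and 3
user_b = [5, 1, -5]    # repeats song 1, lets song 2 play, blacklisted song 3
user_c = [-5, 3, 3]    # blacklisted song 1, favorited songs 2 and 3

def cosine_similarity(u, v):
    """Cosine of the angle between two taste vectors: dot(u,v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(f"{cosine_similarity(user_a, user_b):.2f}")  # 0.80 -> similar tastes
print(f"{cosine_similarity(user_a, user_c):.2f}")  # -0.97 -> opposite tastes
```

The same function works unchanged for 100-dimensional seed-song vectors; only the list lengths grow.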
The advantage of the first, user-based category is clear, I think: precision!
The cost is a huge amount of computation, and it handles new users poorly (few listens, few actions).
So people invented the second kind of algorithm.
Suppose a new User D arrives, and all we know is that she likes "The Most Dazzling Folk Style". The question: what should we recommend to her?
Recommend "Sunny", of course!
And now the advantages of the second category are also apparent: simple, crude, easy to operate (and well suited to MapReduce), at the cost of somewhat lower precision.
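A minimal sketch of the "people who liked this also liked XXX" idea, assuming a toy co-occurrence count over each user's set of liked songs. The data, the song names, and the helper are all made up for illustration:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical listening history: user -> set of liked songs
likes = {
    "A": {"Folk Style"},
    "B": {"Folk Style", "Sunny"},
    "E": {"Folk Style", "Sunny", "Hero"},
    "C": {"Sunny", "Hero"},
}

# Item co-occurrence matrix: how often two songs are liked together
co_counts = defaultdict(int)
for songs in likes.values():
    for s1, s2 in combinations(sorted(songs), 2):
        co_counts[(s1, s2)] += 1
        co_counts[(s2, s1)] += 1

def also_liked(song):
    """Songs most often liked together with `song`, best first."""
    pairs = [(other, n) for (s, other), n in co_counts.items() if s == song]
    return [other for other, n in sorted(pairs, key=lambda p: -p[1])]

# New user D only likes "Folk Style" -> recommend its top co-occurring song
print(also_liked("Folk Style"))  # ['Sunny', 'Hero']
```

Note that nothing here needs a per-user taste vector, which is why this works for brand-new users.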
So the algorithms real recommendation sites actually run are built on top of these two categories, with each site's own extensions and endless tuning; the details are unknown to outsiders! ^_^
============= 2014-12-03 Addendum =============
Thanks to @Liu Yanbin for a very professional comment; it would be a pity not to post it:
"This can only be called a theoretical basis. It doesn't account for how popular or obscure songs are, nor for the computational complexity implied by the numbers of users and songs; a full day of offline computation might not even finish (of course, if NetEase Cloud Music's user base is small enough, brute-force computation might work, but in practice it's far more complicated). In today's recommender systems no single algorithm takes all; besides the algorithm itself you also have to consider the underlying data, for example how many songs two playlists have in common, and how good the playlists themselves are."
One last point from me:
the "cosine of the vector angle" solves the problem of "quantifying the similarity of customers' tastes" (it's the most classical solution; there are others),
but having it doesn't mean the first category of algorithm is easy to implement; the hard parts come after.
I don't work in CF / algorithms / data mining / the internet; I just read an article about this a few years ago and was amazed by it.
Seeing this question, I dashed off a quick answer, and then a few friends in the comments who brought their own stools pushed it up ^_^
Since everyone is so interested, let me toss out a brick to attract jade and speculate a little about what lies beyond the theoretical foundation.
Continuing with the first category of algorithm, the goal is the "Daily Song Recommendation" feature (that's really the interesting one; the "yyy playlists recommended because you like XXX" next to it isn't much, I think).
The first difficulty is how to choose the dimensions.
Using individual "songs" as dimensions won't do: first, there are far too many to compute over; second, a dimensionality that keeps soaring is a problem in itself.
What about "playlists", "albums", or "artists/performers"? Similar difficulties.
By this point we should all have thought of it: "tags"!
In Cloud Music's early days anyone could fill in tags; I remember entering tags like "Mozart", "piano concerto", and "symphony". Those are all gone now.
After a while, tags could no longer be entered freely and had to be picked from Cloud Music's own tag library. There must be a reason for this.
My guess is that they need to use tags as dimensions, so they don't want the number of tags to keep changing.
In the first stage, they collected user input to build the tag library;
in the second stage, they built the multidimensional space and didn't want the dimensions to shift again, so they turned off free-form tag entry.
Assuming tags are the dimensions, the second difficulty is that the "scale" on each dimension must allow both positive and negative values to be useful,
but users have no way to directly express likes and dislikes for a tag (you can't favorite, play, or skip a tag). So how is the scale set?
I think every song carries its own tags as a hidden attribute; this attribute probably isn't shown in the UI because it would invite endless bickering.
A song usually belongs to many playlists, and those playlists have tags. A playlist's play count, favorite count, and share count let you judge its "authority",
and by taking the tags of the high-"authority" playlists, you can derive each song's tag attributes.
Then when a user expresses likes or dislikes for a song, he is, without realizing it, moving his own position along the corresponding tag dimensions.
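The playlist-authority guess above could be sketched roughly like this. Everything here is speculation: the data layout, the authority weighting, and the update rule are all my assumptions, not NetEase's actual method:

```python
from collections import defaultdict

# Hypothetical data: playlists carry tags, songs, and engagement counts.
playlists = [
    {"tags": {"folk", "pop"}, "plays": 9000, "favorites": 800, "shares": 200, "songs": {"song1"}},
    {"tags": {"folk"},        "plays": 100,  "favorites": 5,   "shares": 1,   "songs": {"song1", "song2"}},
    {"tags": {"rock"},        "plays": 5000, "favorites": 400, "shares": 100, "songs": {"song2"}},
]

def authority(pl):
    # Made-up weighting of plays/favorites/shares; the real formula is unknown.
    return pl["plays"] + 10 * pl["favorites"] + 20 * pl["shares"]

def song_tags(song):
    """Derive a song's tag attributes from the playlists containing it,
    weighted by playlist authority and normalized to sum to 1."""
    weights = defaultdict(float)
    for pl in playlists:
        if song in pl["songs"]:
            for tag in pl["tags"]:
                weights[tag] += authority(pl)
    total = sum(weights.values()) or 1.0
    return {tag: w / total for tag, w in weights.items()}

def apply_action(taste, song, score):
    """A user action on a song (e.g. favorite = +3, skip = -1) nudges the
    user's taste vector along that song's tag dimensions."""
    for tag, weight in song_tags(song).items():
        taste[tag] = taste.get(tag, 0.0) + score * weight

taste = {}
apply_action(taste, "song1", 3)   # user favorited song1 -> folk/pop go up
apply_action(taste, "song2", -1)  # user skipped song2 -> rock goes down
print(taste)
```

The point of the sketch is only that song-level actions can be translated into positive and negative tag-dimension scores without the user ever touching a tag directly.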
Assuming dimensions and scale are resolved this way, we can now build a "taste vector" for each user. The next difficulty:
when do we compute, and how do we store, the "user similarity"?
Computing the similarity between every pair of users and storing it as an N×N matrix is not something you can actually play with.
In fact, at this point we could drop "user-based" entirely and just recommend popular or fast-rising songs drawn from the tags I like, and that would already be decent.
But done that way it homogenizes easily, and it's hard to give users that "amazed" feeling.
Let's continue along the lines of the first category of algorithm.
One great benefit of a multidimensional space is the notion of "orthants" (the high-dimensional analogue of quadrants):
for example, we can roughly assume that people in the same orthant as me are "similar" to me,
and if too many dimensions or too few early users mean no one shares my orthant exactly, we can relax to "adjacent" orthants.
OK, suppose that using tags and orthants we've found a group of "kindred spirits".
Within this group, pick a few whose "cosine of the angle with me" is largest (better still, factor in personal reputation such as star status and follower count, plus how much they interact with me).
From the songs they've listened to and I haven't, pick a batch by what they like, what they've recently played or favorited, or overall popularity.
This can then fairly be called a "Daily Song Recommendation" "generated from my taste".
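The orthant pre-filter plus cosine ranking described above might look like this (toy taste vectors over three tag dimensions; the user names and values are invented):

```python
import math

# Hypothetical taste vectors over tag dimensions (folk, rock, classical).
users = {
    "me": ( 3.0, -1.0, 2.0),
    "u1": ( 5.0, -2.0, 1.0),   # same orthant as "me"
    "u2": ( 1.0,  4.0, 0.5),   # different orthant
    "u3": ( 2.0, -0.5, 4.0),   # same orthant as "me"
}

def orthant(v):
    """Sign pattern of a vector: which 'corner' of the space it points into."""
    return tuple(x >= 0 for x in v)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

me = users["me"]
# Cheap pre-filter: only consider users in my orthant...
candidates = [name for name, v in users.items()
              if name != "me" and orthant(v) == orthant(me)]
# ...then rank that small set by exact cosine similarity.
neighbours = sorted(candidates, key=lambda n: cosine(users[n], me), reverse=True)
print(neighbours)  # ['u1', 'u3']
```

The attraction of the pre-filter is that it avoids the pairwise N×N computation: the expensive cosine is only evaluated inside one (or a few adjacent) orthants.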
Here I'd like to introduce another recommendation approach: the latent factor algorithm. It was first used for movie recommendation at Netflix (yes, the company that used big data to make House of Cards a hit). In practical applications its error (RMSE) is much smaller than that of the algorithm introduced by @Tai Yuanlang, currently the top-ranked answer, and it is more efficient. Below I'll introduce it using only basic matrix knowledge.
The idea is this: every user has their own preferences. Say user A likes elements (latent factors) such as "fresh indie", "guitar accompaniment", and "Faye Wong"; if a song (item) carries these elements, we recommend it to that user. In other words, we use elements to connect users and music. Each person has different preferences over the elements, and each song contains the elements in different proportions. We want to find two matrices:
First, the user–latent-factor matrix Q, giving each user's degree of preference for each element (1 = likes it a lot, 0 = doesn't like it). (The original answer showed this as a table image.)
Second, the latent-factor–music matrix P, giving each piece of music's composition in terms of the elements. For example, music A is a fresh-indie piece: its "fresh indie" component is 0.9, its "heavy flavor" component 0.1, its "elegant" component 0.2, and so on.
Using these two matrices, Zhang San's predicted liking for music A is: (Zhang San's preference for fresh indie × music A's fresh-indie component) + (preference for heavy flavor × heavy-flavor component) + (preference for elegance × elegant component) + ...
That is: 0.6×0.9 + 0.8×0.1 + 0.1×0.2 + 0.1×0.4 + 0.7×0 = 0.68
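That prediction is just a dot product. In code (the values come from the worked arithmetic above; the last two factors are unnamed in the surviving text, so I label them generically):

```python
# Predicted score = dot product of a user's factor preferences with a
# song's factor composition.
factors = ["fresh indie", "heavy flavor", "elegant", "factor4", "factor5"]

zhang_san = [0.6, 0.8, 0.1, 0.1, 0.7]   # user's preference per latent factor
music_a   = [0.9, 0.1, 0.2, 0.4, 0.0]   # song's composition per latent factor

def predict(user, song):
    return sum(p * c for p, c in zip(user, song))

print(f"{predict(zhang_san, music_a):.2f}")  # 0.68
```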
Doing this computation for every user and every song yields a matrix of predicted scores for all user–song pairs. (Note: a tilde over R marks an estimated score; R without the tilde denotes the actual score.)
So we recommend to Zhang San the song he scores highest among the four, B; to Li Si, C; and to Wang Wu, B. In matrix form this is just the product of Q and P. (The original answer showed the resulting matrix as an image.)
Now the question: where do these latent factors come from?
With a huge volume of data it is obviously unrealistic to have users classify music and tell us their preferences directly; all we can actually get is user behavior data.
Adopting @Tai Yuanlang's quantification (single-track repeat = 5, share = 4, favorite = 3, actively play = 2, listen through = 1, skip = -2, blacklist = -5), we can distill the actual score matrix R from the behavior data. This is the input matrix. (Shown as an image in the original answer.)
In reality this is an extremely sparse matrix, because most users have heard only a tiny fraction of all music. How do we use it to find the latent factors? The main tool is the UV decomposition of a matrix: decompose the score matrix above into two low-dimensional matrices, and use the product of Q and P to approximate the actual score matrix. We want the estimated score matrix to differ as little as possible from the actual one over the observed entries, i.e. to solve an objective of the form:

minimize over Q, P:  Σ over observed (u, i) of ( R[u][i] − Q[u]·P[i] )²

This is a matter of optimization theory; in practice one typically adds a 2-norm (L2) penalty term and then applies gradient descent to obtain estimates of the two matrices P and Q. I won't expand on that here. The example above decomposes into two such matrices, whose product gives the estimated score matrix (both shown as images in the original answer).
After culling the music the user has already listened to, recommend the highest-scoring music to the user (marked in red in the original image).
In this example, user 7 and user 8 are strongly similar, and the recommendation results reflect it: each is recommended music the other scored highly. (Shown as images in the original answer.)
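For completeness, here is a toy sketch of the factorization itself: stochastic gradient descent on the squared error over the observed entries, with the 2-norm penalty mentioned above. The matrix, factor count, and hyperparameters are all invented for illustration; real systems work on vastly larger, sparser data:

```python
import random

# Tiny hypothetical user x song score matrix (None = unobserved).
R = [
    [5,    3,    None, 1   ],
    [4,    None, None, 1   ],
    [1,    1,    None, 5   ],
    [None, 1,    5,    4   ],
]

K = 2          # number of latent factors
LR = 0.01      # gradient-descent learning rate
LAMBDA = 0.02  # 2-norm (L2) penalty weight
STEPS = 5000

random.seed(42)
n_users, n_songs = len(R), len(R[0])
Q = [[random.random() for _ in range(K)] for _ in range(n_users)]  # user-factor
P = [[random.random() for _ in range(K)] for _ in range(n_songs)]  # song-factor

def predict(u, i):
    return sum(Q[u][k] * P[i][k] for k in range(K))

# Stochastic gradient descent on the regularized squared error,
# visiting only the observed entries of R.
for _ in range(STEPS):
    for u in range(n_users):
        for i in range(n_songs):
            if R[u][i] is None:
                continue
            err = R[u][i] - predict(u, i)
            for k in range(K):
                qk, pk = Q[u][k], P[i][k]
                Q[u][k] += LR * (err * pk - LAMBDA * qk)
                P[i][k] += LR * (err * qk - LAMBDA * pk)

# The reconstructed matrix also fills in the unobserved cells;
# recommend each user their highest-predicted unheard song.
for u in range(n_users):
    unheard = [i for i in range(n_songs) if R[u][i] is None]
    best = max(unheard, key=lambda i: predict(u, i))
    print(f"user {u}: recommend song {best} (predicted {predict(u, best):.2f})")
```

Note that the loop updates both factor matrices from each observed rating, and the LAMBDA terms keep the factors from overfitting the sparse data.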