Here I would like to introduce another recommendation system called the latent factor (latent Factor) algorithm. The algorithm is the winning algorithm in the recommended algorithm competition for Netflix (yes, the company that uses big data to hold the "House of Cards"), which was first applied to film recommendations. This algorithm is more efficient than the algorithm error (RMSE) introduced in the actual application compared to the first @ Tai. I only use the underlying matrix knowledge below to introduce this algorithm.
The idea of this algorithm is this: each userhas their own preferences, such as a like with a small fresh , guitar accompaniment , Faye Wong and other elements (latent factor If a song (item) comes with these elements, the song is recommended to the user, which is to connect users and music with elements. Each person has different preferences for different elements, and each song contains elements that are not the same. We want to be able to find two of these matrices:
I. user-potential factor matrix Q,
Indicates that different users have no preference for elements, 1 delegates like it, and 0 means they don't like it. such as the following:
Two. latent factor-music matrix P
Indicates that each kind of music contains various elements of the composition, such as the following table, music A is a small fresh music, containing a small freshness of this latent factor ingredient is 0.9, the ingredients of the heavy taste is 0.1, the elegant ingredient is 0.2 ...
Using these two matrices, we can conclude that Zhang San's liking for music A is: Zhang San's preference for small freshness * Music A contains a small fresh ingredient + a preference for heavy tastes * Music A contains ingredients with heavy flavors + Elegant Preference * Music A contains elegant ingredients + ...
namely: 0.6*0.9+0.8*0.1+0.1*0.2+0.1*0.4+0.7*0=0.69
Each user's calculation of each song will give different users a matrix of scores for different songs. (Note that the wave line here represents the estimated score, and then we will use R without the wavy line to indicate the actual score):
So our team Zhang San recommended the highest score in the four songs of B, John Doe recommended the highest score of C, Harry recommended B.
If the matrix is represented as:
Here's the question, how does this potential factor (latent factor) get it?
It is obviously unrealistic to have a huge amount of data that allows users to classify music and tell us their preferences, in fact, we can get only the user behavior data. We use the "@ Tai" the Quantitative standard: Single loop = 5, share = 4, collection = 3, active play =2, listen to = 1, skip =-2, Pull Black =-5, in the analysis can get the actual score matrix R, that is, the input matrix is probably this way:
and the actual scoring matrix don't differ Too much, that is, to solve the following objective function:
This involves the optimization theory, in practical applications, it is often to add 2-norm penalty, and then use the gradient descent method can be obtained from the p,q two matrix estimates. We're not going to start talking here. For example, the example we gave above can be decomposed into such two matrices:
The two matrices multiply to get the estimated score matrix:
After the user has heard the music culling, select the highest score music recommended to the user (red text).
from the recommended results, it is recommended that the other high-scoring music:
LFM of recommended systems