1. Introduction to SVD
Suppose we want to predict user Zero's rating for movie M, while all we have are Zero's ratings on some other movies and user Feng's ratings on a number of movies (including a rating for M). Can we predict Zero's rating for M? The answer is clearly yes. The simplest method is to use the average of the existing ratings as the prediction, but its accuracy is hard to guarantee. This article introduces an algorithm that is much more accurate than this naive method, yet not complex.
The idea of SVD (Singular Value Decomposition) is this: based on the existing ratings, analyze each rater's preference for each factor and the degree to which each movie contains each factor, then predict ratings from the results of that analysis. "Factors" of a movie can be understood as, for example, how funny the movie is, how romantic it is, how scary it is, and so on. Stated more abstractly, SVD decomposes the rating matrix R with n rows and m columns (R[u][i] is user u's rating of item i) into a user-factor matrix P with n rows and f columns (P[u][k] is user u's preference for factor k) and an item-factor matrix Q with m rows and f columns (Q[i][k] is the degree to which item i contains factor k). As a formula:
R = P * T(Q)    // T(Q) denotes the transpose of matrix Q
The following is an example of decomposing the rating matrix R into a user-factor matrix P and an item-factor matrix Q. The larger an element of R, the more that user likes that movie. The larger an element of P, the more the user likes the corresponding factor. The larger an element of Q, the more strongly the movie contains the corresponding factor. After the decomposition, we can use P and Q to predict Zero's rating for "Seven Nights". According to this example, Zero should give "Seven Nights" a low score, because he does not like horror movies. (Do not read too much into the specific values in the figure; they were entered at random.)
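The decomposition R ≈ P * T(Q) can be illustrated with a tiny NumPy sketch. The matrices, factor meanings, and values below are made-up illustrations (just like the figure's), not data from the article:

```python
import numpy as np

# Two factors: column 0 = "comedy degree", column 1 = "horror degree".
P = np.array([[0.9, 0.1],   # user Zero: likes comedy, dislikes horror
              [0.2, 0.8]])  # another user with the opposite taste
Q = np.array([[0.8, 0.2],   # a comedy movie
              [0.1, 0.9]])  # "Seven Nights", a horror movie

R_hat = P @ Q.T  # predicted rating matrix: R ≈ P * T(Q)
# Zero's row gives a much lower value for the horror movie than the comedy.
```

Each predicted rating is just the dot product of a user's factor preferences with a movie's factor degrees, which is why Zero's predicted score for "Seven Nights" comes out low.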
In fact, when we rate a movie, besides considering whether the movie matches our taste, we are also influenced by whether we are strict or lenient raters and by the movie's existing ratings. For example, a strict rater generally gives lower scores than a lenient one, and when you see that most people gave a movie high scores, you may also tend to score it higher. In SVD, taste is expressed by the factors, but these two taste-unrelated influences are left out. Therefore, corresponding terms need to be added to improve the accuracy of the model. The improved SVD formula is as follows:
R = overallMean + biasU + biasI + P * T(Q)    (1)
overallMean denotes the average of all known ratings, biasU denotes the deviation of each user's ratings from overallMean, biasI denotes the deviation of each movie's ratings from overallMean, and P and Q are unchanged. Note in particular that, except for overallMean, all the parts are matrices.
After the decomposition, that is, once the five parameters in formula (1) have taken correct values, we can use them to predict ratings. Suppose we want to predict user u's rating for movie i:

r(u, i) = overallMean + bu + bi + pu * T(qi)
Here bu denotes user u's rating deviation, bi denotes movie i's rating deviation, pu denotes user u's degree of preference for each factor, and qi denotes the degree to which movie i contains each factor.
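The biased prediction rule can be sketched in a few lines of Python. The numeric values below are illustrative assumptions, not data from the article:

```python
import numpy as np

def predict(overall_mean, bu, bi, pu, qi):
    """Predicted rating of user u for item i under the biased model:
    r_hat = overallMean + bu + bi + dot(pu, qi)."""
    return overall_mean + bu + bi + float(np.dot(pu, qi))

# A strict user (bu < 0) rating a well-liked horror movie (bi > 0):
r_hat = predict(3.6, -0.3, 0.2, np.array([0.9, 0.1]), np.array([0.1, 0.9]))
```

The bias terms shift the baseline up or down before the taste term dot(pu, qi) is added, which is exactly how the model separates "strict rater" and "popular movie" effects from taste.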
2. SVD implementation
Looking at the example in the first part, you may have a question: some elements of the rating matrix are clearly empty, so why can we still obtain two complete matrices P and Q? The reason is that the two matrices are obtained through learning. SVD uses stochastic gradient descent to learn all the parameters in formula (1) except overallMean. The learning process can be summarized as follows: first give each parameter an initial value, use these parameters to make predictions, compare the predictions with the known ratings, and then adjust each parameter according to the comparison. More precisely, adjust the parameter values so that the following expression reaches its minimum:
min Σ (r(u,i) − overallMean − bu − bi − pu * T(qi))² + λ(bu² + bi² + |pu|² + |qi|²), summed over all (u, i) in the training set κ

Here κ denotes the set of all training samples. The expression in the first pair of parentheses is the deviation of the current prediction from the actual rating; the terms multiplied by λ form a regularization term used to prevent overfitting.
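A single stochastic-gradient update derived from this objective might look like the following sketch. The learning rate `lr` and regularization coefficient `reg` values are illustrative assumptions, not values fixed by the article:

```python
import numpy as np

def sgd_step(r, overall_mean, bu, bi, pu, qi, lr=0.005, reg=0.02):
    """One stochastic-gradient update for a single known rating r.
    Moves each parameter against its gradient of the regularized
    squared error."""
    err = r - (overall_mean + bu + bi + np.dot(pu, qi))
    bu += lr * (err - reg * bu)
    bi += lr * (err - reg * bi)
    qi_old = qi.copy()                 # keep qi's old value for the pu update
    qi += lr * (err * pu - reg * qi)
    pu += lr * (err * qi_old - reg * pu)
    return bu, bi, pu, qi
```

Iterating this step over all known ratings (for several epochs) drives the training error down, which is how the "empty" cells of R end up with learned predictions.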
The above is the main idea behind implementing SVD. For the concrete implementation, refer to my code. On the MovieLens 1M dataset, it performs better than the implementation described in "A Guide to Singular Value Decomposition for Collaborative Filtering". Here I mainly note the points to watch out for when implementing SVD:
A. When updating qi, save its old value first; the old value is still needed in the update of pu.
B. Clamp the predicted score so that it stays within the minimum and maximum valid rating values.
In addition, here are some useful suggestions I have found:
A. Use the same regularization coefficient for all parameters, so there is no need to distinguish bu, bi, p, and q.
B. bu and bi do not need special initialization; set them all to 0.
C. p and q should be initialized randomly; 0.1 * rand(0, 1) / sqrt(dim) is commonly used, where dim denotes the number of factor dimensions.
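Suggestions B and C, together with the clamping point from the implementation notes, can be sketched as follows, assuming NumPy; the helper names are hypothetical:

```python
import math
import numpy as np

def init_params(n_users, n_items, dim):
    """Initialize parameters per the tips: biases start at 0,
    factors at 0.1 * rand(0, 1) / sqrt(dim)."""
    bu = np.zeros(n_users)
    bi = np.zeros(n_items)
    p = 0.1 * np.random.rand(n_users, dim) / math.sqrt(dim)
    q = 0.1 * np.random.rand(n_items, dim) / math.sqrt(dim)
    return bu, bi, p, q

def clamp(score, lo=1.0, hi=5.0):
    """Keep a predicted score inside the valid rating range
    (1..5 is assumed here; use your dataset's actual bounds)."""
    return min(hi, max(lo, score))
```

Scaling the random factors by 1/sqrt(dim) keeps the initial dot products dot(pu, qi) roughly the same size regardless of how many factors are used.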
3. Extended reading
Although the following articles are in English, they explain SVD very well and are strongly recommended to anyone interested in the topic.
1. Netflix Update: Try This at Home
2. A Guide to Singular Value Decomposition for Collaborative Filtering
3. Matrix Factorization Techniques for Recommender Systems