Recommendation System-matrix decomposition of hidden factors and hidden factor matrix decomposition
When new users come into contact with the recommendation system field, the first thing that is difficult to understand is the collaborative filtering method. In this case, Baidu will obtain the most Singular Value Decomposition Method (SVD ). SVD is used to divide a matrix into three matrices and multiply them. If it is used in the recommendation system, we first represent our training set as a matrix. Here we take the movielen dataset as an example. This dataset contains the user's rating on movies. The matrix format is roughly as follows:
| |
Movie1 |
Movie2 |
Movie3 |
Moive4 |
| User1 |
1 |
|
|
|
| User2 |
2 |
|
|
3 |
| User3 |
|
5 |
4 |
|
| User4 |
2 |
|
|
4 |
1 ~ 5 is the rating of the corresponding user on the movie. In the free space, the dataset does not contain information about users and movies. If we want to use SVD, we usually enter 0 in the free space. If this matrix is V, we can use SVD to obtain
V=UΣVT
However, the equal sign above cannot be obtained and can only be approximately equal to or equal. Then we multiply the three matrices to get V' (note that it is different from V ). Then the original blank space (that is, 0) may no longer be 0, so this is the prediction of this user-moive pair. This is the main principle of SVD. Because SVD has many existing algorithms that can be directly obtained without iteration, it is easy to use.
However, we can see that the above method has a fatal defect, that is, to set all unknown scores to 0. this is actually extremely unreasonable, because users do not like a movie (0 points), but may not have watched it. In this way, we added our subjective hypothetical information and finally caused errors. The solution is to use the matrix factorization method of the hidden factor. Note that there are also differences between the matrix decomposition method and the SVD method. I will introduce the matrix decomposition method in detail below.
In the matrix decomposition method, there is a assumption that every user has a feature vector with a length of k, each movie also has a feature vector of the same length (k usually needs to be specified by the user ). Then all user feature vectors are arranged into a matrix U with the dimension UserNum * k. The vector corresponding to user I is Ui. The feature vectors of all movies are arranged into a matrix. The dimension of M is MoiveNum * k. The vector corresponding to movie j is Uj. In this case, user I scores movie j by Vij = <Ui, Mj> (<> representing dot multiplication ). Then the scores between all users and all movies can be obtained by multiplying the two matrices:
V'=UMT
Note that this is V, not V. So the question is, how can we determine the U and M? A natural idea is to make V' and V equal as much as possible. There is another problem, that is, V (that is, the dataset) is not rated in many places. How can we determine whether it is equal to V? Here, we only calculate the MSE with a score. In this way, no additional information is used, and the two are close to each other. Naturally, we have to introduce our lost function:
HereIIj indicates that user I has a score record for movie j. The following two items are punitive factors to prevent overfitting. Then, by using the gradient descent method, we can obtain the U and M values through iteration. Its Requirements for U and M are as follows:
Now we have completed the basic matrix decomposition method. Further, to achieve better results, we need to consider the impact of each individual score, because some users have a high score and some users have a low score. The same for movies and all ratings. Therefore, the formula for calculating the score should be changed:
Vij = <Ui, Mj> + overall_mean + ai + bj
Overall_mean indicates the average of all ratings, ai indicates the average of user I ratings, and bj indicates the average of movie j scores. Overall_mean is a constant, and both ai and bj are parameters to be optimized. Here we will not give a formula for their derivation. I will directly give an algorithm in the matrix form for your specific implementation. (INCOMING)
I wrote a python version myself. If you are interested, please refer to the https://github.com/ccienfall/RecommandSystem/blob/master/script/Factorize.py.
In the 2016 Byte Cup International Machine Learning competition, SVD and matrix decomposition (MF) are used respectively. The final result is that the MF method is about 20% better than the SVD method.
References: A Guide to Singular Value Decomp osition for Collaborative Filtering