Original: http://www.cnblogs.com/chenqingyang/p/3782649.html
This was a paper for an elective course. I'm posting it here because I think it came out genuinely well: there is no systematic introduction to this topic in Chinese online, and I had to read a lot of papers and do a lot of hands-on work myself before I understood it.
This is only an introduction; when I have time I will add the specific models and algorithms.
-----------------------------------------------------------------------------
Broadly speaking, the goals of a recommender system fall into two kinds: rating prediction and item recommendation. The former is studied more, because it lends itself better to building sophisticated models, so rating prediction is all we discuss here. Mainstream recommendation methods today divide into collaborative filtering and content-based recommendation; the former is our focus. Content-based recommendation means: based on what you have bought or which movies you have watched, I recommend you the same or similar things. Such a model is obviously too simple, and recommendations this naive certainly won't give the best results, which is why we need collaborative filtering.
Collaborative filtering splits into two families of models: one is KNN, the nearest-neighbor model; the other is matrix factorization, the decomposition model. KNN comes in two basic flavors. One is user-based: to predict user u's rating of movie i, find the K users most similar to u among those who have also rated movie i, compute each one's similarity to u, and take a similarity-weighted average of their ratings of i as u's predicted rating. There are many ways to measure similarity between users; the most common are the cosine similarity and the Pearson correlation coefficient, i.e., treat each user's ratings as a vector and compute the cosine of the angle, or the Pearson coefficient, between two such vectors. The Pearson coefficient subtracts each user's mean first, so in theory it is closer to the actual situation and more reliable. That is user-based recommendation; item-based recommendation works on the same principle: to predict user u's rating of item i, find K other items rated by u that are similar to i, and form the weighted average in the same way.
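To make the mechanics concrete, here is a minimal sketch of user-based KNN prediction. The dense user-by-item matrix R with 0 meaning "unrated", and the function names, are my own illustrative assumptions, not taken from any particular paper.

```python
import numpy as np

def pearson_sim(a, b):
    """Pearson correlation over the items both users have rated."""
    mask = (a > 0) & (b > 0)
    if mask.sum() < 2:
        return 0.0
    x, y = a[mask] - a[mask].mean(), b[mask] - b[mask].mean()
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0

def predict_user_knn(R, u, i, k=5):
    """Predict user u's rating of item i from the k raters of i most similar to u."""
    raters = [v for v in range(R.shape[0]) if v != u and R[v, i] > 0]
    top = sorted(((pearson_sim(R[u], R[v]), v) for v in raters), reverse=True)[:k]
    num = sum(s * R[v, i] for s, v in top)
    den = sum(abs(s) for s, v in top)
    return num / den if den > 0 else R[R > 0].mean()  # fall back to global mean
```

The item-based variant is symmetric: compare column vectors of R instead of rows.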
Generally, the user-based and item-based KNN models should be considered together. There is research on exactly this; see the paper [Tag-aware recommender systems by fusion of collaborative filtering algorithms].
The approach there is to combine the two results through weighting coefficients, then use a machine learning algorithm to learn appropriate values for the two coefficients.
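As a hedged sketch of that combination step (not the paper's exact procedure): assuming we already have the two sets of predictions on a held-out set of true ratings, a single blend weight can even be fit in closed form by least squares. All names here are illustrative.

```python
import numpy as np

def learn_alpha(user_preds, item_preds, truth):
    """Fit truth ~= alpha * user_preds + (1 - alpha) * item_preds by least squares."""
    d = user_preds - item_preds
    if not np.any(d):
        return 0.5                      # the two predictors agree; any weight works
    alpha = (d @ (truth - item_preds)) / (d @ d)
    return float(np.clip(alpha, 0.0, 1.0))
```

The paper learns separate coefficients; a single normalized weight is simply the smallest instance of the same idea.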
The KNN approach is simple and easy to come up with, so naturally a more sophisticated approach is needed to improve the system. The idea of matrix factorization comes from SVD, the singular value decomposition. From linear algebra we know that the important structure of an ordinary matrix can be captured through eigenvalue decomposition, singular value decomposition, and so on, where eigenvalue decomposition is limited: it only applies to square matrices. But computing singular values is hard; it is an O(n^3) algorithm. On a single machine, MATLAB can of course compute all the singular values of a 1000 x 1000 matrix in about a second, but as the matrix grows the cost scales cubically and parallel computation has to get involved. When Wu Jun discussed SVD in his "Beauty of Mathematics" series, he said Google had implemented a parallel SVD algorithm and called it a contribution to humanity, but he did not give the specific scale of the computation.
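At small scale the decomposition itself is a one-liner. The following NumPy snippet (toy data, purely illustrative) shows a truncated SVD that keeps only the largest singular values, which is the notion of "importance" referred to above.

```python
import numpy as np

R = np.array([[5., 4., 0.],
              [4., 5., 1.],
              [1., 0., 5.]])
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2                                        # keep the 2 largest singular values
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-2 approximation of R
print(np.round(R_k, 2))
```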
SVD can in fact be parallelized. For large matrices one generally uses iterative methods, and when the matrix is very large (say, hundreds of millions of rows), the number of iterations can also run into the billions. If you solve this on a MapReduce framework, every MapReduce round involves writing files and reading them back. My personal guess is that besides MapReduce, Google's computing infrastructure also has something like the MPI model, where nodes stay in communication and data stays resident in memory; for heavily iterative problems that model is many times faster than MapReduce. Lanczos iteration is a method for computing some of the eigenvalues of a symmetric matrix; it works by reducing a symmetric matrix to tridiagonal form. According to some literature online, this should be the method Google used for its singular value decomposition. For the specific application of SVD to matrix factorization, see the paper [A Guide to Singular Value Decomposition].
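For a sense of what Lanczos-style partial eigensolving looks like in practice, SciPy's `eigsh` wraps ARPACK's implicitly restarted Lanczos method for symmetric matrices. This small sketch (random data, illustrative only) asks for just the five largest eigenvalues rather than all of them.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(0)
A = rng.random((1000, 1000))
S = (A + A.T) / 2                        # symmetrize
vals, vecs = eigsh(S, k=5, which='LA')   # 5 largest algebraic eigenvalues
print(vals)
```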
In actual matrix factorization, the singular values are not computed directly; the factors are generally learned by stochastic gradient descent. Now consider what the factorization means, although honestly the meaning is somewhat ineffable. If there are u users and i items, the rating matrix is a u x i matrix. Suppose it factors into two matrices of shapes u x f and i x f; that is, each user and each item decomposes into a vector of length f. For example, say the users are three: John, Dick, and Wang, and the movies are also three: Saving Private Ryan, The Great Gatsby, and The Hobbit. The decomposed dimensions might carry meanings like: World War II, Spielberg, petty-bourgeois sensibility, love triangle, fantasy, the Middle Ages, so f is 6. The u x f matrix then expresses how keen each user is on these labels, and the i x f matrix expresses how strongly each movie carries them. If John's row in the u x f matrix is (0.9, 0.5, 0.2, 0.2, 0.4, 0.3), we can tell John likes war movies but isn't very fond of petty-bourgeois romances. If a new movie arrives whose vector is (0.4, 0.3, 0.5, 0.6, 0.7, 0.5), then John's rating of the new movie can be predicted as the inner product of the two vectors.
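Here is a minimal sketch of that learning step, assuming ratings arrive as (user, item, rating) triples; the hyperparameters are illustrative, not tuned.

```python
import numpy as np

def factorize(ratings, n_users, n_items, f=6, lr=0.01, reg=0.05, epochs=50):
    """Learn P (n_users x f) and Q (n_items x f) so that P[u] @ Q[i] ~= r."""
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, f))
    Q = rng.normal(scale=0.1, size=(n_items, f))
    for _ in range(epochs):
        for u, i, r in ratings:
            pu = P[u].copy()
            err = r - pu @ Q[i]                   # error on this single rating
            P[u] += lr * (err * Q[i] - reg * pu)  # SGD step with L2 regularization
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q
```

Prediction for a new (u, i) pair is then just `P[u] @ Q[i]`, the inner product described above.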
In many papers the information carried by the decomposed vectors is called the latent factors. The KNN model described earlier does not account for latent factors at all, and that is the most important difference between the KNN and matrix factorization models.
In addition, a real recommender system also accounts for the effect of prejudice, also called the baseline offset. Some users are very harsh and tend to rate every movie 1 point lower; some movies have a bad reputation, like "Tiny Times", and are easily rated low on purpose. So we build two bias vectors, one offset per user and one per movie, which removes the effect of these prejudices. For example, suppose the average rating over all movies is 3.5 (out of 5), "Tiny Times" tends to be rated 2 points low, and the user Xiaohua rates carelessly, always 1.5 points too high; then our baseline prediction of Xiaohua's rating for "Tiny Times" is 3.5 - 2 + 1.5 = 3.0 points. In the actual algorithm, the two offset vectors are likewise learned by stochastic gradient descent.
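A sketch of fitting those offsets by SGD, in the same style as above (hyperparameters again illustrative); the baseline predictor is mu + b_u + b_i.

```python
import numpy as np

def fit_biases(ratings, n_users, n_items, lr=0.01, reg=0.02, epochs=30):
    """Learn a global mean plus one offset per user and one per item."""
    mu = np.mean([r for _, _, r in ratings])   # global average rating
    b_u = np.zeros(n_users)
    b_i = np.zeros(n_items)
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mu + b_u[u] + b_i[i])
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
    return mu, b_u, b_i
```

Plugging in the worked example: mu = 3.5, b_i = -2.0, b_u = +1.5 gives 3.5 - 2.0 + 1.5 = 3.0.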
With that, we have three collaborative filtering ingredients: matrix factorization, user nearest neighbors, and baseline offsets. There is more than one model, and all kinds of combinations are used to produce recommendations. Systems that combine the first two generally work very well, and there is also research combining all three methods; see [When SVD++ meets neighbor]. (PS: I could not find the author of this paper.)
Emerging models:
Emerging models are still based on matrix factorization at heart, but they fold in more information. There isn't much fuss left to make over the underlying model, so the only way forward is to use more information: the user's occupation, the tags users have applied to themselves, the tags on the movies, and so on (one survey reportedly found that 80% of users did not mind exposing such sensitive information). So people have put tags into the system; see the paper [Tag-aware recommender systems by fusion of collaborative filtering algorithms].
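One simple way tags can enter a model, purely as an illustrative sketch and not necessarily the cited paper's method: represent both users and items as tag-count vectors and blend a content (tag) similarity with the collaborative filtering score.

```python
import numpy as np

def tag_similarity(user_tags, item_tags):
    """Cosine similarity between a user's and an item's tag-count profiles."""
    denom = np.linalg.norm(user_tags) * np.linalg.norm(item_tags)
    return float(user_tags @ item_tags / denom) if denom > 0 else 0.0

def blended_score(cf_score, user_tags, item_tags, beta=0.7):
    # beta trades off the CF prediction against the tag signal (illustrative value)
    return beta * cf_score + (1 - beta) * tag_similarity(user_tags, item_tags)
```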
There is also work that brings in more about the items themselves: not only recommending items to users, but also drawing on the user's profile in passing, such as actors the user may be interested in, and so on.
The models for item recommendation differ somewhat from those for rating prediction. Many use Bayesian Personalized Ranking (BPR) or improved variants of it; I won't go into detail here. See the paper [BPR: Bayesian Personalized Ranking from Implicit Feedback].
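For reference, the core update from that paper in the same sketch style as above: sample a user u, an item i the user interacted with, and an unobserved item j, then push up the score difference x_uij = P[u] . (Q[i] - Q[j]) through a sigmoid. The sampling scheme and hyperparameters here are illustrative.

```python
import numpy as np

def bpr_epoch(triples, P, Q, lr=0.05, reg=0.01):
    """triples: iterable of (u, i, j), with i observed and j unobserved for user u."""
    for u, i, j in triples:
        pu = P[u].copy()
        x_uij = pu @ (Q[i] - Q[j])
        g = 1.0 / (1.0 + np.exp(x_uij))              # sigma(-x_uij), the gradient scale
        P[u] += lr * (g * (Q[i] - Q[j]) - reg * pu)  # ascend ln sigma(x_uij) - reg
        Q[i] += lr * (g * pu - reg * Q[i])
        Q[j] += lr * (-g * pu - reg * Q[j])
```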