In the previous blog post, I summarized several major recommendation approaches. Among them, content-based filtering and collaborative filtering are the current mainstream algorithms, and many e-commerce recommendation systems are built on these two. Content-based recommendation was covered in detail in the first post, so this post mainly introduces personalized recommendation systems based on collaborative filtering.
Collaborative filtering makes recommendations from groups of users or items that share similar interests: it generates a recommendation list for the target user from the preference information of neighbor users (users whose interests are similar to the target user's). Collaborative filtering algorithms fall into two main families: user-based collaborative filtering and item-based collaborative filtering.
The user-based collaborative filtering algorithm generates recommendations for the target user from the preference information of neighbor users. It rests on the assumption that if some users give similar ratings to one category of items, they will also give similar ratings to other items. A collaborative filtering recommendation system uses statistical methods to find users similar to the target user, predicts the target user's rating of a given item from the ratings of those similar users, and finally uses the ratings of the few most similar users to produce the recommendation result, which is fed back to the user. This algorithm is simple and accurate, and it is widely adopted by existing collaborative filtering recommendation systems. The core of the user-based algorithm is to compute the nearest-neighbor set with a similarity measure and to return a prediction derived from the nearest neighbors' ratings as the recommendation. For example, in the user-item rating matrix shown in the following table, rows represent users, columns represent items (movies), and each value is a user's rating of an item. The task is to predict user Tom's rating of King of Guns (user Lucy's rating of Avatar is missing).
It is not hard to see that Mary and Tom rate the movies very similarly: Mary's ratings for Twilight 3: Eclipse, Aftershock, and Avatar are 3, 4, 4, and Tom's are 3, 5, 4, respectively. They have the highest similarity, so Mary is Tom's nearest neighbor, and Mary's rating of King of Guns carries the largest weight in the predicted value. By contrast, users John and Lucy are not Tom's nearest neighbors, because their ratings differ widely from his, so John's and Lucy's ratings of King of Guns have a relatively small impact on the predicted value. In real-world prediction, the recommendation system searches only for the first few nearest neighbors and predicts the rating of the given item from the ratings of those neighbors. From this example, the main work of the user-based collaborative filtering recommendation algorithm consists of three steps: measuring user similarity, finding the nearest neighbors, and predicting the rating.
At present, there are three main methods for measuring the similarity between users: cosine similarity, adjusted cosine similarity, and correlation similarity.
① Cosine similarity (cosine): Each user's row in the user-item rating matrix can be treated as a vector in n-dimensional item space, with the rating set to 0 for items the user has not rated. The cosine similarity measure expresses the similarity between two users as the cosine of the angle between their vectors. Let the vectors i and j denote the ratings of user i and user j in the n-dimensional item space; the similarity between user i and user j is then the cosine of the angle between these two vectors.
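The formula itself does not appear in the text above. Assuming the standard definition of cosine similarity, it would read as follows, where the vector notation is the one introduced in the paragraph:

```latex
\[
\mathrm{sim}(i,j) = \cos(\vec{i},\vec{j})
  = \frac{\vec{i}\cdot\vec{j}}{\lVert\vec{i}\rVert \, \lVert\vec{j}\rVert}
\]
```

As a quick check with numbers quoted earlier, Mary's ratings (3, 4, 4) and Tom's ratings (3, 5, 4) give 45 / (sqrt(41) * sqrt(50)), which is approximately 0.994.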
② Adjusted cosine similarity (adjusted cosine): Cosine similarity does not account for differences in users' rating scales. For example, on the rating interval [1, 5], user A may treat any rating above 3 as meaning they like an item, whereas for user B only a rating above 4 means that. The adjusted cosine measure addresses this problem by subtracting each user's average rating from that user's item ratings. The similarity between user i and user j is computed with these mean-centered ratings, where I_ij denotes the set of items rated by both user i and user j, and I_i and I_j denote the sets of items rated by user i and user j, respectively.
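The formula is missing from the text above. A reconstruction consistent with the symbol definitions in the paragraph is sketched below; the symbols R_{i,c} (user i's rating of item c) and \bar{R}_i (user i's average rating) are notation assumed here, not taken from the original.

```latex
\[
\mathrm{sim}(i,j) =
  \frac{\sum_{c \in I_{ij}} (R_{i,c} - \bar{R}_i)(R_{j,c} - \bar{R}_j)}
       {\sqrt{\sum_{c \in I_i} (R_{i,c} - \bar{R}_i)^2}\;
        \sqrt{\sum_{c \in I_j} (R_{j,c} - \bar{R}_j)^2}}
\]
```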
③ Correlation similarity (correlation): This method measures similarity with the Pearson correlation coefficient. Let I_ij denote the set of items that user i and user j have both rated; the similarity between user i and user j is the Pearson correlation of their ratings over I_ij.
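The formula is missing from the text above. Assuming the usual Pearson form restricted to the co-rated set I_ij, it would be as follows; R_{i,c} (user i's rating of item c) and \bar{R}_i (user i's average rating) are notation assumed here. Some formulations take the averages over the co-rated items only.

```latex
\[
\mathrm{sim}(i,j) =
  \frac{\sum_{c \in I_{ij}} (R_{i,c} - \bar{R}_i)(R_{j,c} - \bar{R}_j)}
       {\sqrt{\sum_{c \in I_{ij}} (R_{i,c} - \bar{R}_i)^2}\;
        \sqrt{\sum_{c \in I_{ij}} (R_{j,c} - \bar{R}_j)^2}}
\]
```

The only difference from the adjusted cosine measure is that both sums in the denominator run over the co-rated items I_ij rather than over each user's full set of rated items.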
Once the nearest neighbors of the target user have been found, the recommendation result can be generated. Let NN_u denote the nearest-neighbor set of user u; the predicted rating P_{u,j} of user u for item j is computed from the neighbors' ratings of item j, weighted by each neighbor's similarity to u.
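The prediction formula is missing from the text above. A commonly used form, assumed here, is P_{u,j} = mean(u) + sum over n in NN_u of sim(u,n) * (R_{n,j} - mean(n)), divided by the sum of |sim(u,n)|. The sketch below implements that form with Pearson similarity. The rating data, the user and item ids, and the function names are all hypothetical and are not the table from this post.

```python
from math import sqrt

# Hypothetical ratings (NOT the table from the post): user -> {item: rating}
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4, "d": 4},           # target user, item "e" unrated
    "u2": {"a": 3, "b": 1, "c": 2, "d": 3, "e": 3},
    "u3": {"a": 4, "b": 3, "c": 4, "d": 3, "e": 5},
    "u4": {"a": 3, "b": 3, "c": 1, "d": 5, "e": 4},
    "u5": {"a": 1, "b": 5, "c": 5, "d": 2, "e": 1},
}

def mean(ratings, user):
    """Average of all ratings given by one user."""
    vals = ratings[user].values()
    return sum(vals) / len(vals)

def pearson(ratings, u, v):
    """Pearson-style similarity over the items both users have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    mu, mv = mean(ratings, u), mean(ratings, v)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    du = sqrt(sum((ratings[u][i] - mu) ** 2 for i in common))
    dv = sqrt(sum((ratings[v][i] - mv) ** 2 for i in common))
    if du == 0 or dv == 0:
        return 0.0
    return num / (du * dv)

def predict(ratings, u, item, k=2):
    """Predict u's rating of item from the k most similar users who rated it."""
    candidates = [(pearson(ratings, u, v), v)
                  for v in ratings if v != u and item in ratings[v]]
    neighbours = sorted(candidates, reverse=True)[:k]
    norm = sum(abs(s) for s, _ in neighbours)
    if norm == 0:
        return mean(ratings, u)  # fall back to the user's own average
    weighted = sum(s * (ratings[v][item] - mean(ratings, v)) for s, v in neighbours)
    return mean(ratings, u) + weighted / norm

prediction = predict(ratings, "u1", "e", k=2)
print(round(prediction, 2))
```

With this toy data the two nearest neighbors of u1 are u2 and u3, and the prediction lands above u1's own average because both neighbors rated item "e" above their own averages.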
Item-based collaborative filtering predicts a user's rating of a target item from the user's ratings of similar items. It rests on the following assumption: if most users rate certain items similarly, the current user will also rate those items similarly. The item-based algorithm examines the set of items the target user has rated, computes the similarity between those items and the target item, and then produces its output from the top K most similar items; this is where it differs from user-based collaborative filtering. We again use the user-item rating matrix as the example and predict user Tom's rating of the film "King of Guns" (user Lucy's rating of the film "Avatar" is missing).
Looking at the data, we find that the ratings of the film "Twilight 3: Eclipse" are very close to the ratings of "King of Guns": the first three users rated "Twilight 3: Eclipse" 4, 3, 2 and rated "King of Guns" 4, 3, 3. These two films have the highest similarity, so "Twilight 3: Eclipse" is the nearest neighbor of "King of Guns", and the user's rating of "Twilight 3: Eclipse" has the largest impact on the predicted value for "King of Guns". "Aftershock" and "Avatar" are not close neighbors of "King of Guns", because their ratings from the same users differ widely from its ratings, so the ratings of "Aftershock" and "Avatar" have a relatively small impact on the predicted value. In real-world prediction, the recommendation system searches only for the first few nearest neighbors and predicts the rating of the given item from the ratings of those neighbors.
As the example shows, the main work of the item-based collaborative filtering recommendation algorithm is the nearest-neighbor query and the generation of recommendations, so the algorithm can be divided into two phases. The nearest-neighbor query phase computes the similarity between items and searches for the nearest neighbors of the target item. The recommendation phase predicts the rating of the target item from the user's ratings of its nearest neighbors and then produces the top N recommendations.
The key step of the item-based collaborative filtering algorithm is still to compute the similarity between items and to select the most similar ones, just as in user-based collaborative filtering. The basic idea for computing the similarity between two items i and j is to first extract the users who have rated both items, treat the ratings each item received as a vector in n-dimensional user space, and then compute the similarity between the two vectors with one of the similarity formulas.
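A minimal sketch of this step, assuming cosine similarity as the metric: the ratings for the first three users are the ones quoted earlier for "Twilight 3: Eclipse" (4, 3, 2) and "King of Guns" (4, 3, 3), while the user ids and the function name `item_cosine` are hypothetical.

```python
from math import sqrt

def item_cosine(ratings_a, ratings_b):
    """Cosine similarity between two items over the users who rated both."""
    common = set(ratings_a) & set(ratings_b)  # users who rated both items
    if not common:
        return 0.0
    dot = sum(ratings_a[u] * ratings_b[u] for u in common)
    norm_a = sqrt(sum(ratings_a[u] ** 2 for u in common))
    norm_b = sqrt(sum(ratings_b[u] ** 2 for u in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# item -> {user: rating}; the user ids are hypothetical labels.
# user4 stands in for a user who rated only one of the two films
# and is therefore left out of the comparison.
twilight = {"user1": 4, "user2": 3, "user3": 2, "user4": 3}
king_of_guns = {"user1": 4, "user2": 3, "user3": 3}

sim = item_cosine(twilight, king_of_guns)
print(round(sim, 3))
```

Restricting the vectors to co-rating users, rather than filling missing ratings with 0, is a design choice of this sketch; the adjusted cosine or Pearson measures could be substituted in the same place.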
Once the similar items have been identified, the next step is to predict the rating of the target item: user u's expected rating of item i is computed from u's ratings of the items that are similar to i. The specific formulas and steps of these two phases are similar to those of the user-based collaborative filtering recommendation algorithm, so we will not repeat them here.
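As an illustration only, the prediction step can be sketched as a similarity-weighted average of the user's ratings of the K most similar items. This simple weighted-sum variant is an assumption; the post does not give the exact formula. Tom's ratings are the ones quoted earlier (3, 5, 4); the similarity values and the function name `predict_item_based` are hypothetical.

```python
def predict_item_based(user_ratings, sims_to_target, k=2):
    """Weighted average of the user's ratings of the k items most similar to the target."""
    rated = [(sims_to_target[j], r)
             for j, r in user_ratings.items() if j in sims_to_target]
    top = sorted(rated, reverse=True)[:k]   # k most similar rated items
    norm = sum(abs(s) for s, _ in top)
    if norm == 0:
        return None                         # no usable neighbors
    return sum(s * r for s, r in top) / norm

# Tom's ratings as quoted in the post.
tom = {"Twilight 3: Eclipse": 3, "Aftershock": 5, "Avatar": 4}
# Hypothetical similarities of each film to the target film "King of Guns".
sims = {"Twilight 3: Eclipse": 0.9, "Aftershock": 0.4, "Avatar": 0.1}

prediction = predict_item_based(tom, sims, k=2)
print(round(prediction, 2))
```

With these made-up similarities the prediction stays close to Tom's rating of the most similar film, which matches the intuition that the nearest neighbor has the largest influence.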
Collaborative filtering has the following advantages over content-based recommendation algorithms: it can filter information that is difficult to analyze automatically by content, such as art and music; it can filter on complex, hard-to-express concepts (such as information quality and taste); and its recommendations can be novel.
However, collaborative filtering also has the following drawbacks: users' ratings of products are very sparse, so user similarities computed from those ratings may be inaccurate (the sparsity problem); as the numbers of users and products grow, system performance degrades (the scalability problem); and if a product has never been rated by any user, it cannot be recommended (the initial-rating problem).