Recommendation Algorithms: User-Based Collaborative Filtering

Source: Internet
Author: User

Collaborative filtering is the most basic family of recommendation algorithms. It divides into user-based collaborative filtering and item-based collaborative filtering.


This article focuses on user-based collaborative filtering. Put simply, to make recommendations for a user U, we find other users whose behavior is similar to U's and recommend to U the items those users liked. User-based collaborative filtering therefore has two steps: 1) find the set of users whose interests are similar to the target user's; 2) find items that the users in this set like but that the target user has not yet seen, and recommend them to the target user.


The key to step 1 is computing the similarity between users. It is usually measured with the Jaccard coefficient or cosine similarity, both of which express the items two users have in common as a proportion of their total behavior. Writing N(u) for the set of items user u has interacted with, the Jaccard similarity is |N(u) ∩ N(v)| / |N(u) ∪ N(v)| and the cosine similarity is |N(u) ∩ N(v)| / sqrt(|N(u)| · |N(v)|). Computing this for every pair of users costs O(n²), where n is the number of users. That is impractical for sites with many users: if a site like Amazon has n > 100,000 users, the quadratic cost is unacceptable.
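A minimal Python sketch of the naive approach, assuming implicit feedback where each user is represented only by the set of items they interacted with. The toy data and the function names (`jaccard`, `cosine`, `naive_similarity`) are illustrative, not from the original. `naive_similarity` compares every pair of users, which is exactly the O(n²) cost described above.

```python
import math
from itertools import combinations

# Hypothetical implicit-feedback data: user -> N(u), the set of items they interacted with.
user_items = {
    "A": {"a", "b", "d"},
    "B": {"a", "c"},
    "C": {"b", "e"},
    "D": {"c", "d", "e"},
}

def jaccard(items_u, items_v):
    """|N(u) ∩ N(v)| / |N(u) ∪ N(v)|"""
    union = items_u | items_v
    return len(items_u & items_v) / len(union) if union else 0.0

def cosine(items_u, items_v):
    """|N(u) ∩ N(v)| / sqrt(|N(u)| * |N(v)|)"""
    if not items_u or not items_v:
        return 0.0
    return len(items_u & items_v) / math.sqrt(len(items_u) * len(items_v))

def naive_similarity(user_items, sim=cosine):
    """Score every pair of users: n * (n - 1) / 2 comparisons, i.e. O(n^2)."""
    w = {}
    for u, v in combinations(user_items, 2):
        w[(u, v)] = w[(v, u)] = sim(user_items[u], user_items[v])
    return w

w = naive_similarity(user_items)
print(round(w[("A", "D")], 4))                    # 0.3333
print(jaccard(user_items["A"], user_items["D"]))  # 0.2
```

Here A and D share one item out of three each, so their cosine similarity is 1/3 and their Jaccard similarity is 1/5. Note that B and C, who share nothing, still cost a comparison.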

Improving step 1 (time complexity): for most pairs of users the similarity is actually 0, so the n × n similarity matrix is sparse and there is no need to spend computation on its zero entries. Instead, build an inverted table from items to users, listing for each item all the users who have interacted with it. Then traverse the items: for every pair of users in an item's list, add 1 to that pair's count of common behaviors, and separately track each user's total number of behaviors. Finally, compute the two users' common behaviors as a proportion of their respective totals.
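The inverted-table optimization can be sketched as follows, again on hypothetical toy data with cosine normalization. Only pairs of users who share at least one item ever appear in `C`, so pairs with zero similarity cost nothing; `user_similarity_inverted` is an illustrative name.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical implicit-feedback data: user -> set of items.
user_items = {
    "A": {"a", "b", "d"},
    "B": {"a", "c"},
    "C": {"b", "e"},
    "D": {"c", "d", "e"},
}

def user_similarity_inverted(user_items):
    # 1. Inverted table: item -> set of users who interacted with it.
    item_users = defaultdict(set)
    for u, items in user_items.items():
        for i in items:
            item_users[i].add(u)

    # 2. C[u][v] = number of items u and v have in common. Only pairs that
    #    co-occur under some item are visited, so zero entries are skipped.
    C = defaultdict(lambda: defaultdict(int))
    for users in item_users.values():
        for u, v in combinations(users, 2):
            C[u][v] += 1
            C[v][u] += 1

    # 3. Divide by the users' total behavior counts (cosine normalization).
    W = defaultdict(dict)
    for u, related in C.items():
        for v, c_uv in related.items():
            W[u][v] = c_uv / math.sqrt(len(user_items[u]) * len(user_items[v]))
    return W

W = user_similarity_inverted(user_items)
print(round(W["A"]["D"], 4))  # 0.3333
print("C" in W["B"])          # False: B and C share no items
```

The results match a naive pairwise computation for every pair with nonzero similarity; pairs with nothing in common simply have no entry. The cost now depends on how many users each item has rather than on n².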

Improving step 1 (similarity quality): if two users have both bought the Xinhua Dictionary, that says little about whether their interests are similar, because almost everyone buys that book. If both have bought a book on machine learning, however, we can be fairly sure they share an interest in that area. In other words, the more two users act on the same unpopular items, the more similar they are. When computing user similarity we should therefore reduce the influence of popular items: each shared item contributes 1/N(i) to the pair's common-behavior score instead of 1, where N(i) is the item's popularity (the number of users who have interacted with it). Highly popular items thus carry relatively little weight.
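A sketch of the popularity penalty, assuming the same inverted-table structure. Each shared item adds `penalty(N(i))` to the pair's common-behavior score instead of 1; the default follows the 1/N(i) weighting described above. The data and the function name are hypothetical. A commonly used, gentler variant is 1/log(1 + N(i)), which can be passed in as `penalty`.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical data: everyone buys the dictionary; only A and B buy the ML book.
user_items = {
    "A": {"dictionary", "ml_book"},
    "B": {"dictionary", "ml_book"},
    "C": {"dictionary", "novel"},
    "D": {"dictionary", "cookbook"},
}

def user_similarity_penalised(user_items, penalty=lambda n: 1.0 / n):
    item_users = defaultdict(set)
    for u, items in user_items.items():
        for i in items:
            item_users[i].add(u)

    C = defaultdict(lambda: defaultdict(float))
    for users in item_users.values():
        weight = penalty(len(users))  # len(users) is N(i), the item's popularity
        for u, v in combinations(users, 2):
            C[u][v] += weight
            C[v][u] += weight

    W = defaultdict(dict)
    for u, related in C.items():
        for v, c_uv in related.items():
            W[u][v] = c_uv / math.sqrt(len(user_items[u]) * len(user_items[v]))
    return W

plain = user_similarity_penalised(user_items, penalty=lambda n: 1.0)  # no penalty
W = user_similarity_penalised(user_items)                              # 1/N(i)
print(plain["A"]["B"], plain["A"]["C"])  # 1.0 0.5
print(W["A"]["B"], W["A"]["C"])          # 0.375 0.125
```

With the penalty, A-B is three times A-C (0.375 vs 0.125) instead of twice (1.0 vs 0.5), because the shared machine-learning book now outweighs the shared dictionary.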


Step 2 is comparatively simple: select the K users most similar to user U, and recommend to U the items those users like that U has not yet interacted with. K matters a great deal. The larger K is, the more popular the recommended items tend to be and the lower the coverage, because with many neighbors the recommendations converge on items that are popular overall.
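A sketch of step 2 for implicit feedback, assuming the user similarities have already been computed (the values in `W` below are made up for illustration). An item's score for U is the sum of the similarities of U's K nearest neighbors who have that item, and items U already has are skipped. `recommend`, `K` and `N` are illustrative names.

```python
import heapq

# Hypothetical data: the items each user has interacted with.
user_items = {
    "U": {"a", "b"},
    "V1": {"a", "b", "c"},
    "V2": {"a", "c", "d"},
    "V3": {"e"},
}
# Hypothetical precomputed similarities between U and the other users.
W = {"U": {"V1": 0.8, "V2": 0.5, "V3": 0.1}}

def recommend(user, user_items, W, K=3, N=10):
    """Return up to N (item, score) pairs, best first."""
    seen = user_items[user]
    # The K users most similar to `user`.
    neighbours = heapq.nlargest(K, W.get(user, {}).items(), key=lambda kv: kv[1])
    scores = {}
    for v, w_uv in neighbours:
        for item in user_items[v]:
            if item not in seen:  # only recommend items the user has not interacted with
                scores[item] = scores.get(item, 0.0) + w_uv
    return heapq.nlargest(N, scores.items(), key=lambda kv: kv[1])

print(recommend("U", user_items, W, K=2))  # [('c', 1.3), ('d', 0.5)]
```

Item `c` ranks first because two neighbors have it. Raising K to 3 also pulls in V3's item `e`; on real data, the more neighbors are consulted, the more the accumulated scores favor items that many users share, i.e. popular ones.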


Improving step 2 (rating prediction): in practice we do not recommend every candidate item found in step 2, since there are usually too many. Instead we produce a top-N list of the N items the user is likely to be most interested in, which requires ranking the candidates by predicted score.

This raises a problem, because different people rate from different baselines. For example, user A's average rating is 4: A gives a good movie 5 and a poor one 3. User B's average is 2: B gives a good movie 3 and a poor one 1. Ranking directly on raw ratings is therefore inaccurate. The fix is to work with each rating's deviation from the user's baseline. A's ratings become 5 - 4 = +1 for the good movie and 3 - 4 = -1 for the poor one, and B's become 3 - 2 = +1 and 1 - 2 = -1, so the two users' ratings now look alike. To predict how a user will rate a movie, take the mean of the neighbors' deviations for that movie and add the user's own baseline (usually the user's average rating).
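A sketch of mean-centered rating prediction, assuming explicit ratings and precomputed similarities (both made up below, loosely following the A/B example above). Weighting each neighbor's deviation by its similarity is a common choice and an assumption here; an unweighted neighborhood mean is the special case where all weights are equal. `mean_rating` and `predict_rating` are illustrative names.

```python
# Hypothetical explicit ratings: user -> {movie: rating}.
ratings = {
    "A": {"good": 5, "bad": 3, "new": 5, "meh": 3},  # A's baseline (mean) is 4
    "B": {"good": 3, "bad": 1},                      # B's baseline (mean) is 2
    "C": {"good": 4, "new": 4},                      # C's baseline (mean) is 4
}
# Hypothetical precomputed user similarities.
W = {"B": {"A": 0.9, "C": 0.3}}

def mean_rating(user, ratings):
    r = ratings[user]
    return sum(r.values()) / len(r)

def predict_rating(user, item, ratings, W, K=20):
    """r_hat(u, i) = mean(u) + sum_v w_uv * (r_vi - mean(v)) / sum_v |w_uv|."""
    # The K most similar neighbours who actually rated the item.
    rated = [(v, w) for v, w in W.get(user, {}).items() if item in ratings[v]]
    neighbours = sorted(rated, key=lambda vw: vw[1], reverse=True)[:K]
    num = sum(w * (ratings[v][item] - mean_rating(v, ratings)) for v, w in neighbours)
    den = sum(abs(w) for _, w in neighbours)
    base = mean_rating(user, ratings)
    return base if den == 0 else base + num / den  # no neighbours: fall back to the baseline

print(round(predict_rating("B", "new", ratings, W), 2))  # 2.75
```

A similarity-weighted average of the neighbors' raw ratings for `new` would be 4.75, above anything B has ever given; the centered prediction of 2.75 stays on B's own scale.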


User-based collaborative filtering is rarely used in practice. On one hand, because there are so many users, the algorithm's complexity remains high; on the other, its recommendations are hard to explain to users. For these reasons, industry generally chooses item-based collaborative filtering instead.

