Recommendation Algorithms: Item-Based Collaborative Filtering

Source: Internet
Author: User

The item-based collaborative filtering algorithm (ItemCF) is one of the most widely used recommendation algorithms in industry. Its main idea is to recommend to each user items that are similar to the items the user has liked before, based on the user's past behavior.

ItemCF consists of two main steps:

1) Calculate the similarity between items.

2) Generate a recommendation list for each user based on item similarity and the user's historical behavior.

The key to the first step is computing the similarity between items. Here we do not use content-based similarity; instead, we count how many of the users who like item i also like item j. The premise is that a user's interests are generally fairly stable and do not change easily, so when a user likes two items, we can often assume that the two items belong to the same category of interest. Let N(i) be the set of users who bought item i, so |N(i)| is the number of such users. The similarity between items i and j can then be defined as $w_{ij} = \frac{|N(i) \cap N(j)|}{|N(i)|}$, the fraction of item i's buyers who also bought item j.
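A minimal Python sketch of this formula, assuming a hypothetical toy dataset `item_users` that maps each item to the set of users who bought it (N(i)); the function name `basic_similarity` is illustrative. Note that the formula is asymmetric: $w_{ij}$ and $w_{ji}$ generally differ.

```python
from itertools import permutations

# Hypothetical toy data: item -> set of users who bought it, i.e. N(i).
item_users = {
    "a": {"u1", "u2", "u3"},
    "b": {"u1", "u2"},
    "c": {"u3"},
}


def basic_similarity(item_users):
    """Return {(i, j): |N(i) & N(j)| / |N(i)|} for ordered pairs with overlap."""
    sim = {}
    for i, j in permutations(item_users, 2):
        overlap = len(item_users[i] & item_users[j])
        if overlap:  # store only nonzero similarities
            sim[(i, j)] = overlap / len(item_users[i])
    return sim


w = basic_similarity(item_users)
print(w[("b", "a")])  # 1.0: both buyers of b also bought a
print(w[("a", "b")])  # 2/3: two of a's three buyers also bought b
```

This version compares every pair of items, which is fine for a toy example but scales with the square of the catalog size.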


Improving the time complexity of the first step: as with UserCF, we can build a user-item inverted table that lists, for each user, the items that user has interacted with. We then only accumulate counts for pairs of items that the same user has interacted with, so every computation contributes to a useful (nonzero) entry instead of wasting most of the work on the zeros of the item-item matrix, which is necessarily sparse.
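A minimal Python sketch of the inverted-table approach, assuming a hypothetical `user_items` dict that maps each user to the set of items they interacted with; `cooccurrence_similarity` is an illustrative name. It computes the basic similarity $|N(i) \cap N(j)| / |N(i)|$, but gathers the counts user by user.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical user-item inverted table: user -> items the user interacted with.
user_items = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "d"},
    "u3": {"a", "c"},
}


def cooccurrence_similarity(user_items):
    """Compute w_ij = |N(i) & N(j)| / |N(i)| by scanning the inverted table."""
    n = defaultdict(int)                       # n[i] = |N(i)|
    c = defaultdict(lambda: defaultdict(int))  # c[i][j] = |N(i) & N(j)|
    for items in user_items.values():
        for i in items:
            n[i] += 1
        # Only pairs bought by the same user are visited, so the zero
        # entries of the sparse item-item matrix are never touched.
        for i, j in permutations(items, 2):
            c[i][j] += 1
    return {i: {j: cij / n[i] for j, cij in row.items()} for i, row in c.items()}


w = cooccurrence_similarity(user_items)
print(w["b"]["a"])    # 1.0
print("c" in w["b"])  # False: b and c never co-occur, so no entry is stored
```

The cost is roughly proportional to the sum, over users, of the square of the number of items each user touched, rather than to the square of the total number of items.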

Similarity improvement, method 1: with the formula above, if item j is very popular, so that almost everyone buys it, then nearly every buyer of any item i has also bought j, and $w_{ij}$ will be close to 1 for almost every i. Popular items then look highly similar to everything and no longer differentiate between items. To fix this, we penalize the weight of a popular item j: $w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}}$.
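A minimal Python sketch of the popularity penalty, assuming a hypothetical toy dataset in which the item `"hit"` is bought by every user while `"x"` and `"y"` are niche items; `popularity_penalized_similarity` is an illustrative name. Under the basic formula, item y would be equally similar (1.0) to x and to the popular item; with the square-root denominator the popular item drops behind.

```python
import math
from collections import defaultdict
from itertools import permutations

# Hypothetical toy data: "hit" is bought by every user; "x" and "y" are niche.
user_items = {
    "u1": {"hit", "x", "y"},
    "u2": {"hit", "x", "y"},
    "u3": {"hit"},
    "u4": {"hit"},
}


def popularity_penalized_similarity(user_items):
    """w_ij = |N(i) & N(j)| / sqrt(|N(i)| * |N(j)|)."""
    n = defaultdict(int)                       # n[i] = |N(i)|
    c = defaultdict(lambda: defaultdict(int))  # c[i][j] = |N(i) & N(j)|
    for items in user_items.values():
        for i in items:
            n[i] += 1
        for i, j in permutations(items, 2):
            c[i][j] += 1
    return {
        i: {j: cij / math.sqrt(n[i] * n[j]) for j, cij in row.items()}
        for i, row in c.items()
    }


w = popularity_penalized_similarity(user_items)
print(w["y"]["x"])    # 1.0
print(w["y"]["hit"])  # 2 / sqrt(8), about 0.707
```

A side effect of this denominator is that the similarity becomes symmetric: $w_{ij} = w_{ji}$.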

Similarity improvement, method 2: penalize user activity. A low-activity user who buys only a few books most likely buys them within one or two areas of interest, so those purchases are informative for item similarity. However, if a bookseller buys 90% of Amazon's books at a discount in order to resell them, that user's behavior contributes almost nothing to item similarity, because 90% of the books inevitably span many unrelated areas. So, in the same spirit as method 1, the contribution of highly active users should be penalized.
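The text does not give an exact penalty, so the following Python sketch assumes one common choice, inverse user frequency: each user u contributes $1 / \log(1 + |N(u)|)$ to a pair's co-occurrence count instead of 1, where N(u) is the set of items the user bought. The toy data (a casual buyer and a reseller) and the name `activity_penalized_similarity` are hypothetical; the popularity penalty from method 1 is kept in the denominator.

```python
import math
from collections import defaultdict
from itertools import permutations

# Hypothetical toy data: a casual buyer and a reseller who buys almost everything.
user_items = {
    "casual": {"a", "b"},
    "reseller": {"a", "b", "c", "d", "e", "f"},
}


def activity_penalized_similarity(user_items):
    """Cosine-style similarity where each user's contribution is down-weighted
    by 1 / log(1 + |N(u)|) (an assumed penalty, not taken from the article)."""
    n = defaultdict(int)
    c = defaultdict(lambda: defaultdict(float))
    for items in user_items.values():
        if not items:
            continue  # a user with no items contributes nothing
        weight = 1.0 / math.log(1 + len(items))  # more active user, smaller weight
        for i in items:
            n[i] += 1
        for i, j in permutations(items, 2):
            c[i][j] += weight
    return {
        i: {j: cij / math.sqrt(n[i] * n[j]) for j, cij in row.items()}
        for i, row in c.items()
    }


w = activity_penalized_similarity(user_items)
# c and d were bought only by the reseller: without the penalty their cosine
# similarity would be 1.0; here it is 1 / log(7), about 0.51.
print(w["c"]["d"])
```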

Similarity improvement, method 3: normalize item similarity. Normalization improves not only recommendation accuracy but also coverage and diversity. For example, on Amazon a user's interests usually span several categories rather than being concentrated in just one. Suppose there are two categories, A and B, where the similarity between items within A is 0.5, within B is 0.8, and between A and B is 0.2. If a user buys five class-A books and five class-B books and we rank candidates by raw similarity as before, all the recommended items will come from class B: a class-A candidate scores 5 × 0.5 + 5 × 0.2 = 3.5, while a class-B candidate scores 5 × 0.8 + 5 × 0.2 = 5.0, so even the lower-ranked B items still outrank the A items. Therefore we should normalize similarity by category, so that the within-category similarity becomes 1 for both A and B. Items from A and B are then ranked on a more equal footing, which improves accuracy, coverage and diversity.
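One way to realize this normalization, assumed here rather than taken from the text, is to divide each item's similarity row by its largest entry. In the example above, each item's largest similarity is its within-category similarity, so within-category similarity becomes 1.0 for both A and B. The similarity table and the name `normalize_rows` are hypothetical.

```python
# Hypothetical similarity table mirroring the example in the text:
# within class A = 0.5, within class B = 0.8, between A and B = 0.2.
w = {
    "a1": {"a2": 0.5, "b1": 0.2, "b2": 0.2},
    "a2": {"a1": 0.5, "b1": 0.2, "b2": 0.2},
    "b1": {"b2": 0.8, "a1": 0.2, "a2": 0.2},
    "b2": {"b1": 0.8, "a1": 0.2, "a2": 0.2},
}


def normalize_rows(w):
    """Divide each item's similarity row by its largest entry (assumed scheme)."""
    normalized = {}
    for i, row in w.items():
        top = max(row.values(), default=0.0)
        normalized[i] = {j: s / top for j, s in row.items()} if top > 0 else dict(row)
    return normalized


wn = normalize_rows(w)
print(wn["a1"])  # {'a2': 1.0, 'b1': 0.4, 'b2': 0.4}
print(wn["b1"])  # {'b2': 1.0, 'a1': 0.25, 'a2': 0.25}
```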

The second step is relatively simple: for each candidate item, sum its similarity weights to the items the user has already purchased, then rank the candidates by this score and return the top N.
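A minimal Python sketch of this scoring step, assuming the similarities are already computed and stored as a hypothetical nested dict `w[i][j]`; the toy weights and the name `recommend` are illustrative. Items the user already owns are skipped, and `heapq.nlargest` picks the top N by score.

```python
import heapq
from collections import defaultdict

# Hypothetical precomputed similarities: w[i][j] is the weight from purchased item i to item j.
w = {
    "a": {"b": 0.6, "c": 0.3},
    "d": {"b": 0.2, "e": 0.7},
}


def recommend(w, purchased, top_n=2):
    """Sum each candidate's similarity to the purchased items and return the top N."""
    scores = defaultdict(float)
    for i in purchased:
        for j, wij in w.get(i, {}).items():
            if j not in purchased:  # do not recommend what the user already has
                scores[j] += wij
    return heapq.nlargest(top_n, scores.items(), key=lambda kv: kv[1])


for item, score in recommend(w, {"a", "d"}):
    print(item, round(score, 2))  # b 0.8, then e 0.7
```

Item b wins because it is moderately similar to both purchases (0.6 + 0.2), while e is similar to only one of them.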


ItemCF is widely used in production systems and has two main advantages:

1) Compared with a user-user similarity table, the item-item table is much smaller and easier to process in systems where the number of items is far smaller than the number of users.

2) ItemCF makes it easy to explain recommendations, for example: "Because you bought Data Mining, we recommend Machine Learning." Such explanations increase users' trust and encourage interaction with the recommender system, which in turn further improves personalized recommendations.
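A minimal Python sketch of how such a reason could be generated, assuming precomputed similarities in a hypothetical nested dict `w[i][j]`: the reason names the purchased item that contributes the most similarity to the recommended item. The function `explain`, the similarity values, and the book "Cooking Basics" are illustrative only.

```python
# Hypothetical similarities between a user's purchases and a candidate book.
w = {
    "Data Mining": {"Machine Learning": 0.7},
    "Cooking Basics": {"Machine Learning": 0.05},
}


def explain(w, purchased, candidate):
    """Return a reason naming the purchased item most similar to candidate."""
    contributions = {i: w.get(i, {}).get(candidate, 0.0) for i in purchased}
    best = max(contributions, key=contributions.get, default=None)
    if best is None or contributions[best] <= 0.0:
        return None  # no purchased item supports this recommendation
    return f'Because you bought "{best}", we recommend "{candidate}".'


print(explain(w, {"Data Mining", "Cooking Basics"}, "Machine Learning"))
# Because you bought "Data Mining", we recommend "Machine Learning".
```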
