Among today's recommendation technologies and algorithms, the most widely recognized and adopted is the collaborative filtering-based recommendation method. This article takes a closer look at how collaborative filtering works.
1. What is collaborative filtering?
Collaborative filtering (CF) is a classic way of harnessing collective intelligence. To understand what it is, first consider a simple question. If you want to watch a movie but don't know which one to pick, what do you do? Most people ask their friends which recent movies they would recommend, and we generally prefer recommendations from friends whose tastes are similar to our own. This is the core idea of collaborative filtering.
In other words, recommendations are made to you based on the opinions of people whose tastes are similar to yours.
2. Implementation of collaborative filtering
To implement a collaborative filtering recommendation algorithm, perform the following three steps:
Collect data -- find similar users and items -- make recommendations
Collect data
The data here is historical user behavior data, such as a user's purchase history, follows, favorites, comments, and the ratings they have given to items. All of these can serve as input to a recommendation algorithm. Note that different types of data differ in accuracy and granularity, and the effect of noise must be taken into account when using them.
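As a rough illustration of this step, the sketch below aggregates raw behavior events into a per-user preference table. The behavior names, the weights, and the build_preferences helper are all assumptions made for this example; the article does not prescribe a weighting scheme, and a real system would tune these values and filter noise.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each behavior signals a preference.
BEHAVIOR_WEIGHTS = {"view": 1.0, "favorite": 3.0, "purchase": 5.0}


def build_preferences(events, weights=BEHAVIOR_WEIGHTS):
    """Aggregate (user, item, behavior) events into {user: {item: score}}."""
    prefs = defaultdict(lambda: defaultdict(float))
    for user, item, behavior in events:
        # Behaviors without a known weight are skipped rather than guessed at.
        if behavior in weights:
            prefs[user][item] += weights[behavior]
    return {user: dict(items) for user, items in prefs.items()}


# Made-up sample events, for illustration only.
events = [
    ("alice", "book1", "view"),
    ("alice", "book1", "purchase"),
    ("alice", "book2", "favorite"),
    ("bob", "book1", "view"),
    ("bob", "book3", "click"),  # no weight defined for "click", so it is ignored
]
prefs = build_preferences(events)
```

Summing weighted events is only one possible design. Taking the strongest signal per item, or normalizing scores per user, are equally plausible choices.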
Find similar users and items
This step is also straightforward: it amounts to calculating the similarity between users or between items. Common similarity measures include:
Euclidean distance
Pearson Correlation Coefficient
Cosine Similarity
Tanimoto Coefficient
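The four measures listed above can be sketched in plain Python as follows. This is a minimal version that assumes both inputs are equal-length lists of numeric preference scores. Converting Euclidean distance to a similarity with 1 / (1 + d) is one common convention, not the only one, and the function names are chosen for this example.

```python
import math


def euclidean_similarity(a, b):
    """Turn Euclidean distance into a similarity in (0, 1]; identical vectors give 1.0."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)


def pearson_correlation(a, b):
    """Linear correlation in [-1, 1]; returns 0.0 if either vector is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom else 0.0


def cosine_similarity(a, b):
    """Cosine of the angle between the two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tanimoto_coefficient(a, b):
    """Extended Jaccard: dot / (|a|^2 + |b|^2 - dot)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0
```

For 0/1 preference vectors the Tanimoto coefficient reduces to the Jaccard index: the number of shared items divided by the number of items either party likes.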
Make recommendations
After knowing how to calculate similarity, we can make recommendations.
There are two mainstream methods in collaborative filtering: user-based collaborative filtering (user CF) and item-based collaborative filtering (item CF). Their principles are described in turn below.
The basic idea of user CF is simple: based on users' preferences for items, find the users most similar to the current user (the neighbors), then recommend the items those neighbors like to the current user. Computationally, each user's preferences over all items are treated as a vector, and the similarity between users is calculated from these vectors. After the K nearest neighbors are found, the current user's preference for items they have not yet rated is predicted from the neighbors' similarity weights and their preferences for those items, and the result is a ranked list of items to recommend. For example, for user A, the historical preferences yield a single neighbor, user C, so item D, which user C likes, is recommended to user A.
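The user-based procedure can be sketched as below. The rating values are made up to mirror the example of users A and C, cosine similarity is just one of the possible measures, and the function names are assumptions for illustration rather than a reference implementation.

```python
import math

# Hypothetical ratings {user: {item: score}}; a missing item means no preference yet.
ratings = {
    "A": {"item_a": 5.0, "item_c": 4.0},
    "B": {"item_b": 4.0},
    "C": {"item_a": 4.0, "item_c": 5.0, "item_d": 5.0},
}


def cosine(u, v):
    """Cosine similarity between two sparse preference dicts."""
    dot = sum(score * v[key] for key, score in u.items() if key in v)
    norm = math.sqrt(sum(s * s for s in u.values())) * math.sqrt(sum(s * s for s in v.values()))
    return dot / norm if norm else 0.0


def recommend_user_based(target, ratings, k=1):
    """Predict scores for items the target has not rated, using the k nearest users."""
    sims = sorted(
        ((cosine(ratings[target], prefs), other)
         for other, prefs in ratings.items() if other != target),
        reverse=True,
    )
    # Keep the k most similar users, dropping any with zero similarity.
    neighbors = [(sim, other) for sim, other in sims[:k] if sim > 0]
    totals, weights = {}, {}
    for sim, other in neighbors:
        for item, score in ratings[other].items():
            if item in ratings[target]:
                continue  # only predict items the target has no preference for
            totals[item] = totals.get(item, 0.0) + sim * score
            weights[item] = weights.get(item, 0.0) + sim
    # Similarity-weighted average of the neighbors' scores.
    predicted = {item: totals[item] / weights[item] for item in totals}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)


recommendations = recommend_user_based("A", ratings, k=1)
```

With this data, user C is the only neighbor of user A that has any overlap, so the single recommendation is item_d, matching the example in the text.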
The principle of item-based CF is similar to that of user-based CF, except that neighbors are computed among items rather than among users: similar items are found based on users' preferences for items, and the user is then recommended items similar to those in their historical preferences. Computationally, all users' preferences for an item are treated as a vector, and the similarity between items is calculated from these vectors. Once the similar items of an item are known, the current user's preference for items they have not yet rated is predicted from their historical preferences, and the result is a ranked list of items to recommend. For example, according to the historical preferences of all users, users who like item A also like item C, so item A and item C are considered similar. User C likes item A, so it can be inferred that user C may also like item C.
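The item-based variant can be sketched in the same style. The ratings are again hypothetical and arranged to mirror the example of items A and C. This sketch ranks candidates by a similarity-weighted sum of the user's existing scores, which is one simple scoring choice among several.

```python
import math

# Hypothetical ratings {user: {item: score}}, for illustration only.
ratings = {
    "A": {"item_a": 5.0, "item_c": 4.0},
    "B": {"item_a": 4.0, "item_b": 3.0, "item_c": 5.0},
    "C": {"item_a": 5.0},
}


def item_vectors(ratings):
    """Transpose {user: {item: score}} into {item: {user: score}}."""
    vectors = {}
    for user, prefs in ratings.items():
        for item, score in prefs.items():
            vectors.setdefault(item, {})[user] = score
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse dicts."""
    dot = sum(score * v[key] for key, score in u.items() if key in v)
    norm = math.sqrt(sum(s * s for s in u.values())) * math.sqrt(sum(s * s for s in v.values()))
    return dot / norm if norm else 0.0


def recommend_item_based(target, ratings):
    """Rank unseen items by similarity-weighted sum of the target's existing scores."""
    vectors = item_vectors(ratings)
    seen = ratings[target]
    scores = {}
    for candidate, cand_vec in vectors.items():
        if candidate in seen:
            continue  # skip items the user already has a preference for
        total = sum(cosine(cand_vec, vectors[item]) * score for item, score in seen.items())
        if total > 0:
            scores[candidate] = total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


recommendations = recommend_item_based("C", ratings)
```

Here item_c ranks first for user C because the users who rated item_a also rated item_c highly. In practice the item-to-item similarities would usually be precomputed offline rather than recalculated on every request.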
Summary
Both methods can produce good recommendations, but they differ from each other, and so does their applicability. The following is a comparison.
Computing complexity
Item CF and user CF are the two basic algorithms for collaborative filtering. User CF was proposed long ago, while item CF has been popular since Amazon's papers and patents were published (around 2001). It is commonly believed that item CF outperforms user CF in both performance and complexity. One of the main reasons is that on an online website the number of users often far exceeds the number of items, and the item data is relatively stable, so computing item similarity requires less computation and the results do not need to be updated frequently. What is often overlooked is that this holds only for e-commerce websites that sell goods. For news, blog, or microcontent recommendation systems the situation is often the opposite: the number of items is massive and they are updated frequently. From the complexity perspective, then, each algorithm has its own advantages in different systems, and the recommendation engine designer needs to choose the more appropriate one based on the characteristics of the application.
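To make the complexity argument concrete, the sketch below counts how many pairwise similarities a naive all-pairs computation would need. The site sizes are invented for illustration, and real systems exploit sparsity, so these counts are only a rough upper bound.

```python
def pair_count(n):
    """Number of unordered pairs among n entities: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Hypothetical sizes, chosen only to illustrate the two situations.
shop_users, shop_items = 1_000_000, 10_000    # e-commerce: far more users than items
news_users, news_items = 100_000, 5_000_000   # news or microcontent: far more items

# Prints the cheaper side for each kind of site.
for name, users, items in [("shop", shop_users, shop_items),
                           ("news", news_users, news_items)]:
    cheaper = "item CF" if pair_count(items) < pair_count(users) else "user CF"
    print(f"{name}: user pairs={pair_count(users):,} "
          f"item pairs={pair_count(items):,} cheaper={cheaper}")
```

Because the count grows quadratically, whichever side has fewer entities is much cheaper to compare pairwise. This is why the relative sizes of the user and item sets matter so much when choosing between the two algorithms.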
Applicable scenarios
On non-social-network websites, the intrinsic relationships between pieces of content are an important recommendation principle, and they are more effective than recommendations based on similar users. For example, when you look at a book on a bookstore website, the recommendation engine suggests related books, and this recommendation matters far more than the general recommendations shown to the user on the site's homepage. In this case, item CF recommendations become an important means of guiding users as they browse. Item CF also makes recommendations easy to explain. If a non-social-network website recommends a book and explains that someone with similar interests has read it, the user is hard to convince, because they probably do not know that person at all. If the explanation is instead that the book is similar to one the user has read before, the user is likely to find it reasonable and accept the recommendation.
In contrast, on today's popular social network sites, user CF is the better choice: combining user CF with social network information can increase users' confidence in the recommendations.
I hope the above content is helpful to you. For more details, see http://www.ibm.com/developerworks/cn/web/1103_zhaoct_recommstudy2/index.html