Recommendation Systems: Collaborative Filtering Recommendation



Based on "Recommender Systems: An Introduction", Chapter 2, Collaborative Filtering Recommendation.


Definition


The main idea of the collaborative filtering approach is to use the past behavior or opinions of an existing user community to predict which items the current user is most likely to like or be interested in. Recommender systems of this type are currently widely used in industry.

Purely collaborative methods take only the user-item rating matrix as input. The output generally takes one of the following forms:

(1) A predicted value indicating how much the current user likes or dislikes each item;

(2) A list of n recommended items.


User-based nearest neighbor recommendation


Main ideas


This is an early method, user-based nearest neighbor recommendation. Its main ideas can be summarized as follows:

First, given a rating dataset and the ID of the current (active) user as input, identify other users whose preferences are similar to the current user's; these are sometimes called peer users or nearest neighbors.

Then, for each product p that the current user has not yet seen, use the ratings of the nearest neighbors to compute a predicted rating for p.

This approach rests on two assumptions: (1) if users had similar preferences in the past, they will have similar preferences in the future; and (2) user preferences remain stable over time.


Calculation steps


Suppose we want to make recommendations for user A. The calculation proceeds roughly as follows:

(1) First, compute the similarity between user A and every other user. Many measures are available for this, for example adjusted cosine similarity, the Spearman rank correlation coefficient, and mean squared difference. Experimental analysis shows that for user-based recommender systems the Pearson correlation coefficient outperforms the other measures; its value lies between −1 and 1.

(2) Select the nearest neighbors. Only users with a positive correlation to the current user are considered. The usual ways to reduce the size of the neighbor set are to define a minimum similarity threshold, or to limit the set to a fixed size and consider only the K nearest neighbors. Both options have potential problems:

A. The first: if the similarity threshold is too high, the neighbor set will be very small, which means many items cannot be predicted (reduced coverage); conversely, if the threshold is too low, the neighbor set is not significantly reduced.

B. The second: the choice of K (the neighborhood size) does not affect coverage. But how do you find a good value for K? When K is too large, too many neighbors with limited similarity add extra "noise" to the prediction; when K is too small, prediction quality may suffer. Studies have found that in most practical cases, 20 to 50 neighbors seems reasonable.

(3) Predict the rating. Using the ratings of the K nearest neighbors, adjusted for each user's average rating bias, predict user A's rating for every item A has not yet seen.

(4) Optimization: improved similarity measures and weighting schemes.

A. In many domains there are universally popular items. Two users agreeing on a controversial item is more informative than agreeing on a popular one, but a similarity measure such as Pearson cannot take this into account. One proposal is to transform the item ratings so as to reduce the relative importance of agreement on popular items, known as "inverse user frequency (IUF)". Others address the same problem with a "variance weighting factor", which increases the influence of high-variance items, that is, controversial items.

B. Two users may have rated only very few items in common (and may simply happen to agree by chance), rather than agreeing across many items. In fact, nearest-neighbor prediction can go wrong when it relies on neighbors who share only a handful of co-rated items with the current user, leading to inaccurate predictions. For this reason another weighting factor, "significance weighting", has been proposed.

C. Another way to improve recommendation accuracy is to fine-tune the prediction weights, known as "case amplification". It emphasizes similarity values close to +1 and −1 by amplifying the original values (typically by raising them to a constant power) before computing the neighbors' weighted contributions.

Note:

(1) The formulas for the Pearson correlation coefficient and for predicting user A's rating of an item are fixed and universal, and many algorithm frameworks already implement them well, so they are not written out here.

(2) In actual production, the optimization steps are usually considered later; implement the most basic functionality first.
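As an illustration, the steps above can be sketched in Python. This is a minimal sketch, not a reference implementation; it assumes ratings are stored in a NumPy users × items array where 0 means "unrated", and the function names are illustrative:

```python
import numpy as np

def pearson_sim(ratings, u, v):
    """Pearson correlation between users u and v over their co-rated items.
    ratings: 2-D array (users x items); 0 encodes "unrated" (an assumption)."""
    common = (ratings[u] > 0) & (ratings[v] > 0)
    if common.sum() < 2:
        return 0.0
    ru, rv = ratings[u][common], ratings[v][common]
    ru_c, rv_c = ru - ru.mean(), rv - rv.mean()
    denom = np.sqrt((ru_c ** 2).sum() * (rv_c ** 2).sum())
    return float((ru_c * rv_c).sum() / denom) if denom else 0.0

def predict(ratings, u, item, k=20):
    """Predict user u's rating of item from the k most similar users
    (positive correlation only) who have rated the item."""
    mean_u = ratings[u][ratings[u] > 0].mean()
    neighbors = []
    for v in range(ratings.shape[0]):
        if v != u and ratings[v, item] > 0:
            s = pearson_sim(ratings, u, v)
            if s > 0:                      # keep positively correlated users only
                neighbors.append((s, v))
    neighbors.sort(reverse=True)           # most similar first
    num = den = 0.0
    for s, v in neighbors[:k]:
        mean_v = ratings[v][ratings[v] > 0].mean()
        num += s * (ratings[v, item] - mean_v)  # neighbor's rating, bias-adjusted
        den += s
    return mean_u if den == 0 else mean_u + num / den
```

Mean-centering each neighbor's ratings accounts for the fact that some users rate systematically higher or lower than others.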


Item-based nearest neighbor recommendation


Main ideas


In large e-commerce sites, the number of users is very large while the number of items is comparatively small, so a different technique is often used: item-based recommendation. Its online computation is light, so even with a very large rating matrix, recommendations can be computed in real time.

The main idea of the item-based algorithm is to compute predictions using the similarity between items rather than the similarity between users.


Calculation steps


For item-based nearest-neighbor recommendation, the calculation steps are as follows:

(1) Compute the similarity between items, usually with cosine similarity or adjusted cosine similarity. In e-commerce systems that offer only the most basic, non-personalized recommendations, the calculation stops at this step: items are recommended purely by item similarity, and all users see the same results.

(2) Select the nearest neighbors;

(3) Predict the user's rating of the item;

(4) Optimization: regarding memory requirements, the similarity matrix for n items has n^2 entries in theory, but the number of entries actually needed in practice is much lower, and further techniques can be applied to reduce the complexity.

Compared with user similarities, item similarities are more stable, so they can be precomputed offline, and this preprocessing does not hurt prediction accuracy too much.
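The item-based steps can likewise be sketched as follows. This is a simplified sketch under the same assumptions as before (NumPy users × items array, 0 = unrated, illustrative function names); the vectorized adjusted-cosine computation treats unrated entries as zero after mean-centering, a common approximation:

```python
import numpy as np

def adjusted_cosine(ratings):
    """Item-item similarity matrix via adjusted cosine: subtract each
    user's mean rating before comparing item columns."""
    mask = ratings > 0
    user_means = ratings.sum(axis=1) / np.maximum(mask.sum(axis=1), 1)
    centered = np.where(mask, ratings - user_means[:, None], 0.0)
    norms = np.sqrt((centered ** 2).sum(axis=0))
    sim = centered.T @ centered
    denom = np.outer(norms, norms)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, sim / denom, 0.0)

def predict_item_based(ratings, sim, u, item, k=10):
    """Predict u's rating of item as a similarity-weighted average of
    u's ratings on the k most similar items u has already rated."""
    rated = np.where(ratings[u] > 0)[0]
    order = rated[np.argsort(-sim[item, rated])][:k]   # k most similar items
    w = np.clip(sim[item, order], 0, None)             # positive similarities only
    if w.sum() == 0:
        return float(ratings[u, rated].mean())
    return float((w * ratings[u, order]).sum() / w.sum())
```

The similarity matrix depends only on the ratings, not on the active user, which is what makes offline precomputation possible.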


About ratings


Explicit ratings: preference feedback that users give directly on items;

Implicit ratings: user behavior collected from website logs, for instance an e-commerce site's view/cart/collect/buy events, and so on.
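One simple way to use such logs is to map each behavior to a numeric weight and take the strongest signal per item. The event names and weights below are hypothetical, purely for illustration:

```python
# Hypothetical mapping from implicit-feedback events to rating-like scores.
EVENT_WEIGHTS = {"view": 1, "cart": 3, "collect": 3, "buy": 5}

def implicit_score(events, cap=5):
    """Collapse a user's logged events on one item into a single implicit
    rating: the strongest observed signal, capped at the rating scale's max."""
    return min(cap, max(EVENT_WEIGHTS.get(e, 0) for e in events))
```

The resulting pseudo-ratings can then feed the same user-item matrix that explicit ratings would.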


Data sparse and cold start issues


In actual production, the rating matrix is generally very sparse, because users typically rate (or buy) only a small number of items.

The challenge in this case is to produce accurate predictions from relatively few valid ratings. A direct approach is to use additional user information, such as gender, age, education, and interests, so that the set of similar users is based not only on ratings but also on external data; this is, admittedly, no longer a "pure" collaborative approach. Nonetheless, the technique is useful during the start-up phase of a recommendation service, before the critical mass of users that a collaborative approach needs has been reached.

To address the cold start and data sparsity problems, graph-based approaches have been proposed; the main idea is to exploit the assumed "transitivity" of user tastes and thereby augment the information in the rating matrix. We will not study them in detail here. In actual production, the cold start problem can be mitigated by supplementing the data with other recommendation methods, such as content-based recommendation.


More recommendation methods


(1) Matrix factorization. Keywords: dimensionality reduction, singular value decomposition (SVD), latent semantic indexing (LSI), principal component analysis (Eigentaste).
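A minimal sketch of the SVD idea, under the same 0-means-unrated assumption as the earlier examples: fill the gaps with user means, keep only the top-k singular components, and read predictions off the low-rank reconstruction. Real systems use more careful factorization, so treat this purely as an illustration:

```python
import numpy as np

def svd_predict(ratings, k=2):
    """Low-rank reconstruction of a mean-filled rating matrix via
    truncated SVD; reconstructed entries serve as rating predictions."""
    mask = ratings > 0
    means = ratings.sum(axis=1) / np.maximum(mask.sum(axis=1), 1)
    filled = np.where(mask, ratings, means[:, None])  # fill unrated cells
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]                # rank-k approximation
```

The dimensionality reduction is what lets two users influence each other's predictions even when they share no co-rated items.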

(2) Association rule mining. This is a generic technique for identifying rule-like relationship patterns in large-scale transaction data. A typical application is discovering pairs or groups of products that are frequently purchased together in supermarkets. A typical rule: "if a user buys baby food, he or she has a 70% chance of also buying diapers." Knowing this relationship, it can be exploited for up-selling and cross-selling, or for designing a store's layout.
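A toy sketch of mining such pair rules with the usual support and confidence thresholds (brute-force over item pairs, fine for small catalogs only; threshold values are illustrative):

```python
from itertools import combinations

def association_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Find pair rules X -> Y meeting support/confidence thresholds.
    transactions: list of sets of item names."""
    n = len(transactions)

    def support(itemset):
        # fraction of transactions containing the whole itemset
        return sum(1 for t in transactions if itemset <= t) / n

    items = set().union(*transactions)
    rules = []
    for x, y in combinations(sorted(items), 2):
        for antecedent, consequent in ((x, y), (y, x)):
            s = support({antecedent, consequent})
            if s >= min_support:
                conf = s / support({antecedent})
                if conf >= min_confidence:
                    rules.append((antecedent, consequent, s, conf))
    return rules
```

Production systems use algorithms such as Apriori or FP-Growth instead of this exhaustive enumeration.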

(3) Recommendation methods based on probabilistic analysis. Keywords: Bayesian algorithms, clustering.

(4) The Slope One prediction scheme. Recommendation techniques differ both in quality and in algorithmic complexity. The earliest memory-based algorithms are relatively straightforward to implement, while other methods involve complicated preprocessing and modeling. Although many methods are available in mathematical software libraries, they still require some depth of mathematical skill (in particular for tuning and optimizing the algorithms), which may limit their practical use, especially for small companies. The Slope One prediction scheme is a comparatively simple recommendation technique; despite its simplicity, it achieves adequate recommendation quality. The idea is based on what its authors call the "popularity differential" between items: the average difference between two items' ratings across users.
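Slope One is simple enough to sketch in full. Assuming ratings are stored as a dict of user -> {item: rating} (an illustrative representation, not a prescribed one):

```python
from collections import defaultdict

def slope_one(ratings):
    """Compute the average rating difference ("popularity differential")
    for every ordered item pair observed in the data."""
    diffs = defaultdict(float)
    counts = defaultdict(int)
    for user_ratings in ratings.values():
        for i, ri in user_ratings.items():
            for j, rj in user_ratings.items():
                if i != j:
                    diffs[(i, j)] += ri - rj
                    counts[(i, j)] += 1
    for key in diffs:
        diffs[key] /= counts[key]          # turn sums into averages
    return diffs, counts

def predict_slope_one(diffs, counts, user_ratings, item):
    """Predict a rating for item from the user's known ratings, weighting
    each differential by how many users it was observed on."""
    num = den = 0.0
    for j, rj in user_ratings.items():
        if (item, j) in diffs:
            c = counts[(item, j)]
            num += (diffs[(item, j)] + rj) * c
            den += c
    return num / den if den else None
```

For example, if item b is rated on average 0.75 higher than item a, a user who gave a a 2 is predicted to give b a 2.75.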

(5) The Google News personalized recommendation engine. Keywords: probabilistic latent semantic indexing (PLSI), MinHash, co-visitation. When the dataset is very large and changes frequently, applying existing techniques requires work on algorithms, system design, and parallelization in order to deliver recommendations in real time. Pure memory-based methods cannot be used directly, and model-based methods must solve the problem of continuously updating the model.


Discussion and summary


(1) Collaborative filtering is the most popular recommendation approach today. The most important reasons for its popularity are that real-world deployments serve as benchmarks for improvements, and that the data structure analyzed to generate recommendations (the user-item rating matrix) is very simple. Other approaches are not so simple; for example, conversational recommendation applications query the user's preferences during a dialogue and must incorporate additional domain knowledge.

(2) From a practical point of view, item-based collaborative filtering can handle very large-scale rating data and produce good-quality recommendations.

(3) Collaborative filtering cannot be applied in every domain: for example, a car sales system with no purchase history, or a system that requires more detailed user preferences. Likewise, collaborative filtering requires a user community of a certain scale; even in domains such as books and movies, these techniques cannot be applied if there are not enough users or ratings.

(4) Of course, recommendation methods that overcome these shortcomings come at a price; content-based recommendation, for example, requires additional development and maintenance.
