"Reading Notes" recommendation system (Recommender systems An Introduction) Chapter II Collaborative filtering recommendations

Source: Internet
Author: User

Input: a "user-item" rating matrix. Output: (1) a predicted rating indicating how much the user will like a given item; (2) for each user, a list of n recommended items.
1. User-based nearest neighbor recommendation (User-based CF)
The algorithm rests on two basic assumptions: (1) users who had similar preferences in the past will have similar preferences in the future; (2) user preferences remain stable over time.
User similarity calculation: the Pearson correlation coefficient works better in user-based CF, while cosine similarity works better in item-based CF. Similarity measures have been studied in depth in the research literature. For example, many domains have popular items that everyone likes, and agreement between two users on a controversial item is more informative than agreement on a popular one; for this reason Breese et al. proposed "inverse user frequency" (IUF).
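As a minimal sketch of the Pearson measure mentioned above: the toy ratings, the user and item names, and the choice to compute each mean over the co-rated items only are all assumptions made for illustration, not details from the notes.

```python
from math import sqrt

# Toy ratings on a 1-5 scale (illustrative data, not taken from the notes).
ratings = {
    "Alice": {"i1": 5, "i2": 3, "i3": 4, "i4": 4},
    "User1": {"i1": 3, "i2": 1, "i3": 2, "i4": 3, "i5": 3},
    "User2": {"i1": 4, "i2": 3, "i3": 4, "i4": 3, "i5": 5},
    "User3": {"i1": 3, "i2": 3, "i3": 1, "i4": 5, "i5": 4},
    "User4": {"i1": 1, "i2": 5, "i3": 5, "i4": 2, "i5": 1},
}

def pearson(a, b):
    """Pearson correlation between two users over the items both have rated."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0  # not enough overlap to say anything
    # Assumption: means are taken over the co-rated items only.
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * \
          sqrt(sum((b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0

for other in ("User1", "User2", "User3", "User4"):
    print(other, round(pearson(ratings["Alice"], ratings[other]), 2))
```

On this toy data User1 comes out as the most similar to Alice (about 0.85), while User4 is negatively correlated with her.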
Nearest-neighbor selection: (1) fix a similarity threshold and keep only the users above it; (2) fix the number of neighbors k and keep the k most similar users.
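The two selection strategies, followed by one common way of turning the selected neighbors into a prediction, might be sketched as follows. The similarity values, mean ratings and function names are hypothetical inputs chosen for illustration, and the mean-centered weighted average is one standard prediction formula rather than something the notes prescribe.

```python
def select_by_threshold(sims, threshold):
    """Keep every neighbor whose similarity is at least `threshold`."""
    return {u: s for u, s in sims.items() if s >= threshold}

def select_top_k(sims, k):
    """Keep the k most similar neighbors."""
    top = sorted(sims.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return dict(top)

def predict(target_mean, neighbors, neighbor_means, neighbor_ratings):
    """Target user's mean plus the similarity-weighted deviation of each
    neighbor's rating from that neighbor's own mean."""
    num = sum(s * (neighbor_ratings[u] - neighbor_means[u])
              for u, s in neighbors.items())
    den = sum(abs(s) for s in neighbors.values())
    return target_mean + num / den if den else target_mean

# Hypothetical precomputed inputs for one target user and one unseen item.
sims = {"User1": 0.85, "User2": 0.71, "User3": 0.0, "User4": -0.79}
means = {"User1": 2.4, "User2": 3.8, "User3": 3.2, "User4": 2.8}
item_ratings = {"User1": 3, "User2": 5, "User3": 4, "User4": 1}

neighbors = select_top_k(sims, k=2)
print(neighbors)
print(round(predict(4.0, neighbors, means, item_ratings), 2))
```

With these inputs both strategies (k = 2, or a threshold of 0.5) pick the same two neighbors. In general a threshold can return very few neighbors for users with unusual tastes, while a fixed k can admit weakly similar users.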
2. Item-based nearest neighbor recommendation (Item-based CF)
Basic assumption of the algorithm: if a user likes an item, the user will also like items similar to it.
When the number of users is large, user-based CF must search for similar users, which is computationally expensive; moreover, when the data changes, previously computed user similarities are not stable. Item-based CF instead computes the similarity between items, which is better suited to offline precomputation, and the results are more stable when the data changes.
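A rough sketch of the item-based variant: item-item similarities are computed once (the offline step) and then reused at prediction time. The toy data, the use of plain cosine over co-rating users, and all function names are assumptions for illustration.

```python
from math import sqrt
from itertools import combinations

# Toy user -> item -> rating data (illustrative only).
ratings = {
    "Alice": {"i1": 5, "i2": 3, "i3": 4, "i4": 4},
    "User1": {"i1": 3, "i2": 1, "i3": 2, "i4": 3, "i5": 3},
    "User2": {"i1": 4, "i2": 3, "i3": 4, "i4": 3, "i5": 5},
    "User3": {"i1": 3, "i2": 3, "i3": 1, "i4": 5, "i5": 4},
    "User4": {"i1": 1, "i2": 5, "i3": 5, "i4": 2, "i5": 1},
}

def invert(ratings):
    """Turn user -> item -> rating into item -> user -> rating."""
    by_item = {}
    for user, items in ratings.items():
        for item, r in items.items():
            by_item.setdefault(item, {})[user] = r
    return by_item

def cosine(a, b):
    """Cosine similarity of two items over the users who rated both."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    num = sum(a[u] * b[u] for u in common)
    den = sqrt(sum(a[u] ** 2 for u in common)) * \
          sqrt(sum(b[u] ** 2 for u in common))
    return num / den if den else 0.0

def build_item_sims(ratings):
    """Offline step: similarity for every pair of items."""
    by_item = invert(ratings)
    sims = {}
    for i, j in combinations(sorted(by_item), 2):
        sims[(i, j)] = sims[(j, i)] = cosine(by_item[i], by_item[j])
    return sims

def predict(user_ratings, item, sims):
    """Online step: similarity-weighted average of the user's own ratings."""
    num = sum(sims.get((item, j), 0.0) * r for j, r in user_ratings.items())
    den = sum(abs(sims.get((item, j), 0.0)) for j in user_ratings)
    return num / den if den else None

item_sims = build_item_sims(ratings)
print(round(predict(ratings["Alice"], "i5", item_sims), 2))
```

Only `predict` has to run at request time; the similarity table can be rebuilt on a schedule, which is what makes this variant attractive for offline computation.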
If the computation is still too large but must be done, you can use subsampling, that is, compute on only a subset of the data.
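One simple way to subsample is to draw a random subset of users before computing similarities. The sketch below assumes a user -> item -> rating dictionary; the function name, the `fraction` parameter and the fixed seed are illustrative choices.

```python
import random

def subsample_users(ratings, fraction, seed=0):
    """Return the ratings of a random `fraction` of users (at least one user)."""
    rng = random.Random(seed)          # fixed seed so the sample is repeatable
    k = max(1, int(len(ratings) * fraction))
    chosen = rng.sample(sorted(ratings), k)
    return {u: ratings[u] for u in chosen}

# Hypothetical data: 100 users who each rated a single item.
all_ratings = {f"user{n}": {"i1": 1 + n % 5} for n in range(100)}
subset = subsample_users(all_ratings, fraction=0.1)
print(len(subset))
```

Sampling users is only one option; sampling ratings or items instead changes which similarities remain reliable.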
3. About ratings
The entries of the user-item matrix, which connect users and items, are the users' ratings of the items. Ratings can be divided into explicit ratings and implicit ratings. The problem with explicit ratings is that they require extra effort from the user. Collecting explicit ratings is not too difficult in practice, but often only a small number of "early adopters" provide ratings (there is a psychological basis for this). In some domains, such as personalized online radio, implicit feedback works better than explicit feedback.
If there are few or no ratings, we face the data sparsity problem. Can other information be exploited, such as users' demographic attributes? Can a default value be assigned to items the user has not interacted with? The cold-start problem is a special case of the data sparsity problem.
4. More Models and methods
Collaborative filtering can be divided into memory-based and model-based methods. The former keeps all the rating data in memory. The latter performs dimensionality reduction and feature extraction offline, and uses only the resulting features at run time.
(1) Matrix factorization: SVD (the basic method), LSA/LSI. These are all dimensionality reduction methods. Computing item-item similarity and user-user similarity, choosing the similarity measure, and so on are no different from classic CF. Principal component analysis also belongs here.
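To make the dimensionality reduction step concrete, here is a small sketch using NumPy's `numpy.linalg.svd`. It assumes a fully observed (dense) toy matrix, which real rating data is not; missing ratings would first have to be filled in somehow. The matrix, the choice k = 2 and the function names are illustrative.

```python
import numpy as np

# Dense toy matrix: rows are users, columns are items (illustrative data).
R = np.array([
    [5, 3, 4, 4],
    [3, 1, 2, 3],
    [4, 3, 4, 3],
    [3, 3, 1, 5],
    [1, 5, 5, 2],
], dtype=float)

def reduce_rank(R, k):
    """Truncated SVD: keep only the k largest singular values."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    user_factors = U[:, :k] * s[:k]   # one k-dimensional row per user
    item_factors = Vt[:k, :].T        # one k-dimensional row per item
    return user_factors, item_factors, s

def cosine(a, b):
    """Cosine similarity, computed here in the reduced space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

k = 2
user_factors, item_factors, s = reduce_rank(R, k)
R_k = user_factors @ item_factors.T   # rank-k approximation of R

print(user_factors.shape, item_factors.shape)
print(round(cosine(user_factors[0], user_factors[1]), 3))
```

Similarity between users or items is then computed on the short factor vectors exactly as in classic CF, only in k dimensions instead of the full matrix width.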
(2) Association rule mining, as in the "beer and diapers" example. In the popular-movie domain, association rule mining works relatively well. It also works well for recommending web pages to users.
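Association rules are usually judged by their support and confidence. The sketch below computes both for a single rule over made-up shopping baskets; the transactions and function names are purely illustrative.

```python
# Made-up shopping baskets (illustrative only).
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"beer", "diapers", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Estimated probability of `rhs` given `lhs`: support(lhs | rhs) / support(lhs)."""
    s = support(lhs, transactions)
    return support(lhs | rhs, transactions) / s if s else 0.0

print(support({"beer", "diapers"}, transactions))
print(confidence({"diapers"}, {"beer"}, transactions))
```

Here the rule "diapers => beer" has support 3/5 and confidence of about 0.75. A recommender built on such rules would suggest the right-hand side to users whose history contains the left-hand side, keeping only rules above chosen support and confidence thresholds.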
(3) Probabilistic methods, which turn the recommendation problem into a classification problem. For example, if users rate items on a 1-5 scale, then for a new item and the current user, the task is to assign the item to one of five classes, corresponding to ratings 1-5. In my personal view, this method mainly serves for publishing academic papers and is difficult to use in practice.
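One way to read "recommendation as classification" is a naive Bayes classifier in which the target item's rating is the class and the user's other ratings are the features. The notes do not name a specific classifier, so naive Bayes, the Laplace smoothing, the training rows and the function name are all assumptions made for this sketch.

```python
def naive_bayes_predict(train, features, classes=(1, 2, 3, 4, 5), alpha=1.0):
    """Return P(class | features) for every rating class.

    train:    list of (other_ratings, rating_of_target_item) pairs, one per user
    features: the current user's ratings of the other items
    alpha:    Laplace smoothing constant (an assumption of this sketch)
    """
    scores = {}
    for c in classes:
        rows = [f for f, label in train if label == c]
        # Smoothed prior P(c).
        score = (len(rows) + alpha) / (len(train) + alpha * len(classes))
        for item, value in features.items():
            match = sum(1 for f in rows if f.get(item) == value)
            # Smoothed likelihood P(rating of `item` == value | c).
            score *= (match + alpha) / (len(rows) + alpha * len(classes))
        scores[c] = score
    total = sum(scores.values())
    return {c: sc / total for c, sc in scores.items()}

# Hypothetical training users: their ratings of i1 and i2, and of the target item.
train = [
    ({"i1": 5, "i2": 3}, 5),
    ({"i1": 5, "i2": 3}, 5),
    ({"i1": 1, "i2": 1}, 1),
]
probs = naive_bayes_predict(train, {"i1": 5, "i2": 3})
print(max(probs, key=probs.get))
```

With so few training rows the probabilities are dominated by the smoothing term, which hints at why such models need a lot of data before they become useful.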
5. Practical methods and systems in recent years
Slope One predictors, and the prediction approach of Google Reader. The Slope One idea is interesting: it takes other users who have rated both the target item and an item the current user has rated (whether their ratings are high or low does not matter, only the difference counts) and uses those rating differences to predict the current user's rating of the target item. This is not necessarily intuitive, and even where it makes sense it is far less intuitive than CF. However, the algorithm is well suited to parallelization, and Google also uses MapReduce for its computation.
Google Reader is actually a hybrid recommender system, combining offline computation with online mining of user behavior.
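A compact sketch of the weighted Slope One scheme: for each item the current user has rated, take the average difference between the target item and that item among users who rated both, add it to the current user's rating, and weight by the number of such users. The toy data and function name are illustrative.

```python
# Toy data: Alice has not rated i3 (illustrative only).
ratings = {
    "Alice": {"i1": 2, "i2": 5},
    "User1": {"i1": 3, "i2": 2, "i3": 5},
    "User2": {"i1": 4, "i3": 3},
}

def slope_one_predict(ratings, user, target):
    """Weighted Slope One prediction of `user`'s rating for `target`."""
    num, den = 0.0, 0
    for item, r_ui in ratings[user].items():
        if item == target:
            continue
        # Rating differences (target - item) from users who rated both items.
        diffs = [other[target] - other[item]
                 for other in ratings.values()
                 if target in other and item in other]
        if diffs:
            avg_dev = sum(diffs) / len(diffs)
            num += (avg_dev + r_ui) * len(diffs)
            den += len(diffs)
    return num / den if den else None

print(round(slope_one_predict(ratings, "Alice", "i3"), 2))
```

For Alice and i3 this gives (2 * (0.5 + 2) + 1 * (3 + 5)) / 3, about 4.33. Because the per-pair average differences are independent of each other, they can be computed in parallel, which matches the remark about parallelization above.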
6. Discussion and summary
No single recommendation method has been found that performs best on all systems and datasets. Recommender systems therefore call for "concrete analysis of concrete problems" and plenty of experimentation. CF methods need a user base of a certain size; when it is too small, the results are unpredictable.
Finish.