Suppose the data is as follows, where each row represents a user and each column represents a rated item:
Let's look at the three formulas first.
Cosine similarity (cosine-based similarity):
Pearson coefficient (Pearson correlation):
Adjusted cosine similarity (adjusted cosine):
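For reference, the three similarity measures usually meant in this context (cosine, Pearson, and adjusted cosine) are commonly written in the item-based form below. The notation is an assumption and is not taken from the original page: R_{u,i} is the rating user u gave item i, U is the set of users who rated both items i and j, \bar{R}_i is the mean rating of item i, and \bar{R}_u is the mean rating given by user u.

```latex
% Cosine similarity between the rating vectors of items i and j
\[
\mathrm{sim}(i,j) = \cos(\vec{i},\vec{j})
  = \frac{\vec{i}\cdot\vec{j}}{\lVert\vec{i}\rVert\,\lVert\vec{j}\rVert}
\]

% Pearson correlation: center each rating on the item mean
\[
\mathrm{sim}(i,j) =
  \frac{\sum_{u\in U}(R_{u,i}-\bar{R}_i)(R_{u,j}-\bar{R}_j)}
       {\sqrt{\sum_{u\in U}(R_{u,i}-\bar{R}_i)^2}\,
        \sqrt{\sum_{u\in U}(R_{u,j}-\bar{R}_j)^2}}
\]

% Adjusted cosine: center each rating on the user mean instead
\[
\mathrm{sim}(i,j) =
  \frac{\sum_{u\in U}(R_{u,i}-\bar{R}_u)(R_{u,j}-\bar{R}_u)}
       {\sqrt{\sum_{u\in U}(R_{u,i}-\bar{R}_u)^2}\,
        \sqrt{\sum_{u\in U}(R_{u,j}-\bar{R}_u)^2}}
\]
```

The only difference between the last two is which mean is subtracted: the item mean for Pearson, the user mean for adjusted cosine.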
I recently worked through the Pearson similarity calculation used in collaborative filtering recommendation algorithms, picking up some basic R along the way and reviewing probability and statistics.
I. Theory of
Both user-based CF and item-based CF rely on a similarity calculation, because only by measuring the similarity between users or between items can a user's "neighbors" be found to complete the recommendation. The calculation of similarity is briefly
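As a rough illustration of how a similarity score is used to find a user's neighbors, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `cosine` and `nearest_neighbors`, the tiny rating table, and the simplification that an unrated item is stored as 0. It uses cosine similarity only because it is the shortest to write; any other similarity function could be passed in its place.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_u * norm_v)


def nearest_neighbors(ratings, target, k=2, similarity=cosine):
    """Return the k users most similar to `target` as (user, score) pairs."""
    scores = [
        (other, similarity(ratings[target], vector))
        for other, vector in ratings.items()
        if other != target
    ]
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical data: rows are users, columns are items, 0 means "not rated".
ratings = {
    "alice": [5, 3, 0, 1],
    "bob": [4, 3, 0, 1],
    "carol": [1, 1, 5, 4],
}

neighbors = nearest_neighbors(ratings, "alice", k=1)
```

With this made-up table, the closest neighbor of "alice" is "bob", whose ratings follow nearly the same pattern.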
1. Pearson correlation coefficient
The Pearson correlation coefficient, also known as the Pearson product-moment correlation coefficient, is used to reflect the statistical similarity between two variables. Or to represent the similarity of two
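The coefficient described above can be computed directly from its definition: covariance of the two variables divided by the product of their standard deviations. The following is a minimal Python sketch, not the original author's code; the function name `pearson` and the choice to return 0.0 for a constant input are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations from the mean for each variable.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    norm = sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy))
    if norm == 0:
        # Assumption: a constant sequence has no defined correlation,
        # so it is treated here as uncorrelated.
        return 0.0
    return covariance / norm
```

The result lies between -1 and 1: values near 1 mean the two sets of ratings rise and fall together, values near -1 mean they move in opposite directions, and values near 0 mean there is no linear relationship.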
In practice, recommender systems are generally based on collaborative filtering algorithms, which usually need to compute user-to-user or item-to-item similarity; for data sources with different data volumes and data types, need
The Mahout recommendation system includes many similarity implementations, which compute the similarity between different users or items. For data sources with different data volumes and data types, different similarity
There are many correlation coefficients for measuring correlation, and they differ in calculation method and characteristics. Related indicators for continuous variables: At this time, the correlation coefficient of product