Principles and examples of user-based and item-based collaborative filtering

Source: Internet
Author: User

1. User-based collaborative filtering

The user-based collaborative filtering algorithm first uses a user's historical behavior to find other users who are similar to that user, and then predicts which items the user might like from the ratings those similar users gave to other items. Given the user rating matrix R, the algorithm needs a similarity function $s: U \times U \to \mathbb{R}$ that measures the similarity between users; the recommendations are then computed from the rating data and the similarity matrix.

An important step in collaborative filtering is choosing an appropriate similarity measure. The two most commonly used measures are the Pearson correlation coefficient and cosine similarity. The Pearson correlation coefficient is calculated as follows:

$$s(u, v) = \frac{\sum_{i \in I_u \cap I_v} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I_u \cap I_v} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I_u \cap I_v} (r_{v,i} - \bar{r}_v)^2}}$$

where $i$ represents an item, such as a product; $I_u$ is the set of items rated by user $u$; $I_v$ is the set of items rated by user $v$; $r_{u,i}$ is user $u$'s rating of item $i$; $r_{v,i}$ is user $v$'s rating of item $i$; and $\bar{r}_u$ and $\bar{r}_v$ are the average ratings of users $u$ and $v$.
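As a minimal sketch of the Pearson similarity described above, the following Python function assumes ratings are stored as a nested dictionary `ratings[user][item]`. That layout, the names `mean_rating` and `pearson`, and the sample ratings are illustrative assumptions, not part of the original text. The sketch also assumes that each user's average is taken over all items that user has rated, while the sums run over the items both users have rated.

```python
from math import sqrt

def mean_rating(ratings, user):
    """Average of all ratings the user has given."""
    values = ratings[user].values()
    return sum(values) / len(values)

def pearson(ratings, u, v):
    """Pearson correlation between users u and v over their co-rated items."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0  # assumption: no overlap means no evidence of similarity
    mu, mv = mean_rating(ratings, u), mean_rating(ratings, v)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    den_u = sqrt(sum((ratings[u][i] - mu) ** 2 for i in common))
    den_v = sqrt(sum((ratings[v][i] - mv) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # assumption: treat a zero-variance overlap as similarity 0
    return num / (den_u * den_v)

# Hypothetical ratings, made up for illustration
ratings = {
    "u1": {"item1": 5, "item2": 4, "item3": 2},
    "u2": {"item1": 2, "item2": 4, "item4": 3},
}
similarity = pearson(ratings, "u1", "u2")
print(round(similarity, 3))
```

A negative value here means the two users tend to deviate from their own averages in opposite directions on the items they share.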

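In addition, cosine similarity is calculated as follows:

$$s(u, v) = \frac{\sum_{i \in I_u \cap I_v} r_{u,i}\, r_{v,i}}{\sqrt{\sum_{i \in I_u} r_{u,i}^2}\,\sqrt{\sum_{i \in I_v} r_{v,i}^2}}$$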
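A small Python sketch of cosine similarity between two users may help here. It assumes the common convention that unrated items count as zeros, so the dot product only involves co-rated items while each norm uses all of a user's ratings. The `ratings[user][item]` layout and the sample values are assumptions made for illustration.

```python
from math import sqrt

def cosine(ratings, u, v):
    """Cosine similarity between users u and v; unrated items count as 0."""
    common = set(ratings[u]) & set(ratings[v])
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = sqrt(sum(r * r for r in ratings[u].values()))
    norm_v = sqrt(sum(r * r for r in ratings[v].values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: a user with no ratings is similar to no one
    return dot / (norm_u * norm_v)

# Hypothetical ratings, made up for illustration
ratings = {
    "u1": {"item1": 3, "item2": 4},
    "u2": {"item1": 4, "item2": 3},
    "u3": {"item3": 5},
}
print(cosine(ratings, "u1", "u2"))
print(cosine(ratings, "u1", "u3"))
```

Unlike the Pearson coefficient, this measure does not subtract each user's average, so with non-negative ratings it never goes below zero.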

Another important step is to predict user $u$'s rating for the items that user has not yet rated. First, using the similarities calculated in the previous step, find the neighbor set $N \subseteq U$ of user $u$, where $N$ is the neighbor set and $U$ is the set of all users. Then, using the rating data, predict user $u$'s rating of item $i$ with the following formula:

$$p_{u,i} = \bar{r}_u + \frac{\sum_{u' \in N} s(u, u')\,(r_{u',i} - \bar{r}_{u'})}{\sum_{u' \in N} |s(u, u')|}$$

where $s(u, u')$ represents the similarity between user $u$ and user $u'$.
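The prediction step can be sketched in Python as follows, assuming the usual mean-centered form: the user's own average plus a similarity-weighted average of how far each neighbor's rating of the item lies from that neighbor's average. The function name `predict_user_based`, the dictionary layout, and the sample numbers are hypothetical.

```python
def predict_user_based(ratings, u, item, similarities):
    """Mean-centered weighted average over the neighbors of u who rated item.

    similarities maps each neighbor u' to s(u, u').
    """
    def mean(user):
        values = ratings[user].values()
        return sum(values) / len(values)

    neighbors = [n for n in similarities if item in ratings[n]]
    norm = sum(abs(similarities[n]) for n in neighbors)
    if norm == 0:
        return mean(u)  # assumption: fall back to the user's own average
    offset = sum(similarities[n] * (ratings[n][item] - mean(n)) for n in neighbors)
    return mean(u) + offset / norm

# Hypothetical ratings and similarities, made up for illustration
ratings = {
    "u": {"a": 4, "b": 2},
    "n1": {"a": 4, "c": 2},
    "n2": {"b": 1, "c": 5},
}
similarities = {"n1": 0.5, "n2": -0.5}
prediction = predict_user_based(ratings, "u", "c", similarities)
print(prediction)
```

Dividing by the sum of absolute similarities keeps the result on the rating scale even when some neighbors have negative similarity.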

Suppose we have the e-commerce rating data set shown in Table 3-6 and want to predict user C's rating for item 4.

Table 3-6 E-commerce website user rating data set

In the table, "?" indicates that the rating is unknown. Following the steps of the user-based collaborative filtering algorithm, user C's rating for item 4 is calculated as follows.

(1) Find the neighbors of user C

As the data set shows, only user A and user D have rated item 4, so there are just two candidate neighbors: user A and user D. User A has an average rating of 4, user C has an average rating of 3.667, and user D has an average rating of 3. According to the Pearson correlation coefficient formula, the similarity between user C and user A is:

Similarly, s(C, D) = -0.515.

(2) Predict user C's rating for item 4

Using the rating prediction formula above, user C's rating for item 4 is calculated as follows:


The other unknown ratings can be calculated in the same way.
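To make the two steps above concrete, here is a hedged end-to-end sketch in Python. The ratings are hypothetical values chosen only so that the averages of users A, C, and D match those quoted above (4, 3.667, and 3) and so that only users A and D have rated item 4. They are not taken from Table 3-6, so the results should not be read as the original example's numbers. The helper names `mean`, `pearson`, and `predict` are illustrative, and every user who rated the target item is used as a neighbor.

```python
from math import sqrt

# Hypothetical ratings; None marks an unknown rating ("?").
ITEMS = ["item1", "item2", "item3", "item4"]
RATINGS = {
    "A": [4, None, 3, 5],
    "B": [None, 5, 4, None],
    "C": [5, 4, 2, None],
    "D": [2, 4, None, 3],
    "E": [3, 4, 5, None],
}

def mean(row):
    """Average of the known ratings in one user's row."""
    known = [r for r in row if r is not None]
    return sum(known) / len(known)

def pearson(a, b):
    """Pearson correlation over co-rated positions, using each user's overall mean."""
    ma, mb = mean(a), mean(b)
    pairs = [(x - ma, y - mb) for x, y in zip(a, b)
             if x is not None and y is not None]
    num = sum(dx * dy for dx, dy in pairs)
    den = sqrt(sum(dx * dx for dx, _ in pairs)) * sqrt(sum(dy * dy for _, dy in pairs))
    return num / den if den else 0.0

def predict(user, col):
    """Predict RATINGS[user][col] from all other users who rated that column."""
    target = RATINGS[user]
    num = norm = 0.0
    for other, row in RATINGS.items():
        if other == user or row[col] is None:
            continue
        s = pearson(target, row)
        num += s * (row[col] - mean(row))
        norm += abs(s)
    return mean(target) + (num / norm if norm else 0.0)

# Step (1): similarities between user C and the two candidate neighbors
s_ca = pearson(RATINGS["C"], RATINGS["A"])
s_cd = pearson(RATINGS["C"], RATINGS["D"])
# Step (2): predicted rating of user C for item 4
p_c4 = predict("C", ITEMS.index("item4"))
# Repeating step (2) for every "?" cell
filled = {(u, ITEMS[j]): round(predict(u, j), 3)
          for u, row in RATINGS.items() for j, r in enumerate(row) if r is None}
print(round(s_ca, 3), round(s_cd, 3), round(p_c4, 3))
print(filled)
```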

2. Item-based collaborative filtering

Item-based collaborative filtering is another common algorithm. Unlike the user-based algorithm, it predicts user ratings from the similarity between items rather than between users. This means the similarities between items can be calculated in advance, which can improve performance. The algorithm predicts a user's rating for the target item from the user rating data and the precomputed item similarity matrix.
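The idea of calculating item similarities in advance can be sketched as below. This is only an illustration under stated assumptions: it uses cosine similarity between item rating vectors, the `ratings[user][item]` layout is hypothetical, and the names `item_vectors`, `cosine`, and `precompute_item_similarities` are made up for this example.

```python
from itertools import combinations
from math import sqrt

def item_vectors(ratings):
    """Transpose ratings[user][item] into vectors[item][user]."""
    vectors = {}
    for user, row in ratings.items():
        for item, r in row.items():
            vectors.setdefault(item, {})[user] = r
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(x * x for x in a.values())) * sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0

def precompute_item_similarities(ratings):
    """Build the full item-item similarity table once, ahead of time."""
    vectors = item_vectors(ratings)
    table = {}
    for i, j in combinations(sorted(vectors), 2):
        table[(i, j)] = table[(j, i)] = cosine(vectors[i], vectors[j])
    return table

# Hypothetical ratings, made up for illustration
ratings = {
    "u1": {"x": 3, "y": 4},
    "u2": {"x": 4, "y": 3},
    "u3": {"z": 5},
}
table = precompute_item_similarities(ratings)
print(table[("x", "y")], table[("x", "z")])
```

At prediction time only dictionary lookups into `table` are needed, which is where the performance benefit mentioned above would come from.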

As in the user-based algorithm, the similarity between items must be calculated first. The Pearson correlation coefficient or cosine similarity can also be used here. This section presents a similarity measure that is common in e-commerce systems, which calculates the similarity between items based on conditional probability. The formula is as follows:

$$s(i, j) = \frac{\mathrm{Freq}(i \wedge j)}{\mathrm{Freq}(i)\,\bigl(\mathrm{Freq}(j)\bigr)^{\alpha}}$$

where $s(i, j)$ represents the similarity between items $i$ and $j$; $\mathrm{Freq}(i \wedge j)$ is the frequency with which $i$ and $j$ occur together; $\mathrm{Freq}(i)$ is the frequency with which $i$ occurs; $\mathrm{Freq}(j)$ is the frequency with which $j$ occurs; and $\alpha$ is a damping factor, mainly used to balance the influence of popular and unpopular items, for example best-selling products in e-commerce.
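A minimal Python sketch of this conditional-probability similarity follows. It assumes that frequencies are raw counts over a list of purchase baskets and that the damping factor is passed as `alpha`; the function name, the default value of `alpha`, and the basket data are all hypothetical.

```python
def conditional_similarity(baskets, i, j, alpha=0.5):
    """s(i, j) = Freq(i and j) / (Freq(i) * Freq(j) ** alpha), using raw counts."""
    freq_i = sum(1 for b in baskets if i in b)
    freq_j = sum(1 for b in baskets if j in b)
    freq_ij = sum(1 for b in baskets if i in b and j in b)
    if freq_i == 0 or freq_j == 0:
        return 0.0  # assumption: an item never seen is similar to nothing
    return freq_ij / (freq_i * freq_j ** alpha)

# Hypothetical purchase baskets, made up for illustration
baskets = [
    {"phone", "case"},
    {"phone", "case", "charger"},
    {"phone", "charger"},
    {"case"},
]
undamped = conditional_similarity(baskets, "phone", "case", alpha=0)
damped = conditional_similarity(baskets, "phone", "case", alpha=1)
print(undamped, damped)
```

With `alpha=0` the value is just the share of baskets containing `i` that also contain `j`; larger values of `alpha` reduce the score of frequently occurring `j`. The measure is not symmetric in general, so `s(i, j)` and `s(j, i)` can differ.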

Next, the unknown rating is predicted from the item similarity matrix calculated above and the user's existing ratings. The prediction formula is as follows:

$$p_{u,i} = \frac{\sum_{j \in S} s(i, j)\, r_{u,j}}{\sum_{j \in S} |s(i, j)|}$$

where $p_{u,i}$ represents the predicted rating of user $u$ for item $i$; $S$ is the set of items similar to item $i$; $s(i, j)$ represents the similarity between items $i$ and $j$; and $r_{u,j}$ represents user $u$'s rating for item $j$.
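The item-based prediction can be sketched in Python as a similarity-weighted average of the user's own ratings on the similar items. The function name `predict_item_based`, the dictionary inputs, and the sample values are assumptions made for illustration, and similar items the user has not rated are simply skipped.

```python
def predict_item_based(user_ratings, item_sims):
    """Weighted average of one user's ratings on items similar to the target.

    user_ratings maps item -> rating for a single user;
    item_sims maps each similar item j to s(i, j) for the target item i.
    """
    rated = [j for j in item_sims if j in user_ratings]
    norm = sum(abs(item_sims[j]) for j in rated)
    if norm == 0:
        return None  # assumption: no usable similar items, so no prediction
    return sum(item_sims[j] * user_ratings[j] for j in rated) / norm

# Hypothetical data: the user rated item1 and item2; item3 is similar but unrated
user_ratings = {"item1": 5, "item2": 3}
item_sims = {"item1": 0.75, "item2": 0.25, "item3": 0.5}
prediction = predict_item_based(user_ratings, item_sims)
print(prediction)
```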
