Principle and implementation of the user-based collaborative filtering recommendation algorithm

Source: Internet
Author: User

Among the many approaches to recommender systems, user-based collaborative filtering was the earliest to appear, and its principle is relatively simple. The algorithm was proposed in 1992 and applied to an email filtering system; two years later, in 1994, GroupLens used it for news filtering. Until 2000, it was the best-known algorithm in the field of recommender systems.

This article briefly introduces the idea and principle of the user-based collaborative filtering algorithm, and then uses it to implement a recommendation feature for Blog Park users: based on the people you follow, it recommends other people on Blog Park who might interest you.

Basic ideas

As the saying goes, "birds of a feather flock together." Take movies as an example: if you like "Batman", "Mission: Impossible", "Interstellar", "Source Code" and similar movies, and someone else likes these movies too and also likes "Iron Man", then you will probably like "Iron Man" as well.

So, when a user A needs personalized recommendations, we can first find the group of users G whose interests are similar to A's, and then recommend to A the items that G likes and that A has not yet heard of. This is user-based collaborative filtering.

Principle

Based on the idea above, the user-based collaborative filtering recommendation algorithm can be split into two steps:

1. Find the set of users whose interests are similar to the target user's

2. Find the items that users in this set like and that the target user has not heard of, and recommend them to the target user

1. Discover users with similar interests

The similarity between two users is usually computed with the Jaccard formula or cosine similarity. Let N(u) be the set of items that user u likes and N(v) the set of items that user v likes. The similarity between u and v is then:

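Jaccard formula:

$$w_{uv} = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|}$$

Cosine similarity:

$$w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)| \, |N(v)|}}$$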
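Written as set operations, the two measures can be sketched as follows. This is a minimal sketch that assumes the standard set-based definitions: Jaccard is the size of the intersection over the size of the union, and cosine is the size of the intersection over the square root of the product of the two set sizes. The function names and the sample sets are illustrative only.

```python
from math import sqrt


def jaccard(n_u: set, n_v: set) -> float:
    """Jaccard similarity: |N(u) & N(v)| / |N(u) | N(v)|."""
    union = n_u | n_v
    if not union:
        return 0.0  # two empty sets: treat as no similarity
    return len(n_u & n_v) / len(union)


def cosine(n_u: set, n_v: set) -> float:
    """Cosine similarity for sets: |N(u) & N(v)| / sqrt(|N(u)| * |N(v)|)."""
    if not n_u or not n_v:
        return 0.0  # avoid division by zero for users with no liked items
    return len(n_u & n_v) / sqrt(len(n_u) * len(n_v))


# Hypothetical sample sets, chosen only to exercise the functions.
n_u = {"a", "b", "d"}
n_v = {"a", "c"}
print(jaccard(n_u, n_v))  # 1 shared item out of 4 distinct items
print(cosine(n_u, n_v))   # 1 / sqrt(3 * 2)
```

Both measures are 0 when the users share nothing and 1 when their sets are identical; they differ in how strongly they penalize users with very different numbers of liked items.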

Suppose there are 4 users, A, B, C and D, and 5 items, a, b, c, d and e. The relationship between users and items (which user likes which item) is given in the figure:

[Figure: the items each user likes]

How can we compute the similarity between all pairs of users at once? For convenience, we usually first build an "item-user" inverted index that maps each item to the users who like it.
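Building the inverted index can be sketched as below. The preference data is an assumption: the original figure is not available, so the sets were chosen to agree with what the text says (item a is liked by users A and B, and user A has not liked c or e). The function name `build_inverted_index` is hypothetical.

```python
from collections import defaultdict


def build_inverted_index(likes):
    """Turn a user -> items mapping into an item -> users mapping."""
    item_users = defaultdict(set)
    for user, items in likes.items():
        for item in items:
            item_users[item].add(user)
    return dict(item_users)


# ASSUMED data, consistent with the text but not confirmed by it.
likes = {
    "A": {"a", "b", "d"},
    "B": {"a", "c"},
    "C": {"b", "e"},
    "D": {"c", "d", "e"},
}

item_users = build_inverted_index(likes)
for item in sorted(item_users):
    print(item, sorted(item_users[item]))
```

The point of the inversion is that only users who share at least one item ever meet in the same bucket, so pairs with nothing in common are never visited.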

Then, for each item, every pair of users who like it has its entry in a user-user matrix increased by 1. For example, item a is liked by users A and B, so the entries for (A, B) and (B, A) are each increased by 1.
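The pairwise counting step might look like the sketch below. The inverted index literal is assumed data (the original figure is missing; it only matches the text in that item a is liked by A and B), and `co_occurrence` is a hypothetical name. A nested dictionary stands in for the matrix so that pairs with no shared item take no space.

```python
from collections import defaultdict
from itertools import permutations


def co_occurrence(item_users):
    """Count, for every ordered pair of users, how many items both like."""
    counts = defaultdict(lambda: defaultdict(int))
    for users in item_users.values():
        # Every ordered pair (u, v) with u != v among this item's users.
        for u, v in permutations(users, 2):
            counts[u][v] += 1
    return counts


# ASSUMED item -> users index, consistent with the text.
item_users = {
    "a": {"A", "B"},
    "b": {"A", "C"},
    "c": {"B", "D"},
    "d": {"A", "D"},
    "e": {"C", "D"},
}

counts = co_occurrence(item_users)
for u in sorted(counts):
    print(u, dict(sorted(counts[u].items())))
```

Each count equals the size of the intersection of the two users' item sets, which is the numerator shared by the Jaccard and cosine formulas.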

The matrix above holds only the numerator of the similarity formula for each pair of users. Taking cosine similarity as an example, each entry is then divided by the square root of the product of the two users' item counts, sqrt(|N(u)| |N(v)|).

This completes the user similarity calculation, and it becomes easy to see which users have interests most similar to the target user's.
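The normalization step can be sketched as follows: each shared-item count is divided by sqrt(|N(u)| |N(v)|) to give the cosine similarity. Both the counts and the preference sets are assumed data that agree with the text but are not confirmed by it, and `cosine_from_counts` is a hypothetical name.

```python
from math import sqrt


def cosine_from_counts(counts, likes):
    """Divide each shared-item count by sqrt(|N(u)| * |N(v)|)."""
    sim = {}
    for u, row in counts.items():
        sim[u] = {
            v: shared / sqrt(len(likes[u]) * len(likes[v]))
            for v, shared in row.items()
        }
    return sim


# ASSUMED preference data and the shared-item counts derived from it.
likes = {
    "A": {"a", "b", "d"},
    "B": {"a", "c"},
    "C": {"b", "e"},
    "D": {"c", "d", "e"},
}
counts = {
    "A": {"B": 1, "C": 1, "D": 1},
    "B": {"A": 1, "D": 1},
    "C": {"A": 1, "D": 1},
    "D": {"A": 1, "B": 1, "C": 1},
}

sim = cosine_from_counts(counts, likes)
for u in sorted(sim):
    print(u, {v: round(w, 4) for v, w in sorted(sim[u].items())})
```

Under these assumed sets, A is slightly closer to B and C (who each like 2 items) than to D (who likes 3), even though A shares exactly one item with each of them.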

2. Recommend items

First, from the matrix we find the K users most similar to the target user u, denoted by the set S(u, K). We collect all the items liked by the users in S and remove those that u already likes. For each candidate item i, user u's degree of interest in it is computed with the following formula:

$$p(u, i) = \sum_{v \in S(u, K) \cap N(i)} w_{uv} \, r_{vi}$$

Here N(i) is the set of users who like item i, w_uv is the similarity between u and v, and r_vi is how much user v likes item i. In this example r_vi is always 1; in recommender systems where users give ratings, r_vi is the user's rating.
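The scoring step can be sketched as below: take the K most similar users, then give each item the target has not liked the sum of the similarities of the neighbours who like it, with the preference weight fixed at 1. The users, items and similarity values are hypothetical, made up only to exercise the function, and `recommend` is not a name from the original.

```python
def recommend(target, likes, sim, k):
    """Rank unseen items for `target` by summed neighbour similarity."""
    # S(u, K): the K users most similar to the target.
    neighbours = sorted(
        sim[target].items(), key=lambda pair: pair[1], reverse=True
    )[:k]

    scores = {}
    for v, w_uv in neighbours:
        for item in likes[v]:
            if item in likes[target]:
                continue  # skip items the target already likes
            r_vi = 1.0  # implicit feedback: "likes" counts as 1
            scores[item] = scores.get(item, 0.0) + w_uv * r_vi

    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# HYPOTHETICAL data: user "u" and three other users with made-up similarities.
likes = {"u": {"x"}, "v1": {"x", "y"}, "v2": {"y", "z"}, "v3": {"z"}}
sim = {"u": {"v1": 0.5, "v2": 0.3, "v3": 0.1}}

ranked = recommend("u", likes, sim, k=2)
print(ranked)
```

With k=2 only v1 and v2 contribute, so item y collects both weights while z collects only v2's; v3 is ignored because it falls outside the top K.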

For example, suppose we want to recommend items to user A and choose K = 3. The similar users are B, C and D, and the items they like that A does not are c and e. We then compute p(A, c) and p(A, e).

It appears that user A is equally interested in c and e. In a real recommender system, it is enough to sort the candidates by score and take the top few items.
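The calculation can be worked through as below. The original figure with the preference data is not available, so the item sets are an assumption, chosen to agree with the text (item a is liked by A and B; the candidates for A are c and e). Cosine similarity is used, with every preference weight set to 1.

```latex
% ASSUMED data: N(A) = {a, b, d}, N(B) = {a, c}, N(C) = {b, e}, N(D) = {c, d, e}
w_{AB} = \frac{1}{\sqrt{3 \cdot 2}}, \qquad
w_{AC} = \frac{1}{\sqrt{3 \cdot 2}}, \qquad
w_{AD} = \frac{1}{\sqrt{3 \cdot 3}}

% Item c is liked by B and D; item e is liked by C and D.
p(A, c) = w_{AB} + w_{AD} = \frac{1}{\sqrt{6}} + \frac{1}{3} \approx 0.7416

p(A, e) = w_{AC} + w_{AD} = \frac{1}{\sqrt{6}} + \frac{1}{3} \approx 0.7416
```

Under this assumed data the two scores coincide, because B and C each share one item with A and each like two items in total.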

Recommending Blog Park users

In social network recommendation, the "item" is actually a "person", and "liking an item" becomes "following a person". This section uses the algorithm above to recommend 10 Blog Park users to me.

1. Find the 10 users whose interests are most similar to mine

Because the recommendations are only for me, there is no need to build a huge pairwise user similarity matrix. Users with interests similar to mine can only come from one group: the followers of the people I follow. Not counting myself, I currently follow 23 users, and these 23 users have 22,936 unique followers in total. I computed the similarity for these 22,936 users; the top 10 by similarity are as follows:

Nickname                Following  In common  Similarity
Blue Maple Leaf 1938    5          4          0.373001923296126
FBI080703               3          3          0.361157559257308
Fish non-Fish           3          3          0.361157559257308
Lauce                   3          3          0.361157559257308
Blue snail              3          3          0.361157559257308
Shanyujin               3          3          0.361157559257308
Mr.huang                6          4          0.340502612303499
Say hello to the world  6          4          0.340502612303499
Strucoder               28         8          0.31524416249564
Mr.vangogh              4          3          0.312771621085612
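The listed similarities can be cross-checked with a few lines of code. This sketch assumes two things the text does not state outright: that cosine similarity was the measure used, and that the first numeric column is the number of users each person follows. With the 23 users I follow as my own set size, the recomputed values match the table. The function name `follow_similarity` is hypothetical.

```python
from math import sqrt

MY_FOLLOWING = 23  # number of users the author follows, from the text


def follow_similarity(their_following, common, my_following=MY_FOLLOWING):
    """Cosine similarity between two follow lists, given only their sizes."""
    return common / sqrt(their_following * my_following)


# (nickname, following, in common, similarity as listed), one row per
# distinct combination of counts in the table.
rows = [
    ("Blue Maple Leaf 1938", 5, 4, 0.373001923296126),
    ("FBI080703", 3, 3, 0.361157559257308),
    ("Mr.huang", 6, 4, 0.340502612303499),
    ("Strucoder", 28, 8, 0.31524416249564),
    ("Mr.vangogh", 4, 3, 0.312771621085612),
]

for name, following, common, listed in rows:
    computed = follow_similarity(following, common)
    print(f"{name}: computed={computed:.6f} listed={listed:.6f}")
```

The check also shows why Strucoder ranks below users with fewer follows in common: following 28 people makes the denominator much larger.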

2. Compute the interest score for the recommended users

The 10 similar users yielded 25 recommended users in total. Their interest scores were computed and sorted:

Rank  Nickname                         Interest
1     Wolfy                            0.373001923296126
2     Artech                           0.340502612303499
3     Cat Chen                         0.340502612303499
4     Wxwinter (Winter)                0.340502612303499
5     Danielwise                       0.340502612303499
6     Forward                          0.31524416249564
7     Liam Wang                        0.31524416249564
8     Usharei                          0.31524416249564
9     Coderzh                          0.31524416249564
10    Blog Park Team                   0.31524416249564
11    Dark blue right hand             0.31524416249564
12    Kinglee                          0.31524416249564
13    Gnie                             0.31524416249564
14    Riccc                            0.31524416249564
15    Braincol                         0.31524416249564
16    The ticking of the rain          0.31524416249564
17    Dennis Gao                       0.31524416249564
18    Liu Dong.NET                     0.31524416249564
19    Li Yongjing                      0.31524416249564
20    The dodo at the end of the wave  0.31524416249564
21    Li Tao                           0.31524416249564
22    Ah                               0.31524416249564
23    Jk_rush                          0.31524416249564
24    Xiaotie                          0.31524416249564
25    Leepy                            0.312771621085612

Only the top 10 by interest score need to be taken, but the quality of the whole list looks good too!

Reference

Xiang Liang: Recommender System Practice

Original article: http://www.cnblogs.com/technology/p/4467895.html
