Among the many recommender system methods, user-based collaborative filtering was the first to appear, and its principle is comparatively simple. The algorithm was introduced in 1992 and used in a mail filtering system; two years later, in 1994, GroupLens used it for news filtering. Until 2000, it was the best-known algorithm in the recommender systems field.
This article briefly introduces the idea and principle of the user-based collaborative filtering algorithm and then uses it to implement park friend recommendation: based on the people you follow, it recommends other people in the blog park (cnblogs) that you might be interested in.
Basic ideas
As the saying goes, "birds of a feather flock together." Take movies as an example: if you like "Batman", "Mission: Impossible", "Interstellar", "Source Code", and similar movies, and another person likes these movies too and also likes "Iron Man", then you will probably like "Iron Man" as well.
So, when a user A needs personalized recommendations, you can first find the group G of users whose interests are similar to A's, and then recommend to A the items that users in G like and that A has not heard of. This is the user-based collaborative filtering algorithm.
Principle
Based on the idea above, the user-based collaborative filtering recommendation algorithm can be split into two steps:
1. Find the set of users whose interests are similar to the target user's.
2. Find the items that users in this set like and that the target user has not heard of, and recommend them to the target user.
1. Discover users with similar interests
The similarity between two users is usually calculated with the Jaccard formula or cosine similarity. Let N(u) be the set of items that user u likes and N(v) the set of items that user v likes; the similarity between u and v is then:
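Jaccard formula:
$$w_{uv} = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|}$$
Cosine similarity:
$$w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}}$$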
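A minimal Python sketch of the two measures, assuming the standard set-based definitions (the size of the intersection divided by the size of the union for Jaccard, and by the square root of the product of the two set sizes for cosine). The function names and the sample sets are illustrative only.

```python
from math import sqrt


def jaccard(n_u: set, n_v: set) -> float:
    """Jaccard similarity between two sets of liked items."""
    union = n_u | n_v
    if not union:
        return 0.0  # two empty sets: treat the similarity as 0
    return len(n_u & n_v) / len(union)


def cosine(n_u: set, n_v: set) -> float:
    """Cosine similarity between two sets of liked items."""
    if not n_u or not n_v:
        return 0.0  # avoid dividing by zero when a user likes nothing
    return len(n_u & n_v) / sqrt(len(n_u) * len(n_v))


# Hypothetical sets of liked items for two users.
liked_by_u = {"a", "b", "d"}
liked_by_v = {"a", "c"}

print(jaccard(liked_by_u, liked_by_v))           # 0.25 (1 common item, 4 in the union)
print(round(cosine(liked_by_u, liked_by_v), 3))  # 0.408 (1 / sqrt(3 * 2))
```

Both functions return 0 for empty sets, which is a design choice of this sketch rather than part of the formulas.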
Suppose there are 4 users, A, B, C, and D, and 5 items, a, b, c, d, and e. The relationship between users and items (which user likes which item) is as shown:
How can the similarity between all pairs of users be calculated at once? For convenience, it is usual to first build an "item-to-user" inverted list, as shown:
Then, for each item, every pair of users who like it has the count for that pair increased by 1. For example, the users who like item a are A and B, so the matrix entries for the pair (A, B) are increased by 1, as shown in the following:
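The inverted list and the pairwise counting can be sketched in Python as follows. The `user_items` data is an assumption: the original figure is not available here, so the sets below were chosen only to be consistent with the text (for example, item a is liked by A and B). The function names are hypothetical as well.

```python
from collections import defaultdict
from itertools import combinations

# Assumed example data, chosen to agree with the text; not taken from the figure.
user_items = {
    "A": {"a", "b", "d"},
    "B": {"a", "c"},
    "C": {"b", "e"},
    "D": {"c", "d", "e"},
}


def build_inverted_index(user_items):
    """Map each item to the set of users who like it."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)
    return item_users


def count_common_items(item_users):
    """For every pair of users, count the items that both of them like."""
    common = defaultdict(lambda: defaultdict(int))
    for users in item_users.values():
        # Every pair of users who like this item gets +1 (in both directions).
        for u, v in combinations(sorted(users), 2):
            common[u][v] += 1
            common[v][u] += 1
    return common


item_users = build_inverted_index(user_items)
common = count_common_items(item_users)

print(sorted(item_users["a"]))  # ['A', 'B']
print(common["A"]["B"])         # 1
```

Only pairs of users who share at least one item are ever touched, which is why the inverted list is cheaper than comparing every pair of users directly.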
When calculating the similarity between each pair of users, the matrix above gives only the numerator of the formula. Taking cosine similarity as an example, each count is then divided by $\sqrt{|N(u)|\,|N(v)|}$:
With this, the user similarity calculation is complete, and it is easy to see which users have interests most similar to the target user's.
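Putting the counting and the normalization together, a rough Python sketch of the whole similarity step might look like this. The function name `user_similarity` and the example data are assumptions made for illustration; the data is the same assumed set used to match the text, not the original figure.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def user_similarity(user_items):
    """Cosine similarity for every pair of users who share at least one item."""
    # Step 1: item -> users inverted list.
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    # Step 2: numerator, the number of items each pair has in common.
    counts = defaultdict(int)
    for users in item_users.values():
        for u, v in combinations(sorted(users), 2):
            counts[(u, v)] += 1

    # Step 3: divide by sqrt(|N(u)| * |N(v)|).
    sim = defaultdict(dict)
    for (u, v), c in counts.items():
        w = c / sqrt(len(user_items[u]) * len(user_items[v]))
        sim[u][v] = w
        sim[v][u] = w
    return sim


# Assumed example data, consistent with the text.
user_items = {
    "A": {"a", "b", "d"},
    "B": {"a", "c"},
    "C": {"b", "e"},
    "D": {"c", "d", "e"},
}
sim = user_similarity(user_items)

print(round(sim["A"]["B"], 3))  # 0.408
print(round(sim["A"]["D"], 3))  # 0.333
```

Pairs with nothing in common (B and C in this data) simply do not appear in the result, which is equivalent to a similarity of 0.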
2. Recommend items
First, find in the matrix the K users most similar to the target user u, denoted by the set S(u, K). Collect all the items that the users in S like, and remove the items that u already likes. For each candidate item i, the degree to which user u is interested in it is calculated with the following formula:
$$p(u, i) = \sum_{v \in S(u, K) \cap N(i)} w_{uv} \, r_{vi}$$
Here N(i) is the set of users who like item i, w_{uv} is the similarity between u and v, and r_{vi} indicates how much user v likes item i. In this example it is always 1; in recommender systems where users give ratings, it is replaced by the user's rating.
For example, suppose we want to recommend items to A and choose K = 3 similar users: B, C, and D. The items that they like and that A does not yet like are c and e, so we calculate P(A, c) and P(A, e):
It turns out that user A may like c and e to the same degree. In a real recommender system, it is enough to sort by score and take the first few items.
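The recommendation step can be sketched in Python as below. This is an illustration under assumptions, not the author's implementation: `recommend` and `cosine` are hypothetical names, every liked item is given a weight of 1, and the example data is the assumed set that matches the text (A's neighbours are B, C, and D, and the candidates are c and e).

```python
from collections import defaultdict
from math import sqrt


def cosine(n_u, n_v):
    """Cosine similarity between two sets of liked items."""
    if not n_u or not n_v:
        return 0.0
    return len(n_u & n_v) / sqrt(len(n_u) * len(n_v))


def recommend(target, user_items, k=3, top_n=10):
    """Score items liked by the k most similar users but not by the target."""
    liked = user_items[target]
    sims = [(cosine(liked, items), v) for v, items in user_items.items() if v != target]
    neighbours = sorted(sims, key=lambda pair: pair[0], reverse=True)[:k]

    scores = defaultdict(float)
    for w, v in neighbours:
        if w == 0:
            continue  # a neighbour with no common items contributes nothing
        for item in user_items[v] - liked:
            scores[item] += w * 1.0  # r_vi is 1 because "like" is binary here
    # Highest score first; ties are broken by item name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Assumed example data, consistent with the text.
user_items = {
    "A": {"a", "b", "d"},
    "B": {"a", "c"},
    "C": {"b", "e"},
    "D": {"c", "d", "e"},
}
result = recommend("A", user_items, k=3)
print([(item, round(score, 4)) for item, score in result])
# [('c', 0.7416), ('e', 0.7416)]
```

With this data both candidates get the same score, which agrees with the observation that A may like c and e equally.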
Park friend recommendation
In social network recommendation, "the item" is actually "a person", and "liking an item" becomes "following a person". This section uses the algorithm above to recommend 10 park friends to me.
1. Find the 10 park friends most similar to my interests
Because the recommendation is only for me, there is no need to build a huge pairwise user similarity matrix. Park friends with interests similar to mine can only come from one group: the fans of the people I follow. Not counting myself, I currently follow 23 park friends, and these 23 park friends have 22,936 unique fans in total. I calculated the similarity for these 22,936 users; the top 10 users by similarity are as follows:
Nickname | Number followed | Number in common | Similarity
Blue Maple Leaf 1938 | 5 | 4 | 0.373001923296126
FBI080703 | 3 | 3 | 0.361157559257308
Fish non-Fish | 3 | 3 | 0.361157559257308
Lauce | 3 | 3 | 0.361157559257308
Blue snail | 3 | 3 | 0.361157559257308
Shanyujin | 3 | 3 | 0.361157559257308
Mr.huang | 6 | 4 | 0.340502612303499
Say hello to the world | 6 | 4 | 0.340502612303499
Strucoder | 28 | 8 | 0.31524416249564
Mr.vangogh | 4 | 3 | 0.312771621085612
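A possible sketch of this step in Python, assuming the follow relationships are available as a dictionary from each user to the set of people they follow. The name `most_similar_users`, the `following` structure, and all user names are hypothetical; the real crawl of the blog park is not shown in the article.

```python
from math import sqrt


def most_similar_users(me, following, top_k=10):
    """Rank users who follow at least one person that `me` also follows."""
    mine = following[me]
    rows = []
    for user, theirs in following.items():
        if user == me:
            continue
        common = len(mine & theirs)
        if common == 0:
            continue  # not a fan of anyone I follow, so not a candidate
        similarity = common / sqrt(len(mine) * len(theirs))
        rows.append((user, len(theirs), common, similarity))
    # Highest similarity first; ties are broken by name for a stable order.
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows[:top_k]


# Hypothetical data: "me" follows 23 people, as in the article.
following = {
    "me": {f"p{i}" for i in range(23)},
    "u1": {"p0", "p1", "p2", "p3", "q0"},  # follows 5, 4 in common
    "u2": {"p0", "p1", "p2"},              # follows 3, 3 in common
    "u3": {"q0", "q1"},                    # nothing in common
}
for name, followed, common, similarity in most_similar_users("me", following):
    print(name, followed, common, round(similarity, 4))
# u1 5 4 0.373
# u2 3 3 0.3612
```

With 23 people followed, a user who follows 5 people with 4 in common scores 4 / sqrt(23 * 5), which is the 0.373 in the first row of the table, so the table appears to use cosine similarity.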
2. Calculate the degree of interest in the recommended park friends
The 10 similar users recommend 25 park friends in total. Calculating the degree of interest for each and sorting gives:
Rank | Nickname | Degree of interest
1 | Wolfy | 0.373001923296126
2 | Artech | 0.340502612303499
3 | Cat Chen | 0.340502612303499
4 | Wxwinter (Winter) | 0.340502612303499
5 | Danielwise | 0.340502612303499
6 | Forward | 0.31524416249564
7 | Liam Wang | 0.31524416249564
8 | Usharei | 0.31524416249564
9 | Coderzh | 0.31524416249564
10 | Blog Park Team | 0.31524416249564
11 | Dark blue right hand | 0.31524416249564
12 | Kinglee | 0.31524416249564
13 | Gnie | 0.31524416249564
14 | Riccc | 0.31524416249564
15 | Braincol | 0.31524416249564
16 | The ticking of the rain | 0.31524416249564
17 | Dennis Gao | 0.31524416249564
18 | Liu Dong. NET | 0.31524416249564
19 | Li Yongjing | 0.31524416249564
20 | The dodo at the end of the wave | 0.31524416249564
21 | Li Tao | 0.31524416249564
22 | Ah | 0.31524416249564
23 | Jk_rush | 0.31524416249564
24 | Xiaotie | 0.31524416249564
25 | Leepy | 0.312771621085612
Only the top 10 by score are needed, but the quality of the whole list looks good too!
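The ranking step for park friends can be sketched like this, assuming the similar users and their similarities have already been computed. `recommend_friends`, the `following` dictionary, and the `similar` scores are all hypothetical names and values used only for illustration.

```python
from collections import defaultdict


def recommend_friends(me, following, similar, top_n=10):
    """Sum the similarity of every similar user who follows each candidate."""
    already = following[me] | {me}  # never recommend myself or people I follow
    interest = defaultdict(float)
    for user, sim in similar.items():
        for person in following[user] - already:
            interest[person] += sim  # following is binary, so the weight is 1
    # Highest interest first; ties are broken by name for a stable order.
    return sorted(interest.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical follow graph and precomputed similarities.
following = {
    "me": {"p1", "p2"},
    "u1": {"p1", "p2", "x"},
    "u2": {"p1", "x", "y"},
}
similar = {"u1": 0.8, "u2": 0.4}

result = recommend_friends("me", following, similar)
print([name for name, _ in result])  # ['x', 'y']
```

A candidate followed by several similar users accumulates their similarities, while a candidate followed by only one similar user simply inherits that user's similarity.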
Reference
Xiang Liang: Recommender System Practice
Article address: http://www.cnblogs.com/technology/p/4467895.html
Principle and implementation of user-based collaborative filtering recommendation algorithm