Notes on Recommendation System Practice (I)

Source: Internet
Author: User

1. Types of recommendation

a) Recommendations from friends. This is called social recommendation, delivered through a social network. For example, your girlfriend likes a snack and shares it with you on Renren; you can hardly refuse to buy it.

b) Recommending items that share an attribute with things you already like. This is called content-based recommendation. For example, I like Ning Hao's films: after I marked Crazy Stone as "liked" on Douban, Douban soon recommended No Man's Land to me. I watched it, and it is even more polished than Crazy Stone --- well, forgive the personal aside.

c) Considering users and items together: either recommend items liked by users similar to me, or recommend items similar to the items I like. This is called collaborative filtering.

2. Collaborative filtering

2.1 User-based collaborative filtering algorithm (UserCF)

The user-based collaborative filtering algorithm mainly consists of two steps.

(1) Find a set of users whose interests are similar to those of the target user.

(2) Find items that the users in this set like but that the target user has not seen, and recommend them to the target user.

For example, consider the following user behavior records (users in uppercase, items in lowercase):

User A: items {a, b, d}
User B: items {a, c}
User C: items {b, e}
User D: items {c, d, e}
How do we compute the similarity of two users' interests? With the cosine similarity formula:

w(u,v) = |N(u) ∩ N(v)| / √(|N(u)| · |N(v)|)

where N(u) denotes the set of items user u has interacted with.

For example, user A has behavior on items {a, b, d} and user B on items {a, c}. Their interest similarity by the cosine formula is:

w(A,B) = |{a, b, d} ∩ {a, c}| / √(3 · 2) = 1/√6

This is a simplified version of cosine similarity; real systems are far less simple.

Here's an article on cosine similarity: http://www.ruanyifeng.com/blog/2013/03/cosine_similarity.html

Computing the cosine similarity for every pair of users directly is too expensive, since many pairs share no items and have similarity 0. To simplify the computation, build an item-user inverted table:


item a: users {A, B}
item b: users {A, C}
item c: users {B, D}
item d: users {A, D}
item e: users {C, D}

From this table it is easy to compute:

w(A,B) = 1/√(3 · 2) = 1/√6
w(A,C) = 1/√(3 · 2) = 1/√6
w(A,D) = 1/√(3 · 3) = 1/3
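The inverted-table trick above can be sketched in Python. This is a minimal illustration using the four-user toy data from the example; all variable names are my own:

```python
from collections import defaultdict
from math import sqrt

# Toy data from the example: each user's set of interacted items.
user_items = {
    "A": {"a", "b", "d"},
    "B": {"a", "c"},
    "C": {"b", "e"},
    "D": {"c", "d", "e"},
}

# Item-user inverted table: for each item, the users who touched it.
item_users = defaultdict(set)
for user, items in user_items.items():
    for item in items:
        item_users[item].add(user)

# Count common items per user pair by scanning each item's user list,
# so user pairs with no common item are never visited at all.
overlap = defaultdict(int)
for users in item_users.values():
    for u in users:
        for v in users:
            if u != v:
                overlap[(u, v)] += 1

def user_sim(u, v):
    """Cosine similarity w(u,v) = |N(u) ∩ N(v)| / sqrt(|N(u)| * |N(v)|)."""
    return overlap[(u, v)] / sqrt(len(user_items[u]) * len(user_items[v]))

print(round(user_sim("A", "B"), 4))  # 1/sqrt(6) ≈ 0.4082
```

Note that pairs absent from `overlap` correctly yield similarity 0, because `defaultdict(int)` returns 0 for them.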

After obtaining the interest similarities, the UserCF algorithm recommends to a user the items liked by the K users whose interests are most similar to the user's. The following formula measures user u's interest in item i:

p(u,i) = Σ over v in S(u,K) ∩ N(i) of w(u,v) · r(v,i)

where S(u,K) is the set of K users whose interests are closest to user u's, N(i) is the set of users who have interacted with item i, w(u,v) is the interest similarity of users u and v, and r(v,i) represents user v's interest in item i. In the absence of a rating system, interest simply means interacted or not, so r(v,i) = 1.

Example:

Compute user A's interest in item c, taking K = 3. Then S(A, 3) = {B, C, D}, and N(c), the set of users who have interacted with item c, is {B, D}, so:

p(A,c) = w(A,B) · r(B,c) + w(A,D) · r(D,c) = 1/√6 + 1/3 ≈ 0.74
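The UserCF scoring step can be sketched as follows. This is a self-contained toy version (the data and function names are illustrative assumptions, not library code):

```python
from math import sqrt

# Toy data from the running example.
user_items = {
    "A": {"a", "b", "d"},
    "B": {"a", "c"},
    "C": {"b", "e"},
    "D": {"c", "d", "e"},
}

def user_sim(u, v):
    """Cosine similarity of two users' item sets."""
    common = len(user_items[u] & user_items[v])
    return common / sqrt(len(user_items[u]) * len(user_items[v]))

def predict(u, i, k=3):
    """p(u,i): sum of w(u,v) * r(v,i) over the k users most similar to u.

    Without ratings, r(v,i) = 1 when v has interacted with item i.
    """
    neighbors = sorted(
        (v for v in user_items if v != u),
        key=lambda v: user_sim(u, v),
        reverse=True,
    )[:k]
    return sum(user_sim(u, v) for v in neighbors if i in user_items[v])

print(round(predict("A", "c"), 4))  # 1/sqrt(6) + 1/3 ≈ 0.7416
```

With K = 3 all three other users are neighbors, and only B and D have touched item c, so the sum reduces to w(A,B) + w(A,D), matching the worked example.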
2.2 Item-based collaborative filtering algorithm (ItemCF)

The item-based collaborative filtering algorithm also consists of two main steps.

(1) Calculate the similarity between items.

(2) Generate a list of recommendations based on the similarity of the items and the items previously purchased by the user.

How do we compute the similarity of two items? The idea: if most of the users who like one item also like another, the two items can be considered similar. Take the classic story of beer and diapers: the two items seem completely unrelated, yet in this sense they turn out to be similar. Here is the story:

In one supermarket, a particularly interesting phenomenon was observed: diapers and beer, two seemingly unrelated items, were placed together on the shelf, and this odd move dramatically increased the sales of both. This is not a joke but a real, much-discussed case from the Wal-Mart supermarket chain. It turned out that American mothers usually care for their children at home and often ask their husbands to buy diapers on the way home from work, and the husbands pick up their favorite beer while buying the diapers. The discovery brought the retailer substantial profit.


The similarity between items i and j can be expressed by the formula:

w(i,j) = |N(i) ∩ N(j)| / √(|N(i)| · |N(j)|)

where N(i) is the set of users who have interacted with item i and |N(i)| is the size of that set.

Again, to simplify the computation, build a user-item inverted list.
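Item similarity can be computed symmetrically to the UserCF sketch: scan each user's item list so that only item pairs sharing at least one user are counted. A minimal version, again on the assumed toy data:

```python
from collections import defaultdict
from math import sqrt

# Toy data reused for illustration.
user_items = {
    "A": {"a", "b", "d"},
    "B": {"a", "c"},
    "C": {"b", "e"},
    "D": {"c", "d", "e"},
}

# Scan each user's item list to count item co-occurrences; item
# pairs with no user in common are never visited.
co = defaultdict(int)
item_count = defaultdict(int)
for items in user_items.values():
    for i in items:
        item_count[i] += 1
        for j in items:
            if i != j:
                co[(i, j)] += 1

def item_sim(i, j):
    """w(i,j) = |N(i) ∩ N(j)| / sqrt(|N(i)| * |N(j)|)."""
    return co[(i, j)] / sqrt(item_count[i] * item_count[j])

print(round(item_sim("a", "d"), 4))  # share only user A: 1/sqrt(2*2) = 0.5
```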

ItemCF computes user u's interest in item j with the following formula:

p(u,j) = Σ over i in N(u) ∩ S(j,K) of w(j,i) · r(u,i)

That is, take the K items most similar to item j; for every such item i that user u has interacted with, accumulate the similarity w(j,i), weighted by the user's interest r(u,i). Without ratings, r(u,i) can be taken as 1.

Here is an example


Suppose a user has bought the books "C++ Primer (Chinese Edition)" and "The Beauty of Programming". ItemCF then computes the user's interest in "Introduction to Algorithms". Its similarity to "C++ Primer" is 0.4, and its similarity to "The Beauty of Programming" is 0.5. Given the user's interest of 1.3 in "C++ Primer" and 0.9 in "The Beauty of Programming", the user's interest in "Introduction to Algorithms" is 1.3 × 0.4 + 0.9 × 0.5 = 0.97.
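The arithmetic of this worked example is just a weighted sum; a short sketch (book titles and numbers taken from the example, variable names my own):

```python
# Similarity of each owned book to the target book,
# "Introduction to Algorithms" (values from the worked example).
sim_to_target = {
    "C++ Primer": 0.4,
    "The Beauty of Programming": 0.5,
}

# The user's interest in each owned book.
interest = {
    "C++ Primer": 1.3,
    "The Beauty of Programming": 0.9,
}

# p(u, j) = sum over owned items i of w(j, i) * r(u, i)
p = sum(sim_to_target[book] * interest[book] for book in interest)
print(round(p, 2))  # 1.3*0.4 + 0.9*0.5 = 0.97
```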

2.3 Graph-based recommendation algorithm

Let G = (V, E) be an undirected graph. If the vertex set V can be divided into two disjoint subsets A and B such that every edge (i, j) ∈ E connects a vertex i ∈ A to a vertex j ∈ B, then G is called a bipartite graph. User behavior is naturally represented as a user-item bipartite graph, so many graph algorithms can be applied to recommender systems.

On the bipartite-graph model, the task of recommending items to a user becomes measuring the relevance between that user's vertex and the item vertices it is not directly connected to, i.e. the items the user has not yet interacted with.

There are many ways to measure the relevance of two vertices in a graph, but in general it depends mainly on the following three factors:

1. The number of paths between the two vertices;

2. The lengths of those paths;

3. The vertices those paths pass through.

A pair of vertices with high correlation typically has the following characteristics:

1. Many paths connect the two vertices;

2. The paths between them are relatively short;

3. The paths between them do not pass through high-degree vertices.

As a simple example (see the figure), user A is not directly connected to items c or e, but A is connected to each of them by two paths of length 3. The relevance between A and e is higher than that between A and c, so item e should appear before item c in user A's recommendation list. The two paths between A and e are (A, b, C, e) and (A, d, D, e). The path (A, b, C, e) passes through vertices with degrees (3, 2, 2, 2), while (A, d, D, e) passes through vertices with degrees (3, 2, 3, 2). Since (A, d, D, e) passes through the higher-degree vertex D, it contributes less to the relevance of A and e than (A, b, C, e) does.

Based on these three factors, researchers have designed a number of algorithms, such as PersonalRank, which is based on random walks and very similar to the PageRank algorithm. I have another blog post that introduces the principles of PageRank:

http://blog.csdn.net/ffmpeg4976/article/details/44540313

Now for PersonalRank:

To produce personalized recommendations for a user u, start a random walk from user u's node in the user-item bipartite graph. On arriving at any node, with probability α the walk continues; otherwise it stops and restarts from u's node. When the walk continues, one of the nodes the current node points to is chosen uniformly at random as the next step. After many iterations, the probability of visiting each node (including the item nodes) converges. The item nodes are then sorted by this probability and recommended in that order.

Formula:

PR(v) = α · Σ over v' in in(v) of PR(v') / |out(v')|              for v ≠ u
PR(v) = (1 − α) + α · Σ over v' in in(v) of PR(v') / |out(v')|    for v = u

The α in the formula is the same walk-continuation probability described above; don't confuse it with anything else.
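The iteration can be sketched in a few lines. This is a minimal PersonalRank on the toy bipartite graph from section 2.1; the adjacency dict, α = 0.8, and the iteration count are illustrative assumptions:

```python
def personal_rank(graph, root, alpha=0.8, iterations=100):
    """Iterate PR(v) = alpha * sum(PR(v')/|out(v')|) (+ 1-alpha at root)."""
    rank = {v: 0.0 for v in graph}
    rank[root] = 1.0  # all probability mass starts at the root user
    for _ in range(iterations):
        new = {v: 0.0 for v in graph}
        for v, neighbors in graph.items():
            # With probability alpha the walk continues, splitting
            # v's mass evenly among its neighbors.
            for nb in neighbors:
                new[nb] += alpha * rank[v] / len(neighbors)
        # With probability 1 - alpha the walk restarts at the root.
        new[root] += 1 - alpha
        rank = new
    return rank

# Undirected user-item bipartite graph: uppercase users, lowercase items.
graph = {
    "A": ["a", "b", "d"], "B": ["a", "c"],
    "C": ["b", "e"], "D": ["c", "d", "e"],
    "a": ["A", "B"], "b": ["A", "C"],
    "c": ["B", "D"], "d": ["A", "D"], "e": ["C", "D"],
}
rank = personal_rank(graph, root="A")

# Item nodes A has not touched, sorted by converged visiting probability.
candidates = sorted(
    (v for v in "abcde" if v not in graph["A"]),
    key=lambda v: rank[v], reverse=True,
)
print(candidates)
```

A useful sanity check on this formulation: the total probability mass stays at 1 each iteration, since the new total is α · 1 + (1 − α) = 1.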

For example:


The visiting probability of each vertex has essentially converged after 9 iterations. In this example, user A has no recorded behavior on items b and d; in the final result, d's visiting probability is greater than b's, so item d is recommended to user A first.
