A simple content-based recommendation algorithm

Source: Internet
Author: User

I have had some free time recently and started tinkering with recommender systems. One caveat up front: this article only introduces the working principle of the most basic content-based recommender system (content-based Recommender Systems). Content-based recommenders come in many levels of sophistication; what is described here is the simplest, most basic workflow.

The idea of a content-based recommendation algorithm is simple, and its principle is roughly divided into 3 steps:

1. Build an attribute profile (Item Profiles) for each item

2. Build a preference profile (User Profiles) for each user

3. Compute the similarity between the user's preference profile and each item's attribute profile. High similarity suggests the user may like the item; low similarity suggests the user probably will not.

Pick the user U you want to make recommendations for, traverse the collection of items, compute the similarity between each item and U, select the k items with the highest similarity, and recommend them to U. Done!

Item Profiles and User Profiles are described in detail below.

1. Item Profiles

Any discussion of content-based recommender systems has to mention "Item Profiles", one of the most critical parts of the whole system. Here "Item" refers to the thing being recommended, and "Item Profiles" refers to the detailed attributes of that thing.

What is an "Item"? In a movie recommendation system, the goal is to recommend movies the user might like, so each movie is an "Item".

What are "Item Profiles"? The cast list, director list, genre, running time, release date, box office, and so on: these attributes of a movie are its "Item Profiles".

2. Representing Item Profiles

For simplicity, assume that "Item Profiles" contains only the cast list. The item profile of "The Matrix" can then be written as {Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving} (for convenience, we assume "The Matrix" has only these 4 actors). This natural-language description cannot be used directly in code, however. It has to be mapped to a data structure a program can work with: the item profile is converted into a 0/1 matrix, as follows:

2.1. First construct a 1×n matrix, where n is the number of major movie actors in the world. Initialize all elements to 0, giving a row vector [0, 0, 0, ..., 0] with n zeros in total.

2.2. Fix an indexing: assume element 0 of this row vector represents Jackie Chan, element 1 Keanu Reeves, element 2 Laurence Fishburne, element 3 Tom Cruise, and elements 4 and 5 Carrie-Anne Moss and Hugo Weaving. Who the remaining elements represent does not matter here.

2.3. Map the natural-language item profile onto the 1×n matrix. The mapping is intuitive: if movie M features actors A1, A2, and A3, then the elements of M's row vector corresponding to A1, A2, and A3 are set to 1, indicating that these actors appear in M.

For example, under the assumptions of step 2.2, the 0/1 matrix of "The Matrix" is [0, 1, 1, 0, 1, 1, 0, 0, ..., 0]. Element 0 represents Jackie Chan, who does not appear in "The Matrix", so element 0 is 0. Element 1 represents Keanu Reeves, who does appear in it, so element 1 is 1. Likewise, elements 2, 4, and 5 are 1 because Laurence Fishburne, Carrie-Anne Moss, and Hugo Weaving appear in "The Matrix". All remaining elements are 0.
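Steps 2.1 to 2.3 can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the name `build_item_profile` and the dictionary `ACTOR_INDEX` are made up for this example, and the index covers only the six actors named in step 2.2 rather than all n actors.

```python
# Hypothetical actor index following step 2.2; a real system would cover all n actors.
ACTOR_INDEX = {
    "Jackie Chan": 0,
    "Keanu Reeves": 1,
    "Laurence Fishburne": 2,
    "Tom Cruise": 3,
    "Carrie-Anne Moss": 4,
    "Hugo Weaving": 5,
}


def build_item_profile(cast, actor_index):
    """Return a 0/1 list with a 1 at the position of every actor in `cast`."""
    profile = [0] * len(actor_index)  # step 2.1: start with all zeros
    for actor in cast:
        position = actor_index.get(actor)
        if position is not None:  # actors outside the index are ignored
            profile[position] = 1  # step 2.3: mark the actor as present
    return profile


matrix_profile = build_item_profile(
    ["Keanu Reeves", "Laurence Fishburne", "Carrie-Anne Moss", "Hugo Weaving"],
    ACTOR_INDEX,
)
print(matrix_profile)  # [0, 1, 1, 0, 1, 1]
```

With a realistic n, most entries of each vector are 0, so a production system would probably store only the positions of the 1s; the dense list is used here only because it mirrors the description above.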

3. User Profiles

So far we have modeled the items: the "Item Profiles" are 1×n 0/1 matrices. That is not enough; we also need to model the users. Modeling a user means constructing "User Profiles", which capture the user's preferences. In this movie recommendation example, user U's preferences can be expressed as a degree of preference for individual actors. For example:

Let's say we have a rating matrix that contains 2 users and 3 movies:

Users \ Movies "Rush Hour" "Rumble in the Bronx" "The Matrix"
Alice 4 5 3
Bob 1 4

The meaning of the matrix is:

User Alice gave "Rush Hour", "Rumble in the Bronx", and "The Matrix" 4, 5, and 3 points respectively (out of 5).

User Bob has rated only two of the movies, giving them 1 and 4 points (out of 5); the blank cell in his row means he has not rated that movie yet.

Looking at the ratings, Alice prefers "Rush Hour" and "Rumble in the Bronx", and what the two movies have in common is Jackie Chan. A natural guess is that Alice likes Jackie Chan movies. With this observation, we can start building Alice's "User Profiles" as follows:

3.1. Compute the average of all of Alice's ratings. In this example, Alice's average is avg = (4+5+3)/3 = 4.

3.2. Compute how much Alice likes Jackie Chan with the formula

$$\text{preference} = \frac{1}{n}\sum_{i=1}^{n}(x_i - avg)$$

where x_i is Alice's rating of the i-th movie that features Jackie Chan and that Alice has rated, avg is the average from 3.1, and n is the number of such movies. In this example the formula gives ((4-4) + (5-4))/2 = 0.5, so Alice's preference for Jackie Chan is represented by the value 0.5.

3.3. Like Item Profiles, User Profiles also use a 1×n matrix. Unlike the Item Profiles matrix, its elements are no longer 0 or 1 but the preference for each actor computed as in 3.2. Alice's matrix can therefore be written as [0.5, x, y, z, ...]. Recall the assumption in 2.2 that element 0 represents Jackie Chan; element 0 here is 0.5, meaning Alice's preference for Jackie Chan is 0.5. The preferences for the other actors (x, y, z, ...) are computed the same way.
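Steps 3.1 to 3.3 can be sketched as below for Alice's three ratings. The function name `build_user_profile` is made up for this example, and two things are assumptions rather than part of the description above: the profiles of the two Jackie Chan movies mark only Jackie Chan (their other cast members are not in the six-actor index from 2.2), and an actor with no rated movies gets a preference of 0.0.

```python
def build_user_profile(ratings, item_profiles, n_actors):
    """Build a per-actor preference list from one user's ratings.

    ratings:       {movie title: score given by the user}
    item_profiles: {movie title: 0/1 list over the actor index}
    """
    avg = sum(ratings.values()) / len(ratings)  # step 3.1
    profile = [0.0] * n_actors
    for a in range(n_actors):
        # Step 3.2: deviations from the average over rated movies featuring actor a.
        deviations = [
            score - avg
            for movie, score in ratings.items()
            if item_profiles[movie][a] == 1
        ]
        if deviations:  # assumption: actors without rated movies stay at 0.0
            profile[a] = sum(deviations) / len(deviations)
    return profile


# Index order from step 2.2: Jackie Chan, Keanu Reeves, Laurence Fishburne,
# Tom Cruise, Carrie-Anne Moss, Hugo Weaving.
item_profiles = {
    "Rush Hour": [1, 0, 0, 0, 0, 0],
    "Rumble in the Bronx": [1, 0, 0, 0, 0, 0],
    "The Matrix": [0, 1, 1, 0, 1, 1],
}
alice_ratings = {"Rush Hour": 4, "Rumble in the Bronx": 5, "The Matrix": 3}

alice_profile = build_user_profile(alice_ratings, item_profiles, 6)
print(alice_profile)  # [0.5, -1.0, -1.0, 0.0, -1.0, -1.0]
```

Element 0 reproduces the 0.5 from step 3.2. The four actors of "The Matrix" each get (3-4)/1 = -1.0, because that movie is the only rated one they appear in and Alice rated it below her average.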

4. Computing the basis for recommendation

Cosine similarity is used to measure how close a given user U is to a given item I. The greater the cosine similarity, the more likely U is to like I.

The cosine similarity is calculated as follows:

$$\cos(U, I) = \frac{\sum_{a} U_a I_a}{\sqrt{\sum_{a} U_a^2}\,\sqrt{\sum_{a} I_a^2}}$$

In the case of the movie recommendation system:

U_a is user U's preference value for actor a (that is, the value for actor a in the User Profiles matrix)

I_a indicates whether movie I features actor a (that is, the value for actor a in the Item Profiles matrix)
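A minimal sketch of the standard cosine similarity between a user vector and an item vector is shown below. The function name and the example vectors are illustrative: the user vector is one that could result from step 3 over a six-actor index, and returning 0.0 for an all-zero vector is an assumption made here to avoid dividing by zero.

```python
import math


def cosine_similarity(u, i):
    """Cosine similarity between a user profile u and an item profile i."""
    dot = sum(ua * ia for ua, ia in zip(u, i))
    norm_u = math.sqrt(sum(ua * ua for ua in u))
    norm_i = math.sqrt(sum(ia * ia for ia in i))
    if norm_u == 0 or norm_i == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_u * norm_i)


# Illustrative vectors over the actor index of step 2.2.
alice_profile = [0.5, -1.0, -1.0, 0.0, -1.0, -1.0]
jackie_chan_movie = [1, 0, 0, 0, 0, 0]
the_matrix = [0, 1, 1, 0, 1, 1]

print(cosine_similarity(alice_profile, jackie_chan_movie))  # positive, about 0.24
print(cosine_similarity(alice_profile, the_matrix))         # negative, about -0.97
```

Because the user vector contains negative preferences, the similarity can be negative, which here signals that the user tends to rate movies with those actors below their own average.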

5. Start recommending!

Walk through the entire movie library as described in 4, compute Alice's similarity to each movie, pick the top k movies with the highest similarity, and recommend them to Alice!
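The final step could look like the sketch below, which scores every movie in a small library and keeps the k best. The function `recommend` and the three-movie library are hypothetical, chosen only to exercise the actor index from 2.2, and the sketch assumes the library holds movies the user has not rated yet.

```python
import heapq
import math


def cosine_similarity(u, i):
    """Cosine similarity; returns 0.0 if either vector is all zeros (assumption)."""
    dot = sum(ua * ia for ua, ia in zip(u, i))
    norm_u = math.sqrt(sum(ua * ua for ua in u))
    norm_i = math.sqrt(sum(ia * ia for ia in i))
    if norm_u == 0 or norm_i == 0:
        return 0.0
    return dot / (norm_u * norm_i)


def recommend(user_profile, library, k):
    """Return the titles of the k movies most similar to user_profile, best first."""
    return heapq.nlargest(
        k,
        library,  # iterating a dict yields its keys, i.e. the titles
        key=lambda title: cosine_similarity(user_profile, library[title]),
    )


# Hypothetical library over the six-actor index from step 2.2.
library = {
    "Police Story": [1, 0, 0, 0, 0, 0],         # Jackie Chan
    "The Matrix Reloaded": [0, 1, 1, 0, 1, 1],  # the four actors of "The Matrix"
    "Top Gun": [0, 0, 0, 1, 0, 0],              # Tom Cruise
}
alice_profile = [0.5, -1.0, -1.0, 0.0, -1.0, -1.0]

print(recommend(alice_profile, library, 2))  # ['Police Story', 'Top Gun']
```

`heapq.nlargest` avoids sorting the whole library when k is small; for a library of this size a plain `sorted(...)[:k]` would work just as well.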
