Programming Collective Intelligence Notes: Making Recommendations


Recommendation is now a very common technology. When you shop online, Amazon recommends products you may be interested in; movie and music websites recommend films or songs you may like. Let's take a look at how these recommendations are implemented.

Collaborative Filtering

In daily life, the simplest way to get recommendations is to ask friends. You may know that some friends have good taste and interests similar to yours. However, this method does not always work, because what any one friend knows is limited. Everyone has agonized over where to go for dinner, or which products are worth buying.

This is where collaborative filtering comes in. A collaborative filtering algorithm usually works by searching a large group of people and finding a smaller set with tastes similar to yours.

This expands the scope beyond your circle of friends: the more people involved, the more information there naturally is.

Collecting preferences

The first thing you need is a way to represent different people and their preferences.

As mentioned above, collaborative filtering identifies people with similar interests from a large population. The first step is to represent each person's interests in a form that makes subsequent data processing easy.

The general practice is to treat each person as a vector, with each item of interest as one dimension. All interests need to be quantified, otherwise the data cannot be computed and processed. For example, rate an item 5 if you like it very much, and 3 if you feel neutral about it.

In Python, this vector is conveniently represented as a dictionary.

critics = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                  'Just My Luck': 3.0, 'Superman Returns': 3.5,
                  'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                     'Just My Luck': 1.5, 'Superman Returns': 5.0,
                     'The Night Listener': 3.0, 'You, Me and Dupree': 3.5}
}

The above shows how much Lisa and Gene like each movie, expressed as numbers from 1 to 5.

Finding similar users

In the above example, we use a vector to represent each user for collaborative filtering. The next question is how to find similar users.

Since users are represented as vectors, finding similar users amounts to computing the distance between vectors and finding the closest ones.

I'll show you two systems for calculating similarity scores: Euclidean distance and the Pearson correlation.

Euclidean distance

Euclidean distance is the straight-line distance between two points.

>>> from math import sqrt
>>> sqrt(pow(5-4, 2) + pow(4-1, 2))
3.1622776601683795

The above code calculates the distance between (5, 4) and (4, 1).

However, you need a function that gives higher values for people who are similar.

>>> 1/(1 + sqrt(pow(5-4, 2) + pow(4-1, 2)))
0.2402530733520421
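Putting this together, here is a minimal sketch of a Euclidean similarity function over the critics dictionary from earlier (the helper name sim_distance is my own choice):

```python
from math import sqrt

# Ratings from the critics dictionary shown earlier.
critics = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                  'Just My Luck': 3.0, 'Superman Returns': 3.5,
                  'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                     'Just My Luck': 1.5, 'Superman Returns': 5.0,
                     'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
}

def sim_distance(prefs, person1, person2):
    """Euclidean-distance similarity: 1.0 for identical tastes, near 0 for very different ones."""
    # Only movies rated by both people can be compared.
    shared = [item for item in prefs[person1] if item in prefs[person2]]
    if not shared:
        return 0.0
    sum_of_squares = sum((prefs[person1][item] - prefs[person2][item]) ** 2
                         for item in shared)
    return 1 / (1 + sqrt(sum_of_squares))
```

With this data, sim_distance(critics, 'Lisa Rose', 'Gene Seymour') comes out to roughly 0.294.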

Pearson correlation

The Euclidean distance is relatively simple, but it has a problem: ratings are subjective, and everyone's scoring standards differ. Some people consistently rate high while others consistently rate low, so absolute distance cannot handle this case.

The Pearson correlation coefficient instead measures how well the dimensions of two vectors fit a linear relationship: two vectors whose values rise and fall in the same proportions are highly correlated, even if one is systematically higher than the other.

For example, for vectors such as (1, 2, 3) and (4, 8, 12), the Euclidean distance is large, but the Pearson correlation coefficient is 1 — completely similar.
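A sketch of the Pearson score over the same critics dictionary (helper name sim_pearson is my own; the formula is the standard covariance over the product of standard deviations, restricted to the movies both people rated):

```python
from math import sqrt

# Ratings from the critics dictionary shown earlier.
critics = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                  'Just My Luck': 3.0, 'Superman Returns': 3.5,
                  'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                     'Just My Luck': 1.5, 'Superman Returns': 5.0,
                     'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
}

def sim_pearson(prefs, p1, p2):
    """Pearson correlation of two people's ratings over the movies they share."""
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    n = len(shared)
    if n == 0:
        return 0.0
    sum1 = sum(prefs[p1][it] for it in shared)
    sum2 = sum(prefs[p2][it] for it in shared)
    sum1_sq = sum(prefs[p1][it] ** 2 for it in shared)
    sum2_sq = sum(prefs[p2][it] ** 2 for it in shared)
    p_sum = sum(prefs[p1][it] * prefs[p2][it] for it in shared)
    # Covariance over the product of standard deviations.
    num = p_sum - (sum1 * sum2 / n)
    den = sqrt((sum1_sq - sum1 ** 2 / n) * (sum2_sq - sum2 ** 2 / n))
    if den == 0:
        return 0.0
    return num / den
```

For Lisa Rose and Gene Seymour this gives about 0.396 — a moderate correlation, despite Gene's generally higher scores.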

There are other functions, such as the Jaccard coefficient or Manhattan distance, that you can use as your similarity function.
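For instance, a Jaccard coefficient can compare two users by which movies they rated at all, ignoring the scores. A small sketch (helper name and example sets are my own):

```python
def jaccard_similarity(items_a, items_b):
    """Jaccard coefficient: |intersection| / |union| of two sets."""
    union = items_a | items_b
    if not union:
        return 0.0
    return len(items_a & items_b) / len(union)

# Hypothetical sets of movies two users have rated.
user_a = {'Snakes on a Plane', 'Superman Returns', 'You, Me and Dupree'}
user_b = {'Snakes on a Plane', 'Superman Returns', 'The Night Listener'}
```

Here the two users share 2 of the 4 distinct movies, so the coefficient is 0.5.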

Ranking the critics
Now that you have functions for comparing two people, you can create a function that scores everyone against a given person and finds the closest matches.
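A sketch of such a ranking function (names top_matches and the demo prefs dictionary are my own; any of the similarity functions above can be plugged in):

```python
from math import sqrt

def sim_distance(prefs, person1, person2):
    """Euclidean-distance similarity, as defined earlier."""
    shared = [item for item in prefs[person1] if item in prefs[person2]]
    if not shared:
        return 0.0
    sum_of_squares = sum((prefs[person1][item] - prefs[person2][item]) ** 2
                         for item in shared)
    return 1 / (1 + sqrt(sum_of_squares))

def top_matches(prefs, person, n=5, similarity=sim_distance):
    """Score everyone against `person` and return the n closest matches, best first."""
    scores = [(similarity(prefs, person, other), other)
              for other in prefs if other != person]
    scores.sort(reverse=True)
    return scores[:n]

# Hypothetical ratings just to demonstrate the ranking.
prefs = {
    'A': {'m1': 5.0, 'm2': 3.0, 'm3': 4.0},
    'B': {'m1': 5.0, 'm2': 3.0, 'm3': 4.0},
    'C': {'m1': 1.0, 'm2': 5.0, 'm3': 1.0},
}
```

top_matches(prefs, 'A', n=2) ranks B (identical ratings, similarity 1.0) ahead of C.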

Recommending items
Finding a good critic to read is great, but what I really want is a movie recommendation right now.

By calculating the distance between vectors, we can find the users closest to a user, but our goal is to recommend movies. What should we do next?

Suppose we have the following scores from five similar users for the movies Night, Lady, and Luck; let's see how to turn them into recommendations.

Critic          Similarity  Night  S.xNight  Lady  S.xLady  Luck  S.xLuck
Rose            0.99        3.0    2.97      2.5   2.48     3.0   2.97
Seymour         0.38        3.0    1.14      3.0   1.14     1.5   0.57
Puig            0.89        4.5    4.02      -     -        3.0   2.68
LaSalle         0.92        3.0    2.77      3.0   2.77     2.0   1.85
Matthews        0.66        3.0    1.99      3.0   1.99     -     -
Total                              12.89           8.38           8.07
Sim. Sum                           3.84            2.95           3.18
Total/Sim. Sum                     3.35            2.83           2.53

First, multiply each critic's movie rating by their similarity to get a weighted score, e.g. Similarity × Night = S.xNight. The more similar the user, the more weight their rating carries.

Adding up the weighted scores of all users gives a total for each movie. But if we used the total directly as the basis for recommendation, movies rated by more users would have an unfair advantage. So we divide each total by the sum of the similarities of the users who actually rated that movie (Total/Sim. Sum) and use that as the basis for recommendation.

Not only do you get a ranked list of movies, but you also get a guess at what your rating for each movie would be.
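The table's logic can be sketched as a function (names get_recommendations and the demo prefs dictionary are my own; any similarity function can be substituted):

```python
from math import sqrt

def sim_distance(prefs, person1, person2):
    """Euclidean-distance similarity, as defined earlier."""
    shared = [item for item in prefs[person1] if item in prefs[person2]]
    if not shared:
        return 0.0
    sum_of_squares = sum((prefs[person1][item] - prefs[person2][item]) ** 2
                         for item in shared)
    return 1 / (1 + sqrt(sum_of_squares))

def get_recommendations(prefs, person, similarity=sim_distance):
    """Predict ratings for movies `person` hasn't seen, weighted by user similarity."""
    totals, sim_sums = {}, {}
    for other in prefs:
        if other == person:
            continue
        sim = similarity(prefs, person, other)
        if sim <= 0:
            continue
        for item, rating in prefs[other].items():
            if item not in prefs[person]:  # only score unseen movies
                totals[item] = totals.get(item, 0.0) + rating * sim
                sim_sums[item] = sim_sums.get(item, 0.0) + sim
    # Divide each weighted total by the similarity sum (Total / Sim. Sum above).
    rankings = [(total / sim_sums[item], item) for item, total in totals.items()]
    rankings.sort(reverse=True)
    return rankings

# Hypothetical data: 'x' agrees with 'me' on the shared movie, 'y' does not.
prefs = {
    'me': {'a': 4.0},
    'x':  {'a': 4.0, 'b': 5.0},
    'y':  {'a': 1.0, 'b': 1.0},
}
```

For 'me', movie b scores (5.0 × 1.0 + 1.0 × 0.25) / (1.0 + 0.25) = 4.2, dominated by the more similar user x.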

We have now completed a basic recommendation system. By replacing users and movies with other objects, the same approach yields all kinds of recommenders.

Item-Based Filtering

The way the recommendation engine has been implemented so far requires all the rankings from every user in order to create a dataset. This will probably work well for a few thousand people or items, but a very large site like Amazon has millions of customers and products; comparing a user with every other user, and then comparing every product each user has rated, can be very slow.

The method introduced above works fine for small datasets, but on large datasets such as Amazon's it will be very slow, because it calculates the similarity of every pair of users. This method is called user-based collaborative filtering. An alternative is known as item-based collaborative filtering. On very large datasets, item-based collaborative filtering can give better results, and it allows many of the calculations to be performed in advance, so a user needing recommendations can get them more quickly.

Here, we assume the recommendation system recommends items to users. The method above first finds a set of users similar to the target user and then recommends items those users like.

In fact, there are similarities between items as well. If we calculate, in advance, a set of items similar to each item, then when recommending items to a user we only need to look up the items similar to the user's favorites.

One basis for doing so is that comparisons between items will not change as often as comparisons between users.

Users' interests may change constantly, so the relationships between users keep changing, while the relationships between items are relatively stable; the relationship between two movies, for example, is objective.

So how can we calculate the similarity between items? Previously we calculated the similarity between users. We can simply transpose the matrix: treat each item as a vector and the users' ratings as its dimensions, then compute similarity between items exactly as before.
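Transposing the preferences dictionary is a one-function job (the name transform_prefs is my own):

```python
def transform_prefs(prefs):
    """Flip a {person: {item: rating}} mapping into {item: {person: rating}}."""
    result = {}
    for person, ratings in prefs.items():
        for item, rating in ratings.items():
            # Each item collects the ratings every person gave it.
            result.setdefault(item, {})[person] = rating
    return result
```

Feeding the flipped dictionary to the same similarity and ranking functions then finds similar items instead of similar people.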

Early on, the similarity between items changes frequently, but once enough user ratings accumulate, the similarity relationships become stable. In fact, you can also use other signals to calculate the similarity between items; for movies, for example, you could compare synopses or reviews.

Now that we have the similarities between items, how do we make recommendations?

Assume the user has rated Snakes, Superman, and Dupree. How can we recommend new movies based on these ratings?

The similarities of these movies to three others (assume only Night, Lady, and Luck) are listed below:

Movie      Rating  Night  R.xNight  Lady  R.xLady  Luck  R.xLuck
Snakes     4.5     0.182  0.818     0.222 0.999    0.105 0.474
Superman   4.0     0.103  0.412     0.091 0.363    0.065 0.258
Dupree     1.0     0.148  0.148     0.400 0.400    0.182 0.182
Total              0.433  1.378     0.713 1.764    0.352 0.914
Normalized                3.183           2.473          2.598

The calculation works as follows. Multiply the user's rating of each movie by that movie's similarity to the candidate:

Rating × Night = R.xNight

Then divide the column totals to get the predicted rating:

Normalized = Total(R.xNight) / Total(Night)

For Night this is 1.378 / 0.433 ≈ 3.183, making it the top recommendation.
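The calculation above can be sketched as a function over the precomputed similarity table (the name get_recommended_items is my own; the numbers are the rounded values from the table, so results differ from it in the third decimal place):

```python
# Precomputed item similarities from the table above (three candidate movies).
item_sim = {
    'Snakes':   [(0.182, 'Night'), (0.222, 'Lady'), (0.105, 'Luck')],
    'Superman': [(0.103, 'Night'), (0.091, 'Lady'), (0.065, 'Luck')],
    'Dupree':   [(0.148, 'Night'), (0.400, 'Lady'), (0.182, 'Luck')],
}
user_ratings = {'Snakes': 4.5, 'Superman': 4.0, 'Dupree': 1.0}

def get_recommended_items(ratings, item_sim):
    """Weight each candidate's similarity by the user's ratings, then normalize."""
    scores, sim_totals = {}, {}
    for item, rating in ratings.items():
        for sim, candidate in item_sim[item]:
            if candidate in ratings:
                continue  # skip movies the user has already rated
            scores[candidate] = scores.get(candidate, 0.0) + sim * rating
            sim_totals[candidate] = sim_totals.get(candidate, 0.0) + sim
    rankings = [(score / sim_totals[c], c) for c, score in scores.items()]
    rankings.sort(reverse=True)
    return rankings
```

The top result is Night, with a predicted rating of about 3.18; no user-to-user comparison happens at recommendation time.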

User-based or item-based filtering?

Item-based filtering is significantly faster than user-based when getting a list of recommendations for a large dataset, but it does have the additional overhead of maintaining the item similarity table.

Item-based filtering usually outperforms user-based filtering in sparse datasets, and the two perform about equally in dense datasets.
