User-based collaborative filtering recommendation algorithm

Source: Internet
Author: User
Tags: soap, macbook

What is a recommendation algorithm

Recommendation algorithms were first proposed in 1992, but they have only really taken off in recent years: with the explosive growth of the Internet, far more data is available to us, and recommendation algorithms have found wide use.

In the beginning, we found information on the Internet by going to Yahoo and clicking through its categories until we reached what we wanted, which was a manual process. Later we used Google to search directly for what we wanted, which finds things more accurately. But what if I don't know what I'm looking for? The most typical example: I open Douban to find a movie, or I go shopping, without really knowing what I want to watch or buy. That is when a recommendation system comes in handy.

Basic conditions for recommendation algorithms

Recommendation algorithms have been developing since 1992, about 20 years now, and many varieties have appeared. However they differ, they all rest on a few basic conditions:

    • Recommend things to you based on people who share your tastes.

    • Find items similar to the ones you like and recommend them to you.

    • Recommend based on keywords you supply, which in practice degenerates into a search algorithm.

    • Recommend based on a combination of the conditions above.

In fact, these are all the conditions there are; how to make use of them is where everyone shows their own skill. Over the years a number of good algorithms have settled out, and the user-based collaborative filtering algorithm discussed in this article is one of them. It is also the earliest recommendation algorithm, and its basic idea has not changed to this day; the only differences are in processing speed and in the algorithm used to compute similarity.

User-based collaborative filtering algorithm

Let's break the name down. "User-based" means the algorithm is centered on users. A user-centered algorithm emphasizes social attributes: it recommends to you the items liked by other users whose tastes are similar to yours. Its counterpart is the item-based recommendation algorithm, which instead emphasizes recommending items similar to the items you already like.

Then there is "collaborative filtering". "Collaborative" means that everyone works together to help you; "filtering" means that only the outcome of that joint effort is passed on to you, because otherwise there would be too much information.

Put together, it is an algorithm in which friends with tastes similar to yours confer and then tell you what you would probably like.

Algorithm description

Similarity calculation

We will try to avoid complex mathematical formulas: first, they can be hard to follow, and second, I am writing this blog on a Mac, where formulas are tedious to typeset.

There are two classic algorithms for computing similarity:

    • Jaccard similarity: the size of the intersection divided by the size of the union; for details, see this article.

    • Cosine similarity: widely used to compute the similarity between vectors; you can Google the exact formula, or see here.

    • Various other measures, such as Euclidean distance.

Whether you use Jaccard or cosine, what you are essentially doing is computing the similarity between two vectors; which one to choose depends entirely on the actual situation.

In this article we use cosine similarity to compute the similarity between two users.
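As a rough illustration of the two measures, here is a minimal Python sketch. It assumes each user's preferences are stored as a plain set of item names, so the cosine version treats each set as a 0/1 vector. The function names and the sample sets are made up for this example.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    if not union:
        return 0.0  # two empty sets: define the similarity as 0
    return len(a & b) / len(union)


def cosine(a: set, b: set) -> float:
    """Cosine similarity of two sets viewed as binary (0/1) vectors.

    The dot product is the number of shared items, and each vector's
    length is the square root of its set size.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


x = {"a", "b", "c"}
y = {"b", "c", "d"}
print(jaccard(x, y))  # 2 shared items out of 4 distinct items
print(cosine(x, y))   # 2 / sqrt(3 * 3)
```

With ratings instead of plain likes, the cosine version would work on the rating values rather than on set sizes.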

K users closest to the target user

When we look for friends whose tastes are similar to yours, we may find hundreds of them, but some are close friends and some are just ordinary friends. So in general we fix a number K, and the K friends most similar to you are your close friends. Their tastes are probably not very different from yours, so they are the best people to recommend things to you (soap, for example).

What does "similar to you" mean? Simply put, suppose:

    • You like MacBook, iPhone, iPad.

    • Friend A likes MacBook, iPhone, Note2, Xiaomi Box, soap, candle.

    • Friend B likes MacBook, iPhone, iPad, soap, moisturizer.

    • Goddess C likes Estée Lauder, SK2, Chanel.

    • User D likes iPad, Nokia 8250, Subor learning machine.

Clearly friend B is the most similar to you, while goddess C is not in your league at all. So when we make a recommendation we will recommend soap to you, because we think soap is probably the best fit for you.

So how do we find the K close friends? The most direct way is to compare the target user with every user in the database and pick the K users most similar to the target user.
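The direct approach can be sketched as follows. This is only an illustration: the `likes` dictionary mirrors the example users above, Jaccard is used as the similarity for simplicity, and the name `top_k_brute_force` is hypothetical.

```python
import heapq

# Example data: user -> set of liked items (names lowercased for convenience).
likes = {
    "you": {"macbook", "iphone", "ipad"},
    "A": {"macbook", "iphone", "note2", "xiaomi box", "soap", "candle"},
    "B": {"macbook", "iphone", "ipad", "soap", "moisturizer"},
    "C": {"estee lauder", "sk2", "chanel"},
    "D": {"ipad", "nokia 8250", "subor learning machine"},
}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k_brute_force(target, likes, k):
    """Compare the target with every other user and keep the k most similar."""
    scored = [
        (jaccard(likes[target], items), user)
        for user, items in likes.items()
        if user != target
    ]
    # nlargest returns (similarity, user) pairs, most similar first.
    return heapq.nlargest(k, scored)


neighbors = top_k_brute_force("you", likes, k=2)
print([user for _, user in neighbors])  # ['B', 'A']
```

Every user in `likes` is scored here, including C, who shares nothing with the target.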

This works in theory, but when the amount of data is huge, computing the K close friends takes a very long time. And as you would expect, most users in the database have nothing in common with you, so there is no need to compute similarity for all users; it is enough to compute it for the users who share at least one item with you.

To find the users who share items with you, we need an item-to-user inverted table. What is an inverted table? It is simple. Using the example of friends A and B and goddess C above, the inverted table records that MacBook is liked by you, A, and B; iPhone is liked by you, A, and B; and so on. In other words, for each item it lists the users who like it. With this table we can see that the only users related to you are A, B, and D. Goddess C has nothing in common with you, so C need not be considered.
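A minimal sketch of such an inverted table, using the same example users. The helper names `build_inverted_index` and `candidates` are made up for illustration.

```python
from collections import defaultdict

likes = {
    "you": {"macbook", "iphone", "ipad"},
    "A": {"macbook", "iphone", "note2", "xiaomi box", "soap", "candle"},
    "B": {"macbook", "iphone", "ipad", "soap", "moisturizer"},
    "C": {"estee lauder", "sk2", "chanel"},
    "D": {"ipad", "nokia 8250", "subor learning machine"},
}


def build_inverted_index(likes):
    """Map each item to the set of users who like it."""
    index = defaultdict(set)
    for user, items in likes.items():
        for item in items:
            index[item].add(user)
    return index


def candidates(target, likes, index):
    """Users who share at least one item with the target."""
    found = set()
    for item in likes[target]:
        found |= index[item]
    found.discard(target)  # the target is not their own neighbor
    return found


index = build_inverted_index(likes)
print(sorted(index["macbook"]))                  # ['A', 'B', 'you']
print(sorted(candidates("you", likes, index)))   # ['A', 'B', 'D']
```

C never appears in any of the lists for the target's items, so no similarity is ever computed for C.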

So we are left with A, B, and D, and we compute the similarity between each of them and you. Whatever similarity formula is used, B comes out as the most similar to you (in this example Jaccard is the usual choice, because these preference lists do not convert well into vectors for cosine similarity). If we set K to 2, we conclude that your closest friends are B and A.

This is how the K users closest to the target user are computed.

Recommending items through these K users

Now that your close friends have been worked out, the next step is to recommend items to you. There are five items we could recommend: Xiaomi Box, Note2, candle, moisturizer, and soap. Which one do you need? There are many possible approaches here. We could skip sorting and recommend everything at once, but some of those items would obviously not interest you. We can also do some processing. Suppose we computed that A's similarity to you is 25% and B's similarity to you is 80%. Then for the items above, the recommendation scores can be computed as follows:

    • Xiaomi Box: 1 * 0.25 = 0.25

    • Note2: 1 * 0.25 = 0.25

    • Candle: 1 * 0.25 = 0.25

    • Moisturizer: 1 * 0.8 = 0.8

    • Soap: 1 * 0.8 + 1 * 0.25 = 1.05

Now it is clear: we will recommend soap to you first, since it is probably what you need most, followed by moisturizer, and then the candle, Xiaomi Box, and Note2, which are tied.

Of course, you can normalize the results above or compute the recommendation score in any other way you find appropriate. However you do it, the score must depend on the similarity between each friend and you. That is, the 0.8 and 0.25 have to be used; otherwise the earlier computation was wasted.
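The arithmetic above can be reproduced with a short sketch. The similarities 0.25 and 0.8 are the ones assumed in the text; the function name `score_items` and the alphabetical tie-breaking are choices made for this example only.

```python
from collections import defaultdict

you = {"macbook", "iphone", "ipad"}
likes = {
    "A": {"macbook", "iphone", "note2", "xiaomi box", "soap", "candle"},
    "B": {"macbook", "iphone", "ipad", "soap", "moisturizer"},
}
# Similarity of each chosen neighbor to you, as assumed in the text.
neighbors = {"A": 0.25, "B": 0.8}


def score_items(target_items, neighbors, likes):
    """Add each neighbor's similarity to every item they like that you lack."""
    scores = defaultdict(float)
    for user, sim in neighbors.items():
        for item in likes[user] - target_items:
            scores[item] += 1 * sim  # a "like" counts as 1
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


for item, score in score_items(you, neighbors, likes):
    print(f"{item}: {score:.2f}")
```

Soap comes first because both neighbors like it, so it collects both similarities.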

Algorithm Summary

From this example you can probably see why soap gets recommended to you. That is the user-based collaborative filtering recommendation algorithm, which can be summarized in the following steps.

    1. Compute the similarity between you and other users; the inverted table can be used to skip a portion of the users.

    2. Based on similarity, find the K neighbors closest to you.

    3. Among the items these neighbors like, compute a recommendation score for each item based on how close each neighbor is to you.

    4. Recommend items according to each item's recommendation score.

In the example above, we first rule out goddess C using the inverted table, then compute the similarity of A, B, and D to you, then pick the two most similar neighbors, A and B, for K = 2, then compute and sort each item's recommendation score using the similarities of A and B to you, and finally recommend items in order of score.
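Putting the four steps together, a compact end-to-end sketch might look like this. It uses Jaccard as the similarity, so the scores differ from the 0.25 and 0.8 assumed earlier, although the resulting order is the same. The function name `recommend` is hypothetical.

```python
import heapq
from collections import defaultdict

likes = {
    "you": {"macbook", "iphone", "ipad"},
    "A": {"macbook", "iphone", "note2", "xiaomi box", "soap", "candle"},
    "B": {"macbook", "iphone", "ipad", "soap", "moisturizer"},
    "C": {"estee lauder", "sk2", "chanel"},
    "D": {"ipad", "nokia 8250", "subor learning machine"},
}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, likes, k=2, similarity=jaccard):
    mine = likes[target]

    # Step 1a: inverted table, item -> users who like it.
    index = defaultdict(set)
    for user, items in likes.items():
        for item in items:
            index[item].add(user)

    # Step 1b: keep only users who share at least one item with the target.
    candidates = set()
    for item in mine:
        candidates |= index[item]
    candidates.discard(target)

    # Step 2: the k candidates most similar to the target.
    scored = [(similarity(mine, likes[u]), u) for u in candidates]
    neighbors = heapq.nlargest(k, scored)

    # Step 3: score every item the target does not already like.
    scores = defaultdict(float)
    for sim, user in neighbors:
        for item in likes[user] - mine:
            scores[item] += sim

    # Step 4: highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = recommend("you", likes, k=2)
print([item for item, _ in ranking])
# ['soap', 'moisturizer', 'candle', 'note2', 'xiaomi box']
```

In a real system the inverted table would be built once and reused rather than rebuilt on every call.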

See, isn't it simple?

Problems with the algorithm

This algorithm is relatively simple to implement, but it can run into problems in practical applications.

For example, some very popular products are liked by a great many people, so recommending them to you carries little meaning. When computing scores, such products should be given a weight, or they can simply be removed altogether.

Likewise, some items are too general-purpose. When buying books, for instance, reference books such as the Modern Chinese Dictionary or the Xinhua Dictionary are so universal that there is no need to recommend them.

These are all dirty data for a recommendation system. How to get rid of dirty data is a matter for the data preprocessing stage and is not discussed further here.
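The article leaves the details to preprocessing, but the two options mentioned above (weighting popular items or dropping them) could be sketched roughly as follows. The logarithmic penalty, the `drop_above` threshold, and all the numbers are assumptions made for this illustration, not something the algorithm prescribes.

```python
import math


def adjust_for_popularity(scores, popularity, drop_above=None):
    """Down-weight or drop items liked by very many users.

    scores:     item -> raw recommendation score
    popularity: item -> number of users who like the item
    drop_above: if set, items liked by more users than this are removed
    """
    adjusted = {}
    for item, score in scores.items():
        n = max(popularity.get(item, 1), 1)
        if drop_above is not None and n > drop_above:
            continue  # option 2: remove the item entirely
        # Option 1: assumed penalty, dividing by log(1 + number of fans).
        adjusted[item] = score / math.log(1 + n)
    return adjusted


# Made-up numbers: both items start with the same raw score.
scores = {"xinhua dictionary": 1.0, "soap": 1.0}
popularity = {"xinhua dictionary": 1000, "soap": 3}

damped = adjust_for_popularity(scores, popularity)
filtered = adjust_for_popularity(scores, popularity, drop_above=500)
print(damped["soap"] > damped["xinhua dictionary"])  # True
print(sorted(filtered))                              # ['soap']
```

Whether to weight or to drop, and where to set the threshold, would depend on the data at hand.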


