Recommendation System Practice: User-Based Collaborative Filtering Algorithms

Source: Internet
Author: User


The neighborhood-based algorithm is the most basic algorithm in recommendation systems. It is not only deeply studied in academia but also widely used in industry. Neighborhood-based algorithms fall into two categories: user-based collaborative filtering and item-based collaborative filtering.

Let's take a look at the user-based collaborative filtering algorithm. The general idea of the item-based collaborative filtering algorithm is similar, so you can study it by comparison on your own.

User-based collaborative filtering algorithms

At the beginning of every new semester, new students always ask their seniors similar questions, such as "Which professional books should I buy?" and "Which papers should I read?" The seniors generally give them some recommendations. This is a real-life example of personalized recommendation. In this example, a junior student may consult several seniors and then make a final judgment. The junior asks the seniors partly because they have a social relationship and trust each other, but the more important reason is that they share a common research field and common interests. Therefore, in an online personalized recommendation system, when user A needs personalized recommendations, we can find other users whose interests are similar to A's, and then recommend to A the items those users like that A has never heard of. This method is called the user-based collaborative filtering algorithm.

The user-based collaborative filtering algorithm consists of two steps.

(1) Find a set of users whose interests are similar to the target user's.

(2) Find the items that users in this set like and that the target user has never heard of, and recommend them to the target user.

The key to step (1) is calculating the similarity between two users. Here, collaborative filtering algorithms use behavior similarity as a proxy for interest similarity. Given users u and v, let N(u) denote the set of items on which user u has given positive feedback, and N(v) the corresponding set for user v. We can then compute the interest similarity between u and v with the Jaccard coefficient or with the cosine formula:

Jaccard coefficient:

    w(u, v) = |N(u) ∩ N(v)| / |N(u) ∪ N(v)|

Cosine similarity:

    w(u, v) = |N(u) ∩ N(v)| / sqrt(|N(u)| * |N(v)|)
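As a quick check on these two formulas, here is a small self-contained sketch on two hypothetical positive-feedback sets (the item names are made up for illustration):

```python
import math

# Hypothetical positive-feedback item sets for users u and v
N_u = {"a", "b", "d"}
N_v = {"a", "c"}

common = len(N_u & N_v)                           # |N(u) ∩ N(v)| = 1
jaccard = common / len(N_u | N_v)                 # 1 / 4
cosine = common / math.sqrt(len(N_u) * len(N_v))  # 1 / sqrt(6)

print(jaccard)            # 0.25
print(round(cosine, 4))   # 0.4082
```

The cosine value is higher than the Jaccard value here because the cosine denominator grows more slowly than the size of the union.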
Taking cosine similarity as an example, the following pseudocode implements this calculation on behavior records:

import math

def UserSimilarity(train):
    W = dict()
    for u in train.keys():
        for v in train.keys():
            if u == v:
                continue
            W.setdefault(u, dict())
            W[u][v] = len(train[u] & train[v])
            W[u][v] /= math.sqrt(len(train[u]) * len(train[v]) * 1.0)
    return W

The time complexity of this method is O(|U| * |U|), which is very time-consuming when the number of users is large. In practice, many pairs of users have not acted on any common item, that is, |N(u) ∩ N(v)| = 0, so the algorithm above wastes a lot of time computing similarities that are zero. A better idea is to first find only the user pairs (u, v) with |N(u) ∩ N(v)| != 0, and only then divide by the denominator sqrt(|N(u)| * |N(v)|).

To do this, first build an inverted table from items to users, which stores for each item the list of users who have acted on it. Let the sparse matrix C[u][v] = |N(u) ∩ N(v)|. If users u and v both appear in the user lists of K items in the inverted table, then C[u][v] = K. So we can scan the user list of each item in the inverted table and, for every pair of users in that list, add 1 to the corresponding C[u][v]; in the end we obtain the nonzero C[u][v] for all user pairs.

import math
from collections import defaultdict

def UserSimilarity(train):
    # build inverted table: item -> users who acted on it
    item_users = dict()
    for u, items in train.items():
        for i in items:
            if i not in item_users:
                item_users[i] = set()
            item_users[i].add(u)

    # calculate co-rated items between users
    C = defaultdict(lambda: defaultdict(int))
    N = defaultdict(int)
    for i, users in item_users.items():
        for u in users:
            N[u] += 1
            for v in users:
                if u == v:
                    continue
                C[u][v] += 1

    # calculate final similarity matrix W
    W = dict()
    for u, related_users in C.items():
        W[u] = dict()
        for v, cuv in related_users.items():
            W[u][v] = cuv / math.sqrt(N[u] * N[v])
    return W

 

The sparse matrix is built following this idea: for item a, add 1 to C[A][B] and C[B][A]; for item b, add 1 to C[A][C] and C[C][A]; and so on. After scanning all items, we obtain the complete co-occurrence matrix C, which is the numerator of the cosine similarity; dividing each entry by the denominator sqrt(|N(u)| * |N(v)|) gives the final user interest similarity W.
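To make the inverted-table idea concrete, here is a self-contained sketch on a hypothetical toy dataset (the users A–C and items a–c are made up): it builds the item-to-users table, accumulates the co-occurrence counts, and normalizes by the cosine denominator.

```python
import math
from collections import defaultdict

# Hypothetical behavior data: user -> set of items acted on
train = {
    "A": {"a", "b"},
    "B": {"a", "c"},
    "C": {"b", "c"},
}

# Inverted table: item -> set of users who acted on it
item_users = defaultdict(set)
for u, items in train.items():
    for i in items:
        item_users[i].add(u)

# Co-occurrence counts C[u][v] and per-user item counts N[u]
C = defaultdict(lambda: defaultdict(int))
N = defaultdict(int)
for i, users in item_users.items():
    for u in users:
        N[u] += 1
        for v in users:
            if u != v:
                C[u][v] += 1

# Normalize co-occurrence counts to cosine similarity
W = {u: {v: cuv / math.sqrt(N[u] * N[v]) for v, cuv in related.items()}
     for u, related in C.items()}

print(W["A"]["B"])  # 1 / sqrt(2 * 2) = 0.5
```

Each pair of users here shares exactly one item and has acted on two items each, so every similarity comes out to 0.5.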

After obtaining the similarity between users, the UserCF algorithm recommends to each user the items liked by the K users whose interests are most similar to theirs. The following formula measures user u's interest in item i in the UserCF algorithm:

    p(u, i) = sum over v in S(u, K) ∩ N(i) of w_uv * r_vi

Here S(u, K) contains the K users whose interests are closest to user u's, N(i) is the set of users who have acted on item i, w_uv is the similarity between user u and user v, and r_vi represents user v's interest in item i. Because implicit feedback from a single kind of behavior is used, all r_vi = 1.

The following code implements the above UserCF recommendation algorithm:

from operator import itemgetter

def Recommend(user, train, W, K):
    rank = dict()
    interacted_items = train[user]
    for v, wuv in sorted(W[user].items(), key=itemgetter(1), reverse=True)[0:K]:
        for i, rvi in train[v].items():
            if i in interacted_items:
                # filter out items the user has interacted with before
                continue
            rank[i] = rank.get(i, 0) + wuv * rvi
    return rank

Take K = 3. User A has had no behavior on items c and e, so these two items can be recommended to user A. According to the UserCF formula, user A's interest in items c and e is obtained by summing the similarity-weighted contributions of the neighbors who acted on each item.
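Here is a self-contained sketch of this scoring step on hypothetical data (the users, items, and similarity values below are made up for illustration; with implicit feedback every rvi is 1):

```python
from operator import itemgetter

# Hypothetical training data: user -> {item: interest}, implicit feedback so all 1
train = {
    "A": {"a": 1, "b": 1, "d": 1},
    "B": {"a": 1, "c": 1},
    "C": {"b": 1, "e": 1},
    "D": {"c": 1, "d": 1, "e": 1},
}

# Hypothetical precomputed similarities of user A to the other users
W = {"A": {"B": 0.41, "C": 0.41, "D": 0.33}}

def recommend(user, train, W, K):
    rank = {}
    interacted = train[user]
    # take the K most similar users and accumulate their weighted interests
    for v, wuv in sorted(W[user].items(), key=itemgetter(1), reverse=True)[:K]:
        for i, rvi in train[v].items():
            if i in interacted:
                continue  # skip items user A already acted on
            rank[i] = rank.get(i, 0) + wuv * rvi
    return rank

print(recommend("A", train, W, K=3))
# item c gets 0.41 (from B) + 0.33 (from D); item e gets 0.41 (from C) + 0.33 (from D)
```

Only c and e appear in the result, because a, b, and d are filtered out as items user A has already interacted with.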

Improvement of User Similarity Calculation

If two users have both bought the Xinhua Dictionary, this does not mean their interests are similar, because the vast majority of Chinese readers have bought the Xinhua Dictionary at some point. However, if two users have both bought "Introduction to Data Mining", we can assume their interests are similar, because only people who study data mining would buy that book. In other words, two users performing the same behavior on an unpopular item says more about the similarity of their interests. Therefore, John S. Breese proposed the following formula in his paper to calculate interest similarity from user behavior:

    w(u, v) = (sum over i in N(u) ∩ N(v) of 1 / log(1 + |N(i)|)) / sqrt(|N(u)| * |N(v)|)

The 1 / log(1 + |N(i)|) term in the numerator penalizes the influence of popular items in the shared interest list of users u and v on their similarity. N(i) is the set of users who have acted on item i; the more popular the item, the larger |N(i)| and the smaller its contribution.

import math
from collections import defaultdict

def UserSimilarity(train):
    # build inverted table: item -> users who acted on it
    item_users = dict()
    for u, items in train.items():
        for i in items:
            if i not in item_users:
                item_users[i] = set()
            item_users[i].add(u)

    # calculate co-rated items between users, penalizing popular items
    C = defaultdict(lambda: defaultdict(float))
    N = defaultdict(int)
    for i, users in item_users.items():
        for u in users:
            N[u] += 1
            for v in users:
                if u == v:
                    continue
                C[u][v] += 1 / math.log(1 + len(users))

    # calculate final similarity matrix W
    W = dict()
    for u, related_users in C.items():
        W[u] = dict()
        for v, cuv in related_users.items():
            W[u][v] = cuv / math.sqrt(N[u] * N[v])
    return W
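To see the effect of the penalty term in isolation, here is a small self-contained sketch (the item popularity counts are hypothetical): a popular item shared by many users contributes far less to the co-occurrence count than a niche one.

```python
import math

# Hypothetical numbers of users who acted on each item, i.e. |N(i)|
popular_item_users = 10000   # e.g. a dictionary nearly everyone owns
niche_item_users = 5         # e.g. a specialist monograph

# Contribution of each shared item to C[u][v] under the penalized formula
penalized_popular = 1 / math.log(1 + popular_item_users)
penalized_niche = 1 / math.log(1 + niche_item_users)

print(round(penalized_popular, 4))  # 0.1086
print(round(penalized_niche, 4))    # 0.5581
```

Sharing the niche item counts roughly five times as much toward the similarity as sharing the very popular one, which matches the intuition in the Xinhua Dictionary example above.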

 
