Data Mining Algorithm Practice: Collaborative Filtering

Source: Internet
Author: User
Collaborative Filtering from the Outside

It is increasingly difficult to find useful information on the Internet, and three broad approaches have emerged in response: information retrieval, information filtering, and recommendation systems. Information retrieval means search engines such as Google and Baidu, and is a passive approach: the user has to ask. Information filtering classifies information first and then filters it by user preference; for example, when you register on Zhihu, Douban, or Weibo, you are asked to select domains of interest, and content from those domains is then pushed to you. A recommendation system recommends items of interest based on the user's interests and purchasing behavior; recommendation methods include content-based recommendation, model-based recommendation, association rules, and collaborative filtering.

 

Core Idea of Collaborative Filtering from the Inside

Collaborative filtering rests on two assumptions:

1. A user tends to like items liked by users with similar tastes.

2. A user tends to like items similar to the items they already like.

 

Two Classic Algorithms

1. User-based CF

The core idea of user-based collaborative filtering is to divide the recommendation into two steps:

Step 1: find users whose tastes and preferences are similar to yours;

Step 2: aggregate the items those users like into a ranked list and recommend it to you.

Inputs:

  • user-item matrix # m x n
  • similarity = "euclidean_distance", "cosine", "pearson", "adjusted_cosine" # similarity measure
  • K or threshold # fixed-size neighborhood or threshold neighborhood (or selected automatically by an optimization procedure)
  • type = "predict" or "recommend" # predict ratings or recommend products
  • top-N # number of products to recommend
  • uid # ID of the user receiving recommendations

Outputs:

  • type = "recommend": the top-N items recommended for uid
  • type = "predict": user uid's predicted ratings for items

 

Step 1: calculate similarity(i, uid)

Only the similarity between the target user uid and the other m-1 users needs to be computed here. Alternatively, compute the full m x m user-user similarity matrix and take the uid row.

Similarity can be computed in two ways:

  • between user i and uid over the full item set (all_items);
  • between user i and uid over only the items both users have rated (this requires the user-item matrix to hold ratings rather than 0-1 values, and is mostly used for rating prediction).

Output 1: this step produces a similarity vector of size 1 x (m-1).
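As a sketch of this step, the following computes Pearson similarity restricted to co-rated items between uid and every other user (the function names and the toy matrix are illustrative assumptions, not part of the original text):

```python
import numpy as np

def pearson_on_corated(u, v):
    # Pearson correlation using only the items both users rated (> 0).
    mask = (u > 0) & (v > 0)
    if mask.sum() < 2:
        return 0.0
    a = u[mask] - u[mask].mean()
    b = v[mask] - v[mask].mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float(a @ b / denom) if denom > 0 else 0.0

def similarities_to(user_item, uid):
    # The 1 x (m-1) similarity "vector", keyed by the other users' ids.
    m = user_item.shape[0]
    return {i: pearson_on_corated(user_item[uid], user_item[i])
            for i in range(m) if i != uid}

R = np.array([[5, 3, 0, 1],     # toy m x n rating matrix, 0 = unrated
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4]])
sims = similarities_to(R, 0)    # similarity of users 1..3 to user 0
```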

 

Step 2: K nearest neighbors

  • fixed K: sort similarities in descending order and take the first K users;
  • threshold: take every user whose similarity is greater than or equal to the threshold.

Note: cross-validation can be used to select K and the threshold.

Output 2: this step produces the neighbor set, a vector of size 1 x k.
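A minimal sketch of the two neighborhood-selection rules (the helper name is a hypothetical, not from the original text):

```python
def select_neighbors(sims, k=None, threshold=None):
    # sims: {user_id: similarity to uid}. Pass either a fixed k
    # or a similarity threshold.
    if k is not None:
        return sorted(sims, key=sims.get, reverse=True)[:k]
    return [u for u, s in sims.items() if s >= threshold]

sims = {1: 0.9, 2: -0.3, 3: 0.5, 4: 0.7}
fixed = select_neighbors(sims, k=2)            # two most similar users
thresh = select_neighbors(sims, threshold=0.6) # all users with sim >= 0.6
```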

 

Step 3: predict or recommend

The previous step gave us the k users most similar to the current user uid. Denote the historical purchase records of these k users as knnset_i (i = 1, ..., k).

type = "recommend"

Recommendation mainly targets the 0-1 user-item matrix: recommend items that the neighbors like but the current user has not yet purchased. There are two ways to do this.

The first is simple and ignores the similarity values: recommend the items most frequently purchased by the K nearest neighbors. Proceed as follows:

Extract the rows knnset_i from the original user-item matrix to get the KNN user-item matrix (k x n) -> sum by column to get the purchase frequency fv (1 x n) of the n items -> remove items the current user has already purchased -> sort by frequency in descending order and recommend the first N items.

The second is the same as the first, except that similarity is taken into account when computing the purchase frequency, giving a weighted purchase frequency: before summing by column, multiply each row of the KNN user-item matrix by that neighbor's similarity. The other steps are unchanged.
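Both variants can be sketched in one hypothetical function; `weighted=True` multiplies each neighbor row by its similarity before the column sum (names and toy data are assumptions):

```python
import numpy as np

def recommend(user_item, uid, neighbor_ids, sims, top_n=2, weighted=True):
    # knnset rows for the neighbors (k x n), optionally similarity-weighted.
    knn = user_item[neighbor_ids].astype(float)
    if weighted:
        knn = knn * np.asarray(sims)[:, None]
    freq = knn.sum(axis=0)                 # (weighted) purchase frequency, 1 x n
    freq[user_item[uid] > 0] = -np.inf     # drop items uid already bought
    order = np.argsort(freq)[::-1]         # descending frequency
    return [int(i) for i in order[:top_n] if freq[i] > 0]

B = np.array([[1, 0, 1, 0],                # toy 0-1 user-item matrix
              [1, 1, 1, 0],
              [0, 1, 0, 1]])
recs = recommend(B, 0, neighbor_ids=[1, 2], sims=[0.8, 0.3])
```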

type = "predict"

Prediction estimates the rating a user would give an item (a movie, song, or article) from the user-item rating matrix. The idea is to compute a similarity-weighted average of the neighbors' ratings.

(After obtaining the predicted ratings we can sort p(uid, i) in descending order and take the first N as recommendations; at that point type = 'recommend'.)
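The prediction step can be sketched as a similarity-weighted average over neighbors who rated the item (toy data; normalizing by the sum of absolute similarities is one common convention, assumed here):

```python
import numpy as np

def predict_rating(user_item, uid, item, neighbor_ids, sims):
    # Weighted average of the neighbors' ratings of `item`;
    # neighbors who did not rate the item are skipped.
    num = den = 0.0
    for v, s in zip(neighbor_ids, sims):
        r = user_item[v, item]
        if r > 0:
            num += s * r
            den += abs(s)
    return num / den if den else 0.0

R = np.array([[5, 3, 0, 1],
              [4, 2, 4, 1],
              [1, 1, 5, 5]])
p = predict_rating(R, uid=0, item=2, neighbor_ids=[1, 2], sims=[0.9, 0.1])
```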

 

2. Item-based CF

The core idea of item-based collaborative filtering is to divide the recommendation into two steps:

Step 1: find items similar to the items you have already purchased;

Step 2: aggregate those items into a ranked list and recommend it to you.

Case 1: the user-item matrix is a 0-1 matrix (Amazon's item-to-item CF)

Step 1: calculate similarity(i1, i2)

  • To save computation, first find the item pairs that share at least one customer, excluding the large majority of pairs with no overlap:

For each item I1 in the product catalog
    For each customer C who purchased I1
        For each item I2 purchased by customer C
            Record that a customer purchased I1 and I2
For each item I2
    Compute the similarity between I1 and I2

  • Similarity calculation: all users participate in the computation, not just the users who bought both I1 and I2 (on a 0-1 matrix, restricting to co-purchasers would make every similarity trivially maximal, so such a computation is meaningless).
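The pair-finding loop above can be sketched over basket sets (a hypothetical data layout; this is the intersection-counting step that runs before any similarity is computed):

```python
from collections import defaultdict

def copurchase_counts(baskets):
    # baskets: {customer: set of items}. For every item pair that shares
    # at least one customer, count how many customers bought both.
    counts = defaultdict(int)
    for items in baskets.values():
        items = sorted(items)
        for idx, a in enumerate(items):
            for b in items[idx + 1:]:
                counts[(a, b)] += 1    # pairs with no overlap never appear
    return dict(counts)

baskets = {"c1": {"i1", "i2"}, "c2": {"i1", "i2", "i3"}, "c3": {"i3"}}
counts = copurchase_counts(baskets)
```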

 

Step 2: recommend

Suppose user uid has recently purchased item i -> use the item-similarity matrix to find the k items most similar to i -> remove items already purchased -> recommend the first N.
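A sketch of this lookup chain, assuming the item-similarity matrix is stored as a dict over item pairs (names are illustrative assumptions):

```python
def recommend_similar(item_sim, bought, last_item, k=3, top_n=2):
    # item_sim: {(a, b): similarity}, each symmetric pair stored once.
    sims = {}
    for (a, b), s in item_sim.items():
        if a == last_item:
            sims[b] = s
        elif b == last_item:
            sims[a] = s
    ranked = sorted(sims, key=sims.get, reverse=True)[:k]  # k most similar
    return [i for i in ranked if i not in bought][:top_n]  # drop purchased

item_sim = {("i1", "i2"): 0.9, ("i1", "i3"): 0.4, ("i2", "i3"): 0.7}
recs = recommend_similar(item_sim, bought={"i1"}, last_item="i1")
```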

 

Case 2: the user-item matrix is a rating matrix

Step 1: calculate similarity(i, j)

The similarity computation differs here: only the users who have rated both item_i and item_j are used.

This step produces the n x n item-similarity matrix.

 

Step 2: predict

Symmetric to the user-based case: predict uid's rating of item i as the similarity-weighted average of uid's own ratings of the k items most similar to i.
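Item-based prediction can be done symmetrically to the user-based predict step: a similarity-weighted average of the user's own ratings of the items most similar to the target. This is a sketch under that assumption; the function name and data layout are hypothetical:

```python
def predict_item_based(user_ratings, item_sims, k=2):
    # user_ratings: {item: uid's rating}; item_sims: {item: sim(item, target)}
    # over items uid has rated. Weighted average over the k most similar.
    top = sorted(item_sims, key=item_sims.get, reverse=True)[:k]
    num = sum(item_sims[j] * user_ratings[j] for j in top)
    den = sum(abs(item_sims[j]) for j in top)
    return num / den if den else 0.0

p = predict_item_based({"i1": 4, "i2": 2, "i3": 5},
                       {"i1": 0.9, "i2": 0.1, "i3": 0.8})
```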

 

Variants

1. Slope One

Core idea: replace the similarity-weighted average with a simple linear predictor f(x) = x + b, where the free parameter b is simply the average difference between the two items' ratings.

How:

  • for each item pair (item_t, item_i), compute the average rating difference b_i and the number n_i of users who rated both items;
  • predict the user's rating of item_t as the average of (rating(item_i) + b_i) over the items the user has rated, weighted by n_i.
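The weighted Slope One scheme can be sketched as follows (the dict-of-dicts layout and names are assumptions for illustration):

```python
def slope_one_predict(user_ratings, all_ratings, target):
    # user_ratings: {item: rating} for the target user.
    # all_ratings: one {item: rating} dict per user in the data set.
    # Prediction = n_i-weighted average of (r_i + b_i), where b_i is the
    # average difference (target - item_i) over users who rated both.
    num = den = 0.0
    for i, r_i in user_ratings.items():
        diffs = [u[target] - u[i] for u in all_ratings
                 if target in u and i in u]
        if diffs:
            b_i = sum(diffs) / len(diffs)
            num += (r_i + b_i) * len(diffs)
            den += len(diffs)
    return num / den if den else 0.0

data = [{"a": 5, "b": 3, "c": 2},
        {"a": 3, "b": 4},
        {"b": 2, "c": 5}]
p = slope_one_predict({"a": 2, "c": 3}, data, "b")
```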

2. Time weight

Core idea: traditional collaborative filtering ignores changes in user interest over time, and those changes degrade recommendation quality.

How:

  • design a time-weight function f(t);
  • use f(t) to give each rating a time weight;
  • compute similarity from the time-weighted ratings; everything else is unchanged.
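One plausible choice of f(t) is exponential decay with a half-life; the specific form and parameter below are assumptions, since the text does not fix a function:

```python
def time_weight(t_now, t_rating, half_life=30.0):
    # A rating made `half_life` days ago contributes half as much
    # as a rating made today.
    age_days = t_now - t_rating
    return 0.5 ** (age_days / half_life)

w_new = time_weight(100, 100)  # fresh rating, full weight
w_old = time_weight(100, 70)   # 30 days old, half weight
```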

3. Popular items

Core idea: shared behavior on unpopular (niche) items says more about the similarity between two users than shared behavior on popular items, so reduce the influence of popular items when computing user similarity.

How: weight each item by an inverse popularity factor 1/N(i), where N(i) is the number of users who have interacted with item i.
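A tiny sketch of the 1/N(i) weight for concreteness (the function name is an assumption):

```python
def popularity_weight(n_i):
    # n_i: number of users who interacted with item i. Niche items
    # (small n_i) keep high weight; popular items are down-weighted.
    return 1.0 / n_i

w_niche = popularity_weight(1)
w_popular = popularity_weight(10)
```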

4. Based on high-rating records

Core idea: a recommendation algorithm usually aims to recommend the items a user is most interested in, so it pays to find the most similar users within the areas the user cares most about.

How: in the similarity computation, consider only records with high ratings, for example ratings above the user's average rating.

 

