Recommender Systems, Week 2

Source: Internet
Author: User

Categories of recommender systems

By application domain: e-commerce recommendation, social friend recommendation, search engine recommendation, information content recommendation
By design approach: collaborative filtering recommendation, content-based recommendation, knowledge-based recommendation, hybrid recommendation
By the data used: recommendation based on user behavior data, on user tags, on social network data, and on contextual information (time context, location context, etc.)

The basic idea of collaborative filtering

Collaborative filtering generally finds, among a large number of users, a small group whose tastes are similar to yours.

In collaborative filtering, these users become your neighbors, and what they like is organized into a ranked list that is recommended to you.
Core issues:
How do you determine whether a user has tastes similar to yours?
How do you organize your neighbors' preferences into a ranked list?

Steps to implement collaborative filtering

1. Collect user preferences
2. Find similar users or items
3. Compute recommendations

Ways to collect User preferences

Similarity

Once user behavior has been analyzed, we can compute similar users and items from the users' preferences, and then make recommendations based on those similar users or items.

These are the two most typical branches of CF: user-based CF and item-based CF. Both methods need to compute similarity.
Treat the data as vectors in space (after noise reduction and normalization).

Calculation of distance

Euclidean distance
Other distances

Calculate similarity based on distance
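
As a minimal sketch of this step, the code below computes the Euclidean distance between two preference vectors and turns it into a similarity. The mapping 1 / (1 + d) is a common choice and is an assumption here, since the text does not give its formula; the function names and sample ratings are hypothetical.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two equal-length preference vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def distance_similarity(x, y):
    """Map a distance in [0, inf) to a similarity in (0, 1].

    Assumption: similarity = 1 / (1 + distance); identical vectors give 1.0.
    """
    return 1.0 / (1.0 + euclidean_distance(x, y))


# Hypothetical ratings of the same three items by two users.
alice = [5.0, 3.0, 4.0]
bob = [2.0, 3.0, 0.0]  # differs from alice by 3, 0 and 4

print(euclidean_distance(alice, bob))   # 5.0
print(distance_similarity(alice, bob))  # 1/6, about 0.167
```

The smaller the distance, the closer the similarity is to 1, so the same ranking of neighbors results from either quantity.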

Calculation of similarity based on correlation coefficients

Pearson correlation coefficient
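
A small sketch of the Pearson correlation coefficient over two rating vectors follows. The function name and the sample data are hypothetical, and returning 0.0 when one vector is constant is an assumption made to avoid dividing by zero.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length rating vectors, in [-1, 1]."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant vector counts as uncorrelated
    return cov / math.sqrt(var_x * var_y)


# The second user rates every item one point lower, yet the taste is the same.
print(pearson([5.0, 3.0, 4.0], [4.0, 2.0, 3.0]))  # 1.0
```

Because the means are subtracted first, the coefficient ignores the fact that some users rate systematically higher or lower than others.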

Calculation of similarity based on the cosine of the angle
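
A minimal sketch of cosine similarity between two preference vectors is shown below. The function name is hypothetical, and returning 0.0 for an all-zero vector is an assumption.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length preference vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: an empty profile is similar to nothing
    return dot / (norm_x * norm_y)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0, no item in common
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # about 1.0, same direction
```

Only the direction of the vectors matters, so a user who gives uniformly doubled scores is still treated as having the same taste.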

Calculation of similarity based on Tanimoto coefficients
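
The Tanimoto coefficient is usually applied to preferences without rating values (liked or not liked). The sketch below assumes that each user is represented by the set of items he liked; the function name and item labels are hypothetical.

```python
def tanimoto(items_a, items_b):
    """Size of the intersection divided by size of the union of two item sets."""
    a, b = set(items_a), set(items_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty profiles are not similar
    return len(a & b) / len(union)


# Two users share 2 of the 4 distinct items they liked.
print(tanimoto({"i1", "i2", "i3"}, {"i2", "i3", "i4"}))  # 0.5
```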

Selecting neighbors (users, items)

Fixed number of neighbors: K-neighborhoods
Neighbors based on a similarity threshold: threshold-based neighborhoods
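
The two ways of selecting neighbors can be sketched as follows, assuming the similarities to the target user have already been computed and stored in a dictionary. The function names and similarity values are hypothetical.

```python
def k_nearest(similarities, k):
    """Fixed-size neighborhood: the k most similar candidates."""
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def within_threshold(similarities, threshold):
    """Threshold neighborhood: every candidate at least this similar."""
    return sorted(name for name, s in similarities.items() if s >= threshold)


# Hypothetical similarities between the target user and four other users.
sims = {"B": 0.2, "C": 0.9, "D": 0.6, "E": 0.1}

print(k_nearest(sims, 2))            # ['C', 'D']
print(within_threshold(sims, 0.5))   # ['C', 'D']
print(within_threshold(sims, 0.95))  # []
```

A fixed K always returns neighbors, even poor ones, while a threshold may return none at all for a user with unusual tastes.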

Recommendation algorithm: user-based collaborative filtering (UserCF)

User-based collaborative filtering measures the similarity between users through their ratings of items, and makes recommendations based on that similarity.
Put simply: recommend to a user the items liked by other users who share his interests.

The basic idea of UserCF is quite simple: find neighboring users based on users' preferences for items, then recommend what those neighbors like to the current user.
Computationally, each user's preferences for all items are treated as a vector, and the similarity between users is computed from these vectors. After the K nearest neighbors are found, the current user's preference for items he has not yet interacted with is predicted from the neighbors' similarity weights and their preferences for those items, which yields a ranked list of items as the recommendation.
For example, for user A, suppose the historical preferences yield only one neighbor, user C; item D, which user C likes, is then recommended to user A.
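
The user A / user C example can be sketched in a few lines. This is an illustration under stated assumptions, not Mahout's implementation: the ratings are invented, cosine similarity is used as the similarity measure, and the prediction is a similarity-weighted average of the neighbors' ratings.

```python
import math

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "A": {"a": 5.0, "c": 4.0},
    "B": {"b": 4.0},
    "C": {"a": 4.0, "c": 5.0, "d": 4.0},
}


def user_similarity(u, v):
    """Cosine similarity of two rating dicts; unrated items count as 0."""
    dot = sum(r * v[item] for item, r in u.items() if item in v)
    norm = math.sqrt(sum(r * r for r in u.values())) * math.sqrt(
        sum(r * r for r in v.values())
    )
    return dot / norm if norm else 0.0


def recommend_user_cf(user, ratings, k=1):
    """Rank items the user has not rated, using the k most similar users."""
    sims = {
        other: user_similarity(ratings[user], prefs)
        for other, prefs in ratings.items()
        if other != user
    }
    neighbors = sorted(sims, key=sims.get, reverse=True)[:k]
    weighted, weights = {}, {}
    for n in neighbors:
        if sims[n] <= 0:
            continue  # ignore users with nothing in common
        for item, r in ratings[n].items():
            if item in ratings[user]:
                continue  # only predict items the user has not rated
            weighted[item] = weighted.get(item, 0.0) + sims[n] * r
            weights[item] = weights.get(item, 0.0) + sims[n]
    predicted = {item: weighted[item] / weights[item] for item in weighted}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)


print(recommend_user_cf("A", ratings, k=1))  # item 'd', from neighbor C
```

With k=1 the only neighbor of user A is user C, so the single recommended item is d, predicted at user C's own rating of it.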

Item-based collaborative filtering (ItemCF)

Item-based collaborative filtering measures the similarity between items through users' ratings of them, and makes recommendations based on that similarity.
Put simply: recommend to a user items similar to the ones he liked before.

The principle of ItemCF is similar to that of UserCF, except that neighbors are computed from the items themselves rather than from the users' point of view. That is, similar items are found based on users' preferences for items, and items similar to those in a user's historical preferences are then recommended to him.
From a computational point of view, all users' preferences for an item are treated as a vector, and the similarity between items is computed from these vectors. Once an item's similar items are known, the current user's preference for items he has not yet rated is predicted from his historical preferences, which yields a ranked list of items as the recommendation.
For example, for item A, the historical preferences of all users show that users who like item A also like item C, so item A and item C are similar. Since user C likes item A, it can be inferred that user C may also like item C.
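
The item A / item C example can be sketched as follows. The data is invented, preferences are treated as simple likes without rating values, and the cosine over the sets of users who liked each item is one possible similarity measure; none of this is taken from the original material.

```python
import math
from collections import defaultdict

# Hypothetical "likes": user -> set of liked items.
likes = {
    "userA": {"itemA", "itemC"},
    "userB": {"itemA", "itemB", "itemC"},
    "userC": {"itemA"},
}


def item_users(likes):
    """Invert the data: item -> set of users who liked it."""
    users_of = defaultdict(set)
    for user, items in likes.items():
        for item in items:
            users_of[item].add(user)
    return dict(users_of)


def item_similarity(users_of, i, j):
    """Cosine similarity of two items over the users who liked them."""
    common = len(users_of[i] & users_of[j])
    return common / math.sqrt(len(users_of[i]) * len(users_of[j]))


def recommend_item_cf(user, likes):
    """Score each unseen item by its similarity to the items the user liked."""
    users_of = item_users(likes)
    scores = defaultdict(float)
    for liked in likes[user]:
        for candidate in users_of:
            if candidate in likes[user]:
                continue
            scores[candidate] += item_similarity(users_of, liked, candidate)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print([item for item, _ in recommend_item_cf("userC", likes)])
# ['itemC', 'itemB']
```

Item C ranks first for user C because two of the three users who liked item A also liked item C, while only one of them liked item B.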

User CF vs. Item CF

In e-commerce, the number of users is generally much larger than the number of items, so Item CF is less expensive to compute.
On non-social sites, the intrinsic links between pieces of content are an important recommendation principle, and are more effective than recommendation based on similar users.

For example, on a book-selling site, when you look at a book the recommendation engine suggests related books, and this recommendation matters far more than the general recommendations shown to the user on the home page.

As you can see, in this case Item CF recommendations become an important way of guiding the user's browsing.

Item-based collaborative filtering is currently the most widely used recommendation algorithm in e-commerce.
1. On social networking sites, User CF is a better choice; combined with social network information, User CF can make the explanation of a recommendation more convincing.
2. Recommendation diversity and precision
3. Users' adaptability to the recommendation algorithm

Implementation of the item-based collaborative filtering algorithm

It is divided into 2 steps:
1. Compute the similarity between items
2. Generate a recommendation list for the user based on item similarity and the user's historical behavior

Case study

Consider a film review website. Its main features include film introductions, film rankings, user ratings of films, user reviews, video and ticket purchase, the movies a user is watching or wants to see, and "Guess you like" (recommendations).
After registering, a user can browse the film introductions on the site, view the film leaderboards, choose the categories he likes, find a movie he wants to see and mark it as "want to see", and write reviews and give ratings for movies he has already seen.

Requirements analysis: introducing the case

From this brief description we can see that the site offers a personalized movie recommendation service:
Core points:
– The site provides information on all movies, which attracts users to browse
– The site collects user behavior, including browsing, rating, and commenting, and infers the user's interests from it
– The site helps users find movies they have not yet seen that match their interests
– The site accumulates massive amounts of data, from which it predicts the market impact and box office of upcoming films
Movie recommendation will be the core feature of this site.

Factors to consider

When designing a recommender for a real environment, consider factors such as data volume, algorithm performance, and accuracy of results.
1. Algorithm selection: item-based collaborative filtering (ItemCF), with a parallel implementation
2. Data volume: whether GB, TB, or PB-scale data must be supported on a big data architecture
3. Algorithm evaluation: can be judged by precision, recall, coverage, popularity, and other metrics
4. Interpretation of results: the definition of ItemCF allows a reasonable explanation of each result

Test Data Set

The implementation follows the item-based collaborative filtering algorithm in Chapter 1, Section 6 of the book Mahout in Action.
Test Data set: Small.csv
Each line has 3 fields: the user ID, the movie ID, and the user's rating of the movie (0 to 5 points, in steps of 0.5).

Steps

1. Build the item co-occurrence matrix
2. Build the user-to-item rating matrix
3. Multiply the matrices to compute the recommendation results

Step 1: Build the item co-occurrence matrix

Group by user, find the items each user selected, then count each item on its own and count each pair of items that occur together.

Step 2: Build the user-to-item rating matrix

Group by user, and find the items each user selected together with their ratings.

Step 3: Compute the recommendation results by matrix multiplication

Co-occurrence matrix * rating matrix = recommendation results

Algorithm evaluation

Mahout provides 2 metrics for evaluating a recommender, precision and recall, both of which are classic measures from search engines (information retrieval).
A: retrieved and relevant (found, and wanted)
B: not retrieved but relevant (not found, but actually wanted)
C: retrieved but not relevant (found, but useless)
D: not retrieved and not relevant (not found, and not wanted)

The more of the relevant items are retrieved the better; this is the pursuit of recall, A/(A+B), and the larger the better.

Among the retrieved items, the more relevant and the fewer irrelevant the better; this is the pursuit of precision, A/(A+C), and the larger the better.
On large data sets these two metrics constrain each other. When you want to retrieve more data, precision drops; when you want the results to be more precise, you retrieve less data.
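
The two ratios can be computed directly from the sets of retrieved (recommended) and relevant items. This is a plain sketch of the definitions, not Mahout's evaluator; the function name and movie IDs are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) using the A, B, C counts defined above."""
    retrieved, relevant = set(retrieved), set(relevant)
    a = len(retrieved & relevant)  # A: retrieved and relevant
    b = len(relevant - retrieved)  # B: relevant but not retrieved
    c = len(retrieved - relevant)  # C: retrieved but not relevant
    precision = a / (a + c) if retrieved else 0.0
    recall = a / (a + b) if relevant else 0.0
    return precision, recall


# 4 movies recommended, 3 movies the user actually wanted, 2 in common.
precision, recall = precision_recall({101, 102, 103, 104}, {102, 104, 105})
print(precision)  # 0.5
print(recall)     # 2/3, about 0.667
```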

Slope One algorithm

Slope One is a lightweight CF recommendation strategy provided by Mahout. It is an approach to rating-based collaborative filtering proposed by Daniel Lemire and Anna Maclachlan in 2005.
Slope One is a simple and efficient collaborative filtering algorithm that predicts ratings from average rating differences.
The core advantage of Slope One is that it maintains good computing speed and recommendation quality on large-scale data.
This algorithm has been marked @Deprecated in mahout-0.8.

The idea behind the algorithm

The basic idea of Slope One is to treat the relationship between users' ratings of different items as a simple linear relationship, Y = mX + b; when m = 1, this is Slope One.

Resources

Wikipedia's introduction to Slope One: http://en.wikipedia.org/wiki/Slope_One
Original paper: http://www.daniel-lemire.com/fr/abstracts/SDM2005.html

Other recommendation algorithms supported by Mahout

KNN linear interpolation item-based recommendation algorithm
SVD recommendation algorithm
Tree clustering-based recommendation algorithm
The above algorithms have been marked @Deprecated in mahout-0.8.

Summary of recommendation algorithms supported by Mahout
