[Recommender system paper notes] A summary of evaluation methods for personalized recommender systems (concepts and introduction)

A Survey of Evaluation Methods for Personalized Recommender Systems

As the title suggests, this Chinese-language paper surveys how recommender systems are evaluated, that is, how to judge the strengths and weaknesses of a recommender system.

  • Introduction

 

1. A personalized recommender system establishes a relationship between users and products. Based on each user's existing selections or on similarity relationships, it mines the items that the user may be interested in and recommends them. In essence, this is information filtering.
2. A complete recommender system consists of three parts:
    a behavior recording module that collects user information;
    a model analysis module that analyzes user preferences;
    a recommendation algorithm module (the core), which uses one of the following:
        I. collaborative filtering algorithms;
        II. content-based algorithms;
        III. algorithms based on the user-product graph relationship;
        IV. hybrid algorithms.
3. Evaluating recommendation algorithms is difficult for four reasons:
    1) different algorithms perform differently on different datasets;
    2) the goals of evaluation differ;
    3) it is unclear whether online user tests are required for different data;
    4) it is hard to choose which metrics to combine for a comprehensive evaluation.
These four factors directly determine how objective and reasonable an evaluation is.

  • Accuracy Metrics

1. Prediction Accuracy

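Prediction accuracy measures how close the ratings predicted by the recommendation algorithm are to the ratings users actually give.

Mean absolute error (MAE):

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i|$$

where $p_i$ is the predicted rating of the $i$-th product, $r_i$ is the user's actual rating, and $n$ is the number of rated products.

Other metrics related to MAE include the mean squared error (MSE) and the normalized mean absolute error (NMAE).

Mean squared error (MSE):

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2$$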

Example: a movie rating system shows the average number of stars that other users gave a movie (much as Douban does for book reviews) and also predicts the number of stars a given user will give. Prediction accuracy measures the difference between the number of stars the system predicts and the number of stars the user actually gives. The rating range is [0, 10].

Predictions for one user
                          Movie 1   Movie 2   Movie 3   Movie 4
System predicted score    10        4         7         9
User score                8         5         7         6

 

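Mean absolute error (MAE):

$$\mathrm{MAE} = \frac{|10-8| + |4-5| + |7-7| + |9-6|}{4} = \frac{6}{4} = 1.5$$

Mean squared error (MSE):

$$\mathrm{MSE} = \frac{(10-8)^2 + (4-5)^2 + (7-7)^2 + (9-6)^2}{4} = \frac{14}{4} = 3.5$$

Normalized mean absolute error (MAE divided by the width of the rating range):

$$\mathrm{NMAE} = \frac{\mathrm{MAE}}{r_{\max} - r_{\min}} = \frac{1.5}{10 - 0} = 0.15$$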
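As a minimal sketch, the three error measures can be computed for the table above in a few lines of Python. The function names are illustrative and not taken from the paper; NMAE is assumed to be MAE divided by the width of the rating range, here [0, 10].

```python
def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def mse(predicted, actual):
    """Mean squared error between predicted and actual ratings."""
    return sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual)


def nmae(predicted, actual, r_min, r_max):
    """MAE normalized by the width of the rating scale (assumed definition)."""
    return mae(predicted, actual) / (r_max - r_min)


# Ratings from the table: system predictions vs. the user's actual scores.
predicted = [10, 4, 7, 9]
actual = [8, 5, 7, 6]

print(mae(predicted, actual))           # 1.5
print(mse(predicted, actual))           # 3.5
print(nmae(predicted, actual, 0, 10))   # 0.15
```

MSE penalizes the single 3-star miss on Movie 4 much more heavily than MAE does, which is why the two values diverge even on four ratings.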

 

Prediction accuracy (MAE)
Advantages:
1. It is simple to compute and easy to understand.
2. Each system has a single MAE value, so two systems can be compared directly by the difference between their MAEs.
Disadvantages:
1. It is not suitable for binary preference data, such as like or dislike.
2. It is not suitable for systems that care only about prediction error at the top of the recommendation list and not much about the overall error.
3. It is not suitable when users distinguish preferences only coarsely, because such users care only about how often a good product is misclassified as bad or a bad product as good. For example, if 3.5 stars is the threshold between good and bad, predicting 5 for a product the user rates 4, or 2 for a product the user rates 3, makes no difference to the user.

 

2. Classification Accuracy

Classification accuracy is the proportion of cases in which the recommendation algorithm correctly judges whether a user likes a product. It is therefore the more appropriate measure when users express only binary choices.

Precision and recall:

Precision is the fraction of the products in the system's recommendation list that the user likes. It indicates how likely the user is to be interested in a recommended product.

Recall is the fraction of all the products the user likes in the system that appear in the recommendation list. It indicates how likely a product the user likes is to be recommended.

 

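For example, if the recommendation list contains 9 products, 4 of which the user likes, and the user likes 11 products in the system in total, then P = 4/9 and R = 4/11.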

The main drawback of precision and recall for recommender systems is that they must be used together to evaluate an algorithm fully.

To weigh precision and recall at the same time, Pazzani M. et al. proposed the F measure, defined as:

$$F = \frac{2PR}{P + R}$$
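The following sketch computes precision, recall, and the F measure from a recommendation list and a set of liked products. It assumes the usual harmonic-mean form F = 2PR / (P + R); the item IDs are hypothetical and chosen only to reproduce P = 4/9 and R = 4/11.

```python
from fractions import Fraction


def precision_recall_f(recommended, liked):
    """Return (precision, recall, F) as exact fractions.

    Assumes both collections are non-empty.
    """
    recommended, liked = set(recommended), set(liked)
    hits = len(recommended & liked)          # liked products that were recommended
    precision = Fraction(hits, len(recommended))
    recall = Fraction(hits, len(liked))
    if hits == 0:
        return precision, recall, Fraction(0)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical IDs: 9 recommended products, 11 liked products, 4 in common.
recommended = list(range(1, 10))                  # products 1..9
liked = [1, 2, 3, 4] + list(range(100, 107))      # 4 recommended + 7 others

p, r, f = precision_recall_f(recommended, liked)
print(p, r, f)   # 4/9 4/11 2/5
```

Using `Fraction` keeps the ratios exact, which makes it easy to check the result against a hand calculation.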

Another important measure of classification accuracy is the receiver operating characteristic (ROC) curve.

For more information, see Baidu Baike: http://baike.baidu.com/view/42249.htm?fr=aladdin#2

3. Ranking Accuracy

Ranking accuracy measures how well the ordered list produced by the recommendation algorithm matches the user's own ordering of the products.

The average ranking score is used to measure the ranking accuracy of a recommender system. The ranking score of a product is defined as follows:

$$RS = \frac{L_i}{N}$$

where $N$ is the number of products the user has not selected in the training set, and $L_i$ is the position in the recommendation list of the product to be predicted from the prediction (test) set. Averaging over all products to be predicted gives the average ranking score. The smaller it is, the higher the ranking accuracy.
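A small sketch of the average ranking score for one user, assuming the ranking score of a product is its list position divided by N. The positions and N below are hypothetical.

```python
def average_ranking_score(positions, n_unselected):
    """Average of L_i / N over the products to be predicted.

    positions    -- 1-based positions L_i of the test products in the list
    n_unselected -- N, the number of products the user has not selected
    """
    # Sum of L_i / N divided by the number of test products.
    return sum(positions) / (n_unselected * len(positions))


# Hypothetical user: 100 unselected products; the three test products
# appear at positions 3, 10 and 47 of the recommendation list.
print(average_ranking_score([3, 10, 47], 100))   # 0.2
```

A value near 0 means the products the user actually chose sit near the top of the list; a value near 0.5 is what a random ordering would give on average.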

4. Prediction-Rating Correlation

Prediction-rating correlation measures the correlation between the ranking implied by the system's predicted ratings and the ranking implied by the user's actual ratings. It is often used to characterize the accuracy of a recommender system. Unlike prediction accuracy, it does not consider the deviation of each individual prediction from the actual rating, only the overall correlation between the two.

Three correlation measures are commonly used in recommender systems: Pearson correlation, Spearman correlation, and Kendall's Tau.
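The three correlation measures can be sketched in plain Python as follows. This is an illustrative implementation under simplifying assumptions: the rank helper assumes there are no tied ratings, and the Kendall function is the simple tau-a variant (concordant minus discordant pairs over all pairs). The ratings are those of the four-movie example earlier in these notes.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Rank of each value (1 = smallest). Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


predicted = [10, 4, 7, 9]
actual = [8, 5, 7, 6]
print(round(pearson(predicted, actual), 3))      # 0.781
print(round(spearman(predicted, actual), 3))     # 0.8
print(round(kendall_tau(predicted, actual), 3))  # 0.667
```

In this example only one pair (Movie 3 and Movie 4) is ordered differently by the system and the user, which is why Kendall's Tau is (5 - 1) / 6.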

Prediction-rating correlation
Advantages: It can compare rankings in systems with multi-level ratings. It is simple to compute and yields a single value for the whole system.
Disadvantages:

Kendall's Tau gives equal weight to every swap of the same distance, so the difference between ranks 1 and 2 in the recommendation list counts the same as the difference between ranks 1000 and 1001. In practice, users may care only about the top 10 products and never look at products ranked far down the list, so the difference between ranks 1 and 2 matters much more to them. Spearman correlation does not handle "weak ordering". A weak ordering is one in which at least two products have the same rating; conversely, an ordering in which every product has a different rank is called a full ordering. Because the system places products with the same rating at different positions, Spearman returns different values for different orderings of those products. This is unreasonable, because the user does not care how products with the same rating are ordered. Kendall's Tau has a similar problem.

 

5. Normalized Distance-Based Performance Measure (NDPM)

The core idea of NDPM in recommender systems is to compare the ranking implied by the system's predicted ratings with the user's preference ranking. This normalized, preference-based measure is defined as follows:

$$\mathrm{NDPM} = \frac{2C^{-} + C^{u}}{2C^{i}}$$

Here $C^{-}$ is the number of orderings on which the system and the user conflict (for example, the system predicts that the user prefers product 1 to product 2, while the user actually prefers product 2 to product 1), $C^{u}$ is the number of compatible orderings, and $C^{i}$ is the total number of product pairs with a preference relation in the user's ranking. NDPM is similar to the Spearman coefficient and Kendall's Tau, but NDPM results are more accurate. Balabanović M. and Shoham Y. used NDPM to evaluate the accuracy of the Fab system and obtained very good results.
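A hedged sketch of NDPM follows, assuming the form NDPM = (2C⁻ + Cᵘ) / (2Cⁱ). It further assumes that a "compatible" pair is one where the user has a strict preference but the system scores the two products equally; that reading is an assumption and is not spelled out in these notes. The function name and data are illustrative.

```python
from itertools import combinations


def ndpm(system_scores, user_scores):
    """NDPM over all product pairs for which the user has a preference.

    Assumes at least one pair has a strict user preference.
    """
    contradictory = compatible = preferred = 0
    for i, j in combinations(range(len(user_scores)), 2):
        user_diff = user_scores[i] - user_scores[j]
        if user_diff == 0:
            continue                      # user is indifferent: pair not counted
        preferred += 1                    # contributes to C^i
        system_diff = system_scores[i] - system_scores[j]
        if system_diff == 0:
            compatible += 1               # C^u: system ties a pair the user orders
        elif (system_diff > 0) != (user_diff > 0):
            contradictory += 1            # C^-: system reverses the user's order
    return (2 * contradictory + compatible) / (2 * preferred)


# Four-movie example: one reversed pair out of six, so NDPM = 2 / 12.
print(ndpm([10, 4, 7, 9], [8, 5, 7, 6]))
```

Under these assumptions a perfect ordering scores 0, a fully reversed ordering scores 1, and a system that ties every product scores 0.5.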

6. Half-Life Utility

A recommender system presents users with an ordered list of products, but most users are unwilling to browse far down the list. The designers of a web page recommender system claimed that the vast majority of Internet users do not go deep into the results returned by a search engine, and that a user's willingness to browse the recommendation list decays exponentially with position. The strength of this decay is described by a half-life parameter. The half-life utility of user $a$ is defined as:

$$R_a = \sum_{j} \frac{\max(r_{a,j} - d,\ 0)}{2^{(j-1)/(\alpha-1)}}$$

where $r_{a,j}$ is user $a$'s actual rating of the product at position $j$ of the list, $d$ is the default rating, and $\alpha$ is the half-life, that is, the position at which a product has a 50% chance of being viewed.

Averaging the half-life utility over all users gives the half-life utility of the system. To obtain a high half-life utility, the system must assign high scores to the products that users rate highly. The drawback is that if the user's actual utility function does not decay exponentially, the half-life utility can differ greatly from the user's actual experience. For example, if users typically look through the first 20 products in the recommendation list, the utility function should assign value only to the first 20 products and 0 to all the rest.
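A minimal sketch of the per-user half-life utility, assuming the commonly cited form R_a = Σ_j max(r_{a,j} − d, 0) / 2^{(j−1)/(α−1)}, where d is a default rating and α is the half-life position. The parameter names and the sample ratings are hypothetical.

```python
def half_life_utility(ratings_by_position, default_rating, half_life):
    """Half-life utility of one user's recommendation list.

    ratings_by_position -- the user's actual ratings of the listed products,
                           in list order (position 1 first)
    default_rating      -- d, ratings at or below this contribute nothing
    half_life           -- alpha, the position whose weight is 1/2 (must be > 1)
    """
    if half_life <= 1:
        raise ValueError("half_life must be greater than 1")
    return sum(
        max(rating - default_rating, 0) / 2 ** ((position - 1) / (half_life - 1))
        for position, rating in enumerate(ratings_by_position, start=1)
    )


# Hypothetical list of four products on a 1-5 scale, default rating 3,
# half-life at position 4: the rating of 5 in position 4 counts half as
# much as the rating of 5 in position 1.
print(half_life_utility([5, 3, 1, 5], default_rating=3, half_life=4))   # 3.0
```

The products rated 3 and 1 both contribute 0 here, because the max function clips every rating at or below the default to the same value.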

 

Half-life utility
Disadvantages:

1) Weak ordering in the system means that the same system ranking can yield different results;

2) because of the max function, all products rated below the default value contribute equally.

  • Evaluation Metrics Other Than Accuracy

1. Popularity and Diversity of the Recommendation List

The diversity of the recommendation lists in a recommender system is measured by the average Hamming distance between the lists of different users.
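The notes do not give the formula, so the sketch below assumes one common definition: for two users' lists of equal length L that share Q products, the Hamming distance is H = 1 − Q / L, and diversity is the average of H over all pairs of users. The lists are hypothetical.

```python
from itertools import combinations


def hamming_distance(list_a, list_b):
    """1 - (shared products / list length); assumes equal-length lists."""
    shared = len(set(list_a) & set(list_b))
    return 1 - shared / len(list_a)


def average_hamming_distance(recommendation_lists):
    """Mean Hamming distance over all pairs of users' lists."""
    pairs = list(combinations(recommendation_lists, 2))
    return sum(hamming_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical recommendation lists of length 4 for three users.
lists = [
    [1, 2, 3, 4],
    [1, 2, 5, 6],     # shares 2 products with the first list
    [7, 8, 9, 10],    # shares nothing with the other two
]
print(round(average_hamming_distance(lists), 3))   # 0.833
```

A value of 0 means every user receives the same list, and a value of 1 means no two users share any recommended product.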

2. Coverage

Coverage is the proportion of all products for which the system can predict a rating. Coverage is especially important in a recommender system, because only with high coverage can the system find as many products of interest to users as possible. The simplest way to compute coverage is to select a number of user-product pairs at random, attempt a prediction for each pair, and measure the proportion of pairs for which a prediction can be made. Just as precision and recall must be used together, coverage must be used together with accuracy: a recommender system should not raise its coverage at the cost of poor accuracy.
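The sampling procedure described above can be sketched as follows. The `can_predict` callback stands in for whatever check a real system would use to decide whether it can produce a rating; it is a hypothetical interface, as are the user and product IDs.

```python
import random


def sampled_coverage(users, products, can_predict, sample_size, seed=0):
    """Estimate coverage from randomly sampled user-product pairs.

    can_predict(user, product) -> bool is a hypothetical callback that
    reports whether the recommender can produce a rating for the pair.
    """
    rng = random.Random(seed)             # fixed seed for repeatable estimates
    pairs = [(rng.choice(users), rng.choice(products)) for _ in range(sample_size)]
    predictable = sum(1 for user, product in pairs if can_predict(user, product))
    return predictable / sample_size


# Hypothetical system that can only predict ratings for products 0..4
# out of a catalogue of 10, for example because the rest have no ratings yet.
users = list(range(50))
products = list(range(10))
estimate = sampled_coverage(users, products, lambda u, p: p < 5, sample_size=1000)
print(estimate)   # an estimate of the true coverage, 0.5
```

A larger sample gives a tighter estimate, at the cost of more prediction attempts.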

3. Novelty and Serendipity

Some recommender systems have very high accuracy and reasonable coverage, yet are of little help to users. For example, a shopping recommender that recommends milk to users who have not yet bought it may be statistically very accurate, since almost everyone buys milk. But everyone already knows about milk, and users know whether they want to buy it without any recommendation. A better approach is to recommend products that users have never bought but are interested in. The same holds for music and movie recommenders: recommending popular products will certainly raise accuracy, but users learn nothing new from the system.

4. User Satisfaction

User satisfaction with a recommender system depends not only on its accuracy but also on how much it helps users complete their tasks. Therefore, to measure how users rate a recommender system, the system's tasks must first be clearly defined, and metrics appropriate to those specific tasks must then be chosen to evaluate the recommendation algorithm.

  • Summary

Further research can proceed in the following directions:

1) User sensitivity to algorithm accuracy.

2) The universality of algorithms in different fields.

Different recommendation algorithms perform differently on different datasets. For a given recommendation algorithm, it is worth identifying the type of data on which it performs best.

3) Quality evaluation in a broad sense.

Most evaluation metrics focus only on accuracy and ignore coverage, the system's ability to discover novel products, and user satisfaction. Because users always judge a real system on several dimensions at once, an algorithm with high accuracy does not necessarily perform well in practice. An open question is whether these metrics can be combined into a single comprehensive metric, so that system designers can evaluate a system directly in the way a user would.

4) Privacy protection.

In essence, a recommender system uses existing user selection information or profiles to discover users' interests. Users who want help from a recommender system must share their private data. The system must therefore not only protect users' privacy effectively but also make accurate and reasonable recommendations while using as little private data as possible. Conversely, users will be willing to use a recommender system only when they are confident that it can protect their private data. Future accuracy metrics should therefore be used together with a measure of how well personal data is protected.

5) Research on the robustness of the recommendation system.

Once a recommender system is deployed, some malicious users may use their own selection information to disrupt the normal user-product binary relationships in the system. Their aim is to reduce the system's accuracy or to alter the recommendation lists shown to normal users, either to damage the system itself or to boost the recommendation of certain products. As recommender systems are used more widely, research on their robustness becomes increasingly important. Only systems that can withstand such malicious attacks will have lasting vitality.
