Source: http://in.sdo.com/?p=2779

Accuracy metrics for a recommendation algorithm:

Precision(u) = |R(u) ∩ T(u)| / |R(u)|
Recall(u) = |R(u) ∩ T(u)| / |T(u)|

where R(u) is the list of N items recommended to user u, and T(u) is the set of items user u actually liked in the test set.
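As a sketch, both metrics can be computed directly from the two sets; the sample recommendation list and test set below are hypothetical.

```python
def precision(recommended, liked):
    """|R(u) ∩ T(u)| / |R(u)|: fraction of recommended items the user liked."""
    return len(set(recommended) & set(liked)) / len(recommended)

def recall(recommended, liked):
    """|R(u) ∩ T(u)| / |T(u)|: fraction of liked items that were recommended."""
    return len(set(recommended) & set(liked)) / len(liked)

R_u = ["a", "b", "c", "d"]  # items recommended to user u (hypothetical)
T_u = ["b", "c", "e"]       # items user u liked in the test set (hypothetical)

print(precision(R_u, T_u))  # 2 hits / 4 recommendations = 0.5
print(recall(R_u, T_u))     # 2 hits / 3 liked items
```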
Set-similarity measures (distance metrics for n-dimensional vectors):

Jaccard formula:

w(u,v) = |N(u) ∩ N(v)| / |N(u) ∪ N(v)|

where N(u) is the set of items on which user u has given positive feedback.
Cosine similarity formula:

w(u,v) = |N(u) ∩ N(v)| / sqrt(|N(u)| · |N(v)|)
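A minimal sketch of both set-similarity measures, with hypothetical positive-feedback sets:

```python
import math

def jaccard(nu, nv):
    """Jaccard: |N(u) ∩ N(v)| / |N(u) ∪ N(v)|."""
    nu, nv = set(nu), set(nv)
    return len(nu & nv) / len(nu | nv)

def cosine(nu, nv):
    """Cosine over sets: |N(u) ∩ N(v)| / sqrt(|N(u)| * |N(v)|)."""
    nu, nv = set(nu), set(nv)
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))

n_u = {"i1", "i2", "i3"}  # items user u gave positive feedback on (hypothetical)
n_v = {"i2", "i3", "i4"}  # items user v gave positive feedback on (hypothetical)

print(jaccard(n_u, n_v))  # 2 common / 4 total = 0.5
print(cosine(n_u, n_v))   # 2 / sqrt(3 * 3)
```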
UserCF formula:

p(u,i) = Σ_{v ∈ S(u,K) ∩ N(i)} w(u,v) · r(v,i)

where S(u,K) is the set of the K users most similar to user u, N(i) is the set of users who have given positive feedback on item i, w(u,v) is the interest similarity between users u and v, and r(v,i) is user v's interest in item i (by default r(v,i) = 1 or 0).
Perspective: each user is a feature. This is full personalization: every user is unique and contributes one dimension, so the number of users is the dimensionality of the feature space. A user u's feature vector is t_u = {0, 0, 0, ..., 1, ...}, with t_u(u) = 1 and all other components 0. An item i's feature vector t_i = {0, 1, 1, 1, 0, ...} has a 1 in the dimension of every user who bought item i; more generally, each component can be the number of times that user bought the item. The similarity w(u,v) acts as a weight on the corresponding feature dimension.
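A minimal UserCF sketch under these definitions, using cosine similarity for w(u,v) and r(v,i) = 1; the interaction data is hypothetical.

```python
import math
from collections import defaultdict

# item -> users with positive feedback, N(i) (hypothetical data)
item_users = {"i1": {"u1", "u2"}, "i2": {"u2", "u3"}, "i3": {"u1", "u3"}}

# invert to user -> items, N(u)
user_items = defaultdict(set)
for item, users in item_users.items():
    for u in users:
        user_items[u].add(item)

def sim(u, v):
    """w(u,v): cosine similarity of the users' item sets."""
    nu, nv = user_items[u], user_items[v]
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))

def usercf(u, K=2):
    """p(u,i) = sum over the K nearest users v with v in N(i) of w(u,v) * r(v,i)."""
    neighbors = sorted((v for v in user_items if v != u),
                       key=lambda v: sim(u, v), reverse=True)[:K]
    scores = defaultdict(float)
    for v in neighbors:
        for i in user_items[v] - user_items[u]:  # skip items u already interacted with
            scores[i] += sim(u, v)               # r(v,i) = 1
    return dict(scores)

print(usercf("u1"))  # u1 gets i2, contributed by both neighbors
```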
ItemCF formula:

p(u,i) = Σ_{j ∈ S(i,K) ∩ N(u)} w(i,j) · r(u,j)

where S(i,K) is the set of the K items most similar to item i, N(u) is the set of items user u likes, w(i,j) is the similarity between items i and j, and r(u,j) is user u's interest in item j (by default r(u,j) = 1 or 0).
Perspective: each item is a feature. Every item is unique and contributes one dimension, so the number of items is the dimensionality of the feature space. An item i's feature vector is t_i = {0, 0, 0, ..., 1, ...}, with t_i(i) = 1 and all other components 0. A user u's feature vector t_u = {0, 1, 1, 1, 0, ...} has a 1 in the dimension of every item user u bought; more generally, each component can be the purchase count. The similarity w(i,j) acts as a weight on the corresponding feature dimension.
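The symmetric ItemCF sketch, again with cosine similarity for w(i,j) and r(u,j) = 1 on hypothetical data:

```python
import math
from collections import defaultdict

# user -> liked items, N(u) (hypothetical data)
user_items = {"u1": {"i1", "i3"}, "u2": {"i1", "i2"}, "u3": {"i2", "i3"}}

# invert to item -> users
item_users = defaultdict(set)
for u, items in user_items.items():
    for i in items:
        item_users[i].add(u)

def sim(i, j):
    """w(i,j): cosine similarity of the items' user sets."""
    ni, nj = item_users[i], item_users[j]
    return len(ni & nj) / math.sqrt(len(ni) * len(nj))

def itemcf(u, K=2):
    """p(u,i) = sum over j in S(i,K) ∩ N(u) of w(i,j) * r(u,j), with r(u,j) = 1."""
    scores = defaultdict(float)
    for i in set(item_users) - user_items[u]:  # candidate items u has not seen
        topk = sorted((j for j in item_users if j != i),
                      key=lambda j: sim(i, j), reverse=True)[:K]
        for j in set(topk) & user_items[u]:
            scores[i] += sim(i, j)
    return dict(scores)

print(itemcf("u1"))  # i2 is similar to both of u1's items
```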
LFM formula:

p(u,i) = Σ_{k=1}^{F} p(u,k) · q(i,k)

where F is the number of latent classes, p(u,k) is user u's interest in latent class k, and q(i,k) is the affinity between latent class k and item i. The parameters are typically fitted by stochastic gradient descent, where alpha (α) is the learning rate and lambda (λ) is the regularization coefficient.
Perspective: the LFM formula is a generalized form of the typical feature-vector space with a feature-weight matrix.
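A toy SGD fit of the LFM factors, assuming hypothetical 0/1 interest data; p(u,k) and q(i,k) are updated with learning rate alpha and regularization lambda as described above.

```python
import random

random.seed(0)
F, alpha, lam = 2, 0.05, 0.01  # latent classes, learning rate, regularization
users, items = ["u1", "u2"], ["i1", "i2"]
ratings = {("u1", "i1"): 1.0, ("u1", "i2"): 0.0,
           ("u2", "i1"): 0.0, ("u2", "i2"): 1.0}  # hypothetical r(u,i)

# random small positive initialization of the latent factors
p = {u: [random.uniform(0, 0.5) for _ in range(F)] for u in users}
q = {i: [random.uniform(0, 0.5) for _ in range(F)] for i in items}

def predict(u, i):
    """p(u,i) = sum_k p(u,k) * q(i,k)."""
    return sum(p[u][k] * q[i][k] for k in range(F))

for _ in range(500):  # SGD epochs
    for (u, i), r in ratings.items():
        err = r - predict(u, i)
        for k in range(F):
            pu, qi = p[u][k], q[i][k]
            p[u][k] += alpha * (err * qi - lam * pu)
            q[i][k] += alpha * (err * pu - lam * qi)

print(predict("u1", "i1"), predict("u1", "i2"))  # liked item should score higher
```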
TagCF formula:

p(u,i) = Σ_b n(u,b) · n(b,i)

where n(u,b) is the number of times user u has used tag b, and n(b,i) is the number of times item i has been tagged with b.
Perspective: tags are features.
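A short sketch of the tag-based score, with hypothetical tagging counts:

```python
from collections import defaultdict

user_tags = {"u1": {"rock": 3, "jazz": 1}}                   # n(u,b): tag usage per user
tag_items = {"rock": {"i1": 2, "i2": 1}, "jazz": {"i2": 4}}  # n(b,i): tag counts per item

def tagcf(u):
    """p(u,i) = sum over tags b of n(u,b) * n(b,i)."""
    scores = defaultdict(float)
    for b, n_ub in user_tags[u].items():
        for i, n_bi in tag_items.get(b, {}).items():
            scores[i] += n_ub * n_bi
    return dict(scores)

print(tagcf("u1"))  # i1: 3*2 = 6, i2: 3*1 + 1*4 = 7
```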
Relationship-chain (social) recommendation formula:

p(u,i) = Σ_{v ∈ F(u)} w(u,v) · r(v,i)

where F(u) is the set of user u's friends; w(u,v) can measure the familiarity between u and v (e.g. the number of common friends), their interest similarity (as defined in UserCF), or a combination of the two; and r(v,i) is user v's interest in item i (by default r(v,i) = 1 or 0).
Perspective: friends are features, or friends' interests are features.
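A minimal sketch using common-friend counts as the familiarity w(u,v) and r(v,i) = 1; the social graph is hypothetical.

```python
from collections import defaultdict

friends = {"u1": {"u2", "u3"}}                       # F(u)
common_friends = {("u1", "u2"): 4, ("u1", "u3"): 1}  # w(u,v): familiarity proxy
friend_items = {"u2": {"i1", "i2"}, "u3": {"i2"}}    # items each friend likes, r(v,i) = 1

def social_rec(u):
    """p(u,i) = sum over v in F(u) of w(u,v) * r(v,i)."""
    scores = defaultdict(float)
    for v in friends[u]:
        w = common_friends.get((u, v), 0)
        for i in friend_items.get(v, ()):
            scores[i] += w
    return dict(scores)

print(social_rec("u1"))  # i1: 4, i2: 4 + 1 = 5
```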
Information-flow (feed) recommendation formula:

score(u,I) = Σ_{e ∈ E(I)} v(e) · w(e) · d(e)

where each edge e of feed item I represents another user's action on I, E(I) is the set of edges attached to I, v(e) is the similarity (familiarity) between the acting user and the current user u, w(e) is the weight of the edge type, and d(e) is the time-decay factor of edge e.
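A sketch of this edge-based feed score; the edge-type weights, affinities, and exponential half-life decay d(e) below are all assumptions for illustration.

```python
EDGE_WEIGHT = {"like": 1.0, "comment": 2.0, "share": 3.0}  # w(e): weight per edge type
AFFINITY = {"u2": 0.8, "u3": 0.2}  # v(e): similarity of the acting user to the current user

def feed_score(edges, half_life=24.0):
    """score = sum_e v(e) * w(e) * d(e), with d(e) = 2 ** (-age_hours / half_life)."""
    return sum(AFFINITY.get(user, 0.0) * EDGE_WEIGHT[etype] * 2 ** (-age / half_life)
               for user, etype, age in edges)

# a fresh like from a close friend vs. a day-old share from a distant user
print(feed_score([("u2", "like", 0.0)]))    # 0.8 * 1.0 * 1
print(feed_score([("u3", "share", 24.0)]))  # 0.2 * 3.0 * 0.5
```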
Summary of recommendation algorithms
All of the algorithms above can be viewed as recommendation algorithms built on a feature-vector space plus a feature-weight matrix.
When the feature-vector dimensionality is large, the computational cost of such feature-space algorithms becomes very high. A common remedy is dimensionality reduction, for example with MinHash (or SimHash). Another approach is to first cluster the n-dimensional feature space down to m dimensions (m < n); for instance, in ItemCF the items can be clustered and the m item clusters used as features. Of course, the weight matrix must be recomputed after dimensionality reduction.
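As a sketch of the MinHash idea (assuming integer item ids and the common hash family h(x) = (a·x + b) mod P): the fraction of matching signature positions estimates the Jaccard similarity, so high-dimensional sets can be compared via short signatures.

```python
import random

random.seed(42)
P = 2_147_483_647  # a large prime modulus
NUM_HASHES = 64
PARAMS = [(random.randrange(1, P), random.randrange(0, P)) for _ in range(NUM_HASHES)]

def minhash(item_ids):
    """Signature: for each hash function, the minimum hash value over the set."""
    return [min((a * x + b) % P for x in item_ids) for a, b in PARAMS]

def est_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree ≈ Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

sig_a = minhash(range(100))      # hypothetical item-id sets
sig_b = minhash(range(50, 150))  # true Jaccard = 50 / 150 ≈ 0.33
print(est_jaccard(sig_a, sig_b))
```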
Another, more extreme approach: discard the weight matrix entirely and, on top of the classification, recommend directly by category. That is, pure feature-vector matching.
Algorithm framework based on feature matching (applicable to both users and items):
1) Feature selection
a) User features known: classify users directly by their features.
b) User features unknown: cluster, e.g. with LFM.
2) Compute each item's features from the features of the users who bought it; for example, simply take the top-N user features as the item's features.
3) Recommend by matching user features to item features: if the feature space is small (few categories), recommend directly by category; if it is large, recommend by computing feature distances.
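Steps 2 and 3 might be sketched as follows (the feature names and purchase data are hypothetical): each item inherits the top-N features of its buyers, and matching reduces to feature overlap.

```python
from collections import Counter

user_features = {"u1": ["sports", "tech"], "u2": ["sports", "music"], "u3": ["sports"]}
item_buyers = {"i1": ["u1", "u2", "u3"]}

def item_features(item, top_n=2):
    """Step 2: take the top-N most common features among the item's buyers."""
    counts = Counter(f for u in item_buyers[item] for f in user_features[u])
    return [f for f, _ in counts.most_common(top_n)]

def match_score(user_feats, item_feats):
    """Step 3: in a large feature space, rank items by feature overlap."""
    return len(set(user_feats) & set(item_feats))

print(item_features("i1"))  # "sports" dominates among the buyers
```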
3. Recommender systems that associate users and items
Feature-based recommendation algorithms
Description: when a user likes multiple features and an item carries multiple features, we get the typical feature-vector-space-plus-feature-weight-matrix algorithm; when a user likes only one feature and an item carries only one feature, it degenerates into a recommendation algorithm based on feature classification.
In practice we usually adopt several recommendation algorithms, implement them as separate recommendation engines, and finally fuse the results of the different engines; the most common form of algorithm fusion is weighted fusion.
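Weighted fusion can be sketched like this; the engine names, scores, and weights are hypothetical.

```python
def weighted_fusion(engine_scores, weights):
    """final(i) = sum over engines e of weights[e] * score_e(i); returns a ranked item list."""
    fused = {}
    for name, scores in engine_scores.items():
        w = weights.get(name, 0.0)
        for item, s in scores.items():
            fused[item] = fused.get(item, 0.0) + w * s
    return sorted(fused, key=fused.get, reverse=True)

ranking = weighted_fusion(
    {"usercf": {"i1": 0.9, "i2": 0.4}, "itemcf": {"i2": 0.8, "i3": 0.5}},
    {"usercf": 0.6, "itemcf": 0.4},
)
print(ranking)  # i2 (0.56) edges out i1 (0.54), then i3 (0.20)
```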
Recommender system architecture
A real recommender system usually combines several recommendation algorithms. It adjusts the user's feature vector (the feature weights) according to the user's real-time behavior feedback, fuses the outputs of the individual algorithms, filters out items that should not be recommended, and finally re-ranks the results according to the user's current context to produce the final recommendations.
Recommendation algorithms based on different features typically precompute and periodically refresh their per-feature recommendation lists: item-similarity features keep the K most related items per item; user-similarity features keep the N nearest users per user; tag features keep the top M items per tag; demographic features keep, say, the N most popular items per age group; and user-preference features keep the N items (or M categories) each user has liked most recently, and so on.
The user's real-time feedback and current context influence the final recommendations in real time: real-time feedback directly affects how the algorithms' results are fused, while the context determines the ranking and presentation of the results. Meanwhile, the user's feedback also flows back into the offline computation of the item recommendation data.