Slope One is a very simple collaborative filtering algorithm. The main idea is as follows: suppose user u has already rated item J and we now want to predict u's rating for item I. We only need to find the people who rated both I and J and compute the average difference between their two ratings; adding that difference to u's rating of J gives a prediction for I. Of course, there are usually many such items J. Some of them share only a few co-raters with I and others share many, so the items with more co-raters should clearly carry more weight in the final prediction.
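Written as a formula, the idea above looks roughly like this. The notation is an assumption introduced here for illustration: r_{v,J} is user v's rating of item J, S_{I,J} is the set of users who rated both I and J, R(u) is the set of items user u has rated, and the sums run only over items J that have at least one co-rater with I.

```latex
\[
\mathrm{dev}_{I,J} = \frac{1}{|S_{I,J}|} \sum_{v \in S_{I,J}} \left( r_{v,I} - r_{v,J} \right),
\qquad
\hat{r}_{u,I} =
\frac{\sum_{J \in R(u),\, J \neq I} \left( r_{u,J} + \mathrm{dev}_{I,J} \right) |S_{I,J}|}
     {\sum_{J \in R(u),\, J \neq I} |S_{I,J}|}
\]
```

The factor |S_{I,J}| is the weighting described above: an item J with more co-raters contributes more to the prediction.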
That is the main idea of basic Slope One. Wikipedia illustrates it with a diagram that makes it clear at a glance:
In the diagram we want to predict User B's rating for item J. User B gave item I 2 points, and User A rated both item I and item J, scoring item I 0.5 points lower than item J. So User B's rating for item J should be 0.5 points higher than B's rating for item I, that is, 2.5 points.
Since the idea is so simple, let's put it into practice. This is the simplest possible implementation, just to see how well the algorithm performs. The data set is the same as in the previous blog post: the small MovieLens data set, in which roughly 1000 users rate roughly 2000 items, with 80% used for training and 20% for testing.
The specific code is as follows:
    #include <cmath>
    #include <cstdlib>
    #include <cstring>
    #include <fstream>
    #include <iostream>
    #include <string>
    using namespace std;

    const int USERMAX = 1000;
    const int ITEMMAX = 2000;
    double rating[USERMAX][ITEMMAX];
    int I[USERMAX][ITEMMAX]; // indicates if the item is rated
    double mean;             // global mean rating, used as a fallback

    // Predict user u's rating for item l with weighted Slope One.
    double predict(int u, int l)
    {
        double total = 0;
        double totalCnt = 0;
        for (int i = 0; i < ITEMMAX; i++)
        {
            if (l != i && I[u][i])
            {
                // average difference (item i - item l) over co-raters
                double dev = 0;
                int cnt = 0;
                for (int j = 0; j < USERMAX; j++)
                {
                    if (I[j][l] && I[j][i])
                    {
                        dev += rating[j][i] - rating[j][l];
                        cnt++;
                    }
                }
                if (cnt)
                {
                    dev /= cnt;
                    // weight each item's vote by its number of co-raters
                    total += (rating[u][i] - dev) * cnt;
                    totalCnt += cnt;
                }
            }
        }
        if (totalCnt == 0)
            return mean;
        return total / totalCnt;
    }

    // Mean of all observed ratings.
    double calMean()
    {
        double total = 0;
        int cnt = 0;
        for (int i = 0; i < USERMAX; i++)
            for (int j = 0; j < ITEMMAX; j++)
            {
                total += I[i][j] * rating[i][j];
                cnt += I[i][j];
            }
        return total / cnt;
    }

    void train()
    {
        // read rating matrix
        memset(rating, 0, sizeof(rating));
        memset(I, 0, sizeof(I));
        ifstream in("ua.base");
        if (!in)
        {
            cout << "file not exist" << endl;
            exit(1);
        }
        int userId, itemId, rate;
        string timeStamp;
        while (in >> userId >> itemId >> rate >> timeStamp)
        {
            rating[userId][itemId] = rate;
            I[userId][itemId] = 1;
        }
        mean = calMean();
    }

    void test()
    {
        ifstream in("ua.test");
        if (!in)
        {
            cout << "file not exist" << endl;
            exit(1);
        }
        int userId, itemId, rate;
        string timeStamp;
        double total = 0;
        double cnt = 0;
        while (in >> userId >> itemId >> rate >> timeStamp)
        {
            double r = predict(userId, itemId);
            cout << "true: " << rate << " predict: " << r << endl;
            total += (r - rate) * (r - rate);
            cnt += 1;
            // cout << total << endl;
        }
        cout << "test rmse is " << pow(total / cnt, 0.5) << endl;
    }

    int main()
    {
        train();
        test();
        return 0;
    }
The experimental results are as follows:
On the test set the RMSE reaches 0.96, while the SVD implemented in the previous blog post, which finds its solution through a more complicated gradient descent, gets about 0.95. So Slope One is both very simple and effective. Wikipedia says it is the most concise form of collaborative filtering, but personally I find KNN-style collaborative filtering easier to understand (only computing things such as user similarity is a bit troublesome).
Slope One recommendation algorithm implementation (C++)