Minkowski distance: d(x, y) = (sum over k of |x_k - y_k|^r)^(1/r)
When r = 1, it is the Manhattan distance.
When r = 2, it is the Euclidean distance.
As r approaches infinity, it is the supremum distance (also known as the Chebyshev distance), the largest single difference.
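The three special cases above can be sketched with one function. This is a minimal illustration, not code from the book: the name `minkowski`, the sample dictionaries `a` and `b`, and the choice to return -1 when two users share no rated items (borrowed from the Manhattan code later in this post) are all assumptions.

```python
from math import inf

def minkowski(rating1, rating2, r):
    """Minkowski distance of order r over the items both users rated.

    Returns -1 when the users have no rated items in common
    (an assumed convention, matching the Manhattan code in this post).
    """
    diffs = [abs(rating1[key] - rating2[key])
             for key in rating1 if key in rating2]
    if not diffs:
        return -1
    if r == inf:
        # Limit r -> infinity: only the largest difference survives.
        return max(diffs)
    return sum(d ** r for d in diffs) ** (1 / r)

# Hypothetical ratings; only Phoenix and The Strokes are shared.
a = {"Phoenix": 5.0, "The Strokes": 1.0, "Norah Jones": 4.0}
b = {"Phoenix": 2.0, "The Strokes": 5.0, "Deadmau5": 1.0}

print(minkowski(a, b, 1))    # Manhattan: 3 + 4
print(minkowski(a, b, 2))    # Euclidean: sqrt(9 + 16)
print(minkowski(a, b, inf))  # supremum: max(3, 4)
```

Only items rated by both users contribute, so a missing rating is skipped rather than treated as zero.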
Pearson correlation coefficient: values lie in [-1, 1]; 1 means perfect positive correlation, -1 means perfect negative correlation, and 0 means no linear correlation.
Approximate calculation formula
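Written out, the single-pass form that the pearson code in this post computes looks like the following. The notation is an assumption: x_i and y_i are the two users' ratings of the i-th item they have both rated, and n is the number of such items.

```latex
r = \frac{\sum_{i=1}^{n} x_i y_i \;-\; \dfrac{\sum_{i=1}^{n} x_i \,\sum_{i=1}^{n} y_i}{n}}
         {\sqrt{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}\;
          \sqrt{\sum_{i=1}^{n} y_i^2 - \dfrac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}}}
```

Every sum can be accumulated in one loop over the shared items, which is why this form avoids computing the means first.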
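Cosine similarity: values lie in [-1, 1]; 1 means the two rating vectors point in exactly the same direction, -1 means they point in opposite directions (with non-negative ratings the value stays within [0, 1]).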
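A minimal sketch of cosine similarity on rating dictionaries follows. The function name `cosine_similarity`, the sample data, and the choice to treat an unrated item as 0 and to return 0 for an empty vector are assumptions made for illustration.

```python
from math import sqrt

def cosine_similarity(rating1, rating2):
    """Cosine of the angle between two rating vectors.

    Items a user has not rated count as 0, so they add nothing
    to the dot product or to that user's vector length.
    """
    dot = sum(value * rating2[key]
              for key, value in rating1.items() if key in rating2)
    norm1 = sqrt(sum(v * v for v in rating1.values()))
    norm2 = sqrt(sum(v * v for v in rating2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0  # assumed convention for an empty rating vector
    return dot / (norm1 * norm2)

# Hypothetical ratings: dot = 24, both lengths = 5, so cos = 24 / 25.
a = {"Phoenix": 3.0, "The Strokes": 4.0}
b = {"Phoenix": 4.0, "The Strokes": 3.0}
print(cosine_similarity(a, b))
```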
from math import sqrt

# {user: {item: rating}}
users = {
    "Angelica": {"Blues Traveler": 3.5, "Broken Bells": 2.0, "Norah Jones": 4.5, "Phoenix": 5.0,
                 "Slightly Stoopid": 1.5, "The Strokes": 2.5, "Vampire Weekend": 2.0},
    "Bill": {"Blues Traveler": 2.0, "Broken Bells": 3.5, "Deadmau5": 4.0, "Phoenix": 2.0,
             "Slightly Stoopid": 3.5, "Vampire Weekend": 3.0},
    "Chan": {"Blues Traveler": 5.0, "Broken Bells": 1.0, "Deadmau5": 1.0, "Norah Jones": 3.0,
             "Phoenix": 5, "Slightly Stoopid": 1.0},
    "Dan": {"Blues Traveler": 3.0, "Broken Bells": 4.0, "Deadmau5": 4.5, "Phoenix": 3.0,
            "Slightly Stoopid": 4.5, "The Strokes": 4.0, "Vampire Weekend": 2.0},
    "Hailey": {"Broken Bells": 4.0, "Deadmau5": 1.0, "Norah Jones": 4.0, "The Strokes": 4.0,
               "Vampire Weekend": 1.0},
    "Jordyn": {"Broken Bells": 4.5, "Deadmau5": 4.0, "Norah Jones": 5.0, "Phoenix": 5.0,
               "Slightly Stoopid": 4.5, "The Strokes": 4.0, "Vampire Weekend": 4.0},
    "Sam": {"Blues Traveler": 5.0, "Broken Bells": 2.0, "Norah Jones": 3.0, "Phoenix": 5.0,
            "Slightly Stoopid": 4.0, "The Strokes": 5.0},
    "Veronica": {"Blues Traveler": 3.0, "Norah Jones": 5.0, "Phoenix": 4.0,
                 "Slightly Stoopid": 2.5, "The Strokes": 3.0},
}


def manhattan(rating1, rating2):
    """Computes the Manhattan distance. Both rating1 and rating2 are
    dictionaries of the form {'The Strokes': 3.0, 'Slightly Stoopid': 2.5}"""
    distance = 0
    common_ratings = False
    for key in rating1:
        if key in rating2:
            distance += abs(rating1[key] - rating2[key])
            common_ratings = True
    if common_ratings:
        return distance
    else:
        return -1  # the two users have no ratings in common


def pearson(rating1, rating2):
    """Computes the Pearson correlation coefficient over the items
    rated by both users, using the single-pass approximation."""
    sum_xy = 0
    sum_x = 0
    sum_y = 0
    sum_x2 = 0
    sum_y2 = 0
    n = 0
    for key in rating1:
        if key in rating2:
            n += 1
            x = rating1[key]
            y = rating2[key]
            sum_xy += x * y
            sum_x += x
            sum_y += y
            sum_x2 += pow(x, 2)
            sum_y2 += pow(y, 2)
    if n == 0:
        return 0  # no ratings in common; also avoids dividing by zero below
    # now compute the denominator
    denominator = sqrt(sum_x2 - pow(sum_x, 2) / n) * sqrt(sum_y2 - pow(sum_y, 2) / n)
    if denominator == 0:
        return 0
    else:
        return (sum_xy - (sum_x * sum_y) / n) / denominator
Choosing a similarity measure:
Use the Pearson correlation coefficient when users rate on different scales (for example, one user's 4 expresses the same enthusiasm as another user's 5).
Use the Euclidean or Manhattan distance when the data is dense and the magnitude of the attribute values matters.
Consider cosine similarity when the data is sparse, with many zero values.
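The first guideline can be illustrated with two hypothetical users who agree on the ranking of every item but rate on different scales. The sketch below is self-contained; it uses the two-pass (mean-centred) definition of Pearson rather than the single-pass approximation, and the users `harsh` and `generous` are invented for the example.

```python
from math import sqrt

def manhattan(rating1, rating2):
    """Manhattan distance over shared items; -1 if none are shared."""
    common = [key for key in rating1 if key in rating2]
    if not common:
        return -1
    return sum(abs(rating1[key] - rating2[key]) for key in common)

def pearson(rating1, rating2):
    """Pearson correlation over shared items (two-pass, mean-centred form)."""
    common = [key for key in rating1 if key in rating2]
    n = len(common)
    if n == 0:
        return 0
    mean_x = sum(rating1[key] for key in common) / n
    mean_y = sum(rating2[key] for key in common) / n
    cov = sum((rating1[k] - mean_x) * (rating2[k] - mean_y) for k in common)
    var_x = sum((rating1[k] - mean_x) ** 2 for k in common)
    var_y = sum((rating2[k] - mean_y) ** 2 for k in common)
    denominator = sqrt(var_x * var_y)
    return 0 if denominator == 0 else cov / denominator

# Same preferences, but the second user rates everything two points higher.
harsh = {"Phoenix": 1.0, "Norah Jones": 2.0, "The Strokes": 3.0}
generous = {"Phoenix": 3.0, "Norah Jones": 4.0, "The Strokes": 5.0}

print(manhattan(harsh, generous))  # 6.0: the distance sees them as far apart
print(pearson(harsh, generous))    # close to 1: the correlation sees full agreement
```

The distance penalizes the constant offset between the two users, while the correlation ignores it, which is the reason for preferring Pearson when rating scales differ.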
From "A Programmer's Guide to Data Mining"
User similarity metrics for collaborative filtering