User similarity metrics for collaborative filtering

Source: Internet
Author: User

Minkowski distance: a family of distance measures parameterized by an order r.
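For reference, the Minkowski distance between two users x and y who have both rated n items is usually written as follows. The symbol names are chosen here for illustration: x_k and y_k are the two users' ratings of the k-th common item, and r is the order.

```latex
d(x, y) = \left( \sum_{k=1}^{n} \lvert x_k - y_k \rvert^{r} \right)^{1/r}
```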

When r = 1, it is the Manhattan distance.

When r = 2, it is the Euclidean distance.

As r approaches infinity, it is the supremum distance (also called the Chebyshev distance), which is the largest difference on any single item.
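The three special cases can be sketched in a single function. This is a minimal illustration rather than the article's own code: the name minkowski, the r parameter, and the convention of returning None when the users share no items are all assumptions made here. Only items rated by both users are compared.

```python
import math

def minkowski(rating1, rating2, r):
    """Minkowski distance of order r over the items both users rated.

    rating1 and rating2 are dicts of the form {item: rating}.
    Pass r=math.inf for the supremum (Chebyshev) distance.
    Returns None when the users share no rated items (assumed convention).
    """
    diffs = [abs(rating1[k] - rating2[k]) for k in rating1 if k in rating2]
    if not diffs:
        return None
    if math.isinf(r):
        return max(diffs)
    return sum(d ** r for d in diffs) ** (1 / r)

# Made-up ratings for illustration; only Phoenix and Norah Jones are shared.
a = {"Phoenix": 5.0, "Norah Jones": 4.5, "The Strokes": 2.5}
b = {"Phoenix": 2.0, "Norah Jones": 0.5, "Deadmau5": 4.0}

print(minkowski(a, b, 1))         # Manhattan: 3.0 + 4.0
print(minkowski(a, b, 2))         # Euclidean: sqrt(9.0 + 16.0)
print(minkowski(a, b, math.inf))  # supremum: max(3.0, 4.0)
```

Larger values of r give more weight to the single largest difference between the two users.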

Pearson correlation coefficient: takes values in [-1, 1]. A value of 1 means perfect positive correlation, -1 means perfect negative correlation, and 0 means no linear correlation.
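For reference, the standard definition of the Pearson correlation coefficient over the n items both users rated is shown below. Here x_i and y_i are the two users' ratings of the i-th common item, and the barred symbols are each user's mean rating over those items. Note that this r is the correlation coefficient, not the order r of the Minkowski distance.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```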

The pearson function in the code implements an approximate formula that needs only a single pass over the ratings.
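The formula itself did not survive in this copy of the article. The version below is reconstructed from the pearson function in the code, so its notation is an assumption: every sum runs over the n items both users rated. It is algebraically equivalent to the definition of the Pearson coefficient, but because it subtracts large, nearly equal sums it can lose numerical precision, which is presumably why it is called approximate.

```latex
r \approx \frac{\sum_{i=1}^{n} x_i y_i - \dfrac{\sum_{i=1}^{n} x_i \, \sum_{i=1}^{n} y_i}{n}}
               {\sqrt{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}\;
                \sqrt{\sum_{i=1}^{n} y_i^2 - \dfrac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}}}
```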

Cosine similarity: takes values in [-1, 1]. A value of 1 means the two rating vectors point in exactly the same direction, and -1 means they point in opposite directions.
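The article gives no code for cosine similarity, so the following is only a sketch. It computes the dot product of the two rating vectors divided by the product of their lengths. Two assumptions are made here: an item a user has not rated counts as 0, and the function returns 0 when either user has no nonzero ratings. The name cosine_similarity is also chosen for illustration.

```python
from math import sqrt

def cosine_similarity(rating1, rating2):
    """Cosine similarity between two {item: rating} dicts.

    Assumption: an unrated item counts as 0, so it adds nothing to the
    dot product or to that user's vector length.
    Returns 0 if either vector has zero length (assumed convention).
    """
    dot = sum(rating1[k] * rating2[k] for k in rating1 if k in rating2)
    norm1 = sqrt(sum(v * v for v in rating1.values()))
    norm2 = sqrt(sum(v * v for v in rating2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0
    return dot / (norm1 * norm2)

# Made-up ratings for illustration.
x = {"Phoenix": 3.0, "The Strokes": 4.0}
y = {"Phoenix": 3.0, "The Strokes": 4.0, "Deadmau5": 0.0}
z = {"Deadmau5": 2.0}

print(cosine_similarity(x, y))  # identical direction
print(cosine_similarity(x, z))  # no items in common
```

Because ratings in this kind of data are never negative, the result in practice falls between 0 and 1.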

    from math import sqrt

    # {user: {item: rating}}
    users = {
        "Angelica": {"Blues Traveler": 3.5, "Broken Bells": 2.0, "Norah Jones": 4.5, "Phoenix": 5.0, "slightly stoopid": 1.5, "The Strokes": 2.5, "Vampire Weekend": 2.0},
        "Bill": {"Blues Traveler": 2.0, "Broken Bells": 3.5, "Deadmau5": 4.0, "Phoenix": 2.0, "slightly stoopid": 3.5, "Vampire Weekend": 3.0},
        "Chan": {"Blues Traveler": 5.0, "Broken Bells": 1.0, "Deadmau5": 1.0, "Norah Jones": 3.0, "Phoenix": 5, "slightly stoopid": 1.0},
        "Dan": {"Blues Traveler": 3.0, "Broken Bells": 4.0, "Deadmau5": 4.5, "Phoenix": 3.0, "slightly stoopid": 4.5, "The Strokes": 4.0, "Vampire Weekend": 2.0},
        "Hailey": {"Broken Bells": 4.0, "Deadmau5": 1.0, "Norah Jones": 4.0, "The Strokes": 4.0, "Vampire Weekend": 1.0},
        "Jordyn": {"Broken Bells": 4.5, "Deadmau5": 4.0, "Norah Jones": 5.0, "Phoenix": 5.0, "slightly stoopid": 4.5, "The Strokes": 4.0, "Vampire Weekend": 4.0},
        "Sam": {"Blues Traveler": 5.0, "Broken Bells": 2.0, "Norah Jones": 3.0, "Phoenix": 5.0, "slightly stoopid": 4.0, "The Strokes": 5.0},
        "Veronica": {"Blues Traveler": 3.0, "Norah Jones": 5.0, "Phoenix": 4.0, "slightly stoopid": 2.5, "The Strokes": 3.0},
    }


    def manhattan(rating1, rating2):
        """Computes the Manhattan distance. Both rating1 and rating2 are
        dictionaries of the form
        {'The Strokes': 3.0, 'slightly stoopid': 2.5}"""
        distance = 0
        common_ratings = False
        for key in rating1:
            if key in rating2:
                distance += abs(rating1[key] - rating2[key])
                common_ratings = True
        if common_ratings:
            return distance
        else:
            return -1  # the two users have no rated items in common


    def pearson(rating1, rating2):
        """Computes the Pearson correlation coefficient using the
        single-pass approximate formula."""
        sum_xy = 0
        sum_x = 0
        sum_y = 0
        sum_x2 = 0
        sum_y2 = 0
        n = 0
        for key in rating1:
            if key in rating2:
                n += 1
                x = rating1[key]
                y = rating2[key]
                sum_xy += x * y
                sum_x += x
                sum_y += y
                sum_x2 += pow(x, 2)
                sum_y2 += pow(y, 2)
        if n == 0:
            return 0  # no common items; also avoids dividing by zero below
        # now compute the denominator
        denominator = sqrt(sum_x2 - pow(sum_x, 2) / n) * sqrt(sum_y2 - pow(sum_y, 2) / n)
        if denominator == 0:
            return 0
        else:
            return (sum_xy - (sum_x * sum_y) / n) / denominator
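One common use of such a distance function is to rank the other users by how close they are to a given user. The sketch below illustrates that idea and is not part of the article's code: the helper name nearest_neighbors and the choice to skip users with no common items are assumptions made here. The three users are copied from the ratings data above, and manhattan is restated in compact form so that the block runs on its own.

```python
def manhattan(rating1, rating2):
    """Manhattan distance over common items; -1 if there are none."""
    common = [k for k in rating1 if k in rating2]
    if not common:
        return -1
    return sum(abs(rating1[k] - rating2[k]) for k in common)

def nearest_neighbors(username, users):
    """Return (distance, user) pairs sorted from closest to farthest.

    Hypothetical helper: users who share no rated items with
    `username` are skipped instead of being ranked.
    """
    distances = []
    for other in users:
        if other == username:
            continue
        d = manhattan(users[username], users[other])
        if d != -1:
            distances.append((d, other))
    return sorted(distances)

# Three users copied from the article's ratings data.
users = {
    "Hailey": {"Broken Bells": 4.0, "Deadmau5": 1.0, "Norah Jones": 4.0,
               "The Strokes": 4.0, "Vampire Weekend": 1.0},
    "Jordyn": {"Broken Bells": 4.5, "Deadmau5": 4.0, "Norah Jones": 5.0,
               "Phoenix": 5.0, "slightly stoopid": 4.5, "The Strokes": 4.0,
               "Vampire Weekend": 4.0},
    "Veronica": {"Blues Traveler": 3.0, "Norah Jones": 5.0, "Phoenix": 4.0,
                 "slightly stoopid": 2.5, "The Strokes": 3.0},
}

print(nearest_neighbors("Hailey", users))
# [(2.0, 'Veronica'), (7.5, 'Jordyn')]
```

Note that Hailey and Veronica share only two items while Hailey and Jordyn share five, so a raw Manhattan distance tends to favor pairs with fewer items in common.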

Choosing a similarity measure:

Use the Pearson correlation coefficient when different users rate on different scales, for example when one user rates everything between 4 and 5 while another uses the full range from 1 to 5.

Use the Euclidean or Manhattan distance when the data is dense (almost every attribute has a nonzero value) and the magnitude of the attribute values matters.

Use cosine similarity when the data is sparse, that is, when most values are 0.
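The first rule can be illustrated with a small example. The sketch below uses made-up ratings: two users rank the same four items in the same order, but one rates consistently one point higher. For brevity it takes parallel lists of ratings for the same items instead of dictionaries, and it uses the definition form of the Pearson coefficient; both choices are assumptions made here.

```python
from math import sqrt

def manhattan(xs, ys):
    """Manhattan distance between two equal-length rating lists."""
    return sum(abs(x - y) for x, y in zip(xs, ys))

def pearson(xs, ys):
    """Pearson correlation of two equal-length rating lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0  # a user who gives every item the same rating
    return cov / sqrt(var_x * var_y)

# Hypothetical ratings of the same four items by two users.
strict = [1.0, 2.0, 3.0, 4.0]
generous = [2.0, 3.0, 4.0, 5.0]

print(manhattan(strict, generous))  # 4.0: the users look far apart
print(pearson(strict, generous))    # 1.0: their tastes agree perfectly
```

The distance measure penalizes the constant offset between the two users, while the Pearson coefficient ignores it and sees only that their preferences move together.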

From "A Programmer's Guide to Data Mining"
