A detailed Python recommendation algorithm for collaborative filtering

Last Update:2018-05-30 Source: Internet

Author: User

Collaborative filtering implementations differ with the data and with the programmer who writes them, but the core idea is always the same. This article presents a complete Python code example of a collaborative filtering recommendation algorithm, which may serve as a useful reference.

Test data

http://grouplens.org/datasets/movielens/

Collaborative filtering recommendation algorithms fall into two main categories:

1. User-based. Using the preferences of neighboring (similar) users, predict the current user's preference for items the user has not yet interacted with, and compute a ranked list of items to recommend.

2. Item-based. If users who like item A also tend to like item C, then items A and C are highly similar; if user C likes item A, we can infer that user C may also like item C.
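
To make the item-based idea concrete, here is a minimal sketch that counts how many users like both items of each pair. The `likes` dictionary and the `co_occurrence` function are hypothetical names chosen for illustration; they are not part of the article's code, and a real item-based recommender would usually normalize these counts.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(likes):
    """Count, for every pair of items, how many users like both."""
    counts = Counter()
    for items in likes.values():
        # sorted() gives each pair a stable (smaller, larger) key
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy data: user -> set of liked items
likes = {
    "u1": {"A", "C"},
    "u2": {"A", "C"},
    "u3": {"A", "B"},
}
pairs = co_occurrence(likes)
print(pairs[("A", "C")], pairs[("A", "B")])
```

Items A and C are liked together more often than A and B, so under this simple measure a user who likes A would be offered C first.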

Whatever the data or the implementation, the core steps are the same:

1. Collect user preferences

1) Group the different kinds of behavior

2) Weight the groups to compute the user's total preference

3) De-noise and normalize the data

2. Find similar users (user-based) or similar items (item-based)

3. Calculate the similarities and sort them, then recommend to the user based on similarity
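
Step 1 can be sketched as follows. The behavior names (`view`, `favorite`, `purchase`) and their weights are assumptions made up for this illustration; the article's own example uses only movie ratings and normalizes them by dividing by 5.

```python
def total_preference(behaviors, weights):
    """Weighted sum of one user's behavior counts for a single item."""
    return sum(weights[name] * count for name, count in behaviors.items())


def normalize(scores):
    """Scale scores into [0, 1] by dividing by the largest score."""
    top = max(scores.values())
    if top == 0:
        return {item: 0.0 for item in scores}
    return {item: score / top for item, score in scores.items()}


# Hypothetical weights: a purchase counts more than a favorite or a view
weights = {"view": 1.0, "favorite": 2.0, "purchase": 4.0}
user_behaviors = {
    "item1": {"view": 3, "favorite": 1, "purchase": 0},
    "item2": {"view": 2, "favorite": 0, "purchase": 1},
}
totals = {item: total_preference(b, weights) for item, b in user_behaviors.items()}
normalized = normalize(totals)
print(totals, normalized)
```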

The process in this example:

1. Initialize the data

Read the movies and the ratings.

Convert the ratings to userDict, which maps each user to the collection of that user's movie ratings; each score is normalized by dividing by 5.

Convert the ratings to ItemUser, which maps each movie to the collection of users who rated it.

2. Calculate the similarity between userId and all other users

Find all users who have rated at least one movie in common with userId.

Loop over these users and calculate each one's similarity to userId.

For userId and another user A, build the merged set of ratings. Format: {'movie ID': [userId's rating, A's rating]}, with 0 for a missing rating.

Calculate the cosine similarity between userId and A; the larger the value, the more similar the two users.

3. Generate the list of recommended movies from the similarities

4. Output the recommendation list and the accuracy
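
The similarity calculation in step 2 can be tried in isolation. The sketch below assumes ratings are already normalized and stored as lists of (movie ID, score) tuples; the function names `merge_ratings` and `cosine` are illustrative and do not appear in the article's code.

```python
from math import sqrt


def merge_ratings(ratings_a, ratings_b):
    """Build {movie: [a's score, b's score]}, using 0.0 for a missing rating."""
    merged = {movie: [score, 0.0] for movie, score in ratings_a}
    for movie, score in ratings_b:
        merged.setdefault(movie, [0.0, 0.0])[1] = score
    return merged


def cosine(merged):
    """Cosine similarity of the two rating vectors held in merged."""
    dot = sum(a * b for a, b in merged.values())
    if dot == 0:
        return 0.0  # no commonly rated movies, or an all-zero vector
    norm_a = sqrt(sum(a * a for a, _ in merged.values()))
    norm_b = sqrt(sum(b * b for _, b in merged.values()))
    return dot / (norm_a * norm_b)


# Two hypothetical users who share only movie "1"
user_a = [("1", 1.0), ("2", 0.6)]
user_b = [("1", 1.0), ("3", 0.8)]
merged = merge_ratings(user_a, user_b)
sim = cosine(merged)
print(merged, round(sim, 4))
```

Because missing ratings are filled with 0, movies rated by only one of the two users lower the similarity, which is the behavior the steps above describe.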

#!/usr/bin/python3
# -*- coding: utf-8 -*-
from math import sqrt
import time

from texttable import Texttable


class CF:

    def __init__(self, movies, ratings, k=5, n=10):
        self.movies = movies
        self.ratings = ratings
        # number of neighbors
        self.k = k
        # number of recommendations
        self.n = n
        # each user's movie ratings
        # format: {'UserID': [(MovieID, rating), ...]}
        self.userDict = {}
        # the users who rated each movie
        # format: {'MovieID': [UserID, ...]}
        self.ItemUser = {}
        # neighbor information: [similarity, UserID] pairs
        self.neighbors = []
        # recommendation list: [score, MovieID] pairs
        self.recommandList = []
        self.cost = 0.0

    # User-based recommendation:
    # compute the similarity between users from their movie ratings
    def recommendByUser(self, userId):
        self.formatRate()
        # The number of recommendations equals the number of movies the
        # user has rated; this is used to compute the accuracy
        self.n = len(self.userDict[userId])
        self.getNearestNeighbor(userId)
        self.getRecommandList(userId)
        self.getPrecision(userId)

    # Build the recommendation list
    def getRecommandList(self, userId):
        self.recommandList = []
        # recommendation dictionary: MovieID -> summed neighbor similarity
        recommandDict = {}
        for neighbor in self.neighbors:
            movies = self.userDict[neighbor[1]]
            for movie in movies:
                if movie[0] in recommandDict:
                    recommandDict[movie[0]] += neighbor[0]
                else:
                    recommandDict[movie[0]] = neighbor[0]
        # turn the dictionary into a sorted list
        for key in recommandDict:
            self.recommandList.append([recommandDict[key], key])
        self.recommandList.sort(reverse=True)
        self.recommandList = self.recommandList[:self.n]

    # Convert ratings to userDict and ItemUser
    def formatRate(self):
        self.userDict = {}
        self.ItemUser = {}
        for i in self.ratings:
            # the maximum score is 5; divide by 5 to normalize
            temp = (i[1], float(i[2]) / 5)
            # userDict: {'1': [(MovieID, rating), ...], '2': [...], ...}
            if i[0] in self.userDict:
                self.userDict[i[0]].append(temp)
            else:
                self.userDict[i[0]] = [temp]
            # ItemUser: {'1': [UserID, ...], ...}
            if i[1] in self.ItemUser:
                self.ItemUser[i[1]].append(i[0])
            else:
                self.ItemUser[i[1]] = [i[0]]

    # Find a user's nearest neighbors
    def getNearestNeighbor(self, userId):
        neighbors = []
        self.neighbors = []
        # find the other users who rated any movie that userId rated
        for i in self.userDict[userId]:
            for j in self.ItemUser[i[0]]:
                if j != userId and j not in neighbors:
                    neighbors.append(j)
        # compute each of these users' similarity to userId
        for i in neighbors:
            dist = self.getCost(userId, i)
            self.neighbors.append([dist, i])
        # sort() is ascending by default; reverse=True means descending
        self.neighbors.sort(reverse=True)
        self.neighbors = self.neighbors[:self.k]

    # Merge the ratings of userId and l into one dictionary
    def formatuserDict(self, userId, l):
        user = {}
        for i in self.userDict[userId]:
            user[i[0]] = [i[1], 0]
        for j in self.userDict[l]:
            if j[0] not in user:
                user[j[0]] = [0, j[1]]
            else:
                user[j[0]][1] = j[1]
        return user

    # Compute the cosine similarity
    def getCost(self, userId, l):
        # merged ratings of userId and l:
        # {'MovieID': [userId's rating, l's rating]}, 0 if not rated
        user = self.formatuserDict(userId, l)
        x = 0.0
        y = 0.0
        z = 0.0
        for k, v in user.items():
            x += float(v[0]) * float(v[0])
            y += float(v[1]) * float(v[1])
            z += float(v[0]) * float(v[1])
        if z == 0.0:
            return 0
        return z / sqrt(x * y)

    # Accuracy of the recommendation
    def getPrecision(self, userId):
        user = [i[0] for i in self.userDict[userId]]
        recommand = [i[1] for i in self.recommandList]
        count = 0.0
        if len(user) >= len(recommand):
            for i in recommand:
                if i in user:
                    count += 1.0
            self.cost = count / len(recommand)
        else:
            for i in user:
                if i in recommand:
                    count += 1.0
            self.cost = count / len(user)

    # Show the recommendation list
    def showTable(self):
        neighbors_id = [i[1] for i in self.neighbors]
        table = Texttable()
        table.set_deco(Texttable.HEADER)
        table.set_cols_dtype(["t", "t", "t", "t"])
        table.set_cols_align(["l", "l", "l", "l"])
        rows = []
        rows.append([u"movie ID", u"Name", u"release", u"from userID"])
        for item in self.recommandList:
            fromID = []
            for i in self.movies:
                if i[0] == item[1]:
                    movie = i
                    break
            for i in self.ItemUser[item[1]]:
                if i in neighbors_id:
                    fromID.append(i)
            # build a new row instead of appending to the entry in self.movies
            rows.append(movie[:3] + [", ".join(fromID)])
        table.add_rows(rows)
        print(table.draw())


# Read the data
def readFile(filename):
    # if reading fails, try encoding="iso-8859-15" instead
    data = []
    with open(filename, "r", encoding="utf-8") as files:
        for line in files:
            item = line.strip().split("::")
            data.append(item)
    return data


# -------------------------start-------------------------------
# time.clock() was removed in Python 3.8; perf_counter() replaces it
start = time.perf_counter()
movies = readFile("/home/hadoop/python/cf/movies.dat")
ratings = readFile("/home/hadoop/python/cf/ratings.dat")
demo = CF(movies, ratings, k=20)
# NOTE: the user ID is garbled in the source text; "1" is an assumed
# placeholder for any UserID that appears in ratings.dat
demo.recommendByUser("1")
print("Recommended list:")
demo.showTable()
print("Processed data is %d" % (len(demo.ratings)))
print("Accuracy: %.2f %%" % (demo.cost * 100))
end = time.perf_counter()
print("Time-consuming: %f s" % (end - start))
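
The rating-conversion step of the listing (the one that builds userDict and ItemUser) can be checked without downloading the data set. This is a minimal sketch on three in-memory lines in the MovieLens `UserID::MovieID::Rating::Timestamp` format; the function name `build_indexes` and the sample lines are made up for illustration.

```python
from collections import defaultdict


def build_indexes(lines):
    """Return (user -> [(movie, rating / 5)], movie -> [users]) from '::' lines."""
    user_dict = defaultdict(list)
    item_user = defaultdict(list)
    for line in lines:
        user_id, movie_id, rating = line.strip().split("::")[:3]
        # the maximum rating is 5, so dividing by 5 scales it into [0, 1]
        user_dict[user_id].append((movie_id, float(rating) / 5))
        item_user[movie_id].append(user_id)
    return dict(user_dict), dict(item_user)


# Hypothetical sample lines, not taken from the real data set
sample = [
    "1::10::5::978300760",
    "1::20::3::978302109",
    "2::10::4::978301968",
]
user_dict, item_user = build_indexes(sample)
print(user_dict)
print(item_user)
```

Note that IDs stay as strings, exactly as they are read from the file, so lookups must also use string keys such as "1".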

Summary

That concludes the complete code example of a collaborative filtering recommendation algorithm implemented in Python. We hope it helps.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
