Collaborative filtering recommendation algorithms look different from one dataset and one programmer to the next, but the core idea is the same. This article presents a complete Python implementation of a collaborative filtering recommendation algorithm as a reference for readers who need one. We hope it helps.
Test data
http://grouplens.org/datasets/movielens/
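The listing later in this article splits every line of movies.dat and ratings.dat on "::". The sketch below shows that parsing step on a single line. It assumes the "::"-separated .dat layout with fields in UserID::MovieID::Rating::Timestamp order; other MovieLens releases use different formats (for example CSV), and the helper name parse_line and the sample line are illustrative only.

```python
def parse_line(line, sep="::"):
    """Split one raw line of a '::'-separated .dat file into its fields."""
    return line.strip().split(sep)


# Illustrative ratings line, assumed to be UserID::MovieID::Rating::Timestamp
sample = "1::1193::5::978300760\n"
user_id, movie_id, rating, timestamp = parse_line(sample)

# The article normalizes ratings by dividing by the maximum score of 5
normalized = float(rating) / 5
print(user_id, movie_id, normalized)
```

All fields come back as strings, which is why the article's code passes user IDs as strings and converts ratings with float().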
Collaborative filtering recommendation algorithms fall into two main categories:
1. User-based. Using the ratings of neighboring users, predict the current user's preference for items the user has not yet rated, and produce a sorted list of items to recommend.
2. Item-based. If users who like item A also tend to like item C, then items A and C are highly similar. If another user likes item A, we can infer that this user may also like item C.
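The item-based idea can be illustrated with a minimal sketch. The article's own code is user-based, so this is not the author's method: it simply counts how often two items are liked by the same user and uses that count as a crude similarity. The toy data and the function names co_occurrence and recommend_for are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: user -> set of liked items
likes = {
    "u1": {"A", "C"},
    "u2": {"A", "C"},
    "u3": {"A", "B"},
    "u4": {"A"},
}


def co_occurrence(likes):
    """Count, for every pair of items, how many users like both."""
    counts = defaultdict(int)
    for items in likes.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return dict(counts)


def recommend_for(user, likes, counts):
    """Rank items the user has not liked by co-occurrence with liked items."""
    scores = defaultdict(int)
    for (a, b), c in counts.items():
        if a in likes[user] and b not in likes[user]:
            scores[b] += c
        elif b in likes[user] and a not in likes[user]:
            scores[a] += c
    # Highest score first; ties broken by item name for a stable order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


counts = co_occurrence(likes)
print(recommend_for("u4", likes, counts))
```

Here u4 likes only item A. Item C co-occurs with A for two users and item B for one, so C ranks ahead of B.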
Whatever the data, the core steps of an implementation are the same:
1. Collect user preferences
1) Group the different kinds of behavior
2) Weight the groups and combine them into the user's total preference
3) De-noise and normalize the data
2. Find similar users (user-based) or similar items (item-based)
3. Calculate the similarities and sort them, then recommend to the user based on similarity
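Step 1 can be sketched as follows. The article's dataset contains only explicit ratings, so its code just divides each rating by 5. This sketch instead assumes several kinds of behavior with hypothetical weights (view, favorite, purchase) and normalizes by the maximum score. De-noising is not shown, and all names and values are illustrative assumptions.

```python
# Hypothetical behavior weights; real values depend on the application
WEIGHTS = {"view": 1.0, "favorite": 3.0, "purchase": 5.0}


def total_preference(events, weights=WEIGHTS):
    """Combine grouped behavior counts into one score per (user, item)."""
    prefs = {}
    for user, item, behavior, count in events:
        key = (user, item)
        prefs[key] = prefs.get(key, 0.0) + weights[behavior] * count
    return prefs


def normalize(prefs):
    """Scale scores into [0, 1] by dividing by the maximum score."""
    top = max(prefs.values(), default=0.0)
    if top == 0.0:
        return dict(prefs)
    return {key: value / top for key, value in prefs.items()}


# Hypothetical events: (user, item, behavior, number of occurrences)
events = [
    ("u1", "A", "view", 4),
    ("u1", "A", "purchase", 1),
    ("u1", "B", "view", 2),
    ("u2", "A", "favorite", 1),
]
prefs = normalize(total_preference(events))
print(prefs)
```

Dividing by the maximum keeps every score in the same range, which matters later because the similarity calculation compares scores across users.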
The example in this article proceeds as follows:
1. Initialize the data
Read the movies and the ratings
Build userDict, which maps each user to the list of that user's movie ratings, with each rating divided by 5 to normalize it
Build ItemUser, which maps each movie to the list of users who rated it
2. Calculate the similarity between every candidate user and the target userId
Find all users who have rated at least one movie that userId has also rated
Loop over these users and calculate each one's similarity to userId
Merge the ratings of the two users. Format: {'movie ID': [userId's rating, the other user's rating]}, with 0 for a missing rating
Calculate the cosine similarity between the two users (the code calls it the cosine distance); the larger the value, the more similar the users
3. Generate the recommended movie list from the similarities
4. Output the recommendation list and the accuracy
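Step 2 can be tried in isolation before reading the full listing. The sketch below merges two users' normalized ratings into the {'movie ID': [rating, rating]} format described above, with 0 for a missing rating, and computes the cosine similarity. The function names and the sample ratings are illustrative and are not taken from the article's code.

```python
from math import sqrt


def merge_ratings(ratings_a, ratings_b):
    """Build {movie_id: [a's rating, b's rating]}, using 0 for a missing rating."""
    merged = {movie: [score, 0.0] for movie, score in ratings_a}
    for movie, score in ratings_b:
        merged.setdefault(movie, [0.0, 0.0])[1] = score
    return merged


def cosine_similarity(merged):
    """Cosine of the angle between the two rating vectors; larger means more similar."""
    dot = sum(a * b for a, b in merged.values())
    norm_a = sum(a * a for a, _ in merged.values())
    norm_b = sum(b * b for _, b in merged.values())
    if dot == 0.0:
        # No movie rated by both users (or an empty vector)
        return 0.0
    return dot / sqrt(norm_a * norm_b)


# Hypothetical normalized ratings: lists of (movie ID, rating / 5)
user_a = [("1", 1.0), ("2", 0.6)]
user_b = [("1", 1.0), ("3", 0.8)]
user_c = [("4", 0.4)]

print(cosine_similarity(merge_ratings(user_a, user_b)))
print(cosine_similarity(merge_ratings(user_a, user_c)))
```

Because a missing rating counts as 0, movies rated by only one of the two users lower the similarity. Two users with no rated movie in common get a similarity of 0.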
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import time

from numpy import sqrt
from texttable import Texttable


class CF:

    def __init__(self, movies, ratings, k=5, n=10):
        self.movies = movies
        self.ratings = ratings
        # number of neighbors
        self.k = k
        # number of recommendations
        self.n = n
        # each user's movie ratings
        # format: {'UserID': [(MovieID, Rating), ...]}
        self.userDict = {}
        # users who rated each movie
        # format: {'MovieID': [UserID, ...]}
        # {'1': [1, 2, 3, ...], ...}
        self.ItemUser = {}
        # neighbor information
        self.neighbors = []
        # recommendation list
        self.recommandList = []
        self.cost = 0.0

    # User-based recommendation:
    # compute the similarity between users from their movie ratings
    def recommendByUser(self, userId):
        self.formatRate()
        # recommend as many movies as the user has rated; used to compute the accuracy
        self.n = len(self.userDict[userId])
        self.getNearestNeighbor(userId)
        self.getrecommandList(userId)
        self.getPrecision(userId)

    # build the recommendation list
    def getrecommandList(self, userId):
        self.recommandList = []
        # build the recommendation dictionary: movie ID -> summed neighbor similarity
        recommandDict = {}
        for neighbor in self.neighbors:
            movies = self.userDict[neighbor[1]]
            for movie in movies:
                if movie[0] in recommandDict:
                    recommandDict[movie[0]] += neighbor[0]
                else:
                    recommandDict[movie[0]] = neighbor[0]

        # build the recommendation list, highest score first
        for key in recommandDict:
            self.recommandList.append([recommandDict[key], key])
        self.recommandList.sort(reverse=True)
        self.recommandList = self.recommandList[:self.n]

    # convert ratings into userDict and ItemUser
    def formatRate(self):
        self.userDict = {}
        self.ItemUser = {}
        for i in self.ratings:
            # the maximum rating is 5; divide by 5 to normalize
            temp = (i[1], float(i[2]) / 5)
            # build userDict {'1': [(1, 5), (2, 5), ...], '2': [...], ...}
            if i[0] in self.userDict:
                self.userDict[i[0]].append(temp)
            else:
                self.userDict[i[0]] = [temp]
            # build ItemUser {'1': [1, 2, 3, ...], ...}
            if i[1] in self.ItemUser:
                self.ItemUser[i[1]].append(i[0])
            else:
                self.ItemUser[i[1]] = [i[0]]

    # find a user's nearest neighbors
    def getNearestNeighbor(self, userId):
        neighbors = []
        self.neighbors = []
        # collect the users who also rated movies that userId rated
        for i in self.userDict[userId]:
            for j in self.ItemUser[i[0]]:
                if j != userId and j not in neighbors:
                    neighbors.append(j)
        # compute each of these users' similarity to userId and sort
        for i in neighbors:
            dist = self.getCost(userId, i)
            self.neighbors.append([dist, i])
        # sort() is ascending by default; reverse=True sorts in descending order
        self.neighbors.sort(reverse=True)
        self.neighbors = self.neighbors[:self.k]

    # merge the ratings of userId and l into one dictionary
    def formatuserDict(self, userId, l):
        user = {}
        for i in self.userDict[userId]:
            user[i[0]] = [i[1], 0]
        for j in self.userDict[l]:
            if j[0] not in user:
                user[j[0]] = [0, j[1]]
            else:
                user[j[0]][1] = j[1]
        return user

    # compute the cosine similarity (larger means more similar)
    def getCost(self, userId, l):
        # merged ratings of userId and l
        # {'movie ID': [userId's rating, l's rating]}, 0 for a missing rating
        user = self.formatuserDict(userId, l)
        x = 0.0
        y = 0.0
        z = 0.0
        for k, v in user.items():
            x += float(v[0]) * float(v[0])
            y += float(v[1]) * float(v[1])
            z += float(v[0]) * float(v[1])
        if z == 0.0:
            return 0
        return z / sqrt(x * y)

    # accuracy of the recommendations
    def getPrecision(self, userId):
        user = [i[0] for i in self.userDict[userId]]
        recommand = [i[1] for i in self.recommandList]
        count = 0.0
        if len(user) >= len(recommand):
            for i in recommand:
                if i in user:
                    count += 1.0
            self.cost = count / len(recommand)
        else:
            for i in user:
                if i in recommand:
                    count += 1.0
            self.cost = count / len(user)

    # display the recommendation list
    def showTable(self):
        neighbors_id = [i[1] for i in self.neighbors]
        table = Texttable()
        table.set_deco(Texttable.HEADER)
        table.set_cols_dtype(["t", "t", "t", "t"])
        table.set_cols_align(["l", "l", "l", "l"])
        rows = []
        rows.append(["movie ID", "Name", "release", "from userID"])
        for item in self.recommandList:
            fromID = []
            for i in self.movies:
                if i[0] == item[1]:
                    # copy the row so that self.movies is not modified
                    movie = list(i)
                    break
            for i in self.ItemUser[item[1]]:
                if i in neighbors_id:
                    fromID.append(i)
            movie.append(fromID)
            rows.append(movie)
        table.add_rows(rows)
        print(table.draw())


# read the data
def readFile(filename):
    # if reading fails, try encoding="iso-8859-15" instead
    data = []
    with open(filename, "r", encoding="utf-8") as files:
        for line in files.readlines():
            item = line.strip().split("::")
            data.append(item)
    return data


# -------------------------start-------------------------------
# time.clock() was removed in Python 3.8; perf_counter() replaces it
start = time.perf_counter()
movies = readFile("/home/hadoop/python/cf/movies.dat")
ratings = readFile("/home/hadoop/python/cf/ratings.dat")
demo = CF(movies, ratings, k=20)
# example user ID; any user ID present in ratings.dat works
demo.recommendByUser("100")
print("Recommended list:")
demo.showTable()
print("Processed data is %d" % (len(demo.ratings)))
print("Accuracy: %.2f %%" % (demo.cost * 100))
end = time.perf_counter()
print("Time-consuming: %f s" % (end - start))
Summary
That is the complete code example of a Python implementation of the collaborative filtering recommendation algorithm. We hope it helps.