A Detailed Python Implementation of a Collaborative Filtering Recommendation Algorithm

Source: Internet
Author: User
Collaborative filtering recommendation algorithms differ in their details depending on the data and on the programmer who writes them, but their core is the same. This article walks through a complete Python code example of a collaborative filtering recommendation algorithm. It should be a useful reference for anyone who needs one.

Test data

http://grouplens.org/datasets/movielens/
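
The listing later in this article splits every line on "::". That matches the MovieLens releases shipped as .dat files (for example the 1M set, where ratings.dat is laid out as UserID::MovieID::Rating::Timestamp). Below is a minimal sketch of parsing such lines. The sample lines are made up for illustration, and parse_line is a hypothetical helper name, not part of the article's code.

```python
# Sample lines in the "UserID::MovieID::Rating::Timestamp" layout.
# The values are illustrative only, not real MovieLens records.
sample_lines = [
    "1::10::5::978300760\n",
    "1::20::3::978302109\n",
    "2::10::4::978301968\n",
]


def parse_line(line):
    """Split one '::'-delimited record into a list of string fields."""
    return line.strip().split("::")


ratings = [parse_line(line) for line in sample_lines]
for user_id, movie_id, rating, timestamp in ratings:
    # All fields arrive as strings; convert the rating before doing math.
    print(user_id, movie_id, float(rating))
```

Note that the IDs stay strings, which is why the article's code looks users up with a string key.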

Collaborative filtering recommendation algorithms fall into two main types:

1. User-based. Use the preferences of neighboring (similar) users to predict how much the current user would like items he or she has not yet touched, then sort the results into a list of items to recommend.

2. Item-based. If users who like item A also tend to like item C, then items A and C are highly similar. If another user likes item A, you can infer that this user may also like item C.
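
As a rough illustration of the item-based idea, the sketch below counts how often two items are liked by the same user and uses that raw co-occurrence count as the similarity. This is a simplification chosen for brevity, not the method used in this article's listing (which is user-based), and the likes data is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical data: user -> set of items the user liked.
likes = {
    "u1": {"A", "C"},
    "u2": {"A", "C"},
    "u3": {"A", "B"},
    "u4": {"A"},
}

# Count how many users liked both items of each pair.
co_counts = defaultdict(int)
for items in likes.values():
    for a, b in combinations(sorted(items), 2):
        co_counts[(a, b)] += 1


def similar_items(item):
    """Items co-liked with `item`, most frequently co-liked first."""
    scores = {}
    for (a, b), count in co_counts.items():
        if a == item:
            scores[b] = count
        elif b == item:
            scores[a] = count
    return sorted(scores, key=lambda k: (-scores[k], k))


# u4 liked only A, so items often co-liked with A become candidates.
candidates = [i for i in similar_items("A") if i not in likes["u4"]]
print(candidates)
```

Here C ranks ahead of B because two users liked both A and C, while only one liked both A and B.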

Whatever the data and whoever writes the code, the core steps are the same:

1. Collect user preferences

1) Group the different kinds of user behavior

2) Weight each group and combine them into the user's overall preference

3) De-noise and normalize the data

2. Find similar users (user-based) or similar items (item-based)

3. Calculate the similarities and sort them, then make recommendations to the user based on similarity
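
Step 1 can be sketched as follows. The behavior names, the weights, and the event log are all assumptions made up for this example, and de-noising is left out to keep it short.

```python
# Hypothetical behaviour log: (user, item, behaviour).
events = [
    ("u1", "A", "view"),
    ("u1", "A", "purchase"),
    ("u1", "B", "view"),
    ("u2", "A", "view"),
    ("u2", "B", "favorite"),
]

# Assumed weight for each behaviour group; real weights need tuning.
weights = {"view": 1.0, "favorite": 3.0, "purchase": 5.0}

# 1) and 2): group by (user, item) and sum the weighted behaviours.
raw = {}
for user, item, behaviour in events:
    raw[(user, item)] = raw.get((user, item), 0.0) + weights[behaviour]

# 3): normalize to the range [0, 1] by dividing by the largest score.
top = max(raw.values())
prefs = {key: score / top for key, score in raw.items()}

for (user, item), score in sorted(prefs.items()):
    print(user, item, round(score, 3))
```

The MovieLens data used in this article already contains explicit ratings, so its code only needs the normalization part (dividing each rating by 5).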

The example proceeds as follows:

1. Initialize the data

1) Read the movies and the ratings.

2) Build userDict, which maps each user to the list of that user's (movie, rating) pairs. Each rating is normalized by dividing it by 5.

3) Build ItemUser, which maps each movie to the list of users who rated it.

2. Calculate the similarity between userId and each candidate user

1) Find all users who have rated at least one movie that userId has also rated.

2) Loop over these users and calculate each one's similarity to userId:

Build the combined ratings of the two users. Format: {'movie ID': [userId's rating, the other user's rating]}, with 0 for a missing rating.

Calculate the cosine of the two rating vectors (the code calls it the cosine distance, but it is the cosine similarity): the larger the value, the more similar the users.

3. Generate the recommended movie list from the similarities

4. Output the recommendation list and the accuracy
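
Steps 1 to 3 can be traced on a toy data set with the sketch below. It is a compact restatement under assumptions, not the article's listing: the ratings are invented, the names user_dict, item_user, cosine and recommend are hypothetical, and dictionaries replace the lists of tuples used in the full code. Like the full code, it does not filter out movies the target user has already rated.

```python
from math import sqrt

# Hypothetical (user, movie, rating) triples on a 1-5 scale.
ratings = [
    ("u1", "m1", 5), ("u1", "m2", 4),
    ("u2", "m1", 5), ("u2", "m2", 4), ("u2", "m3", 5),
    ("u3", "m1", 1), ("u3", "m4", 5),
]

# Step 1: build both lookup tables, normalizing ratings to [0, 1].
user_dict, item_user = {}, {}
for user, movie, rating in ratings:
    user_dict.setdefault(user, {})[movie] = rating / 5
    item_user.setdefault(movie, []).append(user)


def cosine(a, b):
    """Cosine similarity of two {movie: rating} dicts; missing means 0."""
    dot = sum(a[m] * b[m] for m in a if m in b)
    if dot == 0:
        return 0.0
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(target, k=2):
    # Step 2: candidates share at least one rated movie with the target.
    others = {u for m in user_dict[target] for u in item_user[m]} - {target}
    neighbors = sorted(
        ((cosine(user_dict[target], user_dict[u]), u) for u in others),
        reverse=True,
    )[:k]
    # Step 3: score each neighbor's movies by the summed similarity.
    scores = {}
    for sim, u in neighbors:
        for movie in user_dict[u]:
            scores[movie] = scores.get(movie, 0.0) + sim
    return neighbors, sorted(scores, key=lambda m: (-scores[m], m))


neighbors, ranked = recommend("u1")
print(neighbors)
print(ranked)
```

User u2 rated m1 and m2 exactly as u1 did, so u2 comes out as the nearest neighbor, and u2's movies dominate the ranking.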

#!/usr/bin/python3
# -*- coding: utf-8 -*-
from math import sqrt
import time

from texttable import Texttable


class CF:

    def __init__(self, movies, ratings, k=5, n=10):
        self.movies = movies
        self.ratings = ratings
        # Number of neighbors
        self.k = k
        # Number of recommendations
        self.n = n
        # Each user's movie ratings
        # Format: {'userId': [(movieId, rating), ...]}
        self.userDict = {}
        # The users who rated each movie
        # Format: {'movieId': [userId, ...]}
        self.ItemUser = {}
        # Neighbor information
        self.neighbors = []
        # Recommendation list
        self.recommandList = []
        self.cost = 0.0

    # User-based recommendation:
    # compute the similarity between users from their movie ratings
    def recommendByUser(self, userId):
        self.formatRate()
        # Recommend as many movies as the user has rated,
        # so that the accuracy can be computed
        self.n = len(self.userDict[userId])
        self.getNearestNeighbor(userId)
        self.getrecommandList(userId)
        self.getPrecision(userId)

    # Build the recommendation list
    def getrecommandList(self, userId):
        self.recommandList = []
        # Recommendation dictionary: movieId -> summed neighbor similarity
        recommandDict = {}
        for neighbor in self.neighbors:
            movies = self.userDict[neighbor[1]]
            for movie in movies:
                if movie[0] in recommandDict:
                    recommandDict[movie[0]] += neighbor[0]
                else:
                    recommandDict[movie[0]] = neighbor[0]
        # Turn the dictionary into a sorted list
        for key in recommandDict:
            self.recommandList.append([recommandDict[key], key])
        self.recommandList.sort(reverse=True)
        self.recommandList = self.recommandList[:self.n]

    # Convert ratings into userDict and ItemUser
    def formatRate(self):
        self.userDict = {}
        self.ItemUser = {}
        for i in self.ratings:
            # The highest rating is 5; divide by 5 to normalize
            temp = (i[1], float(i[2]) / 5)
            # Build userDict {'1': [(1, 5), (2, 5), ...], '2': [...], ...}
            if i[0] in self.userDict:
                self.userDict[i[0]].append(temp)
            else:
                self.userDict[i[0]] = [temp]
            # Build ItemUser {'1': [userId, ...], ...}
            if i[1] in self.ItemUser:
                self.ItemUser[i[1]].append(i[0])
            else:
                self.ItemUser[i[1]] = [i[0]]

    # Find a user's nearest neighbors
    def getNearestNeighbor(self, userId):
        neighbors = []
        self.neighbors = []
        # Collect the other users who rated any movie that userId rated
        for i in self.userDict[userId]:
            for j in self.ItemUser[i[0]]:
                if j != userId and j not in neighbors:
                    neighbors.append(j)
        # Compute each candidate's similarity to userId, then sort
        for i in neighbors:
            dist = self.getCost(userId, i)
            self.neighbors.append([dist, i])
        # sort() is ascending by default; reverse=True sorts descending
        self.neighbors.sort(reverse=True)
        self.neighbors = self.neighbors[:self.k]

    # Merge the ratings of userId and l into one dictionary
    def formatuserDict(self, userId, l):
        user = {}
        for i in self.userDict[userId]:
            user[i[0]] = [i[1], 0]
        for j in self.userDict[l]:
            if j[0] not in user:
                user[j[0]] = [0, j[1]]
            else:
                user[j[0]][1] = j[1]
        return user

    # Compute the cosine of the two users' rating vectors
    def getCost(self, userId, l):
        # Combined ratings of userId and l
        # {'movieId': [userId's rating, l's rating]}, 0 for no rating
        user = self.formatuserDict(userId, l)
        x = 0.0
        y = 0.0
        z = 0.0
        for k, v in user.items():
            x += float(v[0]) * float(v[0])
            y += float(v[1]) * float(v[1])
            z += float(v[0]) * float(v[1])
        if z == 0.0:
            return 0
        return z / sqrt(x * y)

    # Accuracy of the recommendation
    def getPrecision(self, userId):
        user = [i[0] for i in self.userDict[userId]]
        recommand = [i[1] for i in self.recommandList]
        count = 0.0
        if len(user) >= len(recommand):
            for i in recommand:
                if i in user:
                    count += 1.0
            self.cost = count / len(recommand)
        else:
            for i in user:
                if i in recommand:
                    count += 1.0
            self.cost = count / len(user)

    # Display the recommendation list
    def showTable(self):
        neighbors_id = [i[1] for i in self.neighbors]
        table = Texttable()
        table.set_deco(Texttable.HEADER)
        table.set_cols_dtype(["t", "t", "t", "t"])
        table.set_cols_align(["l", "l", "l", "l"])
        rows = []
        rows.append(["movie ID", "Name", "release", "from userID"])
        for item in self.recommandList:
            fromID = []
            for i in self.movies:
                if i[0] == item[1]:
                    movie = i
                    break
            for i in self.ItemUser[item[1]]:
                if i in neighbors_id:
                    fromID.append(i)
            # Build a new row instead of appending to the entry in self.movies
            rows.append(movie + [fromID])
        table.add_rows(rows)
        print(table.draw())


# Read a data file
def readFile(filename):
    # If reading fails, try encoding="iso-8859-15"
    data = []
    with open(filename, "r", encoding="utf-8") as files:
        for line in files:
            item = line.strip().split("::")
            data.append(item)
    return data


# -------------------------start-------------------------------
# time.clock() was removed in Python 3.8; perf_counter() replaces it
start = time.perf_counter()
movies = readFile("/home/hadoop/python/cf/movies.dat")
ratings = readFile("/home/hadoop/python/cf/ratings.dat")
demo = CF(movies, ratings, k=20)
# The user ID is garbled ("+") in the source listing; "1" is a placeholder
demo.recommendByUser("1")
print("Recommended list:")
demo.showTable()
print("Processed data is %d" % (len(demo.ratings)))
print("Accuracy: %.2f %%" % (demo.cost * 100))
end = time.perf_counter()
print("Time-consuming: %f s" % (end - start))
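
The accuracy figure printed by the listing is the share of the shorter of two lists (the movies the user rated, and the recommended movies) that also appears in the other list. The sketch below restates that calculation as a standalone function so it can be checked by hand. The function name precision and the sample lists are hypothetical, and the guard for empty lists is an addition that the listing does not have.

```python
def precision(rated, recommended):
    """Share of the shorter list that also appears in the longer list."""
    if not rated or not recommended:
        # Added guard: the listing would divide by zero here.
        return 0.0
    shorter, longer = sorted((rated, recommended), key=len)
    longer_set = set(longer)
    hits = sum(1 for movie in shorter if movie in longer_set)
    return hits / len(shorter)


# Hypothetical example: 2 of the 3 recommended movies were rated by the user.
rated = ["m1", "m2", "m3", "m4"]
recommended = ["m1", "m5", "m2"]
print("Accuracy: %.2f %%" % (precision(rated, recommended) * 100))
```

Because the listing recommends exactly as many movies as the user has rated, this number measures how well the neighbors' movies reproduce the user's own rated set rather than how well unseen movies are predicted.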

Summary

That is the complete code example of a collaborative filtering recommendation algorithm implemented in Python. We hope it helps.
