Excerpt from: http://blog.fens.me/mahout-recommendation-api/
Test program: RecommenderTest.java
Test data set: item.csv
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
Test program: org.conan.mymahout.recommendation.job.RecommenderTest.java
package org.conan.mymahout.recommendation.job;

import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.common.RandomUtils;

public class RecommenderTest {

    final static int NEIGHBORHOOD_NUM = 2;
    final static int RECOMMENDER_NUM = 3;

    public static void main(String[] args) throws TasteException, IOException {
        // Fix the random seed so that repeated runs give the same results
        RandomUtils.useTestSeed();
        String file = "datafile/item.csv";
        DataModel dataModel = RecommendFactory.buildDataModel(file);
        slopeOne(dataModel);
    }

    public static void userCF(DataModel dataModel) throws TasteException {}

    public static void itemCF(DataModel dataModel) throws TasteException {}

    public static void slopeOne(DataModel dataModel) throws TasteException {}

    ...
Each algorithm is tested in its own method, such as userCF(), itemCF(), slopeOne(), and so on.
5. User-based collaborative filtering algorithm: UserCF
User-based collaborative filtering measures the similarity between users from the ratings they give to items, and makes recommendations based on that similarity. Put simply: recommend to a user the items liked by other users who share the same interests.
To illustrate:
The basic idea of user-based CF is quite simple: use each user's preferences for items to find that user's nearest neighbors, then recommend what the neighbors like to the current user. In the computation, each user's preferences over all items are treated as a vector, and the similarity between users is calculated from these vectors. Once the K nearest neighbors are found, the current user's preference for items he or she has not yet rated is predicted from the neighbors' similarity weights and their preferences for those items, which yields a sorted list of items as the recommendation. Figure 2 shows an example: for user A, the historical preferences yield only one neighbor, user C, so item D, which user C likes, is recommended to user A.
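The steps described above (preference vectors, user similarity, K nearest neighbors, similarity-weighted prediction) can be sketched in plain Java without Mahout. This is only an illustrative sketch: the class name UserCfSketch, its methods, and the simplified similarity 1 / (1 + Euclidean distance) are assumptions made for this example and are not the blog's RecommendFactory or Mahout's own classes, so the numbers it produces will not necessarily match the program output shown later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative user-based CF sketch (hypothetical class, not Mahout's implementation). */
final class UserCfSketch {

    /** userID -> (itemID -> rating). */
    static final Map<Long, Map<Long, Double>> ratings = new HashMap<>();

    static void rate(long user, long item, double value) {
        ratings.computeIfAbsent(user, k -> new HashMap<>()).put(item, value);
    }

    /** Simplified similarity: 1 / (1 + Euclidean distance) over co-rated items; NaN if none. */
    static double similarity(long u, long v) {
        Map<Long, Double> a = ratings.get(u);
        Map<Long, Double> b = ratings.get(v);
        double sumSq = 0.0;
        int common = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double d = e.getValue() - other;
                sumSq += d * d;
                common++;
            }
        }
        return common == 0 ? Double.NaN : 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    /** The (at most) k users most similar to u, most similar first. */
    static List<Long> neighbors(long u, int k) {
        List<Long> others = new ArrayList<>();
        for (long v : ratings.keySet()) {
            if (v != u && !Double.isNaN(similarity(u, v))) {
                others.add(v);
            }
        }
        others.sort((x, y) -> Double.compare(similarity(u, y), similarity(u, x)));
        return others.subList(0, Math.min(k, others.size()));
    }

    /** Similarity-weighted average of the neighbors' ratings; NaN if no neighbor rated the item. */
    static double predict(long u, long item, int k) {
        double weighted = 0.0;
        double total = 0.0;
        for (long v : neighbors(u, k)) {
            Double r = ratings.get(v).get(item);
            if (r != null) {
                double s = similarity(u, v);
                weighted += s * r;
                total += s;
            }
        }
        return total == 0.0 ? Double.NaN : weighted / total;
    }
}
```

Loading the ratings from item.csv with rate(...) and calling predict(1, 104, 2) gives an estimate for user 1 on item 104 that lies between the ratings of the two neighbors who rated that item, pulled toward the rating of the more similar neighbor.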
The picture and explanatory text above are excerpted from: https://www.ibm.com/developerworks/cn/web/1103_zhaoct_recommstudy2/
Algorithm API: org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender
@Override
public float estimatePreference(long userID, long itemID) throws TasteException {
    DataModel model = getDataModel();
    // If the user has already rated the item, return the actual rating
    Float actualPref = model.getPreferenceValue(userID, itemID);
    if (actualPref != null) {
        return actualPref;
    }
    long[] theNeighborhood = neighborhood.getUserNeighborhood(userID);
    return doEstimatePreference(userID, theNeighborhood, itemID);
}

protected float doEstimatePreference(long theUserID, long[] theNeighborhood, long itemID) throws TasteException {
    if (theNeighborhood.length == 0) {
        return Float.NaN;
    }
    DataModel dataModel = getDataModel();
    double preference = 0.0;
    double totalSimilarity = 0.0;
    int count = 0;
    for (long userID : theNeighborhood) {
        if (userID != theUserID) {
            // See GenericItemBasedRecommender.doEstimatePreference() too
            Float pref = dataModel.getPreferenceValue(userID, itemID);
            if (pref != null) {
                double theSimilarity = similarity.userSimilarity(theUserID, userID);
                if (!Double.isNaN(theSimilarity)) {
                    preference += theSimilarity * pref;
                    totalSimilarity += theSimilarity;
                    count++;
                }
            }
        }
    }
    // Throw out the estimate if it was based on no data points, of course, but also if based on
    // just one. This is a bit of a band-aid on the 'stock' item-based algorithm for the moment.
    // The reason is that in this case the estimate is, simply, the user's rating for one item
    // that happened to have a defined similarity. The similarity score doesn't matter, and that
    // seems like a bad situation.
    if (count <= 1) {
        return Float.NaN;
    }
    float estimate = (float) (preference / totalSimilarity);
    if (capper != null) {
        estimate = capper.capEstimate(estimate);
    }
    return estimate;
}
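Read as a formula, the loop in doEstimatePreference computes a similarity-weighted average of the neighbors' ratings. The symbols below (N(u) for the neighborhood of user u, s(u,v) for the user similarity, p(v,i) for the rating of user v on item i) are notation introduced here for illustration, not names from the Mahout source.

```latex
\hat{p}(u,i) = \frac{\sum_{v \in N_i(u)} s(u,v)\, p(v,i)}{\sum_{v \in N_i(u)} s(u,v)},
\qquad
N_i(u) = \{\, v \in N(u) : v \neq u,\ p(v,i) \text{ exists},\ s(u,v) \text{ is not NaN} \,\}

% The estimate is discarded (NaN) when |N_i(u)| \le 1.

% Worked example with made-up numbers: two neighbors with
% s = 0.5,\ p = 4.0 and s = 0.25,\ p = 2.0 give
\hat{p} = \frac{0.5 \cdot 4.0 + 0.25 \cdot 2.0}{0.5 + 0.25} = \frac{2.5}{0.75} \approx 3.33
```

With a single contributing neighbor the ratio collapses to that neighbor's rating regardless of the similarity value, which is the situation the count <= 1 check rejects.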
Test program:
public static void userCF(DataModel dataModel) throws TasteException {
    // Euclidean-distance user similarity, nearest-N neighborhood
    UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.EUCLIDEAN, dataModel);
    UserNeighborhood userNeighborhood = RecommendFactory.userNeighborhood(RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataModel, NEIGHBORHOOD_NUM);
    RecommenderBuilder recommenderBuilder = RecommendFactory.userRecommender(userSimilarity, userNeighborhood, true);

    // Evaluate the recommender
    RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
    RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);

    // Print up to RECOMMENDER_NUM recommendations for every user
    LongPrimitiveIterator iter = dataModel.getUserIDs();
    while (iter.hasNext()) {
        long uid = iter.nextLong();
        List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, true);
    }
}
Program output:
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:1.0
Recommender IR Evaluator: [Precision:0.5,Recall:0.5]
uid:1,(104,4.333333)(106,4.000000)
uid:2,(105,4.049678)
uid:3,(103,3.512787)(102,2.747869)
uid:4,(102,3.000000)
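The first line of the output reports an average absolute difference score, where lower is better. As a rough sketch of what such a score measures, the snippet below averages |actual - estimated| over held-out ratings. The class and method names are hypothetical, and skipping ratings for which no estimate is available (NaN) is an assumption of this sketch rather than a statement about how the evaluator behind RecommendFactory.evaluate works.

```java
/** Illustrative average-absolute-difference calculation (hypothetical helper, not Mahout code). */
final class AverageAbsoluteDifferenceSketch {

    /**
     * Mean of |actual - estimated| over the pairs that have an estimate.
     * Pairs whose estimate is NaN are skipped (an assumption of this sketch).
     * Returns NaN if no pair could be scored.
     */
    static double averageAbsoluteDifference(double[] actual, double[] estimated) {
        if (actual.length != estimated.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double total = 0.0;
        int scored = 0;
        for (int i = 0; i < actual.length; i++) {
            if (Double.isNaN(estimated[i])) {
                continue; // no estimate for this held-out rating
            }
            total += Math.abs(actual[i] - estimated[i]);
            scored++;
        }
        return scored == 0 ? Double.NaN : total / scored;
    }
}
```

For example, held-out ratings of 5.0 and 3.0 with estimates of 4.0 and 4.0 are each off by 1.0, so the score is 1.0: on average the estimate misses the true rating by one point.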