Mahout demo: essentially a Hadoop-based distributed implementation of the algorithms, which has to deal with merging data across nodes, data sorting, network communication efficiency, node failures, and distributed data storage

Source: Internet
Author: User

Excerpted from: http://blog.fens.me/mahout-recommendation-api/

Test program: RecommenderTest.java

Test dataset: item.csv

1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5

Test program: org.conan.mymahout.recommendation.job.RecommenderTest.java

package org.conan.mymahout.recommendation.job;

import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.common.RandomUtils;

public class RecommenderTest {

    final static int NEIGHBORHOOD_NUM = 2;   // neighbors per user
    final static int RECOMMENDER_NUM = 3;    // recommendations per user

    public static void main(String[] args) throws TasteException, IOException {
        RandomUtils.useTestSeed();           // fixed seed for repeatable runs
        String file = "datafile/item.csv";
        // RecommendFactory is a helper class from the same project, not part of Mahout
        DataModel dataModel = RecommendFactory.buildDataModel(file);
        slopeOne(dataModel);
    }

    public static void userCF(DataModel dataModel) throws TasteException {}

    public static void itemCF(DataModel dataModel) throws TasteException {}

    public static void slopeOne(DataModel dataModel) throws TasteException {}

    ...
}

Each algorithm is tested in its own method, such as userCF(), itemCF(), slopeOne(), and so on.

5. User-based collaborative filtering algorithm: UserCF

User-based collaborative filtering evaluates the similarity between users from their ratings of items, and makes recommendations based on that similarity. Put simply: recommend to a user the things liked by other users who share his or her interests.

To illustrate:

The basic idea of user-based CF is quite simple: from users' preferences for items, find the current user's neighbors (users with similar preferences), then recommend to the current user the items those neighbors like. In the computation, each user's preferences for all items are treated as a vector, and the similarity between users is computed from these vectors. After the K nearest neighbors are found, the current user's preference for each item he or she has not yet rated is predicted from the neighbors' similarity weights and their preferences for that item, producing a sorted list of items as the recommendation. Figure 2 shows an example: for user A, based on the users' historical preferences, only one neighbor, user C, is found, so item D, which user C likes, is recommended to user A.
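The procedure above can be sketched in plain Java without Mahout. This is a minimal illustration under stated assumptions, not Mahout's implementation: ratings are read from lines in the same userID,itemID,preference format as item.csv; user similarity is assumed to be 1 / (1 + Euclidean distance over co-rated items), a simple choice that differs in detail from Mahout's EuclideanDistanceSimilarity; and the class and method names (UserCfSketch, parse, similarity, neighbors, recommend) are hypothetical.

```java
import java.util.*;

// Hypothetical, self-contained sketch of user-based CF (not Mahout's code).
public class UserCfSketch {

    // Parse "userID,itemID,preference" lines into userID -> (itemID -> preference).
    static Map<Long, Map<Long, Double>> parse(List<String> lines) {
        Map<Long, Map<Long, Double>> prefs = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            prefs.computeIfAbsent(Long.parseLong(f[0]), k -> new TreeMap<>())
                 .put(Long.parseLong(f[1]), Double.parseDouble(f[2]));
        }
        return prefs;
    }

    // Assumed similarity: 1 / (1 + Euclidean distance over co-rated items);
    // NaN when the two users share no items.
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        double sumSq = 0.0;
        int common = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double d = e.getValue() - other;
                sumSq += d * d;
                common++;
            }
        }
        return common == 0 ? Double.NaN : 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    // The k most similar other users; users with undefined similarity are skipped.
    static List<Long> neighbors(Map<Long, Map<Long, Double>> prefs, long user, int k) {
        Map<Long, Double> mine = prefs.get(user);
        List<Long> candidates = new ArrayList<>();
        for (long other : prefs.keySet()) {
            if (other != user && !Double.isNaN(similarity(mine, prefs.get(other)))) {
                candidates.add(other);
            }
        }
        candidates.sort(Comparator.comparingDouble(
                (Long o) -> similarity(mine, prefs.get(o))).reversed());
        return candidates.subList(0, Math.min(k, candidates.size()));
    }

    // Predict each unrated item as the similarity-weighted average of the
    // neighbors' ratings, then return the top howMany items by predicted value.
    static List<Map.Entry<Long, Double>> recommend(
            Map<Long, Map<Long, Double>> prefs, long user, int k, int howMany) {
        Map<Long, Double> mine = prefs.get(user);
        Map<Long, double[]> acc = new TreeMap<>(); // itemID -> {weighted sum, weight sum}
        for (long n : neighbors(prefs, user, k)) {
            double sim = similarity(mine, prefs.get(n));
            for (Map.Entry<Long, Double> e : prefs.get(n).entrySet()) {
                if (!mine.containsKey(e.getKey())) {
                    double[] s = acc.computeIfAbsent(e.getKey(), x -> new double[2]);
                    s[0] += sim * e.getValue();
                    s[1] += sim;
                }
            }
        }
        List<Map.Entry<Long, Double>> ranked = new ArrayList<>();
        for (Map.Entry<Long, double[]> e : acc.entrySet()) {
            ranked.add(new AbstractMap.SimpleEntry<>(e.getKey(), e.getValue()[0] / e.getValue()[1]));
        }
        ranked.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        return ranked.subList(0, Math.min(howMany, ranked.size()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "1,101,5.0", "1,102,3.0", "1,103,2.5",
            "2,101,2.0", "2,102,2.5", "2,103,5.0", "2,104,2.0",
            "3,101,2.5", "3,104,4.0", "3,105,4.5");
        System.out.println(recommend(parse(lines), 1L, 2, 3));
    }
}
```

With the ten rows above, user 1's two neighbors are users 3 and 2, and items 105 and 104 come out ranked in that order. Unlike GenericUserBasedRecommender, this sketch keeps an estimate that rests on a single neighbor (item 105 here), which Mahout's code below deliberately discards.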

The figure and explanatory text above are excerpted from: https://www.ibm.com/developerworks/cn/web/1103_zhaoct_recommstudy2/

Algorithm API: org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender

  @Override
  public float estimatePreference(long userID, long itemID) throws TasteException {
    DataModel model = getDataModel();
    Float actualPref = model.getPreferenceValue(userID, itemID);
    if (actualPref != null) {
      return actualPref;   // the user already rated this item
    }
    long[] theNeighborhood = neighborhood.getUserNeighborhood(userID);
    return doEstimatePreference(userID, theNeighborhood, itemID);
  }

  protected float doEstimatePreference(long theUserID, long[] theNeighborhood, long itemID)
      throws TasteException {
    if (theNeighborhood.length == 0) {
      return Float.NaN;
    }
    DataModel dataModel = getDataModel();
    double preference = 0.0;
    double totalSimilarity = 0.0;
    int count = 0;
    for (long userID : theNeighborhood) {
      if (userID != theUserID) {
        // See GenericItemBasedRecommender.doEstimatePreference() too
        Float pref = dataModel.getPreferenceValue(userID, itemID);
        if (pref != null) {
          double theSimilarity = similarity.userSimilarity(theUserID, userID);
          if (!Double.isNaN(theSimilarity)) {
            preference += theSimilarity * pref;
            totalSimilarity += theSimilarity;
            count++;
          }
        }
      }
    }
    // Throw out the estimate if it was based on no data points, of course, but also if based on
    // just one. This is a bit of a band-aid on the 'stock' item-based algorithm for the moment.
    // The reason is that in this case the estimate is, simply, the user's rating for one item
    // that happened to have a defined similarity. The similarity score doesn't matter, and that
    // seems like a bad situation.
    if (count <= 1) {
      return Float.NaN;
    }
    float estimate = (float) (preference / totalSimilarity);
    if (capper != null) {
      estimate = capper.capEstimate(estimate);
    }
    return estimate;
  }
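Restated as a formula, doEstimatePreference() computes a similarity-weighted average. In the notation below (ours, not Mahout's), $N(u)$ is the array returned by neighborhood.getUserNeighborhood(u), $\mathrm{sim}(u,v)$ is similarity.userSimilarity(u, v), and $r_{v,i}$ is user v's stored preference for item i; $S$ is the set of neighbors that actually contribute to the sum.

```latex
S = \{\, v \in N(u) : v \neq u,\ r_{v,i}\ \text{exists},\ \mathrm{sim}(u,v) \neq \mathrm{NaN} \,\}

\hat{r}_{u,i} =
\begin{cases}
\dfrac{\sum_{v \in S} \mathrm{sim}(u,v)\, r_{v,i}}{\sum_{v \in S} \mathrm{sim}(u,v)} & \text{if } |S| \ge 2, \\[1ex]
\mathrm{NaN} & \text{otherwise.}
\end{cases}
```

The $|S| \ge 2$ condition is the count <= 1 check in the code. If user u has already rated item i, estimatePreference() returns $r_{u,i}$ without calling doEstimatePreference(), and when a capper is configured the estimate is passed through capper.capEstimate() before being returned.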

Test program:

    public static void userCF(DataModel dataModel) throws TasteException {
        // Euclidean-distance user similarity
        UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.EUCLIDEAN, dataModel);
        // Nearest-N user neighborhood, N = NEIGHBORHOOD_NUM
        UserNeighborhood userNeighborhood = RecommendFactory.userNeighborhood(RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataModel, NEIGHBORHOOD_NUM);
        RecommenderBuilder recommenderBuilder = RecommendFactory.userRecommender(userSimilarity, userNeighborhood, true);

        // Evaluate: average absolute difference, then precision/recall (IR statistics)
        RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
        RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);

        // Print up to RECOMMENDER_NUM recommendations for every user
        LongPrimitiveIterator iter = dataModel.getUserIDs();
        while (iter.hasNext()) {
            long uid = iter.nextLong();
            List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
            RecommendFactory.showItems(uid, list, true);
        }
    }
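RecommendFactory.evaluate(...) is the author's own helper, so its internals are not shown in this excerpt. With AVERAGE_ABSOLUTE_DIFFERENCE it presumably delegates to Mahout's AverageAbsoluteDifferenceRecommenderEvaluator, with 0.7 as the fraction of each user's preferences used for training (an assumption). The reported score is the mean absolute difference between held-out ratings and the recommender's estimates. The sketch below shows only that final averaging step; the class name and the {actual, estimate} pair layout are hypothetical, and pairs whose estimate is NaN (no prediction possible) are skipped.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: mean absolute difference between held-out ratings
// and estimated ratings, skipping pairs with a NaN estimate.
public class AverageAbsoluteDifferenceSketch {

    // Each element is {actual rating, estimated rating}.
    static double score(List<double[]> pairs) {
        double total = 0.0;
        int n = 0;
        for (double[] p : pairs) {
            if (!Double.isNaN(p[1])) {
                total += Math.abs(p[0] - p[1]);
                n++;
            }
        }
        return n == 0 ? Double.NaN : total / n;
    }

    public static void main(String[] args) {
        List<double[]> pairs = Arrays.asList(
            new double[] {4.0, 3.0},
            new double[] {2.5, 3.5},
            new double[] {5.0, Double.NaN});   // no estimate: ignored
        System.out.println(score(pairs));      // prints 1.0
    }
}
```

Lower is better: a score of 1.0 means the estimates differ from the held-out ratings by one rating point on average.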

Program output:

AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:1.0
Recommender IR Evaluator: [Precision:0.5,Recall:0.5]
uid:1,(104,4.333333)(106,4.000000)
uid:2,(105,4.049678)
uid:3,(103,3.512787)(102,2.747869)
uid:4,(102,3.000000)
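The IR evaluator line reports precision and recall at a cutoff N; in the test program the cutoff is presumably the 2 passed to RecommendFactory.statsEvaluator(...), though that is an assumption about the author's helper. For one user, with the items the user actually liked held out as the "relevant" set, precision@N is the fraction of the top-N recommendations that are relevant, and recall@N is the fraction of relevant items that appear in the top N; Mahout's GenericRecommenderIRStatsEvaluator averages such per-user values. A minimal sketch for a single user follows; the class, method, and item IDs are made up for illustration and are not taken from the run above.

```java
import java.util.*;

// Hypothetical sketch: precision@N and recall@N for a single user.
public class PrecisionRecallSketch {

    // recommended: ranked item IDs; relevant: held-out items the user liked.
    static double[] precisionRecallAt(List<Long> recommended, Set<Long> relevant, int n) {
        List<Long> topN = recommended.subList(0, Math.min(n, recommended.size()));
        int hits = 0;
        for (Long item : topN) {
            if (relevant.contains(item)) {
                hits++;
            }
        }
        double precision = topN.isEmpty() ? 0.0 : (double) hits / topN.size();
        double recall = relevant.isEmpty() ? 0.0 : (double) hits / relevant.size();
        return new double[] {precision, recall};
    }

    public static void main(String[] args) {
        List<Long> recommended = Arrays.asList(101L, 102L, 103L);
        Set<Long> relevant = new HashSet<>(Arrays.asList(101L, 104L));
        double[] pr = precisionRecallAt(recommended, relevant, 2);
        // one hit (101) in the top 2, out of 2 relevant items
        System.out.println("precision=" + pr[0] + ", recall=" + pr[1]);
        // prints precision=0.5, recall=0.5
    }
}
```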

