Mahout demo: essentially a Hadoop-based distributed algorithm implementation, dealing with issues such as multi-node data merging, data sorting, network communication efficiency, node downtime, and distributed data storage

Last Update:2017-07-27 Source: Internet

Author: User

Excerpt from: http://blog.fens.me/mahout-recommendation-api/

Test program: RecommenderTest.java

Test data set: item.csv

1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
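Each row of item.csv has the form userID,itemID,preference. As a minimal sketch of how such rows map to per-user ratings, the following plain-Java snippet groups the rows listed above by user. It does not use Mahout, and the class name ItemCsvSketch and the method parse are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout: groups "userID,itemID,preference" rows by user.
class ItemCsvSketch {

    static Map<Long, Map<Long, Double>> parse(List<String> rows) {
        Map<Long, Map<Long, Double>> ratings = new LinkedHashMap<>();
        for (String row : rows) {
            String[] fields = row.trim().split(",");
            long userId = Long.parseLong(fields[0]);
            long itemId = Long.parseLong(fields[1]);
            double preference = Double.parseDouble(fields[2]);
            // Create the per-user map on first sight of the user, then store the rating.
            ratings.computeIfAbsent(userId, k -> new LinkedHashMap<>()).put(itemId, preference);
        }
        return ratings;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "1,101,5.0", "1,102,3.0", "1,103,2.5",
            "2,101,2.0", "2,102,2.5", "2,103,5.0", "2,104,2.0",
            "3,101,2.5", "3,104,4.0", "3,105,4.5");
        parse(rows).forEach((user, prefs) -> System.out.println(user + " -> " + prefs));
    }
}
```

In the article's program this grouping is handled by the data model built from the file; the sketch only makes the row layout explicit.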

Test program: org.conan.mymahout.recommendation.job.RecommenderTest.java

    package org.conan.mymahout.recommendation.job;

    import java.io.IOException;
    import java.util.List;

    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
    import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.common.RandomUtils;

    public class RecommenderTest {

        final static int NEIGHBORHOOD_NUM = 2;
        final static int RECOMMENDER_NUM = 3;

        public static void main(String[] args) throws TasteException, IOException {
            // Use a fixed random seed so that repeated runs give the same results
            RandomUtils.useTestSeed();
            String file = "datafile/item.csv";
            DataModel dataModel = RecommendFactory.buildDataModel(file);
            slopeOne(dataModel);
        }

        public static void userCF(DataModel dataModel) throws TasteException {}

        public static void itemCF(DataModel dataModel) throws TasteException {}

        public static void slopeOne(DataModel dataModel) throws TasteException {}

        ...

Each algorithm is tested in its own method, such as userCF(), itemCF(), slopeOne(), and so on.

5. User-based collaborative filtering algorithm: UserCF

User-based collaborative filtering measures the similarity between users from the ratings they give to items, and makes recommendations based on that similarity. Put simply: it recommends to a user the items liked by other users who share the same interests.

To illustrate:

The basic idea of user-based CF is simple: use each user's preferences for items to find that user's nearest neighbors, then recommend the items those neighbors like to the current user. In the computation, a user's preferences for all items are treated as a vector, and the similarity between users is calculated from these vectors. Once the K nearest neighbors are found, each neighbor's similarity weight and item preferences are used to predict the current user's preference for items the user has not yet rated, producing a sorted list of items as the recommendation. Figure 2 shows an example: for user A, the historical preferences yield only one neighbor, user C, so item D, which user C likes, is recommended to user A.

The figure and explanatory text above are excerpted from: https://www.ibm.com/developerworks/cn/web/1103_zhaoct_recommstudy2/
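The neighbor search described above can be sketched in plain Java. This is a minimal illustration that does not use Mahout: the class UserNeighborhoodSketch and its methods are hypothetical, the ratings are the item.csv rows shown earlier, and the similarity 1 / (1 + d), where d is the Euclidean distance over co-rated items, is one common convention assumed here rather than Mahout's exact formula.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Mahout code: Euclidean similarity and k-nearest-user lookup.
class UserNeighborhoodSketch {

    // Similarity in (0, 1] from the Euclidean distance over co-rated items; NaN if no overlap.
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        double sumSquares = 0.0;
        int overlap = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double diff = e.getValue() - other;
                sumSquares += diff * diff;
                overlap++;
            }
        }
        return overlap == 0 ? Double.NaN : 1.0 / (1.0 + Math.sqrt(sumSquares));
    }

    // Returns the IDs of the k users most similar to userId, most similar first.
    static List<Long> nearestNeighbors(long userId, Map<Long, Map<Long, Double>> ratings, int k) {
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> e : ratings.entrySet()) {
            if (e.getKey() != userId) {
                double s = similarity(ratings.get(userId), e.getValue());
                if (!Double.isNaN(s)) {
                    scores.put(e.getKey(), s);
                }
            }
        }
        List<Long> ids = new ArrayList<>(scores.keySet());
        ids.sort(Comparator.comparingDouble((Long id) -> scores.get(id)).reversed());
        return ids.subList(0, Math.min(k, ids.size()));
    }

    // The ten item.csv rows from the article, as user -> (item -> preference).
    static Map<Long, Map<Long, Double>> sampleRatings() {
        Map<Long, Map<Long, Double>> ratings = new HashMap<>();
        add(ratings, 1, 101, 5.0); add(ratings, 1, 102, 3.0); add(ratings, 1, 103, 2.5);
        add(ratings, 2, 101, 2.0); add(ratings, 2, 102, 2.5); add(ratings, 2, 103, 5.0);
        add(ratings, 2, 104, 2.0);
        add(ratings, 3, 101, 2.5); add(ratings, 3, 104, 4.0); add(ratings, 3, 105, 4.5);
        return ratings;
    }

    private static void add(Map<Long, Map<Long, Double>> ratings, long user, long item, double pref) {
        ratings.computeIfAbsent(user, k -> new HashMap<>()).put(item, pref);
    }

    public static void main(String[] args) {
        // User 1 shares only item 101 with user 3 (distance 2.5) but three items with
        // user 2 (distance about 3.94), so user 3 ranks first.
        System.out.println(nearestNeighbors(1L, sampleRatings(), 2)); // prints [3, 2]
    }
}
```

With a neighborhood in hand, the remaining step is to combine the neighbors' ratings into a predicted preference, which is what the Mahout listing that follows does.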

Algorithm API: org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender

  @Override
  public float estimatePreference(long userID, long itemID) throws TasteException {
    DataModel model = getDataModel();
    Float actualPref = model.getPreferenceValue(userID, itemID);
    if (actualPref != null) {
      return actualPref;
    }
    long[] theNeighborhood = neighborhood.getUserNeighborhood(userID);
    return doEstimatePreference(userID, theNeighborhood, itemID);
  }

  protected float doEstimatePreference(long theUserID, long[] theNeighborhood, long itemID) throws TasteException {
    if (theNeighborhood.length == 0) {
      return Float.NaN;
    }
    DataModel dataModel = getDataModel();
    double preference = 0.0;
    double totalSimilarity = 0.0;
    int count = 0;
    for (long userID : theNeighborhood) {
      if (userID != theUserID) {
        // See GenericItemBasedRecommender.doEstimatePreference() too
        Float pref = dataModel.getPreferenceValue(userID, itemID);
        if (pref != null) {
          double theSimilarity = similarity.userSimilarity(theUserID, userID);
          if (!Double.isNaN(theSimilarity)) {
            preference += theSimilarity * pref;
            totalSimilarity += theSimilarity;
            count++;
          }
        }
      }
    }
    // Throw out the estimate if it was based on no data points, of course, but also if based on
    // just one. This is a bit of a band-aid on the 'stock' item-based algorithm for the moment.
    // The reason is that in this case the estimate is, simply, the user's rating for one item
    // that happened to have a defined similarity. The similarity score doesn't matter, and that
    // seems like a bad situation.
    if (count <= 1) {
      return Float.NaN;
    }
    float estimate = (float) (preference / totalSimilarity);
    if (capper != null) {
      estimate = capper.capEstimate(estimate);
    }
    return estimate;
  }
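The core of doEstimatePreference() is a similarity-weighted average of the neighbors' ratings. A small standalone sketch of that arithmetic, with made-up numbers, may make it easier to follow. The class WeightedEstimateSketch and the array-based signature are hypothetical and only mirror the loop above; the capping step is left out.

```java
// Hypothetical sketch of the weighted average used in the listing above; not Mahout code.
class WeightedEstimateSketch {

    // similarities[i] is neighbor i's similarity to the target user (may be NaN);
    // neighborPrefs[i] is neighbor i's rating of the item, or null if the item is unrated.
    static float estimate(double[] similarities, Float[] neighborPrefs) {
        double preference = 0.0;
        double totalSimilarity = 0.0;
        int count = 0;
        for (int i = 0; i < similarities.length; i++) {
            if (neighborPrefs[i] != null && !Double.isNaN(similarities[i])) {
                preference += similarities[i] * neighborPrefs[i];
                totalSimilarity += similarities[i];
                count++;
            }
        }
        // As in the listing, an estimate backed by fewer than two ratings is discarded.
        if (count <= 1) {
            return Float.NaN;
        }
        return (float) (preference / totalSimilarity);
    }

    public static void main(String[] args) {
        // Two neighbors with similarities 0.5 and 0.25 rated the item 4.0 and 1.0:
        // (0.5 * 4.0 + 0.25 * 1.0) / (0.5 + 0.25) = 2.25 / 0.75
        System.out.println(estimate(new double[] {0.5, 0.25}, new Float[] {4.0f, 1.0f})); // prints 3.0
        // Only one neighbor rated the item, so no estimate is produced.
        System.out.println(estimate(new double[] {0.5, 0.25}, new Float[] {4.0f, null})); // prints NaN
    }
}
```

The more similar neighbor pulls the estimate toward its own rating, which is why the result (3.0) sits closer to 4.0 than a plain average of the two ratings (2.5) would.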

Test program:

    public static void userCF(DataModel dataModel) throws TasteException {
        // Build the similarity measure, the neighborhood, and the recommender
        UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.EUCLIDEAN, dataModel);
        UserNeighborhood userNeighborhood = RecommendFactory.userNeighborhood(RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataModel, NEIGHBORHOOD_NUM);
        RecommenderBuilder recommenderBuilder = RecommendFactory.userRecommender(userSimilarity, userNeighborhood, true);

        // Evaluate the recommender
        RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
        RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);

        // Print up to RECOMMENDER_NUM recommendations for every user
        LongPrimitiveIterator iter = dataModel.getUserIDs();
        while (iter.hasNext()) {
            long uid = iter.nextLong();
            List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
            RecommendFactory.showItems(uid, list, true);
        }
    }

Program output:

AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:1.0
Recommender IR Evaluator: [Precision:0.5,Recall:0.5]
uid:1,(104,4.333333)(106,4.000000)
uid:2,(105,4.049678)
uid:3,(103,3.512787)(102,2.747869)
uid:4,(102,3.000000)
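The first two output lines report an average absolute difference between predicted and actual ratings, and the precision and recall of the top recommendations. As a rough sketch of how such metrics are commonly computed, the following plain-Java snippet may help. It is not Mahout's evaluator code: the class EvaluationSketch is hypothetical, and the input values are invented for illustration and are not the data behind the scores above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of common evaluation metrics; not Mahout's evaluator classes.
class EvaluationSketch {

    // Mean of |actual - predicted| over all held-out ratings.
    static double averageAbsoluteDifference(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]);
        }
        return sum / actual.length;
    }

    // Fraction of recommended items that are relevant.
    static double precision(List<Long> recommended, Set<Long> relevant) {
        long hits = recommended.stream().filter(relevant::contains).count();
        return (double) hits / recommended.size();
    }

    // Fraction of relevant items that were recommended.
    static double recall(List<Long> recommended, Set<Long> relevant) {
        long hits = recommended.stream().filter(relevant::contains).count();
        return (double) hits / relevant.size();
    }

    public static void main(String[] args) {
        // Invented example: two held-out ratings, each predicted off by 1.0.
        System.out.println(averageAbsoluteDifference(new double[] {4.0, 2.0}, new double[] {3.0, 3.0}));

        // Invented example: two items recommended, two items relevant, one in common.
        List<Long> recommended = Arrays.asList(104L, 106L);
        Set<Long> relevant = new HashSet<>(Arrays.asList(104L, 105L));
        System.out.println(precision(recommended, relevant));
        System.out.println(recall(recommended, relevant));
    }
}
```

For the average absolute difference, lower is better: a score of 1.0 means predictions were off by one rating point on average. For precision and recall, higher is better.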
