Mahout Introductory Guide: the stand-alone recommendation algorithm
While studying Mahout recently, I looked online for introductory material and found most of it poorly organized. After some trial and error I finally got things clear. To help beginners get started faster, I decided to summarize what I learned and share it as this introductory guide.
What is Mahout?
Mahout is a machine learning library that implements a number of algorithms, such as recommendation algorithms and clustering algorithms.
It offers both stand-alone in-memory implementations and distributed implementations (Hadoop and Spark).
How to get started with Mahout quickly?
Personally, I think the stand-alone Mahout recommender demo is the best starting point for beginners. Some of the material online actually uses the stand-alone algorithms too, but it has you configure a lot of "unnecessary" environment first, which sends people off on a blind detour. In fact, a simple setup and an understanding of the core parts is enough. At the end of this article I show how to run a stand-alone recommendation algorithm.
What order should you learn Mahout in?
1 Get familiar with the stand-alone Mahout recommendation algorithms
Learn the Mahout recommendation process (user-based, as an example):
DataModel reads the input data, which can be a text file or a database (such as MySQL).
Define the similarity algorithm and the user neighborhood algorithm; these three (DataModel, similarity, neighborhood) are then passed to the Recommender, which produces the recommendations.
2 Get familiar with the distributed recommendation algorithms (Hadoop or Spark)
The difference from the stand-alone version: input and output live in HDFS, and the computation is distributed.
3 Get familiar with the other machine learning algorithms (classification, clustering, etc.)
4 Develop your own distributed algorithms on top of Mahout
How to run a stand-alone recommendation algorithm?
The algorithm run here is the stand-alone version of user-based collaborative filtering.
User-based means the recommendation is built around users: put simply, it recommends the items that similar users prefer. For example, to recommend items to user A, first find the users most similar to A. If we take only the top 1 and that turns out to be user B, then the items that B liked but A has not yet encountered are recommended to A.
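The top-1 idea described above can be sketched in plain Java without Mahout. This is only an illustration under stated assumptions: the class, the method names and the tiny rating set are all hypothetical, and the similarity used here (1 / (1 + Euclidean distance over co-rated items)) was chosen for brevity; it is not the Pearson similarity used in the Mahout code later in this article.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; none of these names come from Mahout.
final class Top1UserBasedSketch {

    // Similarity in (0, 1]: 1 / (1 + Euclidean distance over co-rated items).
    // Returns 0 when the two users share no rated item.
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        double sumSquares = 0.0;
        int shared = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double diff = e.getValue() - other;
                sumSquares += diff * diff;
                shared++;
            }
        }
        return shared == 0 ? 0.0 : 1.0 / (1.0 + Math.sqrt(sumSquares));
    }

    // Finds the single most similar other user (top 1), or -1 if there is none.
    static long mostSimilarUser(Map<Long, Map<Long, Double>> ratings, long target) {
        long best = -1L;
        double bestScore = 0.0;
        for (Map.Entry<Long, Map<Long, Double>> e : ratings.entrySet()) {
            if (e.getKey() == target) {
                continue; // skip the target user itself
            }
            double score = similarity(ratings.get(target), e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    // Items the nearest neighbor rated that the target has not rated, highest rating first.
    static List<Long> recommend(Map<Long, Map<Long, Double>> ratings, long target) {
        List<Long> items = new ArrayList<>();
        long neighbor = mostSimilarUser(ratings, target);
        if (neighbor < 0) {
            return items;
        }
        Map<Long, Double> seen = ratings.get(target);
        Map<Long, Double> theirs = ratings.get(neighbor);
        for (Long item : theirs.keySet()) {
            if (!seen.containsKey(item)) {
                items.add(item);
            }
        }
        items.sort((x, y) -> Double.compare(theirs.get(y), theirs.get(x)));
        return items;
    }

    static void rate(Map<Long, Map<Long, Double>> ratings, long user, long item, double value) {
        ratings.computeIfAbsent(user, k -> new HashMap<>()).put(item, value);
    }

    // Made-up ratings: user 1 plays the role of user A, user 2 the role of user B.
    static Map<Long, Map<Long, Double>> sampleRatings() {
        Map<Long, Map<Long, Double>> ratings = new HashMap<>();
        rate(ratings, 1L, 101L, 5.0);
        rate(ratings, 1L, 102L, 3.0);
        rate(ratings, 2L, 101L, 5.0);
        rate(ratings, 2L, 102L, 3.0);
        rate(ratings, 2L, 103L, 4.0);
        rate(ratings, 2L, 104L, 2.0);
        rate(ratings, 3L, 101L, 1.0);
        rate(ratings, 3L, 102L, 5.0);
        rate(ratings, 3L, 105L, 5.0);
        return ratings;
    }

    public static void main(String[] args) {
        // User 2 agrees with user 1 on every shared item, so user 2 is the top-1 neighbor.
        System.out.println(recommend(sampleRatings(), 1L)); // prints [103, 104]
    }
}
```

In this sketch every item the neighbor rated is a candidate; a real recommender would normally combine several neighbors and weight their ratings by similarity.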
There are many kinds of recommendation algorithms, and you can read about them online. What little I know comes from the book "Big Data-Internet large-scale data mining and distributed processing".
Well, finally we get to running the code. There are actually two ways.
1 Download Mahout, unpack it to get the jar files, and have the Java program call the classes in those jars directly
2 Install Maven, declare the Mahout dependencies, and let Maven download the jars automatically
The most convenient is method 1, and the explanation below uses method 1.
Environment to install beforehand:
1 Java environment
2 Eclipse
Note that the stand-alone Mahout algorithms do not require a Hadoop environment.
Installing these two is not covered here; instructions are easy to find online.
Once the environment is installed, go to the official Mahout website, download Mahout and unzip it; you will see a variety of jar files.
Create a new Java project in Eclipse and import the jars you need via "Add External JARs".
Prepare the test data:
Create a text file that stores the user ID, item ID and score, and save it as dataset.csv.
The first column is the userID, the second column is the itemID, and the third column, the preference value, is the rating.
1,10,1.0
1,11,2.0
1,12,5.0
1,13,5.0
1,14,5.0
1,15,4.0
1,16,5.0
1,17,1.0
1,18,5.0
2,10,1.0
2,11,2.0
2,15,5.0
2,16,4.5
2,17,1.0
2,18,5.0
3,11,2.5
3,12,4.5
3,13,4.0
3,14,3.0
3,15,3.5
3,16,4.5
3,17,4.0
3,18,5.0
4,10,5.0
4,11,5.0
4,12,5.0
4,13,0.0
4,14,2.0
4,15,3.0
4,16,1.0
4,17,4.0
4,18,1.0
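To make the file format concrete, here is a minimal plain-Java sketch of how one such line breaks down into its three fields. The class and method names are hypothetical and not part of Mahout, which parses the file itself when you use FileDataModel; the long IDs and float rating are chosen to mirror how Mahout's recommender code represents user IDs, item IDs and preference values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Mahout: parses "userID,itemID,preference" lines.
final class PreferenceLineParser {

    // One parsed row of the data file.
    static final class Preference {
        final long userId;
        final long itemId;
        final float value;

        Preference(long userId, long itemId, float value) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
        }

        @Override
        public String toString() {
            return "user=" + userId + ", item=" + itemId + ", value=" + value;
        }
    }

    // Splits a single line on commas and converts the three fields.
    static Preference parseLine(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 comma-separated fields: " + line);
        }
        return new Preference(
                Long.parseLong(fields[0].trim()),
                Long.parseLong(fields[1].trim()),
                Float.parseFloat(fields[2].trim()));
    }

    // Parses every non-empty line; empty lines are skipped.
    static List<Preference> parseAll(List<String> lines) {
        List<Preference> result = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                result.add(parseLine(line));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Three rows copied from the test data above.
        List<String> sample = Arrays.asList("1,10,1.0", "2,16,4.5", "4,13,0.0");
        for (Preference p : parseAll(sample)) {
            System.out.println(p);
        }
    }
}
```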
Code:
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.UserBasedRecommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class RecUserBasedExample {

    public static void main(String[] args) throws IOException, TasteException {
        // Read the userID,itemID,rating data from the CSV file
        DataModel model = new FileDataModel(new File("/home/linger/j2ee-workspace/linger/data/dataset.csv"));
        // Similarity between users
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // Neighborhood: users whose similarity passes the 0.1 threshold
        UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.1, similarity, model);
        // The recommender combines the data model, neighborhood and similarity
        UserBasedRecommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        // If we wanted to get three items recommended for the user with userID 2, we would do it like this:
        List<RecommendedItem> recommendations = recommender.recommend(2, 3);
        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
        }
    }
}
Run the code to see the recommendations.
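To get a feel for what the similarity and neighborhood steps in the code above do, here is a plain-Java sketch of a Pearson correlation computed over co-rated items, followed by a threshold filter. It is an approximation for illustration only: the class and method names are hypothetical, the ratings are made up, and Mahout's PearsonCorrelationSimilarity and ThresholdUserNeighborhood may differ in details such as how undefined correlations are handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration; the class and method names are hypothetical, not Mahout APIs.
final class PearsonThresholdSketch {

    // Pearson correlation over the items both users rated; NaN when it is undefined
    // (fewer than two shared items, or one user gave all shared items the same rating).
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        List<double[]> pairs = new ArrayList<>();
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                pairs.add(new double[] {e.getValue(), other});
            }
        }
        int n = pairs.size();
        if (n < 2) {
            return Double.NaN;
        }
        double meanA = 0.0;
        double meanB = 0.0;
        for (double[] p : pairs) {
            meanA += p[0];
            meanB += p[1];
        }
        meanA /= n;
        meanB /= n;
        double cov = 0.0;
        double varA = 0.0;
        double varB = 0.0;
        for (double[] p : pairs) {
            double da = p[0] - meanA;
            double db = p[1] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        double denom = Math.sqrt(varA * varB);
        return denom == 0.0 ? Double.NaN : cov / denom;
    }

    // Users whose similarity to the target is defined and at least the threshold.
    static List<Long> neighbors(Map<Long, Map<Long, Double>> ratings, long target, double threshold) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Map<Long, Double>> e : ratings.entrySet()) {
            if (e.getKey() == target) {
                continue; // a user is not their own neighbor
            }
            double s = pearson(ratings.get(target), e.getValue());
            if (!Double.isNaN(s) && s >= threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    static void rate(Map<Long, Map<Long, Double>> ratings, long user, long item, double value) {
        ratings.computeIfAbsent(user, k -> new HashMap<>()).put(item, value);
    }

    // Made-up ratings: user 2 agrees with user 1, user 3 rates the opposite way.
    static Map<Long, Map<Long, Double>> sampleRatings() {
        Map<Long, Map<Long, Double>> ratings = new TreeMap<>();
        rate(ratings, 1L, 10L, 1.0);
        rate(ratings, 1L, 11L, 2.0);
        rate(ratings, 1L, 12L, 5.0);
        rate(ratings, 2L, 10L, 1.0);
        rate(ratings, 2L, 11L, 2.0);
        rate(ratings, 2L, 12L, 5.0);
        rate(ratings, 2L, 13L, 4.0);
        rate(ratings, 3L, 10L, 5.0);
        rate(ratings, 3L, 11L, 4.0);
        rate(ratings, 3L, 12L, 1.0);
        return ratings;
    }

    public static void main(String[] args) {
        // Only user 2 correlates positively with user 1, so only user 2 passes the 0.1 threshold.
        System.out.println(neighbors(sampleRatings(), 1L, 0.1)); // prints [2]
    }
}
```

A threshold-based neighborhood can contain any number of users, from none to all of them, which is the main design difference from picking a fixed top N.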
References:
http://mahout.apache.org/users/recommender/userbased-5-minutes.html
Mahout website tutorial: user-based recommender demo
http://blog.fens.me/hadoop-mahout-maven-eclipse/
Building a Mahout project with Maven
http://blog.sina.com.cn/s/blog_6dc9c7cb0101bmch.html
Setting up an environment to run Mahout in Eclipse, with sample code (a bit of a trap: the article says to configure Hadoop first, but in fact it runs on a single machine)
http://www.ibm.com/developerworks/cn/java/j-lo-mahout/
(stores the data in MySQL, with web interaction through Tomcat)
Author of this article: linger
This article link: http://blog.csdn.net/lingerlanlan/article/details/41775509