Configuration:
Maven: download and configure; used to compile Mahout (run mvn install in the mahout directory).
Eclipse: import the Mahout jars and compile the test example.
Hadoop: distributed mode.
Mahout: download, then configure /etc/profile.
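The /etc/profile entries for this setup typically look like the following sketch. The install paths here are assumptions; substitute your own locations:

```shell
# Assumed install paths -- adjust to where Hadoop and Mahout actually live.
export HADOOP_HOME=/usr/hadoop
export MAHOUT_HOME=/usr/mahout
# Put the hadoop and mahout launcher scripts on the PATH.
export PATH=$PATH:$HADOOP_HOME/bin:$MAHOUT_HOME/bin
```

After editing, run `source /etc/profile` so the current shell picks up the variables.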
Recommendation system example:
1. Create a Java project and a new class Test.
2. Reference: http://blog.csdn.net/aidayei/article/details/6626699
package org.apache.mahout.fpm.pfpgrowth;

import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;
import java.io.*;
import java.util.*;

public class Test {
    private Test() {}

    public static void main(String[] args) throws Exception {
        // Steps: 1. build the data model  2. compute similarity
        //        3. find the k nearest neighbors  4. build the recommender engine
        DataModel model = new FileDataModel(new File("/usr/hadoop/testdata/cf.txt")); // the file name must be an absolute path
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        List<RecommendedItem> recommendations = recommender.recommend(1, 2); // recommend 2 item IDs for user 1
        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
        }
    }
}
Data preparation: test.txt
Each line is userID,itemID,preference: the first column is the user ID, the second column is the item ID, and the third column is the preference value.
1,101,5
1,102,3
1,103,2.5
2,101,2
2,102,2.5
2,103,5
2,104,2
3,101,2.5
3,104,4
3,105,4.5
3,107,5
4,101,5
4,103,3
4,104,4.5
4,106,4
5,101,4
5,102,3
5,103,2
5,104,4
5,105,3.5
5,106,4
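With this data, step 2 (compute similarity) can be checked by hand. The sketch below computes the Pearson correlation between users 1 and 2 over their co-rated items 101, 102, and 103; it is a hand-rolled illustration of the formula, not Mahout's own PearsonCorrelationSimilarity implementation:

```java
public class PearsonExample {
    // Pearson correlation over the items both users have rated.
    static double pearson(double[] a, double[] b) {
        double meanA = 0, meanB = 0;
        for (int i = 0; i < a.length; i++) { meanA += a[i]; meanB += b[i]; }
        meanA /= a.length;
        meanB /= b.length;
        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < a.length; i++) {
            double da = a[i] - meanA, db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        return cov / Math.sqrt(varA * varB);
    }

    public static void main(String[] args) {
        // Users 1 and 2 co-rated items 101, 102, 103 in test.txt.
        double[] user1 = {5, 3, 2.5};
        double[] user2 = {2, 2.5, 5};
        System.out.printf("similarity(1,2) = %.3f%n", pearson(user1, user2)); // prints -0.764
    }
}
```

The result is negative: these two users' tastes are opposed, so user 2 is a poor neighbor for user 1, and the NearestNUserNeighborhood step filters such users out of the recommendation.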
Output:
RecommendedItem[item:104, value:4.257081]
RecommendedItem[item:106, value:4.0]
Resource reference:
1. Configuration: http://blog.csdn.net/chjshan55/article/details/5923646. Advanced: http://hi.baidu.com/czb_xyls/blog/item/76019d02cfa3cd101c95833a.html
2. Testing and reading files in HDFS: data that was serialized to HDFS should be read with the command: bin/mahout vectordump --seqFile /user/hadoopuser/output/data/part-00000
Problems:
1. What command is used to copy files from HDFS to the local filesystem?
2. How does the algorithm execute internally? How does recommendation-system parallelization work?
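For problem 1, the standard Hadoop filesystem shell commands cover this; the paths below are examples reusing the HDFS path from the vectordump note above:

```shell
# Copy a file from HDFS to the local filesystem.
hadoop fs -get /user/hadoopuser/output/data/part-00000 ./part-00000
# Equivalent form:
hadoop fs -copyToLocal /user/hadoopuser/output/data/part-00000 ./part-00000
# Or print the file contents to stdout without copying:
hadoop fs -cat /user/hadoopuser/output/data/part-00000
```

Note that part-* files written by Mahout are often Hadoop SequenceFiles, so after copying them locally they may still need vectordump (or a SequenceFile reader) to be human-readable.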