Apache Mahout is an open-source library that provides scalable machine-learning algorithms for real-world problems.
Official homepage: http://mahout.apache.org/
Quickstart: https://cwiki.apache.org/confluence/display/MAHOUT/Quickstart
The current version is 0.4. This example shows how to configure Mahout in Eclipse and use it in your own program.
Environment: Eclipse + Maven (m2eclipse) + Mahout 0.4 + JDK 1.6
Configuration:
Step 1:
Create a Maven project in Eclipse and select "maven-archetype-quickstart" on the "Select an Archetype" page.
Step 2:
Open pom.xml and add the necessary JAR dependencies.
In the pom.xml editor, open the Dependencies tab and click Add. In the dialog that pops up, type "mahout"; after a moment a list of matching artifacts appears. Select the Mahout artifact you need: for a simple standalone program, mahout-core is usually enough, and if you need distributed computing, add Hadoop as well.
Save pom.xml. m2eclipse then downloads the selected JAR files automatically.
Example:
Let's illustrate it with an example:
Create a new class and write the following code:
import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;
import java.io.*;
import java.util.*;

public class RecommenderIntro {

    private RecommenderIntro() {}

    public static void main(String[] args) throws Exception {
        // Steps: 1. build the data model 2. compute user similarity
        //        3. find the K nearest neighbors 4. build the recommender
        DataModel model = new FileDataModel(new File("data/intro.csv"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        // Ask for the top 2 recommendations for user 1
        List<RecommendedItem> recommendations = recommender.recommend(1, 2);
        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
        }
    }
}
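Step 2 relies on PearsonCorrelationSimilarity. As a rough illustration of what it measures, here is a minimal plain-Java sketch (not Mahout's source; the class PearsonSketch is hypothetical) that computes the Pearson correlation of two users' ratings over only the items both of them have rated. The ratings in main are those of users 1 and 5 from intro.csv, listed below; Mahout's own implementation may differ in details.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of Pearson correlation between two users,
 * computed only over co-rated items. Not Mahout's implementation.
 */
public class PearsonSketch {

    /** Returns the Pearson correlation over co-rated items, or NaN if it is undefined. */
    public static double pearson(Map<Long, Double> x, Map<Long, Double> y) {
        int n = 0;
        double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
        for (Map.Entry<Long, Double> e : x.entrySet()) {
            Double other = y.get(e.getKey());
            if (other == null) {
                continue; // the second user has not rated this item
            }
            double a = e.getValue();
            double b = other;
            n++;
            sumX += a;
            sumY += b;
            sumXY += a * b;
            sumX2 += a * a;
            sumY2 += b * b;
        }
        if (n == 0) {
            return Double.NaN; // no co-rated items
        }
        // Center the sums on the means of the co-rated items
        double cov = sumXY - sumX * sumY / n;
        double varX = sumX2 - sumX * sumX / n;
        double varY = sumY2 - sumY * sumY / n;
        double denom = Math.sqrt(varX * varY);
        return denom == 0 ? Double.NaN : cov / denom;
    }

    public static void main(String[] args) {
        // Users 1 and 5 from intro.csv
        Map<Long, Double> user1 = new HashMap<Long, Double>();
        user1.put(101L, 5.0); user1.put(102L, 3.0); user1.put(103L, 2.5);
        Map<Long, Double> user5 = new HashMap<Long, Double>();
        user5.put(101L, 4.0); user5.put(102L, 3.0); user5.put(103L, 2.0);
        user5.put(104L, 4.0); user5.put(105L, 3.5); user5.put(106L, 4.0);
        System.out.println(pearson(user1, user5));
    }
}
```

Running it prints roughly 0.945 (exactly 2.5/sqrt(7)). For user 3, who shares only item 101 with user 1, the variance is zero and the sketch returns NaN, meaning the similarity is undefined.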
Output:
RecommendedItem[item:104, value:4.257081]
RecommendedItem[item:106, value:4.0]
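To see where 4.257081 comes from, here is a hedged sketch of a user-based estimate, assuming the common scheme of a similarity-weighted average over the N most similar users (N = 2, as passed to NearestNUserNeighborhood). On intro.csv (listed below), the Pearson similarity to user 1 is about -0.764 for user 2, undefined for user 3 (only one co-rated item), exactly 1.0 for user 4, and 2.5/sqrt(7), about 0.945, for user 5. These values are hard-coded below, so the two neighbors are users 4 and 5. The class and method names are hypothetical, and this is not Mahout's source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a user-based preference estimate:
 * keep the N most similar users, then take a similarity-weighted average.
 */
public class EstimateSketch {

    /** Returns the n users with the highest similarity; NaN similarities are skipped. */
    public static List<Long> nearestN(final Map<Long, Double> similarities, int n) {
        List<Long> users = new ArrayList<Long>();
        for (Map.Entry<Long, Double> e : similarities.entrySet()) {
            if (!Double.isNaN(e.getValue())) {
                users.add(e.getKey());
            }
        }
        // Sort by similarity, highest first
        Collections.sort(users, new Comparator<Long>() {
            public int compare(Long a, Long b) {
                return Double.compare(similarities.get(b), similarities.get(a));
            }
        });
        return users.subList(0, Math.min(n, users.size()));
    }

    /** Similarity-weighted average of the neighbors' ratings for one item (NaN if none rated it). */
    public static double estimate(List<Long> neighbors, Map<Long, Double> similarities,
                                  Map<Long, Map<Long, Double>> ratings, long itemId) {
        double weightedSum = 0;
        double totalSimilarity = 0;
        for (Long user : neighbors) {
            Map<Long, Double> userRatings = ratings.get(user);
            Double pref = userRatings == null ? null : userRatings.get(itemId);
            if (pref == null) {
                continue; // this neighbor has not rated the item
            }
            weightedSum += similarities.get(user) * pref;
            totalSimilarity += similarities.get(user);
        }
        return totalSimilarity == 0 ? Double.NaN : weightedSum / totalSimilarity;
    }

    /** Pearson similarities of users 2-5 to user 1 on intro.csv, hard-coded for illustration. */
    public static Map<Long, Double> similaritiesToUser1() {
        Map<Long, Double> sim = new HashMap<Long, Double>();
        sim.put(2L, -0.764);               // approximate
        sim.put(3L, Double.NaN);           // only one co-rated item: undefined
        sim.put(4L, 1.0);
        sim.put(5L, 2.5 / Math.sqrt(7.0)); // about 0.945
        return sim;
    }

    /** Ratings of users 4 and 5 from intro.csv (the only neighbors needed here). */
    public static Map<Long, Map<Long, Double>> neighborRatings() {
        Map<Long, Double> u4 = new HashMap<Long, Double>();
        u4.put(101L, 5.0); u4.put(103L, 3.0); u4.put(104L, 4.5); u4.put(106L, 4.0);
        Map<Long, Double> u5 = new HashMap<Long, Double>();
        u5.put(101L, 4.0); u5.put(102L, 3.0); u5.put(103L, 2.0);
        u5.put(104L, 4.0); u5.put(105L, 3.5); u5.put(106L, 4.0);
        Map<Long, Map<Long, Double>> ratings = new HashMap<Long, Map<Long, Double>>();
        ratings.put(4L, u4);
        ratings.put(5L, u5);
        return ratings;
    }

    public static void main(String[] args) {
        Map<Long, Double> sim = similaritiesToUser1();
        List<Long> neighbors = nearestN(sim, 2); // users 4 and 5
        Map<Long, Map<Long, Double>> ratings = neighborRatings();
        // Items user 1 has not rated yet
        long[] candidates = {104L, 105L, 106L, 107L};
        for (long item : candidates) {
            System.out.println("item " + item + ": " + estimate(neighbors, sim, ratings, item));
        }
    }
}
```

For item 104 the estimate is (1.0 * 4.5 + 0.945 * 4) / (1.0 + 0.945), about 4.257, and for item 106 both neighbors gave 4, so the estimate is 4.0, in line with the output above. Item 107 gets NaN because neither neighbor rated it.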
The format of intro.csv is as follows:
The first column is the user ID, the second is the item ID, and the third is the preference value, i.e. the user's rating of the item:
1,101,5
1,102,3
1,103,2.5
2,101,2
2,102,2.5
2,103,5
2,104,2
3,101,2.5
3,104,4
3,105,4.5
3,107,5
4,101,5
4,103,3
4,104,4.5
4,106,4
5,101,4
5,102,3
5,103,2
5,104,4
5,105,3.5
5,106,4
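FileDataModel reads this file for you. To make the format concrete, here is a minimal plain-Java sketch (not Mahout code) that loads comma-separated userID,itemID,preference lines into a map from each user to that user's item ratings. The class name CsvPreferences is hypothetical, and it accepts only comma-separated lines with exactly three fields, which may be stricter than FileDataModel.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical reader for the userID,itemID,preference format (not Mahout's FileDataModel). */
public class CsvPreferences {

    /** Parses lines of "userID,itemID,preference" into userID -> (itemID -> preference). */
    public static Map<Long, Map<Long, Double>> parse(Reader in) throws IOException {
        Map<Long, Map<Long, Double>> prefs = new TreeMap<Long, Map<Long, Double>>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.length() == 0) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",");
            if (fields.length != 3) {
                throw new IOException("Expected userID,itemID,preference: " + line);
            }
            long user = Long.parseLong(fields[0].trim());
            long item = Long.parseLong(fields[1].trim());
            double value = Double.parseDouble(fields[2].trim());
            Map<Long, Double> items = prefs.get(user);
            if (items == null) {
                items = new TreeMap<Long, Double>();
                prefs.put(user, items);
            }
            items.put(item, value);
        }
        return prefs;
    }

    public static void main(String[] args) throws IOException {
        // The first rows of intro.csv; a FileReader on data/intro.csv would load the whole file
        String sample = "1,101,5\n1,102,3\n1,103,2.5\n2,101,2\n2,102,2.5\n2,103,5\n2,104,2\n";
        Map<Long, Map<Long, Double>> prefs = parse(new StringReader(sample));
        // Prints {1={101=5.0, 102=3.0, 103=2.5}, 2={101=2.0, 102=2.5, 103=5.0, 104=2.0}}
        System.out.println(prefs);
    }
}
```

Sorted TreeMaps are used only so the printed result is deterministic; any Map would do for lookups.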
The pom.xml file is as follows:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>zhzhl_zju</groupId>
  <artifactId>mahout</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>mahout</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-core</artifactId>
      <version>0.4</version>
      <type>jar</type>
      <scope>compile</scope>
    </dependency>
  </dependencies>
</project>