1. Version and Installation path
Ubuntu 14.04
MAHOUT_HOME=/opt/mahout-0.10.1
HADOOP_HOME=/usr/local/hadoop
MAVEN_HOME=/opt/apache-maven-3.3.3
Hadoop version=2.6.0
Mahout version=0.10.1
Maven version=3.3.3
2. Recompiling Mahout
Mahout Download: http://archive.apache.org/dist/mahout/
Mahout needs to be recompiled before it is used with Hadoop 2.0 or later.
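One way to check whether this applies to a given machine is to read the installed Hadoop version. The following is a minimal sketch, not part of the original procedure: the function name needs_hadoop2_build is hypothetical, and it assumes the first line of `hadoop version` has the form "Hadoop 2.6.0".

```shell
# Return success (0) if the given version string is 2.x or later.
# needs_hadoop2_build is a hypothetical helper name used for illustration.
needs_hadoop2_build() {
  nhb_major=${1%%.*}            # text before the first dot, e.g. "2" from "2.6.0"
  case "$nhb_major" in
    ''|*[!0-9]*) return 2 ;;    # not a number: cannot decide
  esac
  [ "$nhb_major" -ge 2 ]
}

if command -v hadoop >/dev/null 2>&1; then
  # Assumption: the first output line looks like "Hadoop 2.6.0".
  ver=$(hadoop version 2>/dev/null | awk 'NR==1 {print $2}')
  if needs_hadoop2_build "$ver"; then
    echo "Hadoop $ver: recompile Mahout for Hadoop 2"
  else
    echo "Hadoop $ver: no recompile needed, or version not recognized"
  fi
else
  echo "hadoop not found on PATH"
fi
```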
$ git clone https://github.com/apache/mahout.git
$ cd mahout
$ mvn clean package -Dhadoop2 -Dhadoop2.version=2.6.0 -DskipTests=true
After the compilation completes, the build produces:
\mahout\examples\target\mahout-examples-snapshot-0.10.1.jar
\mahout\examples\target\mahout-examples-snapshot-0.10.1-job.jar
Use them to replace the two files mahout-examples-0.10.1.jar and mahout-examples-0.10.1-job.jar in the Mahout directory.
3. Environment Variables
sudo gedit ~/.bashrc
#Mahout
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export MAHOUT_HOME=/opt/mahout-0.10.1
export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
export PATH=$PATH:$HADOOP_HOME/bin:$MAHOUT_HOME/bin
#Maven
MAVEN_HOME=/opt/apache-maven-3.3.3
export MAVEN_HOME
export PATH=${PATH}:${MAVEN_HOME}/bin
Adjust the paths to match your own installation.
Make the environment variable changes take effect immediately:
source ~/.bashrc
Run the mahout command from the Mahout installation directory. If it runs and prints its usage information, the installation succeeded.
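If the mahout command is not found, the variables above are the first thing to check. The following is a small sketch for doing so, assuming the variable names used in this section; check_dir_var is a hypothetical helper, not something shipped with Mahout or Hadoop.

```shell
# Print whether the variable named by $1 is set and points at a directory.
# check_dir_var is a hypothetical helper name used for illustration.
check_dir_var() {
  cdv_name=$1
  # Indirect lookup: read the value of the variable whose name is in cdv_name.
  eval "cdv_value=\${$cdv_name:-}"
  if [ -z "$cdv_value" ]; then
    echo "$cdv_name: not set"
    return 1
  fi
  if [ ! -d "$cdv_value" ]; then
    echo "$cdv_name: $cdv_value is not a directory"
    return 1
  fi
  echo "$cdv_name: ok ($cdv_value)"
}

# Check every variable configured in ~/.bashrc; keep going after a failure.
for v in HADOOP_HOME HADOOP_CONF_DIR MAHOUT_HOME MAHOUT_CONF_DIR MAVEN_HOME; do
  check_dir_var "$v" || true
done
```

Run it in a new terminal, or after sourcing ~/.bashrc, so that it sees the updated values.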
4. Simple K-means Example
Download the test data set synthetic_control.data:
http://archive.ics.uci.edu/ml/databases/synthetic_control/
Create the testdata directory in HDFS. The name must be exactly testdata. Also delete the previous output directory before every run.
hadoop fs -mkdir -p testdata
Upload the data set to the testdata directory in HDFS:
hadoop fs -copyFromLocal /home/hadoop/Desktop/synthetic_control.data testdata
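As noted above, the previous output directory has to be removed before each run. The following is a minimal sketch of that cleanup. It assumes the job writes to a relative HDFS directory named output, which is not stated in this article, and clean_hdfs_output is a hypothetical helper name.

```shell
# Remove a relative HDFS directory before re-running the job.
# Assumption: the job writes to "output"; pass another name if it differs.
clean_hdfs_output() {
  cho_dir=${1:-output}
  if ! command -v hadoop >/dev/null 2>&1; then
    echo "hadoop not found on PATH" >&2
    return 1
  fi
  # -r removes recursively; -f suppresses the error when the path is absent.
  hadoop fs -rm -r -f "$cho_dir"
}

clean_hdfs_output output || echo "cleanup skipped"
```

Because of the -f flag, the command is also safe on the very first run, when no output directory exists yet.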
Start K-means from the Mahout installation directory:
mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
Results:
To view the output directory:
Under Eclipse
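Without Eclipse, the output directory can also be listed from the command line. This is a sketch under the same assumption as before, namely that the job wrote to a relative HDFS directory named output; list_hdfs_output is a hypothetical helper name.

```shell
# Recursively list a relative HDFS directory.
# Assumption: the job wrote its results to "output".
list_hdfs_output() {
  lho_dir=${1:-output}
  if ! command -v hadoop >/dev/null 2>&1; then
    echo "hadoop not found on PATH" >&2
    return 1
  fi
  # -R lists subdirectories as well as the top level.
  hadoop fs -ls -R "$lho_dir"
}

list_hdfs_output output || echo "listing skipped"
```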
Mahout 0.10.1 Installation (Hadoop2.6.0) and Kmeans test