Mahout is a powerful data mining tool: a collection of distributed machine learning algorithms, including implementations of classification, clustering, and a distributed collaborative filtering engine called Taste. Mahout's biggest advantage is that it is built on Hadoop. Many algorithms that previously ran only on a single machine have been converted to the MapReduce model, which greatly increases the amount of data they can handle and improves processing performance.
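To show what "converted to the MapReduce model" means, here is a minimal single-process Python sketch of the map, shuffle and reduce phases. It uses word counting as a stand-in task. It is not Mahout or Hadoop code, and the names run_mapreduce, word_mapper and sum_reducer are hypothetical.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Apply mapper to every record, group values by key (shuffle), then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):  # map phase: emit (key, value) pairs
            groups[key].append(value)      # shuffle phase: collect values per key
    # reduce phase: one reducer call per key, keys processed in sorted order
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def word_mapper(line):
    """Hypothetical mapper: emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(key, counts):
    """Hypothetical reducer: add up the counts emitted for one word."""
    return sum(counts)


if __name__ == "__main__":
    lines = ["Mahout runs on Hadoop", "Hadoop runs MapReduce"]
    print(run_mapreduce(lines, word_mapper, sum_reducer))
    # -> {'hadoop': 2, 'mahout': 1, 'mapreduce': 1, 'on': 1, 'runs': 2}
```

On a real cluster, Hadoop runs many mappers and reducers in parallel on different machines. That parallelism is where the scalability comes from.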
Download Mahout. The version I downloaded is Mahout 0.9: mahout-distribution-0.9.tar.gz
Extract the archive: tar -zxvf mahout-distribution-0.9.tar.gz
Rename the extracted directory:
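The extract and rename steps can also be scripted. Below is a minimal Python sketch using the standard tarfile module. It assumes the archive's top-level directory is mahout-distribution-0.9. The new name is a hypothetical choice, since the original does not show which name was used.

```python
import os
import tarfile


def extract_and_rename(archive, extracted_dir, target_dir, dest="."):
    """Extract a .tar.gz archive into dest, then rename its top-level directory.

    extracted_dir is the directory name inside the archive (assumed to be
    mahout-distribution-0.9); target_dir is a hypothetical new name.
    """
    # Use the safer "data" extraction filter when this Python version supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(path=dest, **kwargs)
    src = os.path.join(dest, extracted_dir)
    dst = os.path.join(dest, target_dir)
    os.rename(src, dst)  # equivalent to: mv mahout-distribution-0.9 <target_dir>
    return dst
```

For example, extract_and_rename("mahout-distribution-0.9.tar.gz", "mahout-distribution-0.9", "mahout") leaves the installation in ./mahout.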
Configure the environment variables:
Run source on the profile file you edited (for example, source /etc/profile) so that the environment variables take effect immediately:
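The environment-variable step usually sets MAHOUT_HOME and adds its bin directory to PATH. The Python sketch below generates those profile lines and mimics their effect on an environment mapping. It assumes a hypothetical install path such as /usr/local/mahout, and mahout_profile_lines and with_mahout_env are illustrative names, not part of Mahout.

```python
import os


def mahout_profile_lines(mahout_home):
    """Shell lines one might append to a profile file such as /etc/profile (assumed layout)."""
    return [
        f"export MAHOUT_HOME={mahout_home}",
        "export PATH=$MAHOUT_HOME/bin:$PATH",
    ]


def with_mahout_env(env, mahout_home):
    """Return a copy of env as it would look after sourcing the lines above."""
    new_env = dict(env)
    new_env["MAHOUT_HOME"] = mahout_home
    bin_dir = os.path.join(mahout_home, "bin")
    old_path = env.get("PATH", "")
    # Prepend Mahout's bin directory so the mahout launcher is found first.
    new_env["PATH"] = bin_dir + os.pathsep + old_path if old_path else bin_dir
    return new_env
```

Prepending rather than appending to PATH means this launcher wins if another mahout command is already installed.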
Verify that Mahout is installed successfully: enter mahout. If a list of algorithms is printed, the installation succeeded.
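If the mahout command is not found, first check that PATH really contains Mahout's bin directory. A minimal Python sketch of that check uses shutil.which. The helper names find_mahout and list_mahout_programs are hypothetical. The second helper assumes, as described above, that running mahout with no arguments prints the available program names.

```python
import shutil
import subprocess


def find_mahout(path=None):
    """Return the full path of the mahout launcher, or None if it is not on PATH.

    path defaults to the current PATH; pass a string to search other directories.
    """
    return shutil.which("mahout", path=path)


def list_mahout_programs():
    """Run mahout with no arguments and return what it prints (stdout and stderr)."""
    exe = find_mahout()
    if exe is None:
        raise FileNotFoundError("mahout is not on PATH; re-check MAHOUT_HOME and PATH")
    result = subprocess.run([exe], capture_output=True, text=True)
    return result.stdout + result.stderr
```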
Run a Mahout example: download the test data from the website and save it as synthetic_control.data.txt.
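Before uploading, it can help to sanity-check the downloaded file. The Python sketch below assumes the file is the UCI Synthetic Control Chart Time Series data used by Mahout's clustering example: 600 rows, each with 60 whitespace-separated numbers. The function names are hypothetical.

```python
def load_synthetic_control(text):
    """Parse whitespace-separated rows of numbers; each non-blank line is one time series."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines, e.g. a trailing newline
        rows.append([float(v) for v in line.split()])
    return rows


def has_expected_shape(rows, expected_rows=600, expected_cols=60):
    """True if there are expected_rows rows and every row has expected_cols values."""
    return len(rows) == expected_rows and {len(r) for r in rows} == {expected_cols}
```

Usage: has_expected_shape(load_synthetic_control(open("synthetic_control.data.txt").read())). A False result usually means the browser saved an HTML page instead of the raw data.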
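Start Hadoop and create a folder named testdata on HDFS. The name must be exactly testdata (HDFS paths are case-sensitive), because the example job reads its input from that directory.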
Then upload synthetic_control.data.txt to testdata:
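The two HDFS steps correspond to the standard hadoop fs -mkdir and hadoop fs -put commands. The minimal Python sketch below builds and runs them. The helper names are hypothetical, and it assumes the hadoop command is on PATH. A relative path such as testdata resolves to the user's HDFS home directory (/user/<username>).

```python
import subprocess


def hdfs_upload_commands(local_file, hdfs_dir="testdata"):
    """Build the hadoop fs commands that create hdfs_dir and upload local_file into it."""
    return [
        ["hadoop", "fs", "-mkdir", hdfs_dir],
        ["hadoop", "fs", "-put", local_file, hdfs_dir],
    ]


def run_commands(commands):
    """Run each command in order, stopping with an exception if one fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Usage: run_commands(hdfs_upload_commands("synthetic_control.data.txt")). Keeping the command lists separate from execution makes them easy to print or check before touching the cluster.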
Run the k-means algorithm. In this run, it launched 10 MapReduce jobs:
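The example is commonly launched with mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job. Several MapReduce jobs appear because k-means on Hadoop roughly runs one pass per iteration: the map step assigns each point to its nearest centroid, and the reduce step averages each cluster's points into a new centroid. The Python sketch below shows that structure in memory. It is an illustration of standard k-means, not Mahout's implementation. max_iterations and delta are illustrative parameters.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans_iteration(points, centroids):
    """One k-means pass, written as a map step followed by a reduce step."""
    # Map: key every point by the index of its nearest centroid.
    groups = {}
    for p in points:
        groups.setdefault(nearest(p, centroids), []).append(p)
    # Reduce: each new centroid is the mean of the points assigned to it.
    new_centroids = list(centroids)  # a cluster with no points keeps its old centroid
    for i, members in groups.items():
        new_centroids[i] = [sum(coord) / len(members) for coord in zip(*members)]
    return new_centroids


def kmeans(points, centroids, max_iterations=10, delta=1e-6):
    """Repeat passes until no centroid moves more than delta, or max_iterations is reached."""
    for _ in range(max_iterations):
        new_centroids = kmeans_iteration(points, centroids)
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= delta:
            break
    return centroids
```

On Hadoop, each call to kmeans_iteration would be a separate job reading the previous iteration's centroids from HDFS. This is why the job count grows with the number of iterations.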
Output of the run:
View the output directory:
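When listing the output directory (for example with hadoop fs -ls output), the final cluster centers are usually the directory whose name ends in -final. The Python sketch below picks it out of a list of names. It assumes Mahout's clusters-N-final naming convention; if your version names the directories differently, adjust the pattern. The helper is hypothetical.

```python
import re

# Assumed naming convention for the last k-means iteration's output directory.
FINAL_RE = re.compile(r"^clusters-(\d+)-final$")


def final_clusters_dir(names):
    """Return the clusters-N-final entry with the largest N, or None if there is none."""
    finals = [n for n in names if FINAL_RE.match(n)]
    if not finals:
        return None
    return max(finals, key=lambda n: int(FINAL_RE.match(n).group(1)))
```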
Done.
Installation and configuration of Mahout