Installing and configuring Apache Mahout on Linux took me six or seven hours, so I decided to write up the process in the hope that readers of this article can finish the installation within an hour. I believe in sharing, and I hope you will share your own solutions after working through any problems you run into. Thank you!
First, please download this document from Baidu Library (Mahout installation, text version):
http://wenku.baidu.com/view/dbd15bd276a20029bd642d55.html
Follow the steps in that document one by one. If the command bin/mahout -help does not list the available algorithms, the cause is the HADOOP_HOME and HADOOP_CONF_DIR environment variables you configured: Mahout picks them up when it starts. Remove these two environment variables and run the command again to get the expected result.
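One way to try this without permanently losing your Hadoop settings is to unset the two variables only for a single command. The sketch below is a minimal illustration; the helper name run_without_hadoop_env is hypothetical and not part of Mahout or Hadoop.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout): run one command with
# HADOOP_HOME and HADOOP_CONF_DIR removed from its environment.
# The subshell keeps the current shell's variables untouched.
run_without_hadoop_env() {
    (
        unset HADOOP_HOME HADOOP_CONF_DIR
        "$@"
    )
}

# Usage, from the Mahout installation directory:
#   run_without_hadoop_env bin/mahout -help
```

Because the variables are only unset inside the subshell, the later steps that rely on HADOOP_HOME still work in the same terminal session.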
The next thing to do is test the installation. Here we run one of the clustering algorithms, following the Mahout wiki page listed under Reference at the end of this article.
The following steps describe how to run it successfully:
1: Go to the page above, where you will see the following:
Pre-prep
Make sure you have the following covered before you work out the example.
- Input data set. Download it here.
Click the "here" link on that page to download the dataset synthetic_control.data, and put it in the MAHOUT_HOME directory. (Note: the dataset must be placed in this directory; otherwise an exception is thrown.)
2: Start Hadoop:
$ $HADOOP_HOME/bin/start-all.sh
3: Create the test directory testdata and import the data into it (the directory name must be testdata):
$ $HADOOP_HOME/bin/hadoop fs -mkdir testdata
$ $HADOOP_HOME/bin/hadoop fs -put <path to synthetic_control.data> testdata
4: Run the k-means algorithm:
$ $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/mahout-examples-0.3.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
The job takes several minutes to run, so please be patient.
5: View the result. Run the following commands in sequence:
$ $HADOOP_HOME/bin/hadoop fs -lsr output
$ $HADOOP_HOME/bin/hadoop fs -get output $MAHOUT_HOME/examples
Then go to the output directory and list its contents:
$ cd $MAHOUT_HOME/examples/output
$ ls
If the following is displayed, the algorithm ran successfully and your installation works:
canopies clusters-1 clusters-3 clusters-5 clusters-7 points
clusters-0 clusters-2 clusters-4 clusters-6 data
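The two manual checks in the steps above (the dataset sitting in the MAHOUT_HOME directory before the run, and the expected entries appearing in the output directory after it) can be scripted. This is only a rough sketch: the function names check_dataset and check_kmeans_output are hypothetical, and accepting any clusters-N entry rather than exactly clusters-0 through clusters-7 is my own assumption.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Mahout or Hadoop.

# Before the run: confirm synthetic_control.data is in the given
# directory (defaults to $MAHOUT_HOME).
check_dataset() {
    dir=${1:-$MAHOUT_HOME}
    if [ -f "$dir/synthetic_control.data" ]; then
        echo "found: $dir/synthetic_control.data"
    else
        echo "missing: $dir/synthetic_control.data" >&2
        return 1
    fi
}

# After the run: confirm the fetched output directory (defaults to
# $MAHOUT_HOME/examples/output) contains canopies, points, data and
# at least one clusters-N entry.
check_kmeans_output() {
    dir=${1:-$MAHOUT_HOME/examples/output}
    rc=0
    for name in canopies points data; do
        if [ ! -e "$dir/$name" ]; then
            echo "missing: $dir/$name" >&2
            rc=1
        fi
    done
    found=0
    for entry in "$dir"/clusters-*; do
        if [ -e "$entry" ]; then
            found=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "missing: $dir/clusters-N" >&2
        rc=1
    fi
    if [ "$rc" -eq 0 ]; then
        echo "output looks complete: $dir"
    fi
    return "$rc"
}

# Usage:
#   check_dataset          # before step 2
#   check_kmeans_output    # after step 5
```

Both functions return a non-zero status on failure, so they can be chained with && in front of the commands from steps 2 to 5.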
I hope this will be useful to you.
Reference:
https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data
http://bbs.hadoopor.com/thread-983-1-1.html