How to Install and Configure Apache Mahout on Linux

Source: Internet
Author: User
Tags hadoop fs

After installing and configuring Apache Mahout on Linux, I decided to write the process up. The job took me six or seven hours, and I hope readers of this article can finish it within an hour. I believe in sharing, and I hope you will share your own solutions once you have solved a problem. Thank you!

First, download this document from Baidu Library (Mahout installation, text version):

http://wenku.baidu.com/view/dbd15bd276a20029bd642d55.html

Follow the steps in that document one by one. If the system does not list the algorithms when you run the command bin/mahout -help, the cause is the HADOOP_HOME and HADOOP_CONF_DIR environment variables you configured: while they are set, Mahout runs against Hadoop. Remove these two environment variables and run the command again to get the expected result.
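The workaround above can be tried without touching your shell configuration. The following is a minimal sketch, assuming a POSIX shell; the function name without_hadoop_env is made up for this example, and the help command in the comment is assumed to be bin/mahout -help as described in the text.

```shell
#!/bin/sh
# Run one command with HADOOP_HOME and HADOOP_CONF_DIR removed from its
# environment. The subshell keeps the change local, so the calling shell
# still has both variables afterwards.
# without_hadoop_env is a hypothetical helper name used only in this sketch.
without_hadoop_env() {
    (
        unset HADOOP_HOME HADOOP_CONF_DIR
        "$@"
    )
}

# Intended use, from the Mahout directory (not executed here):
#   without_hadoop_env bin/mahout -help
```

Because the variables are only removed inside the subshell, the later Hadoop steps can still rely on HADOOP_HOME in the same terminal session.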

Next, test the installation. Here we run a clustering algorithm as the test, following the Mahout wiki page "Clustering of synthetic control data" listed under Reference at the end of this article.

The following steps describe how to run it successfully:

1: Go to the page above. You will see the following:

Pre-prep

Make sure you have the following covered before you work out the example.

  1. Input data set. Download it here.

Use the download link on that page to get the dataset synthetic_control.data, and put it in the MAHOUT_HOME directory. (Note: the dataset must be in this directory; otherwise an exception is reported.)
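Since a misplaced dataset leads to an exception later, it can help to check the file before going further. This is a small sketch, not part of the original procedure; check_dataset is a hypothetical helper name, and the only thing it tests is that synthetic_control.data exists and is non-empty in the directory you pass in.

```shell
#!/bin/sh
# Confirm that synthetic_control.data is present and non-empty in a directory.
# check_dataset is a hypothetical helper name used only in this sketch.
check_dataset() {
    dir="$1"
    file="$dir/synthetic_control.data"
    if [ -s "$file" ]; then
        echo "found: $file"
    else
        echo "not found or empty: $file" >&2
        return 1
    fi
}

# Intended use (not executed here):
#   check_dataset "$MAHOUT_HOME"
```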

2: Start Hadoop: $HADOOP_HOME/bin/start-all.sh

3: Create the test directory testdata and import the data into it (the directory name must be testdata):
$HADOOP_HOME/bin/hadoop fs -mkdir testdata
$HADOOP_HOME/bin/hadoop fs -put <path to synthetic_control.data> testdata

4: Use the k-means algorithm:
$HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/mahout-examples-0.3.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

The job takes several minutes to run, so be patient.
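Steps 2 to 4 can be collected into one script. The sketch below assumes HADOOP_HOME and MAHOUT_HOME are set as in the installation document; DRY_RUN, run and kmeans_example are hypothetical names introduced for this example. By default it only prints the commands, so you can review them before running anything.

```shell
#!/bin/sh
# Sketch: steps 2-4 as one function. The Hadoop commands are the ones
# from the steps above; DRY_RUN, run and kmeans_example are hypothetical
# helpers introduced for this example.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, anything else = execute

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"   # show the command instead of running it
    else
        "$@"
    fi
}

kmeans_example() {
    data="$1"   # path to synthetic_control.data
    # Each step runs only if the previous one succeeded.
    run "$HADOOP_HOME/bin/start-all.sh" &&
    run "$HADOOP_HOME/bin/hadoop" fs -mkdir testdata &&
    run "$HADOOP_HOME/bin/hadoop" fs -put "$data" testdata &&
    run "$HADOOP_HOME/bin/hadoop" jar "$MAHOUT_HOME/mahout-examples-0.3.job" \
        org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
}

# Intended use (not executed here):
#   kmeans_example "$MAHOUT_HOME/synthetic_control.data"
```

Set DRY_RUN=0 before calling kmeans_example to execute the commands instead of printing them.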

5: View the result. Run the following commands in order:

$HADOOP_HOME/bin/hadoop fs -lsr output

$HADOOP_HOME/bin/hadoop fs -get output $MAHOUT_HOME/examples

Then go to the output directory and list it:

$ cd $MAHOUT_HOME/examples/output

$ ls

If the following listing is displayed, the algorithm ran successfully and your installation works:

canopies clusters-1 clusters-3 clusters-5 clusters-7 points
clusters-0 clusters-2 clusters-4 clusters-6 data
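The comparison with the listing above can also be done by a script. This is a minimal sketch; check_kmeans_output is a hypothetical helper name, and the entry names are taken from the listing above, written in lowercase. It only checks that each entry exists, not what it contains.

```shell
#!/bin/sh
# Check that a fetched output directory contains every entry from the
# listing above. Prints each missing entry and returns 1 if any is absent.
# check_kmeans_output is a hypothetical helper name used only in this sketch.
check_kmeans_output() {
    dir="$1"
    missing=0
    for name in canopies clusters-0 clusters-1 clusters-2 clusters-3 \
                clusters-4 clusters-5 clusters-6 clusters-7 points data; do
        if [ ! -e "$dir/$name" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Intended use (not executed here):
#   check_kmeans_output "$MAHOUT_HOME/examples/output" && echo "installation OK"
```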

I hope this will be useful to you.

Reference:

https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data

http://bbs.hadoopor.com/thread-983-1-1.html
