Configuring Mahout took me a long time, mostly because of a few small issues that were easy to get wrong.
1. Download Mahout
http://mahout.apache.org
The latest version at the time of writing: mahout-distribution-0.9
2. Unzip Mahout into the directory where you want to keep it. I put it in /Users/jia/Documents/hadoop-0.20.2, that is, the Hadoop installation directory.
3. Configure the environment for Mahout
Open a terminal and open the directory that contains the profile file:
JIAS-MacBook-Pro:~ jia$ open /etc
Copy the profile file to the desktop, edit it there, and add the environment variables below.
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home
export HADOOP_HOME=$HOME/Documents/hadoop-0.20.2
export MAHOUT_HOME=$HOME/Documents/hadoop-0.20.2/mahout-distribution-0.9
export MAVEN_HOME=$HOME/Documents/apache-maven-3.2.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$MAVEN_HOME/bin:$MAHOUT_HOME/bin
export HADOOP_CONF_DIR=$HOME/Documents/hadoop-0.20.2/conf
export MAHOUT_CONF_DIR=$HOME/Documents/hadoop-0.20.2/mahout-distribution-0.9/conf
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$MAHOUT_HOME/lib:$HADOOP_CONF_DIR:$MAHOUT_CONF_DIR
(Note: use absolute paths here, e.g. $HOME/Documents/..., because /etc/profile is not guaranteed to be sourced with your home directory as the working directory.)
Then copy the edited file from the desktop back over /etc/profile; you will be asked for the administrator password.
Note:
When configuring MAHOUT_CONF_DIR, some websites say export MAHOUT_CONF_DIR=Documents/hadoop-0.20.2/mahout-distribution-0.9/src/conf.
The correct configuration for version 0.9 is export MAHOUT_CONF_DIR=Documents/hadoop-0.20.2/mahout-distribution-0.9/conf, because if you open the mahout-distribution-0.9 folder you will find there is no src directory.
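A quick way to double-check this is a string test on the variable itself. This is a sketch: the paths follow this tutorial's layout under $HOME/Documents, so adjust them for your machine.

```shell
# Sketch: set the two Mahout variables as in the profile above, then verify
# that MAHOUT_CONF_DIR ends in the top-level conf/ of the 0.9 distribution
# (and not the src/conf path that older guides mention, which no longer exists).
export MAHOUT_HOME="$HOME/Documents/hadoop-0.20.2/mahout-distribution-0.9"
export MAHOUT_CONF_DIR="$MAHOUT_HOME/conf"
case "$MAHOUT_CONF_DIR" in
  */mahout-distribution-0.9/conf) echo "MAHOUT_CONF_DIR looks right" ;;
  *) echo "MAHOUT_CONF_DIR looks wrong" ;;
esac
```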
4. Check whether Mahout is configured successfully.
4.1 Start Hadoop
JIAS-MacBook-Pro:hadoop-0.20.2 jia$ bin/start-all.sh
4.2 Run bin/mahout to list the available programs
JIAS-MacBook-Pro:mahout-distribution-0.9 jia$ bin/mahout
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
An example program must be given as the first argument.
Valid program names are:
  arff.vector: : Generate Vectors from an ARFF file or directory
  baumwelch: : Baum-Welch algorithm for unsupervised HMM training
  canopy: : Canopy clustering
  cat: : Print a file or resource as the logistic regression models would see it
  cleansvd: : Cleanup and verification of SVD output
  clusterdump: : Dump cluster output to text
  clusterpp: : Groups Clustering Output In Clusters
  cmdump: : Dump confusion matrix in HTML or text formats
  concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
  cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
  cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
  evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
  fkmeans: : Fuzzy K-means clustering
  hmmpredict: : Generate random sequence of observations by given HMM
  itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
  kmeans: : K-means clustering
  lucene.vector: : Generate Vectors from a Lucene index
  lucene2seq: : Generate Text SequenceFiles from a Lucene index
  matrixdump: : Dump matrix in CSV format
  matrixmult: : Take the product of two matrices
  parallelALS: : ALS-WR factorization of a rating matrix
  qualcluster: : Runs clustering experiments and summarizes results in a CSV
  recommendfactorized: : Compute recommendations using the factorization of a rating matrix
  recommenditembased: : Compute recommendations using item-based collaborative filtering
  regexconverter: : Convert text files on a per line basis based on regular expressions
  resplit: : Splits a set of SequenceFiles into a number of equal splits
  rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
  rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
  runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
  runlogistic: : Run a logistic regression model against CSV data
  seq2encoded: : Encoded Sparse Vector generation from Text sequence files
  seq2sparse: : Sparse Vector generation from Text sequence files
  seqdirectory: : Generate sequence files (of Text) from a directory
  seqdumper: : Generic Sequence File dumper
  seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
  seqwiki: : Wikipedia xml dump to sequence file
  spectralkmeans: : Spectral k-means clustering
  split: : Split Input data into test and train sets
  splitDataset: : split a rating dataset into training and probe parts
  ssvd: : Stochastic SVD
  streamingkmeans: : Streaming k-means clustering
  svd: : Lanczos Singular Value Decomposition
  testnb: : Test the Vector-based Bayes classifier
  trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
  trainlogistic: : Train a logistic regression using stochastic gradient descent
  trainnb: : Train the Vector-based Bayes classifier
  transpose: : Take the transpose of a matrix
  validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
  vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
  vectordump: : Dump vectors from a sequence file to text
  viterbi: : Viterbi decoding of hidden states from given output states sequence
A note on the output above: when you see the following lines you may think something is wrong, but it is not an error.
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
MAHOUT_LOCAL controls whether Mahout runs locally. If it is set, Hadoop is not used, and the values of HADOOP_CONF_DIR and HADOOP_HOME are automatically ignored. At the beginning I struggled with this for a long time.
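If you do want Mahout to run purely locally and silence that message, MAHOUT_LOCAL can be set to any non-empty value. A small sketch (the exact value does not matter, only that the variable is non-empty):

```shell
# Setting MAHOUT_LOCAL to any non-empty string makes bin/mahout run locally;
# the launcher script then ignores HADOOP_CONF_DIR and HADOOP_HOME.
export MAHOUT_LOCAL=true
echo "MAHOUT_LOCAL=$MAHOUT_LOCAL"
# To go back to running on Hadoop, unset it again:
# unset MAHOUT_LOCAL
```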
5. Run the mahout Algorithm
5.1 Download the test data from the address below:
http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
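This is the UCI synthetic control chart dataset: 600 rows, each a time series of 60 whitespace-separated values. A quick awk check can confirm the file is well formed. This is a sketch that fabricates two sample rows so it runs anywhere; point the same awk line at the real synthetic_control.data after downloading.

```shell
# Fabricate two sample rows of 60 numbers each, then verify that every row
# has exactly 60 whitespace-separated fields, as the k-means example expects.
printf '%s\n' "$(seq 60 | tr '\n' ' ')" "$(seq 60 | tr '\n' ' ')" > /tmp/sample.data
awk 'NF != 60 { bad++ } END { print NR, "rows,", bad+0, "malformed" }' /tmp/sample.data
# prints: 2 rows, 0 malformed
```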
5.2 Create the test directory testdata and import the data into the testdata directory.
JIAS-MacBook-Pro:hadoop-0.20.2 jia$ bin/hadoop fs -mkdir testdata
5.3 Upload the test data to HDFS. Do not store the test data in a document created with Pages on the Mac; instead, create a plain file with the touch command: touch data
JIAS-MacBook-Pro:hadoop-0.20.2 jia$ bin/hadoop fs -put workspace/data testdata/
5.4 Run the k-means algorithm shipped with Mahout.
JIAS-MacBook-Pro:hadoop-0.20.2 jia$ bin/hadoop jar mahout-distribution-0.9/mahout-examples-0.9-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
5.5 View the results
JIAS-MacBook-Pro:~ jia$ cd Documents/hadoop-0.20.2/
JIAS-MacBook-Pro:hadoop-0.20.2 jia$ bin/hadoop fs -ls output/
Found 15 items
-rwxrwxrwx 1 jia staff 194 2014-08-03 14:42 /Users/jia/Documents/hadoop-0.20.2/output/_policy
drwxr-xr-x - jia staff 136 2014-08-03 14:42 /Users/jia/Documents/hadoop-0.20.2/output/clusteredPoints
drwxr-xr-x - jia staff 544 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-0
drwxr-xr-x - jia staff 204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-1
drwxr-xr-x - jia staff 204 2014-08-03 14:42 /Users/jia/Documents/hadoop-0.20.2/output/clusters-10-final
drwxr-xr-x - jia staff 204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-2
drwxr-xr-x - jia staff 204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-3
drwxr-xr-x - jia staff 204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-4
drwxr-xr-x - jia staff 204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-5
drwxr-xr-x - jia staff 204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-6
drwxr-xr-x - jia staff 204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-7
drwxr-xr-x - jia staff 204 2014-08-03 14:42 /Users/jia/Documents/hadoop-0.20.2/output/clusters-8
drwxr-xr-x - jia staff 204 2014-08-03 14:42 /Users/jia/Documents/hadoop-0.20.2/output/clusters-9
drwxr-xr-x - jia staff 136 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/data
drwxr-xr-x - jia staff 136 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/random-seeds
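The directories above hold SequenceFiles, which are not human-readable. To inspect the clusters as text, Mahout's clusterdump utility (it appears in the program list from step 4.2) can be used. This is a sketch: -i, -p, and -o are clusterdump's input, points-directory, and output options, and the output file name clusteranalyze.txt is my own choice.

```shell
# Dump the final clusters and their member points to a local text file.
# Run from the mahout-distribution-0.9 directory after the k-means job finishes.
bin/mahout clusterdump \
  -i output/clusters-10-final \
  -p output/clusteredPoints \
  -o clusteranalyze.txt
```

The resulting text file lists each cluster's center and the points assigned to it.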