Mahout Installation and Configuration

Source: Internet
Author: User
Tags: hadoop, fs, macbook

Configuring Mahout took me a long time, most of it lost to a handful of small issues.

1. Download Mahout

Download address: http://mahout.apache.org

The latest version at the time, which I downloaded: mahout-distribution-0.9

2. Unzip Mahout into the directory where you want to keep it. I put it in /Users/jia/Documents/hadoop-0.20.2, i.e. the Hadoop installation directory.

3. Configure the environment for Mahout

Open the terminal and open the directory that contains the profile file:

JIAS-MacBook-Pro:~ jia$ open /etc

Copy the profile file to the desktop, edit it there, and add the following environment variables:

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home
export HADOOP_HOME=$HOME/Documents/hadoop-0.20.2
export MAHOUT_HOME=$HOME/Documents/hadoop-0.20.2/mahout-distribution-0.9
export MAVEN_HOME=$HOME/Documents/apache-maven-3.2.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$MAVEN_HOME/bin:$MAHOUT_HOME/bin
export HADOOP_CONF_DIR=$HOME/Documents/hadoop-0.20.2/conf
export MAHOUT_CONF_DIR=$HOME/Documents/hadoop-0.20.2/mahout-distribution-0.9/conf
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$MAHOUT_HOME/lib:$HADOOP_CONF_DIR:$MAHOUT_CONF_DIR

Copy the edited profile from the desktop back over /etc/profile; you will be asked for the administrator password.
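A quick sketch of applying and checking the change (the paths are the ones used in this walkthrough; adjust them to your own machine):

```shell
# Reload the profile in the current shell so the new variables take effect.
. /etc/profile 2>/dev/null || true

# Fall back to the values configured above if the profile did not set them.
MAHOUT_HOME=${MAHOUT_HOME:-$HOME/Documents/hadoop-0.20.2/mahout-distribution-0.9}
HADOOP_HOME=${HADOOP_HOME:-$HOME/Documents/hadoop-0.20.2}

# Both variables should now print absolute paths.
echo "MAHOUT_HOME=$MAHOUT_HOME"
echo "HADOOP_HOME=$HADOOP_HOME"
```

If the variables still come up empty in a new terminal, check that the copy to /etc/profile actually succeeded.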

Note:

When configuring MAHOUT_CONF_DIR, some websites say to use export MAHOUT_CONF_DIR=Documents/hadoop-0.20.2/mahout-distribution-0.9/src/conf.
The correct configuration for version 0.9 is export MAHOUT_CONF_DIR=Documents/hadoop-0.20.2/mahout-distribution-0.9/conf, because when you open the Mahout folder you will find there is no src directory.
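Rather than trusting any guide, you can probe for yourself where conf/ lives in your unpacked Mahout before setting the variable. A small sketch (the fallback path is the one from this walkthrough):

```shell
MAHOUT_HOME=${MAHOUT_HOME:-$HOME/Documents/hadoop-0.20.2/mahout-distribution-0.9}

# Prefer the top-level conf/ of the 0.9 binary distribution; fall back to
# the src/conf layout that some older guides assume.
if [ -d "$MAHOUT_HOME/conf" ]; then
    MAHOUT_CONF_DIR="$MAHOUT_HOME/conf"
elif [ -d "$MAHOUT_HOME/src/conf" ]; then
    MAHOUT_CONF_DIR="$MAHOUT_HOME/src/conf"
else
    MAHOUT_CONF_DIR=""   # Mahout is not unpacked at this path
fi
echo "MAHOUT_CONF_DIR=$MAHOUT_CONF_DIR"
```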

4. Check whether Mahout is configured successfully.

4.1 Start Hadoop

JIAS-MacBook-Pro:hadoop-0.20.2 jia$ bin/start-all.sh 
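To confirm the daemons actually came up, the JDK's jps tool should list the five Hadoop 0.20.2 processes: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker. A guarded sketch:

```shell
# List running JVMs; degrade cleanly if jps is not on the PATH.
if command -v jps >/dev/null 2>&1; then
    jps
else
    echo "jps not on PATH; check the logs under \$HADOOP_HOME/logs instead"
fi
```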

4.2 Run the mahout launcher

JIAS-MacBook-Pro:mahout-distribution-0.9 jia$ bin/mahout
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
An example program must be given as the first argument.
Valid program names are:
  arff.vector: : Generate Vectors from an ARFF file or directory
  baumwelch: : Baum-Welch algorithm for unsupervised HMM training
  canopy: : Canopy clustering
  cat: : Print a file or resource as the logistic regression models would see it
  cleansvd: : Cleanup and verification of SVD output
  clusterdump: : Dump cluster output to text
  clusterpp: : Groups Clustering Output In Clusters
  cmdump: : Dump confusion matrix in HTML or text formats
  concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
  cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
  cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
  evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
  fkmeans: : Fuzzy K-means clustering
  hmmpredict: : Generate random sequence of observations by given HMM
  itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
  kmeans: : K-means clustering
  lucene.vector: : Generate Vectors from a Lucene index
  lucene2seq: : Generate Text SequenceFiles from a Lucene index
  matrixdump: : Dump matrix in CSV format
  matrixmult: : Take the product of two matrices
  parallelALS: : ALS-WR factorization of a rating matrix
  qualcluster: : Runs clustering experiments and summarizes results in a CSV
  recommendfactorized: : Compute recommendations using the factorization of a rating matrix
  recommenditembased: : Compute recommendations using item-based collaborative filtering
  regexconverter: : Convert text files on a per line basis based on regular expressions
  resplit: : Splits a set of SequenceFiles into a number of equal splits
  rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
  rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
  runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
  runlogistic: : Run a logistic regression model against CSV data
  seq2encoded: : Encoded Sparse Vector generation from Text sequence files
  seq2sparse: : Sparse Vector generation from Text sequence files
  seqdirectory: : Generate sequence files (of Text) from a directory
  seqdumper: : Generic Sequence File dumper
  seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
  seqwiki: : Wikipedia xml dump to sequence file
  spectralkmeans: : Spectral k-means clustering
  split: : Split Input data into test and train sets
  splitDataset: : split a rating dataset into training and probe parts
  ssvd: : Stochastic SVD
  streamingkmeans: : Streaming k-means clustering
  svd: : Lanczos Singular Value Decomposition
  testnb: : Test the Vector-based Bayes classifier
  trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
  trainlogistic: : Train a logistic regression using stochastic gradient descent
  trainnb: : Train the Vector-based Bayes classifier
  transpose: : Take the transpose of a matrix
  validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
  vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
  vectordump: : Dump vectors from a sequence file to text
  viterbi: : Viterbi decoding of hidden states from given output states sequence

One thing needs explaining here: when you see the warning lines below, you may think something went wrong, but it did not, because:

MAHOUT_LOCAL controls whether Mahout runs locally. If this variable is set, Mahout does not go through Hadoop, and the HADOOP_CONF_DIR and HADOOP_HOME settings are automatically ignored.

I struggled with this point for a long time at first.

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
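If you actually want purely local runs (and want to silence this warning), setting MAHOUT_LOCAL to any non-empty value before invoking bin/mahout does it; unset it again to go back through Hadoop. A sketch:

```shell
# Any non-empty value switches bin/mahout to local mode, ignoring
# HADOOP_HOME and HADOOP_CONF_DIR.
export MAHOUT_LOCAL=true
echo "MAHOUT_LOCAL=$MAHOUT_LOCAL"

# bin/mahout kmeans ...   # would now run in local mode
# unset MAHOUT_LOCAL      # restore Hadoop-backed execution when done
```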

5. Run a Mahout Algorithm

5.1 Download the test data from the address below:

http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
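You can also fetch it from the command line; a sketch using curl (which ships with OS X). The workspace/ directory name matches the -put command used below:

```shell
# Download the synthetic control dataset into workspace/data.
mkdir -p workspace
if command -v curl >/dev/null 2>&1; then
    curl -sf -o workspace/data \
        http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data \
        || echo "download failed; save the file from a browser instead"
fi
```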

5.2 Create the test directory testdata and import the data into it.

JIAS-MacBook-Pro:hadoop-0.20.2 jia$ bin/hadoop fs -mkdir testdata

5.3 Upload the test data to HDFS. Do not store the test data in a document created with Pages on the Mac; instead create a plain file with the command: touch data

JIAS-MacBook-Pro:hadoop-0.20.2 jia$ bin/hadoop fs -put workspace/data testdata/
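Before running the job, it is worth confirming the file actually landed in HDFS. A guarded sketch (the cd path is the one used in this walkthrough; adjust it for your machine):

```shell
# Verify the upload; degrade cleanly if the hadoop launcher is not here.
cd "$HOME/Documents/hadoop-0.20.2" 2>/dev/null || true
if [ -x bin/hadoop ]; then
    bin/hadoop fs -ls testdata/   # should list the uploaded data file
else
    echo "bin/hadoop not found here; run the check from your Hadoop directory"
fi
```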

5.4 Run the k-means algorithm with Mahout.

JIAS-MacBook-Pro:hadoop-0.20.2 jia$ bin/hadoop jar mahout-distribution-0.9/mahout-examples-0.9-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

5.5 View the results

JIAS-MacBook-Pro:~ jia$ cd Documents/hadoop-0.20.2/
JIAS-MacBook-Pro:hadoop-0.20.2 jia$ bin/hadoop fs -ls output/
Found 15 items
-rwxrwxrwx   1 jia staff        194 2014-08-03 14:42 /Users/jia/Documents/hadoop-0.20.2/output/_policy
drwxr-xr-x   - jia staff        136 2014-08-03 14:42 /Users/jia/Documents/hadoop-0.20.2/output/clusteredPoints
drwxr-xr-x   - jia staff        544 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-0
drwxr-xr-x   - jia staff        204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-1
drwxr-xr-x   - jia staff        204 2014-08-03 14:42 /Users/jia/Documents/hadoop-0.20.2/output/clusters-10-final
drwxr-xr-x   - jia staff        204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-2
drwxr-xr-x   - jia staff        204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-3
drwxr-xr-x   - jia staff        204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-4
drwxr-xr-x   - jia staff        204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-5
drwxr-xr-x   - jia staff        204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-6
drwxr-xr-x   - jia staff        204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-7
drwxr-xr-x   - jia staff        204 2014-08-03 14:42 /Users/jia/Documents/hadoop-0.20.2/output/clusters-8
drwxr-xr-x   - jia staff        204 2014-08-03 14:42 /Users/jia/Documents/hadoop-0.20.2/output/clusters-9
drwxr-xr-x   - jia staff        136 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/data
drwxr-xr-x   - jia staff        136 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/random-seeds
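The cluster output is stored as SequenceFiles, not plain text. To read it, clusterdump (one of the valid program names listed earlier) renders the clusters as text. The -i/-p/-o flags below are my reading of the 0.9 options; run bin/mahout clusterdump --help to confirm on your install:

```shell
# Dump the final clusters to a local text file; guarded so it degrades
# cleanly if Mahout is not at this path on your machine.
cd "$HOME/Documents/hadoop-0.20.2/mahout-distribution-0.9" 2>/dev/null || true
if [ -x bin/mahout ]; then
    bin/mahout clusterdump \
        -i output/clusters-10-final \
        -p output/clusteredPoints \
        -o "$HOME/clusters.txt"
else
    echo "bin/mahout not found here; adjust the path for your machine"
fi
```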

 

 
