Learning Mahout: introduction, installation, configuration, and a first test program


I. Introduction to Mahout

Look up the word "mahout" and you find it means an elephant driver. Then look at the Mahout logo: if you want to have fun with the little yellow elephant (Hadoop), you also have to get along with the person riding it ...

[Mahout logo: the mahout is the figure riding on the elephant's head]

On to the main content:

Mahout is a powerful data mining tool: a collection of distributed machine learning algorithms that includes distributed collaborative filtering (an implementation called Taste), classification, and clustering. Mahout's biggest advantage is that it is built on Hadoop. Many algorithms that previously ran on a single machine have been converted to the MapReduce model, which greatly increases both the amount of data they can handle and their processing performance. The machine learning algorithms implemented in Mahout are:

Algorithm class                         Algorithm name                               Translated name / description
--------------------------------------  -------------------------------------------  ---------------------------------------------------
Classification                          Logistic Regression                          Logistic regression
Classification                          Bayesian                                     Bayesian
Classification                          SVM                                          Support vector machine
Classification                          Perceptron                                   Perceptron algorithm
Classification                          Neural Network                               Neural network
Classification                          Random Forests                               Random forest
Classification                          Restricted Boltzmann Machines                Restricted Boltzmann machine
Clustering                              Canopy Clustering                            Canopy clustering
Clustering                              K-means Clustering                           K-means algorithm
Clustering                              Fuzzy K-means                                Fuzzy k-means
Clustering                              Expectation Maximization                     EM clustering (expectation-maximization clustering)
Clustering                              Mean Shift Clustering                        Mean shift clustering
Clustering                              Hierarchical Clustering                      Hierarchical clustering
Clustering                              Dirichlet Process Clustering                 Dirichlet process clustering
Clustering                              Latent Dirichlet Allocation                  LDA clustering
Clustering                              Spectral Clustering                          Spectral clustering
Association rule mining                 Parallel FP Growth Algorithm                 Parallel FP-growth algorithm
Regression                              Locally Weighted Linear Regression           Locally weighted linear regression
Dimensionality reduction                Singular Value Decomposition                 Singular value decomposition
Dimensionality reduction                Principal Components Analysis                Principal component analysis
Dimensionality reduction                Independent Component Analysis               Independent component analysis
Dimensionality reduction                Gaussian Discriminative Analysis             Gaussian discriminant analysis
Evolutionary algorithms                 Parallelization of the Watchmaker framework
Recommendation/collaborative filtering  Non-distributed recommenders                 Taste (UserCF, ItemCF, SlopeOne)
Recommendation/collaborative filtering  Distributed recommenders                     ItemCF
Vector similarity computation           RowSimilarityJob                             Calculates the similarity between columns
Vector similarity computation           VectorDistanceJob                            Calculates the distance between vectors
Non-MapReduce algorithms                Hidden Markov Models                         Hidden Markov model
Collections extensions                  Collections                                  Extends Java's collections classes

II. Mahout installation and configuration

1. Download Mahout

    http://archive.apache.org/dist/mahout/

2. Extract the archive

    tar -zxvf mahout-distribution-0.9.tar.gz

3. Configure environment variables

3.1 Configure the Mahout environment variables

    # set Mahout environment
    export MAHOUT_HOME=/home/yujianxin/mahout/mahout-distribution-0.9
    export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
    export PATH=$MAHOUT_HOME/conf:$MAHOUT_HOME/bin:$PATH

3.2 Configure the Hadoop environment variables that Mahout needs

    # set Hadoop environment
    export HADOOP_HOME=/home/yujianxin/hadoop/hadoop-1.1.2
    export HADOOP_CONF_DIR=$HADOOP_HOME/conf
    export PATH=$PATH:$HADOOP_HOME/bin
    export HADOOP_HOME_WARN_SUPPRESS=not_null

IV. Verifying that Mahout is installed successfully

Run the mahout command. If it lists a set of algorithms, the installation succeeded.

V. Getting started with Mahout

5.1 Start Hadoop.

5.2 Download the test data: follow the synthetic_control.data link at http://archive.ics.uci.edu/ml/databases/synthetic_control/

5.3 Upload the test data:

    hadoop fs -put synthetic_control.data /user/root/testdata

5.4 Run the k-means clustering algorithm that ships with Mahout:

    mahout -core org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

Clustering takes about 9 minutes to complete.

5.5 View the clustering results:

    hadoop fs -ls /user/root/output

To live is to work. More Mahout study to follow ...
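The environment variables from the configuration step are easy to mistype, and a missing one often only shows up when a job fails. As a minimal sketch, the POSIX shell function below checks that each of the four directory variables is set and points at an existing directory. The name check_mahout_env is hypothetical; it is a helper written for this note and is not part of Mahout or Hadoop.

```shell
# Hypothetical helper (not part of Mahout or Hadoop): verify that the
# directory variables used in this walkthrough are set and exist.
check_mahout_env() {
    status=0
    for name in MAHOUT_HOME MAHOUT_CONF_DIR HADOOP_HOME HADOOP_CONF_DIR; do
        # Read the variable whose name is stored in $name (empty if unset).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "MISSING: $name is not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "BAD PATH: $name=$value is not a directory"
            status=1
        else
            echo "OK: $name=$value"
        fi
    done
    # Non-zero when at least one variable failed a check.
    return "$status"
}
```

After sourcing the profile that holds the export lines, calling check_mahout_env prints one line per variable and returns a non-zero status if any check fails, so it can be placed in front of the mahout command in a script.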
