Hadoop (13), hadoop

Source: Internet
Author: User
Tags: svm, hadoop, fs

I. Introduction to Mahout

Mahout is a data mining tool and a collection of distributed machine learning algorithms, covering classification, clustering, and a distributed collaborative filtering implementation called Taste. Its biggest advantage is that it is built on Hadoop: many algorithms that previously ran on a single machine are reimplemented in the MapReduce model, which greatly increases the volume of data they can handle and improves processing performance.
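To make the MapReduce idea concrete, here is a minimal sketch of one k-means iteration on one-dimensional points, written as a map | sort | reduce pipeline in shell and awk. It is an illustration only and is not how Mahout implements k-means; the function names kmeans_map, kmeans_reduce, and kmeans_iteration are hypothetical.

```shell
#!/bin/sh
# Illustration only: one k-means iteration on 1-D points, phrased as
# map | sort | reduce. This is NOT Mahout code; all names are hypothetical.

# Map: read one point per line and emit "<nearest centroid index>\t<point>".
# $1 is a space-separated list of current centroids, e.g. "0 8".
kmeans_map() {
  awk -v cs="$1" '
    BEGIN { n = split(cs, c, " ") }
    NF == 0 { next }                      # skip blank lines
    {
      best = 1
      bestd = ($1 - c[1]) * ($1 - c[1])   # squared distance to centroid 1
      for (i = 2; i <= n; i++) {
        d = ($1 - c[i]) * ($1 - c[i])
        if (d < bestd) { best = i; bestd = d }
      }
      print best "\t" $1
    }'
}

# Reduce: average all points that share a key to get the updated centroid.
kmeans_reduce() {
  awk -F '\t' '
    { sum[$1] += $2; count[$1]++ }
    END { for (k in sum) print k "\t" (sum[k] / count[k]) }
  ' | sort -n
}

# One iteration. The sort between the two stages stands in for the
# shuffle phase that groups records by key in a MapReduce job.
kmeans_iteration() {
  kmeans_map "$1" | sort -n | kmeans_reduce
}
```

For example, `printf '%s\n' 1 2 3 9 10 11 | kmeans_iteration "0 8"` assigns the first three points to centroid 1 and the rest to centroid 2, then prints the updated centroids 2 and 10. On a cluster the map and reduce stages run in parallel over partitions of the data, which is where the scalability gain comes from.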

The machine learning algorithms implemented in Mahout are as follows:

| Category | Algorithm | Description |
|---|---|---|
| Classification | Logistic Regression | Logistic regression |
| Classification | Bayesian | Bayes classifier |
| Classification | SVM | Support vector machine |
| Classification | Perceptron | Perceptron algorithm |
| Classification | Neural Network | Neural network |
| Classification | Random Forests | Random forest |
| Classification | Restricted Boltzmann Machines | Restricted Boltzmann machine |
| Clustering | Canopy Clustering | Canopy clustering |
| Clustering | K-means Clustering | K-means algorithm |
| Clustering | Fuzzy K-means | Fuzzy K-means |
| Clustering | Expectation Maximization | EM clustering (expectation maximization clustering) |
| Clustering | Mean Shift Clustering | Mean shift clustering |
| Clustering | Hierarchical Clustering | Hierarchical clustering |
| Clustering | Dirichlet Process Clustering | Dirichlet process clustering |
| Clustering | Latent Dirichlet Allocation | LDA clustering |
| Clustering | Spectral Clustering | Spectral clustering |
| Association rule mining | Parallel FP Growth Algorithm | Parallel FP-Growth algorithm |
| Regression | Locally Weighted Linear Regression | Locally weighted linear regression |
| Dimensionality reduction | Singular Value Decomposition | Singular value decomposition |
| Dimensionality reduction | Principal Components Analysis | Principal component analysis |
| Dimensionality reduction | Independent Component Analysis | Independent component analysis |
| Dimensionality reduction | Gaussian Discriminative Analysis | Gaussian discriminant analysis |
| Evolutionary algorithms | Parallelized Watchmaker framework | |
| Recommendation / collaborative filtering | Non-distributed recommenders | Taste (UserCF, ItemCF, SlopeOne) |
| Recommendation / collaborative filtering | Distributed recommenders | ItemCF |
| Vector similarity | RowSimilarityJob | Computes pairwise similarity between the rows of a matrix |
| Vector similarity | VectorDistanceJob | Computes the distance between vectors |
| Non-MapReduce algorithms | Hidden Markov Models | Hidden Markov model |
| Collections extensions | Collections | Extends the Java Collections classes |


II. Mahout installation and configuration
1. Download Mahout from http://archive.apache.org/dist/mahout/
2. Decompress the archive:
    tar -zxvf mahout-distribution-0.9.tar.gz
3. Configure the environment variables.
3.1 Configure the Mahout environment variables:
    # set mahout environment
    export MAHOUT_HOME=/home/yujianxin/mahout/mahout-distribution-0.9
    export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
    export PATH=$MAHOUT_HOME/conf:$MAHOUT_HOME/bin:$PATH
3.2 Configure the Hadoop environment variables required by Mahout:
    # set hadoop environment
    export HADOOP_HOME=/home/yujianxin/hadoop/hadoop-1.1.2
    export HADOOP_CONF_DIR=$HADOOP_HOME/conf
    export PATH=$PATH:$HADOOP_HOME/bin
    export HADOOP_HOME_WARN_SUPPRESS=not_null
4. Verify that Mahout is installed correctly: run the mahout command. If it prints a list of available algorithms, the installation succeeded.
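The configuration and verification steps above can be double-checked with a small helper like the one below. This is a sketch, assuming the variable names used in this article (MAHOUT_HOME, HADOOP_HOME, HADOOP_CONF_DIR); the function check_mahout_env is hypothetical and is not part of Mahout or Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: reports whether the variables configured above are
# set, point to existing directories, and whether mahout is on PATH.
check_mahout_env() {
  rc=0
  for var in MAHOUT_HOME HADOOP_HOME HADOOP_CONF_DIR; do
    # Look up the value of the variable whose name is stored in $var.
    eval "dir=\${$var:-}"
    if [ -z "$dir" ]; then
      echo "MISSING: $var is not set"
      rc=1
    elif [ ! -d "$dir" ]; then
      echo "MISSING: $var points to '$dir', which is not a directory"
      rc=1
    else
      echo "OK: $var=$dir"
    fi
  done

  # The mahout launcher script should be reachable through PATH.
  if command -v mahout >/dev/null 2>&1; then
    echo "OK: mahout found at $(command -v mahout)"
  else
    echo "MISSING: mahout is not on PATH"
    rc=1
  fi
  return "$rc"
}
```

After sourcing the profile that contains the export lines, call check_mahout_env; a non-zero exit status means at least one item still needs attention.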
III. Getting started with Mahout
1. Start Hadoop.
2. Download the test data file synthetic_control.data from http://archive.ics.uci.edu/ml/databases/synthetic_control/
3. Upload the test data to HDFS:
    hadoop fs -put synthetic_control.data /user/root/testdata
4. Run the k-means clustering algorithm in Mahout:
    mahout -core org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
5. View the clustering result:
    hadoop fs -ls /user/root/output
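Steps 3 to 5 above can be collected into one script. The sketch below assumes synthetic_control.data is already in the current directory and that Hadoop is running; DRY_RUN, run, and kmeans_demo are hypothetical names introduced for illustration. The Job is started without arguments, as in this article, which is assumed here to read from testdata and write to output under the HDFS home directory of the current user.

```shell
#!/bin/sh
# Hypothetical wrapper around the upload, cluster, and list steps.
DATA_FILE="synthetic_control.data"
INPUT_DIR="/user/root/testdata"
OUTPUT_DIR="/user/root/output"

# With DRY_RUN=1, print each command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Run the three steps in order and stop at the first failure.
kmeans_demo() {
  run hadoop fs -put "$DATA_FILE" "$INPUT_DIR" &&
  run mahout -core org.apache.mahout.clustering.syntheticcontrol.kmeans.Job &&
  run hadoop fs -ls "$OUTPUT_DIR"
}
```

Setting DRY_RUN=1 before calling kmeans_demo prints the three commands without executing them, which is a cheap way to check the paths before touching the cluster.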
