Hadoop (13), hadoop

Last Update:2015-01-15 Source: Internet

Author: User

Tags svm hadoop fs

I. Mahout introduction

Mahout is a data mining tool: a collection of distributed machine learning algorithms covering classification, clustering, and distributed collaborative filtering (the last through its recommender engine, Taste). Its biggest advantage is that it is built on Hadoop: many algorithms that previously ran on a single machine are reimplemented in the MapReduce model, which greatly increases the volume of data they can handle and improves processing performance.
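To make the MapReduce conversion more concrete, here is a minimal single-machine sketch of one k-means iteration written as a map, sort, reduce shell pipeline. It is not Mahout's implementation: the 1-D points, the two starting centroids, and the function name kmeans_iteration are all assumptions made up for illustration.

```shell
#!/bin/sh
# One k-means iteration on 1-D points, phrased as map -> shuffle -> reduce.
# Hypothetical illustration only; Mahout runs the equivalent steps as Hadoop jobs.

kmeans_iteration() {
  # $1, $2: current centroids; points are read from stdin, one per line.
  # Map: emit "<index of nearest centroid> <point>" for every input point.
  awk -v c1="$1" -v c2="$2" '{
        d1 = ($1 - c1) * ($1 - c1)
        d2 = ($1 - c2) * ($1 - c2)
        k = (d1 <= d2) ? 1 : 2
        print k, $1
      }' |
  # Shuffle: bring records with the same key together by sorting on the key.
  sort -k1,1n |
  # Reduce: average the points of each key to obtain the updated centroid.
  awk '{ sum[$1] += $2; n[$1]++ }
       END {
         for (k = 1; k <= 2; k++)
           if (n[k]) printf "%d %.2f\n", k, sum[k] / n[k]
       }'
}

# Six sample points, starting centroids 1 and 10.
printf '%s\n' 1 2 3 10 11 12 | kmeans_iteration 1 10
```

With these sample points the updated centroids come out as 2.00 and 11.00. On a cluster, the map and reduce stages would run in parallel over splits of the input, which is where the gain in data size comes from.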

The machine learning algorithms implemented in Mahout are as follows:

Algorithm category	Algorithm name	Chinese name (translated)
Classification Algorithm	Logistic Regression	Logistic regression
	Bayesian	Bayes
	SVM	Support vector machine
	Perceptron	Perceptron algorithm
	Neural Network	Neural network
	Random Forests	Random forest
	Restricted Boltzmann Machines	Restricted Boltzmann machine
Clustering Algorithm	Canopy Clustering	Canopy clustering
	K-means Clustering	K-means algorithm
	Fuzzy K-means	Fuzzy K-means
	Expectation Maximization	EM clustering (expectation maximization clustering)
	Mean Shift Clustering	Mean shift clustering
	Hierarchical Clustering	Hierarchical clustering
	Dirichlet Process Clustering	Dirichlet process clustering
	Latent Dirichlet Allocation	LDA clustering
	Spectral Clustering	Spectral clustering
Association Rule Mining	Parallel FP Growth Algorithm	Parallel FP-Growth algorithm
Regression	Locally Weighted Linear Regression	Locally weighted linear regression
Dimensionality Reduction	Singular Value Decomposition	Singular value decomposition
	Principal Components Analysis	Principal component analysis
	Independent Component Analysis	Independent component analysis
	Gaussian Discriminative Analysis	Gaussian discriminant analysis
Evolutionary Algorithms	Parallelized Watchmaker framework
Recommendation/Collaborative Filtering	Non-distributed recommenders	Taste (UserCF, ItemCF, SlopeOne)
Recommendation/Collaborative Filtering	Distributed Recommenders	ItemCF
Vector similarity calculation	RowSimilarityJob	Calculate similarity between columns
Vector similarity calculation	VectorDistanceJob	Calculate the distance between vectors
Non-MapReduce Algorithm	Hidden Markov Models	Hidden Markov model
Collections Extensions	Collections	Extends the Java Collections classes

II. Mahout installation and configuration
1. Download Mahout from http://archive.apache.org/dist/mahout/
2. Decompress the archive: tar -zxvf mahout-distribution-0.9.tar.gz
3. Configure the environment variables
3.1 Configure the Mahout environment variables:
    # set mahout environment
    export MAHOUT_HOME=/home/yujianxin/mahout/mahout-distribution-0.9
    export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
    export PATH=$MAHOUT_HOME/conf:$MAHOUT_HOME/bin:$PATH
3.2 Configure the Hadoop environment variables required by Mahout:
    # set hadoop environment
    export HADOOP_HOME=/home/yujianxin/hadoop/hadoop-1.1.2
    export HADOOP_CONF_DIR=$HADOOP_HOME/conf
    export PATH=$PATH:$HADOOP_HOME/bin
    export HADOOP_HOME_WARN_SUPPRESS=not_null
4. Verify that Mahout is installed successfully: run the mahout command. If it lists the available algorithms, the installation succeeded.
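Before running the mahout command, it can help to confirm that the variables from step 3 point at real directories. The following is a small sketch of such a check; the function name check_mahout_env is hypothetical, and the only assumption about the layout is that the launcher script lives at $MAHOUT_HOME/bin/mahout, as the PATH setting in step 3.1 implies.

```shell
#!/bin/sh
# Hypothetical helper: report whether MAHOUT_HOME and HADOOP_HOME are usable.
check_mahout_env() {
  status=0
  for var in MAHOUT_HOME HADOOP_HOME; do
    # Read the value of the variable whose name is stored in $var.
    eval "dir=\${$var:-}"
    if [ -z "$dir" ]; then
      echo "$var is not set"
      status=1
    elif [ ! -d "$dir" ]; then
      echo "$var points to a missing directory: $dir"
      status=1
    else
      echo "$var=$dir"
    fi
  done
  # The launcher is assumed to be at $MAHOUT_HOME/bin/mahout.
  if [ -n "${MAHOUT_HOME:-}" ] && [ ! -x "$MAHOUT_HOME/bin/mahout" ]; then
    echo "no executable found at $MAHOUT_HOME/bin/mahout"
    status=1
  fi
  return "$status"
}

check_mahout_env || echo "fix the settings above, then re-run the mahout command"
```

The function returns a non-zero status when anything is missing, so it can also be used as a guard at the top of a longer script.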
III. Entry-level use of Mahout
1. Start Hadoop.
2. Download the test data file synthetic_control.data from http://archive.ics.uci.edu/ml/databases/synthetic_control/
3. Upload the test data: hadoop fs -put synthetic_control.data /user/root/testdata
4. Run the k-means clustering algorithm in Mahout: mahout -core org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
5. View the clustering result: hadoop fs -ls /user/root/output
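Steps 3 to 5 can be collected into one script. The sketch below is an assumption-laden convenience wrapper, not part of Mahout: the names run, run_kmeans_example, DRY_RUN, and DATA_FILE are hypothetical, and the -core flag is reconstructed from the "mahout-core" spelling in the original command. With DRY_RUN=1 (the default here) the commands are only printed, so the script can be inspected without a running cluster.

```shell
#!/bin/sh
# Sketch of steps 3-5 as one script; all helper names are illustrative.
DRY_RUN="${DRY_RUN:-1}"                            # 1 = print commands only
DATA_FILE="${DATA_FILE:-synthetic_control.data}"   # local file from step 2
INPUT_DIR="/user/root/testdata"                    # HDFS input path from step 3
OUTPUT_DIR="/user/root/output"                     # HDFS output path from step 5

run() {
  # Print the command; execute it only when DRY_RUN is not 1.
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

run_kmeans_example() {
  # Each step runs only if the previous one succeeded.
  run hadoop fs -put "$DATA_FILE" "$INPUT_DIR" &&
  run mahout -core org.apache.mahout.clustering.syntheticcontrol.kmeans.Job &&
  run hadoop fs -ls "$OUTPUT_DIR"
}

run_kmeans_example
```

Setting DRY_RUN=0 in the environment makes the same script execute the commands, assuming Hadoop is already started as in step 1.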

