I. Introduction to Mahout
Look up the word "mahout" and you will find it means an elephant driver. Then look at the Mahout logo: well, if you want to play happily with the little yellow elephant, you have to keep its rider company as well ...
Logo attached:
(That's him, the mahout on the elephant's head)
On to the main text:
Mahout is a powerful data mining tool: a collection of distributed machine learning algorithms that includes classification, clustering, and a distributed collaborative filtering implementation called Taste. Mahout's biggest advantage is that it is built on Hadoop. Many algorithms that previously ran on a single machine have been converted to MapReduce, which greatly increases both the amount of data they can handle and their processing performance.
Machine learning algorithms implemented in Mahout:
Algorithm class | Algorithm name | Chinese name (translated) |
Classification | Logistic Regression | Logistic regression |
 | Bayesian | Bayesian |
 | SVM | Support vector machine |
 | Perceptron | Perceptron algorithm |
 | Neural Network | Neural network |
 | Random Forests | Random forest |
 | Restricted Boltzmann Machines | Restricted Boltzmann machine |
Clustering | Canopy Clustering | Canopy clustering |
 | K-means Clustering | K-means algorithm |
 | Fuzzy K-means | Fuzzy K-means |
 | Expectation Maximization | EM clustering (expectation maximization clustering) |
 | Mean Shift Clustering | Mean shift clustering |
 | Hierarchical Clustering | Hierarchical clustering |
 | Dirichlet Process Clustering | Dirichlet process clustering |
 | Latent Dirichlet Allocation | LDA clustering |
 | Spectral Clustering | Spectral clustering |
Association rule mining | Parallel FP Growth Algorithm | Parallel FP-Growth algorithm |
Regression | Locally Weighted Linear Regression | Locally weighted linear regression |
Dimensionality reduction | Singular Value Decomposition | Singular value decomposition |
 | Principal Components Analysis | Principal component analysis |
 | Independent Component Analysis | Independent component analysis |
 | Gaussian Discriminative Analysis | Gaussian discriminant analysis |
Evolutionary algorithms | Parallelization of the Watchmaker framework | |
Recommendation / collaborative filtering | Non-distributed recommenders | Taste (UserCF, ItemCF, SlopeOne) |
 | Distributed recommenders | ItemCF |
Vector similarity calculation | RowSimilarityJob | Calculate the similarity between columns |
 | VectorDistanceJob | Calculate the distance between vectors |
Non-MapReduce algorithms | Hidden Markov Models | Hidden Markov model |
Collections extensions | Collections | Extends Java's Collections classes |
II. Installing and configuring Mahout
1. Download Mahout
http://archive.apache.org/dist/mahout/
2. Unpack the archive
tar -zxvf mahout-distribution-0.9.tar.gz
3. Configure environment variables
3.1 Configure the Mahout environment variables
# set Mahout environment
export MAHOUT_HOME=/home/yujianxin/mahout/mahout-distribution-0.9
export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
export PATH=$MAHOUT_HOME/conf:$MAHOUT_HOME/bin:$PATH
3.2 Configure the Hadoop environment variables that Mahout requires
# set Hadoop environment
export HADOOP_HOME=/home/yujianxin/hadoop/hadoop-1.1.2
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_HOME_WARN_SUPPRESS=not_null
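A typo in one of these exports is easy to miss, so it can help to check that each variable really points at a directory before moving on. The following is only a minimal sketch: check_dir_var is a hypothetical helper written for this post, not something shipped with Mahout or Hadoop.

```shell
# Hypothetical helper (not part of Mahout or Hadoop): report whether a
# variable holds the path of an existing directory.
# Arguments: $1 = variable name (for the message), $2 = its value.
check_dir_var() {
  name="$1"
  value="$2"
  if [ -z "$value" ]; then
    echo "$name is not set"
    return 1
  fi
  if [ ! -d "$value" ]; then
    echo "$name points to a missing directory: $value"
    return 1
  fi
  echo "$name OK: $value"
}

# Usage, after reloading the profile that holds the exports:
#   check_dir_var MAHOUT_HOME "${MAHOUT_HOME:-}"
#   check_dir_var MAHOUT_CONF_DIR "${MAHOUT_CONF_DIR:-}"
#   check_dir_var HADOOP_HOME "${HADOOP_HOME:-}"
#   check_dir_var HADOOP_CONF_DIR "${HADOOP_CONF_DIR:-}"
```

The function returns a non-zero status on failure, so it can also be chained with && in a longer setup script.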
4. Verify that Mahout is installed successfully
Run the mahout command. If it lists a set of algorithms, the installation succeeded.
5. Getting started with Mahout
5.1 Start Hadoop
5.2 Download the test data
Download synthetic_control.data from the link at http://archive.ics.uci.edu/ml/databases/synthetic_control/
5.3 Upload the test data
hadoop fs -put synthetic_control.data /user/root/testdata
5.4 Run the K-means clustering algorithm in Mahout
Execute the command:
mahout -core org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
Clustering takes about 9 minutes to complete.
5.5 View the clustering results
Run hadoop fs -ls /user/root/output to view the clustering results.
Live, work, and keep learning Mahout ...
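One optional addition to steps 5.2 and 5.3: before uploading, it may be worth confirming that the download is complete and not, say, an HTML error page. The sketch below counts the rows and the number of whitespace-separated values per row. summarize_data is a hypothetical helper name, and the expected shape (600 rows of 60 values each) comes from the UCI description of the dataset, so treat it as an assumption to check against that page.

```shell
# Hypothetical helper: print the number of non-empty rows in a
# whitespace-separated data file, plus the smallest and largest
# number of fields found on any row.
summarize_data() {
  awk '
    NF > 0 {
      rows++
      if (min == "" || NF < min) min = NF
      if (NF > max) max = NF
    }
    END { printf "rows=%d min_cols=%d max_cols=%d\n", rows, min, max }
  ' "$1"
}

# Usage, in the directory holding the downloaded file:
#   summarize_data synthetic_control.data
```

If min_cols and max_cols differ, or the row count is far from what the dataset page states, downloading the file again is probably quicker than debugging a failed clustering job.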