Preliminary understanding of Mahout

Last Update:2018-07-26 Source: Internet

Author: User

The Apache Mahout project consists of the following five parts:
Frequent pattern mining: mines itemsets that occur frequently in the data.
Clustering: divides data, such as text documents, into locally related groups.
Classification: trains a classifier on documents that are already classified, then uses it to classify unclassified documents.
Recommendation engine (collaborative filtering): collects user behavior and discovers items that the user might like.
Frequent itemset mining: uses a set of items (query records or shopping baskets) to identify items that often appear together.
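
The idea of identifying items that often appear together can be illustrated with a short plain-Java sketch. It does not use Mahout or its Parallel FP-Growth implementation; the class FrequentPairs, its methods, and the sample baskets are hypothetical, and the code simply counts co-occurring pairs by brute force.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Plain-Java sketch of frequent pair counting; NOT Mahout's Parallel FP-Growth. */
final class FrequentPairs {

    /** Counts how many transactions contain each unordered pair of items. */
    static Map<String, Integer> countPairs(List<List<String>> transactions) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> transaction : transactions) {
            // Sort and de-duplicate so each pair gets one canonical key per transaction.
            List<String> items = new ArrayList<>(new TreeSet<>(transaction));
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    counts.merge(items.get(i) + "," + items.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Keeps only the pairs that occur in at least minSupport transactions. */
    static Map<String, Integer> frequentPairs(List<List<String>> transactions, int minSupport) {
        Map<String, Integer> frequent = new HashMap<>();
        countPairs(transactions).forEach((pair, count) -> {
            if (count >= minSupport) {
                frequent.put(pair, count);
            }
        });
        return frequent;
    }

    public static void main(String[] args) {
        // Hypothetical shopping baskets, used only for illustration.
        List<List<String>> baskets = List.of(
                List.of("bread", "milk"),
                List.of("bread", "milk", "eggs"),
                List.of("milk", "eggs"));
        System.out.println(frequentPairs(baskets, 2));
    }
}
```

Brute-force pair counting grows quadratically with basket size, which is one reason a distributed algorithm such as FP-Growth is preferred for large data sets.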

Machine learning algorithms implemented in Mahout:

Algorithm class	Algorithm name	Description
Classification	Logistic Regression	Logistic regression
	Bayesian	Bayesian
	SVM	Support vector machine
	Perceptron	Perceptron algorithm
	Neural Network	Neural network
	Random Forests	Random forest
	Restricted Boltzmann Machines	Restricted Boltzmann machine
Clustering	Canopy Clustering	Canopy clustering
	K-means Clustering	K-means clustering
	Fuzzy K-means	Fuzzy k-means clustering
	Expectation Maximization	EM clustering (expectation-maximization clustering)
	Mean Shift Clustering	Mean shift clustering
	Hierarchical Clustering	Hierarchical clustering
	Dirichlet Process Clustering	Dirichlet process clustering
	Latent Dirichlet Allocation	LDA clustering
	Spectral Clustering	Spectral clustering
Association rule mining	Parallel FP Growth Algorithm	Parallel FP-Growth algorithm
Regression	Locally Weighted Linear Regression	Locally weighted linear regression
Dimensionality reduction	Singular Value Decomposition	Singular value decomposition
	Principal Components Analysis	Principal component analysis
	Independent Component Analysis	Independent component analysis
	Gaussian Discriminative Analysis	Gaussian discriminant analysis
Evolutionary algorithms	Parallelization of the Watchmaker framework
Recommendation/collaborative filtering	Non-distributed recommenders	Taste (UserCF, ItemCF, SlopeOne)
	Distributed recommenders	ItemCF
Vector similarity calculation	RowSimilarityJob	Calculates the similarity between rows
	VectorDistanceJob	Calculates the distance between vectors
Non-MapReduce algorithms	Hidden Markov Models	Hidden Markov model
Collections extension	Collections	Extends Java's collections classes
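
To make the vector similarity entries in the table more concrete, here is a minimal plain-Java sketch of pairwise cosine similarity between the rows of a small matrix. It is not Mahout's RowSimilarityJob and does not call any Mahout API; the class RowSimilaritySketch, the choice of cosine similarity, and the dense array representation are assumptions made for illustration.

```java
/** Plain-Java sketch of pairwise cosine similarity between matrix rows; NOT a Mahout API. */
final class RowSimilaritySketch {

    /** Cosine similarity of two equal-length vectors; returns 0 if either has zero length. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Builds the symmetric matrix of similarities between every pair of rows. */
    static double[][] pairwise(double[][] rows) {
        int n = rows.length;
        double[][] similarity = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double s = cosine(rows[i], rows[j]);
                similarity[i][j] = s;
                similarity[j][i] = s;
            }
        }
        return similarity;
    }
}
```

The in-memory double loop is only practical for small matrices; distributing this pairwise computation is the kind of work a MapReduce job is suited to.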

Mahout can run in local mode, and it can also run its jobs as MapReduce (MR) jobs on Hadoop.

The Mahout API is divided into the following sections:

org.apache.mahout.cf.taste: Taste APIs for collaborative filtering
org.apache.mahout.clustering: clustering algorithm APIs
org.apache.mahout.classifier: classification algorithm APIs
org.apache.mahout.fpm: frequent pattern mining algorithms
org.apache.mahout.math: mathematical computation algorithms
org.apache.mahout.vectorizer: vector computation algorithms

1. KMeansConfigKeys interface

DISTANCE_MEASURE_KEY: the distance measure used by the k-means clustering algorithm
CLUSTER_CONVERGENCE_KEY: the convergence value of the k-means clustering algorithm
CLUSTER_PATH_KEY: the cluster path of the k-means clustering algorithm
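
As a rough illustration of how such configuration keys are typically used, the sketch below stores and reads the three settings through java.util.Properties, which stands in for a Hadoop Configuration. The interface SketchKMeansConfigKeys, the key strings, and the helper class are hypothetical and are not the values defined by Mahout.

```java
import java.util.Properties;

/** Hypothetical stand-in for an interface of configuration keys; the strings are made up. */
interface SketchKMeansConfigKeys {
    String DISTANCE_MEASURE_KEY = "sketch.kmeans.measure";
    String CLUSTER_CONVERGENCE_KEY = "sketch.kmeans.convergence";
    String CLUSTER_PATH_KEY = "sketch.kmeans.path";
}

/** Writes and reads k-means settings; Properties stands in for a Hadoop Configuration. */
final class KMeansConfigSketch {

    /** Stores the three settings under the shared keys so a later stage can read them. */
    static Properties buildConfig(String measureClass, double convergenceDelta, String clusterPath) {
        Properties conf = new Properties();
        conf.setProperty(SketchKMeansConfigKeys.DISTANCE_MEASURE_KEY, measureClass);
        conf.setProperty(SketchKMeansConfigKeys.CLUSTER_CONVERGENCE_KEY, Double.toString(convergenceDelta));
        conf.setProperty(SketchKMeansConfigKeys.CLUSTER_PATH_KEY, clusterPath);
        return conf;
    }

    /** Reads the convergence value back as a number. */
    static double readConvergence(Properties conf) {
        return Double.parseDouble(conf.getProperty(SketchKMeansConfigKeys.CLUSTER_CONVERGENCE_KEY));
    }
}
```

Keeping the key names in one interface means the code that writes the settings and the code that reads them cannot drift apart.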

2. KCluster class
This class is usually called from the main function. Given a new cluster center and a distance function, it computes the new cluster and determines whether the clustering has converged.

Main functions of the KCluster class:

KCluster(Vector center, int clusterId, DistanceMeasure measure)
: Constructor for a cluster in the k-means clustering algorithm. It creates a new cluster that uses the input point as its center. The parameter measure is used to compare the distance between points, center is the center of the new cluster, and clusterId is the ID of the new cluster.

public static String formatCluster(KCluster cluster)
: Formats the cluster for output.

public boolean computeConvergence(DistanceMeasure measure, double convergenceDelta)
: Computes whether this cluster has converged.
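
The convergence check described above can be sketched in plain Java. This is an illustrative stand-in, not Mahout's cluster class: ClusterSketch, its Distance interface, and the observe and recomputeCenter methods are hypothetical names, and the sketch assumes a cluster counts as converged when the centroid of its observed points lies within convergenceDelta of the current center.

```java
import java.util.Arrays;

/** Plain-Java stand-in for a k-means cluster; NOT Mahout's class. */
final class ClusterSketch {

    /** Hypothetical distance abstraction, playing the role of a distance measure. */
    interface Distance {
        double between(double[] a, double[] b);
    }

    static final Distance EUCLIDEAN = (a, b) -> {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    };

    private final int id;
    private double[] center;
    private double[] pointSum;
    private int pointCount;

    ClusterSketch(double[] center, int id) {
        this.center = center.clone();
        this.id = id;
        this.pointSum = new double[center.length];
    }

    /** Adds a point that was assigned to this cluster in the current iteration. */
    void observe(double[] point) {
        for (int i = 0; i < point.length; i++) {
            pointSum[i] += point[i];
        }
        pointCount++;
    }

    /** Mean of the observed points, or the current center if none were observed. */
    double[] computeCentroid() {
        if (pointCount == 0) {
            return center.clone();
        }
        double[] centroid = new double[center.length];
        for (int i = 0; i < centroid.length; i++) {
            centroid[i] = pointSum[i] / pointCount;
        }
        return centroid;
    }

    /** True when the centroid moved no farther than convergenceDelta from the center. */
    boolean computeConvergence(Distance measure, double convergenceDelta) {
        return measure.between(computeCentroid(), center) <= convergenceDelta;
    }

    /** Moves the center to the centroid and clears the observations for the next pass. */
    void recomputeCenter() {
        center = computeCentroid();
        pointSum = new double[center.length];
        pointCount = 0;
    }

    double[] getCenter() {
        return center.clone();
    }

    static String format(ClusterSketch cluster) {
        return "C-" + cluster.id + Arrays.toString(cluster.center);
    }
}
```

A driver would call observe for each assigned point, check computeConvergence for every cluster, and then call recomputeCenter before the next iteration.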

3. KMeansDriver class
This class is the entry point that performs clustering. It includes functions such as buildClusters, clusterData, run, and main.

Function list:
public static void run(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn,
org.apache.hadoop.fs.Path output, DistanceMeasure measure, double convergenceDelta,
int maxIterations, boolean runClustering, double clusterClassificationThreshold,
boolean runSequential)
throws IOException, InterruptedException, ClassNotFoundException

Meaning of the parameters:
conf: the Hadoop configuration
input: the directory path of the input points
clustersIn: the directory path of the initial and computed clusters
output: the directory path of the output points
measure: the distance measure to use
convergenceDelta: the convergence value
maxIterations: the maximum number of iterations
runClustering: whether to cluster the points after the iterations are complete
clusterClassificationThreshold: the clustering threshold; points that fall below it are not clustered
runSequential: whether to run the sequential algorithm
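
How convergenceDelta and maxIterations interact is easier to see in a small in-memory version of the iteration. The sketch below is plain Java under simplifying assumptions (dense double arrays, Euclidean distance, no Hadoop, no threshold-based outlier removal); KMeansLoopSketch and its methods are hypothetical and do not reproduce KMeansDriver.

```java
/** In-memory k-means loop; a sketch of the iteration only, NOT Mahout's KMeansDriver. */
final class KMeansLoopSketch {

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Index of the center closest to the given point. */
    static int nearest(double[] point, double[][] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (distance(point, centers[c]) < distance(point, centers[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Iterates until no center moves more than convergenceDelta or maxIterations is reached. */
    static double[][] run(double[][] points, double[][] initialCenters,
                          double convergenceDelta, int maxIterations) {
        int k = initialCenters.length;
        int dim = initialCenters[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = initialCenters[c].clone();
        }
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] point : points) {
                int c = nearest(point, centers);
                counts[c]++;
                for (int i = 0; i < dim; i++) {
                    sums[c][i] += point[i];
                }
            }
            boolean converged = true;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous center
                }
                double[] updated = new double[dim];
                for (int i = 0; i < dim; i++) {
                    updated[i] = sums[c][i] / counts[c];
                }
                if (distance(updated, centers[c]) > convergenceDelta) {
                    converged = false;
                }
                centers[c] = updated;
            }
            if (converged) {
                break;
            }
        }
        return centers;
    }

    /** Final assignment step, comparable in spirit to clustering the points after iterating. */
    static int[] assign(double[][] points, double[][] centers) {
        int[] labels = new int[points.length];
        for (int p = 0; p < points.length; p++) {
            labels[p] = nearest(points[p], centers);
        }
        return labels;
    }
}
```

In this sketch the assign step is separate from run, mirroring the way the runClustering flag makes the final clustering of points optional.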

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
