Data Mining Algorithm Description

Last Update: 2016-05-22 Source: Internet

Author: User

1) Data input and output
WOW(): Lists the available parameters of a Weka function.
Weka_control(): Sets the parameters of a Weka function.
read.arff(): Reads data in the Weka Attribute-Relation File Format (ARFF).
write.arff(): Writes data in the Weka Attribute-Relation File Format (ARFF).
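
To make the ARFF format mentioned above concrete, here is a minimal plain-Python sketch that renders a small table as ARFF text. It is not Weka's reader or writer; the function name to_arff and the sample "weather" data are assumptions made for illustration, and the sketch does no quoting or escaping of values.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes is a list of (name, kind) pairs, where kind is either the
    string "numeric" or a list of nominal labels.
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        # Nominal attributes list their labels in braces.
        spec = "{" + ",".join(kind) + "}" if isinstance(kind, list) else kind
        lines.append(f"@attribute {name} {spec}")
    lines += ["", "@data"]
    for row in rows:
        # One comma-separated line per instance.
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical sample data, used only to show the layout.
text = to_arff(
    "weather",
    [("temperature", "numeric"), ("play", ["yes", "no"])],
    [(85, "no"), (70, "yes")],
)
print(text)
```

The header declares the relation and one @attribute line per column; everything after @data is one instance per line.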

2) Data preprocessing
Normalize(): Unsupervised normalization of continuous (numeric) data.
Discretize(): Supervised discretization of continuous numeric data using the MDL (Minimum Description Length) method.
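
As a rough illustration of unsupervised normalization, the sketch below rescales a numeric column to the range [0, 1] (min-max scaling). This is one common form of normalization and an assumption of this example; the exact scaling that Normalize() applies is not stated above, and the MDL-based discretization is not shown. The function name min_max_normalize is hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_normalize([2.0, 4.0, 10.0])
print(scaled)  # [0.0, 0.25, 1.0]
```

The scaling is called unsupervised because it looks only at the attribute values and never at the class labels.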

3) Classification and regression
IBk(): k-nearest-neighbor classification.
LBR(): Lazy Bayesian Rules classification, a lazy-learning variant of naive Bayes.
J48(): C4.5 decision tree algorithm (when choosing a split, the tree evaluates each attribute on its own, independently of the others).
LMT(): Logistic model trees, which combine a tree structure with logistic regression: each leaf node is a logistic regression model. Accuracy is often better than that of a single decision tree or of logistic regression alone.
M5P(): M5 model tree algorithm, which combines a tree structure with linear regression: each leaf node is a linear regression model, so it can be used for regression on continuous data.
DecisionStump(): Single-level decision tree algorithm, often used as the base learner for boosting.
SMO(): Support vector machine classification, trained with sequential minimal optimization.
AdaBoostM1(): AdaBoost M1 method. The -W parameter specifies the algorithm used as the weak learner.
Bagging(): Builds multiple models from samples drawn from the original data with replacement.
LogitBoost(): Boosting in which the weak learner is a regression method that fits real-valued targets.
MultiBoostAB(): An improvement on AdaBoost that can be seen as a combination of AdaBoost and "wagging".
Stacking(): Combines several different base classifiers into one model.
LinearRegression(): Fits a linear regression model.
Logistic(): Fits a logistic regression model.
JRip(): A rule learning method (RIPPER).
M5Rules(): Uses the M5 method to generate decision rules for regression problems.
OneR(): A simple 1-R classifier.
PART(): Generates PART decision rules.
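
The list above notes that a single-level decision tree is often used as the base learner for boosting. The following plain-Python sketch shows that pairing on one-dimensional data: a decision stump as the weak learner inside two-class AdaBoost with labels +1 and -1. It is not Weka's implementation; the function names, the stump form (a threshold plus a polarity), and the toy data are all assumptions made for illustration.

```python
import math


def stump_predict(x, threshold, polarity):
    """Predict `polarity` when x is above the threshold, otherwise the opposite label."""
    return polarity if x > threshold else -polarity


def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the lowest-error stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Fit up to `rounds` weighted stumps; returns a list of (alpha, threshold, polarity)."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        if err >= 0.5:
            break  # the weak learner is no better than chance
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified points, lower the rest, renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score > 0 else -1


# Toy data: no single threshold separates the classes.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```

No single stump can label this data correctly, but the weighted vote of three stumps does, because each round concentrates weight on the points the previous stumps got wrong.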

4) Clustering
Cobweb(): A model-based method that hypothesizes a model for each cluster and finds the data that best fit that model. It is not suitable for clustering large databases.
FarthestFirst(): Fast approximate k-means clustering algorithm.
SimpleKMeans(): k-means clustering algorithm.
XMeans(): Improved k-means method that can determine the number of clusters automatically.
DBScan(): A density-based clustering method that keeps growing a cluster as long as the density around its objects is high enough. It can find clusters of arbitrary shape in a spatial database that contains noise. The method defines a cluster as a maximal set of density-connected points.
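
The k-means idea behind several of the entries above can be sketched in a few lines of plain Python. This is not Weka's implementation. Seeding the centers with a farthest-first traversal is an assumption of this sketch (in the list above, FarthestFirst is a clusterer in its own right), and the function names and sample points are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Pick k centers: the first point, then repeatedly the point
    farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers


def k_means(points, k, iterations=10):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its cluster."""
    centers = farthest_first(points, k)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, clusters = k_means(points, k=2)
print(centers)
print(clusters)
```

On this data the two groups of three points end up in separate clusters. The number of clusters k must be supplied by the caller, which is the limitation that XMeans is described as addressing.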

5) Association rules
Apriori(): Apriori is the most influential basic algorithm in the field of association rules. It is a breadth-first algorithm that scans the database multiple times to find the frequent itemsets, that is, the itemsets whose support is at least the minimum support. Its theoretical basis is the two monotonicity properties of frequent itemsets: every subset of a frequent itemset must be frequent, and every superset of an infrequent itemset must be infrequent. On massive data sets, the time and space cost of the Apriori algorithm is very high.
Tertius(): Tertius algorithm.
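
The level-by-level search and the pruning rule described for Apriori can be sketched as follows. This is a plain-Python illustration, not Weka's implementation; the function name, the use of absolute support counts, and the sample baskets are assumptions of this sketch.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset contained in at least
    min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        # One pass over the data counts every candidate of the current size.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Join frequent itemsets into candidates that are one item larger.
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune: a candidate with any infrequent subset cannot be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, size - 1))]
    return frequent


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
result = apriori(baskets, min_support=3)
for itemset, count in result.items():
    print(sorted(itemset), count)
```

With a minimum support of 3, all four single items are frequent but only the pair {bread, milk} is. No three-item candidate is generated at all, because every one of them would contain an infrequent pair. Each pass of the while loop corresponds to one scan of the database, which is where the cost on large data comes from.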

6) Prediction and evaluation
predict(): Predicts the class of new data from a fitted classification or clustering model.
table(): Cross-tabulates two factor objects, for example predicted against actual classes.
evaluate_Weka_classifier(): Evaluates the performance of a model, reporting statistics such as TP rate, FP rate, precision, recall, and F-measure.
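
The statistics named above follow directly from the counts in a two-by-two table of actual against predicted classes. The plain-Python sketch below computes them for one class treated as "positive". The function name binary_metrics and the sample labels are assumptions made for illustration; this is not the output format of the Weka evaluation function.

```python
def binary_metrics(actual, predicted, positive):
    """Compute TP rate, FP rate, precision, recall and F-measure for one class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # recall is the same as TP rate
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "tp_rate": recall,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical labels: 3 true positives, 1 false negative,
# 1 false positive, 3 true negatives.
actual = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
predicted = ["yes", "no", "yes", "no", "yes", "no", "no", "yes"]
metrics = binary_metrics(actual, predicted, positive="yes")
print(metrics)
```

For these labels precision, recall, and F-measure are all 0.75, and the FP rate is 0.25.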

This article is an English translation of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
