Data Mining Algorithm Description

Source: Internet
Author: User

1) Data input and output
WOW (): Lists the options (parameters) available for a Weka function.
Weka_control (): Sets the options (parameters) of a Weka function.
read.arff (): Reads data in the Weka Attribute-Relation File Format (ARFF).
write.arff (): Writes data to the Weka Attribute-Relation File Format (ARFF).
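RWeka's read.arff () and write.arff () handle ARFF files for you. To show what the format itself looks like, here is a minimal sketch in Python (not RWeka code) with hypothetical helpers write_arff and read_arff. It assumes only numeric and nominal attributes and ignores quoting, sparse rows, dates, and missing values.

```python
# Minimal ARFF writer/reader sketch (stdlib only).
# Assumption: only numeric and nominal attributes; no quoting, sparse rows,
# dates, or missing values ("?").

def write_arff(relation, attributes, rows):
    """attributes: list of (name, type), type is "numeric" or a list of nominal values."""
    lines = [f"@relation {relation}", ""]
    for name, typ in attributes:
        spec = "numeric" if typ == "numeric" else "{" + ",".join(typ) + "}"
        lines.append(f"@attribute {name} {spec}")
    lines += ["", "@data"]
    # Each data line is one instance, values in attribute order.
    lines += [",".join(str(v) for v in row) for row in rows]
    return "\n".join(lines) + "\n"


def read_arff(text):
    """Parse ARFF text back into (relation, attributes, rows)."""
    relation, attributes, rows, in_data = None, [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comment lines
        lower = line.lower()
        if in_data:
            values = line.split(",")
            # Convert numeric attributes to float, keep nominal values as strings.
            rows.append([float(v) if typ == "numeric" else v
                         for v, (_, typ) in zip(values, attributes)])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.lower() in ("numeric", "real", "integer"):
                attributes.append((name, "numeric"))
            else:
                values = [v.strip() for v in spec.strip("{}").split(",")]
                attributes.append((name, values))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows
```

The header declares each attribute once with @attribute, and every line after @data is one comma-separated instance in the same attribute order.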


2) Data preprocessing
Normalize (): Unsupervised normalization of continuous numeric data.
Discretize (): Supervised discretization of continuous numeric data using the MDL (Minimum Description Length) method.
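The two filters can be approximated as follows. This is a Python sketch, not Weka's code: normalize assumes the default [0, 1] target range and maps constant columns to 0 (an assumption), and mdl_cut_points re-implements the Fayyad and Irani entropy/MDL stopping rule on which MDL-based supervised discretization is built. Weka's filter may differ in details such as tie handling; both function names are hypothetical.

```python
import math
from collections import Counter


def normalize(column):
    """Min-max scale a numeric column to [0, 1] (unsupervised: labels unused)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # assumption: constant columns map to 0
    return [(x - lo) / (hi - lo) for x in column]


def entropy(labels):
    """Class entropy in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def mdl_cut_points(values, labels):
    """Recursive entropy-based discretization with the Fayyad-Irani MDL stop rule."""
    pairs = sorted(zip(values, labels))
    cuts = []

    def split(lo, hi):
        sub = pairs[lo:hi]
        n = len(sub)
        ys = [y for _, y in sub]
        ent_s = entropy(ys)
        best = None
        for i in range(1, n):
            if sub[i - 1][0] == sub[i][0]:
                continue  # only cut between distinct values
            left, right = ys[:i], ys[i:]
            e = (i * entropy(left) + (n - i) * entropy(right)) / n
            if best is None or e < best[0]:
                best = (e, i, left, right)
        if best is None:
            return
        e, i, left, right = best
        gain = ent_s - e
        k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
        delta = math.log2(3 ** k - 2) - (k * ent_s - k1 * entropy(left) - k2 * entropy(right))
        # MDL criterion: accept the cut only if the gain pays for encoding it.
        if gain > (math.log2(n - 1) + delta) / n:
            cuts.append((sub[i - 1][0] + sub[i][0]) / 2)
            split(lo, lo + i)
            split(lo + i, hi)

    split(0, len(pairs))
    return sorted(cuts)
```

For labels a, a, a, a, b, b, b, b on values 1 to 4 and 10 to 13, the only accepted cut is 7.0; each pure half is not split further because a zero-gain split never passes the MDL test.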


3) Classification and regression
IBk (): k-nearest-neighbor classifier.
LBR (): Lazy Bayesian Rules classifier, a lazy variant of naive Bayes classification.
J48 (): C4.5 decision tree algorithm (when analyzing each attribute, the decision tree treats it completely independently of the others).
LMT (): Logistic model tree. Combines a tree structure with logistic regression: each leaf node is a logistic regression model, and its accuracy is often better than that of a single decision tree or logistic regression alone.
M5P (): M5 model tree algorithm. Combines a tree structure with linear regression: each leaf node is a linear regression model, so it can be used for regression on continuous data.
DecisionStump (): One-level decision tree algorithm, often used as the base learner for boosting.
SMO (): Support vector machine classification, trained with the sequential minimal optimization (SMO) algorithm.
AdaBoostM1 (): The AdaBoost.M1 boosting method. The -W option specifies the weak learner algorithm.
Bagging (): Builds multiple models by sampling the original data with replacement (bootstrap samples).
LogitBoost (): Boosting via additive logistic regression; the weak learners are regression models that learn real-valued outputs.
MultiBoostAB (): An improvement of AdaBoost that can be viewed as a combination of AdaBoost and "wagging".
Stacking (): Combines different base classifiers into a single ensemble.
LinearRegression (): Fits a linear regression model.
Logistic (): Fits a logistic regression model.
JRip (): A rule learner (Weka's implementation of RIPPER).
M5Rules (): Generates decision rules for regression problems using the M5 method.
OneR (): A simple 1R classifier that builds its rules from a single attribute.
PART (): Generates a PART decision list, built from partial decision trees.
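Two of the simplest classifiers above can be sketched in a few lines of Python to show what they compute. knn_predict follows the idea behind IBk (majority vote among the k nearest training points, Euclidean distance assumed) and one_r follows OneR. Both function names are hypothetical, and one_r assumes nominal attributes only, whereas Weka's OneR also discretizes numeric ones.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """IBk-style k-nearest-neighbour majority vote (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


def one_r(rows, labels):
    """1R: choose the single attribute whose value -> majority-class rule errs least.

    Assumption: all attributes are nominal. Returns (attribute index, rule dict).
    """
    best = None
    for j in range(len(rows[0])):
        # Count classes for each value of attribute j.
        table = {}
        for row, y in zip(rows, labels):
            table.setdefault(row[j], Counter())[y] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in table.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in table.items())
        if best is None or errors < best[0]:
            best = (errors, j, rule)
    return best[1], best[2]
```

Ties are broken by order of appearance in both functions, which is an arbitrary choice for this sketch.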


4) Clustering
Cobweb (): A model-based method that hypothesizes a model for each cluster and finds the data that best fits that model. It is not suitable for clustering large databases.
FarthestFirst (): Fast, approximate k-means clustering algorithm.
SimpleKMeans (): k-means clustering algorithm.
XMeans (): An improved k-means method that can determine the number of clusters automatically.
DBScan (): A density-based clustering method that keeps growing clusters according to the density of the neighborhood around each object. It can find clusters of arbitrary shape in spatial databases that contain noise, and it defines a cluster as a maximal set of density-connected points.
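The k-means and DBSCAN ideas above can be sketched as follows (Python, hypothetical function names, not Weka's implementations). kmeans uses the first k points as initial centroids so the result is deterministic, whereas SimpleKMeans seeds randomly; dbscan counts a point as its own neighbour when comparing against min_pts and labels noise as -1.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's k-means. Assumption: the first k points are the initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[idx].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:
            break  # converged
        centroids = new
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels


def dbscan(points, eps, min_pts):
    """Return a cluster id (0, 1, ...) per point, or -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point: keep growing the cluster
    return labels
```

DBSCAN needs no cluster count up front: the isolated point (50, 50) in a set of two tight groups ends up labeled -1 rather than being forced into a cluster, which is the noise handling the description above refers to.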


5) Association Rules
Apriori (): Apriori is the most influential foundational algorithm for association rule mining. It is a breadth-first algorithm that finds the frequent itemsets (those whose support is no less than the minimum support) by scanning the database multiple times. Its theoretical basis is the two monotonicity properties of frequent itemsets: every subset of a frequent itemset must be frequent, and every superset of an infrequent itemset must be infrequent. On massive data sets, the time and space costs of Apriori are very high.
Tertius (): Tertius algorithm.
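A compact Python sketch of the level-wise Apriori search described above. The function name and the fractional min_support argument are assumptions for illustration, not RWeka's interface. The prune step applies the monotonicity property: a candidate k-itemset is discarded without counting if any of its (k-1)-subsets is infrequent.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets.

    min_support is a fraction of the number of transactions (assumption).
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Scan all transactions and count those containing the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while candidates:
        # Keep candidates that reach the minimum support.
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
    return frequent
```

With transactions {bread, milk}, {bread, butter}, {bread, milk, butter} and {milk} at min_support = 0.5, the candidate {bread, milk, butter} is pruned without a support count because its subset {milk, butter} appears in only one of the four transactions.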


6) Prediction and evaluation
predict (): Predicts the classes (or cluster memberships) of new data from a fitted classification or clustering model.
table (): Cross-tabulates two factor objects, for example predicted against actual class labels.
evaluate_Weka_classifier (): Evaluates model performance, reporting measures such as TP rate, FP rate, precision, recall, and F-measure.
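The per-class measures reported by evaluate_Weka_classifier () follow standard definitions; the Python sketch below computes them for one class from actual and predicted labels. confusion_table and class_metrics are hypothetical names, and returning 0.0 when a denominator is zero is an assumption of this sketch.

```python
from collections import Counter


def confusion_table(actual, predicted):
    """Cross-tabulate (actual, predicted) label pairs, similar in spirit to table()."""
    return Counter(zip(actual, predicted))


def class_metrics(actual, predicted, positive):
    """TP rate, FP rate, precision, recall and F-measure for one class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    tp_rate = tp / (tp + fn) if tp + fn else 0.0    # also called recall
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp_rate
    # F-measure: harmonic mean of precision and recall.
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp_rate": tp_rate, "fp_rate": fp_rate, "precision": precision,
            "recall": recall, "f_measure": f}
```

For example, with 3 true positives, 1 false positive, 1 false negative and 3 true negatives, every measure except the FP rate (0.25) comes out at 0.75.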
