Summary of 18 Classic Data Mining Algorithms

Last Update:2015-06-15 Source: Internet

Author: User

Tags id3 svm


All of the data mining code discussed in this article is available on my GitHub.

Address Link: https://github.com/linyiqun/DataMiningAlgorithm

It took me about two months to study 18 classic data mining algorithms and implement them in code. They cover decision tree classification, clustering, link mining, association mining, pattern mining, and more, so this also serves as a short introduction to the field of data mining. Below is a brief summary of each algorithm, followed by a link to my blog post about it. I hope it helps you learn.

1. C4.5 algorithm. Like ID3, C4.5 is a decision tree classification algorithm, and it is an improvement on ID3. ID3 chooses the splitting attribute by information gain, while C4.5 uses the gain ratio.

Details Link:http://blog.csdn.net/androidlushangderen/article/details/42395865
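To make the difference between the two criteria concrete, here is a minimal Python sketch (not the code from the linked repository). The function names `entropy`, `information_gain`, and `gain_ratio` are my own, and attributes are assumed to be categorical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of values."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction from splitting `labels` by attribute `values` (ID3 criterion)."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """Information gain divided by the split information (C4.5 criterion)."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0
```

For the labels `["y", "y", "n", "n"]`, an ID-like attribute with four distinct values has an information gain of 1.0 but a gain ratio of only 0.5, which shows how the gain ratio penalizes attributes with many values.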

2. CART algorithm. CART stands for Classification and Regression Tree. It builds a binary tree, using the Gini index (a measure similar to entropy) as the splitting criterion, and prunes the decision tree after it has been built. In my own implementation, pruning uses the cost-complexity algorithm.

Details Link:http://blog.csdn.net/androidlushangderen/article/details/42558235
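The Gini index itself is short enough to sketch directly. The helper names below are hypothetical, and the sketch covers only the impurity calculation, not tree building or pruning.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def gini_of_binary_split(left, right):
    """Size-weighted Gini impurity of the two sides of a binary split."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)
```

A node with two classes in equal proportion has a Gini impurity of 0.5, and a split that separates the classes perfectly brings the weighted impurity down to 0.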

3. KNN (k-nearest neighbors) algorithm. Given a set of labeled training data and a new test point, find the k training points nearest to the test point and see which class holds the majority among them; the test point is assigned that class. The neighbors can also be given different weights: nearer points get larger weights and farther points get smaller ones.

Details Link:http://blog.csdn.net/androidlushangderen/article/details/42613011
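A small sketch of both voting schemes, assuming Euclidean distance and inverse-distance weights (one common weighting choice, not necessarily the one used in the linked post). The function name `knn_classify` is hypothetical.

```python
import math
from collections import defaultdict

def knn_classify(train, query, k=3, weighted=False):
    """train: list of (point, label) pairs; returns the predicted label for `query`."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for point, label in neighbors:
        distance = math.dist(point, query)
        # Inverse-distance weight (assumption); the small constant avoids division by zero.
        votes[label] += 1.0 / (distance + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)
```

With two "a" points near the origin and three "b" points far away, a query at (1, 1) with k=5 is labeled "b" by plain majority voting but "a" once the votes are weighted by distance.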

4. Naive Bayes algorithm. Naive Bayes is one of the simpler classification algorithms in the Bayesian family. It relies on Bayes' theorem, which, summed up in one sentence, is a formula for converting between conditional probabilities.

Details Link:http://blog.csdn.net/androidlushangderen/article/details/42680161
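The following is a rough sketch of a categorical Naive Bayes classifier. It assumes the attributes are conditionally independent given the class, and it adds Laplace (add-one) smoothing, which is my own choice and not something stated above. The names `train_naive_bayes` and `nb_predict` are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    domains = defaultdict(set)           # attribute index -> distinct values seen
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(y, i)][v] += 1
            domains[i].add(v)
    return class_counts, value_counts, domains

def nb_predict(model, row):
    """Return the class maximizing P(class) * product of P(value | class)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for y, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, v in enumerate(row):
            # Laplace smoothing (assumption): add one to every value count.
            score *= (value_counts[(y, i)][v] + 1) / (n + len(domains[i]))
        if score > best_score:
            best_label, best_score = y, score
    return best_label
```

Because the denominator P(row) in Bayes' theorem is the same for every class, the sketch compares only the numerators.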

5. SVM (support vector machine) algorithm. The support vector machine is a method for classifying both linear and nonlinear data; nonlinear data can be handled with a kernel function. One of the key steps is searching for the maximum-margin hyperplane.

Details Link:http://blog.csdn.net/androidlushangderen/article/details/42780439
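As a very small illustration of the maximum-margin idea, the sketch below trains a linear soft-margin SVM by sub-gradient descent on the regularized hinge loss. This is a swapped-in, simplified technique chosen for brevity: it has no kernel and is not the solver used in the linked post. The names `train_linear_svm` and `svm_predict` and the hyperparameter defaults are assumptions.

```python
def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=50):
    """Labels must be +1 or -1. Returns the weight vector w and bias b."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin: step toward classifying it correctly.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely outside the margin: only apply regularization.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def svm_predict(w, b, x):
    """Sign of the decision function w.x + b."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```

The regularization term keeps the weights small, which corresponds to keeping the margin wide, while the hinge condition `margin < 1` pushes points out of the margin.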

6. EM (expectation-maximization) algorithm. Each iteration of EM consists of two steps: an E-step (expectation) and an M-step (maximization). It is an algorithmic framework that moves closer to the maximum likelihood or maximum a posteriori estimate of a statistical model's parameters with each iteration.

Details Link:http://blog.csdn.net/androidlushangderen/article/details/42921789
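The two steps can be sketched on a deliberately simple model: a mixture of two 1-D Gaussians with unit variance and equal weights, where only the two means are estimated. These simplifications and the function name `em_two_gaussians` are assumptions made for illustration.

```python
import math

def em_two_gaussians(data, mu, iterations=20):
    """Estimate the means of two unit-variance, equally weighted 1-D Gaussians."""
    mu = list(mu)
    for _ in range(iterations):
        # E-step: responsibility of component 0 for each data point.
        resp = []
        for x in data:
            p0 = math.exp(-0.5 * (x - mu[0]) ** 2)
            p1 = math.exp(-0.5 * (x - mu[1]) ** 2)
            resp.append(p0 / (p0 + p1))
        # M-step: each mean becomes the responsibility-weighted average of the data.
        mu[0] = sum(r * x for r, x in zip(resp, data)) / sum(resp)
        mu[1] = sum((1 - r) * x for r, x in zip(resp, data)) / sum(1 - r for r in resp)
    return mu
```

Starting from the rough guesses 1.0 and 4.0 on data clustered around 0 and 5, the estimated means settle very close to 0 and 5 after a few iterations.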

7. Apriori algorithm. Apriori is an association rule mining algorithm. It mines frequent itemsets through join and prune operations, then derives association rules from the frequent itemsets; a rule is output only if it meets the minimum confidence threshold.

Details Link:http://blog.csdn.net/androidlushangderen/article/details/43059211
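The join and prune steps can be sketched as follows. This is a compact illustration rather than the linked implementation; `min_support` is an absolute count, and the function names are my own.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, from the support counts."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]
```

A rule would then be kept only if `confidence(...)` is at least the chosen minimum confidence.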

8. FP-tree (frequent pattern tree) algorithm. Also known as the FP-growth algorithm, it overcomes Apriori's drawback of generating too many candidate itemsets. It recursively builds frequent pattern trees and mines them for frequent itemsets; the subsequent steps are the same as in Apriori.

Details Link:http://blog.csdn.net/androidlushangderen/article/details/43234309
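Only the tree construction is sketched here; the recursive mining over conditional trees is left out to keep the example short. The class `FPNode` and function `build_fp_tree` are hypothetical names, and ties in item frequency are broken alphabetically as an arbitrary choice.

```python
from collections import Counter

class FPNode:
    """One node of the FP-tree: an item, its count, and its child nodes."""
    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support):
    """Insert each transaction's frequent items, most frequent first, into a prefix tree."""
    freq = Counter(item for t in transactions for item in set(t))
    freq = {item: n for item, n in freq.items() if n >= min_support}
    root = FPNode(None)
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root
```

Because transactions that share frequent items share a path from the root, the tree is a compressed form of the database and no candidate itemsets need to be generated.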

9. PageRank (page importance/rank) algorithm. PageRank originated at Google. Its core idea is to use a page's inbound links as the criterion for judging its quality. If a page contains multiple outbound links, its PR value is divided evenly among them. PageRank is vulnerable to link spam.

Details Link:http://blog.csdn.net/androidlushangderen/article/details/43311943
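A minimal power-iteration sketch of this idea. The damping factor of 0.85 and the handling of pages without outbound links (their rank is spread over all pages) are conventional choices that I am assuming here.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]}; returns {page: rank}, with ranks summing to 1."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            # A page without outbound links spreads its rank over all pages (assumption).
            targets = links.get(p) or pages
            share = damping * rank[p] / len(targets)  # PR value divided evenly
            for q in targets:
                new_rank[q] += share
        rank = new_rank
    return rank
```

For the graph A -> B, A -> C, B -> C, C -> A, page C ends up with the highest rank because it receives links from both A and B.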

10. HITS algorithm. HITS is another link analysis algorithm, and part of its principle is similar to PageRank. HITS introduces the concepts of an authority value and a hub value. Its results depend on the user's query, so it is generally used for link analysis on small data sets, and it is also more vulnerable to attack.

Details Link:http://blog.csdn.net/androidlushangderen/article/details/43311943
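The mutual reinforcement between hub and authority values can be sketched like this. The function name `hits`, the fixed iteration count, and the Euclidean normalization are assumptions; the query-dependent selection of the page subset is not shown.

```python
import math

def _normalize(scores):
    """Scale the scores so that their Euclidean norm is 1."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {p: v / norm for p, v in scores.items()}

def hits(links, iterations=20):
    """links: {page: [pages it links to]}; returns (hub, authority) score dicts."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority value: sum of the hub values of the pages linking to p.
        auth = _normalize({p: sum(hub[q] for q in pages if p in links.get(q, ()))
                           for p in pages})
        # Hub value: sum of the authority values of the pages p links to.
        hub = _normalize({p: sum(auth[q] for q in links.get(p, ())) for p in pages})
    return hub, auth
```

In a graph where A and B both link to C, C becomes the only authority while A and B share the hub score.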

11. K-means algorithm. K-means is a clustering algorithm, where K is the number of clusters, so choosing K at the start is critical. The algorithm first picks K initial cluster centers, assigns each point to the nearest center by Euclidean distance, takes the mean of each cluster as its new center, and repeats until convergence.

Details Link:http://blog.csdn.net/androidlushangderen/article/details/43373159
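The loop described above fits in a few lines. In this sketch the first K points are used as the initial centers, which is an assumption made to keep the example deterministic; real implementations usually choose the initial centers more carefully.

```python
import math

def kmeans(points, k, iterations=100):
    """Return (centers, clusters) after assigning points and updating means until stable."""
    centers = [tuple(p) for p in points[:k]]  # assumption: first k points as initial centers
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # The new center is the mean of each cluster; an empty cluster keeps its old center.
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```

On six points grouped around (0, 0) and (10, 10), the centers converge to the two group means after two passes.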

12. BIRCH algorithm. The core of BIRCH is the construction of a clustering feature (CF) tree. BIRCH scans the database and builds an initial CF-tree in memory, which can be regarded as a multi-level compression of the data.

Details Link:http://blog.csdn.net/androidlushangderen/article/details/43532111
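The compression comes from the clustering feature itself, which summarizes a group of points by three numbers. The sketch below shows only that summary, not the tree; the class name `ClusteringFeature` and its method names are hypothetical.

```python
import math

class ClusteringFeature:
    """CF = (N, LS, SS): point count, per-dimension linear sum, sum of squared norms."""
    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim
        self.ss = 0.0

    def add_point(self, point):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

    def merge(self, other):
        """CF vectors are additive, so merging two sub-clusters is a component-wise sum."""
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss += other.ss

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        """Root-mean-square distance of the summarized points from the centroid."""
        c2 = sum(x * x for x in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))
```

Because the centroid and radius can be computed from the summary alone, the original points do not need to be kept in memory.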

13. AdaBoost algorithm. AdaBoost is a boosting algorithm. It trains on the data multiple times to obtain several complementary classifiers, then combines them into a more accurate classifier.

Details Link:http://blog.csdn.net/androidlushangderen/article/details/43635115
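What makes the classifiers complementary is the reweighting of the samples after each round. The sketch below shows a single round of the standard two-class AdaBoost update, with labels and predictions encoded as +1 and -1; the function name `adaboost_round` is hypothetical, and training the weak classifier itself is not shown.

```python
import math

def adaboost_round(weights, predictions, labels):
    """One boosting round: returns (alpha, new normalized sample weights)."""
    # Weighted error of the current weak classifier; assumes 0 < error < 1.
    error = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    # The classifier's vote in the final combination grows as its error shrinks.
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Misclassified samples (p * y == -1) gain weight; correct ones lose weight.
    updated = [w * math.exp(-alpha * p * y) for w, p, y in zip(weights, predictions, labels)]
    total = sum(updated)
    return alpha, [w / total for w in updated]
```

With four equally weighted samples and one mistake, the misclassified sample ends up carrying half of the total weight, so the next classifier is pushed to get it right.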

14. GSP algorithm. GSP is a sequential pattern mining algorithm. It is also an Apriori-like algorithm: it performs join and prune operations as well, but the pruning step adds conditions such as time constraints.

Details Link:http://blog.csdn.net/androidlushangderen/article/details/43699083
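A heavily simplified sketch of the level-wise idea, assuming every element of a sequence is a single item. Time constraints and the subsequence-based prune step are omitted, so this only shows the join and the support counting; the function names are my own.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` appear in `sequence` in the same order."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)

def gsp(sequences, min_support):
    """Return {pattern tuple: support count} for all frequent sequential patterns."""
    items = sorted({item for s in sequences for item in s})
    candidates = [(item,) for item in items]
    frequent = {}
    while candidates:
        level = {}
        for c in candidates:
            support = sum(1 for s in sequences if is_subsequence(c, s))
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join: extend a with the last item of b when a minus its first item
        # equals b minus its last item.
        candidates = [a + (b[-1],) for a in level for b in level if a[1:] == b[:-1]]
    return frequent
```

As in Apriori, each level's candidates are generated only from the frequent patterns of the previous level.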

15. PrefixSpan algorithm. PrefixSpan is another sequential pattern mining algorithm. It does not generate candidate sets. Starting from an initial prefix pattern, it repeatedly moves elements from the suffix sequences into the prefix and keeps mining recursively.

Details Link:http://blog.csdn.net/androidlushangderen/article/details/43766253
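The prefix-growing recursion can be sketched as follows, again under the simplifying assumption that every element of a sequence is a single item. The function name `prefixspan` is hypothetical.

```python
def prefixspan(sequences, min_support, prefix=()):
    """Return {pattern tuple: support count}, growing `prefix` one item at a time."""
    counts = {}
    for seq in sequences:
        for item in set(seq):
            counts[item] = counts.get(item, 0) + 1
    patterns = {}
    for item, support in counts.items():
        if support < min_support:
            continue
        new_prefix = prefix + (item,)
        patterns[new_prefix] = support
        # Projected database: the suffix after the first occurrence of `item`.
        projected = [seq[seq.index(item) + 1:] for seq in sequences if item in seq]
        patterns.update(prefixspan(projected, min_support, new_prefix))
    return patterns
```

Each recursive call works on a smaller projected database, which is why no candidate patterns have to be generated and tested.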

16. CBA (classification based on association rules) algorithm. CBA is an integrated mining algorithm: it builds on association rule mining and performs classification using the association rules that have been mined. The only extra work is at the start of the algorithm, where the data is preprocessed into a transaction-like form.

Details Link:http://blog.csdn.net/androidlushangderen/article/details/43818787
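Once the rules have been mined, classification can be as simple as taking the best rule that matches a record. The sketch below assumes rules are ranked by confidence and then support, and that a default label is used when nothing matches; the rule format and the name `cba_classify` are hypothetical.

```python
def cba_classify(rules, default_label, record):
    """rules: list of (conditions dict, label, confidence, support) tuples."""
    ranked = sorted(rules, key=lambda rule: (-rule[2], -rule[3]))
    for conditions, label, _confidence, _support in ranked:
        # A rule matches when every attribute condition holds for the record.
        if all(record.get(attr) == value for attr, value in conditions.items()):
            return label
    return default_label
```

The rules themselves would come from an association rule miner such as Apriori, restricted to rules whose consequent is a class label.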

17. RoughSets (rough set) algorithm. Rough set theory is a relatively new idea in data mining. Here, rough sets are used for attribute reduction: the upper and lower approximation sets are used to delete redundant attributes before producing the output.

Details Link:http://blog.csdn.net/androidlushangderen/article/details/43876001
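The upper and lower approximations can be computed directly from the equivalence classes induced by a set of attributes. This sketch covers only the approximations, not the full reduction procedure; the function name `approximations` and the data layout are assumptions.

```python
def approximations(objects, attributes, target):
    """objects: {name: {attribute: value}}; target: set of object names.
    Returns the (lower, upper) approximations of `target` under `attributes`."""
    classes = {}
    for name, row in objects.items():
        # Objects with identical values on `attributes` are indistinguishable.
        key = tuple(row[a] for a in attributes)
        classes.setdefault(key, set()).add(name)
    lower, upper = set(), set()
    for members in classes.values():
        if members <= target:   # class lies entirely inside the target set
            lower |= members
        if members & target:    # class overlaps the target set
            upper |= members
    return lower, upper
```

If dropping an attribute leaves the approximations unchanged, that attribute is a candidate for removal; if they change, the attribute carries information and should be kept.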

18. gSpan algorithm. gSpan belongs to the field of graph mining and is mainly used for mining frequent subgraphs. Compared with other graph algorithms, subgraph mining is their prerequisite or foundation. gSpan uses concepts such as DFS codes, edge five-tuples, and rightmost-path subgraph extension, which makes the algorithm rather abstract and complex.

Details Link:http://blog.csdn.net/androidlushangderen/article/details/43924273
