# Classification Algorithm Summary

Source: Internet
Author: User
Tags: svm

This is a good, comprehensive summary of classification algorithms that I came across.
2.4.1 Main Classification Methods
There are many methods for solving classification problems [40-42]. The main single classification methods are decision trees, Bayesian methods, artificial neural networks, K-nearest neighbor, support vector machines, and classification based on association rules. In addition, ensemble learning algorithms such as bagging and boosting combine single classification methods.
(1) Decision Tree
Decision trees are one of the main techniques used for classification and prediction. Decision tree learning is an instance-based inductive learning algorithm that extracts classification rules, represented as a decision tree, from a set of unordered instances. The purpose of constructing a decision tree is to identify the relationship between attributes and classes and to use it to predict the class of future records whose class is unknown. The tree is traversed top-down and recursively: attribute values are compared at the internal nodes, the branch to follow from each node is chosen according to the attribute value, and the conclusion is reached at a leaf node.
The main decision tree algorithms include ID3, C4.5 (C5.0), CART, PUBLIC, SLIQ, and SPRINT. They differ in the technique used to select the test attribute, the structure of the generated tree, the pruning method and when it is applied, and their ability to process large datasets.
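
One common way to select the test attribute, used by ID3, is information gain. The following is a minimal sketch of that criterion only, not a full tree builder; the toy records, attribute names, and function names are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the records on one attribute."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Weighted entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """Pick the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

chosen = best_attribute(rows, labels, ["outlook", "windy"])
```

A full algorithm would apply `best_attribute` recursively to each subset until the leaves are pure or a stopping rule is met. C4.5 and CART use other criteria (gain ratio and the Gini index, respectively).
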
(2) Bayes
Bayesian classification algorithms are a family of algorithms that use probability and statistics for classification; naive Bayes is one example. These algorithms use Bayes' theorem to estimate the probability that a sample of unknown class belongs to each class, and select the class with the highest probability as the final class of the sample. Naive Bayes itself relies on a strong conditional independence assumption, which often does not hold in practice, so its classification accuracy can suffer. For this reason, many Bayesian classification algorithms that relax the independence assumption have emerged, such as TAN (Tree Augmented Naive Bayes), which adds associations between attribute pairs on top of the Bayesian network structure.
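
The idea of picking the class with the highest posterior probability can be sketched with a small categorical naive Bayes classifier. This is an illustrative sketch under stated assumptions: the toy data and function names are made up, and add-one (Laplace) smoothing is an extra choice not mentioned in the text above.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict_naive_bayes(model, row):
    """Return the class maximizing P(class) * product of P(value | class)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, value in enumerate(row):
            # Add-one smoothing avoids zero probabilities for unseen values.
            score *= (value_counts[(cls, i)][value] + 1) / (n + len(values_seen[i]))
        scores[cls] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: (outlook, temperature) -> whether to play.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_naive_bayes(rows, labels)
```

The product over attributes is exactly where the conditional independence assumption enters: each attribute contributes its own factor, regardless of the values of the other attributes.
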
(3) Artificial Neural Networks
An artificial neural network (ANN) is a mathematical model that processes information using a structure similar to the synaptic connections of neurons in the brain. In this model, a large number of nodes (also called "neurons" or "units") are connected to each other to form a network, that is, a "neural network", which processes information. Neural networks usually need to be trained, and training is the process by which the network learns. Training adjusts the connection weights between nodes so that the network can perform classification, and the trained network can then be used to recognize objects.
At present there are hundreds of different neural network models, including BP networks, radial basis function (RBF) networks, local neural networks, random neural networks (Boltzmann machines), and competitive neural networks (Hamming networks, self-organizing map networks). However, neural networks still have drawbacks such as slow convergence, heavy computation, long training time, and lack of interpretability.
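
To make "training adjusts the connection weights" concrete, here is a minimal sketch of a single perceptron, the simplest neural unit, learning the logical AND function. It is not one of the network types listed above; the learning rate, epoch count, and function names are assumptions chosen for illustration.

```python
def activate(weights, bias, x):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(samples, targets, lr=1.0, epochs=50):
    """Perceptron rule: shift weights toward the target whenever the output is wrong."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            error = target - activate(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Logical AND is linearly separable, so the perceptron rule converges on it.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, targets)
outputs = [activate(weights, bias, x) for x in samples]
```

Multi-layer networks such as BP networks replace the step function with a differentiable activation and propagate the error backward through the layers, but the principle of adjusting weights from the output error is the same.
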
(4) K-Nearest Neighbor
The K-nearest neighbor (KNN) algorithm is an instance-based classification method. It finds the K training samples closest to the unknown sample X, checks which class most of those K samples belong to, and assigns X to that class. KNN is a lazy learning method: it only stores the samples and does no work until a classification is requested. If the sample set is large or complex, this can lead to high computational overhead, so KNN is not suitable for scenarios with strict real-time requirements.
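
The procedure above can be sketched in a few lines. The example assumes Euclidean distance and a simple majority vote, which the text does not specify; the toy points and the function name are made up for illustration.

```python
from collections import Counter
from math import dist  # Euclidean distance, available since Python 3.8


def knn_classify(train_points, train_labels, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    # Lazy learning: all the work happens here, at classification time.
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], x),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_labels = ["a", "a", "a", "b", "b", "b"]

near_a = knn_classify(train_points, train_labels, (2, 2))
near_b = knn_classify(train_points, train_labels, (6, 5))
```

Every call sorts the whole training set by distance, which illustrates why the cost grows with the size of the sample set.
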
(5) Support Vector Machine
The support vector machine (SVM) is a learning method proposed by Vapnik on the basis of statistical learning theory [43]. Its most distinctive feature is that, following the structural risk minimization principle, it constructs the optimal separating hyperplane with the maximum margin in order to improve the generalization ability of the learning machine. It handles nonlinearity, high dimensionality, and local minima well. For classification problems, the SVM algorithm computes the decision surface of a region from the samples in that region and uses it to determine the class of unknown samples in the region.
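
As a worked formula for the maximum-margin idea, the standard hard-margin formulation for linearly separable data is shown below. The notation is assumed here and does not come from the source text: training samples $\mathbf{x}_i$ with labels $y_i \in \{-1, +1\}$, weight vector $\mathbf{w}$, and offset $b$.

```latex
\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, n

f(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^{\top}\mathbf{x} + b)
```

The distance between the two supporting hyperplanes $\mathbf{w}^{\top}\mathbf{x} + b = \pm 1$ is $2 / \lVert \mathbf{w} \rVert$, so minimizing $\lVert \mathbf{w} \rVert$ maximizes the margin. Because the problem is a convex quadratic program, it has no spurious local minima.
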
(6) Classification Based on Association Rules
Association rule mining is an important research area in data mining. In recent years, researchers have extensively studied how to apply association rule mining to classification. Associative classification mines rules of the form condset → C, where condset is a set of items (or attribute-value pairs) and C is a class label; such rules are called class association rules (CARs). Associative classification generally consists of two steps: first, an association rule mining algorithm extracts from the training data all class association rules that satisfy the specified support and confidence thresholds; second, a heuristic method selects a set of high-quality rules from the extracted rules for use in classification. Associative classification algorithms include CBA [44], ADT [45], and CMAR [46].
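
The support and confidence of a single class association rule can be computed as in the following sketch. It only evaluates one given rule and does not mine rules the way CBA or CMAR do; the record layout, toy data, and function name are assumptions for illustration.

```python
def rule_support_confidence(records, condset, label):
    """Support and confidence of the rule condset -> label.

    records: list of (attribute dict, class label) pairs
    condset: dict of attribute-value pairs that must all match
    """
    # Records whose attributes contain every pair in condset.
    matches_cond = [r for r in records if condset.items() <= r[0].items()]
    # Of those, the records that also carry the rule's class label.
    matches_rule = [r for r in matches_cond if r[1] == label]
    support = len(matches_rule) / len(records)
    confidence = len(matches_rule) / len(matches_cond) if matches_cond else 0.0
    return support, confidence


# Hypothetical toy training data.
records = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
]

support, confidence = rule_support_confidence(records, {"outlook": "sunny"}, "play")
```

Here three records match the condset and two of them are labeled "play", so the rule has support 2/4 and confidence 2/3. A rule would be kept in the first step only if both values reach the chosen thresholds.
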
(7) Ensemble Learning
The complexity of real applications and the diversity of data often make a single classification method ineffective. Researchers have therefore extensively studied the combination of multiple classification methods, that is, ensemble learning. Ensemble learning has become a hot topic in the international machine learning community and has been called one of the four main research directions of machine learning.
Ensemble learning is a machine learning paradigm. It repeatedly invokes a single learning algorithm to obtain different base learners and then combines these learners according to some rule to solve the same problem, which can significantly improve the generalization ability of the learning system. Base learners are mainly combined by (weighted) voting. Common algorithms include bagging [47] and boosting [48, 49].
Figure 2-5 illustrates classifier ensemble learning. Because multiple classifiers are combined by voting or averaging, ensemble learning can reduce the error of any single classifier and obtain a more accurate representation of the problem space, thus improving classification accuracy.
Figure 2-5: Classifier ensemble learning
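
The weighted voting step, together with the bootstrap resampling that bagging uses to obtain different base learners, can be sketched as follows. The three threshold classifiers are hypothetical stand-ins for trained base learners, and all names are assumptions for illustration.

```python
import random
from collections import defaultdict


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement, as bagging does per base learner."""
    return [rng.choice(data) for _ in data]


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight among the predictions."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical base classifiers: each uses a different decision threshold.
classifiers = [
    lambda x: "pos" if x > 2 else "neg",
    lambda x: "pos" if x > 5 else "neg",
    lambda x: "pos" if x > 8 else "neg",
]


def ensemble_predict(x, weights=(1.0, 1.0, 1.0)):
    """Combine the base classifiers' predictions by weighted voting."""
    return weighted_vote([clf(x) for clf in classifiers], weights)


sample = bootstrap_sample([1, 2, 3, 4, 5], random.Random(0))
```

With equal weights this is plain majority voting, as in bagging. Boosting instead gives larger weights to the more accurate base learners, so a strongly weighted learner can outvote the others.
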
The sections above briefly introduce the main classification methods, each of which has its own characteristics, advantages, and disadvantages. Which method should be chosen for the automatic identification of database workloads? The criteria used to compare and evaluate classification methods [50] mainly include: (1) prediction accuracy: the ability of the model to correctly predict the class labels of new samples; (2) computation speed: the time needed to construct the model and to use it for classification; (3) robustness: the ability of the model to make correct predictions on data with noise or missing values; (4) scalability: the ability to construct the model efficiently for large datasets; (5) simplicity and interpretability of the model description: the more concise and easy to understand the model description is, the more readily it is accepted.
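
The first two criteria are straightforward to measure. The sketch below computes accuracy and wall-clock classification time for a hypothetical rule-based classifier; the classifier, the labels, and the helper names are assumptions made for illustration.

```python
import time


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def parity_classifier(x):
    """Hypothetical classifier used only to have something to evaluate."""
    return "even" if x % 2 == 0 else "odd"


samples = [1, 2, 3, 4]
# The last true label deliberately disagrees with the classifier's rule.
true_labels = ["odd", "even", "odd", "odd"]

predicted, elapsed = timed(lambda: [parity_classifier(x) for x in samples])
score = accuracy(true_labels, predicted)
```

In practice accuracy should be measured on samples that were not used for training, and the time to construct the model should be recorded separately from the time to classify.
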
Reposted from http://hi.baidu.com/gf271828/blog/item/38df3df172e150c10b46e06d.html
