Last Update:2018-12-06 Source: Internet

Author: User

Main Classification Methods

Many methods have been proposed to solve the classification problem [40-42]. Single classification methods mainly include decision trees, Bayesian methods, artificial neural networks, K-nearest neighbors, support vector machines, and classification based on association rules. In addition, there are ensemble learning algorithms that combine single classification methods, such as bagging and boosting.

(1) Decision Tree

Decision trees are one of the main techniques used for classification and prediction. Decision tree learning is an instance-based inductive learning algorithm that aims to infer classification rules, represented as a decision tree, from a group of unordered instances. The purpose of constructing a decision tree is to identify the relationship between attributes and categories and to use it to predict the categories of records whose category is not yet known. The method works top-down and recursively: attribute values are compared at the internal nodes of the tree, the branch to follow from each node is chosen according to the attribute value, and the conclusion is reached at a leaf node.

The main decision tree algorithms include ID3, C4.5 (C5.0), CART, PUBLIC, SLIQ, and SPRINT. They differ in the technique used to select test attributes, the structure of the generated decision tree, the pruning method and the point at which pruning is applied, and whether they can process large datasets.
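As an illustration of how a test attribute can be selected, the following minimal sketch computes information gain, the criterion generally associated with ID3. The toy dataset and the function names `entropy` and `information_gain` are assumptions made for this example and do not come from the original article.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attr_index):
    """Entropy reduction obtained by splitting on the attribute at attr_index."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


# Hypothetical toy data: each row is (outlook, humidity).
rows = [("sunny", "high"), ("sunny", "low"), ("rainy", "high"), ("rainy", "low")]
labels = ["no", "no", "yes", "yes"]

gains = [information_gain(rows, labels, i) for i in range(2)]
best_attr = max(range(2), key=lambda i: gains[i])
print(gains, best_attr)
```

In this toy data the first attribute separates the classes perfectly, so it has the larger gain and would be chosen as the root test. Other algorithms in the list above use different criteria, for example gain ratio or the Gini index.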

(2) Bayes

Bayesian classification algorithms are a family of algorithms that use probability and statistics for classification, such as naive Bayes. These algorithms use Bayes' theorem to estimate the probability that an unknown sample belongs to each category, and the category with the highest probability is selected as the final category of the sample. Naive Bayes relies on a strong assumption of conditional independence between attributes. This assumption often does not hold in practice, so classification accuracy can decrease. Therefore, many Bayesian classification algorithms have emerged that relax the independence assumption, such as the Tree Augmented Naive Bayes (TAN) algorithm, which adds associations between attribute pairs on top of the Bayesian network structure.
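The naive Bayes idea can be sketched in a few lines. The example below is a minimal version for categorical attributes; the add-one (Laplace) smoothing, the toy data, and the function names are assumptions made for illustration and are not taken from the original article.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attr_index, class) -> value counts
    values_per_attr = defaultdict(set)    # attr_index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_per_attr[i].add(value)
    return class_counts, value_counts, values_per_attr


def predict_naive_bayes(model, row):
    """Return the class with the highest prior times product of likelihoods."""
    class_counts, value_counts, values_per_attr = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior P(class)
        for i, value in enumerate(row):
            # P(value | class) with add-one smoothing (an assumption of this sketch)
            score *= (value_counts[(i, cls)][value] + 1) / (n_cls + len(values_per_attr[i]))
        scores[cls] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: each row is (outlook, temperature).
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["stay", "stay", "go", "go"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("sunny", "hot")))
```

Multiplying the per-attribute likelihoods is exactly where the conditional independence assumption enters; methods such as TAN replace this product with one that conditions some attributes on others.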

(3) Artificial Neural Networks

An artificial neural network (ANN) is a mathematical model that processes information using a structure similar to the neural connections of the brain. In this model, a large number of nodes (also called "neurons" or "units") are connected to each other to form a network, the "neural network", which processes information. Neural networks usually need to be trained, and training is the process by which the network learns. Training adjusts the connection weights between network nodes so that the network can perform classification. The trained network can then be used to recognize objects.

At present, there are hundreds of different neural network models, including back-propagation (BP) networks, radial basis function (RBF) networks, local neural networks, random neural networks (Boltzmann machines), and competitive neural networks (Hamming networks, self-organizing networks). However, neural networks still have disadvantages such as slow convergence, heavy computation, long training time, and lack of interpretability.
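To make "training changes the connection weights" concrete, here is a minimal sketch of a single perceptron, the simplest unit of this kind, learning the logical AND function. It is not one of the network types listed above; the update rule, the integer learning rate, and the function names are assumptions chosen to keep the example small.

```python
def predict(weights, bias, x):
    """Fire (1) if the weighted sum of inputs plus bias is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, targets, epochs=20, learning_rate=1):
    """Perceptron rule: move each weight in proportion to the prediction error."""
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            error = target - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Logical AND: linearly separable, so the perceptron rule converges.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, targets)
predictions = [predict(weights, bias, x) for x in samples]
print(weights, bias, predictions)
```

Multi-layer networks such as BP networks follow the same principle of error-driven weight updates, but propagate the error backward through several layers.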

(4) K-Nearest Neighbor

The K-nearest neighbor (KNN) algorithm is an instance-based classification method. It finds the K training samples closest to the unknown sample X, determines which category most of these K samples belong to, and assigns X to that category. KNN is a lazy learning method: it simply stores the samples and does no work until a classification is requested. If the sample set is complex, this can cause high computing overhead, so the method is poorly suited to scenarios with strict real-time requirements.
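The procedure described above can be sketched as follows. Euclidean distance, the default `k=3`, and the toy points are assumptions of this example; the article does not specify a distance measure.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    distances = sorted(
        (math.dist(point, x), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_labels = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(train_points, train_labels, (2, 2)))
print(knn_classify(train_points, train_labels, (7, 9)))
```

Note that every call computes the distance to all stored samples, which is the source of the overhead mentioned above.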

(5) Support Vector Machine

The support vector machine (SVM) is a learning method proposed by Vapnik on the basis of statistical learning theory [43]. Its most distinctive feature is that, following the principle of structural risk minimization, it constructs the optimal separating hyperplane with the maximum margin in order to improve the generalization ability of the learning machine. It handles problems of nonlinearity, high dimensionality, and local minima well. For classification problems, the SVM algorithm computes the decision surface of a region from the samples in that region and uses it to determine the category of unknown samples in the region.
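As a rough illustration, a linear soft-margin SVM can be trained by subgradient descent on the regularized hinge loss. This is only one of several ways to fit an SVM and is not necessarily the formulation used in [43]; the hyperparameters, the toy data, and the function names are assumptions of this sketch.

```python
def train_linear_svm(points, labels, epochs=200, learning_rate=0.1, reg=0.01):
    """Minimize (reg/2)*||w||^2 + mean(hinge loss) with full-batch subgradient steps."""
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [reg * wi for wi in w]  # gradient of the regularization term
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # sample lies inside the margin or is misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - learning_rate * g for wi, g in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


def svm_predict(w, b, x):
    """Side of the hyperplane w.x + b = 0 on which x falls (+1 or -1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical toy data, labels in {+1, -1}.
points = [(2, 2), (3, 1), (-2, -2), (-3, -1)]
labels = [1, 1, -1, -1]
w, b = train_linear_svm(points, labels)
print([svm_predict(w, b, x) for x in points])
```

The regularization term keeps the norm of `w` small, which corresponds to a wide margin, while the hinge term penalizes samples that fall inside the margin. Nonlinear problems are usually handled with kernel functions, which this sketch does not cover.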

(6) Classification Based on Association Rules

Association rule mining is an important research area in data mining. In recent years, scholars have extensively studied how to apply association rule mining to classification. Associative classification mines rules of the form condset → C, where condset is a set of items (or attribute-value pairs) and C is a class label. Such rules are called class association rules (CARs). An associative classification method generally consists of two steps: the first step uses an association rule mining algorithm to extract from the training data all class association rules that meet the specified support and confidence thresholds; the second step uses a heuristic method to select a group of high-quality rules from the extracted rules for classification. Associative classification algorithms include CBA [44], ADT [45], and CMAR [46].
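The support and confidence of a single class association rule condset → C can be computed as in the sketch below. It uses the common definitions (support as the fraction of all records matching both condset and C, confidence as the fraction of condset matches that have class C); the toy records and the function name are assumptions of this example.

```python
def rule_support_confidence(rows, labels, condset, target_class):
    """Support and confidence of the rule condset -> target_class."""
    n = len(rows)
    # Labels of the records that contain every item of condset.
    matching_labels = [label for row, label in zip(rows, labels) if condset <= row]
    n_cond = len(matching_labels)
    n_both = sum(1 for label in matching_labels if label == target_class)
    support = n_both / n
    confidence = n_both / n_cond if n_cond else 0.0
    return support, confidence


# Hypothetical toy data: each record is a set of (attribute, value) pairs.
rows = [
    {("outlook", "sunny"), ("wind", "weak")},
    {("outlook", "sunny"), ("wind", "strong")},
    {("outlook", "rainy"), ("wind", "weak")},
    {("outlook", "sunny"), ("wind", "weak")},
]
labels = ["play", "stay", "play", "play"]

support, confidence = rule_support_confidence(
    rows, labels, {("outlook", "sunny")}, "play"
)
print(support, confidence)
```

A full associative classifier would enumerate candidate condsets, keep the rules above the thresholds, and then rank and prune them; that search is omitted here.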

(7) Ensemble Learning

The complexity of real applications and the diversity of data often make a single classification method ineffective. Therefore, scholars have extensively studied how to combine multiple classification methods, which is known as ensemble learning. Ensemble learning has become a hot topic in the international machine learning community and has been called one of the four main research directions of machine learning.

Ensemble learning is a machine learning paradigm. It calls a single learning algorithm repeatedly to obtain different base learners and then combines these learners according to some rule to solve the same problem, which can significantly improve the generalization ability of the learning system. Base learners are mainly combined by (weighted) voting. Common algorithms include bagging [47] and boosting [48, 49].

Figure 2-5 illustrates classifier ensembles. Because multiple classifiers are combined by voting or averaging, the errors of any single classifier can be reduced and a more accurate representation of the problem space can be obtained, which improves classification accuracy.

Figure 2-5: Classifier ensemble learning
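A minimal bagging sketch is shown below: each base learner is trained on a bootstrap sample (drawn with replacement) and the ensemble predicts by unweighted majority vote. The nearest-centroid base learner on one-dimensional data, the number of models, and the fixed random seed are assumptions chosen for brevity, not details from [47].

```python
import random
from collections import Counter


def train_centroid(points, labels):
    """Base learner: remember the mean value of each class."""
    sums, counts = Counter(), Counter()
    for x, label in zip(points, labels):
        sums[label] += x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in counts}


def predict_centroid(model, x):
    """Predict the class whose mean is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


def train_bagging(points, labels, n_models=25, seed=0):
    """Train n_models base learners, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(points)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(
            train_centroid([points[i] for i in idx], [labels[i] for i in idx])
        )
    return models


def predict_bagging(models, x):
    """Combine the base learners by unweighted majority vote."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical one-dimensional toy data.
points = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = ["a", "a", "a", "b", "b", "b"]
models = train_bagging(points, labels)
print(predict_bagging(models, 2.0), predict_bagging(models, 8.0))
```

Boosting differs in that the base learners are trained sequentially, each one concentrating on the samples its predecessors misclassified, and their votes are weighted by their accuracy.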

The above briefly introduces the main classification methods. Each has its own characteristics, advantages, and disadvantages. Which method should be chosen for automatic identification of database workloads? The criteria used to compare and evaluate classification methods [50] mainly include: (1) prediction accuracy: the ability of the model to correctly predict the class labels of new samples; (2) computation speed: the time needed to construct the model and to use it for classification; (3) robustness: the ability of the model to predict correctly on data that contains noise or missing values; (4) scalability: the ability to construct models efficiently on large datasets; (5) simplicity and interpretability of the model description: the more concise and easier to understand the description is, the more readily the model is accepted.

Original article: http://hi.baidu.com/gf271828/item/1d5640d692ceeac71a72b470

