Comparison of various classification algorithms

Source: Internet
Author: User


1 Advantages and disadvantages of decision trees

Advantages of decision trees:

First, decision trees are easy to understand and interpret. After a short explanation, people can grasp what a decision tree means.

Second, decision trees need little or no data preparation. Other techniques often require the data to be normalized first, for example by removing extraneous or blank attributes.

Third, decision trees can handle both numerical and categorical attributes, whereas other techniques often require all attributes to be of a single type.

Fourth, a decision tree is a white-box model. Given an observed model, the corresponding logical expressions are easy to derive from the resulting tree.

Fifth, the model is easy to evaluate through static testing, which means its reliability can be measured.

Sixth, decision trees can produce feasible and effective results on large data sources in a relatively short time.

Seventh, decision trees can be built for datasets with many attributes.

Eighth, decision trees scale well to large databases, and the size of the tree is independent of the size of the database.

Disadvantages of decision trees:

First, information gain is biased toward features that take many distinct values.

Second, decision trees have difficulty handling missing data.

Third, decision trees are prone to overfitting.

Fourth, they ignore correlations between attributes in the dataset.
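The information-gain bias in the first disadvantage can be seen on a small example. The sketch below is plain Python on a hypothetical six-row toy dataset; `entropy` and `information_gain` are illustrative helpers, not a library API. An identifier-like attribute that takes a different value on every row splits the data into pure single-row groups, so it receives the maximum possible gain even though it is useless for predicting new samples.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on attribute `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: 'outlook' has 2 values, 'row_id' is unique per row.
labels = ["yes", "yes", "no", "no", "yes", "no"]
outlook = ["sun", "sun", "rain", "rain", "sun", "sun"]
row_id = [1, 2, 3, 4, 5, 6]

print(round(information_gain(outlook, labels), 3))  # 0.459
print(round(information_gain(row_id, labels), 3))   # 1.0
```

The unique `row_id` attribute wins the split with a gain of 1.0 bit, versus about 0.459 for `outlook`, even though it carries no generalizable information.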



2 Advantages and disadvantages of artificial neural networks

Advantages of artificial neural networks: high classification accuracy; strong parallel distributed processing; strong distributed storage and learning ability; strong robustness and fault tolerance to noise; the ability to closely approximate complex nonlinear relationships; and associative memory.

Disadvantages of artificial neural networks: they require many parameters to be specified, such as the network topology and the initial values of the weights and thresholds; the learning process cannot be observed and the output is hard to interpret, which undermines the credibility and acceptability of the results; and training can take a long time and may not even reach the learning objective.
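To make the parameter point concrete, here is a minimal forward pass for a fully connected network with sigmoid units, in plain Python. The weights and thresholds are hand-picked for illustration (not learned), so the sketch says nothing about training time; it only shows that even a 2-2-1 network computing the nonlinear XOR function already has nine parameters, and that the individual weights do not read as an explanation of the output.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, layers):
    """Forward pass through a fully connected network.

    `layers` is a list of (weights, thresholds) pairs: weights[j] holds the
    incoming weights of neuron j and thresholds[j] is its bias term."""
    for weights, thresholds in layers:
        x = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + t)
             for row, t in zip(weights, thresholds)]
    return x

def count_parameters(layers):
    """Total number of weights plus thresholds."""
    return sum(len(w) * len(w[0]) + len(t) for w, t in layers)

# Hand-chosen (not trained) weights for a 2-2-1 network that computes XOR.
xor_net = [
    ([[20, 20], [-20, -20]], [-10, 30]),  # hidden units: roughly OR and NAND
    ([[20, 20]], [-30]),                  # output unit: roughly AND of the two
]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(forward([a, b], xor_net)[0]))
print("parameters:", count_parameters(xor_net))  # parameters: 9
```

Real networks have thousands or millions of such values, and they must be initialized and then trained rather than set by hand.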




3 Advantages and disadvantages of genetic algorithms

Advantages of genetic algorithms:

First, it has a fast random search ability that does not depend on the problem domain.

Second, the search starts from a population rather than a single point, which gives it potential parallelism: many individuals can be compared at once, which makes it robust.

Third, the search is guided by an evaluation (fitness) function, and the process is simple.

Fourth, it iterates using probabilistic mechanisms, so it is stochastic.

Fifth, it is extensible and easy to combine with other algorithms.

Disadvantages of genetic algorithms:

First, genetic algorithms are relatively complex to program: the problem must be encoded, and the optimal solution must then be decoded.

Second, the three genetic operators (selection, crossover, and mutation) also involve many parameters, such as the crossover rate and the mutation rate. These parameters seriously affect the quality of the solution, yet they are mostly chosen from experience. In addition, the algorithm does not make timely use of feedback information from the network, so its search is relatively slow, and obtaining a more accurate solution requires more training time.

Third, the algorithm depends to some extent on the choice of the initial population; this can be improved by combining it with heuristic algorithms.
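The encoding step, the three operators, and their rate parameters can all be seen in a minimal genetic algorithm. This sketch assumes the toy OneMax problem (maximize the number of 1s in a bit string), tournament selection, single-point crossover, bit-flip mutation, and elitism; these choices and the default parameter values are illustrative assumptions, not tuned settings.

```python
import random

def onemax(bits):
    """Fitness function: the number of 1s in the bit string."""
    return sum(bits)

def tournament(pop, rng, k=3):
    """Selection: the fittest of k randomly drawn individuals."""
    return max(rng.sample(pop, k), key=onemax)

def crossover(a, b, rng, rate):
    """Single-point crossover, applied with probability `rate`."""
    if rng.random() < rate:
        cut = rng.randrange(1, len(a))
        return a[:cut] + b[cut:]
    return a[:]

def mutate(bits, rng, rate):
    """Bit-flip mutation: each gene flips with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in bits]

def genetic_algorithm(n_bits=20, pop_size=30, generations=40,
                      crossover_rate=0.9, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Encoding: each candidate solution is a list of 0/1 genes.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=onemax)]  # elitism: carry the best forward
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng),
                              rng, crossover_rate)
            children.append(mutate(child, rng, mutation_rate))
        pop = children
    # Decoding is trivial here: the best bit string is itself the answer.
    return max(pop, key=onemax)

best = genetic_algorithm()
print(onemax(best), "of", len(best), "bits set")
```

Because the generator is seeded, a run is reproducible; changing `crossover_rate`, `mutation_rate`, or `seed` (and therefore the initial population) changes the result, which is the sensitivity described above.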



4 Advantages and disadvantages of the KNN (k-nearest neighbor) algorithm

Advantages of the KNN algorithm:

First, it is simple and effective.

Second, retraining is cheap (changes to the category system and the training set are common in web environments and e-commerce applications).

Third, its computation time and space are linear in the size of the training set (which, in some cases, is not too large).

Fourth, because KNN determines a sample's class mainly from a limited number of nearby samples rather than by discriminating between class domains, it is better suited than other methods to sample sets whose class domains cross or overlap a lot.

Fifth, it is suitable for automatically classifying class domains with large sample sizes; class domains with small sample sizes are more prone to misclassification.

Disadvantages of the KNN algorithm:

First, KNN is a lazy learning method (it does essentially no learning in advance), so some eager learning algorithms classify much faster.

Second, the class scores are not normalized (unlike probability scores).

Third, the output is not very interpretable; a decision tree, for example, is much easier to interpret.

Fourth, a major drawback in classification arises when the samples are imbalanced, for example when one class has a very large sample size and the other classes have very small ones. When a new sample is input, samples from the large class may then make up the majority of its K nearest neighbors. The algorithm only counts the "nearest" neighbors, so a large class can dominate the vote whether or not its samples are actually close to the target sample, although sheer quantity should not decide the result. This can be improved by weighting the votes, giving larger weights to neighbors that are closer to the sample.

Fifth, the amount of computation is large, because the distance from each sample to be classified to every known sample must be computed. A common solution is to edit the known sample points in advance, removing samples that contribute little to classification.
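The imbalance problem and the distance-weighting fix can be shown on a one-dimensional toy set (hypothetical data). `knn_predict` is an illustrative brute-force implementation: it is lazy, storing the training set as-is and computing the distance to every stored sample at prediction time. It uses `math.dist`, which requires Python 3.8 or later.

```python
import math
from collections import defaultdict

def knn_predict(train, query, k, weighted=False):
    """Classify `query` from its k nearest neighbors in `train`.

    `train` is a list of (point, label) pairs. With weighted=True, each
    neighbor votes with weight 1/distance instead of 1."""
    # Lazy learning: nothing is fitted; all distances are computed now.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for point, label in neighbors:
        d = math.dist(point, query)
        votes[label] += 1.0 / (d + 1e-9) if weighted else 1.0  # 1e-9 avoids /0
    return max(votes, key=votes.get)

# Imbalanced toy data: two "B" samples near the query, eight "A" samples farther away.
train = [((0.1,), "B"), ((0.2,), "B")] + [((1.0 + 0.1 * i,), "A") for i in range(8)]
query = (0.0,)
print(knn_predict(train, query, k=5))                 # A (3 votes vs 2)
print(knn_predict(train, query, k=5, weighted=True))  # B (closer neighbors dominate)
```

With plain majority voting the large class wins on count alone; with 1/distance weights the two much closer "B" samples decide the result.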




5 Advantages and disadvantages of support vector machines (SVM)

Advantages of SVM:

First, it can solve machine learning problems with small sample sizes.

Second, it can improve generalization performance.

Third, it can handle high-dimensional problems.

Fourth, it can solve nonlinear problems.

Fifth, it avoids the problems of choosing a neural network structure and of getting stuck in local minima.

Disadvantages of SVM:

First, it is sensitive to missing data.

Second, there is no universal solution for nonlinear problems; the kernel function must be chosen carefully.
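The importance of the kernel choice can be illustrated without a full SVM solver. The sketch below is a kernel perceptron, a simpler relative of the SVM that uses the same kernel trick but does not maximize the margin; it is a stand-in chosen for brevity, not how SVMs are actually trained. On XOR-style data, a linear kernel cannot separate the classes, while a degree-2 polynomial kernel can.

```python
def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))

def poly_kernel(a, b, degree=2):
    return (linear_kernel(a, b) + 1) ** degree

def decision(alpha, X, y, kernel, x):
    """Kernel expansion f(x) = sum_j alpha_j * y_j * K(x_j, x)."""
    return sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alpha, X, y))

def train_kernel_perceptron(X, y, kernel, epochs=20):
    """Dual perceptron: alpha[i] counts the mistakes made on sample i."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(zip(X, y)):
            if yi * decision(alpha, X, y, kernel, xi) <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

def training_errors(X, y, kernel):
    alpha = train_kernel_perceptron(X, y, kernel)
    return sum(1 for xi, yi in zip(X, y)
               if yi * decision(alpha, X, y, kernel, xi) <= 0)

# XOR-style data: the label is the product of the two coordinates.
X = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
y = [1, -1, -1, 1]
print("linear kernel errors:", training_errors(X, y, linear_kernel))
print("polynomial kernel errors:", training_errors(X, y, poly_kernel))
```

The linear kernel always leaves at least one training error, because no line through the origin separates these points, while the polynomial kernel reaches zero errors; the same data, with a different kernel, becomes solvable.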



6 Advantages and disadvantages of naive Bayes

Advantages:

First, the naive Bayes classifier (NBC) model originates from classical mathematical theory, has a solid mathematical foundation, and has stable classification efficiency.

Second, the NBC model needs to estimate only a few parameters, is not very sensitive to missing data, and the algorithm is relatively simple.

Disadvantages:

First, in theory, the NBC model has the smallest error rate compared with other classification methods. In practice this is not always the case, because the NBC model assumes that the attributes are independent of each other given the class, an assumption that often does not hold in real applications (one option is to first use a clustering algorithm to group strongly correlated attributes). This affects the model's ability to classify correctly. When there are many attributes or the attributes are strongly correlated, the NBC model is less effective than the decision tree model; it performs best when the attributes are weakly correlated.

Second, the prior probabilities must be known.

Third, the classification decision has an inherent error rate.
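A minimal categorical naive Bayes classifier shows where the prior probabilities and the independence assumption enter, and how few quantities need to be estimated (class counts and per-class value counts). The sketch uses add-one (Laplace) smoothing so that an unseen attribute value does not zero out a class; the smoothing choice and the small weather-style dataset are illustrative assumptions.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    priors = Counter(labels)
    counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    values = defaultdict(set)      # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            counts[(i, label)][v] += 1
            values[i].add(v)
    return priors, counts, values, len(labels)

def predict_nb(model, row):
    """Return the label maximizing P(label) * prod_i P(row[i] | label)."""
    priors, counts, values, n = model
    best, best_score = None, -1.0
    for label, c in priors.items():
        score = c / n  # prior probability P(label)
        for i, v in enumerate(row):
            # Independence assumption: multiply per-attribute likelihoods,
            # each estimated with add-one (Laplace) smoothing.
            score *= (counts[(i, label)][v] + 1) / (c + len(values[i]))
        if score > best_score:
            best, best_score = label, score
    return best

rows = [("sunny", "weak"), ("sunny", "strong"), ("rain", "weak"),
        ("rain", "strong"), ("overcast", "weak"), ("sunny", "weak")]
labels = ["yes", "no", "yes", "no", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("rain", "strong")))  # no
print(predict_nb(model, ("sunny", "weak")))   # yes
```

For correlated attributes, multiplying the per-attribute likelihoods double-counts shared evidence, which is the source of the accuracy loss described above.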



7 Advantages of the AdaBoost method

First, AdaBoost is a highly accurate classifier.

Second, the sub-classifiers can be built with a variety of methods; AdaBoost provides the framework.

Third, when simple classifiers are used, the results are easy to understand, and building the weak classifiers is extremely simple.

Fourth, it is simple and requires no feature selection.

Fifth, it is usually not prone to overfitting.
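The framework idea (any weak learner that accepts sample weights can be plugged in) and the re-weighting step can be sketched with one-dimensional decision stumps as the weak classifiers. The toy data (hypothetical) follow the pattern + + - - + +, which no single stump can classify correctly. `best_stump` is the pluggable weak learner in this sketch; all names are illustrative.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak classifier: `polarity` left of the threshold, `-polarity` right of it."""
    return polarity if x < threshold else -polarity

def best_stump(X, y, w):
    """Weak learner: the 1-D stump with the lowest weighted error."""
    best = None
    for threshold in sorted(set(X)) + [max(X) + 1]:
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(xi, threshold, polarity) != yi)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(X, y, w)
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, threshold, polarity))
        # Re-weight: misclassified samples gain weight, then normalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, threshold, polarity))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

X = [1, 2, 3, 4, 5, 6]
y = [1, 1, -1, -1, 1, 1]
for rounds in (1, 3):
    ensemble = adaboost(X, y, rounds)
    errors = sum(predict(ensemble, xi) != yi for xi, yi in zip(X, y))
    print(rounds, "round(s):", errors, "training errors")  # 2, then 0
```

Each stump is trivially interpretable, and three weighted stumps together fit a pattern that none of them fits alone.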



8 Advantages of the Rocchio algorithm

The outstanding advantage of the Rocchio algorithm is that it is easy to implement and its computation (both training and classification) is very simple. It is therefore usually used as a baseline for measuring the performance of classification systems, while practical classification systems rarely use it to solve specific classification problems.
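A simplified Rocchio classifier fits in a few lines, which is why it makes a convenient baseline. This sketch assumes the common nearest-centroid form: each class prototype is the mean of its training vectors, and a new vector is assigned to the class whose prototype is most cosine-similar. The full Rocchio formula also subtracts a weighted centroid of negative examples; that term is omitted here. The four-term vocabulary and the document vectors are made up for illustration.

```python
import math

def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def train_rocchio(vectors, labels):
    """Training: compute one centroid (prototype) per class."""
    by_class = {}
    for v, label in zip(vectors, labels):
        by_class.setdefault(label, []).append(v)
    return {label: centroid(vs) for label, vs in by_class.items()}

def classify_rocchio(prototypes, v):
    """Classification: the class whose prototype is most cosine-similar to v."""
    return max(prototypes, key=lambda label: cosine(prototypes[label], v))

# Hypothetical term counts over the vocabulary ["goal", "match", "vote", "election"].
docs = [[3, 2, 0, 0], [2, 3, 0, 1], [0, 0, 3, 2], [0, 1, 2, 3]]
labels = ["sports", "sports", "politics", "politics"]
prototypes = train_rocchio(docs, labels)
print(classify_rocchio(prototypes, [1, 1, 0, 0]))  # sports
print(classify_rocchio(prototypes, [0, 0, 1, 1]))  # politics
```

Training is a single averaging pass and classification is one similarity per class, which matches the simplicity described above; the price is that each class is summarized by a single point.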




9 Comparison of various classification algorithms

According to the conclusions of this paper, calibrated boosted trees perform best, followed by random forests (second), uncalibrated bagged trees (third), calibrated SVMs (fourth), and uncalibrated neural nets (fifth).

Naive Bayes and decision trees perform poorly.

Some algorithms perform well on particular datasets.


