Summary of Algorithm pros and cons (AAA recommended)



https://www.sohu.com/a/128627325_464088

"Bayesian methods are characterized by combining observed phenomena with underlying laws, but they ignore the fusion of (subjective) values ... I hope deep Bayesian methods appear soon!"

The main advantages of naive Bayes are:

1) The naive Bayesian model originates in classical mathematical theory and has stable classification efficiency.

2) It performs very well on small-scale data, can handle multi-class tasks, and is suitable for incremental training; in particular, when the data is too large to fit in memory, training can proceed batch by batch.

3) It is not very sensitive to missing data, and the algorithm is relatively simple; it is often used for text classification.
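The batch-by-batch incremental training mentioned in advantage 2 can be sketched with scikit-learn's `partial_fit`. This is a minimal illustration only; the bundled breast-cancer dataset stands in for a stream that would not fit in memory, and the batch size is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)

clf = GaussianNB()
classes = np.unique(y)  # partial_fit needs the full label set up front

# Feed the data in batches, as if it did not all fit in memory at once.
batch_size = 100
for start in range(0, len(X), batch_size):
    clf.partial_fit(X[start:start + batch_size],
                    y[start:start + batch_size],
                    classes=classes)

print(round(clf.score(X, y), 3))
```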

The main drawbacks of naive Bayes are:

1) Theoretically, the naive Bayesian model has the smallest error rate compared with other classification methods. In practice this is not always the case, because the model assumes that attributes are independent of one another, an assumption that often fails in real applications; when the number of attributes is large or the correlations between attributes are strong, classification suffers. Naive Bayes performs best when attribute correlations are small. Semi-naive Bayesian algorithms improve on this moderately by taking some of the correlations into account.

2) The prior probabilities must be known, and they often depend on the assumed model; since the assumed model can take many forms, a poorly chosen prior model can sometimes lead to poor predictions.

3) Because the posterior probability is determined from the prior and the data, the classification decision carries a certain error rate.

4) Naive Bayes is sensitive to the representation of the input data.

The above is a summary of the naive Bayesian algorithm; I hope it is helpful.

Article from:

http://www.cnblogs.com/pinard/p/6069267.html

Also attached:

1 Decision Tree (Decision Trees) pros and cons

The advantages of the decision tree:

1) Decision trees are easy to understand and explain; after a brief explanation, people can grasp what a decision tree means.

2) Data preparation for decision trees is often simple or unnecessary. Other techniques often require the data to be normalized first, for example by removing redundant or blank attributes.

3) Decision trees can handle both numerical and categorical attributes. Other techniques often require attributes of a single type.

4) A decision tree is a white-box model. Given an observed sample, it is easy to derive the corresponding logical expression from the resulting tree.

5) The model is easy to evaluate with statistical tests, which makes it possible to measure its credibility.

6) Decision trees can produce feasible and effective results on large data sources in a relatively short time.

7) Decision trees can be constructed for data sets with many attributes.

8) Decision trees scale well to large databases, and the size of the tree is independent of the size of the database.

The disadvantages of the decision tree:

1) Information gain is biased toward features with more distinct values.

2) Decision trees have difficulty handling missing data.

3) Overfitting is a common problem.

4) Correlations between attributes in the data set are ignored.
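A minimal scikit-learn sketch tying these points together: `export_text` shows the white-box property from the advantages, and capping `max_depth` is one simple guard against the overfitting noted above (the dataset and `max_depth=3` are illustrative choices, not recommendations).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Limiting depth is one simple way to curb overfitting.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

print(round(tree.score(X_test, y_test), 3))

# The white-box property: the learned rules can be printed as readable text.
print(export_text(tree, max_depth=2))
```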

2 Artificial Neural Network (ANN) pros and cons

The advantages of artificial neural networks: high classification accuracy; strong parallel distributed processing capability; strong distributed storage and learning ability; strong robustness and fault tolerance to noisy data; the ability to fully approximate complex nonlinear relationships; and an associative-memory function.

The disadvantages of artificial neural networks: a neural network needs a large number of parameters, such as the network topology and the initial values of the weights and thresholds; the learning process cannot be observed and the output is difficult to interpret, which affects the credibility and acceptability of the results; and learning can take too long, or may even fail to achieve its goal.
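The "large number of parameters" the text warns about shows up directly as constructor arguments in scikit-learn's `MLPClassifier`. A small sketch with illustrative settings (hidden layer size, iteration cap, and dataset are all arbitrary choices here):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Topology, learning settings, and initialization are all hand-chosen
# hyperparameters -- exactly the tuning burden the text describes.
net = make_pipeline(
    StandardScaler(),  # neural nets are sensitive to feature scale
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)
net.fit(X_train, y_train)
print(round(net.score(X_test, y_test), 3))
```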

3 Genetic Algorithm (GA) pros and cons

The advantages of the genetic algorithm:

1) Fast, global random search capability that does not require prior knowledge of the problem domain.

2) The search starts from a population, so it has inherent parallelism (multiple individuals can be compared at the same time) and good robustness.

3) The search is guided by an evaluation (fitness) function, so the process is simple.

4) Iteration uses probabilistic mechanisms, so the search is stochastic.

5) Good extensibility; easy to combine with other algorithms.

The disadvantages of the genetic algorithm:

1) Programming a genetic algorithm is relatively complex: the problem must first be encoded, and the optimal solution must be decoded afterwards.

2) The three genetic operators also involve many parameters, such as the crossover rate and mutation rate, whose values seriously affect the quality of the solution and are mostly chosen from experience. Because the algorithm does not make timely use of feedback information, the search is relatively slow, and obtaining a more accurate solution requires more training time.

3) The algorithm depends to some extent on the choice of the initial population; this can be improved by combining it with heuristic algorithms.
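The encode/evaluate/select/crossover/mutate loop described above can be sketched in plain Python. Everything here is an illustrative assumption: the test function f(x) = x·sin(x) on [0, 10], the 16-bit encoding, and the population size and rates (which, as noted, are chosen by hand).

```python
import math
import random

random.seed(0)

BITS, LO, HI = 16, 0.0, 10.0  # encode x in [0, 10] as a 16-bit string

def decode(bits):
    """Decoding step: map a bit string back to a real number."""
    return LO + int(bits, 2) * (HI - LO) / (2 ** BITS - 1)

def fitness(bits):
    """Evaluation function guiding the search: maximize f(x) = x * sin(x)."""
    x = decode(bits)
    return x * math.sin(x)

def crossover(a, b):
    point = random.randrange(1, BITS)  # single-point crossover
    return a[:point] + b[point:]

def mutate(bits, rate=0.02):
    out = []
    for c in bits:  # bit-flip mutation, rate chosen by hand
        out.append(('1' if c == '0' else '0') if random.random() < rate else c)
    return ''.join(out)

pop = [''.join(random.choice('01') for _ in range(BITS)) for _ in range(40)]
for _ in range(60):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:20]  # selection: keep the fitter half (elitism)
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(20)]  # crossover + mutation
    pop = parents + children

best = max(pop, key=fitness)
print(f"x = {decode(best):.3f}, f(x) = {fitness(best):.3f}")
```

The true maximum of x·sin(x) on [0, 10] is near x ≈ 7.98; with elitist selection the best fitness never decreases between generations.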

4 K-Nearest Neighbors (KNN) pros and cons

The advantages of KNN:

1) Simple and effective.

2) The cost of retraining is low (changes to the category system and the training set are common in web and e-commerce applications).

3) Computation time and space are linear in the size of the training set (which in some cases is not too large).

4) For sample sets where classes cross or overlap heavily, KNN is more suitable than other methods, because it determines the category mainly from the limited number of nearby samples rather than from discriminating class domains.

5) The algorithm is suitable for automatic classification of class domains with large sample sizes; class domains with small sample sizes are more prone to misclassification.

The disadvantages of KNN:

1) KNN is a lazy learning method (it builds essentially no model), so eager learning algorithms are much faster at prediction.

2) Category scores are not normalized (unlike probabilistic scores).

3) The output is not very interpretable; a decision tree, for example, is much more interpretable.

4) A major drawback for classification is class imbalance: when one class has a very large sample size and the others are small, the large class may dominate the K nearest neighbors of a new sample. Because the algorithm counts only the K "nearest" samples, the raw counts decide the result regardless of whether those majority-class neighbors are actually close to the target sample. This can be improved by weighting, so that neighbors closer to the sample carry more weight.

5) The amount of computation is large. A common solution is to edit the known sample points in advance, removing samples that contribute little to classification.
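The distance-weighting fix from point 4 corresponds directly to `weights="distance"` in scikit-learn's `KNeighborsClassifier`. A minimal sketch; the dataset and k = 5 are illustrative choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# weights="distance": closer neighbors count more than distant ones,
# mitigating the imbalanced-neighborhood problem described above.
knn = make_pipeline(StandardScaler(),
                    KNeighborsClassifier(n_neighbors=5, weights="distance"))
knn.fit(X_train, y_train)  # "lazy": fit essentially just stores the data
print(round(knn.score(X_test, y_test), 3))
```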

5 Support Vector Machine (SVM) pros and cons

The advantages of SVM:

1) Can solve machine learning problems with small samples.

2) Can improve generalization performance.

3) Can solve high-dimensional problems.

4) Can solve nonlinear problems.

5) Avoids the problems of neural network structure selection and local minima.

The disadvantages of SVM:

1) Sensitive to missing data.

2) There is no universal solution for nonlinear problems; the kernel function must be chosen carefully.
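The kernel choice the text warns about is the `kernel` argument in scikit-learn's `SVC`. A minimal sketch with an RBF kernel; the dataset, `C`, and kernel are illustrative choices, and other kernels ("linear", "poly") may suit other problems better:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The kernel function must be chosen per problem; RBF is a common default.
svm = make_pipeline(StandardScaler(),
                    SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X_train, y_train)
print(round(svm.score(X_test, y_test), 3))
```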

6 Naive Bayes (NBC) pros and cons

The advantages of the NBC model:

1) The NBC model originates in classical mathematical theory; it has a solid mathematical foundation and stable classification efficiency.

2) The NBC model needs to estimate only a few parameters, is not very sensitive to missing data, and the algorithm is relatively simple.

The disadvantages of the NBC model:

1) Theoretically, the NBC model has the smallest error rate compared with other classification methods. In practice this is not always the case, because the NBC model assumes that attributes are independent of one another, an assumption that often fails in real applications (one remedy is to first cluster strongly correlated attributes), and this affects correct classification. When the number of attributes is large or the correlations between attributes are strong, the NBC model performs worse than a decision tree model; it performs best when attribute correlations are small.

2) The prior probabilities must be known.

3) The classification decision carries a certain error rate.

7 AdaBoost pros and cons

The advantages of AdaBoost:

1) AdaBoost is a highly accurate classifier.

2) Sub-classifiers can be constructed by a variety of methods; AdaBoost provides the framework.

3) When simple classifiers are used, the computed results are understandable, and constructing the weak classifiers is extremely simple.

4) Simple; no feature selection is needed.

5) There is little need to worry about overfitting.
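In scikit-learn's `AdaBoostClassifier`, the default weak learner is a depth-1 decision stump, matching the "extremely simple" weak classifier of point 3. A minimal sketch; the dataset and `n_estimators=100` are illustrative choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Default weak learner: a depth-1 decision stump. AdaBoost is the framework
# that reweights the data and combines 100 such stumps.
ada = AdaBoostClassifier(n_estimators=100, random_state=0)
ada.fit(X_train, y_train)
print(round(ada.score(X_test, y_test), 3))
```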

8 Rocchio pros and cons

The outstanding advantage of the Rocchio algorithm is that it is easy to implement and its computation (both training and classification) is very simple. It is usually used as a baseline for measuring the performance of classification systems; practical classification systems rarely use it to solve concrete classification problems.
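Rocchio-style classification (assign each sample to the nearest class centroid) is available in scikit-learn as `NearestCentroid`. A minimal sketch; the dataset here is an illustrative stand-in, since Rocchio is typically applied to text vectors:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Training is just computing one centroid per class; classification is
# finding the nearest centroid -- about as simple as a classifier gets.
rocchio = make_pipeline(StandardScaler(), NearestCentroid())
rocchio.fit(X_train, y_train)
print(round(rocchio.score(X_test, y_test), 3))
```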

9 Comparison of various classification algorithms

According to the conclusions of reference [7]:

Calibrated boosted trees perform best, random forests second, uncalibrated bagged trees third, calibrated SVMs fourth, and uncalibrated neural nets fifth.

Naive Bayes and decision trees perform comparatively poorly.

Even so, some algorithms perform well on particular data sets.

[1] Rosenlin, Ma Jun, Pan Limin. Data Mining Theory and Technology [M]. Publishing House of Electronics Industry, 2013: 126.

[2] Yang Xiaofan, Chen Tingyu. Inherent Advantages and Disadvantages of Artificial Neural Networks [J]. Computer Science, 1994 (Vol. 21): 23-26.

[3] Steve. The pros and cons of genetic algorithms.

[4] Yang Jianwu. Automatic Text Classification Technology. www.icst.pku.edu.cn/course/mining/12-13spring/TextMining04-%E5%88%86%E7%B1%BB.pdf

[5] Baiyun Ball Studio. SVM (Support Vector Machine) Overview.

[6] Zhang Xia. Statistical Learning Theory and the Shortcomings of SVM (1).

[7] Rich Caruana, Alexandru Niculescu-Mizil. An Empirical Comparison of Supervised Learning Algorithms. 2006.

