Machine Learning (11): Advantages, disadvantages, and applicable conditions of common machine learning algorithms

Source: Internet
Author: User
Tags svm

1. Decision Tree

  Applicable conditions: The boundaries between classes are non-linear and can be approximated by repeatedly splitting the feature space into rectangular regions. There is some correlation between features. Features should have a similar number of distinct values, because information gain is biased toward features with more values.

  Advantages: 1. Decision rules are intuitive; 2. Non-linear features can be handled; 3. Interactions between variables are taken into account.

  Disadvantages: 1. Prone to overfitting (mitigated by pruning or by random forests); 2. Difficulty handling missing data; 3. Ignores correlations between attributes in the dataset.
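
The bias of information gain toward many-valued features can be seen in a small sketch. The toy data and the names `weather` and `row_id` are assumptions made up for illustration; they do not come from the original article.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy reduction from splitting `labels` on each distinct value of `feature`."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: 8 samples, balanced binary labels
labels = [1, 1, 0, 0, 1, 0, 1, 0]
weather = ["sun", "sun", "rain", "rain", "sun", "rain", "rain", "sun"]  # 2 distinct values
row_id = list(range(len(labels)))  # 8 distinct values, no real predictive meaning

gain_weather = information_gain(weather, labels)
gain_row_id = information_gain(row_id, labels)
print(gain_weather, gain_row_id)
```

The identifier-like feature reaches the maximum possible gain only because every value isolates a single sample, which is why criteria such as gain ratio are often preferred when value counts differ a lot.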

2. SVM

  Applicable conditions: The feature space is large; non-linear features can be handled.

  Advantages: 1. Works for machine learning problems with small samples; 2. Handles high-dimensional features; 3. Uses kernel functions to map into a non-linear feature space and so solve non-linear problems; 4. The separating surface does not depend on all the data, only on a few support vectors.

  Disadvantages: 1. Efficiency is very low when there are many samples; 2. A suitable kernel function has to be found; 3. Sensitive to missing data.
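
As a rough illustration of advantages 3 and 4, the sketch below evaluates a kernel SVM decision function that touches only the support vectors. The RBF kernel is one common choice, and the support vectors, coefficients, and bias are hand-picked hypothetical values, not the result of real training.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

def svm_decision(x, support_vectors, dual_coefs, bias, kernel=rbf_kernel):
    """f(x) = sum_i (alpha_i * y_i) * K(sv_i, x) + b; only support vectors appear."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias

# Hypothetical "trained" model: values chosen by hand for illustration only
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
dual_coefs = [-1.0, 1.0]  # alpha_i * y_i for each support vector
bias = 0.0

score_near_negative = svm_decision((0.1, 0.0), support_vectors, dual_coefs, bias)
score_near_positive = svm_decision((2.0, 1.9), support_vectors, dual_coefs, bias)
print(score_near_negative, score_near_positive)
```

The sign of the score gives the predicted class. Prediction cost grows with the number of support vectors, which is one reason efficiency drops on large datasets.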

3. LR

  Applicable conditions: The data is linearly distributed.

  Advantages: 1. The model is simple and trains quickly; 2. Logistic regression is widely used on industrial problems.

  Disadvantages: 1. The form is simple, so accuracy is often not very high; 2. Because it relies on all the data, it handles class imbalance poorly; 3. Non-linear data is awkward to handle: without additional methods, logistic regression can only deal with linearly separable data, and more specifically with binary classification; 4. Logistic regression cannot select features by itself. Sometimes GBDT is used to select features first, and logistic regression is then applied.
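
Disadvantage 3 can be illustrated with XOR, a standard example of data that is not linearly separable. The sketch below is a minimal batch gradient descent implementation written for this illustration (the learning rate, step count, and the hand-added product feature are assumptions). It shows plain logistic regression failing on XOR and succeeding once a crossed feature is introduced.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.5, steps=5000):
    """Batch gradient descent on mean log loss; returns weights with the bias last."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1]) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

def accuracy(w, rows, labels):
    """Fraction of rows whose predicted class (p > 0.5) matches the label."""
    hits = 0
    for x, y in zip(rows, labels):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])
        hits += int((p > 0.5) == bool(y))
    return hits / len(rows)

xor_rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [0, 1, 1, 0]
# Assumed extra step: add the product x1 * x2 as a third feature
crossed_rows = [(a, b, a * b) for a, b in xor_rows]

acc_linear = accuracy(train_logistic(xor_rows, xor_labels), xor_rows, xor_labels)
acc_crossed = accuracy(train_logistic(crossed_rows, xor_labels), crossed_rows, xor_labels)
print(acc_linear, acc_crossed)
```

With the raw inputs the model cannot separate the classes, while the crossed feature makes the problem linearly separable in the expanded space.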

4. Comparison of the three

Model complexity: SVM supports kernel functions and can handle both linear and non-linear problems; the LR model is simple, trains quickly, and suits linear problems; decision trees overfit easily and need pruning.

Loss function: SVM uses hinge loss; LR uses logistic (log) loss, usually with L2 regularization; decision trees boosted with AdaBoost use exponential loss.

Data sensitivity: With a tolerance (soft margin) added, SVM is insensitive to outliers and depends only on the support vectors, but the data needs to be normalized first; LR is sensitive to distant points.

Data volume: Use LR for large datasets; use SVM with a non-linear kernel for small datasets with few features.
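
The loss functions in the comparison above can be written as functions of the margin m = y * f(x), with y in {-1, +1}. These are the standard textbook forms. The log loss for LR is an addition here, since the article names only its regularizer.

```python
import math

def hinge_loss(margin):
    """SVM: zero once the sample is beyond the margin, linear otherwise."""
    return max(0.0, 1.0 - margin)

def log_loss(margin):
    """Logistic regression: smooth, never exactly zero."""
    return math.log(1.0 + math.exp(-margin))

def exponential_loss(margin):
    """AdaBoost: grows exponentially for badly misclassified samples."""
    return math.exp(-margin)

for m in (2.0, 0.0, -3.0):
    print(m, hinge_loss(m), log_loss(m), exponential_loss(m))
```

At a margin of -3 the exponential loss is far larger than the other two, which is consistent with AdaBoost's sensitivity to anomalous samples noted later in the article.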

5. Neural network

  Applicable conditions: The amount of data is large, and there are intrinsic relationships among the parameters.

  Advantages: 1. Strong parallel distributed processing capability; 2. Extracts features from the data; 3. Approximates complex non-linear relationships.

  Disadvantages: 1. Requires a large number of parameters; 2. Training takes a long time; 3. The learning process cannot be observed, and the output is hard to interpret.
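
A very small sketch of advantage 3: two hidden ReLU units are enough to represent XOR, a relationship that no single linear model can capture. The weights below are set by hand for illustration and are an assumption of this example. A real network would learn them from data.

```python
def relu(z):
    return max(0.0, z)

def tiny_network(x1, x2):
    """Two hidden ReLU units and a linear output; weights set by hand, not learned."""
    h1 = relu(x1 + x2)        # hidden unit 1
    h2 = relu(x1 + x2 - 1.0)  # hidden unit 2
    return h1 - 2.0 * h2      # linear output layer

outputs = {(a, b): tiny_network(a, b) for a in (0, 1) for b in (0, 1)}
print(outputs)
```

Even here the hidden units have no obvious meaning on their own, which hints at why larger networks are hard to interpret.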

6. Random Forest

Advantages: 1. Training can be highly parallelized, which gives a speed advantage on the large samples of the big-data era; 2. Handles very high-dimensional data (many features) without requiring feature selection; 3. Can itself be used for feature selection: it reports the importance of each feature, which helps reduce the dimension of the feature space; 4. Because of random sampling, the trained model has low variance and generalizes well; 5. Simple to implement and insensitive to partially missing data (because both samples and features are selected at random).

Disadvantages: 1. On some noisy sample sets, the RF model is prone to overfitting; 2. When attributes take different numbers of distinct values, attributes with more values have a greater influence on the random forest, so the attribute weights it produces on such data are not reliable.
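
The randomness mentioned in advantages 4 and 5 comes from two places: bootstrap sampling of rows and random selection of features. The sketch below shows only those ingredients plus the final vote. It is not a full random forest, and the helper names are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (random selection of samples)."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices (random selection of features)."""
    return sorted(rng.sample(range(n_features), k))

def majority_vote(predictions):
    """Combine per-tree class predictions into the forest's prediction."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)   # fixed seed so the sketch is repeatable
rows = list(range(10))   # stand-ins for 10 training rows
sample = bootstrap_sample(rows, rng)
features = random_feature_subset(6, 2, rng)
vote = majority_vote(["cat", "dog", "cat"])
print(sample, features, vote)
```

Because each tree is built from its own sample and feature subset, the trees can be trained independently, which is what makes the training easy to parallelize.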

7. AdaBoost

Advantages: 1. As a classifier, AdaBoost achieves very high classification accuracy; 2. Within the AdaBoost framework, weak learners can be built from a variety of classification and regression models, which is very flexible, and no feature filtering is required; 3. Not prone to overfitting.

Disadvantages: 1. Sensitive to anomalous samples: they may receive higher weights during the iterations and hurt the prediction accuracy of the final strong learner; 2. Because the weak learners depend on one another, training is hard to parallelize. Partial parallelism can, however, be achieved through subsampling, as in SGBT.
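
How a hard-to-classify sample gains weight can be seen in one reweighting round. This is a minimal sketch of the standard discrete (binary) AdaBoost update. The five-sample setup is a made-up example.

```python
import math

def adaboost_round(weights, correct):
    """One AdaBoost reweighting step for binary classification.

    weights: current sample weights, summing to 1
    correct: booleans, True where the weak learner classified the sample correctly
    Assumes the weighted error is strictly between 0 and 1.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1.0 - error) / error)  # weak learner's vote weight
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

weights = [0.2] * 5
correct = [True, True, True, True, False]  # the last sample is misclassified
alpha, new_weights = adaboost_round(weights, correct)
print(alpha, new_weights)
```

After a single round the misclassified sample carries half of the total weight. If that sample is an outlier or mislabeled, later weak learners concentrate on it, which is the sensitivity described in disadvantage 1.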

8. GBDT

Advantages: 1. Flexibly handles many types of data, including continuous and discrete values, for both classification and regression; 2. With relatively little time spent on parameter tuning, prediction accuracy can still be fairly high, compared with SVM; 3. Can be used to filter features; 4. With a robust loss function, such as the Huber loss or the quantile loss, it is very robust to outliers.

Disadvantage: 1. Because the weak learners depend on one another, training is hard to parallelize. Partial parallelism can, however, be achieved through subsampling, as in SGBT.
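
The robustness attributed to the Huber and quantile losses comes from their linear growth for large residuals. The sketch below uses the standard definitions. The `delta` and `q` defaults are illustrative assumptions.

```python
def squared_loss(residual):
    """Ordinary least-squares loss: grows quadratically with the residual."""
    return 0.5 * residual ** 2

def huber_loss(residual, delta=1.0):
    """Quadratic for |residual| <= delta, linear beyond it."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r ** 2
    return delta * (r - 0.5 * delta)

def quantile_loss(residual, q=0.5):
    """Pinball loss, where residual = y_true - y_pred."""
    return q * residual if residual >= 0 else (q - 1.0) * residual

outlier_residual = 10.0
print(squared_loss(outlier_residual))   # quadratic penalty
print(huber_loss(outlier_residual))     # linear penalty beyond delta
print(quantile_loss(outlier_residual))  # linear penalty
```

A single outlier therefore pulls each boosting step much less under these losses than it would under squared error.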

References: 68922113

https://www.jianshu.com/p/169dc01f0589
