Machine Learning Classification Algorithm Application Scenarios

Source: Internet
Author: User
Keywords: machine learning, classification algorithms, logistic regression, random forest

Application scenarios for classification methods: logistic regression, support vector machines, random forests, gradient boosted trees (GBT), and deep learning.


Questions to ask when choosing a classifier:
Feature dimensionality
Is the data linearly separable?
Are the features independent of each other?
Do the features depend linearly on the target variable, and is overfitting a concern?
Speed, accuracy, and memory constraints

Logistic Regression
Application scenarios: features are approximately linear, meaning the log-odds of the target class are roughly a linear function of the features; the data is linearly separable; or non-linear features can be transformed into linear ones (e.g., via binning or polynomial terms).

Advantages:

1. Robust to noise; overfitting can be avoided and feature selection performed by using L1 or L2 regularization (L1 drives the weights of uninformative features to exactly zero)

2. Can be used in big-data scenarios, since it is efficient and training can be distributed using, for example, ADMM (Alternating Direction Method of Multipliers)

3. Outputs can be interpreted as class probabilities
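A minimal sketch of the first and third points, assuming scikit-learn and its synthetic-data helper `make_classification`: L1 regularization zeroes out uninformative feature weights, and `predict_proba` exposes the probabilistic interpretation.

```python
# Sketch (assumes scikit-learn is installed): L1-regularized logistic
# regression on synthetic data with only a few informative features.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# The L1 penalty performs implicit feature selection by driving the
# weights of uninformative features to exactly zero.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)

n_zero = int((clf.coef_ == 0).sum())
print(f"features zeroed out by L1: {n_zero} of {clf.coef_.size}")

# predict_proba gives the probabilistic output mentioned in point 3.
proba = clf.predict_proba(X[:1])
print("class probabilities:", proba)
```

The regularization strength `C=0.1` here is just an illustrative value; in practice it would be chosen by cross-validation.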

Support Vector Machines
In practice, an SVM with a linear kernel is not very different from logistic regression.

Scenarios that call for an SVM: the data is not linearly separable, so an SVM with a non-linear kernel is needed. (Logistic regression can in principle also be kernelized, but in practice the SVM is preferred because its solution is sparse, depending only on the support vectors, which makes kernelized training and prediction much cheaper.) Another scenario is a very high-dimensional feature space; for example, SVMs work well in text classification.

Disadvantages:
SVM training is very time-consuming (kernel SVM training typically scales between quadratically and cubically in the number of samples), so it is not recommended for large training sets, say beyond roughly 10^5 samples, or for industrial-scale data.
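To illustrate the non-linearly-separable case, here is a sketch assuming scikit-learn, using its `make_circles` helper (concentric rings, a textbook non-linearly-separable dataset): an RBF-kernel SVM clearly beats a linear model.

```python
# Sketch (assumes scikit-learn): on concentric-circle data, a linear
# model cannot do better than chance, while an RBF-kernel SVM fits it well.
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_circles(n_samples=400, noise=0.1, factor=0.3, random_state=0)

# Linear decision boundary: no line can separate an inner ring from an outer ring.
linear_acc = LogisticRegression().fit(X, y).score(X, y)

# RBF kernel: implicitly maps the data into a space where it becomes separable.
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

print(f"linear: {linear_acc:.2f}, rbf SVM: {rbf_acc:.2f}")
```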

Tree ensembles
Random Forests and Gradient Boosted Trees
Advantages of tree ensembles compared to logistic regression:
They do not require linear features, or even features that interact linearly. For example, LR has difficulty with categorical features, while tree ensembles, as collections of decision trees, handle such cases easily.
Due to the way the algorithms are constructed (bagging or boosting), they cope well with high-dimensional data and large amounts of training data.
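As a sketch of the non-linearity point, assuming scikit-learn: the XOR target below is a hypothetical example of a feature interaction that no linear model can capture, but that a random forest fits easily.

```python
# Sketch (assumes scikit-learn and NumPy): an XOR-style target is a
# pure feature interaction; a linear model is stuck near chance level,
# while a random forest fits it.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 2))
# Label depends on the *interaction* of the two features, not on either alone.
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)

forest_acc = RandomForestClassifier(n_estimators=100,
                                    random_state=0).fit(X, y).score(X, y)
linear_acc = LogisticRegression().fit(X, y).score(X, y)
print(f"random forest: {forest_acc:.2f}, logistic regression: {linear_acc:.2f}")
```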

RF (Random Forests) vs GBDT (Gradient Boosted Decision Trees)
GBDT will usually perform better, but it is harder to get right.
GBDT has many hyperparameters to tune and overfits more easily, whereas RFs work almost "out of the box"; that is, their default settings are usually close to optimal, which is one reason they are so popular.

Deep Learning
To summarize: start simple to establish a baseline, and only make the model more complicated if you need to:
1. Start with a simple logistic regression to set a baseline
2. Random forests (easy to tune)
3. GBDT
4. A fancier model
5. Deep learning
