Logistic Regression vs Decision Trees vs SVM: Part II


This is the second part of the series. Read the first part here: Logistic Regression vs Decision Trees vs SVM: Part I

In this part, we'll discuss how to choose between logistic regression, decision trees, and support vector machines. The most correct answer, as mentioned in the first part of this two-part article, is still "it depends". We'll continue our effort to shed some light on what it depends on. All three of these techniques have certain properties inherent to their design; we'll elaborate on some of them in order to provide a few pointers for selecting one for your particular business problem.

We'll start with logistic regression, the most prevalent algorithm for solving industry-scale problems, although it has been losing ground to other techniques as more complex algorithms gain in efficiency and ease of implementation.

A very convenient and useful side effect of a logistic regression solution is that it doesn't give you only a discrete output or outright classes. Instead, you get probabilities associated with each observation. You can apply many standard and custom performance metrics to this probability score to pick a cutoff and, in turn, classify the output in the way that best fits your business problem. A very popular application of this property is scorecards in the financial industry, where you can adjust your threshold [cutoff] to get different classification results from the same model. Very few other algorithms provide such scores as a direct result; their outputs are usually discrete, direct classifications. Also, logistic regression is pretty efficient in terms of time and memory requirements. It can be applied to distributed data, and it also has online algorithm implementations to handle large data with fewer resources.
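A minimal sketch of this thresholding, assuming scikit-learn (the synthetic dataset and the 0.3 cutoff are illustrative choices, not from the original article):

    # Fit a logistic regression and classify with a custom probability cutoff
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # predict_proba returns one probability per class; column 1 is P(y = 1)
    scores = model.predict_proba(X_test)[:, 1]

    # Instead of the default 0.5, pick the cutoff your business metric favors
    threshold = 0.3  # illustrative value
    predictions = (scores >= threshold).astype(int)

Moving the threshold trades precision against recall, which is exactly the knob a scorecard exposes.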

In addition to the above, logistic regression is robust to small noise in the data and is not particularly affected by mild cases of multicollinearity. Severe cases of multicollinearity can be handled by implementing logistic regression with L2 regularization, although if a parsimonious model is needed, L2 regularization isn't the best choice, because it keeps all the features in the model.
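As a small sketch of that point, again assuming scikit-learn (the near-collinear pair and C=0.1 are invented for illustration):

    # L2 regularization shrinks, but keeps, the coefficients of collinear features
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=500)
    x2 = x1 + rng.normal(scale=0.01, size=500)  # nearly collinear with x1
    X = np.column_stack([x1, x2])
    y = (x1 + rng.normal(scale=0.5, size=500) > 0).astype(int)

    # penalty="l2" is scikit-learn's default; smaller C means stronger shrinkage
    model = LogisticRegression(penalty="l2", C=0.1).fit(X, y)
    print(model.coef_)  # both features kept, neither coefficient zeroed out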

Where logistic regression starts to falter is when you have a large number of features and a good chunk of missing data. Too many categorical variables are also a problem for logistic regression. Another criticism of logistic regression is that it uses the entire data set to come up with its scores. Although this isn't a problem as such, it can be argued that 'obvious' cases, which lie at the extreme ends of the score range, should not really be a concern when you're trying to come up with a separation curve; it should ideally depend on those boundary cases, some might argue. Also, if some of the features are non-linear, you'll have to rely on transformations, which becomes a hassle as the size of your feature space increases. We have picked a few prominent pros and cons from our discussion to summarize things for logistic regression.

Logistic Regression Pros:
    • Convenient probability scores for observations
    • Efficient implementations available across tools
    • Multicollinearity is not really an issue and can be countered with L2 regularization to an extent
    • Widespread industry comfort with logistic regression solutions [oh, that's important too!]
Logistic Regression Cons:
    • Doesn't perform well when the feature space is too large
    • Doesn't handle a large number of categorical features/variables well
    • Relies on transformations for non-linear features [see the sketch after this list]
    • Relies on the entire data [not a very serious drawback, I'd say]
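To make the transformation point concrete, here is a sketch assuming scikit-learn (the concentric-circles data and the degree-2 expansion are illustrative assumptions):

    # A plain logistic regression cannot separate concentric circles; adding
    # squared and interaction terms makes the boundary linear in the new space
    from sklearn.datasets import make_circles
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    X, y = make_circles(n_samples=500, noise=0.1, factor=0.5, random_state=0)

    model = make_pipeline(PolynomialFeatures(degree=2), LogisticRegression())
    model.fit(X, y)
    print(model.score(X, y))  # close to 1.0 once the quadratic terms exist

Every extra non-linearity means another hand-crafted term like this, which is what becomes a hassle as the feature space grows.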

Let's now discuss decision trees and support vector machines.

Decision trees are inherently indifferent to monotonic transformations and non-linear features [this is different from non-linear correlation among predictors], because they simply cut the feature space into rectangles [or (hyper-)cuboids], which can adjust themselves to any monotonic transformation. Since decision trees are anyway designed to work with discrete intervals or classes of predictors, any number of categorical variables is not really an issue for them. Models obtained from a decision tree are fairly intuitive and easy to explain to business. Probability scores are not a direct result, but you can use the class probabilities assigned to terminal nodes instead. This brings us to the biggest problem associated with decision trees: they are a highly biased class of models. You can build a decision tree model on your training set that outperforms all other algorithms, but it will prove to be a poor predictor on your test set. You'll have to rely heavily on pruning and cross-validation to get a non-over-fitting model with decision trees.
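A minimal sketch of that pruning and cross-validation, assuming scikit-learn and synthetic data (max_depth=4 and ccp_alpha=0.01 are illustrative, untuned values):

    # An unconstrained tree memorizes the training set; pruning trades training
    # accuracy for generalization
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    print(full_tree.score(X, y))                    # ~1.0 on the training data
    print(cross_val_score(full_tree, X, y).mean())  # noticeably lower held out

    pruned = DecisionTreeClassifier(max_depth=4, ccp_alpha=0.01, random_state=0)
    print(cross_val_score(pruned, X, y).mean())

    # Class probabilities come from the terminal nodes rather than as a
    # direct model output
    probs = pruned.fit(X, y).predict_proba(X[:5])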

This problem of over-fitting is overcome to a large extent by using random forests, which are nothing but a very clever extension of decision trees. But random forests take away the easy-to-explain business rules, because now we have thousands of such trees and their majority votes making things complex. Also, by design, decision trees force interactions between variables, which makes them rather inefficient if most of your variables have no, or very weak, interactions. On the other hand, this design also makes them rather less susceptible to multicollinearity. Whew!
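A brief sketch of the ensemble idea, again with scikit-learn on synthetic data (n_estimators=300 is an arbitrary illustrative choice):

    # A random forest predicts by majority vote over many de-correlated trees,
    # which curbs over-fitting but buries the single set of business rules
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
    print(len(forest.estimators_))  # 300 constituent trees, no one rule set
    print(forest.predict(X[:5]))    # the class that most trees voted for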

Summarizing decision trees:

Decision Trees Pros:
    • Intuitive decision rules
    • Can handle non-linear features
    • Take variable interactions into account
Decision Trees Cons:
    • Highly biased towards the training set [random forests to the rescue]
    • No ranking score as a direct result

Now to support vector machines. The best thing about support vector machines is that they rely on boundary cases to build the much-needed separating curve. As we saw earlier, they can handle non-linear decision boundaries. Reliance on boundary cases also enables them to handle missing data for the 'obvious' cases. SVMs can handle large feature spaces, which makes them one of the favorite algorithms for text analysis, which almost always results in a huge number of features, where logistic regression is not a very good choice.
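A sketch of an SVM on a sparse text feature space, assuming scikit-learn (the tiny corpus and its labels are invented for illustration):

    # TF-IDF easily produces thousands of sparse columns on a real corpus;
    # a linear SVM handles such spaces well, relying only on boundary documents
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC

    docs = [
        "great product, fast shipping",
        "terrible, broke after a day",
        "works as advertised",
        "would not recommend",
    ]
    labels = [1, 0, 1, 0]

    vec = TfidfVectorizer()
    X = vec.fit_transform(docs)

    clf = LinearSVC().fit(X, labels)
    print(clf.predict(vec.transform(["fast and great"])))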

The result of SVMs is not as intuitive as decision trees for a layman. With non-linear kernels, SVMs can be very costly to train on huge data. In summary:

SVM Pros:
    • Can handle a large feature space
    • Can handle non-linear feature interactions
    • Do not rely on the entire data
SVM Cons:
    • Not very efficient with a large number of observations
    • It can sometimes be tricky to find the appropriate kernel

I have tried to compile a simple workflow to help you decide which of these three algorithms to use:

    • Always start with logistic regression, if nothing else then to use its performance as a baseline
    • See if decision trees (random forests) provide a significant improvement. Even if you don't end up using the resultant model, you can use the random forest results to remove noisy variables [see the sketch after this list]
    • Go for SVM if you have a large number of features and the number of observations is not a limitation for the available resources and time
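A hedged sketch of that workflow on synthetic data (the 0.01 importance cutoff and every hyperparameter below are illustrative assumptions):

    # Step 1: logistic regression as the baseline; Step 2: random forest, whose
    # importances flag noisy variables; Step 3: SVM on the reduced feature set
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=1000, n_features=30, n_informative=8,
                               random_state=0)

    baseline = cross_val_score(LogisticRegression(max_iter=1000), X, y).mean()

    forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
    keep = forest.feature_importances_ > 0.01  # illustrative cutoff
    rf_score = cross_val_score(forest, X, y).mean()

    svm_score = cross_val_score(SVC(kernel="rbf"), X[:, keep], y).mean()

    print(f"LR baseline={baseline:.3f}  RF={rf_score:.3f}  SVM={svm_score:.3f}")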

At the end of the day, remember that good data beats any algorithm any time. Always see if you can engineer a good feature using your domain knowledge, and try various iterations of your ideas while experimenting with feature creation. Another thing to try, with the efficient computing infrastructure available these days, is ensembles of multiple models. We'll discuss them next, so stay tuned!
