How do I select a classifier? LR, SVM, tree ensembles, or deep learning?
Transferred from: https://www.quora.com/What-are-the-advantages-of-different-classification-algorithms
There are a number of dimensions you can look at to give you a sense of what a reasonable algorithm to start with would be, namely:
- Number of training examples
- Dimensionality of the feature space
- Do I expect the problem to be linearly separable?
- Are the features independent?
- Are the features expected to be linearly related to the target variable? *edit: see my comment on what I mean by this
- Is overfitting expected to be a problem?
- What are the system's requirements in terms of speed/performance/memory usage ...?
- ...
This list may seem a bit daunting because there are many issues that are not straightforward to answer. The good news though is that, as with many problems in life, you can address this question by following the Occam's Razor principle: use the least complicated algorithm that can address your needs and only go for something more complicated if strictly necessary.
Logistic Regression
As a general rule of thumb, I would recommend starting with Logistic Regression. Logistic regression is a pretty well-behaved classification algorithm that can be trained as long as you expect your features to be roughly linear and the problem to be linearly separable. You can do some feature engineering to turn most non-linear features into linear ones pretty easily. It is also pretty robust to noise, and you can avoid overfitting and even do feature selection by using L2 or L1 regularization. Logistic regression can also be used in big data scenarios since it is pretty efficient and can be distributed using, for example, ADMM (see logreg). A final advantage of LR is that the output can be interpreted as a probability. This comes as a nice side effect since you can use it, for example, for ranking instead of classification.
Even in a case where you would not expect Logistic Regression to work 100%, do yourself a favor and run a simple L2-regularized LR to come up with a baseline before you go into using "fancier" approaches.
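To make that concrete, here is a minimal sketch of what such a baseline could look like, assuming a scikit-learn setup; the dataset and the C value are illustrative choices of mine, not something prescribed by the answer.

```python
# Minimal L2-regularized Logistic Regression baseline (illustrative sketch).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# L2 is the scikit-learn default penalty; C is the inverse regularization strength.
baseline = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l2", C=1.0, max_iter=1000),
)
scores = cross_val_score(baseline, X_train, y_train, cv=5)
print(f"5-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

baseline.fit(X_train, y_train)
# predict_proba gives the probability interpretation mentioned above,
# which you can also use for ranking instead of classification.
print(baseline.predict_proba(X_test[:3]))
```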
Ok, so now that you have set your baseline with Logistic Regression, what should your next step be? I would basically recommend two possible directions: (1) SVMs, or (2) tree ensembles. If I knew nothing about your problem, I would definitely go for (2), but I'll start by describing why SVMs might be something worth considering.
Support Vector Machines
Support Vector Machines (SVMs) use a different loss function (hinge) from LR. They are also interpreted differently (maximum-margin). However, in practice, an SVM with a linear kernel is not very different from a Logistic Regression (if you are curious, you can see how Andrew Ng derives SVMs from Logistic Regression in his Coursera Machine Learning course). The main reason you would want to use an SVM instead of a Logistic Regression is because your problem might not be linearly separable. In that case, you will have to use an SVM with a non-linear kernel (e.g. RBF). The truth is that a Logistic Regression can also be used with a different kernel, but at that point you might be better off going for SVMs for practical reasons. Another related reason to use SVMs is if you are in a highly dimensional space. For example, SVMs have been reported to work better for text classification.
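If you want to see the linear vs. non-linear point in practice, here is a small sketch, again assuming scikit-learn; the concentric-circles dataset is my illustrative pick of a problem that is not linearly separable.

```python
# Comparing a linear-kernel SVM against an RBF-kernel SVM on a
# non-linearly-separable problem (illustrative sketch).
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two concentric circles: no straight line can separate the classes.
X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)

for kernel in ("linear", "rbf"):
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{kernel:>6} kernel accuracy: {scores.mean():.3f}")
# The RBF kernel should do much better here, which is exactly the
# "problem might not be linearly separable" case described above.
```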
Unfortunately, the major downside of SVMs is that they can be painfully inefficient to train. So, I would not recommend them for any problem where you have many training examples. I would actually go even further and say that I would not recommend SVMs for most 'industry scale' applications. Anything beyond a toy/lab problem might be better approached with a different algorithm.
Tree Ensembles
This gets me to the third family of algorithms: tree ensembles. This basically covers two distinct algorithms: Random Forests and Gradient Boosted Trees. I'll talk about the differences later, but for now let me treat them as one for the purpose of comparing them to Logistic Regression.
Tree ensembles have different advantages over LR. One main advantage is that they do not expect linear features or even features that interact linearly. Something I didn't mention about LR is that it can hardly handle categorical (binary) features. Tree ensembles, because they are nothing more than a bunch of decision trees combined, can handle this very well. The other main advantage is that, because of how they are constructed (using bagging or boosting), these algorithms handle high-dimensional spaces as well as large numbers of training examples very well.
As for the difference between Random Forests (RF) and Gradient Boosted Decision Trees (GBDT), I won't go into many details, but one easy way to understand it is that GBDTs will usually perform better, but they are harder to get right. More concretely, GBDTs have more hyper-parameters to tune and are also more prone to overfitting. RFs can almost work 'out of the box', and that's one reason why they are very popular.
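As a rough illustration of that trade-off, here is a sketch assuming scikit-learn; the dataset and hyper-parameter settings are my own illustrative picks, not from the original answer.

```python
# RF vs. GBDT: RF with defaults vs. GBDT with a few tuned knobs (sketch).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(
    n_samples=2000, n_features=40, n_informative=10, random_state=0
)

# RF works nearly out of the box with default settings.
rf = RandomForestClassifier(n_estimators=300, random_state=0)

# GBDT exposes more knobs (learning_rate, max_depth, subsample, ...)
# that usually need tuning to realize its advantage without overfitting.
gbdt = GradientBoostingClassifier(
    n_estimators=300, learning_rate=0.05, max_depth=3, random_state=0
)

for name, clf in [("RF", rf), ("GBDT", gbdt)]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f}")
```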
Deep Learning
Last but not least, this answer would not be complete without at least a minor reference to deep learning. I would definitely not recommend this approach as a general-purpose technique for classification. But you have probably heard how well these methods perform in some cases such as image classification. If you have gone through the previous steps and still feel you can squeeze something out of your problem, you might want to use a deep learning approach. The truth is that if you use an open source implementation such as Theano, you can get an idea of how some of these approaches perform on your dataset pretty quickly.
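If you just want a quick feel for a neural approach, here is a tiny sketch; I am using scikit-learn's MLPClassifier as a lightweight stand-in for a full framework like Theano, which is purely my choice for illustration.

```python
# Quick neural-network check (MLPClassifier stands in for a full
# deep learning framework; dataset and layer sizes are illustrative).
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500, random_state=0),
)
print(f"5-fold CV accuracy: {cross_val_score(mlp, X, y, cv=5).mean():.3f}")
```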
Summary
So, recapping, start with something simple like Logistic Regression to set a baseline and only make it more complicated if you need to. At that point, tree ensembles, and in particular Random Forests since they are easy to tune, might be the right way to go. If you feel there is still room for improvement, try GBDT or get even fancier and go for deep learning.
You can also take a look at the Kaggle competitions. If you search for the keyword 'classification' and select those that are completed, you'll get a good sense of what people used to win competitions that might be similar to your problem at hand. At that point you'll probably realize that using an ensemble is always likely to make things better. The only problem with ensembles, of course, is that they require you to maintain all the independent methods working in parallel. That might be your final step to get as fancy as it gets.
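As one simple way to combine independent models, here is a sketch of a soft-voting ensemble in scikit-learn; the particular base models and dataset are my illustrative picks, not something the answer prescribes.

```python
# Combining independent classifiers with soft voting (illustrative sketch).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
    ],
    voting="soft",  # average the predicted class probabilities
)
print(f"Ensemble CV accuracy: {cross_val_score(ensemble, X, y, cv=5).mean():.3f}")
# The cost: every base model must be trained, stored, and served,
# which is the maintenance burden mentioned above.
```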