How to choose a machine learning algorithm
May 7, 2014 · machine learning · smallroof
Original: http://www.52ml.net/15063.html
How do you know which machine learning algorithm to choose for your classification problem? Of course, if you really care about accuracy, your best bet is to test out a couple of different ones (making sure to try different parameters within each algorithm as well), and select the best one by cross-validation. But if you're simply looking for a "good enough" algorithm for your problem, or a place to start, here are some general guidelines I've found to work well over the years.
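The "test a couple and pick by cross-validation" routine can be sketched in plain Python. Everything here is illustrative: the 1-NN classifier, the majority-class baseline, and the toy one-feature dataset are made up, standing in for whatever real models and data you would compare.

```python
import random

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k roughly equal folds (fixed seed for repeatability)."""
    idx = list(range(n))
    random.Random(0).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def one_nn_predict(train, x):
    """1-nearest-neighbour by absolute distance on a single feature."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def majority_predict(train, x):
    """Baseline: always predict the most common training label."""
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)

def cross_val_accuracy(data, predict, k=5):
    """Mean held-out accuracy over k folds."""
    accs = []
    for fold in k_fold_indices(len(data), k):
        test = [data[i] for i in fold]
        train = [d for i, d in enumerate(data) if i not in fold]
        correct = sum(predict(train, x) == y for x, y in test)
        accs.append(correct / len(test))
    return sum(accs) / len(accs)

# Toy 1-D dataset: label is 1 exactly when the feature exceeds 5.
data = [(x, int(x > 5)) for x in range(11)]
scores = {name: cross_val_accuracy(data, fn)
          for name, fn in [("1-NN", one_nn_predict),
                           ("majority", majority_predict)]}
best = max(scores, key=scores.get)
```

On this data the nearest-neighbour model wins the cross-validation comparison, which is all "select the best one by cross-validation" means in practice.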
How large is your training set?
If your training set is small, high bias/low variance classifiers (e.g., Naive Bayes) have an advantage over low bias/high variance classifiers (e.g., kNN), since the latter will overfit. But low bias/high variance classifiers start to win out as your training set grows (they have lower asymptotic error), since high bias classifiers aren't powerful enough to provide accurate models.
You can also think of this as a generative model vs. discriminative model distinction.
Advantages of some particular algorithms
Advantages of Naive Bayes: Super simple, you're just doing a bunch of counts. If the NB conditional independence assumption actually holds, a Naive Bayes classifier will converge quicker than discriminative models like logistic regression, so you need less training data. And even if the NB assumption doesn't hold, a NB classifier still often does a great job in practice. A good bet if you want something fast and easy that performs pretty well. Its main disadvantage is that it can't learn interactions between features (e.g., it can't learn that although you love movies with Brad Pitt and Tom Cruise, you hate movies where they're together).
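To make "just doing a bunch of counts" concrete, here is a minimal Naive Bayes trainer in plain Python. The spam/ham documents and the add-one smoothing choice are illustrative assumptions, not anything from the original post:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """NB training really is counting: class frequencies plus
    per-class word frequencies. docs is a list of (words, label)."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict_nb(model, words):
    """Pick the class maximising log P(class) + sum log P(word | class),
    with add-one smoothing so unseen words don't zero out a class."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [
    (["cheap", "pills", "buy"], "spam"),
    (["cheap", "watches", "buy"], "spam"),
    (["meeting", "tomorrow", "agenda"], "ham"),
    (["lunch", "tomorrow"], "ham"),
]
model = train_nb(docs)
label = predict_nb(model, ["cheap", "buy"])  # classified as "spam"
```

Note that scoring multiplies per-word probabilities independently, which is exactly why NB cannot represent the "love each actor alone, hate them together" interaction.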
Advantages of Logistic Regression: Lots of ways to regularize your model, and you don't have to worry as much about your features being correlated, like you do in Naive Bayes. You also have a nice probabilistic interpretation, unlike decision trees or SVMs, and you can easily update your model to take in new data (using an online gradient descent method), again unlike decision trees or SVMs. Use it if you want a probabilistic framework (e.g., to easily adjust classification thresholds, to say when you're unsure, or to get confidence intervals) or if you expect to receive more training data in the future that you want to be able to quickly incorporate into your model.
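The online-update property amounts to one stochastic gradient step on the log loss per incoming example, with no retraining from scratch. The toy stream, learning rate, and number of passes below are arbitrary choices for the sketch:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_update(w, b, x, y, lr=0.1):
    """One online gradient-descent step on the log loss for a single
    example (x, y in {0, 1}); the model is updated in place of retraining."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    err = p - y  # gradient of the log loss w.r.t. the logit
    w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    b = b - lr * err
    return w, b

# Examples arrive one at a time; the true rule here is y = 1 iff the
# inputs are large (a linearly separable toy stream).
stream = [([0.0, 0.0], 0), ([1.0, 1.0], 1), ([0.9, 0.9], 1), ([0.1, 0.2], 0)]
w, b = [0.0, 0.0], 0.0
for _ in range(200):               # several passes over the stream
    for x, y in stream:
        w, b = sgd_update(w, b, x, y)

p_pos = sigmoid(w[0] * 1.0 + w[1] * 1.0 + b)  # probability that [1, 1] is positive
```

The output is a probability, so adjusting the classification threshold or reporting uncertainty is trivial, which is exactly the "probabilistic framework" advantage in the paragraph above.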
Advantages of Decision Trees: Easy to interpret and explain (for some people — I'm not sure I fall into this camp). They easily handle feature interactions and they're non-parametric, so you don't have to worry about outliers or whether the data is linearly separable (e.g., decision trees easily take care of cases where you have class A at the low end of some feature X, class B in the mid-range of feature X, and A again at the high end). One disadvantage is that they don't support online learning, so you have to rebuild your tree when new examples come in. Another disadvantage is that they easily overfit, but that's where ensemble methods like random forests (or boosted trees) come in. Plus, random forests are often the winner for lots of problems in classification (usually slightly ahead of SVMs, I believe), they're fast and scalable, and you don't have to worry about tuning a bunch of parameters like you do with SVMs, so they seem to be quite popular these days.
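The A / B / A case above can be made concrete with a hand-built depth-2 tree; the thresholds 3 and 7 are invented for illustration. No single linear threshold on X can separate this layout, but two nested splits handle it with no feature engineering:

```python
def tree_predict(x):
    """A depth-2 decision tree on one feature X: two threshold splits
    recover the non-linearly-separable A / B / A class layout."""
    if x < 3.0:
        return "A"   # class A at the low end of X
    if x < 7.0:
        return "B"   # class B in the mid-range
    return "A"       # class A again at the high end
```

The tree is also trivially readable as a pair of if-statements, which is the "easy to interpret" point in a nutshell.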
Advantages of SVMs: High accuracy, nice theoretical guarantees regarding overfitting, and with an appropriate kernel they can work well even if your data isn't linearly separable in the base feature space. Especially popular in text classification problems where very high-dimensional spaces are the norm. Memory-intensive, hard to interpret, and kind of annoying to run and tune, though, so I think random forests are starting to steal the crown.
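The kernel idea can be hinted at with an explicit feature map rather than an implicit kernel: data that no single threshold on x can separate becomes linearly separable after a quadratic map. The map phi and its centre at 5.0 are made up for this sketch; a real SVM does the equivalent implicitly via the kernel trick:

```python
def phi(x):
    """A made-up quadratic feature map: x -> (x, (x - 5)^2)."""
    return (x, (x - 5.0) ** 2)

def linear_in_feature_space(x, threshold=4.0):
    """A linear decision in the mapped space is a non-linear decision
    in the original space: class "A" at both extremes, "B" in the middle."""
    return "A" if phi(x)[1] > threshold else "B"
```

A single threshold on the second mapped coordinate now separates a pattern that was hopeless for a linear classifier on x alone.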
But…
Recall, though, that better data often beats better algorithms, and designing good features goes a long way. And if you have a huge dataset, then whichever classification algorithm you use might not matter so much in terms of classification performance (so choose your algorithm based on speed or ease of use instead).
And to reiterate what I said above, if you really care about accuracy, you should definitely try a bunch of different classifiers and select the best one by cross-validation. Or, to take a lesson from the Netflix Prize (and Middle Earth), just use an ensemble method to choose them all.
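"Choose them all" can be sketched as a simple majority vote over already-trained models. The three 1-D threshold classifiers below are hypothetical stand-ins for whatever trained models you would actually combine:

```python
from collections import Counter

def ensemble_predict(classifiers, x):
    """Majority vote over several trained classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Three hypothetical classifiers that only disagree near the boundary.
clf_a = lambda x: "pos" if x > 4 else "neg"
clf_b = lambda x: "pos" if x > 5 else "neg"
clf_c = lambda x: "pos" if x > 6 else "neg"

label = ensemble_predict([clf_a, clf_b, clf_c], 5.5)  # two of three vote "pos"
```

Where the individual models make different mistakes, the vote smooths them out, which is the intuition behind the Netflix Prize winners' blended ensembles.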
English original: Choosing a machine learning classifier