How to choose the appropriate machine learning algorithm

Tags: machine learning, linear regression, logistic regression, machine learning algorithm, supervised learning

Designing the right solution for a real-world problem is not just an applied math problem. It also involves business needs, regulations, stakeholder concerns, and considerable domain expertise. Combining and balancing these aspects is critical when solving a machine learning problem.

Machine learning is a combination of art and science. No single machine learning algorithm solves every problem, and several factors influence which algorithm you should choose.

Some problems are very specific and require a unique approach; for example, a recommendation system is a common type of machine learning solution that addresses a very specific problem. Other problems are open-ended and call for trial and error. Supervised learning, classification, and regression are very open-ended: they can be used for anomaly detection or to build more general predictive models. Moreover, some of the decisions we make when choosing an algorithm depend more on business considerations than on optimization or the technical aspects of the algorithm. Below, we discuss some of the factors that help narrow down the choice of machine learning algorithms.

Data science process

Before you start looking at different machine learning algorithms, you must have a clear understanding of your data, your problem, and your constraints.

Know your data

When deciding which algorithm to use, you must consider the type and amount of data you have. Some algorithms work with only a small number of samples, while others require a very large number. Some algorithms can only handle certain types of data. For example, naive Bayes works well with categorical inputs and is not at all sensitive to missing data.

You must:

1. Look at summary statistics

Percentiles help identify the range that covers most of the data

Averages and medians describe the central tendency

Correlations can indicate strong relationships

2. Visualize the data (a minimal exploration sketch follows this list)

Box plots help identify outliers

Density plots and histograms show the distribution of the data

Scatter plots can describe bivariate relationships
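As a minimal sketch of this exploration step, the following Python code uses pandas and matplotlib to print summary statistics and correlations and to draw the three kinds of plots above. The DataFrame and its column names are invented purely for illustration.

    # Exploration sketch: summary statistics, correlations, and basic plots.
    # The DataFrame below is fabricated for illustration.
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "age": rng.normal(40, 12, 500),
        "income": rng.lognormal(10, 0.5, 500),
    })

    # Percentiles, averages, and medians in one table.
    print(df.describe())

    # Correlations between numeric variables.
    print(df.corr())

    # Box plot (outliers), histogram (distribution), scatter plot (bivariate).
    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    df.boxplot(column="income", ax=axes[0])
    df["age"].hist(ax=axes[1], bins=30)
    axes[2].scatter(df["age"], df["income"], s=5)
    plt.show()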

Clean your data

1. Handle missing values. Missing data affects some models more than others, and even models that can handle missing data may suffer (missing values in certain variables can lead to poor predictions). A minimal cleaning sketch follows this list.

2. Choose how to handle outliers

Outliers are very common in multidimensional data.

Some models are much less affected by outliers than others. In general, tree-based models are less sensitive to outliers, while regression models, or any other model that relies on an equation, are inevitably affected by them.

Outliers can be the result of bad data collection, or they can be legitimate extreme values.

3. Decide whether the data needs to be aggregated
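As a sketch of these cleaning steps, the following Python code imputes missing values with the column median and flags outliers with a simple interquartile-range rule. Both are illustrative choices, not prescriptions, and the small DataFrame is fabricated.

    # Cleaning sketch: median imputation and IQR-based outlier flagging.
    # The data, the imputation strategy, and the 1.5*IQR threshold are
    # illustrative choices only.
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"x": [1.0, 2.0, np.nan, 4.0, 100.0],
                       "y": [10.0, 11.0, 12.0, np.nan, 13.0]})

    # Handle missing values: impute with the column median (dropping rows or
    # model-based imputation are common alternatives).
    df_filled = df.fillna(df.median(numeric_only=True))

    # Flag outliers in "x"; whether to drop, cap, or keep them depends on
    # whether they are collection errors or true extreme values.
    q1, q3 = df_filled["x"].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = df_filled["x"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    print(df_filled[mask])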

Augment your data

1. Feature engineering is the process of going from raw data to data that is ready for modeling. It serves several purposes:

Make the model easier to interpret (such as binning)

Capture more complex relationships (such as with neural networks)

Reduce data redundancy and dimensions (such as principal component analysis)

Rescale variables (such as standardization or normalization)

2. Different models may have different feature engineering requirements; some models have the necessary feature engineering built in. A short sketch of these transformations follows.
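As a sketch of typical feature engineering transformations, the following Python code applies binning, standardization, and polynomial expansion with scikit-learn. The synthetic data and the parameter choices (number of bins, polynomial degree) are assumptions for illustration.

    # Feature engineering sketch: binning, rescaling, and interaction terms.
    import numpy as np
    from sklearn.preprocessing import (KBinsDiscretizer, PolynomialFeatures,
                                       StandardScaler)

    X = np.random.default_rng(0).normal(size=(200, 2))

    # Binning a continuous variable can make a model easier to interpret.
    binned = KBinsDiscretizer(n_bins=4, encode="ordinal").fit_transform(X[:, [0]])

    # Rescaling (standardization here) puts variables on a comparable scale.
    scaled = StandardScaler().fit_transform(X)

    # Polynomial and interaction terms let simple models capture more
    # complex relationships.
    expanded = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
    print(binned.shape, scaled.shape, expanded.shape)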

Problem classification

Next comes categorizing the problem. This is a two-step process.

1. Categorize by input:

If you have labeled data, it is a supervised learning problem.

If you have unlabeled data and want to find structure, it is an unsupervised learning problem.

If you want to optimize an objective function by interacting with an environment, it is a reinforcement learning problem.

2. Categorize by output:

If the output of the model is a number, it is a regression problem.

If the output of the model is a class, it is a classification problem.

If the output of the model is a set of input groups, it is a clustering problem.

Do you want to detect anomalies? That is anomaly detection.

Know your constraints

What is your data storage capacity? Depending on the storage capacity of your system, you may not be able to store large classification or regression models, or large amounts of data for clustering. This is the case, for example, in embedded systems.

Does the prediction have to be fast? In real-time applications, it is obviously important to make predictions as quickly as possible. In autonomous driving, for example, road signs must be classified as fast as possible to avoid accidents.

Does the learning have to be fast? In some cases, training a model quickly is necessary; sometimes you need to rapidly update the model, on the fly, with a different dataset.

Find available algorithms

Once you have a clear understanding of your situation, the next step is to identify the algorithms that are applicable and practical to implement with the tools at your disposal. Factors that influence model selection include:

Whether the model meets business goals

How much preprocessing is required for the model

Model accuracy

Model interpretability

Model speed: How long does it take to build a model, and how long does it take for the model to make predictions?

Model scalability

An important criterion affecting algorithm selection is the complexity of the model. In general, more complex models:

Need more features to learn and predict (such as using two features vs using 10 features to predict a target)

Need more complex feature engineering (such as using polynomial terms, interactions, or principal components)

Need more computational overhead (such as 1 decision tree vs. random forest of 100 decision trees)

In addition, the same machine learning algorithm can be made more complex by the number of parameters or the choice of certain hyperparameters. For example:

A regression model may have more features or polynomial terms and interaction terms.

A decision tree may have a larger or smaller depth.

Making the same algorithm more complex increases the chance of overfitting.


Common machine learning algorithms

Linear regression

This is probably the simplest machine learning algorithm. Use a regression algorithm when the value you want to estimate is continuous (a classification algorithm, by contrast, outputs a class). So whenever you need to predict a future value of an ongoing process, you can use a regression algorithm. Linear regression is, however, unstable when features are redundant, that is, in the presence of multicollinearity. A minimal sketch follows the use cases below.

Several use cases for linear regression:

Estimating travel time from one place to another

Forecast sales of specific products in the next month

The effect of blood alcohol content on physical coordination

Forecast monthly gift card sales and improve annual revenue expectations
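As a minimal sketch of regression on a continuous target, the following scikit-learn code fits a linear model on synthetic data; real use cases like the ones above would supply their own features and target.

    # Linear regression sketch on synthetic data.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=300, n_features=5, noise=10.0,
                           random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = LinearRegression().fit(X_train, y_train)
    print("R^2 on held-out data:", model.score(X_test, y_test))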

Logistic regression

Logistic regression performs binary classification, so its output is binary. The algorithm applies a nonlinear function (the sigmoid) to a linear combination of the features, so it is, in effect, a very small instance of a neural network.

Logistic regression provides many ways to regularize your model, and you do not have to worry as much about correlated features as you do with naive Bayes. Compared to decision trees and support vector machines, logistic regression also gives you an excellent probabilistic interpretation, and it is easy to update the model with new data. Use logistic regression if you want a probabilistic framework, or if you expect to fold more training data into the model quickly. The algorithm can also help you understand the contributing factors behind a prediction; it is not a black-box method. A minimal sketch follows the use cases below.

Several use cases for logistic regression:

Forecast customer churn

Credit score and fraud detection

Measure the effectiveness of marketing campaigns
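As a minimal sketch, the following scikit-learn code fits a regularized logistic regression on synthetic data and inspects the probabilities and coefficients that make the model interpretable; the data is fabricated.

    # Logistic regression sketch: binary classification with probabilities
    # and readable coefficients. Data is synthetic.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=4, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # C controls regularization strength (smaller C = stronger regularization).
    clf = LogisticRegression(C=1.0).fit(X_train, y_train)
    print("accuracy:", clf.score(X_test, y_test))
    print("class probabilities:", clf.predict_proba(X_test[:2]))
    print("feature contributions:", clf.coef_)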

Decision tree

A single decision tree is rarely used on its own, but combined with many others it forms very effective algorithms such as random forests and gradient boosted trees.

Decision trees easily handle feature interactions, and they are non-parametric, so you do not have to worry about outliers or whether the data is linearly separable. One disadvantage is that they do not support online learning, so the tree must be rebuilt when new samples arrive. Another disadvantage is that they overfit easily, but ensemble methods such as random forests and boosted trees overcome this shortcoming. Decision trees can also take up a lot of memory (the more features, the deeper and larger the tree tends to be).

Decision trees are an excellent tool for choosing between several courses of action, for example (a small sketch follows this list):

Investment decision

Customer churn

Bank loan defaulter

Build vs. buy decisions

Sales lead qualification
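As a small sketch, the following scikit-learn code fits a depth-limited decision tree on the iris dataset and prints the learned rules, which is what makes trees useful for choosing between courses of action; the depth cap is an illustrative guard against overfitting.

    # Decision tree sketch: the learned rules are directly readable.
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

    # max_depth limits complexity, mitigating the overfitting noted above.
    print(export_text(tree))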

K-means

Sometimes no labels are known, and your goal is to assign labels based on the characteristics of the objects. This is called a clustering task. One use case for clustering algorithms is grouping a large set of users according to certain common attributes.

If your problem statement includes a question such as "how is this organized?", or asks you to group items or concentrate on particular groups, you should use a clustering algorithm.

The biggest drawback of K-means is that you must know in advance how many clusters your data may contain, so it can take many trials to "guess" the best value of K. The sketch below shows one common heuristic.
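One way to "guess" K is the elbow heuristic: fit K-means for several candidate values and watch where the within-cluster variance (inertia) stops improving sharply. The sketch below does this on synthetic blobs; the range of candidate K values is an arbitrary choice.

    # K-means sketch: scan candidate K values and inspect inertia (elbow).
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

    for k in range(2, 8):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        print(k, round(km.inertia_, 1))  # look for where the drop flattens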

Principal Component Analysis (PCA)

PCA reduces dimensionality. Sometimes the data has a very wide range of features that may be highly correlated with each other, and models overfit easily on large amounts of such data. This is where you can use PCA.

A key to the popularity of PCA is that, in addition to a low-dimensional representation of the samples, it provides a synchronized low-dimensional representation of the variables. Together, these representations make it possible to visually find the variables that characterize a group of samples. A minimal sketch follows.
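As that sketch, the following scikit-learn code standardizes the iris features (PCA is scale-sensitive), projects them onto two components, and reports how much variance those components retain; the choice of two components is illustrative.

    # PCA sketch: reduce four correlated features to two components.
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    X, _ = load_iris(return_X_y=True)
    X_scaled = StandardScaler().fit_transform(X)  # PCA is scale-sensitive

    pca = PCA(n_components=2).fit(X_scaled)
    X_2d = pca.transform(X_scaled)
    print("variance retained:", pca.explained_variance_ratio_.sum())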

Support vector machine (SVM)

SVM is a supervised learning method widely used for pattern recognition and classification problems (in its basic form, it assumes there are only two classes of data).

The advantages of SVM are high accuracy and good theoretical guarantees against overfitting; with an appropriate kernel, it can work well even when the data is not linearly separable in the base feature space. SVMs are especially popular for text classification problems, where very high-dimensional spaces are the norm. The disadvantages are that SVMs are memory-intensive, hard to interpret, and difficult to tune. A text-classification sketch follows the examples below.

Several real-world applications of SVM:

Detecting patients with common diseases such as diabetes

Handwritten text recognition

Text Classification - News reports by topic

Stock price forecast
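As a sketch of SVM in the high-dimensional text setting, the following scikit-learn pipeline turns documents into TF-IDF vectors and fits a linear SVM. The tiny corpus and its labels are fabricated, and a real classifier would need far more data.

    # SVM sketch for text classification; corpus and labels are made up.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    texts = ["stocks fell sharply today", "the team won the final match",
             "markets rally on earnings", "player scores twice in derby"]
    labels = ["finance", "sports", "finance", "sports"]

    clf = make_pipeline(TfidfVectorizer(), LinearSVC())
    clf.fit(texts, labels)
    print(clf.predict(["shares climb after earnings report"]))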

Naive Bayes

This is a classification method based on Bayes' theorem that is easy to build and particularly useful for very large datasets. Besides being simple, Naive Bayes is known to outperform even some highly sophisticated classification methods. It is also a good choice when CPU and memory resources are the limiting factors.

Naive Bayes is extremely simple; it amounts to doing a bunch of counts. If the Naive Bayes assumption of conditional independence actually holds, the classifier converges faster than discriminative models such as logistic regression, so you need less training data. Even when the assumption does not hold, the classifier still often performs well in practice. If you need something fast and simple that performs reasonably well, Naive Bayes is a good choice. Its main disadvantage is that it cannot learn interactions between features. A spam-filtering sketch follows the examples below.

Several real-world applications of Naive Bayes:

Sentiment analysis and text classification

Recommendation systems such as Netflix and Amazon

Mark email as spam or non-spam

Face recognition
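As a sketch of the spam use case, the following scikit-learn pipeline counts word occurrences and fits a multinomial Naive Bayes classifier, which is essentially the "bunch of counts" described above. The example messages are made up, and a real filter would need far more data.

    # Naive Bayes spam sketch; messages and labels are fabricated.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    messages = ["win a free prize now", "meeting moved to 3pm",
                "free money claim now", "lunch tomorrow?"]
    labels = ["spam", "ham", "spam", "ham"]

    clf = make_pipeline(CountVectorizer(), MultinomialNB())
    clf.fit(messages, labels)
    print(clf.predict(["claim your free prize"]))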

Random forest

A random forest is an ensemble of decision trees. It handles regression and classification problems with large datasets, and it helps identify the most important variables among a large set of inputs. Random forests scale to data of almost any dimensionality, and their performance is usually acceptable. (Genetic algorithms also scale to any dimension and to problems where little is known about the data itself; the microbial genetic algorithm is the cheapest and simplest to implement.) However, random forests can be slow to learn (depending on the parameterization), and you cannot iteratively improve the generated models. A sketch follows the examples below.

Several real-world applications of random forests:

Predicting high-risk patients

Predict manufacturing parts failure

Forecast loan defaulter
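As that sketch, the following scikit-learn code fits a 100-tree random forest on synthetic data and prints the variable importances highlighted above; the dataset parameters are arbitrary.

    # Random forest sketch with variable importances. Data is synthetic.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=10,
                               n_informative=3, random_state=0)

    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    for i, imp in enumerate(forest.feature_importances_):
        print(f"feature {i}: importance {imp:.3f}")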

Neural networks

A neural network stores what it learns in the connection weights between its neurons. The weights are adjusted as the network learns, data point after data point. Once all the weights are trained, the network can predict the class or the quantity for a new input data point. Neural networks can fit extremely complex models and can be treated as black boxes, without laborious, hard-to-foresee feature engineering before training. Combined with "deep" architectures, even harder-to-design models become learnable, opening up new possibilities: object recognition, for instance, has recently made great strides using deep neural networks. Applied to unsupervised tasks such as feature extraction, deep learning can also extract features from raw images or speech with little human intervention.

On the other hand, neural networks are very difficult to interpret, troublesome to parameterize, and very compute- and memory-intensive. A small sketch follows.
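As a toy sketch, the following code trains a small multilayer perceptron with scikit-learn's MLPClassifier on synthetic data; serious deep learning would use a dedicated framework, and the layer sizes here are arbitrary.

    # Neural network sketch: a small MLP on synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    clf = make_pipeline(
        StandardScaler(),  # neural networks are sensitive to feature scale
        MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500,
                      random_state=0),
    )
    clf.fit(X, y)
    print("training accuracy:", clf.score(X, y))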

Scikit-learn quick reference

Scikit-learn provides a detailed, clear, and easy-to-follow flowchart to help you choose the right algorithm.

[Figure: the scikit-learn algorithm cheat-sheet flowchart]

In general, you can use the flowchart above to narrow down the choice of algorithms, but at first it is hard to know which one will work best. The best approach is to screen iteratively: feed your data into the machine learning algorithms that look promising, run them in parallel or sequentially, and then evaluate their performance and pick the best one, as in the sketch below.
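The following sketch illustrates that screening loop with scikit-learn: several candidate algorithms are cross-validated on the same synthetic data and their mean scores compared. The candidate list and the dataset are arbitrary illustrations.

    # Iterative screening sketch: compare candidate algorithms by
    # cross-validated score on the same data.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    candidates = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "naive Bayes": GaussianNB(),
        "SVM": SVC(),
        "random forest": RandomForestClassifier(random_state=0),
    }
    for name, model in candidates.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name}: mean accuracy {scores.mean():.3f}")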
