In machine learning, there is a theorem known as the "No Free Lunch" theorem. In short, it states that no single algorithm works best for every problem, which is especially relevant to supervised learning, i.e., predictive modeling.
For example, you cannot say that neural networks are always better than decision trees, or vice versa. Many factors are at play, such as the size and structure of your dataset.
Therefore, you should try a number of different algorithms on your specific problem, and set aside a held-out "test set" of data to evaluate their performance and select the winner.
Of course, the algorithms you try must fit your problem; that is, you must choose the right machine learning task. For example, if you need to clean your house, you might use a vacuum cleaner, a broom, or a mop, but you would not grab a shovel and start digging.
The big principle
There is, however, a general principle that underlies all supervised machine learning algorithms for predictive modeling.
A machine learning algorithm is described as learning a target function f that best maps input variables X to an output variable Y: Y = f(X).
This is a general learning task in which we want to predict Y given new examples of the input variables X. We do not know what the function f looks like or what form it takes. If we did, we would use it directly and would not need to learn it from data using machine learning algorithms.
The most common type of machine learning is learning the mapping Y = f(X) to predict Y for new X. This is called predictive modeling or predictive analytics, and the goal is to make the most accurate predictions possible.
For beginners who want to learn the basics of machine learning, this article outlines the top 10 machine learning algorithms used by data scientists.
1. Linear regression
Linear regression may be one of the most well-known and understandable algorithms in statistics and machine learning.
Predictive modeling is primarily concerned with minimizing model error, or making the most accurate predictions possible, at the expense of explainability. We borrow and reuse algorithms from many different fields, including statistics, and use them toward this end.
The representation of linear regression is an equation that describes the line best fitting the relationship between the input variable x and the output variable y, found by determining specific weightings for the input variables, called coefficients (B).
Linear regression
Example: y = B0 + B1 * x
We will predict y based on input x, and the goal of the linear regression learning algorithm is to find the values of coefficients B0 and B1.
Different techniques can be used to learn a linear regression model from data, such as the linear algebra solution for ordinary least squares and gradient descent optimization.
Linear regression has been around for more than 200 years and has been extensively studied. Some good rules of thumb when using this technique are to remove variables that are very similar (correlated) and, where possible, to remove noise from the data. It is a fast, simple technique and a good first algorithm to try.
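As a rough illustration (not part of the original article), here is a minimal sketch that fits the model y = B0 + B1 * x by ordinary least squares using NumPy; the toy data and its true coefficients are made up for the example.

```python
import numpy as np

# Toy data: y is roughly 1 + 2*x plus noise (made up for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=50)

# Closed-form ordinary least squares for y = B0 + B1 * x.
B1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
B0 = y.mean() - B1 * x.mean()

print(f"B0 = {B0:.3f}, B1 = {B1:.3f}")
print("prediction for x = 4:", B0 + B1 * 4)
```

The same coefficients could also be found iteratively with gradient descent, which scales better when there are many input variables.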
2. Logistic regression
Logistic regression is another technique borrowed by machine learning from statistics. It is the go-to method for binary (two-class) classification problems.
Logistic regression is similar to linear regression in that the goal is to find the coefficient values that weight each input variable. Unlike linear regression, however, the output prediction is transformed using a nonlinear function called the logistic function.
The logistic function looks like a big S and squashes any value into the range 0 to 1. This is useful because we can apply a rule to the output of the logistic function to snap values to 0 and 1 (for example, if the output is less than 0.5, predict 0; otherwise predict 1) and obtain a class prediction.
Logistic regression
Because of the way the model is learned, the predictions made by logistic regression can also be used as the probability that a given data instance belongs to class 0 or class 1. This is useful for problems where you need to give more rationale for a prediction.
Like linear regression, logistic regression works better when you remove attributes that are unrelated to the output variable as well as attributes that are very similar (correlated) to one another. It is a fast model to learn and is very effective on binary classification problems.
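As an illustration (not from the original article), here is a minimal sketch using scikit-learn; the one-feature toy data is invented for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# The logistic (sigmoid) function squashes any value into the range (0, 1).
def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary-classification data (made up): class 1 tends to have larger x.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(2, 1, 50), rng.normal(5, 1, 50)]).reshape(-1, 1)
y = np.array([0] * 50 + [1] * 50)

model = LogisticRegression().fit(x, y)
print("P(class 1 | x = 3.5):", model.predict_proba([[3.5]])[0, 1])
print("predicted class for x = 3.5:", model.predict([[3.5]])[0])
```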
3. Linear Discriminant Analysis (LDA)
Logistic regression is a classification algorithm that is traditionally limited to two-class classification problems. If you have more than two classes, Linear Discriminant Analysis (LDA) is the preferred linear classification technique.
The representation of LDA is quite simple and straightforward. It consists of statistical properties of your data, calculated for each class. For a single input variable, this includes: 1) the mean value for each class, and 2) the variance calculated across all classes.
Linear discriminant Analysis
Predictions are made by computing a discriminant value for each class and predicting the class with the largest value. The technique assumes that the data has a Gaussian (bell-shaped) distribution, so it is a good idea to remove outliers from your data beforehand. LDA is a simple and powerful method for classification predictive modeling problems.
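The sketch below (not from the original article) illustrates the single-input case described above: per-class means, a shared variance, and class priors combine into a discriminant score for each class. The data and parameters are invented for the example.

```python
import numpy as np

# Toy one-dimensional, two-class data (made up for illustration).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(2.0, 1.0, 50), rng.normal(5.0, 1.0, 50)])
y = np.array([0] * 50 + [1] * 50)

# Per-class means, a pooled (shared) variance, and class priors.
means = np.array([x[y == k].mean() for k in (0, 1)])
var = np.mean([x[y == k].var(ddof=1) for k in (0, 1)])  # classes are equal-sized here
priors = np.array([np.mean(y == k) for k in (0, 1)])

def predict(x_new):
    # Discriminant score per class; predict the class with the largest score.
    scores = x_new * means / var - means**2 / (2 * var) + np.log(priors)
    return int(np.argmax(scores))

print("prediction for x = 3.2:", predict(3.2))
```

For real problems, scikit-learn's LinearDiscriminantAnalysis handles multiple input variables and more than two classes.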
4. Classification and regression tree
Decision trees are an important type of algorithm for predictive modeling in machine learning.
The representation of a decision tree model is a binary tree. This is the same binary tree from algorithms and data structures, nothing special. Each node represents a single input variable x and a split point on that variable (assuming the variable is numeric).
Decision Tree
The leaf nodes of the tree contain an output variable y that is used to make a prediction. Predictions are made by walking the splits of the tree until a leaf node is reached and outputting the class value at that leaf.
Decision trees are fast to learn and very fast at making predictions. They can also solve a wide range of problems and do not require any special preparation of the data.
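As a quick illustration (not part of the original article), here is a minimal scikit-learn sketch on the built-in iris dataset; the depth limit is an arbitrary choice for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small example on the built-in iris dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Each internal node of the fitted tree holds one input variable and a split point;
# a prediction walks those splits until it reaches a leaf.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
```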
5. Naive Bayes
Naive Bayes is a simple but powerful predictive modeling algorithm.
The model consists of two types of probabilities, both of which can be calculated directly from the training data: 1) the probability of each class, and 2) the conditional probability of each class given each value of x. Once calculated, the probability model can be used to make predictions for new data using Bayes' theorem. When your data is real-valued, it is common to assume a Gaussian distribution (bell curve) so that these probabilities are easy to estimate.
Bayes theorem
Naive Bayes is termed naive because it assumes that each input variable is independent. This is a strong assumption that is unrealistic for real data, yet the technique is very effective on a large range of complex problems.
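Here is a minimal sketch (not from the original article) using scikit-learn's Gaussian Naive Bayes on the built-in iris dataset.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# GaussianNB estimates a prior P(class) and a per-feature Gaussian P(x_i | class),
# then combines them with Bayes' theorem under the "naive" independence assumption.
nb = GaussianNB().fit(X_train, y_train)
print("test accuracy:", nb.score(X_test, y_test))
print("class probabilities for the first test row:", nb.predict_proba(X_test[:1])[0])
```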
6. K Nearest Neighbor algorithm
The K-Nearest Neighbors (KNN) algorithm is very simple and very effective. The model representation for KNN is the entire training dataset. Simple, right?
Predictions for a new data point are made by searching the entire training set for the K most similar instances (the neighbors) and summarizing the output variable of those K instances. For regression problems this might be the mean output variable; for classification problems it might be the mode (most common) class value.
The trick is in how to determine the similarity between data instances. If your attributes all have the same units of measurement (for example, all in inches), the simplest technique is Euclidean distance, a number you can calculate directly from the differences between each input variable.
K Nearest Neighbor algorithm
KNN can require a lot of memory or space to store all of the data, but computation (or learning) is performed only when a prediction is needed. You can also update and curate your training instances over time to keep your predictions accurate.
The idea of distance or closeness can break down in very high dimensions (many input variables), which can hurt the performance of the algorithm on your problem. This is called the curse of dimensionality. It suggests that you should only use those input variables that are most relevant to predicting the output variable.
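As a short illustration (not part of the original article), here is a scikit-learn sketch; by default the classifier uses Euclidean distance, written out explicitly at the end for one pair of rows.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# "Training" just stores the data; a prediction finds the K nearest training
# rows by Euclidean distance and takes a majority vote over their classes.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))

# The same Euclidean distance, computed by hand for one pair of rows.
print("distance between the first two training rows:",
      np.sqrt(np.sum((X_train[0] - X_train[1]) ** 2)))
```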
7. Learning Vector Quantization
One downside of the K-Nearest Neighbors algorithm is that you need to hold on to the entire training dataset. The Learning Vector Quantization algorithm (LVQ) is an artificial neural network algorithm that lets you choose how many training instances to keep and learns exactly what those instances should look like.
Learning Vector Quantization
The representation of LVQ is a collection of codebook vectors. They are selected randomly at the start and are gradually adapted, over many iterations of the learning algorithm, to best summarize the training dataset. After learning, the codebook vectors can be used to make predictions much like the K-Nearest Neighbors algorithm. The most similar neighbor (the best matching codebook vector) is found by calculating the distance between each codebook vector and the new data instance. The class value (or the real value, in regression) of the best matching unit is then returned as the prediction. Best results are achieved if you rescale your data to the same range, for example between 0 and 1.
If you find that KNN achieves good results on your data set, try using LVQ to reduce the memory requirements for storing the entire training data set.
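Since LVQ is not in scikit-learn, here is a rough NumPy sketch of one common variant (LVQ1) to make the idea concrete; it is not from the original article, and the codebook count, learning rate, and toy data are all arbitrary choices.

```python
import numpy as np

def train_lvq1(X, y, n_codebooks_per_class=2, lrate=0.3, epochs=20, seed=0):
    """Minimal LVQ1: codebook vectors start as random training rows and are
    nudged toward same-class points and away from different-class points."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    idx = np.concatenate([rng.choice(np.where(y == c)[0], n_codebooks_per_class,
                                     replace=False) for c in classes])
    codebooks, cb_labels = X[idx].astype(float).copy(), y[idx].copy()

    for epoch in range(epochs):
        rate = lrate * (1.0 - epoch / epochs)  # decaying learning rate
        for xi, yi in zip(X, y):
            best = np.argmin(np.linalg.norm(codebooks - xi, axis=1))  # best matching unit
            direction = 1.0 if cb_labels[best] == yi else -1.0
            codebooks[best] += direction * rate * (xi - codebooks[best])
    return codebooks, cb_labels

def predict_lvq(codebooks, cb_labels, x_new):
    # Predict with the label of the nearest codebook vector.
    return cb_labels[np.argmin(np.linalg.norm(codebooks - x_new, axis=1))]

# Toy usage on made-up 2-D data.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
cb, cb_y = train_lvq1(X, y)
print("prediction for [2.5, 2.5]:", predict_lvq(cb, cb_y, np.array([2.5, 2.5])))
```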
8. Support Vector Machine (SVM)
Support Vector machines are probably one of the most popular and widely discussed machine learning algorithms.
A hyperplane is a line that splits the input variable space. In SVM, a hyperplane is selected to best separate the points in the input variable space by their class (class 0 or class 1). In two dimensions you can visualize this as a line, and we assume that all of the input points can be completely separated by this line. The SVM learning algorithm finds the coefficients that result in the best separation of the classes by the hyperplane.
Support Vector Machine
The distance between the hyperplane and the closest data points is called the margin. The best, or optimal, hyperplane separating the two classes is the one with the largest margin. Only these closest points are relevant to defining the hyperplane and building the classifier. They are called the support vectors, because they support or define the hyperplane. In practice, an optimization algorithm is used to find the coefficient values that maximize the margin.
SVM is probably one of the most powerful out-of-the-box classifiers and is well worth trying on your dataset.
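A minimal scikit-learn sketch (not from the original article) with a linear kernel; the synthetic dataset and the value of C are arbitrary choices for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data (made up for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A linear-kernel SVM finds the hyperplane with the largest margin between the
# two classes; only the support vectors end up defining that hyperplane.
svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
print("support vectors per class:", svm.n_support_)
print("test accuracy:", svm.score(X_test, y_test))
```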
9. Bagging and random forests
Random forest is one of the most popular and most powerful machine learning algorithms. It is a type of ensemble machine learning algorithm called Bootstrap Aggregation, or bagging.
The bootstrap is a powerful statistical method for estimating a quantity, such as a mean, from a data sample. You draw many samples of your data, calculate the mean of each, and then average all of those means to get a better estimate of the true mean.
Bagging uses the same approach, but for estimating entire statistical models, most commonly decision trees. Multiple samples of the training data are drawn and a model is built for each one. When a prediction is needed for new data, each model makes a prediction and the predictions are averaged to give a better estimate of the true output value.
Random Forest
Random forest is a tweak on this approach, in which the decision trees are built so that, rather than selecting optimal split points, suboptimal splits are made by introducing randomness.
As a result, the models created for the different data samples differ more from one another than they otherwise would; they differ in unique ways, yet each is still accurate. Combining their predictions gives a better estimate of the true underlying output value.
If you get good results with a high-variance algorithm such as a decision tree, you can often get even better results by bagging that algorithm.
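Here is a short scikit-learn sketch (not from the original article) comparing plain bagging of decision trees with a random forest; the synthetic data and the number of trees are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Bagging: many trees, each fit on a bootstrap sample of the training data
# (the default base estimator of BaggingClassifier is a decision tree).
bagging = BaggingClassifier(n_estimators=100, random_state=0)
# Random forest: bagging plus a random subset of features considered at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print("bagging CV accuracy:", cross_val_score(bagging, X, y, cv=5).mean())
print("random forest CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```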
10. Boosting and AdaBoost
Boosting is an ensemble technique that attempts to create a strong classifier from a number of weak classifiers. This is done by building a model from the training data, then creating a second model that attempts to correct the errors of the first. Models are added until the training set is predicted perfectly or a maximum number of models has been added.
AdaBoost was the first truly successful boosting algorithm developed for binary classification, and it is the best starting point for understanding boosting. Modern boosting methods build on AdaBoost, most notably stochastic gradient boosting machines.
AdaBoost
AdaBoost is used with short decision trees. After the first tree is created, its performance on each training instance is used to weight how much attention the next tree should pay to each instance. Training data that is hard to predict is given more weight, while instances that are easy to predict are given less weight. Models are created one after another, each updating the weights on the training instances, which affects the learning performed by the next tree in the sequence. After all of the trees are built, predictions are made for new data, and each tree's vote is weighted by how accurate that tree was on the training data.
Because so much attention is paid to correcting the algorithm's mistakes, it is important to have clean data with outliers removed.
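A minimal scikit-learn sketch (not from the original article); by default, AdaBoostClassifier boosts one-level decision trees (stumps), and the number of estimators and the synthetic data here are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# AdaBoost fits a sequence of short trees; after each one, hard-to-predict
# training rows get more weight so the next tree focuses on them.
boost = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
print("test accuracy:", boost.score(X_test, y_test))
```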
Summary
Faced with the wide variety of machine learning algorithms, beginners often ask, "Which algorithm should I use?" The answer depends on many factors, including: (1) the size, quality, and nature of the data; (2) the available computation time; (3) the urgency of the task; and (4) what you want to do with the data.
Even an experienced data scientist cannot tell which algorithm will perform best before trying different algorithms. There are many other machine learning algorithms, but the ones discussed here are the most popular. If you are new to machine learning, they are a good place to start.