When we want to solve a machine learning problem, we need to choose an appropriate algorithm. The "no free lunch" theorem in machine learning says that no single model works best on every problem: the performance of different algorithms depends on the size and structure of the data. So, short of traditional trial-and-error experiments, there is no definitive way to prove a choice is right.
However, each machine learning algorithm has its own strengths and weaknesses, and these give us a reference point when choosing. No algorithm is universal, but each has characteristics that let us select it quickly and tune its parameters. Next, we look at several common machine learning algorithms for regression problems and, based on their strengths and weaknesses, summarize when each can be used.
Linear and Polynomial Regression
The simplest case is single-variable linear regression, a model that represents the relationship between a single input independent variable and a dependent variable. Multivariate linear regression is more common: the model captures the relationship between multiple input independent variables and the output dependent variable. The model remains linear because the output is a linear combination of the input variables.
An intermediate case is polynomial regression. Here the model is a nonlinear combination of the feature variables, i.e., features can be raised to powers or passed through functions such as sin and cos. This requires some knowledge of how the data relates to the output, and the regression model can be trained with stochastic gradient descent.
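To make this concrete, here is a minimal sketch of polynomial regression trained with stochastic gradient descent, using scikit-learn; the library choice, synthetic data, and degree-3 expansion are illustrative assumptions, not part of the original discussion:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic 1-D data with a cubic relationship plus noise
# (made up for this sketch).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=1.0, size=200)

# Degree-3 polynomial features keep the model linear in its parameters
# while fitting a nonlinear curve; scaling helps stochastic gradient
# descent converge.
model = make_pipeline(
    PolynomialFeatures(degree=3),
    StandardScaler(),
    SGDRegressor(max_iter=1000, tol=1e-3, random_state=0),
)
model.fit(X, y)
print(model.predict([[2.0]]))  # prediction for a new input
```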
Advantages:
- Fast to build, and useful when the model structure is simple and the data set is small.
- Linear regression is easy to understand, which is valuable for business decisions.
Disadvantages:
- For nonlinear data, polynomial regression can be difficult to design, because it requires prior knowledge of the relationship between the feature variables and the structure of the data.
- In short, these models do not perform well on complex data.
Neural Networks
A neural network consists of many interconnected nodes called neurons. The input feature variables pass through these neurons as a linear combination of multiple variables; the value each feature variable is multiplied by is called a weight. A nonlinearity is then applied to this linear combination, which allows the neural network to model complex nonlinear relationships. A neural network can have multiple layers, with the output of one layer passed as input to the next. At the output, a nonlinearity is usually not applied. Neural networks are trained with stochastic gradient descent and the backpropagation algorithm.
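As an illustration, here is a small feed-forward network for regression using scikit-learn's MLPRegressor; the architecture, data, and hyperparameters are assumptions made for this sketch:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic data with a nonlinear target (made up for this sketch).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2

# Two hidden layers with ReLU nonlinearities; the output layer is
# linear (no nonlinearity at the output). Training uses a stochastic
# gradient-based optimizer (Adam) with backpropagation.
net = MLPRegressor(
    hidden_layer_sizes=(32, 32),
    activation="relu",
    solver="adam",
    learning_rate_init=1e-3,
    max_iter=2000,
    random_state=0,
)
net.fit(X, y)
print(net.predict([[1.0, -0.5]]))  # prediction for a new input
```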
Advantages:
- Because neural networks have many layers (and hence many parameters) and are nonlinear, they can efficiently model complex nonlinear relationships.
- We usually don't have to worry much about the structure of the data; neural networks are flexible enough to learn almost any feature-to-output relationship.
- Studies have shown that simply giving a neural network more training data, whether new data or augmentation of the original data, improves its performance.
Disadvantages:
- Because of their complexity, these models are not easy to interpret.
- They can be difficult to train, demanding significant computational power, careful tuning, and a well-set learning rate.
- They need a lot of data to reach high performance, and on small data sets they generally perform worse than other machine learning algorithms.
Regression Trees and Random Forests
Starting with the basics: a decision tree is an intuitive model in which the decision maker makes a choice at each node to traverse the "tree." Tree induction takes a set of training samples as input, determines which attribute is best to split on, splits the data on that attribute, and repeats the process until all training samples are classified. When building the tree, the goal is to split so that the resulting child nodes are as pure as possible. Purity is measured by information gain: in practice, this compares the entropy, i.e., the amount of information needed to classify a sample, in the current data set before and after a candidate split.
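As a worked illustration of information gain, the snippet below computes the entropy reduction for one hypothetical binary split (the label counts are made up):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# A parent node with 10 samples and one candidate split
# (hypothetical label counts).
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
left = np.array([0, 0, 0, 0, 1])   # 5 samples go left
right = np.array([1, 1, 1, 1, 1])  # 5 samples go right

# Information gain = parent entropy minus the size-weighted average
# of the child entropies.
weighted_children = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(parent)
gain = entropy(parent) - weighted_children
print(f"information gain: {gain:.3f} bits")  # ~0.61 bits
```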
A random forest is simply an ensemble of decision trees: the input vector is passed through multiple decision trees. For regression, the output values of all trees are averaged; for classification, a majority vote among the trees determines the final class.
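Here is a minimal random forest regression sketch, again with scikit-learn; it also checks that the forest's prediction is the average of the individual trees' outputs (the data and hyperparameters are illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (made up for this sketch).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 2))
y = np.sin(X[:, 0]) * X[:, 1] + rng.normal(scale=0.1, size=300)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)

# For regression, the forest's prediction is the mean of the
# individual trees' predictions.
x_new = np.array([[1.0, 2.0]])
per_tree = np.array([tree.predict(x_new)[0] for tree in forest.estimators_])
print(forest.predict(x_new)[0], per_tree.mean())  # the two values match
```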
Advantages:
- Very good at learning complex, highly nonlinear relationships. They usually achieve very high performance, better than polynomial regression.
- Easy to use and interpret. Although the final trained model can learn many complex relationships, the decision boundaries built up during training are easy to understand.
Disadvantages:
- Because of how decision trees are trained, they are prone to overfitting. A fully grown decision tree can be very complex and contain many unnecessary structures, although this can sometimes be alleviated by "pruning" or by using a larger random forest ensemble.
- Using a larger random forest achieves better results, but it is also slower and requires more memory.
This is a summary of the advantages and disadvantages of the three algorithms. I hope you find it useful!