Summary:
1. Frequently Asked Questions
1.1 What are bias and variance?
1.2 Why does overfitting occur, and how can it be prevented or overcome?
2. Model Selection
3. Feature Selection
4. Feature Engineering and data preprocessing
Content:
1. Frequently Asked Questions
1.1 What are bias and variance?
The generalization error can be decomposed into squared bias, plus variance, plus noise. Bias measures how far the learner's expected prediction deviates from the true result, and characterizes the fitting ability of the learning algorithm. Variance measures how much the learned model changes when a training set of the same size is perturbed, and characterizes the effect of data disturbance. Noise expresses the lower bound on the expected generalization error achievable by any learning algorithm on the current task, and characterizes the intrinsic difficulty of the problem itself. In general, as training proceeds the bias decreases while the variance increases, so the generalization error typically has a minimum somewhere in between. Large bias with small variance is usually called underfitting; small bias with large variance is called overfitting.
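The decomposition above can be checked numerically. The sketch below is a hypothetical Monte Carlo experiment (the true function, noise level, and sample sizes are all made up for illustration): many training sets are drawn, a least-squares line is fit to each, and the predictions at one test point are split into squared bias plus variance.

```python
import random

random.seed(0)

def true_f(x):
    return x * x  # hypothetical true relationship

def fit_line(xs, ys):
    # ordinary least squares for y = a*x + b
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

x0, sigma = 0.5, 0.1           # evaluation point and noise level
preds = []
for _ in range(2000):          # many training sets of the same size
    xs = [random.uniform(0, 1) for _ in range(20)]
    ys = [true_f(x) + random.gauss(0, sigma) for x in xs]
    a, b = fit_line(xs, ys)
    preds.append(a * x0 + b)

mean_pred = sum(preds) / len(preds)
bias2 = (mean_pred - true_f(x0)) ** 2
variance = sum((p - mean_pred) ** 2 for p in preds) / len(preds)
# expected squared error against the noise-free target at x0
mse = sum((p - true_f(x0)) ** 2 for p in preds) / len(preds)
print(bias2, variance, mse)  # mse == bias2 + variance (up to rounding)
```

Adding the irreducible noise term sigma squared to bias2 + variance would give the expected error against noisy targets, completing the decomposition.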
1.2 Why does overfitting occur, and how can it be prevented or overcome?
In machine learning, the learner's error on the training set is called the training error (or empirical error), and its error on new samples is called the generalization error. Obviously we want a learner with a small generalization error, but since new samples are not known in advance, in practice we minimize the empirical error. However, when a learner fits the training samples too well, it may mistake idiosyncrasies of the training samples for general properties of the underlying distribution, which degrades generalization performance; this is called overfitting. Underfitting, by contrast, means the learner has not yet captured the general properties of the training samples and still has a large error on the training set.
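The gap between training error and generalization error can be made concrete with a hypothetical example (the linear ground truth, noise level, and sample counts below are invented for illustration): a degree-9 Lagrange interpolant memorizes every noisy training point, so its training error is zero, yet it generalizes worse than a simple least-squares line.

```python
import random

random.seed(1)

def truth(x):
    return x  # hypothetical linear ground truth

# ten noisy training points
train_x = [i / 9 for i in range(10)]
train_y = [truth(x) + random.gauss(0, 0.3) for x in train_x]

def interp(x):
    # degree-9 Lagrange interpolant: passes through every training point
    total = 0.0
    for i, (xi, yi) in enumerate(zip(train_x, train_y)):
        term = yi
        for j, xj in enumerate(train_x):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def line_fit(xs, ys):
    # ordinary least squares for y = a*x + b (low-complexity model)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

a, b = line_fit(train_x, train_y)

def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# fresh samples from the same distribution
test_x = [random.uniform(0, 1) for _ in range(500)]
test_y = [truth(x) + random.gauss(0, 0.3) for x in test_x]

train_interp = mse(interp, train_x, train_y)        # zero: it memorized the data
test_interp = mse(interp, test_x, test_y)           # large: noise treated as signal
test_line = mse(lambda x: a * x + b, test_x, test_y)
print(train_interp, test_interp, test_line)
```

The interpolant has taken the noise in the training samples to be a "general property", which is exactly the overfitting described above.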
Underfitting: underfitting is generally easier to fix, for example by increasing model complexity (adding branches in a decision tree, training a neural network for more epochs, etc.), adding features (through combination, generalization, or correlation), or reducing the regularization coefficient.
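The "increase model complexity" remedy can be seen in a tiny hypothetical setup (the linear trend and noise level are invented for illustration): a constant model is too simple to capture a linear trend, and one step up in complexity, a least-squares line, already drops the training error sharply.

```python
import random

random.seed(2)

# noisy samples of a linear trend
xs = [i / 19 for i in range(20)]
ys = [2 * x + 1 + random.gauss(0, 0.1) for x in xs]

def mse(pred, xs, ys):
    return sum((pred(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# underfit model: a single constant (the mean), too simple for the trend
c = sum(ys) / len(ys)

# richer model: least-squares line, one step up in complexity
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
    / sum((x - mx) ** 2 for x in xs)
b = my - a * mx

mse_const = mse(lambda x: c, xs, ys)
mse_line = mse(lambda x: a * x + b, xs, ys)
print(mse_const, mse_line)
```

Since the constant model is a special case of the line (slope zero), the line's training error can never be worse; here it is far better because the data really does have a trend.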
Overfitting: common remedies include re-cleaning the data (when the overfitting is caused by impure data), increasing the sample size, reducing model complexity, using prior knowledge (L1/L2 regularization), cross-validation, early stopping, and so on.
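Among the remedies above, L2 regularization can be sketched directly. The example below is a minimal pure-Python ridge regression (the quadratic ground truth, the degree-6 feature map, and the penalty strength are all hypothetical choices for illustration): the L2 penalty lambda * ||w||^2 shrinks the weight vector, trading a slightly worse fit on the training set for a less extreme model.

```python
import random

random.seed(3)

def truth(x):
    return 1 + 2 * x - x * x  # hypothetical quadratic ground truth

def features(x, degree):
    return [x ** k for k in range(degree + 1)]

def solve(A, b):
    # Gaussian elimination with partial pivoting
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def ridge(xs, ys, degree, lam):
    # minimize ||Xw - y||^2 + lam*||w||^2  =>  (X'X + lam*I) w = X'y
    X = [features(x, degree) for x in xs]
    n = degree + 1
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(n)]
    return solve(XtX, Xty)

def mse(w, xs, ys):
    return sum((sum(wk * x ** k for k, wk in enumerate(w)) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)

train_x = [i / 11 for i in range(12)]
train_y = [truth(x) + random.gauss(0, 0.3) for x in train_x]

w_free = ridge(train_x, train_y, degree=6, lam=0.0)   # no penalty
w_l2 = ridge(train_x, train_y, degree=6, lam=1e-2)    # L2 penalty shrinks weights

norm = lambda w: sum(v * v for v in w)
print(norm(w_free), norm(w_l2))
print(mse(w_free, train_x, train_y), mse(w_l2, train_x, train_y))
```

The penalized solution always has a smaller (or equal) weight norm and a slightly larger training error; that is the prior knowledge (prefer small weights) that combats overfitting. L1 regularization works the same way but penalizes the sum of absolute weights, which additionally pushes weights to exactly zero.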
Further reading: machine learning theory, the bias-variance tradeoff.