Diagnostic methods for bias, variance, and learning curves:
When evaluating a hypothesis function, we usually split the whole dataset 6:2:2 (60% training set, 20% cross-validation set, 20% test set), used respectively for fitting the hypothesis, model selection, and prediction.
The model selection method is:
1. Train 10 models using the training set
2. Compute the cross-validation error (the value of the cost function) on the cross-validation set for each of the 10 models
3. Select the model with the lowest cost function value
4. Use the model selected in step 3 to compute the generalization error (the value of the cost function) on the test set; a minimal sketch of these four steps follows this list.
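For concreteness, here is a minimal sketch of the 60/20/20 split and the four model-selection steps, assuming Python with NumPy and scikit-learn; the toy data, the choice of polynomial degrees 1 to 10, and all variable names are illustrative, not taken from the course materials.

```python
# Sketch of the 60/20/20 split and the 4-step model selection:
# fit 10 polynomial models on the training set, pick the one with the
# lowest cross-validation error, then report its test-set error.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.5, size=300)   # toy quadratic data

# 60% train, 20% cross-validation, 20% test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_cv, X_test, y_cv, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

models, cv_errors = {}, {}
for d in range(1, 11):                                      # step 1: train 10 models
    m = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
    m.fit(X_train, y_train)
    models[d] = m
    cv_errors[d] = mean_squared_error(y_cv, m.predict(X_cv))  # step 2: CV error

best_d = min(cv_errors, key=cv_errors.get)                  # step 3: lowest CV error
test_error = mean_squared_error(y_test, models[best_d].predict(X_test))  # step 4
print(f"selected degree: {best_d}, test error: {test_error:.3f}")
```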
When you run a learning algorithm and it does not perform well, there are usually two possibilities: either the bias is high, or the variance is high. In other words, the model is either underfitting or overfitting. Being able to judge which of these two situations you are in, or whether both apply, is a very effective indicator for choosing the most productive ways to improve the algorithm.
(ii) Bias, variance, and the learning curve
1. The polynomial degree d
As in the earlier example, with a quadratic curve fit the errors on the training set and the cross-validation set may both be small. But if you fit a straight line, no matter how hard the algorithm works to reduce the cost function, the error remains large; in this case we say the polynomial degree d is too small, causing high bias (underfitting). Similarly, when fitting with a degree-10 curve, the curve can pass through every sample point and the corresponding cost (training error) is 0, but on the cross-validation set the fit turns out to be very poor; in this case we say the polynomial degree d is too large, causing high variance (overfitting). The relationship between the polynomial degree d and the training-set and cross-validation-set errors is therefore as shown below.
We usually aid the analysis by plotting the training-set and cross-validation-set cost-function errors against the polynomial degree on the same chart:
In the following diagram, your model selection is directly related to the final fit result:
| Fit | Diagnosis |
| --- | --- |
| Underfitting | High bias |
| Just right | Both bias and variance are small |
| Overfitting | High variance |
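A quick, hedged sketch of how such a chart could be produced with scikit-learn and matplotlib (the toy data and degree range are assumptions, not the course's figure):

```python
# Sketch: plot training-set and cross-validation-set error against polynomial
# degree d. Low d -> both errors high (high bias); high d -> training error
# near 0 but CV error large (high variance).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=60)      # toy data

X_train, X_cv, y_train, y_cv = train_test_split(X, y, test_size=0.4, random_state=1)

degrees = range(1, 11)
train_err, cv_err = [], []
for d in degrees:
    m = make_pipeline(PolynomialFeatures(degree=d), LinearRegression()).fit(X_train, y_train)
    train_err.append(mean_squared_error(y_train, m.predict(X_train)))
    cv_err.append(mean_squared_error(y_cv, m.predict(X_cv)))

plt.plot(degrees, train_err, marker="o", label="training error")
plt.plot(degrees, cv_err, marker="o", label="cross-validation error")
plt.xlabel("polynomial degree d")
plt.ylabel("error (MSE)")
plt.legend()
plt.show()
```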
2. The regularization parameter λ
The larger the regularization parameter λ, the heavier the penalty on θ; as θ is driven toward 0, the hypothesis degenerates into a horizontal line, giving underfitting and high bias. The smaller λ is, the weaker the regularization, leading to overfitting and high variance. When training a model we usually apply some regularization to prevent overfitting, but the regularization can be too strong or too weak; that is, choosing the value of λ involves the same kind of trade-off we just faced when choosing the polynomial degree.
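To see this effect numerically, here is a tiny sketch using scikit-learn's Ridge, whose alpha parameter plays the role of λ; the data and the degree-8 features are made up for illustration.

```python
# Sketch: larger regularization strength shrinks the fitted coefficients toward 0.
# Ridge's `alpha` parameter plays the role of λ here; the data are illustrative.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(50, 1))
y = 1.5 * X[:, 0] - 2.0 * X[:, 0] ** 3 + rng.normal(scale=0.1, size=50)
X_poly = PolynomialFeatures(degree=8, include_bias=False).fit_transform(X)

for lam in (0.001, 0.1, 10.0, 1000.0):
    theta = Ridge(alpha=lam).fit(X_poly, y).coef_
    print(f"lambda={lam:>8}: max |theta| = {np.abs(theta).max():.4f}")
```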
We choose a series of λ values to test, typically values between 0 and 10 that roughly double from one to the next (for example 0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10, which is 12 values in total). We again divide the data into a training set, a cross-validation set, and a test set.
The method for selecting λ is:
1. Use the training set to train 12 models with different degrees of regularization
2. Compute the cross-validation error on the cross-validation set for each of the 12 models
3. Select the model with the least cross-validation error
4. Use the model chosen in step 3 to compute the generalization error on the test set. We can also plot the training-set and cross-validation-set cost-function errors against λ on one chart; a sketch of these steps follows.
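A hedged sketch of these four steps, again with Ridge's alpha standing in for λ and a made-up dataset; the 12 candidate values follow the doubling sequence above.

```python
# Sketch of steps 1-4 for choosing λ: train one regularized model per candidate
# λ, pick the λ with the smallest cross-validation error, report its test error.
# Ridge's `alpha` stands in for λ; data, degree, and split are illustrative.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

lambdas = [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10]

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=3)
X_cv, X_test, y_cv, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=3)

models, cv_errors = {}, {}
for lam in lambdas:                              # step 1: train 12 models
    m = make_pipeline(PolynomialFeatures(degree=8, include_bias=False), Ridge(alpha=lam))
    m.fit(X_train, y_train)
    models[lam] = m
    cv_errors[lam] = mean_squared_error(y_cv, m.predict(X_cv))   # step 2

best_lam = min(cv_errors, key=cv_errors.get)     # step 3: smallest CV error
test_error = mean_squared_error(y_test, models[best_lam].predict(X_test))  # step 4
print(f"selected lambda: {best_lam}, test error: {test_error:.3f}")
```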
3. The sample size m and the learning curve
The learning curve shows the relationship between the sample size and the training-set and cross-validation-set errors. It splits into two cases: high bias and high variance (underfitting and overfitting).
① High bias (underfitting):
As the learning curve for this case shows, both errors remain large as the sample size grows; that is, increasing m does little to improve the algorithm. In other words, adding more data to the training set is not necessarily helpful in the case of high bias / underfitting.
② High variance (overfitting): suppose we use a model with a very high polynomial degree and very little regularization. When the cross-validation error is much larger than the training error, adding more data to the training set improves the model. In other words, in the case of high variance / overfitting, adding more data to the training set is likely to improve the algorithm; see the learning-curve sketch below.
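Here is a minimal sketch of such a learning curve using scikit-learn's learning_curve helper; the toy data and the high-degree, weakly regularized model are assumptions chosen to mimic the high-variance case.

```python
# Sketch: learning curve (training error and cross-validation error as a
# function of training-set size m). A high-degree, weakly regularized model
# typically shows a large gap between the two curves (high variance), which
# shrinks as m grows. Data and model settings are illustrative.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)

model = make_pipeline(PolynomialFeatures(degree=10), Ridge(alpha=1e-4))
sizes, train_scores, cv_scores = learning_curve(
    model, X, y, train_sizes=np.linspace(0.1, 1.0, 8),
    cv=5, scoring="neg_mean_squared_error")

plt.plot(sizes, -train_scores.mean(axis=1), marker="o", label="training error")
plt.plot(sizes, -cv_scores.mean(axis=1), marker="o", label="cross-validation error")
plt.xlabel("training-set size m")
plt.ylabel("error (MSE)")
plt.legend()
plt.show()
```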
Finally, a summary. There are six possible next steps; let's look at the circumstances under which each should be chosen:
1. Get more training examples -- addresses high variance
2. Try a smaller set of features -- addresses high variance
3. Try getting additional features -- addresses high bias
4. Try adding polynomial features -- addresses high bias
5. Try decreasing the regularization parameter λ -- addresses high bias
6. Try increasing the regularization parameter λ -- addresses high variance
OK, that's all. Feel free to get in touch and discuss: QQ 1584877347