Drawing learning curves lets us check whether a learning algorithm is working properly and gives us clues about how to improve it. We often use learning curves to determine whether an algorithm has a bias problem, a variance problem, or both.
Learning Curves
A learning curve is a plot of J_train(θ) and J_CV(θ) against the training set size m. Suppose we fit our training data with a quadratic function.
When the training set has only one example, we can fit it perfectly, so J_train(θ) = 0; with two examples we can still fit perfectly, so again J_train(θ) = 0. As the number of training examples increases, J_train(θ) grows.
When the training set is very small, the hypothesis generalizes poorly, so J_CV(θ) is very large. As the number of training examples increases, the generalization ability (the ability to adapt to new samples) improves and J_CV(θ) decreases.
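As a concrete illustration, here is a minimal Python sketch of drawing such a learning curve on synthetic data, assuming a quadratic hypothesis fit with scikit-learn (the data and variable names are made up for illustration):

```python
# Minimal learning-curve sketch: train on the first m examples,
# measure J_train(θ) on those m examples and J_CV(θ) on a held-out set.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(scale=1.0, size=200)

# Split into a training pool and a fixed cross-validation set.
X_train, y_train = X[:150], y[:150]
X_cv, y_cv = X[150:], y[150:]

poly = PolynomialFeatures(degree=2)  # quadratic hypothesis
sizes = list(range(2, len(X_train) + 1, 5))
j_train, j_cv = [], []

for m in sizes:
    model = LinearRegression().fit(poly.fit_transform(X_train[:m]), y_train[:m])
    # J_train(θ): error on the m examples actually used for fitting
    j_train.append(mean_squared_error(
        y_train[:m], model.predict(poly.transform(X_train[:m]))))
    # J_CV(θ): error on the held-out cross-validation set
    j_cv.append(mean_squared_error(y_cv, model.predict(poly.transform(X_cv))))

plt.plot(sizes, j_train, label="J_train(θ)")
plt.plot(sizes, j_cv, label="J_CV(θ)")
plt.xlabel("training set size m")
plt.ylabel("error")
plt.legend()
plt.show()
```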
Learning curves with high bias--adding training data is useless
When the hypothesis has high bias, for example when we use a straight line to fit the data: with 5 sample points we get a certain line, and after increasing to 10 sample points the line is essentially the same. Adding more samples does not make the line fit the data any better, so for a high-bias algorithm, adding training data does not help.
In the high-bias case, when there are few sample points, J_train(θ) is small; as the number of sample points grows, the hypothesis cannot fit so many samples (it underfits), and J_train(θ) gets larger.
In the high-bias case, when there are few sample points, J_CV(θ) is very large (because the hypothesis generalizes poorly from so few samples). As the number of sample points grows, J_CV(θ) decreases until it flattens out at a relatively large value, beyond which more data no longer changes our hypothesis.
In the high-bias case, J_train(θ) and J_CV(θ) converge to a similar, high error value as the sample size grows.
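The following sketch (illustrative, not from the original) reproduces this high-bias pattern by fitting a straight line to data that is actually quadratic:

```python
# High-bias sketch: a degree-1 (straight line) fit on quadratic data.
# Both errors plateau near the same high value as m grows.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=300)
y = 0.5 * X ** 2 + X + rng.normal(scale=0.5, size=300)
X_train, y_train, X_cv, y_cv = X[:200], y[:200], X[200:], y[200:]

for m in (5, 20, 80, 200):
    # Least-squares straight-line fit on the first m examples.
    theta = np.polyfit(X_train[:m], y_train[:m], deg=1)
    # Halved mean squared error, matching the usual J(θ) = (1/2m)Σ(...)²
    j_train = np.mean((np.polyval(theta, X_train[:m]) - y_train[:m]) ** 2) / 2
    j_cv = np.mean((np.polyval(theta, X_cv) - y_cv) ** 2) / 2
    print(f"m={m:3d}  J_train={j_train:.2f}  J_CV={j_cv:.2f}")
# Expected pattern: J_train rises, J_CV falls, and both flatten out
# near the same high error -- adding more data no longer helps.
```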
Learning curves with high variance--adding training data is helpful
When our algorithm is in the high-variance case, suppose the hypothesis is a very high-degree polynomial (say, including an x^100 term) and the value of λ is very small; then our hypothesis suffers from high variance.
With only 5 training examples, our hypothesis can fit them well; that is, when the training set size is small, J_train(θ) is small. As the training set size increases, the hypothesis can no longer fit every point exactly, so J_train(θ) rises, but it stays relatively small.
With only 5 training examples we overfit, so J_CV(θ) is very large. As the number of samples grows, our generalization ability improves and J_CV(θ) decreases, but a large gap remains between it and J_train(θ) (J_CV(θ) >> J_train(θ), the signature of overfitting). If at this point we enlarge m, i.e., the training set size, J_train(θ) goes up and J_CV(θ) comes down, as shown, and the gap narrows. Therefore, adding training data is helpful.
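Here is a minimal sketch of the high-variance pattern, using a degree-10 polynomial with a tiny regularization strength as a numerically stable stand-in for the x^100 example (all names and values are illustrative):

```python
# High-variance sketch: high-degree polynomial, very small λ.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(300, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(scale=1.0, size=300)
X_train, y_train, X_cv, y_cv = X[:200], y[:200], X[200:], y[200:]

for m in (5, 20, 80, 200):
    model = make_pipeline(PolynomialFeatures(degree=10, include_bias=False),
                          StandardScaler(),
                          Ridge(alpha=1e-8))  # λ very small
    model.fit(X_train[:m], y_train[:m])
    j_train = mean_squared_error(y_train[:m], model.predict(X_train[:m]))
    j_cv = mean_squared_error(y_cv, model.predict(X_cv))
    print(f"m={m:3d}  J_train={j_train:.2f}  J_CV={j_cv:.2f}")
# Expected pattern: J_train stays small, J_CV starts out large, and a wide
# gap J_CV >> J_train remains -- a gap that narrows as m grows.
```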
The learning curves in both cases are idealized; in practice the curves can look somewhat different (there may be noise and jitter), but the basic shape is similar and helps us see whether our learning algorithm has high bias, high variance, or both. So when we want to improve the performance of a learning algorithm, we usually draw its learning curves, which let us see whether we have a bias problem or a variance problem.
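If you want to turn the reading of the curves into code, a hypothetical helper like the one below (the thresholds are illustrative, not from the original) captures the two rules of thumb described above:

```python
# Hypothetical diagnosis helper: compare the final values of the two curves.
def diagnose(j_train, j_cv, gap_factor=2.0, high_error=1.0):
    """j_train, j_cv: errors at the largest training set size.
    gap_factor and high_error are illustrative thresholds."""
    if j_cv > gap_factor * j_train:
        return "high variance: J_CV >> J_train -- more data may help"
    if j_train > high_error:
        return "high bias: both errors plateau high -- more data won't help"
    return "looks fine"

print(diagnose(j_train=0.05, j_cv=0.90))   # -> high variance
print(diagnose(j_train=1.40, j_cv=1.55))   # -> high bias
```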