This section is about overfitting; after listening to this lecture, my understanding of overfitting is deeper than before.
Overfitting is first introduced: its symptom is that E_in is very small while E_out is very large.
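To make this symptom concrete, here is a small sketch of my own (not from the lecture): fitting a degree-9 polynomial to 10 noisy samples of a sine target drives E_in to nearly zero while E_out stays large, whereas a degree-2 fit keeps the two much closer. The target function, noise level, and sample sizes here are all illustrative assumptions.

```python
# Minimal illustration of the overfitting symptom: E_in small, E_out large.
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    return np.sin(np.pi * x)              # hypothetical target function

# small, noisy training set
x_train = rng.uniform(-1, 1, 10)
y_train = target(x_train) + 0.2 * rng.normal(size=10)

# large test set to estimate E_out
x_test = rng.uniform(-1, 1, 10000)
y_test = target(x_test) + 0.2 * rng.normal(size=10000)

for degree in (2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)       # least-squares polynomial fit
    e_in = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    e_out = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: E_in = {e_in:.4f}, E_out = {e_out:.4f}")
```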
The causes of overfitting are then analyzed one by one. The first is model complexity: the more complex the model, the more likely it is to overfit, and the underlying reason given is that the number of samples is too small. This is the case where the data contain noise.
So what if there's no noise?
The following two figures show the effect of each parameter:
It can be seen that both noise and model complexity affect overfitting. Note in particular the lower-left region of the right figure: even without stochastic noise, a target that is too complex for the model to capture effectively produces noise of its own. This is called deterministic noise (the harder of the two to understand), in contrast to stochastic noise. The causes of overfitting can be summarized as four:
too little data N, high stochastic noise, high deterministic noise, and excessive VC dimension (model complexity).
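A rough sketch of the kind of experiment behind those figures (the exact setup below is my assumption, not the lecture's): generate data from a degree-Q_f target with noise level sigma, fit a degree-2 and a degree-10 polynomial, and measure the overfit as E_out(g10) - E_out(g2). Positive values mean the more complex model overfits relative to the simpler one, and the measure grows when N shrinks, sigma grows, or Q_f grows.

```python
# Sketch of an overfit-measure experiment: E_out(g10) - E_out(g2).
import numpy as np

rng = np.random.default_rng(1)

def run_trial(n, sigma, q_f, n_test=2000):
    # random target polynomial of degree q_f
    f = np.poly1d(rng.normal(size=q_f + 1))

    x = rng.uniform(-1, 1, n)
    y = f(x) + sigma * rng.normal(size=n)   # noisy training labels

    x_test = rng.uniform(-1, 1, n_test)
    y_test = f(x_test)                      # compare against the noiseless target

    g2 = np.poly1d(np.polyfit(x, y, 2))     # simple hypothesis set H2
    g10 = np.poly1d(np.polyfit(x, y, 10))   # complex hypothesis set H10

    e_out_2 = np.mean((g2(x_test) - y_test) ** 2)
    e_out_10 = np.mean((g10(x_test) - y_test) ** 2)
    return e_out_10 - e_out_2

# small N and high noise -> large positive overfit measure on average
overfits = [run_trial(n=15, sigma=0.5, q_f=10) for _ in range(200)]
print("mean overfit measure:", np.mean(overfits))
```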
Finally, methods for dealing with overfitting are presented: data cleaning/pruning, data hinting, regularization, and validation. A driving analogy is used to illustrate the role of each method; the latter two are the topics of the next two lectures.
Data cleaning/pruning corrects or removes erroneous (e.g., mislabeled) sample points. The processing itself is simple, but such points are usually not easy to find.
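One possible heuristic for finding such points (my own hedged sketch, not a method given in the lecture) is to flag a training point whose label disagrees with the majority label of its k nearest neighbours, then correct or drop it:

```python
# Sketch of a nearest-neighbour data-cleaning heuristic.
# Assumes X is a 2-D array of features and y contains labels in {-1, +1}.
import numpy as np

def suspicious_points(X, y, k=3):
    """Return indices of points whose label disagrees with their k-NN majority."""
    flagged = []
    for i in range(len(X)):
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf                       # exclude the point itself
        neighbours = np.argsort(dists)[:k]
        majority = np.sign(np.sum(y[neighbours]))
        if majority != 0 and majority != y[i]:  # disagreement (ignore ties)
            flagged.append(i)
    return flagged
```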
Data hinting increases the effective number of samples by generating virtual examples.
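A minimal sketch of data hinting (illustrative only; the function and the jitter scheme are my assumptions): create virtual samples by adding small perturbations to the inputs while keeping the original labels. For images, small shifts or rotations would play the same role.

```python
# Sketch of data hinting: augment a dataset with jittered virtual samples.
import numpy as np

def hint_samples(X, y, n_copies=3, jitter=0.05, seed=0):
    """Return (X, y) augmented with n_copies slightly perturbed versions of each sample."""
    rng = np.random.default_rng(seed)
    X_aug, y_aug = [X], [y]
    for _ in range(n_copies):
        X_aug.append(X + jitter * rng.normal(size=X.shape))  # perturbed inputs
        y_aug.append(y)                                      # same labels
    return np.vstack(X_aug), np.concatenate(y_aug)
```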
Coursera Machine Learning course note: Hazard of Overfitting