Overfitting and Underfitting
Sticking with the linear regression example, the model is $f(x, w) = w_1 x_1 + w_2 x_2 + \dots + w_n x_n$.
We want the weights $w^* = \arg\min_w \sum_{i=1}^{n} \left(y_i - f(x_i, w)\right)^2$, i.e. the weights that minimize the squared-error loss.
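As a minimal sketch of this least-squares fit (the data, shapes, and noise level below are illustrative assumptions, not from the text), NumPy's `lstsq` recovers the minimizing weights directly:

```python
import numpy as np

# Synthetic illustrative data: 100 samples, 3 features (assumed, purely for demo)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # design matrix, rows are samples
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)   # targets with a little noise

# Least-squares solution: the w minimizing sum_i (y_i - f(x_i, w))^2
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # should be close to true_w
```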
But the order of the model also matters: a first-order model is just a straight line, while a very high-order model is a contorted curve that passes exactly through every data point. The former cannot fit the data at all, and the latter fits it too perfectly; both are dangerous. A figure from the PRML book illustrates this:
The true function is a sine curve, and that is what we want the model to learn. A low-order model cannot capture the curve at all, while a high-order model is so complex that it fails on new, equally valid points in the test set. These two failure modes are underfitting and overfitting, respectively.
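A rough numerical version of that experiment (a sketch, not PRML's exact setup: the sample size, noise level, and degrees tried here are my own choices) shows both failure modes by fitting polynomials of different degree to noisy sine samples:

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = rng.uniform(0, 1, size=10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=10)
x_test = rng.uniform(0, 1, size=100)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.2, size=100)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)    # least-squares polynomial fit
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")

# Degree 1 underfits (both errors high); degree 9 typically overfits
# (near-zero train error, much larger test error); degree 3 balances both.
```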
Model capacity (complexity) and model generalization
Model capacity: in the linear regression case, a model that is too simple obviously cannot fit the data well. Conversely, a model that is very complex may "remember" the training set perfectly yet collapse on the test set, just as someone who memorized an example by rote fails when the same problem is phrased differently. In other words: as model capacity increases, training error and test error both fall at first, but increasing capacity too far makes the test error rise again.

Model generalization: generalization means our model also works in scenarios it has not seen. A model that appears to learn very well but cannot generalize has not really learned anything.

Universal approximation theorem: given enough neurons, a single-hidden-layer fully connected network can approximate essentially any function arbitrarily well. But expressive power by itself is meaningless: being able to express a function is not the same as learning it. Our ultimate goal is a model that learns and generalizes well, not one that merely expresses, which would be no different from rote-memorizing the examples.
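As a loose illustration of that expressive power (a sketch, not the theorem itself: it fixes random hidden weights and solves only the output layer by least squares, a random-features shortcut, with widths and scales chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 200)[:, None]
y = np.sin(2 * np.pi * x).ravel()

# Single hidden layer with fixed random weights; only the output layer
# is fitted, by least squares. Widths and init scales are arbitrary choices.
for hidden in (3, 10, 100):
    W1 = rng.normal(scale=10.0, size=(1, hidden))   # scale spreads tanh units over [0, 1]
    b1 = rng.uniform(-5, 5, size=hidden)
    H = np.tanh(x @ W1 + b1)                        # hidden activations, shape (200, hidden)
    w2, *_ = np.linalg.lstsq(H, y, rcond=None)      # output-layer weights
    mse = np.mean((H @ w2 - y) ** 2)
    print(f"{hidden:4d} hidden units: MSE {mse:.5f}")

# The error shrinks as the width grows: expressive power increases with the
# number of neurons. But a tiny error on one dataset says nothing about
# generalization, which is exactly the point made above.
```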