Overfitting is a difficult problem in neural network training. The conventional solution is to increase the number of training samples, but collecting training samples is often difficult, and as the number of samples grows, the cost of learning grows as well. Another, relatively simple way to reduce overfitting is regularization.
There are several ways to apply regularization:
L2 regularization
Modify the cost function (C) to:
C = -\frac{1}{n}\sum_{xj}\left[y_j \ln a_j^L + (1-y_j)\ln(1-a_j^L)\right] + \frac{\lambda}{2n}\sum_w w^2
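To make the effect of the extra term concrete, here is a minimal NumPy sketch (not code from the referenced book) of one gradient-descent update with the L2 penalty included. The function name l2_update and the parameters eta (learning rate), lmbda (regularization strength), and n (training-set size) are illustrative choices, not names from the source.

```python
import numpy as np

def l2_update(w, grad_w, eta, lmbda, n):
    """One gradient step on a weight matrix with an L2 penalty.

    The penalty adds (lmbda / n) * w to dC/dw, so every update also
    shrinks ("decays") the weights toward zero.
    """
    return w - eta * (grad_w + (lmbda / n) * w)

# Example: a 3x2 weight matrix and a dummy gradient from backpropagation.
w = np.random.randn(3, 2)
grad_w = np.random.randn(3, 2)
w = l2_update(w, grad_w, eta=0.5, lmbda=0.1, n=1000)
```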
L1 regularization
The modified cost function is:
C = -\frac{1}{n}\sum_{xj}\left[y_j \ln a_j^L + (1-y_j)\ln(1-a_j^L)\right] + \frac{\lambda}{n}\sum_w |w|
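A corresponding sketch for L1 (again an illustration, not code from the source): away from zero the derivative of |w| is sign(w), so each update moves weights toward zero by a roughly constant amount rather than in proportion to their size, which tends to produce sparse weights.

```python
import numpy as np

def l1_update(w, grad_w, eta, lmbda, n):
    """One gradient step with an L1 penalty: the extra term is
    (lmbda / n) * sign(w), a constant-size pull toward zero."""
    return w - eta * (grad_w + (lmbda / n) * np.sign(w))

# Example usage, analogous to the L2 sketch above.
w = np.random.randn(3, 2)
grad_w = np.random.randn(3, 2)
w = l1_update(w, grad_w, eta=0.5, lmbda=0.1, n=1000)
```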
Dropout
The dropout training procedure may look a bit strange: on each training pass, temporarily delete a random half of the hidden-layer neurons, run forward and back-propagation through the thinned network, then restore the deleted neurons; repeating this many times yields the final network parameters.
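As a rough illustration of the random deletion, here is a minimal NumPy sketch of the common "inverted dropout" variant. The function name dropout_forward and the choice to rescale the surviving activations during training (rather than adjusting weights at test time) are assumptions for this sketch, not details from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(a_hidden, p_drop=0.5, training=True):
    """Apply inverted dropout to a hidden layer's activations.

    During training, each neuron is zeroed with probability p_drop and the
    survivors are scaled by 1 / (1 - p_drop) so the expected activation is
    unchanged; at test time all neurons stay active and no mask is applied.
    """
    if not training:
        return a_hidden
    mask = (rng.random(a_hidden.shape) >= p_drop) / (1.0 - p_drop)
    return a_hidden * mask

# Example: activations of a 4-neuron hidden layer over a batch of 2 inputs.
a = np.array([[0.2, 0.9, 0.5, 0.1],
              [0.7, 0.3, 0.8, 0.4]])
print(dropout_forward(a, p_drop=0.5))
```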
Why can regularization reduce overfitting? This question is hard to answer rigorously, but Occam's razor offers some intuition. Occam's razor is usually stated as:
Entities should not be multiplied beyond necessity (if not necessary, do not add entities).
Regularization works in this spirit: by penalizing large weights it reduces the effective complexity of the network, favoring simpler models that are less likely to fit noise in the training data. Reference:
http://neuralnetworksanddeeplearning.com/