L1 and L2 regularization terms, also called penalty terms, constrain the parameters of a model and prevent it from overfitting; they are added as an extra term after the loss function.
- L1 is the sum of the absolute values of the model's parameters: $\|w\|_1 = \sum_i |w_i|$.
- L2 is the sum of the squares of the model's parameters: $\|w\|_2^2 = \sum_i w_i^2$ (see the sketch below).
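As a quick sketch of these two definitions (plain NumPy; the function names, weights, and λ values are illustrative, not from the original):

```python
import numpy as np

def l1_penalty(w, lam):
    # L1 term: lambda * sum of absolute values of the parameters
    return lam * np.sum(np.abs(w))

def l2_penalty(w, lam):
    # L2 term: lambda * sum of squared parameters
    return lam * np.sum(w ** 2)

w = np.array([0.5, -1.2, 0.0, 3.0])
data_loss = 0.8                      # stand-in for e.g. a mean squared error
loss_with_l1 = data_loss + l1_penalty(w, lam=0.01)
loss_with_l2 = data_loss + l2_penalty(w, lam=0.01)
```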
Difference:
- L1 tends to produce a small number of nonzero features, while the rest are exactly 0 (see the sparsity sketch after this list).
- From the geometric view: the L1 constraint region has corners on the coordinate axes, so the optimal parameter value lands on an axis with high probability, making the weight of some dimension exactly 0 and producing a sparse weight matrix.
- From the Bayesian point of view: adding the L1 regularization term is equivalent to assuming a Laplace prior distribution on θ.
- L2 tends to keep more features, each with a weight close to 0. The optimal parameter value appears on an axis with very small probability, so the parameter of each dimension is nonzero; when minimizing $\|w\|_2^2$, every component is shrunk toward 0 rather than made sparse.
- From the geometric view: the L2 constraint region has no corners, so the solution tends to have its magnitude constrained rather than being driven to exactly 0.
- From the Bayesian point of view: the L2 term is equivalent to assuming a Gaussian prior distribution on θ.
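To spell out the Bayesian correspondence in one step: MAP estimation maximizes $\log p(D \mid \theta) + \log p(\theta)$. A Laplace prior $p(\theta_i) \propto \exp(-|\theta_i|/b)$ contributes $-\frac{1}{b}\sum_i |\theta_i|$ (an L1 term), while a Gaussian prior $p(\theta_i) \propto \exp(-\theta_i^2 / 2\sigma^2)$ contributes $-\frac{1}{2\sigma^2}\sum_i \theta_i^2$ (an L2 term).

The sparsity difference is also easy to check empirically. A minimal sketch (assuming scikit-learn is available; the synthetic data and `alpha` values are arbitrary illustrations):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 20)
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.5, 1.0]       # only 3 of 20 features actually matter
y = X @ true_w + 0.1 * rng.randn(100)

lasso = Lasso(alpha=0.1).fit(X, y)  # L1-penalized linear regression
ridge = Ridge(alpha=0.1).fit(X, y)  # L2-penalized linear regression

# L1 drives most irrelevant coefficients exactly to 0; L2 only shrinks them.
print("L1 exact zeros:", np.sum(lasso.coef_ == 0))
print("L2 exact zeros:", np.sum(ridge.coef_ == 0))
```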
Function: L1 regularization can produce sparse models, while L2 regularization prevents overfitting. The fitting process tends to make the weights as small as possible, so the final model has uniformly small parameters. A model with small parameter values is generally considered simpler: it adapts to different data sets and, to some extent, avoids overfitting the data (better robustness to perturbations).
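One way to make the "weights as small as possible" intuition concrete (a standard derivation, not from the original text): under gradient descent with learning rate $\eta$, the L2 term $\lambda \|w\|_2^2$ adds $2\lambda w$ to the gradient, so each step multiplicatively shrinks the weights (weight decay):

$$ w \leftarrow w - \eta\left(\nabla L(w) + 2\lambda w\right) = (1 - 2\eta\lambda)\,w - \eta\,\nabla L(w) $$

The L1 term instead adds a constant-magnitude pull $\lambda\,\mathrm{sign}(w)$, which can push small weights exactly to 0, matching the sparsity behavior above.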
Reference:
- Li Hang, "Statistical Learning Methods"
- http://www.cnblogs.com/lyr2015/p/8718104.html
- 52888315
- 52433975