Machine Learning -- L1 and L2 Norms

Source: Internet
Author: User

I had seen the content and diagrams about the L1 norm and L2 norm many times without fully absorbing them. After reading this excellent blog post, http://blog.csdn.net/zouxy09/article/details/24971995/, it finally clicked, so I'm quickly writing it down while it's fresh!

The difference between the L1 norm and the L2 norm can be seen from two aspects:

1. Descent Speed:

L1 and L2 are both regularization methods: we add the weight parameters w to the objective function as an L1 or L2 norm penalty, and the model then tries to minimize these weight parameters along with the original loss.

A common form of the regularized objective is:

  L1: min_w  L(w) + λ ||w||_1
  L2: min_w  L(w) + λ ||w||_2^2

where L(w) is the original loss and λ controls the strength of the penalty.

Minimizing this objective is like walking downhill, and the difference between L1 and L2 is the shape of the "slope":

  L1 decreases along the "slope" of the absolute-value function, while L2 decreases along the "slope" of the quadratic function. Near 0, the L1 slope stays constant (±1) while the L2 slope shrinks toward 0, so around 0 the L1 penalty reduces weights faster than L2 and pushes them to exactly 0 very quickly.
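The different "slopes" can be illustrated with a minimal sketch (the learning rate and penalty strength below are hypothetical values): a proximal step for the L1 penalty subtracts a constant amount from a weight, while a gradient step on the L2 penalty multiplies it by a factor below 1.

```python
def l1_step(w, lr, lam):
    # Soft-thresholding: the proximal operator of lam * |w|.
    # Subtracts a fixed amount each step, so w reaches exactly 0.
    if w > lr * lam:
        return w - lr * lam
    if w < -lr * lam:
        return w + lr * lam
    return 0.0

def l2_step(w, lr, lam):
    # Gradient step on lam * w^2: shrinks w by a constant factor,
    # so w decays toward 0 but never reaches it exactly.
    return w * (1.0 - 2.0 * lr * lam)

w1 = w2 = 1.0
for _ in range(100):
    w1 = l1_step(w1, lr=0.1, lam=0.5)
    w2 = l2_step(w2, lr=0.1, lam=0.5)

print(w1)  # 0.0 -- the L1 step drove the weight exactly to zero
print(w2)  # small but still nonzero
```

After 100 steps the L1-penalized weight is exactly 0.0, while the L2-penalized weight is tiny but never identically zero.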

L1 regularization is commonly known as the Lasso, and L2 as Ridge. In the book "Machine Learning in Action", Chapter 8 introduces both ridge regression and the Lasso in the section on shrinkage methods.

2. Constraints on the model space:

In fact, for L1 and L2, the regularized loss can equivalently be written in a constrained form:

  L1: min_w  L(w)  subject to  ||w||_1 <= C
  L2: min_w  L(w)  subject to  ||w||_2 <= C

  In other words, we limit the model to an L1-ball (or L2-ball) of radius C around the origin. To make this easy to visualize, consider the two-dimensional case: the contour lines of the objective function can be drawn on the (w1, w2) plane, and the constraint becomes a norm ball of radius C in that plane. The optimal solution lies where a contour line first touches the norm ball:

  As you can see, the difference between the L1-ball and the L2-ball is that the L1-ball has "corners" where it meets each axis, and the contours of the objective function will usually touch the ball first at a corner unless the contours happen to be positioned very specially. Crucially, the corners are sparse positions: at the intersection point in the example, w1 = 0. In higher dimensions (imagine what the three-dimensional L1-ball looks like), besides the corners there are also many edges, which likewise have zero coordinates and a large probability of being the first point of contact, so they too produce sparsity.

By contrast, the L2-ball has no such property: because it has no corners, the probability that the first intersection occurs at a sparse position is very small. This intuitively explains why L1 regularization produces sparsity while L2 regularization does not.
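The geometric picture above can be checked numerically in two dimensions. This sketch uses a hypothetical quadratic objective whose contours are circles centered at (2.0, 0.5), and finds where those contours first touch the boundary of an L1 ball versus an L2 ball by brute-force sampling:

```python
import numpy as np

def f(w):
    # Hypothetical objective: contours are circles centered at (2.0, 0.5)
    return (w[:, 0] - 2.0) ** 2 + (w[:, 1] - 0.5) ** 2

C = 1.0
t = np.linspace(0.0, 2.0 * np.pi, 100001)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)

# L2 ball boundary: the circle of radius C
l2_pts = C * circle
# L1 ball boundary: rescale each direction onto the diamond |w1| + |w2| = C
l1_pts = C * circle / np.abs(circle).sum(axis=1, keepdims=True)

# The constrained optimum is where a contour first touches each boundary
best_l1 = l1_pts[np.argmin(f(l1_pts))]
best_l2 = l2_pts[np.argmin(f(l2_pts))]

print(best_l1)  # lands on the corner (1, 0): w2 is exactly 0 (sparse)
print(best_l2)  # both coordinates nonzero (no corner to land on)
```

The L1-constrained minimizer lands on a corner of the diamond, with w2 = 0; the L2-constrained minimizer is the radial projection of the circle's center, with both coordinates nonzero.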

So, in one sentence: L1 tends to keep a small number of features and set the rest exactly to 0, while L2 keeps more features, each with a weight shrunk close to 0. The Lasso is therefore very useful for feature selection, while Ridge is mainly a regularizer.
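This behavior is easy to reproduce on synthetic data. The sketch below (all sizes and penalty values are illustrative assumptions) fits the Lasso with ISTA, a standard proximal-gradient method, and Ridge with its closed-form solution, on data where only the first of five features matters:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = np.array([3.0, 0.0, 0.0, 0.0, 0.0])  # only feature 0 is relevant
y = X @ w_true + 0.01 * rng.normal(size=n)

lam = 5.0
lr = 1.0 / np.linalg.norm(X, 2) ** 2  # step size from the Lipschitz constant

def soft_threshold(v, t):
    # Proximal operator of t * ||v||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Lasso via ISTA (proximal gradient descent)
w_l1 = np.zeros(d)
for _ in range(5000):
    w_l1 = soft_threshold(w_l1 - lr * X.T @ (X @ w_l1 - y), lr * lam)

# Ridge has a closed-form solution
w_l2 = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print(w_l1)  # irrelevant coefficients driven to exactly 0
print(w_l2)  # irrelevant coefficients small but nonzero
```

The Lasso sets the irrelevant coefficients to exactly 0 (the soft-threshold outputs identical zeros once converged), while Ridge leaves every coefficient nonzero, merely shrunk.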

