From a question:
When we fit the curve with H10 but really only want an H2-like result, what happens if we step back from H10 to H2?
So we just need to add the restrictions w3 = ... = w10 = 0. Now we can relax a little bit: require only that any 8 of the weights are 0, rather than those specific eight.
But in this form the problem is NP-hard: you would have to enumerate all the possible subsets of nonzero weights.
Let's relax a little bit more: replace the counting constraint with a squared constraint on the total weight magnitude. For the linear regression problem, this kind of squared condition admits a very clean solution.
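Written out (in standard notation, with weights w_0 through w_10 and in-sample error E_in; the symbols are the usual ones, not quoted from the source), the relaxed problem is:

```latex
\min_{\mathbf{w}} \; E_{\text{in}}(\mathbf{w})
\quad \text{subject to} \quad
\sum_{q=0}^{10} w_q^2 \le C
```

The feasible region is now a ball rather than a union of coordinate subspaces, which is what makes the problem tractable.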
The solution process relies mainly on the Lagrange multiplier method, combined with the derivation already used for linear regression:
In other words, linear regression with this restriction added is equivalent to a linear regression algorithm whose error measure is the augmented error Eaug.
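With E_aug(w) = E_in(w) + (λ/N)·wᵀw, setting the gradient to zero gives the closed-form regularized solution w = (ZᵀZ + λI)⁻¹Zᵀy. A minimal sketch with NumPy, assuming a hypothetical degree-10 polynomial transform and synthetic data (none of these specifics are from the source):

```python
import numpy as np

# Synthetic 1-D data (assumed example): noisy sine, 20 points.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.size)

# Design matrix Z: columns x^0 .. x^10, i.e. the "H10" polynomial transform.
Z = np.vander(x, N=11, increasing=True)

def ridge_fit(Z, y, lam):
    """Minimize E_aug = E_in(w) + (lam/N) * w.T @ w.
    Closed form: w = (Z.T Z + lam * I)^-1 Z.T y."""
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)

w_reg = ridge_fit(Z, y, lam=0.1)
```

This is exactly ordinary least squares with λI added to ZᵀZ before inverting, which is why adding the constraint costs almost nothing computationally.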
In the derivation above, C never appears explicitly, but a λ does. How does the choice of λ affect the algorithm?
You can see that the larger λ is, the smaller C is. Even a tiny λ (equivalent to only a mild constraint) can produce dramatically better results.
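The inverse relationship between λ and C can be checked numerically: the squared norm wᵀw of the regularized solution shrinks monotonically as λ grows, so a larger λ corresponds to a tighter effective constraint C. A small self-contained check (the data here is arbitrary, assumed for illustration):

```python
import numpy as np

# Arbitrary random regression problem (assumed, not from the source).
rng = np.random.default_rng(1)
Z = rng.standard_normal((30, 5))
y = rng.standard_normal(30)

def ridge(Z, y, lam):
    # Regularized least squares: (Z.T Z + lam I) w = Z.T y
    return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)

# w.T @ w for increasing lambda: the implied constraint radius C shrinks.
norms = [float(ridge(Z, y, lam) @ ridge(Z, y, lam))
         for lam in (0.0, 0.1, 1.0, 10.0)]
```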
Why does the algorithm perform better once the constraint is added? The constraint restricts the hypothesis set H, which makes dVC decrease.
Now there is only one problem: how to choose an optimal λ.
There is no way around it: try multiple values of λ and pick the best one. The actual selection process is like watching the instrument panel, i.e. validation.
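The λ-selection procedure being described can be sketched as follows, assuming a hypothetical held-out validation split and a candidate grid (all specifics here are illustrative, not from the source):

```python
import numpy as np

# Synthetic data (assumed example): noisy sine, degree-10 polynomial features.
rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 40)
y = np.sin(np.pi * x) + 0.2 * rng.standard_normal(x.size)
Z = np.vander(x, N=11, increasing=True)

# Hold out every 4th point as the validation set.
val = np.arange(x.size) % 4 == 0
Ztr, ytr, Zva, yva = Z[~val], y[~val], Z[val], y[val]

def ridge(Z, y, lam):
    return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)

def mse(Z, y, w):
    return float(np.mean((Z @ w - y) ** 2))

# Try several lambdas; keep the one with the smallest validation error.
candidates = [0.0, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
best_lam = min(candidates,
               key=lambda lam: mse(Zva, yva, ridge(Ztr, ytr, lam)))
```

In practice cross-validation is used instead of a single split, but the logic is the same: λ is chosen by measured out-of-sample proxy error, not by the training error.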
Step on the brakes -- regularization