4. Lasso Regression and Ridge Regression

PDF version download address: https://pan.baidu.com/s/1i5JtT9j
HTML version download address: https://pan.baidu.com/s/1kV0YVqv

LASSO was first proposed by Robert Tibshirani in 1996; its full name is "least absolute shrinkage and selection operator". Ridge regression, also known as Tikhonov regularization, is the most commonly used regularization method for the regression analysis of ill-posed problems.
As discussed in the previous section on model selection, the more parameters a model has, the more complex it becomes. For example, when the dimension of the data feature x is very high, even plain linear regression has many parameters to train, which causes a certain degree of overfitting, and the resulting model is not very interpretable. This is when it is worth considering lasso regression.
Ridge regression is a biased-estimation regression method designed for the analysis of collinear data. It is essentially an improved least-squares estimation: it gives up the unbiasedness of ordinary least squares and, at the cost of losing some information and precision, obtains more reliable regression coefficients. Just as a person should keep some bottom line, regression should not go to extremes either.

1 Basic Forms
The defining characteristic of lasso regression is that it can perform variable selection while fitting the training data. What mechanism makes this selection possible? The answer is regularization, which you can simply think of as a penalty term.
A brief review of the loss function of linear regression:

L(w) = \frac{1}{n} \sum\limits_{i=1}^{n} (y_i - f(x_i))^2 = \frac{1}{n} \|y - Xw\|^2
The analytic solution can be obtained:

w^* = (X^TX)^{-1}X^Ty
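As a small illustration (not part of the original text), this closed-form solution can be computed directly with NumPy; the synthetic data and variable names below are my own assumptions:

```python
import numpy as np

# Synthetic data: n samples, d features (illustrative values only)
rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.normal(size=(n, d))
true_w = np.array([1.5, -2.0, 0.0, 0.5, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=n)

# Analytic least-squares solution: w* = (X^T X)^{-1} X^T y
# (np.linalg.solve is used instead of an explicit inverse for numerical stability)
w_star = np.linalg.solve(X.T @ X, X.T @ y)
print(w_star)  # should be close to true_w when n >> d and the noise is small
```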
When the dimension of the input space x is very large, overfitting may occur. So here we introduce a regularization term, which places a restriction on w. Our optimization problem changes from the original

\mathop{min}\limits_{w} \frac{1}{2} \|y - Xw\|^2

to:

\mathop{min}\limits_{w} \frac{1}{2} \|y - Xw\|^2, \quad s.t.\ \|w\|_1 < \theta
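To see the variable-selection effect concretely, here is a minimal sketch using scikit-learn's Lasso. Note that it solves the equivalent penalized (Lagrangian) form \frac{1}{2n}\|y - Xw\|^2 + \alpha\|w\|_1 rather than the constrained form above; the synthetic data and the value of alpha are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Same style of synthetic data; one true coefficient is exactly zero
rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.normal(size=(n, d))
true_w = np.array([1.5, -2.0, 0.0, 0.5, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=n)

# alpha is the penalty strength; a larger alpha corresponds to a tighter L1 budget theta
lasso = Lasso(alpha=0.1)
lasso.fit(X, y)
print(lasso.coef_)  # some coefficients are driven exactly to zero -> variable selection
```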
Ridge regression also controls the model coefficients by adding a regularization term. Its optimization problem is expressed as:
\mathop{min}\limits_{w} \frac{1}{2} \|y - Xw\|^2, \quad s.t.\ \|w\|_2^2 < \theta
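The penalized (Lagrangian) form of this problem has a well-known closed-form solution, w^* = (X^TX + \lambda I)^{-1}X^Ty. A minimal NumPy sketch, again with assumed synthetic data and an assumed penalty strength lam:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.normal(size=(n, d))
true_w = np.array([1.5, -2.0, 0.0, 0.5, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=n)

lam = 1.0  # penalty strength of the Lagrangian form (assumed value)
# Ridge closed form: w* = (X^T X + lam * I)^{-1} X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(w_ridge)  # coefficients are shrunk toward zero, but rarely exactly to zero
```

Unlike the lasso sketch above, ridge shrinks all coefficients but does not set them exactly to zero, which is why lasso, not ridge, performs variable selection.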