The principle of Ridge regression:
First, we need to understand the principle of least-squares regression.
For the multiple linear regression model $y = X\beta + \varepsilon$, the least-squares estimate of the parameter $\beta$ is
$$\hat{\beta} = (X'X)^{-1}X'y.$$
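As a minimal sketch (not from the original post), the formula can be evaluated with NumPy on a small made-up data set; the data and the helper name `ols_estimate` are hypothetical. Solving the normal equations $X'X\beta = X'y$ with `np.linalg.solve` gives the same estimate without forming $(X'X)^{-1}$ explicitly, which is the numerically safer choice.

```python
import numpy as np

# Hypothetical toy data: a column of ones (intercept) plus two regressors
X = np.array([[1.0, 0.5, 1.2],
              [1.0, 1.5, 0.7],
              [1.0, 2.0, 2.1],
              [1.0, 3.5, 2.9],
              [1.0, 4.0, 4.2]])
y = np.array([1.1, 2.3, 3.2, 4.8, 5.9])

def ols_estimate(X, y):
    """Least-squares estimate beta_hat = (X'X)^{-1} X'y, via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

beta_ols = ols_estimate(X, y)
print(beta_ols)
```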
When the explanatory variables are multicollinear, $|X'X| \approx 0$. Imagine adding a positive multiple of the identity matrix, $kI$ with $k > 0$, to $X'X$;
then $X'X + kI$ is much less close to singular than $X'X$ is. Because the variables may be measured in different units,
the data are standardized first. The standardized design matrix is still denoted by $X$, and the ridge regression estimate of $\beta$ is defined as
$$\hat{\beta}(k) = (X'X + kI)^{-1}X'y,$$
where
$k$ is called the ridge parameter. Because $X$ is assumed to be standardized, $X'X$ is the sample correlation matrix of the independent variables. The response $y$ may or may not be standardized;
if $y$ is also standardized, what is actually computed is the standardized ridge regression estimate. As an estimate of $\beta$, $\hat{\beta}(k)$ should be more stable than the least-squares estimate $\hat{\beta}$, and when $k = 0$ the ridge regression estimate reduces to the ordinary least-squares estimate.
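The sketch below illustrates these points on hypothetical, nearly collinear data. It assumes one particular standardization (center each column and scale it to unit length), under which $X'X$ is exactly the sample correlation matrix; the helper names `standardize` and `ridge_estimate` are illustrative, not part of any library. It prints $|X'X|$ next to $|X'X + kI|$ and checks that $k = 0$ reproduces least squares.

```python
import numpy as np

def standardize(a):
    """Center each column and scale it to unit length, so Z'Z is the correlation matrix."""
    ac = a - a.mean(axis=0)
    return ac / np.sqrt((ac ** 2).sum(axis=0))

def ridge_estimate(X, y, k):
    """Ridge estimate beta_hat(k) = (X'X + kI)^{-1} X'y for a ridge parameter k >= 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Hypothetical data: x2 is almost a copy of x1, so the regressors are nearly collinear
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.01, size=50)
y = 3 * x1 - 2 * x2 + rng.normal(scale=0.5, size=50)

X = standardize(np.column_stack([x1, x2]))
y_std = standardize(y)   # optional: standardizing y as well gives the standardized ridge estimate
k = 0.1                  # arbitrary ridge parameter chosen for illustration

XtX = X.T @ X            # sample correlation matrix of the regressors
print("corr(x1, x2)   =", XtX[0, 1])
print("det(X'X)       =", np.linalg.det(XtX))
print("det(X'X + kI)  =", np.linalg.det(XtX + k * np.eye(2)))

beta_ols = np.linalg.lstsq(X, y_std, rcond=None)[0]
print("least squares  :", beta_ols)
print("ridge, k = 0   :", ridge_estimate(X, y_std, 0.0))   # same as least squares
print("ridge, k = 0.1 :", ridge_estimate(X, y_std, k))     # coefficients shrunk toward zero
```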
Because the ridge parameter $k$ is not uniquely determined, the ridge regression estimate $\hat{\beta}(k)$ is really a family of estimates of the regression parameters, one for each value of $k$.
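To see this family concretely, one can compute $\hat{\beta}(k)$ over a grid of $k$ values; plotting each coefficient against $k$ gives what is usually called a ridge trace. This is a hypothetical sketch with simulated, standardized data, and the grid of $k$ values is arbitrary. Along the grid, the length $\|\hat{\beta}(k)\|$ never increases as $k$ grows.

```python
import numpy as np

def ridge_estimate(X, y, k):
    """Ridge estimate beta_hat(k) = (X'X + kI)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Hypothetical data: three regressors sharing a common component (correlated)
rng = np.random.default_rng(2)
base = rng.normal(size=(40, 1))
X = base + rng.normal(scale=0.1, size=(40, 3))
X = X - X.mean(axis=0)                      # center each column
X = X / np.sqrt((X ** 2).sum(axis=0))       # scale each column to unit length
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=40)
y = y - y.mean()

# One estimate per value of k: this family is what a ridge trace plots against k
ks = np.array([0.0, 0.001, 0.01, 0.1, 1.0, 10.0])
family = np.array([ridge_estimate(X, y, k) for k in ks])
for k, beta in zip(ks, family):
    print(f"k = {k:<6} beta(k) = {np.round(beta, 3)}")
```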
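For a given $k$, the ridge regression estimate of the parameters is
$$\hat{\beta}(k) = (X'X + kI)^{-1}X'y,$$
which is also the $\beta$ that minimizes the penalized residual sum of squares $\|y - X\beta\|^2 + k\|\beta\|^2$.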
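As a cross-check: scikit-learn's `Ridge` minimizes $\|y - Xw\|^2 + \alpha\|w\|^2$, so with `fit_intercept=False` its `coef_` should match the closed form with $\alpha = k$. This sketch uses simulated data that are centered beforehand, so dropping the intercept is harmless; the value $k = 0.5$ is arbitrary.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical data; X and y are centered so no intercept is needed
rng = np.random.default_rng(3)
X = rng.normal(size=(20, 3))
X -= X.mean(axis=0)
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.2, size=20)
y -= y.mean()

k = 0.5
# Closed-form ridge estimate (X'X + kI)^{-1} X'y
beta_closed = np.linalg.solve(X.T @ X + k * np.eye(3), X.T @ y)

# scikit-learn's alpha plays the role of the ridge parameter k
model = Ridge(alpha=k, fit_intercept=False).fit(X, y)
print(beta_closed)
print(model.coef_)
```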
Python code for ridge regression, mainly using the scikit-learn module:
# Ridge regression
from sklearn import linear_model

x = [[0, 0], [1, 1], [2, 2]]
y = [0, 1, 2]
clf = linear_model.Ridge(alpha=0.1)  # alpha plays the role of the ridge parameter k
clf.fit(x, y)  # fit the parameters
print(clf.coef_)  # coefficients
print(clf.intercept_)  # constant (intercept) term
print(clf.predict([[3, 3]]))  # predicted value for a new sample
print(clf.predict(x))  # fitted values; older scikit-learn releases also offered decision_function(x), which Ridge no longer has
print(clf.score(x, y))  # R^2, goodness of fit
print(clf.get_params())  # get parameter information
print(clf.set_params(fit_intercept=False))  # reset parameters (takes effect on the next fit)
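In this example the two columns of `x` are identical, so $X'X$ is exactly singular and least squares has no unique solution, while ridge regression still does. The sketch below reproduces what `Ridge(alpha=0.1)` should return, assuming that with `fit_intercept=True` scikit-learn centers `x` and `y`, penalizes only the coefficients, and recovers the intercept from the means. Worked by hand, both coefficients come out as $2/4.1 \approx 0.488$ and the intercept as $1 - 4/4.1 \approx 0.024$.

```python
import numpy as np
from sklearn import linear_model

x = np.array([[0, 0], [1, 1], [2, 2]], dtype=float)
y = np.array([0, 1, 2], dtype=float)
k = 0.1

# Center x and y, solve the ridge system on the centered data,
# then recover the (unpenalized) intercept from the means
xc = x - x.mean(axis=0)
yc = y - y.mean()
coef = np.linalg.solve(xc.T @ xc + k * np.eye(2), xc.T @ yc)
intercept = y.mean() - x.mean(axis=0) @ coef

clf = linear_model.Ridge(alpha=k).fit(x, y)
print(coef, intercept)            # closed form on centered data
print(clf.coef_, clf.intercept_)  # scikit-learn's fit
```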
For more detailed analysis and applications, see the usage of scikit-learn's linear_model module.