1.1. Generalized linear models

The following is a set of methods intended for regression in which the target value is expected to be a linear combination of the input variables, and that combination is used as the predicted value.

Throughout the module, the vector of coefficients is designated coef_ and the intercept term intercept_.
To implement classification using generalized linear models, see logistic regression.
1.1.1. Ordinary least squares
LinearRegression fits a linear model with coefficients chosen to minimize the residual sum of squares between the observed responses in the dataset and the responses predicted by the linear approximation. Mathematically, it solves a problem of the form:

\min_w \|X w - y\|_2^2
LinearRegression takes arrays X and y in its fit method and stores the coefficients of the fitted model in its coef_ member:
>>> from sklearn import linear_model
>>> reg = linear_model.LinearRegression()
>>> reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
>>> reg.coef_
array([0.5, 0.5])
However, the coefficient estimates of ordinary least squares rely on the independence of the model terms. When terms are correlated and the columns of the design matrix X have an approximately linear dependence, the matrix becomes close to singular; as a result, the least-squares estimate becomes highly sensitive to random errors in the observed responses, producing a large variance. This situation of multicollinearity can arise, for example, when data are collected without an experimental design.
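To illustrate this sensitivity, here is a minimal sketch (the data and the size of the perturbation are made up for illustration): two nearly identical columns make the design matrix close to singular, so a tiny change in the responses can move the fitted coefficients by a far larger amount.

import numpy as np
from sklearn.linear_model import LinearRegression

# Two nearly collinear columns: the second is almost a copy of the first.
X = np.array([[1.0, 1.0], [2.0, 2.0001], [3.0, 2.9999]])
y = np.array([2.0, 4.0, 6.0])

# A tiny perturbation of one response value.
y_perturbed = y + np.array([0.0, 1e-3, 0.0])

coef = LinearRegression().fit(X, y).coef_
coef_perturbed = LinearRegression().fit(X, y_perturbed).coef_

# The coefficients typically move by much more than 1e-3,
# because X is close to singular.
print(coef)
print(coef_perturbed)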
Example
1.1.1.1. Ordinary least squares complexity
This method computes the least squares solution using a singular value decomposition of X. If X is a matrix of size (n, p), the cost of the method is O(n p^2), assuming that n ≥ p.
1.1.2. Ridge regression
Ridge regression addresses some of the problems of ordinary least squares by imposing a penalty on the size of the coefficients. The ridge coefficients minimize a penalized residual sum of squares, expressed mathematically as:

\min_w \|X w - y\|_2^2 + \alpha \|w\|_2^2
Here, α ≥ 0 is a complexity parameter that controls the amount of shrinkage: the larger the value of α, the greater the amount of shrinkage, and thus the coefficients become more robust to collinearity.
As with other linear models, Ridge takes arrays X and y in its fit method and stores the coefficients of the linear model in its coef_ member:
>>> from sklearn import linear_model
>>> reg = linear_model.Ridge(alpha=.5)
>>> reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
Ridge(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)
>>> reg.coef_
array([0.34545455, 0.34545455])
>>> reg.intercept_
0.13636...
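To see the shrinkage effect of α directly, the following sketch (not part of the original example; the alpha values are arbitrary) refits the same data with increasing amounts of regularization; the printed coefficients move closer to zero as alpha grows.

from sklearn import linear_model

X = [[0, 0], [0, 0], [1, 1]]
y = [0, .1, 1]

# The same data fitted with increasing amounts of shrinkage.
for alpha in (0.5, 10.0, 100.0):
    reg = linear_model.Ridge(alpha=alpha).fit(X, y)
    # Larger alpha pushes the coefficients closer to zero.
    print(alpha, reg.coef_)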
Example
- Plot Ridge coefficients as a function of the regularization
- Classification of text documents using sparse features
1.1.2.1 Ridge Complexity
The complexity of this method is of the same order as that of ordinary least squares.
1.1.2.2 Setting the regularization parameter: generalized cross-validation
RidgeCV implements ridge regression with built-in cross-validation of the alpha parameter. The object works in the same way as GridSearchCV, except that it defaults to Generalized Cross-Validation (GCV), an efficient form of leave-one-out cross-validation:
>>> from sklearn import linear_model
>>> reg = linear_model.RidgeCV(alphas=[0.1, 1.0, 10.0])
>>> reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
RidgeCV(alphas=[0.1, 1.0, 10.0], cv=None, fit_intercept=True, scoring=None,
    normalize=False)
>>> reg.alpha_
0.1
Reference reading: "Notes on Regularized Least Squares", Rifkin & Lippert (technical report, course slides).
1.1.3 Lasso
The Lasso is a linear model that estimates sparse coefficients. It is useful in some contexts because of its tendency to prefer solutions with fewer non-zero parameter values, effectively reducing the number of variables upon which the given solution depends. For this reason, the Lasso and its variants are fundamental to the field of compressed sensing. Under certain conditions, it can recover the exact set of non-zero weights (as in signal recovery in communications).
Mathematically, the model consists of a linear model trained with an \ell_1 prior as regularizer. The objective function to minimize is:

\min_w \frac{1}{2 n_{\text{samples}}} \|X w - y\|_2^2 + \alpha \|w\|_1

The Lasso estimate thus solves the minimization of the least-squares penalty with \alpha \|w\|_1 added, where α is a constant and \|w\|_1 is the \ell_1-norm of the parameter vector.
The implementation in the Lasso class uses coordinate descent as the algorithm to fit the coefficients; see Least Angle Regression for another implementation:
>>> from sklearn import linear_model
>>> reg = linear_model.Lasso(alpha=0.1)
>>> reg.fit([[0, 0], [1, 1]], [0, 1])
Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)
>>> reg.predict([[1, 1]])
array([0.8])
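As a point of comparison (a minimal sketch, not part of the original example), the Least Angle Regression based implementation mentioned above is available as linear_model.LassoLars and can be applied to the same toy data:

from sklearn import linear_model

# LassoLars fits the same L1-penalized objective, but uses the
# Least Angle Regression (LARS) algorithm instead of coordinate descent.
reg_lars = linear_model.LassoLars(alpha=0.1)
reg_lars.fit([[0, 0], [1, 1]], [0, 1])
print(reg_lars.coef_)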
For lower-level tasks, the lasso_path function is also useful: it computes the coefficients along the full path of possible values.
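A minimal sketch of how lasso_path might be called (the dataset and the number of alphas are illustrative choices):

import numpy as np
from sklearn.linear_model import lasso_path

# A small illustrative dataset.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 2.5]])
y = np.array([0.0, 1.0, 2.0, 2.4])

# lasso_path returns the grid of alphas, the coefficients for each alpha
# (shape: n_features x n_alphas), and the dual gaps at the end of optimization.
alphas, coefs, dual_gaps = lasso_path(X, y, n_alphas=5)
print(alphas)
print(coefs.shape)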
Example
- Lasso and Elastic Net for Sparse Signals
- Compressive sensing: tomography reconstruction with L1 prior (Lasso)
Note: Feature selection with Lasso
As Lasso regression yields sparse models, it can be used to perform feature selection; see L1-based feature selection.
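A minimal sketch of this idea (the data are synthetic, and SelectFromModel with these settings is only one illustrative way of doing it):

import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Synthetic data: the target depends only on the first two of three features.
rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = 3.0 * X[:, 0] + 2.0 * X[:, 1]

# Features whose Lasso coefficient is (near) zero are dropped.
selector = SelectFromModel(Lasso(alpha=0.1)).fit(X, y)
print(selector.get_support())        # e.g. [ True  True False]
print(selector.transform(X).shape)   # e.g. (50, 2)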
Note: Randomized sparsity
For feature selection or sparse recovery, you can use randomized sparse models.
1.1.3.1 Setting the regularization parameter
The alpha parameter controls the degree of sparsity of the estimated coefficients.
1.1.3.1.1 Using cross-validation
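scikit-learn exposes estimators that set the Lasso alpha parameter by cross-validation, such as LassoCV. A minimal sketch of its use (the synthetic data and cv=3 are illustrative choices):

import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic data: the target depends on the first two of four features.
rng = np.random.RandomState(0)
X = rng.randn(60, 4)
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.randn(60)

# LassoCV selects alpha from an automatically generated grid by cross-validation.
reg = LassoCV(cv=3).fit(X, y)
print(reg.alpha_)   # the selected regularization strength
print(reg.coef_)    # coefficients refit with the selected alpha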