Introduction
Linear and logistic regression are usually the first modeling algorithms people learn in machine learning and data science. Both are popular because they are easy to use and to interpret. However, their inherent simplicity comes with drawbacks, and in many cases they are not the best choice for a regression model. There are in fact several different types of regression, each with its own strengths and weaknesses.
In this article, we will discuss 5 of the most common regression algorithms, look at their properties, and evaluate their performance, so that by the end you have a more complete picture of regression models.
Contents
- Linear regression
- Polynomial regression
- Ridge regression
- Lasso regression
- Elastic net regression
Linear Regression
Regression is a technique for modeling and analyzing the relationships between variables, and in particular how they combine to produce a given outcome. Linear regression refers to a regression model composed entirely of linear terms. In the simplest case, univariate linear regression models the relationship between a single independent input variable and a dependent output variable with a linear model.
A more general case is multivariate linear regression, in which a model is built for the relationships between multiple independent input variables (feature variables) and a dependent output variable. The model remains linear: the output is a linear combination of the input variables. We can write the multivariate linear regression model as follows:

Y = a_1*x_1 + a_2*x_2 + ... + a_n*x_n + b

where a_n are the coefficients, x_n are the variables and b is the bias. As we can see, this function contains no nonlinearity, so it is only suitable for modeling linearly separable data; the model simply weighs the importance of each feature variable x_n with its coefficient a_n.
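As a minimal illustration of this formula (all numbers here are made up for the example), the prediction is just a weighted sum of the feature values plus the bias:

import numpy as np

a = np.array([0.5, -1.2, 3.0])   # coefficient weights a_1..a_3 (illustrative values)
b = 2.0                          # bias term b
x = np.array([1.0, 0.3, 2.5])    # feature values x_1..x_3

y_hat = np.dot(a, x) + b         # a_1*x_1 + a_2*x_2 + a_3*x_3 + b
print(y_hat)                     # 9.64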
Several key points about linear regression:
- Modeling is quick and easy, and it works particularly well when the relationship being modeled is not very complex and the amount of data is small.
- Very intuitive to understand and interpret.
- Linear regression is very sensitive to outliers.
Python example:
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn import metrics

# Load the Boston housing data
data = datasets.load_boston()

# Define an evaluation function
def evaluation(y_true, y_pred, index_name='OLS'):
    df = pd.DataFrame(index=[index_name],
                      columns=['mean absolute error', 'mean squared error', 'R2'])
    df['mean absolute error'] = round(metrics.mean_absolute_error(y_true, y_pred), 4)
    df['mean squared error'] = metrics.mean_squared_error(y_true, y_pred)
    df['R2'] = metrics.r2_score(y_true, y_pred)
    return df
df = pd.DataFrame(data.data, columns=data.feature_names)
target = pd.DataFrame(data.target, columns=['MEDV'])
Simple visual analysis:
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="whitegrid", color_codes=True)
g = sns.pairplot(df[list(df.columns)[:5]], hue='ZN', palette="husl",
                 diag_kind="hist", height=2.5)  # use size=2.5 on older seaborn versions
for ax in g.axes.flat:
    plt.setp(ax.get_xticklabels(), rotation=45)
plt.tight_layout()
Correlation coefficient heatmap of the features:
# np.corrcoef computes Pearson correlation coefficients between rows, so cm is a symmetric matrix.
# np.corrcoef(a) correlates rows; use np.corrcoef(a, rowvar=0) to correlate columns instead.
cm = np.corrcoef(df[list(df.columns)[:5]].values.T)

sns.set(font_scale=1.5)  # font_scale controls the font size
cols = list(df.columns)[:5]
hm = sns.heatmap(cm, cbar=True, annot=True, square=True, fmt='.2f',
                 annot_kws={'size': 15}, yticklabels=cols, xticklabels=cols)
# plt.tight_layout()
# plt.savefig('./figures/corr_mat.png', dpi=300)
OLS with the statsmodels module:
import statsmodels.api as sm

X = df[df.columns].values
y = target['MEDV'].values

# Add a constant (intercept) column
X = sm.add_constant(X)

# Build the model
model = sm.OLS(y, X).fit()
prediction = model.predict(X)
print(model.summary())
You can also use the sklearn module:
from sklearn import linear_model

lm = linear_model.LinearRegression()
model = lm.fit(X, y)
y_pred = lm.predict(X)

lm.score(X, y)   # R2 score
lm.coef_         # coefficients
lm.intercept_    # intercept
evaluation(y, y_pred)
Polynomial Regression
We need polynomial regression when we want to model data that is not linearly separable. In this regression technique, the best-fit line is not a straight line but a curve that fits the data points. In polynomial regression, the power of some independent variables is greater than 1, for example:

Y = a_1*x_1 + a_2*x_2^2 + a_3*x_3^3 + ... + b
We can give some variables exponents and leave others without, and we can also choose exactly which exponent each variable gets. However, choosing the right exponent for each variable naturally requires some understanding of how the data relates to the output.
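As a quick sketch of what this feature expansion looks like in practice, scikit-learn's PolynomialFeatures generates the powers and interaction terms automatically (the toy sample below is made up):

from sklearn.preprocessing import PolynomialFeatures
import numpy as np

X_toy = np.array([[2.0, 3.0]])         # one sample with features x1=2, x2=3
poly = PolynomialFeatures(degree=2)
print(poly.fit_transform(X_toy))
# [[1. 2. 3. 4. 6. 9.]]  ->  1, x1, x2, x1^2, x1*x2, x2^2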
Note
- Able to model nonlinearly separable data, which linear regression cannot do. It is generally more flexible and can capture fairly complex relationships.
- Gives full control over how the feature variables are modeled (the exponents can be specified).
- Requires careful design. Some knowledge of the data is needed to choose the best exponents.
- If the exponents are chosen poorly, the model can easily overfit.
Python example:
from sklearn.preprocessing import PolynomialFeatures

poly_reg = PolynomialFeatures(degree=4)
X_poly = poly_reg.fit_transform(X)

lin_reg_2 = linear_model.LinearRegression()
lin_reg_2.fit(X_poly, y)
y_pred = lin_reg_2.predict(poly_reg.fit_transform(X))

evaluation(y, y_pred, index_name='poly_reg')
As can be seen, the error is very small and R2 is very large; evaluated on its own training data, the model has overfit.
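One way to check this is to evaluate on held-out data instead of the training set. A minimal sketch (the 70/30 split and random_state are arbitrary choices, not part of the original example):

from sklearn.model_selection import train_test_split

# Hold out 30% of the data to measure generalization
X_train, X_test, y_train, y_test = train_test_split(X_poly, y, test_size=0.3, random_state=0)

lin_reg_3 = linear_model.LinearRegression()
lin_reg_3.fit(X_train, y_train)

# A large gap between training and test scores indicates overfitting
print(evaluation(y_train, lin_reg_3.predict(X_train), index_name='poly_train'))
print(evaluation(y_test, lin_reg_3.predict(X_test), index_name='poly_test'))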
Ridge Regression
Standard linear or polynomial regression will perform poorly when there is high collinearity among the feature variables. Collinearity is a near-linear relationship between independent variables. The presence of high collinearity can be detected in several ways (a small check sketch follows this list):
- A regression coefficient is not significant even though, in theory, that variable should be highly correlated with y.
- The regression coefficients change dramatically when you add or remove an X feature variable.
- The X feature variables are highly correlated with each other (check the correlation matrix).
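As an illustrative check (not part of the original workflow), the variance inflation factor from statsmodels can be used to flag collinear features; values far above roughly 10 are commonly read as a sign of strong collinearity:

from statsmodels.stats.outliers_influence import variance_inflation_factor

# VIF for each column of X (X already contains the constant column added earlier)
vif = [variance_inflation_factor(X, i) for i in range(X.shape[1])]
print(dict(zip(['const'] + list(df.columns), vif)))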
We can first look at the optimization objective of standard linear regression to get some insight into how ridge regression helps:

min_w Σ_i (y_i − w·x_i)²

where x represents the feature variables, w the weights and y the actual values. Ridge regression is a remedy for collinearity among the predictor variables of a regression model: because of this collinearity, the final regression model has high variance.
To alleviate this problem, ridge regression adds a small squared bias factor (an L2 penalty on the weights) to the objective:

min_w Σ_i (y_i − w·x_i)² + λ·Σ_j w_j²
This introduces a small amount of bias into the model, but greatly reduces its variance.
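For intuition, ridge regression also has a closed-form solution in which the penalty λ shows up directly. A small numpy sketch (λ = 1.0 is an arbitrary choice, and for simplicity the intercept column is penalized too, which a careful implementation would avoid):

import numpy as np

lam = 1.0  # regularization strength (illustrative)
# Closed-form ridge solution: w = (X^T X + lam * I)^(-1) X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(w_ridge[:5])  # first few coefficients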
Note
- Its assumptions are the same as those of least-squares regression, except that normality is not assumed.
- It shrinks the coefficient values but does not drive them to exactly 0, which means it performs no feature selection.
Python example:
from sklearn.linear_model import Ridge

ridge_reg = Ridge(alpha=1, solver="cholesky")
ridge_reg.fit(X, y)
y_pred = ridge_reg.predict(X)

evaluation(y, y_pred, index_name='ridge_reg')
Lasso Regression
Lasso regression is very similar to ridge regression: both techniques start from the same premise, adding a bias term to the regression optimization objective to reduce the effect of collinearity and thereby the variance of the model. However, instead of the squared penalty used by ridge regression, lasso penalizes the absolute values of the weights:

min_w Σ_i (y_i − w·x_i)² + λ·Σ_j |w_j|
There are some differences between ridge and lasso regression, which essentially come down to the differences between L2 and L1 regularization (a small comparison sketch follows this list):
- Built-in feature selection: this is frequently cited as a useful property of the L1 norm, and one the L2 norm does not have. It is a consequence of the L1 norm tending to produce sparse coefficients. For example, suppose a model has 100 coefficients but only 10 of them are non-zero; this effectively says "the other 90 predictors are useless for predicting the target value". The L2 norm produces non-sparse coefficients and therefore lacks this property. So lasso regression can be said to perform a kind of "parameter selection", since the unselected feature variables get a total weight of 0.
- Sparsity: only a small number of entries in a matrix (or vector) are non-zero. The L1 norm tends to produce many coefficients that are exactly 0 or very small, with only a few large ones. This ties in with the previous point: lasso performs feature selection.
- Computational efficiency: the L1 norm has no analytical solution, while the L2 norm does, so L2 solutions can be computed efficiently. However, L1 solutions are sparse, which allows them to be used with sparse algorithms and makes the computation more efficient.
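A quick way to see the sparsity difference on the data used above (the alpha values are arbitrary) is to count how many coefficients each model drives to exactly zero; lasso will typically zero out noticeably more coefficients than ridge does:

import numpy as np
from sklearn.linear_model import Lasso, Ridge

lasso_demo = Lasso(alpha=1.0).fit(X, y)
ridge_demo = Ridge(alpha=1.0).fit(X, y)

print('zero coefficients (lasso):', np.sum(lasso_demo.coef_ == 0))
print('zero coefficients (ridge):', np.sum(ridge_demo.coef_ == 0))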
Python example:
from sklearn.linear_model import Lasso

lasso_reg = Lasso(alpha=0.1)
lasso_reg.fit(X, y)
y_pred = lasso_reg.predict(X)

evaluation(y, y_pred, index_name='lasso_reg')
Elastic Net Regression
Elastic net is a hybrid of the lasso and ridge regression techniques. It uses both L1 and L2 regularization, combining the effects of the two:

min_w Σ_i (y_i − w·x_i)² + λ_1·Σ_j |w_j| + λ_2·Σ_j w_j²
A practical advantage of trading off between lasso and ridge is that it allows elastic net to inherit some of ridge's stability under rotation.
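For reference, scikit-learn's ElasticNet exposes this tradeoff through its alpha and l1_ratio parameters; according to its documentation, the objective it minimizes is roughly

(1/(2n))·Σ_i (y_i − w·x_i)² + alpha·l1_ratio·Σ_j |w_j| + 0.5·alpha·(1 − l1_ratio)·Σ_j w_j²

so the l1_ratio=0.7 used in the example below tilts the penalty toward the L1 (lasso) term.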
Note
- It encourages a group effect among highly correlated variables, rather than zeroing some of them out the way lasso does.
- There is no limit to the number of selected variables.
Python example:
enet_reg = linear_model.ElasticNet(l1_ratio=0.7)
enet_reg.fit(X, y)
y_pred = enet_reg.predict(X)

evaluation(y, y_pred, index_name='enet_reg')
Summary:
This article has briefly summarized 5 common types of regression and their properties. All of the regularized regression methods (lasso, ridge and elastic net) work well when the dataset is high-dimensional and there is multicollinearity among the variables. We hope you found it helpful!