Reposted from: http://blog.csdn.net/slade_sha/article/details/53164905
First, let's look at an example of fitting:
In the figure, the red curve is clearly overfitting, while the green curve is a reasonable fit; to avoid overfitting, we can introduce regularization.
In the fitting process, the quality of the fit can be measured with the root mean square error (also called the standard error), RMSE = √(∑dᵢ²/n), where n is the number of measurements and dᵢ is the deviation of each measured value from the true value.
When setting up the regression in practice, we also need to account for an error term,
which gives a formulation similar to simple linear regression. When this objective is tuned to optimize the fit, a constraint is added on top of it: the penalty function.
This penalty function can take several forms; the most commonly used are the L1 and L2 penalties, which look roughly as follows:
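The original post shows the penalized objective as an image; a standard reconstruction of it (my notation, not the author's figure) is, with β the coefficients, λ ≥ 0 the penalty weight and q the exponent of the penalty:

\min_{\beta}\ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}x_{ij}\beta_j\Big)^2+\lambda\sum_{j=1}^{p}\lvert\beta_j\rvert^{q}

Here q = 1 gives the L1 penalty and q = 2 gives the L2 penalty.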
Let's talk about the two most common cases, q = 1 and q = 2:
q = 1 is what we want to talk about today: lasso regression. Why can lasso control overfitting? Because in the training process there may be hundreds or even thousands of variables, and with so many variables the objective function may end up over-explaining the data. The q = 1 penalty function limits the number of variables, so some of the less important ones can be filtered out first; see the figure below:
As the drawing shows, unless the contour happens to be tangent to an edge of the square constraint region (a special case), it will first touch the region at a vertex, where the coefficient on one of the axes is exactly 0; this is what performs variable selection.
When q = 2, the constraint region is the blue circle shown above, and the solution can be any point on that circle; the q = 2 case is called ridge regression. As the figure shows, ridge regression does not compress coefficients all the way to zero, so it does not select variables.
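To make the geometric picture concrete, here is a minimal sketch on synthetic data (x_demo, y_demo and the fit names are made up, not the author's dataset): at the same λ, the lasso penalty sets several coefficients exactly to zero, while the ridge penalty only shrinks them.

library(glmnet)
set.seed(1)
x_demo <- matrix(rnorm(100 * 10), nrow = 100)    # 100 observations, 10 predictors
y_demo <- 2 * x_demo[, 1] - x_demo[, 2] + rnorm(100)
lasso_fit <- glmnet(x_demo, y_demo, alpha = 1)   # q = 1 penalty
ridge_fit <- glmnet(x_demo, y_demo, alpha = 0)   # q = 2 penalty
# Count coefficients that are exactly zero at the same lambda
sum(as.vector(coef(lasso_fit, s = 0.5)) == 0)    # lasso: several exact zeros
sum(as.vector(coef(ridge_fit, s = 0.5)) == 0)    # ridge: essentially none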
Lasso regression:
The strength of lasso regression shows when building generalized linear models. Generalized linear models cover a univariate continuous dependent variable, a multivariate continuous dependent variable, a non-negative count dependent variable, a binary dependent variable, and a multinomial (categorical) dependent variable; whether the dependent variable is continuous or discrete, lasso can handle it. In general, lasso places very few demands on the data, which is why it is so widely used. In addition, lasso can select variables and adjust model complexity. Variable selection means that not every variable is forced into the model; variables are put in selectively so as to obtain better performance parameters. Complexity adjustment means controlling the complexity of the model through a set of parameters in order to avoid overfitting. For a linear model, complexity is directly related to the number of variables: the more variables, the more complex the model. More variables often give a seemingly better fit, but at the same time they carry a greater risk of overfitting.
The complexity of lasso is controlled by λ: the larger λ is, the heavier the penalty on linear models with many variables, so the final model keeps fewer variables. Another parameter, α, controls how the model behaves with highly correlated data: α = 1 gives lasso regression and α = 0 gives ridge regression, matching the forms and purposes of the penalty functions above. By trying several different values of λ we can pick the parameters at the best λ, and cross-validation (CV) can be used to choose the best model.
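For reference, the objective that glmnet actually optimizes mixes the two penalties through α (this is the form given in the glmnet documentation, with ℓ the loss for the chosen family):

\min_{\beta_0,\beta}\ \frac{1}{N}\sum_{i=1}^{N}\ell\big(y_i,\ \beta_0+x_i^{\top}\beta\big)+\lambda\Big[(1-\alpha)\tfrac{1}{2}\lVert\beta\rVert_2^2+\alpha\lVert\beta\rVert_1\Big]

so α = 1 reduces to the lasso penalty and α = 0 to the ridge penalty.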
## Read the data
setwd("~/desktop")
library(glmnet)
train_origin <- read.table('trian.txt', header = T, fill = T)
test_origin <- read.table('test.txt', header = T, fill = T)
train_test1 <- train_origin
train_test1 <- train_test1[, -9]                      # drop the 9th column
train_test1$tag <- as.factor(train_test1$tag)
train_test1$risk_level <- as.factor(train_test1$risk_level)
x <- train_test1[, 3:11]                              # predictors
y <- train_test1[, 2]                                 # response
## One-hot encoding
x1 <- model.matrix(~., x)
The data usually contains discrete (categorical) variables, and glmnet in R takes a numeric matrix as input, so the raw data needs this extra preprocessing step, otherwise an error will be thrown. In addition, if the variables differ greatly in scale, they should also be standardized; R can handle this as well: either add the parameter standardize = TRUE to the glmnet() call or use the scale() function, whichever you prefer.
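As a minimal sketch of this preprocessing (toy data with made-up column names, not the author's file), model.matrix() expands the factor into dummy 0/1 columns, and the commented line shows where standardize = TRUE would go:

df_demo <- data.frame(income = c(1200, 800, 1500, 300),
                      risk_level = factor(c("low", "high", "mid", "low")))
x_demo <- model.matrix(~., df_demo)   # the factor becomes dummy columns
head(x_demo)
# glmnet(x_demo, y_demo, family = "binomial", standardize = TRUE)   # internal scaling instead of scale()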
## Train the model
model = glmnet(x1, y, family = "binomial", nlambda = 50, alpha = 1)
The family argument specifies the type of response the model is fitted to:
family | explanation
gaussian | univariate continuous
mgaussian | multivariate continuous
poisson | count
binomial | binary
multinomial | categorical
nlambda sets how many values of λ are tried along the path (50 here), and each λ gives one model; alpha is the α mentioned above, the mixing parameter of the penalty function: normally 1 means lasso and 0 means ridge regression.
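As a small sketch of inspecting the fitted path (model is the object trained above; the lambda index 10 is arbitrary):

print(model)                        # Df (non-zero coefficients), %Dev and Lambda at each step
plot(model, xvar = "lambda")        # coefficient paths: heavier shrinkage as lambda grows
coef(model, s = model$lambda[10])   # coefficients at one particular lambda on the path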
Model selection here can be done by cross-validation, using the built-in function:
cvmodel = cv.glmnet(x1, y, family = "binomial", type.measure = "class", nfolds = 10)
There is one more argument here, type.measure: it specifies which quantity the cross-validation is expected to minimize, in other words, which metric function is used to evaluate the model:
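For reference, the values documented for type.measure in cv.glmnet are "deviance" (the default), "class" (misclassification error, for binomial and multinomial models), "auc" (binomial only), "mse" and "mae". A minimal sketch of using the cross-validated object from the call above:

plot(cvmodel)                      # CV error as a function of lambda
cvmodel$lambda.min                 # lambda with the smallest CV error
coef(cvmodel, s = "lambda.min")    # coefficients of the model chosen by CV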