Original source: http://www.cnblogs.com/pinard/p/6035872.html, with a number of amendments made to the original. The LogisticRegression API in sklearn is shown below; official documentation: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression
class sklearn.linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='liblinear', max_iter=100, multi_class='ovr', verbose=0, warm_start=False, n_jobs=1)
1. Overview
In scikit-learn, three classes relate to logistic regression: LogisticRegression, LogisticRegressionCV and logistic_regression_path. The main difference between LogisticRegression and LogisticRegressionCV is that LogisticRegressionCV uses cross-validation to select the regularization coefficient C, while LogisticRegression requires the regularization coefficient to be specified explicitly each time. Apart from cross-validating C, the two classes are used in the same way.
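A minimal sketch (on synthetic data) contrasting the two classes: LogisticRegression takes a fixed regularization coefficient C, while LogisticRegressionCV picks C from a list of candidates (Cs) by cross-validation. The candidate values and dataset here are illustrative assumptions, not from the original post.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Fixed C: we have to choose the regularization strength ourselves.
clf = LogisticRegression(C=1.0).fit(X, y)

# Cross-validated C: the class evaluates each candidate in Cs and keeps the best one.
clf_cv = LogisticRegressionCV(Cs=[0.01, 0.1, 1.0, 10.0], cv=5).fit(X, y)
print(clf_cv.C_)   # the C selected by cross-validation
```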
The logistic_regression_path class is special: after fitting data it cannot be used directly for prediction; it can only select suitable logistic regression coefficients and regularization coefficients for the fitted data. It is mainly used for model selection. This class is rarely used in practice, so it is not described further below.
In addition, there is an easily misunderstood class, RandomizedLogisticRegression. Although its name contains "logistic regression", it mainly uses L1-regularized logistic regression to do feature selection; it belongs to scikit-learn's dimensionality reduction tools rather than to the classification algorithms we usually mean.
The explanations below focus on how to choose the important parameters of LogisticRegression and LogisticRegressionCV; these parameters have the same meaning in both classes.

2. Regularization selection parameter: penalty
LogisticRegression and LogisticRegressionCV include a regularization term by default. The penalty parameter can take the values 'l1' and 'l2', corresponding to L1 regularization and L2 regularization; the default is L2 regularization.
If our main purpose is to control overfitting, choosing L2 is usually enough. However, if the model still overfits with L2, that is, the prediction performance remains poor, L1 regularization can be considered. Also, if the model has very many features and we want the coefficients of the less important ones driven to zero so that the coefficient vector is sparse, L1 regularization is appropriate as well.
The choice of penalty affects which loss-function optimization algorithms can be used, i.e. the choice of the solver parameter. With L2 regularization, all 4 algorithms {'newton-cg', 'lbfgs', 'liblinear', 'sag'} are available. With L1 regularization, only 'liblinear' can be chosen. The reason is that the L1-regularized loss function is not continuously differentiable, while the three algorithms {'newton-cg', 'lbfgs', 'sag'} all require the first or second derivatives of the loss function to be continuous; 'liblinear' has no such requirement.
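A minimal sketch of the penalty / solver interplay described above, on illustrative synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# L2 regularization: any of the four solvers is allowed.
LogisticRegression(penalty='l2', solver='lbfgs').fit(X, y)   # 'newton-cg', 'sag', 'liblinear' also work

# L1 regularization: only 'liblinear' is accepted here; combining penalty='l1'
# with 'lbfgs', 'newton-cg' or 'sag' raises an error when fit() is called.
LogisticRegression(penalty='l1', solver='liblinear').fit(X, y)
```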
The specific differences between these 4 algorithms, and what they imply for us, are the topic of the next section.

3. Optimization algorithm selection parameter: solver
The solver parameter determines how the logistic regression loss function is optimized. There are 4 algorithms to choose from:
a) liblinear: implemented with the open-source LIBLINEAR library, which internally uses coordinate descent to iteratively optimize the loss function.
b) lbfgs: a quasi-Newton method that uses the second-derivative matrix of the loss function, i.e. the Hessian matrix, to iteratively optimize the loss function.
c) newton-cg: also a member of the Newton family; it likewise uses the Hessian matrix of the loss function to iteratively optimize it.
d) sag: stochastic average gradient descent, a variant of gradient descent. Its difference from ordinary gradient descent is that each iteration uses only a part of the samples to compute the gradient, which suits data with many samples. SAG is a linearly convergent algorithm and is much faster than plain SGD. For more on SAG, see the blog post on the linearly convergent stochastic optimization algorithms SAG and SVRG (stochastic gradient descent).
As the descriptions above suggest, newton-cg, lbfgs and sag all require the first or second derivatives of the loss function to be continuous, so they cannot be used with L1 regularization, only with L2 regularization. liblinear, on the other hand, supports both L1 and L2 regularization.
Meanwhile, because sag uses only part of the samples in each gradient iteration, do not choose it when the sample size is small; but if the sample size is very large, say more than 100,000, sag is the first choice. However, sag cannot be used with L1 regularization, so when you have many samples and also need L1 regularization you have to make a trade-off: either subsample to reduce the sample size, or fall back to L2 regularization.
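A hedged sketch for a very large sample, where 'sag' is usually the fastest choice; note it only supports L2. Standardizing the features first is an assumption that generally helps sag converge, not a requirement of the API, and the dataset size here is purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200000, n_features=20, random_state=0)
X = StandardScaler().fit_transform(X)   # sag benefits from features on a similar scale

big_clf = LogisticRegression(solver='sag', penalty='l2', max_iter=200).fit(X, y)
```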
The official sklearn documentation describes the choice of solver as follows:
In a nutshell, one may choose the solver with the following rules:
| Case | Solver |
| --- | --- |
| Small dataset or L1 penalty | "liblinear" |
| Multinomial loss or large dataset | "lbfgs", "sag" or "newton-cg" |
| Very large dataset | "sag" |
From the above, you might feel that since newton-cg, lbfgs and sag carry so many restrictions, we could simply choose liblinear whenever the sample is not large. Not quite, because liblinear has its own weaknesses. Logistic regression comes in binary and multinomial forms; for multinomial logistic regression the common schemes are one-vs-rest (OvR) and many-vs-many (MvM), and MvM is generally more accurate than OvR. The frustrating part is that liblinear only supports OvR, not MvM, so if we need a relatively accurate multinomial logistic regression we cannot choose liblinear. That in turn means that if we need a relatively accurate multinomial logistic regression, we cannot use L1 regularization.
In summary: liblinear supports both L1 and L2 but only OvR multi-class classification; 'lbfgs', 'sag' and 'newton-cg' support only L2 but support both OvR and MvM multi-class classification.
So what exactly is the difference between OvR and MvM? We discuss it in the next section.

4. Classification method selection parameter: multi_class
The multi_class parameter determines the multi-class strategy. It can take the values 'ovr' and 'multinomial'; the default is 'ovr'.
'ovr' is the one-vs-rest (OvR) mentioned earlier, and 'multinomial' is the many-vs-many (MvM) mentioned earlier. For binary logistic regression there is no difference between OvR and multinomial; the difference only shows up in multinomial (multi-class) logistic regression.
The idea behind OvR is simple: however many classes the problem has, we treat it as a collection of binary logistic regressions. Concretely, for the classification decision of class K, we take all samples of class K as positive examples and all remaining samples as negative examples, then fit the binary logistic regression described above to obtain the model for class K. The models for the other classes are obtained analogously.
MvM is relatively more involved; here we use its special case one-vs-one (OvO) to explain. If the model has T classes, each time we select two classes of samples from all T classes, call them T1 and T2, put together all samples whose labels are T1 or T2, treat T1 as positive and T2 as negative, and fit a binary logistic regression to obtain the model parameters. In total we need T(T-1)/2 binary classifiers.
From the descriptions above, OvR is relatively simple but its classification performance is usually slightly worse (this refers to most sample distributions; on some distributions OvR may actually do better), while MvM is relatively more accurate but not as fast as OvR.
If 'ovr' is selected, all 4 loss-function optimizers liblinear, newton-cg, lbfgs and sag can be chosen. But if 'multinomial' is selected, only newton-cg, lbfgs and sag can be used.
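A minimal sketch of the multi_class / solver interaction on a 3-class problem; the iris dataset is used here purely as an illustrative multi-class example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

# One-vs-rest: K separate binary models; all four solvers are allowed.
ovr_clf = LogisticRegression(multi_class='ovr', solver='liblinear').fit(X, y)

# Multinomial (MvM-style) loss: usually a bit more accurate, but 'liblinear'
# cannot be used; only 'newton-cg', 'lbfgs' and 'sag' are accepted.
mvm_clf = LogisticRegression(multi_class='multinomial', solver='lbfgs', max_iter=200).fit(X, y)
```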
5. Class weight parameter: class_weight

The class_weight parameter specifies the weights of the various classes in the classification model. If it is not given, all classes receive the same weight. If it is given, you can either pass 'balanced' to let the library compute the class weights, or enter the weight of each class yourself: for example, for a binary 0/1 model, class_weight={0:0.9, 1:0.1} gives class 0 a weight of 90% and class 1 a weight of 10%.
If class_weight is 'balanced', the library computes the weights from the number of training samples in each class: the more samples a class has, the lower its weight; the fewer samples, the higher its weight.
According to sklearn's official documentation, when class_weight is 'balanced' the class weights are computed as follows:
n_samples / (n_classes * np.bincount(y)), where n_samples is the number of samples, n_classes is the number of classes, and np.bincount(y) outputs the number of samples in each class. For example, for y=[1,0,0,1,1], np.bincount(y)=[2,3].
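A small numpy check of the 'balanced' formula quoted above, using the same example y:

```python
import numpy as np

y = np.array([1, 0, 0, 1, 1])
n_samples = len(y)                    # 5
n_classes = len(np.unique(y))         # 2
weights = n_samples / (n_classes * np.bincount(y).astype(float))
print(weights)   # [1.25  0.8333...]: the rarer class 0 gets the larger weight
```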
So what is class_weight for? In classification models we often run into two kinds of problems:
The first is that misclassification is very costly. For example, when classifying legitimate versus illegal users, the cost of classifying an illegal user as legitimate is very high. We would rather classify some legitimate users as illegal, since they can then be screened manually, than let an illegal user pass as legitimate. In this case we can appropriately increase the weight of the illegal-user class.
The second is a highly imbalanced sample. For example, suppose we have 10,000 binary samples of legitimate and illegal users, with 9,995 legitimate users and only 5 illegal ones. If we ignore the weights, we could simply predict every test sample as a legitimate user and the theoretical accuracy would be 99.95%, but the result would be meaningless. In this case we can choose 'balanced' and let the library automatically increase the weight of the illegal-user samples.
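A hedged sketch for the imbalanced case described above, on synthetic data, with class 1 standing in for the rare "illegal user" class by assumption; the hand-picked weights are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=10000, weights=[0.999, 0.001], random_state=0)

# Either hand-pick the weights to up-weight the rare class ...
manual_clf = LogisticRegression(class_weight={0: 0.1, 1: 0.9}).fit(X, y)

# ... or let scikit-learn derive them via n_samples / (n_classes * np.bincount(y)).
balanced_clf = LogisticRegression(class_weight='balanced').fit(X, y)
```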
Increasing the weight of a class means that, compared with equal weights, more samples will be classified into the high-weight class, which addresses both kinds of problem above.
Of course, for the second problem of sample imbalance, we can also consider the per-sample weight parameter mentioned in the next section, sample_weight, instead of class_weight. sample_weight is discussed in the next section.

6. Sample weight parameter: sample_weight
In the previous section we mentioned the problem of sample imbalance: when the samples are imbalanced, they are no longer an unbiased estimate of the true population, which may reduce the model's predictive power. In this case we can try to mitigate the problem by adjusting the sample weights. There are two ways to do so: the first is to use 'balanced' in class_weight; the second is to pass the weight of each sample through the sample_weight argument when calling the fit function.
When doing logistic regression in scikit-learn, if both methods are used, then the effective weight of a sample is class_weight * sample_weight.
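A minimal sketch of per-sample weights passed to fit(); when class_weight is also set, the effective weight of each sample is class_weight * sample_weight. The factor of 5 for the minority class is an arbitrary illustrative choice, not a recommendation from the original post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
sample_weight = np.where(y == 1, 5.0, 1.0)   # up-weight the minority class by hand

clf = LogisticRegression(class_weight='balanced')
clf.fit(X, y, sample_weight=sample_weight)
```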
The above is a summary of scikit-learn's logistic regression classes. Some other parameters, such as the regularization coefficient C (Cs under cross-validation) and the maximum number of iterations max_iter, do not differ much from those of other algorithm classes in the library, so they are not repeated here.