sklearn LogisticRegression (Logistic Regression)

Logistic regression:

Logistic regression can be used for probability prediction as well as classification, but only for linearly separable problems. It relates the true labels to the predicted probabilities, turns that relationship into a loss function (the log loss), and obtains the model parameters by minimizing that loss; the fitted parameters then define the model.

sklearn.linear_model.LogisticRegression official API:

Official API: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.logisticregression.html

class sklearn.linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='liblinear', max_iter=100, multi_class='ovr', verbose=0, warm_start=False, n_jobs=1)
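For orientation, here is a minimal usage sketch. The iris dataset and the explicit parameter values are arbitrary choices for illustration, and default values may differ slightly between scikit-learn versions:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small toy dataset and split it into train and test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Instantiate the classifier with explicit (default-like) parameters
clf = LogisticRegression(penalty='l2', C=1.0, solver='liblinear')
clf.fit(X_train, y_train)

print(clf.predict(X_test[:5]))        # predicted class labels
print(clf.predict_proba(X_test[:5]))  # predicted class probabilities
print(clf.score(X_test, y_test))      # mean accuracy on the test set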

Parameter interpretation

Regularization parameter (type of penalty term)

penalty : str, 'l1' or 'l2', default: 'l2'

Used to specify the norm used in the penalization. The 'newton-cg', 'sag' and 'lbfgs' solvers support only l2 penalties.

LogisticRegression uses a regularization term by default. The penalty parameter can be set to 'l1' or 'l2', corresponding to L1 and L2 regularization; the default is L2. If the main goal is simply to reduce overfitting, L2 is usually enough. However, if the model still overfits with L2, i.e. prediction performance is poor, L1 regularization can be considered. In addition, if the model has a great many features and we want the coefficients of the less important features driven to zero, so that the coefficient vector is sparse, L1 regularization can also be used.

The choice of penalty affects which loss-function optimizers (the solver parameter) are available. With L2 regularization, all four algorithms {'newton-cg', 'lbfgs', 'liblinear', 'sag'} can be chosen; with L1, only 'liblinear' can be used. This is because the L1-regularized loss function is not continuously differentiable, while the three optimizers {'newton-cg', 'lbfgs', 'sag'} all require continuous first or second derivatives of the loss function. 'liblinear' does not have that requirement.
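As a small illustration of this point, the sketch below fits synthetic data with penalty='l1' (which here requires solver='liblinear') and with penalty='l2', and counts how many coefficients are driven exactly to zero. The dataset and the value C=0.1 are assumptions chosen only for demonstration:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# A synthetic problem with many features, only a few of which are informative
X, y = make_classification(n_samples=500, n_features=50, n_informative=5, random_state=0)

for penalty in ('l1', 'l2'):
    # liblinear supports both 'l1' and 'l2'; newton-cg/lbfgs/sag would reject 'l1'
    clf = LogisticRegression(penalty=penalty, solver='liblinear', C=0.1)
    clf.fit(X, y)
    n_zero = np.sum(clf.coef_ == 0)
    print(penalty, 'zero coefficients:', n_zero)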

dual : bool, default: False

Dual or primal formulation. The dual formulation is only implemented for the l2 penalty with the liblinear solver. Prefer dual=False when n_samples > n_features. In other words, the dual formulation applies only to L2 with liblinear; the default is False, which is preferable when the number of samples exceeds the number of features.

C : float, default: 1.0

Inverse of regularization strength; must be a positive float. As in support vector machines, smaller values specify stronger regularization. C is the reciprocal of the regularization coefficient λ and is usually left at its default of 1.
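A quick sketch of the direction of this effect, on synthetic data with a few arbitrary C values: smaller C should shrink the coefficients more.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

for C in (0.01, 1.0, 100.0):
    # Smaller C means stronger regularization, hence smaller coefficients
    clf = LogisticRegression(C=C, solver='liblinear').fit(X, y)
    print('C=%g, mean |coef|=%.3f' % (C, np.mean(np.abs(clf.coef_))))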

fit_intercept : bool, default: True

Specifies whether a constant (a.k.a. bias or intercept) should be added to the decision function. In other words, whether there is an intercept; by default there is.

intercept_scaling : float, default: 1

Useful only when the solver 'liblinear' is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a "synthetic" feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight.

Note! The synthetic feature weight is subject to l1/l2 regularization as all other features. To lessen the effect of regularization on the synthetic feature weight (and therefore on the intercept), intercept_scaling has to be increased.

In short, this parameter is useful only if the solver is 'liblinear' and fit_intercept is set to True.

Optimization algorithm selection parameter

solver

{'newton-cg', 'lbfgs', 'liblinear', 'sag'}, default: 'liblinear'

Algorithm to use in the optimization problem.

For small datasets, 'liblinear' is a good choice, whereas 'sag' is faster for large ones.

For multiclass problems, only 'newton-cg', 'sag' and 'lbfgs' handle multinomial loss; 'liblinear' is limited to one-versus-rest schemes.

'newton-cg', 'lbfgs' and 'sag' only handle the l2 penalty.

Note that 'sag' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.

New in version 0.17: Stochastic Average Gradient descent solver.
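Following the note above about feature scale, here is a minimal sketch that standardizes the features before using the 'sag' solver. The breast cancer dataset and the max_iter value are assumptions for illustration:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# 'sag' converges quickly only when features are on a similar scale,
# so standardize them before fitting
model = make_pipeline(StandardScaler(),
                      LogisticRegression(solver='sag', max_iter=1000))
model.fit(X, y)
print(model.score(X, y))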

The solver parameter determines the optimization method for the logistic regression loss function. There are four algorithms to choose from:

a) liblinear: implemented with the open-source LIBLINEAR library; internally it uses coordinate descent to iteratively optimize the loss function.

b) lbfgs: a quasi-Newton method that uses the matrix of second derivatives of the loss function, i.e. the Hessian, to iteratively optimize the loss function.

c) newton-cg: also a member of the Newton family; it uses the matrix of second derivatives of the loss function, i.e. the Hessian, to iteratively optimize the loss function.

d) sag: Stochastic Average Gradient descent, a variant of gradient descent. Unlike ordinary gradient descent, each iteration uses only a subset of the samples to compute the gradient, which makes it suitable when there is a large amount of sample data.

As the descriptions above suggest, newton-cg, lbfgs and sag all require continuous first or second derivatives of the loss function, so they cannot be used with L1 regularization, whose loss is not continuously differentiable; they are restricted to L2 regularization. liblinear supports both L1 and L2 regularization. At the same time, because sag uses only part of the samples in each gradient iteration, it should not be chosen when the sample size is small; but when the sample size is very large, say more than 100,000, sag is the first choice. Since sag cannot be used with L1 regularization, when you have a huge number of samples and also need L1 regularization you must make a trade-off yourself: either subsample the data to reduce the sample size, or fall back to L2 regularization.

From this description one might conclude that, since newton-cg, lbfgs and sag come with so many restrictions, we could simply choose liblinear whenever the sample is not large. That would be wrong, because liblinear has its own weaknesses. Logistic regression comes in binary and multiclass forms, and the two common schemes for the multiclass case are one-vs-rest (OvR) and many-vs-many (MvM); MvM is generally more accurate than OvR. Unfortunately, liblinear only supports OvR, not MvM, so if we need a relatively accurate multiclass logistic regression we cannot choose liblinear. This also means that if we need a relatively accurate multiclass logistic regression, we cannot use L1 regularization.

The summary below shows which optimization algorithm fits which situation:

Penalty L1, solver liblinear:
liblinear is suitable for small datasets. If L2 regularization still overfits, i.e. prediction performance is poor, L1 regularization can be considered. L1 is also useful when the model has many features and we want the coefficients of the unimportant features driven to zero so that the coefficient vector is sparse.

Penalty L2, solver liblinear:
liblinear only supports one-vs-rest (OvR) for multiclass logistic regression, not many-vs-many (MvM), even though MvM is relatively more accurate.

Penalty L2, solver lbfgs / newton-cg / sag:
Suitable for larger datasets; these solvers support both one-vs-rest (OvR) and many-vs-many (MvM) multiclass logistic regression.

Penalty L2, solver sag:
If the sample size is very large, say more than 100,000, sag is the first choice, but it cannot be used with L1 regularization.
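These combinations can be probed directly. The sketch below tries a few penalty/solver pairs and reports which ones are accepted; the exact error messages and accepted combinations may vary between scikit-learn versions, so this is only a probing sketch:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

combos = [('l1', 'liblinear'),   # supported
          ('l2', 'liblinear'),   # supported
          ('l1', 'lbfgs'),       # rejected: lbfgs handles only the l2 penalty
          ('l2', 'sag')]         # supported

for penalty, solver in combos:
    try:
        LogisticRegression(penalty=penalty, solver=solver, max_iter=1000).fit(X, y)
        print(penalty, solver, '-> OK')
    except ValueError as err:
        print(penalty, solver, '->', err)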

The specific differences between OvR and MvM are explained in the next section.

Classification mode selection parameter:

multi_class : str, {'ovr', 'multinomial'}, default: 'ovr'

Multiclass option can be either 'ovr' or 'multinomial'. If the option chosen is 'ovr', then a binary problem is fit for each label. Otherwise the loss minimised is the multinomial loss fit across the entire probability distribution. Works only for the 'newton-cg', 'sag' and 'lbfgs' solvers.

New in version 0.18: Stochastic Average Gradient descent solver for the 'multinomial' case.

'ovr' is the one-vs-rest (OvR) mentioned earlier, and 'multinomial' is the many-vs-many (MvM) mentioned earlier. For binary logistic regression, OvR and multinomial make no difference; the difference matters mainly for multiclass logistic regression. So what is the difference between OvR and MvM?

The idea behind OvR is very simple: no matter how many classes the logistic regression has, it can be treated as a collection of binary logistic regressions. Specifically, for the classification decision of class K, we treat all class-K samples as positive examples and all remaining samples as negative examples, then fit the binary logistic regression described above to obtain the classification model for class K. The classification models for the other classes are obtained by analogy.

MvM is relatively more complex; here the special case of MvM, one-vs-one (OvO), is used to explain it. If the model has T classes, each time we select two classes of samples from all T classes, call them class T1 and class T2, put together all samples whose label is T1 or T2, treat T1 as the positive class and T2 as the negative class, and run binary logistic regression to obtain the model parameters. In total we need T(T-1)/2 classifiers.

As can be seen, OvR is relatively simple but its classification performance is usually slightly worse (this refers to most sample distributions; under some distributions OvR may do better), while MvM is relatively more accurate but not as fast as OvR. If 'ovr' is selected, all four loss-function optimizers liblinear, newton-cg, lbfgs and sag can be chosen; but if 'multinomial' is selected, only newton-cg, lbfgs and sag can be chosen.
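A small sketch comparing the two modes on a multiclass toy dataset; solver='lbfgs' is used because 'multinomial' requires newton-cg, lbfgs or sag. Note that newer scikit-learn releases deprecate the multi_class parameter, so this sketch assumes a version matching the API documented here:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

for mode in ('ovr', 'multinomial'):
    # 'multinomial' works only with newton-cg, sag or lbfgs
    clf = LogisticRegression(multi_class=mode, solver='lbfgs', max_iter=1000)
    scores = cross_val_score(clf, X, y, cv=5)
    print(mode, 'mean CV accuracy: %.3f' % scores.mean())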

Class weight parameter (to handle cost-sensitive misclassification and class imbalance):

class_weight : dict or 'balanced', default: None

Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

The "balanced" mode uses the values of Y to automatically adjust weights inverselyproportional to class frequencies in the Input data as N_samples/(n_classes *np.bincount (y)).

Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

New in version 0.17: class_weight='balanced' instead of the deprecated class_weight='auto'.

The class_weight parameter indicates the weight of each class in the classification model. If it is not given, weights are not considered, i.e. all classes have the same weight. If it is given, you can either choose 'balanced' and let the library compute the class weights, or enter the weight of each class yourself. For example, for a binary 0/1 model we could define class_weight={0: 0.9, 1: 0.1}, so that class 0 has a weight of 90% and class 1 has a weight of 10%.

If class_weight is 'balanced', the library computes the weights from the number of training samples of each class: the more samples a class has, the lower its weight; the fewer samples, the higher its weight. When class_weight is 'balanced', the class weights are computed as: n_samples / (n_classes * np.bincount(y))

Here n_samples is the number of samples, n_classes is the number of classes, and np.bincount(y) outputs the number of samples of each class; for example, if y = [1, 0, 0, 1, 1], then np.bincount(y) = [2, 3].
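The formula can be reproduced directly with NumPy, using the same toy label vector as above (a minimal sketch):

import numpy as np

y = np.array([1, 0, 0, 1, 1])
n_samples = len(y)                # 5
n_classes = len(np.unique(y))     # 2
counts = np.bincount(y)           # [2, 3]

# Weight of each class under class_weight='balanced'
weights = n_samples / (n_classes * counts)
print(weights)  # [1.25, 0.8333...]; class 0, the rarer class here, gets the larger weight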

What effect does class_weight have? In classification models, we often encounter two kinds of problems:

The first is that misclassification is very costly. For example, when classifying legitimate users and illegal users, the cost of classifying an illegal user as legitimate is very high. We would rather classify some legitimate users as illegal, since they can then be screened manually, than classify illegal users as legitimate. In this case, we can appropriately increase the weight of the illegal-user class.

The second is a highly imbalanced sample. For example, suppose we have 10,000 binary samples of legitimate and illegal users, 9,995 of them legitimate and only 5 illegal. If we ignore the weights, we could simply predict every test sample as a legitimate user; the prediction accuracy would theoretically be 99.95%, but the model would be meaningless. In this case, we can choose 'balanced' and let the library automatically increase the weight of the illegal-user samples.

Increasing the weight of a class means that, compared with training without weights, more samples will be classified into the high-weight class, which can mitigate both kinds of problems above, as the sketch below illustrates.
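As a sketch of this effect on an artificially imbalanced dataset (the roughly 99:1 class ratio, the custom weight dict and the choice of recall as the metric are assumptions for illustration), compare the recall on the minority class with and without class weighting:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Highly imbalanced binary problem: roughly 99% class 0, 1% class 1
X, y = make_classification(n_samples=5000, weights=[0.99, 0.01], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

for cw in (None, 'balanced', {0: 0.1, 1: 0.9}):
    clf = LogisticRegression(class_weight=cw, solver='liblinear').fit(X_train, y_train)
    rec = recall_score(y_test, clf.predict(X_test))  # recall on the minority class (label 1)
    print('class_weight=%s -> minority recall %.2f' % (cw, rec))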

Of course, for the second problem of sample imbalance, we can also consider using the sample weight parameter described in the next section, sample_weight, instead of class_weight.

sample_weight
