Call Python's sklearn to implement the logistic regression algorithm


First of all, how to implement it: the relationships among the imported library, the class, and its methods were not very clear to me before, but now I understand them ...


from sklearn.datasets import load_iris               # import the dataset loader
from sklearn.linear_model import LogisticRegression  # import the LogisticRegression class

# load the iris dataset
iris = load_iris()
samples = iris.data       # feature matrix
target = iris.target      # class label vector

classifier = LogisticRegression()   # use the class; all parameters take their defaults
classifier.fit(samples, target)     # learn from the training data; no return value needed
x = classifier.predict([[5, 3, 5, 2.5]])   # test data (as a 2-D array); returns the predicted class label
print(x)
# What we actually import is a class from sklearn.linear_model: LogisticRegression.
# Its most common methods are fit (train the classification model) and
# predict (predict the labels of test samples).
# It does not, however, return the weight vector w learned by the LR model,
# which feels like a flaw.


The code above used


classifier = LogisticRegression()  # use the class; all parameters take their defaults

Here all parameters take their default values, but in fact we can set many of them ourselves. For that we need the official parameter description, which is as follows:

sklearn.linear_model.LogisticRegression
class sklearn.linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None)

Logistic Regression (aka Logit, MaxEnt) classifier.

In the multiclass case, the training algorithm uses a one-vs-all (OvA) scheme, rather than the "true" multinomial LR.

This class implements L1- and L2-regularized logistic regression using the liblinear library. It can handle both dense and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit floats for optimal performance; any other input format will be converted (and copied).
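As the note above says, liblinear accepts both dense and sparse input. A minimal sketch (assuming scipy is available) of fitting directly on a CSR matrix of 64-bit floats:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
# convert the dense feature matrix to a CSR sparse matrix of 64-bit floats,
# the format the docs recommend for optimal performance
X_sparse = csr_matrix(iris.data.astype(np.float64))

clf = LogisticRegression()
clf.fit(X_sparse, iris.target)      # sparse input is accepted directly
print(clf.predict(X_sparse[:3]))    # predictions for the first three samples
```

No conversion or copy happens here, since the input is already in the recommended format.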

Parameters:

penalty : string, 'l1' or 'l2'   (the type of penalty)

Used to specify the norm used in the penalization.

dual : boolean

Dual or primal formulation. The dual formulation is only implemented for the l2 penalty. Prefer dual=False when n_samples > n_features.

C : float, optional (default=1.0)

Inverse of regularization strength; must be a positive float. As in support vector machines, smaller values specify stronger regularization.

fit_intercept : bool, default: True

Specifies whether a constant (a.k.a. bias or intercept) should be added to the decision function.

intercept_scaling : float, default: 1

When self.fit_intercept is True, the instance vector x becomes [x, self.intercept_scaling], i.e. a "synthetic" feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic feature weight. Note! The synthetic feature weight is subject to l1/l2 regularization like all other features. To lessen the effect of regularization on the synthetic feature weight (and therefore on the intercept), intercept_scaling has to be increased.

class_weight : {dict, 'auto'}, optional   (handles class imbalance, similar to cost-sensitive learning)

Over-/undersamples the samples of each class according to the given weights. If not given, all classes are supposed to have weight one. The 'auto' mode selects weights inversely proportional to class frequencies in the training set.

random_state : int seed, RandomState instance, or None (default)

The seed of the pseudo-random number generator to use when shuffling the data.

tol : float, optional

Tolerance for the stopping criteria.
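To see how these parameters are set in practice, here is a sketch that overrides a few of the defaults; the specific values are arbitrary, chosen only for illustration. (Note: newer scikit-learn versions spell the 'auto' class-weight mode as 'balanced', which is what this sketch uses.)

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()

# stronger regularization (smaller C), explicit l2 penalty,
# and frequency-balanced class weights instead of the defaults
clf = LogisticRegression(penalty='l2', C=0.1, fit_intercept=True,
                         class_weight='balanced', tol=1e-4)
clf.fit(iris.data, iris.target)
print(clf.get_params()['C'])   # the settings are stored on the estimator -> 0.1
```

Since iris is perfectly balanced, class_weight='balanced' changes nothing here, but it matters on imbalanced data.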

Attributes:

coef_ : array, shape = [n_classes, n_features]

Coefficients of the features in the decision function.

coef_ is a readonly property derived from raw_coef_ that follows the internal memory layout of liblinear.

intercept_ : array, shape = [n_classes]

Intercept (a.k.a. bias) added to the decision function. If fit_intercept is set to False, the intercept is set to zero.
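The coef_ and intercept_ attributes described above can be read off the estimator after fitting; a short sketch on iris (3 classes, 4 features):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
clf = LogisticRegression()
clf.fit(iris.data, iris.target)

# coef_ has shape [n_classes, n_features] for the multiclass iris problem;
# intercept_ has shape [n_classes]
print(clf.coef_.shape)        # (3, 4)
print(clf.intercept_.shape)   # (3,)
```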


There are several methods in the LogisticRegression class; the ones we use most often are fit and predict.


Methods

decision_function(X)            Predict confidence scores for samples.
densify()                       Convert coefficient matrix to dense array format.
fit(X, y)                       Fit the model according to the given training data.  (Trains the LR classifier; X is the training sample set and y is the corresponding label vector.)
fit_transform(X[, y])           Fit to data, then transform it.
get_params([deep])              Get parameters for this estimator.
predict(X)                      Predict class labels for samples in X.  (Predicts the labels of test samples, i.e. classifies them; X is the test sample set.)
predict_log_proba(X)            Log of probability estimates.
predict_proba(X)                Probability estimates.
score(X, y[, sample_weight])    Returns the mean accuracy on the given test data and labels.
set_params(**params)            Set the parameters of this estimator.
sparsify()                      Convert coefficient matrix to sparse format.
transform(X[, threshold])       Reduce X to its most important features.
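Besides fit and predict, the predict_proba and score methods from the table above are often useful; a quick sketch on iris:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
clf = LogisticRegression()
clf.fit(iris.data, iris.target)

proba = clf.predict_proba(iris.data[:1])   # class probabilities for one sample
print(proba)                               # one row per sample; each row sums to 1
acc = clf.score(iris.data, iris.target)    # mean accuracy, here on the training set
print(acc)
```

Scoring on the training set, as here, only shows the API; for a real accuracy estimate you would score on held-out data.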

predict returns the label vector for the test samples. Personally, I think there should also be a method exposing the key parameter the LR classifier learns: the weight vector, whose size should equal the number of features. There is no such method (although the coef_ attribute above does hold the fitted coefficients), and this prompted the idea of implementing the LR algorithm myself, so that the weight vector can be output.


Reference Links:


http://www.cnblogs.com/xupeizhi/archive/2013/07/05/3174703.html


http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression





