First, understanding logistic regression and its application scenarios
Logistic regression is a probabilistic nonlinear regression model: a multivariable analysis method that studies the relationship between a binary outcome and a set of influencing factors. A typical problem is to study whether a certain outcome occurs under given conditions; for example, in medicine, judging from a patient's symptoms whether the patient suffers from a certain disease.
Second, LR classifier
LR classifier is short for Logistic Regression Classifier.
In a classification task, the learned LR classifier is a set of weights. When a test sample is fed in, those weights are linearly combined with the sample's feature values, one weight per feature.
The resulting sum is passed through the sigmoid function, whose domain is (-∞, +∞) and whose range is (0, 1); the most basic LR classifier is therefore suited to classifying two kinds of targets.
So the key problem of logistic regression is how to obtain this set of weights. This problem is solved with maximum likelihood estimation.
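As a minimal sketch of the prediction step just described (the weight values below are made-up illustrations, not weights learned from any data), the classifier linearly combines the weights with a sample's features and squashes the sum through the sigmoid:

```python
import math

def sigmoid(z):
    # Maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    # Linear combination of weights and feature values, then sigmoid
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)  # probability that the sample belongs to class 1

# Hypothetical weights; in practice they come from maximum likelihood estimation
weights = [0.8, -0.5, 1.2]
bias = -0.3
p = predict(weights, bias, [1.0, 2.0, 0.5])
label = 1 if p >= 0.5 else 0
```

Thresholding the probability at 0.5 gives the final two-class decision.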
Third, logistic regression model
Consider a vector x = (x1, x2, ..., xm) of m independent variables, and let the conditional probability P(y = 1 | x) = p be the probability that the observed event occurs given x. Then the logistic regression model can be expressed as

p = P(y = 1 | x) = 1 / (1 + e^(-g(x)))

where g(x) = w0 + w1*x1 + ... + wm*xm, and 1 / (1 + e^(-g(x))) is called the logistic function.
The probability that the event does not occur under the same conditions is then

1 - p = 1 / (1 + e^(g(x)))

so the ratio of the probability that the event occurs to the probability that it does not is

p / (1 - p) = e^(g(x))

This ratio is called the occurrence ratio of the event (the odds of experiencing an event), abbreviated as odds.
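To make the odds concrete, here is a small sketch (the coefficient values are illustrative assumptions): with g(x) = w0 + w1*x1, the odds p / (1 - p) equal e^(g(x)), so each unit increase in x1 multiplies the odds by exactly e^(w1):

```python
import math

w0, w1 = -1.0, 0.7  # illustrative coefficients, not fitted values

def prob(x1):
    # Logistic function: p = 1 / (1 + e^(-g(x)))
    g = w0 + w1 * x1
    return 1.0 / (1.0 + math.exp(-g))

def odds(x1):
    # Ratio of the probability of occurring to not occurring
    p = prob(x1)
    return p / (1.0 - p)

# The odds ratio between x1 + 1 and x1 equals e^(w1)
ratio = odds(3.0) / odds(2.0)
```

This multiplicative interpretation of the coefficients is a standard way to read a fitted logistic model.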
Summary:
In general, regression is not used for classification problems, because regression is a continuous model and is relatively sensitive to noise.
If you must apply a regression approach to a classification problem, you can use logistic regression.
Logistic regression is essentially linear regression, except that a layer of function mapping is added between the features and the result.
That is, the features are summed linearly, and then the function g(z) is applied as the hypothesis function to make the prediction. g(z) maps the continuous value into the interval (0, 1).
The hypothesis function of logistic regression is h(x) = g(θᵀx) = 1 / (1 + e^(-θᵀx)), whereas the linear regression hypothesis function is just h(x) = θᵀx.
Logistic regression is used for the 0/1 classification problem, i.e., the binary classification problem in which the predicted result belongs to class 0 or class 1.
It assumes the binary outcome satisfies the Bernoulli distribution (the 0/1 distribution, or two-point distribution), i.e., P(y = 1 | x; θ) = h(x) and P(y = 0 | x; θ) = 1 - h(x).
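Under this Bernoulli assumption, both cases collapse into the single expression P(y | x; θ) = h(x)^y * (1 - h(x))^(1 - y), which is what maximum likelihood estimation works with. A short sketch (the probability values here are assumed, not fitted):

```python
import math

def bernoulli_pmf(y, p):
    # P(y=1) = p and P(y=0) = 1 - p, written as one expression
    return (p ** y) * ((1.0 - p) ** (1 - y))

h = 0.8  # assumed output of the hypothesis function for some sample
assert bernoulli_pmf(1, h) == h         # case y = 1
assert bernoulli_pmf(0, h) == 1.0 - h   # case y = 0

# Maximum likelihood estimation maximizes the sum of log-probabilities
# over all training samples; two assumed samples shown here:
log_lik = math.log(bernoulli_pmf(1, h)) + math.log(bernoulli_pmf(0, 0.3))
```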
Fourth, logistic regression application case
(1) Analysis of the LogisticRegressionCV function in sklearn
(2) The code is as follows:
The data file is available at the following link: https://pan.baidu.com/s/1dEWUEhb Password: bm1p
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# author: zhengzhengliu
# Breast cancer classification example

from sklearn.linear_model import LogisticRegressionCV, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.exceptions import ConvergenceWarning
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import warnings

# Fix Chinese character display in matplotlib
mpl.rcParams["font.sans-serif"] = [u"SimHei"]
mpl.rcParams["axes.unicode_minus"] = False

# Suppress convergence warnings
warnings.filterwarnings(action="ignore", category=ConvergenceWarning)

# Load the data and clean out abnormal records
path = "datas/breast-cancer-wisconsin.data"
names = ["id", "Clump Thickness", "Uniformity of Cell Size", "Uniformity of Cell Shape",
         "Marginal Adhesion", "Single Epithelial Cell Size", "Bare Nuclei",
         "Bland Chromatin", "Normal Nucleoli", "Mitoses", "Class"]
df = pd.read_csv(path, header=None, names=names)
datas = df.replace("?", np.nan).dropna(how="any")  # drop a row as soon as any column holds NaN
# print(datas.head())  # shows the first five rows by default

# Feature extraction and data splitting
X = datas[names[1:10]]
Y = datas[names[10]]

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.1, random_state=0)

# Standardize the training set
ss = StandardScaler()
x_train = ss.fit_transform(x_train)  # fit to the data first, then standardize

# Build and train the model
## multi_class: category handling, "ovr" (default) or "multinomial"; no difference for binary logistic regression
## cv: number of cross-validation folds
## solver: optimization algorithm. When penalty is "l1", it can only be "liblinear" (coordinate descent);
##   "lbfgs" and "newton-cg" both rely on a Taylor expansion of the objective function.
##   When penalty is "l2", the solver can be "lbfgs" (quasi-Newton), "newton-cg" (a Newton-method variant),
##   or "sag" (mini-batch stochastic average gradient descent).
##   For dimension < 10000, "lbfgs" is a good choice; for dimension > 10000, "sag" is better;
##   with GPU computation, "lbfgs" and "newton-cg" are faster than "sag".
## penalty: regularization to combat overfitting, "l1" or "l2"
## tol: stop when the objective function's decrease falls below this value (the tolerance), preventing excess computation
lr = LogisticRegressionCV(multi_class="ovr", fit_intercept=True, Cs=np.logspace(-2, 2, 20),
                          cv=2, penalty="l2", solver="lbfgs", tol=0.01)
re = lr.fit(x_train, y_train)

# Evaluate the model
r = re.score(x_train, y_train)
print("R value (accuracy):", r)
print("Coefficients:", re.coef_)
print("Intercept:", re.intercept_)
print("Sparse feature ratio: %.2f%%" % (np.mean(lr.coef_.ravel() == 0) * 100))

# Predict on the test set
x_test = ss.transform(x_test)  # standardize the test data with the already fitted scaler
print("=========sigmoid-converted values, i.e. probabilities=========")
print(re.predict_proba(x_test))  # values mapped through the sigmoid function, i.e. the probability p

# Save and persist the models
from sklearn.externals import joblib  # in newer scikit-learn versions, use `import joblib` instead
joblib.dump(ss, "logistic_ss.model")  # save the standardization model
joblib.dump(lr, "logistic_lr.model")  # save the trained logistic model
joblib.load("logistic_ss.model")  # load a saved model file
joblib.load("logistic_lr.model")