Machine Learning with sklearn 0.19: the Logistic Regression Algorithm

Source: Internet
Author: User

First, what logistic regression is and where it is applied


Logistic regression is a probabilistic nonlinear regression model. It is a multivariate analysis method for studying the relationship between a binary (two-class) outcome and a set of influencing factors. The typical question is whether a certain outcome occurs under given conditions. In medicine, for example, it is used to judge from a patient's symptoms whether the patient has a certain disease.


Second, the LR classifier


The LR classifier is the Logistic Regression classifier.

In classification, the learned LR classifier is a set of weights $w_0, w_1, \dots, w_n$. When a test sample is input, the weights are combined with the sample's data by linear summation:

$$z = w_0 + w_1 x_1 + \dots + w_n x_n$$

where $x_1, \dots, x_n$ are the features of the sample.

The result is then passed through the sigmoid function:

$$g(z) = \frac{1}{1 + e^{-z}}$$

The sigmoid function has domain $(-\infty, +\infty)$ and range $(0, 1)$, so the most basic LR classifier is suited to classifying two kinds of targets.

So the key problem of logistic regression is how to obtain this set of weights. This is done with maximum likelihood estimation.
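As a minimal sketch of the prediction step described above (a weighted linear sum passed through the sigmoid), the following uses made-up weights and samples. The names `predict_proba`, `weights`, `intercept` and `samples` are illustrative assumptions, not part of the original article.

```python
import numpy as np

def sigmoid(z):
    """Map any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(weights, intercept, x):
    """Linear sum of features and weights, squashed by the sigmoid."""
    z = intercept + np.dot(x, weights)
    return sigmoid(z)

# Hypothetical weights and samples, chosen only for illustration.
weights = np.array([0.8, -0.5])
intercept = 0.1
samples = np.array([[1.0, 2.0],
                    [3.0, 0.5]])

probs = predict_proba(weights, intercept, samples)
labels = (probs >= 0.5).astype(int)  # threshold at 0.5 to get class 0 or 1
print("probabilities:", probs)
print("labels:", labels)
```

Thresholding at 0.5 is the usual convention for two classes. A different threshold can be used when the two kinds of error have different costs.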

Third, the logistic regression model

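Consider a vector of $n$ independent variables $x = (x_1, x_2, \dots, x_n)$, and let the conditional probability $P(y = 1 \mid x) = p$ be the probability that an event occurs given the observation $x$. Then the logistic regression model can be expressed as

$$P(y = 1 \mid x) = \pi(x) = \frac{1}{1 + e^{-z}}$$

The function $f(t) = \frac{1}{1 + e^{-t}}$ is called the logistic function, and here

$$z = w_0 + w_1 x_1 + \dots + w_n x_n$$

Then the probability that the event does not occur given $x$ is

$$P(y = 0 \mid x) = 1 - P(y = 1 \mid x) = 1 - \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{z}}$$

So the ratio of the probability that the event occurs to the probability that it does not occur is:

$$\frac{P(y = 1 \mid x)}{P(y = 0 \mid x)} = \frac{p}{1 - p} = e^{z}$$

This ratio is called the odds of experiencing an event, abbreviated as odds.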
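A quick numeric check of the odds relationship, as a minimal sketch. It assumes the event probability is the logistic function applied to a linear score `z`; the value 1.2 is arbitrary and only for illustration. Under that assumption the odds equal `exp(z)`, so the logarithm of the odds recovers the linear score.

```python
import math

def sigmoid(z):
    """Logistic function: probability that the event occurs."""
    return 1.0 / (1.0 + math.exp(-z))

z = 1.2                    # hypothetical value of the linear score
p = sigmoid(z)             # probability that the event occurs
odds = p / (1.0 - p)       # occurs vs. does not occur
log_odds = math.log(odds)  # should recover z

print("p =", p)
print("odds =", odds, "exp(z) =", math.exp(z))
print("log odds =", log_odds)
```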




Summary:

In general, regression is not used for classification problems, because regression produces a continuous output and is relatively sensitive to noise.

If you do want to apply regression to a classification problem, you can use logistic regression.

Logistic regression is essentially linear regression, except that a function mapping is added between the features and the result.

That is, the features are first summed linearly, and then the function $g(z)$ is applied as the hypothesis function to make the prediction. $g(z)$ maps continuous values into the interval between 0 and 1.

The hypothesis function of logistic regression is as follows:

$$h_w(x) = g(w^T x) = \frac{1}{1 + e^{-w^T x}}$$

whereas the linear regression hypothesis function is just $h_w(x) = w^T x$ (with $x_0 = 1$, so that $w_0$ is the intercept).

Logistic regression is used for 0/1 classification, that is, binary classification problems in which the predicted result is either 0 or 1.

This assumes that the binary value follows a Bernoulli distribution (also called the 0/1 distribution or two-point distribution), i.e.

$$P(y = 1 \mid x; w) = h_w(x), \qquad P(y = 0 \mid x; w) = 1 - h_w(x)$$
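The Bernoulli assumption is what makes maximum likelihood estimation of the weights possible. The sketch below is one simple way to do it, not how sklearn does it: it writes the Bernoulli log-likelihood and maximizes it with plain gradient ascent. The toy data, step size and iteration count are made-up assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    """Bernoulli log-likelihood: sum of y*log(h) + (1 - y)*log(1 - h)."""
    h = sigmoid(X @ w)
    return float(np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)))

def fit_by_gradient_ascent(X, y, step=0.1, n_iter=500):
    """Maximize the log-likelihood with plain gradient ascent."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        gradient = X.T @ (y - sigmoid(X @ w))  # d(log-likelihood)/dw
        w += step * gradient / len(y)
    return w

# Toy data (made up): a column of ones for the intercept plus one feature.
X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 1.5],
              [1.0, 2.0], [1.0, 2.5], [1.0, 3.0]])
y = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])

w_init = np.zeros(2)
w_fit = fit_by_gradient_ascent(X, y)
print("log-likelihood before:", log_likelihood(w_init, X, y))
print("log-likelihood after: ", log_likelihood(w_fit, X, y))
print("fitted weights:", w_fit)
```

The labels alternate on purpose, so the two classes are not linearly separable and the maximum likelihood weights stay finite.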

Fourth, a logistic regression application case

(1) Analysis of the LogisticRegressionCV class in sklearn
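As a rough illustration of how LogisticRegressionCV is typically used, the sketch below fits it on synthetic data from `make_classification` (an assumption for illustration only, not this article's breast cancer data) and reads back the regularization strength chosen by cross-validation. The grid of `Cs`, the fold count and `max_iter` are arbitrary choices. The penalty is left at its default, which is L2.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Synthetic binary data, used only for illustration (not the article's dataset)
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

# Candidate values of C (inverse regularization strength) to cross-validate over
candidate_cs = np.logspace(-2, 2, 5)

clf = LogisticRegressionCV(Cs=candidate_cs, cv=3, solver="lbfgs",
                           tol=0.01, max_iter=200)
clf.fit(X, y)

best_c = float(np.ravel(clf.C_)[0])  # C selected by cross-validation
train_accuracy = clf.score(X, y)     # mean accuracy on the training data

print("candidate Cs:", clf.Cs_)
print("selected C:", best_c)
print("coefficients:", clf.coef_)
print("training accuracy:", train_accuracy)
```

Smaller values of `C` mean stronger regularization, so the selected `C` shows how much shrinkage cross-validation preferred on this data.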








(2) The code is as follows:

The data file can be downloaded from: https://pan.baidu.com/s/1dEWUEhb (password: bm1p)

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author: zhengzhengliu
# Breast cancer classification example

import sklearn
from sklearn.linear_model import LogisticRegressionCV, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# The original imported this from sklearn.linear_model.coordinate_descent;
# sklearn.exceptions is the public location of the same warning class.
from sklearn.exceptions import ConvergenceWarning
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import warnings

# Make Chinese characters and minus signs display correctly in matplotlib
mpl.rcParams["font.sans-serif"] = [u"SimHei"]
mpl.rcParams["axes.unicode_minus"] = False

# Suppress convergence warnings
warnings.filterwarnings(action="ignore", category=ConvergenceWarning)

# Load the data and remove abnormal records
path = "datas/breast-cancer-wisconsin.data"
names = ["id", "Clump Thickness", "Uniformity of Cell Size", "Uniformity of Cell Shape",
         "Marginal Adhesion", "Single Epithelial Cell Size", "Bare Nuclei",
         "Bland Chromatin", "Normal Nucleoli", "Mitoses", "Class"]

df = pd.read_csv(path, header=None, names=names)

# "?" marks a missing value; drop every row that contains a NaN
datas = df.replace("?", np.nan).dropna(how="any")
# print(datas.head())  # shows the first five rows by default

# Extract the features and the target
X = datas[names[1:10]]
Y = datas[names[10]]

# Split into training and test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, random_state=0)

# Standardize the training set
ss = StandardScaler()
X_train = ss.fit_transform(X_train)  # fit on the data first, then standardize it

# Build and train the model
## multi_class: classification scheme, "ovr" (default) or "multinomial"; no difference for binary logistic regression
## cv: number of cross-validation folds
## solver: optimization algorithm; when penalty is "l1", the solver can only be "liblinear" (coordinate descent)
##   "lbfgs" and "newton-cg" both rely on a second-order Taylor expansion of the objective function
##   when penalty is "l2", the solver can be "lbfgs" (quasi-Newton), "newton-cg" (a Newton method variant)
##   or "sag" (stochastic average gradient descent)
##   author's rule of thumb: below 10000 dimensions choose "lbfgs", above 10000 "newton-cg" is better;
##   the original notes add that on a graphics card "lbfgs" and "newton-cg" are both faster than "sag"
## penalty: regularization type, used to counter overfitting, "l1" or "l2"
## tol: tolerance of the stopping criterion; prevents excessive computation
lr = LogisticRegressionCV(multi_class="ovr", fit_intercept=True, Cs=np.logspace(-2, 2, 20),
                          cv=2, penalty="l2", solver="lbfgs", tol=0.01)
re = lr.fit(X_train, Y_train)

# Model evaluation
r = re.score(X_train, Y_train)
print("R value (accuracy):", r)
print("Parameters:", re.coef_)
print("Intercept:", re.intercept_)
print("Sparse feature ratio: %.2f%%" % (np.mean(lr.coef_.ravel() == 0) * 100))
print("=========Values after the sigmoid function, i.e. probabilities=========")
# The model was trained on standardized features, so the test set is standardized here as well
print(re.predict_proba(ss.transform(X_test)))  # sigmoid output, i.e. probability p

# Save and persist the models
# sklearn 0.19 bundles joblib; later releases removed sklearn.externals.joblib, use `import joblib` there
from sklearn.externals import joblib
joblib.dump(ss, "logistic_ss.model")  # save the fitted scaler
joblib.dump(lr, "logistic_lr.model")  # save the trained model
joblib.load("logistic_ss.model")      # load the saved model files
joblib.load("logistic_lr.model")

# Prediction
X_test = ss.transform(X_test)  # standardize the test data
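The listing above calls `joblib.load` without keeping the returned objects. As a hedged sketch of the full save, load and predict round trip, the example below uses synthetic stand-in data, a plain `LogisticRegression` instead of `LogisticRegressionCV`, the standalone `joblib` package and a temporary directory. All of these are assumptions made to keep the example self-contained.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the breast cancer file used above)
X, y = make_classification(n_samples=200, n_features=9, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)

# Fit the scaler on the training set only, then train on standardized features
scaler = StandardScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)
expected = model.predict(scaler.transform(X_test))

with tempfile.TemporaryDirectory() as tmp_dir:
    scaler_path = os.path.join(tmp_dir, "logistic_ss.model")
    model_path = os.path.join(tmp_dir, "logistic_lr.model")
    joblib.dump(scaler, scaler_path)
    joblib.dump(model, model_path)

    # joblib.load returns the restored object, so keep the return value
    loaded_scaler = joblib.load(scaler_path)
    loaded_model = joblib.load(model_path)

# New data must go through the same scaler before prediction
predicted = loaded_model.predict(loaded_scaler.transform(X_test))
print("identical predictions:", np.array_equal(predicted, expected))
```

Saving the scaler together with the model matters because the weights were learned on standardized features. Predicting on raw features would give meaningless probabilities.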
