Resources
<PYTHON_MACHINE_LEARNING> Chapter 3
A Tour of Machine Learning
Classifiers Using Scikit-learn
Introduction
When we classify, the feature values in a sample are generally distributed over the real numbers, but what we often want is a probability-like value in [0, 1]. Also, to keep large differences between features from causing interference (for example, when one feature value is particularly large while the others are very small), we need to normalize the data. That is, before machine learning we need a mapping from R to [0, 1] to process the feature matrix. When the mapping used is the sigmoid function, we call the resulting machine learning algorithm logistic regression.
PS: Logistic regression is used for classification!!! Not for linear regression! The inverse of the sigmoid function is called the logit function, which is the origin of the name "logistic regression"; it has nothing to do with logic ...
sigmoid function
This function is an S-shaped curve with domain R and range (0, 1).
At the same time, φ(z) represents the probability that y = 1; the probability that y = 0 is 1 - φ(z).
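The formula itself was an image in the original post and did not survive extraction; the standard definition is:

```latex
\phi(z) = \frac{1}{1 + e^{-z}}, \qquad z = w^{T}x
```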
Let's plot it to illustrate:
```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


z = np.arange(-10, 10, 0.1)
p = sigmoid(z)
plt.plot(z, p)
# draw a vertical line; if x is not set, it defaults to 0
plt.axvline(x=0, color='k')
plt.axhspan(0.0, 1.0, facecolor='0.7', alpha=0.4)
# draw horizontal lines; if y is not set, it defaults to 0
plt.axhline(y=1, ls='dotted', color='0.4')
plt.axhline(y=0, ls='dotted', color='0.4')
plt.axhline(y=0.5, ls='dotted', color='k')
plt.ylim(-0.1, 1.1)
# set the y-axis ticks
plt.yticks([0.0, 0.5, 1.0])
plt.ylabel(r'$\phi (z)$')
plt.xlabel('z')
ax = plt.gca()
ax.grid(True)
plt.show()
```
The logistic regression algorithm
- Basic principle
The logistic regression algorithm is similar to the Adaline linear adaptive algorithm, except that the activation function changes from the **identity mapping y = z** to **y = sigmoid(z)**.
- The loss function in logistic regression
Recall the loss function used in the gradient-descent Adaline model: the sum of squared errors.
This is the loss function of linear regression.
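The equation referenced here (an image in the original) is the standard sum-of-squared-errors loss:

```latex
J(w) = \frac{1}{2} \sum_{i} \left( y^{(i)} - \phi\left(z^{(i)}\right) \right)^{2}
```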
But with the S-shaped sigmoid as the activation function, the gradient of this loss is nearly 0 wherever φ(z) saturates close to 0 or 1, and the loss surface is no longer convex.
For logistic regression we therefore define a different loss function:
the log-likelihood loss function (cross-entropy).
PS: all the logs here are actually ln.
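The cross-entropy loss (the equation image was lost; this is its standard form):

```latex
J(w) = \sum_{i} \left[ -\,y^{(i)} \ln\left(\phi\left(z^{(i)}\right)\right)
       - \left(1 - y^{(i)}\right) \ln\left(1 - \phi\left(z^{(i)}\right)\right) \right]
```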
Where does this loss function come from? From the maximum likelihood method.
The likelihood function is defined first (each sample is considered independent):
The likelihood function can be regarded as a conditional probability.
For the concept of the likelihood function, you can refer to kevinGao's blog:
http://www.cnblogs.com/kevinGaoblog/archive/2012/03/29/2424346.html
By the idea of maximum likelihood, the parameters that maximize the likelihood function are the most reasonable ones. We want to maximize the likelihood function, but this form is still not convenient, after all it is a product, so we take the logarithm.
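The likelihood and its logarithm (reconstructed in their standard form, since the original equations were images):

```latex
L(w) = \prod_{i=1}^{n} P\left(y^{(i)} \mid x^{(i)}; w\right)
     = \prod_{i=1}^{n} \left(\phi\left(z^{(i)}\right)\right)^{y^{(i)}}
       \left(1 - \phi\left(z^{(i)}\right)\right)^{1 - y^{(i)}}

l(w) = \ln L(w) = \sum_{i=1}^{n} \left[ y^{(i)} \ln\left(\phi\left(z^{(i)}\right)\right)
       + \left(1 - y^{(i)}\right) \ln\left(1 - \phi\left(z^{(i)}\right)\right) \right]
```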
Now we know: when the weight vector w makes l largest, w is the most reasonable. So we define the J function:

J = -l
For a better understanding, let's look at the loss function of a single sample.
Taking y = 1 as an example: when the predicted value approaches the correct value, J converges to 0.
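To see this numerically, here is a small sketch (the helper name `single_sample_cost` is mine, not from the book):

```python
import numpy as np

def single_sample_cost(phi_z, y):
    """Cross-entropy cost of one sample: -y*ln(phi) - (1-y)*ln(1-phi)."""
    return -y * np.log(phi_z) - (1 - y) * np.log(1 - phi_z)

# For y = 1, the cost shrinks toward 0 as the predicted probability nears 1,
# while a confident wrong prediction is punished heavily.
print(single_sample_cost(0.9, 1))    # modest cost
print(single_sample_cost(0.999, 1))  # near 0
print(single_sample_cost(0.01, 1))   # large cost
```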
- Weight update
As with the gradient descent method, we take the partial derivative of the loss function with respect to each weight.
After the calculation, we get the weight-update formula, and it turns out to be the same as Adaline's.
Surprised?
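The update referred to here (standard form, consistent with Adaline's rule):

```latex
\Delta w_{j} = -\eta \frac{\partial J}{\partial w_{j}}
             = \eta \sum_{i} \left( y^{(i)} - \phi\left(z^{(i)}\right) \right) x_{j}^{(i)},
\qquad w := w + \Delta w
```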
This means that when writing a separate LogisticRegression class, you only need to inherit from the Adaline class and redefine the activation function φ.
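As a rough sketch of that idea (this is my simplified toy code, not the book's: both classes use {0, 1} labels, batch gradient descent, and a 0.5 decision threshold; the class names are hypothetical):

```python
import numpy as np

class AdalineGD(object):
    """Toy Adaline: identity activation, batch gradient descent."""
    def __init__(self, eta=0.01, n_iter=50):
        self.eta = eta
        self.n_iter = n_iter

    def activation(self, z):
        # Adaline: phi(z) = z (identity mapping)
        return z

    def net_input(self, X):
        # z = w^T x + bias
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def fit(self, X, y):
        self.w_ = np.zeros(1 + X.shape[1])
        for _ in range(self.n_iter):
            # same update rule for both models: eta * (y - phi(z)) * x
            errors = y - self.activation(self.net_input(X))
            self.w_[1:] += self.eta * np.dot(X.T, errors)
            self.w_[0] += self.eta * errors.sum()
        return self

    def predict(self, X):
        return np.where(self.activation(self.net_input(X)) >= 0.5, 1, 0)

class LogisticRegressionGD(AdalineGD):
    """Only the activation changes: phi(z) = sigmoid(z)."""
    def activation(self, z):
        return 1.0 / (1.0 + np.exp(-z))

# tiny 1-D demo: negative points are class 0, positive points are class 1
X = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])
lr = LogisticRegressionGD(eta=0.1, n_iter=100).fit(X, y)
print(lr.predict(X))
```

Everything except `activation` is shared with the parent class, which is exactly the point the text makes.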
Practice
Let's practice logistic regression with sklearn on the Iris data set, just as we did with the Perceptron in the previous chapter.
```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
__author__ = 'Administrator'

from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import train_test_split  # sklearn.model_selection in newer versions
from sklearn.preprocessing import StandardScaler
from pdc import plot_decision_regions
import matplotlib.pyplot as plt
import numpy as np

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
lr = LogisticRegression(C=1000.0, random_state=0)
lr.fit(X_train_std, y_train)
X_combined_std = np.vstack((X_train_std, X_test_std))
y_combined = np.hstack((y_train, y_test))
plot_decision_regions(X=X_combined_std, y=y_combined,
                      classifier=lr, test_idx=range(105, 150))
plt.xlabel('petal length [standardized]')
plt.ylabel('petal width [standardized]')
plt.legend(loc='upper left')
plt.savefig('iris.png')
plt.show()
print(X_test_std[0, :])
# predict_proba expects a 2-D array, so reshape the single sample
a = lr.predict_proba(X_test_std[0, :].reshape(1, -1))
print(a)
```
Over-fitting, under-fitting and regularization
Over-fitting and under-fitting are two common problems in machine learning.
- Over-fitting
Commonly known as thinking too much. To fit the training set well, the model uses too many parameters and becomes particularly complex, even classifying noise and errors. Such a model fits the training set very well, but is particularly unreliable for predicting new data. We say this model has high variance.
- Under-fitting
Correspondingly, thinking too simply. The model is too simple to be reliable for predicting new data.
We say this model has high bias.
- Regularization
Regularization is a common method to prevent over-fitting. Simply put, regularization introduces an additional penalty to reduce the influence of extreme weights.
The most common form is L2 regularization, which adds a penalty term to the end of the loss function.
λ (lambda) is called the regularization parameter.
The loss function then becomes:
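The L2 penalty and the regularized loss (reconstructed in their standard form; the original equations were images):

```latex
\frac{\lambda}{2} \lVert w \rVert^{2} = \frac{\lambda}{2} \sum_{j} w_{j}^{2}

J(w) = \sum_{i} \left[ -\,y^{(i)} \ln\left(\phi\left(z^{(i)}\right)\right)
       - \left(1 - y^{(i)}\right) \ln\left(1 - \phi\left(z^{(i)}\right)\right) \right]
       + \frac{\lambda}{2} \lVert w \rVert^{2}
```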
```python
lr = LogisticRegression(C=1000.0, random_state=0)
```
The parameter C in the LogisticRegression class comes from related concepts in support vector machines (SVMs), which we will not expand on here.
The final form of the loss function:
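In scikit-learn, C is the inverse of the regularization strength (C = 1/λ), so the loss can be written as:

```latex
J(w) = C \sum_{i} \left[ -\,y^{(i)} \ln\left(\phi\left(z^{(i)}\right)\right)
       - \left(1 - y^{(i)}\right) \ln\left(1 - \phi\left(z^{(i)}\right)\right) \right]
       + \frac{1}{2} \lVert w \rVert^{2}
```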
- Effect of the C value on the fit
Taking 10 different powers of 10, from 10^-5 to 10^4, as values of C, let's look at the effect on the weights:
```python
weights, params = [], []
for c in range(-5, 5):
    lr = LogisticRegression(C=10**c, random_state=0)
    lr.fit(X_train_std, y_train)
    weights.append(lr.coef_[1])
    params.append(10**c)
weights = np.array(weights)
plt.plot(params, weights[:, 0], label='petal length')
plt.plot(params, weights[:, 1], linestyle='--', label='petal width')
plt.ylabel('weight coefficient')
plt.xlabel('C')
plt.legend(loc='upper left')
plt.xscale('log')
plt.show()
```
Ling Yu Live
Links: https://www.jianshu.com/p/9db03938ea72
Source: Jianshu
Copyright belongs to the author. Commercial reprint please contact the author for authorization, non-commercial reprint please specify the source.
Rookie Note python3--machine learning (ii) logistic regression algorithm