Logistic Regression in Machine Learning
In the previous chapter we learned about ordinary linear regression; now let's see what exactly logistic regression is.
In fact, from this point of view, logistic regression isn't really regression at all but a classification method. So why is it still called regression? Because it is built on a regression-style score passed through the logistic (sigmoid) function:
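As a sketch, this is the standard sigmoid (logistic) function, the same one implemented in the sigmoid code further down:

$$ g(z) = \frac{1}{1 + e^{-z}}, \qquad h_\theta(x) = g(\theta^\top x) = \frac{1}{1 + e^{-\theta^\top x}} $$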
If the prediction is greater than 0.5 we classify the sample as positive, otherwise as negative. As with any regression, the essential ingredient is the loss function; for ordinary linear regression the loss function is the squared error:
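A sketch of that standard squared-error cost from the linear regression chapter:

$$ J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)^2 $$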
If we simply plug the sigmoid hypothesis into that same squared-error loss, the logistic regression cost becomes:
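Presumably something like the following, where the non-linear sigmoid inside the square is what destroys convexity:

$$ J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \bigl( g(\theta^\top x^{(i)}) - y^{(i)} \bigr)^2 $$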
This is not a convex function, so it is hard to find the global optimum. This is where the mathematicians come in: they replace the squared error with the log (cross-entropy) cost, which heavily penalizes confident wrong predictions and is convex.
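Written out (a reconstruction that matches the costFunction code below), the cross-entropy cost is:

$$ J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Bigl[ y^{(i)} \log h_\theta(x^{(i)}) + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr] $$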
Now let's see what logistic regression looks like in practice.
First read in the data:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize

def loaddata(file, delimeter):
    data = np.loadtxt(file, delimiter=delimeter)
    print('Dimensions: ', data.shape)
    print(data[1:6, :])
    return data
We also need a helper function to plot the points:
def plotData(data, label_x, label_y, label_pos, label_neg, axes=None):
    neg = data[:, 2] == 0
    pos = data[:, 2] == 1
    if axes is None:
        axes = plt.gca()
    axes.scatter(data[pos][:, 0], data[pos][:, 1], marker='+', c='k', s=60, linewidth=2, label=label_pos)
    axes.scatter(data[neg][:, 0], data[neg][:, 1], c='y', s=60, label=label_neg)
    axes.set_xlabel(label_x)
    axes.set_ylabel(label_y)
    axes.legend(frameon=True, fancybox=True)
Let's read the data:
data = loaddata('data1.txt', ',')
Let's see what the data looks like.
X = np.c_[np.ones((data.shape[0], 1)), data[:, 0:2]]
y = np.c_[data[:, 2]]
plotData(data, 'Exam 1 score', 'Exam 2 score', 'Pass', 'Fail')
So our job is to find a decision boundary that separates the two classes of points well.
Defining Logistic regression
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
The definition of the loss function is this:
def costFunction(theta, X, y):
    m = y.size
    h = sigmoid(X.dot(theta))
    J = -1.0*(1.0/m)*(np.log(h).T.dot(y) + np.log(1-h).T.dot(1-y))
    if np.isnan(J[0]):
        return np.inf
    return J[0]
Using the chain rule, we can take the derivative of this loss function:
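Differentiating term by term gives the familiar gradient; this matches the gradient code just below:

$$ \frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr) x_j^{(i)}, \qquad \nabla_\theta J(\theta) = \frac{1}{m} X^\top (h - y) $$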
So we have the derivative function:
def gradient(theta, X, y):
    m = y.size
    h = sigmoid(X.dot(theta.reshape(-1, 1)))
    grad = (1.0/m)*X.T.dot(h - y)
    return grad.flatten()
Let's take a look at the error and gradient of the initial value:
initial_theta = np.zeros(X.shape[1])
cost = costFunction(initial_theta, X, y)   # 0.69314718055994529
grad = gradient(initial_theta, X, y)       # [ -0.1, -12.00921659, -11.26284221]
So we're going to minimize this function:
res = minimize(costFunction, initial_theta, args=(X, y), jac=gradient, options={'maxiter': 400})
# Here we use scipy's optimization library; if you are not familiar with it, the official docs are quite simple.
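As a quick sanity check, the OptimizeResult that scipy returns can be inspected directly (a minimal sketch; the exact numbers depend on the data):

print(res.success)   # did the optimizer report convergence?
print(res.fun)       # final value of the cost function
print(res.x)         # the fitted parameters theta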
Our prediction function looks like this:
def predict(theta, X, threshold=0.5):
    p = sigmoid(X.dot(theta.T)) >= threshold
    return p.astype('int')
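For example, a minimal sketch of checking the training-set accuracy with this function (the exact percentage depends on the data and the optimizer):

p = predict(res.x, X)
print('Train accuracy: {:.2f}%'.format(100.0 * np.mean(p == y.ravel())))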
Now let's draw the decision boundary on top of the data set:
plt.scatter(45, 85, s=60, c='r', marker='v', label='(45, 85)')
plotData(data, 'Exam 1 score', 'Exam 2 score', 'Admitted', 'Not admitted')
x1_min, x1_max = X[:, 1].min(), X[:, 1].max()
x2_min, x2_max = X[:, 2].min(), X[:, 2].max()
xx1, xx2 = np.meshgrid(np.linspace(x1_min, x1_max), np.linspace(x2_min, x2_max))
h = sigmoid(np.c_[np.ones((xx1.ravel().shape[0], 1)), xx1.ravel(), xx2.ravel()].dot(res.x))
h = h.reshape(xx1.shape)
plt.contour(xx1, xx2, h, [0.5], linewidths=1, colors='b')
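Note that the contour is drawn at h = 0.5, which is exactly where θᵀx = 0; since that is a linear equation in the two exam scores, the resulting decision boundary is a straight line.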
Logistic regression with regularization
So far we have used plain logistic regression without any regularization term; now let's look at how regularization is added.
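In formula form, regularization simply adds a penalty on the parameters, leaving the bias term θ₀ out of the penalty, exactly as the costFunctionReg code below does:

$$ J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Bigl[ y^{(i)} \log h_\theta(x^{(i)}) + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2 $$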
data2 = loaddata('data2.txt', ',')
X = data2[:, 0:2]
y = np.c_[data2[:, 2]]
plotData(data2, 'Microchip Test 1', 'Microchip Test 2', 'y = 1', 'y = 0')
plt.show()
We need to find a boundary that classifies these points well. As you can see, the two classes are not cleanly separable by a straight line, so we build higher-order polynomial features of the two inputs to fit the data set.
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(6)
XX = poly.fit_transform(data2[:, 0:2])
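As a small sanity check: PolynomialFeatures(6) maps the two original features to every monomial of degree at most 6, including the bias column, which gives 28 columns in total:

print(XX.shape)   # (number_of_samples, 28)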
def costFunctionReg(theta, reg, *args):
    m = y.size
    h = sigmoid(XX.dot(theta))
    J = -1.0*(1.0/m)*(np.log(h).T.dot(y) + np.log(1-h).T.dot(1-y)) + (reg/(2.0*m))*np.sum(np.square(theta[1:]))
    if np.isnan(J[0]):
        return np.inf
    return J[0]
def gradientReg(theta, reg, *args):
    m = y.size
    h = sigmoid(XX.dot(theta.reshape(-1, 1)))
    grad = (1.0/m)*XX.T.dot(h - y) + (reg/m)*np.r_[[[0]], theta[1:].reshape(-1, 1)]
    return grad.flatten()
Let's look at the effect of the different regularization coefficients on the results:
initial_theta = np.zeros(XX.shape[1])
costFunctionReg(initial_theta, 1, XX, y)

fig, axes = plt.subplots(1, 3, sharey=True, figsize=(17, 5))

# Decision boundaries: let's see what happens when the regularization
# coefficient lambda is too small or too large.
# lambda = 0:   no regularization, so the model overfits
# lambda = 1:   about right
# lambda = 100: regularization is too aggressive, so the decision boundary underfits
for i, C in enumerate([0.0, 1.0, 100.0]):
    # Optimize costFunctionReg
    res2 = minimize(costFunctionReg, initial_theta, args=(C, XX, y), jac=gradientReg, options={'maxiter': 3000})

    # Accuracy
    accuracy = 100.0*sum(predict(res2.x, XX) == y.ravel())/y.size

    # Scatter plot of X, y
    plotData(data2, 'Microchip Test 1', 'Microchip Test 2', 'y = 1', 'y = 0', axes.flatten()[i])

    # Draw the decision boundary
    x1_min, x1_max = X[:, 0].min(), X[:, 0].max()
    x2_min, x2_max = X[:, 1].min(), X[:, 1].max()
    xx1, xx2 = np.meshgrid(np.linspace(x1_min, x1_max), np.linspace(x2_min, x2_max))
    h = sigmoid(poly.fit_transform(np.c_[xx1.ravel(), xx2.ravel()]).dot(res2.x))
    h = h.reshape(xx1.shape)
    axes.flatten()[i].contour(xx1, xx2, h, [0.5], linewidths=1, colors='g')
    axes.flatten()[i].set_title('Train accuracy {}% with lambda = {}'.format(np.round(accuracy, decimals=2), C))
That wraps up logistic regression. We'll continue with the next topic in the following chapter; if you spot any mistakes, please point them out!