Logistic regression involves calculus, linear algebra, probability theory, and optimization. This article tries to explain logistic regression in the simplest, most accessible way, spending less time deriving formulas and more time on visual examples. If you are allergic to mathematical formulas, any discomfort is at your own risk.
Logistic regression principle and derivation
Although logistic regression has "regression" in its name, the algorithm is a classification algorithm. As shown in the figure, there are two classes of data (red and green). To separate them, we can draw a straight line z = w0*x0 + w1*x1 + w2*x2. When a new sample (x1, x2) needs to be predicted, it is plugged into the line function: if the value is greater than 0, it is a green (positive) sample; otherwise it is a red (negative) sample. Generalizing to higher dimensions, we need a hyperplane (a line in two dimensions, a plane in three dimensions, and an (n-1)-dimensional hyperplane in n dimensions) to separate the sample data. In fact, finding the hyperplane means finding its parameter vector W, which is very similar to regression, hence the name logistic regression.
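Written out (the original formula image is not reproduced here, so this is the standard form the text describes, with x0 fixed to 1 as the bias input):

$$ z = w_0 x_0 + w_1 x_1 + w_2 x_2 = w^{T}x, \qquad \hat{y} = \begin{cases} \text{positive} & z > 0 \\ \text{negative} & z \le 0 \end{cases} $$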
Sigmoid function
Of course, we don't use the value of z directly. We need to map z into the interval [0, 1]; the mapped value is the probability that the new sample is a positive sample. We use the sigmoid function to perform this conversion, with the formula below. Looking at the graph of the sigmoid function, when z is greater than 0 the value of σ is greater than 0.5, and when z is less than 0 the value of σ is less than 0.5. With the sigmoid function, logistic regression is essentially a discriminative model based on conditional probability.
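The original formula image is not reproduced here; the standard sigmoid, matching the description above, is:

$$ \sigma(z) = \frac{1}{1 + e^{-z}} $$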
Objective function
What we are actually solving for is W. How? Look at the figures first: the second figure gives the best separating line, in the sense that the sample points are as far from the line as possible, so newly arriving samples will be classified well. How do we express this goal as an objective function and compute it?
We apply the sigmoid formula to the z function:
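Substituting z = θᵀx into the sigmoid gives the standard hypothesis function (shown here in place of the original formula image):

$$ h_\theta(x) = \sigma(\theta^{T}x) = \frac{1}{1 + e^{-\theta^{T}x}} $$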
By conditional probability we can write the following formulas, which can then be combined into a single expression; see below.
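With labels y ∈ {0, 1}, the standard form of these conditional probabilities and their combined expression is:

$$ P(y=1 \mid x;\theta) = h_\theta(x), \qquad P(y=0 \mid x;\theta) = 1 - h_\theta(x) $$

$$ P(y \mid x;\theta) = h_\theta(x)^{y}\,\bigl(1 - h_\theta(x)\bigr)^{1-y} $$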
Assuming the samples are independent of one another, the probability of the entire sample set is the product of the probabilities of all samples:
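In the standard notation, with m samples, this likelihood is:

$$ L(\theta) = \prod_{i=1}^{m} P\bigl(y^{(i)} \mid x^{(i)};\theta\bigr) = \prod_{i=1}^{m} h_\theta\bigl(x^{(i)}\bigr)^{y^{(i)}} \bigl(1 - h_\theta\bigl(x^{(i)}\bigr)\bigr)^{1-y^{(i)}} $$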
This product is too unwieldy to differentiate directly, so we take the logarithm:
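The resulting log-likelihood (the missing formula image, reconstructed in standard form) is:

$$ \ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \Bigl[ y^{(i)} \log h_\theta\bigl(x^{(i)}\bigr) + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta\bigl(x^{(i)}\bigr)\bigr) \Bigr] $$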
We now want this objective function to take its largest possible value; the θ that maximizes it is the parameter we are after.
Gradient ascent method
Before introducing the gradient ascent method, let's recall some high-school math: find the value of x at which the following function reaches its maximum (judging from the solution and the code below, the function is f(x) = -x²).
Function diagram:
Solution: take the derivative of f(x), which is -2x, set it to 0, and find that the maximum value 0 is reached at x = 0. However, when the function is more complicated, it is hard to compute the extremum analytically. Instead we use the gradient ascent method, approaching the extremum step by step through iteration with the formula below: at each step we move a little in the direction of the derivative (gradient).
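The iteration formula referenced above is the standard gradient ascent update, where α is the step size:

$$ x_{\text{new}} = x_{\text{old}} + \alpha \, f'(x_{\text{old}}) $$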
Using the gradient ascent iteration in code to find the x that maximizes f(x):

def f(x_old):
    # derivative of f(x) = -x^2
    return -2 * x_old

def cal():
    x_old = 0
    x_new = -6            # starting point of the iteration
    eps = 0.01            # step size alpha
    precision = 0.00001   # stop when successive x values are this close
    while abs(x_new - x_old) > precision:
        x_old = x_new
        x_new = x_old + eps * f(x_old)
    return x_new

print(cal())   # -0.0004892181072978443, very close to the true maximum at x = 0
Solving the objective function
Here we take the partial derivative of the objective function and obtain the iterative update formula below:
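Differentiating ℓ(θ) gives the standard gradient, and the corresponding gradient ascent update in vector form (this is the update the code below implements):

$$ \frac{\partial \ell(\theta)}{\partial \theta_j} = \sum_{i=1}^{m} \Bigl( y^{(i)} - h_\theta\bigl(x^{(i)}\bigr) \Bigr) x_j^{(i)}, \qquad \theta := \theta + \alpha \, X^{T}\bigl(y - h_\theta(X)\bigr) $$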
Logistic regression practice
Data situation
Read in the data and plot it:
def loadDataSet():
    dataMat = []
    labelMat = []
    fr = open('Data/Logistic/TestSet.txt')
    for line in fr.readlines():
        lineArr = line.strip().split()
        # prepend the constant 1.0 so w0 acts as the bias term
        dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
        labelMat.append(int(lineArr[2]))
    return dataMat, labelMat
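The plotting code is not included above; a minimal sketch using matplotlib (an assumption, not part of the original listing) that colors the two classes as in the figure:

import numpy as np
import matplotlib.pyplot as plt

dataMat, labelMat = loadDataSet()
dataArr = np.array(dataMat)
labelArr = np.array(labelMat)
# green = positive samples (label 1), red = negative samples (label 0)
plt.scatter(dataArr[labelArr == 1, 1], dataArr[labelArr == 1, 2], c='green')
plt.scatter(dataArr[labelArr == 0, 1], dataArr[labelArr == 0, 2], c='red')
plt.xlabel('x1')
plt.ylabel('x2')
plt.show()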
Training algorithm
Calculate W using the gradient iteration formula:
import numpy as np

def sigmoid(inX):
    return 1.0 / (1 + np.exp(-inX))

def gradAscent(dataMatIn, labelMatIn):
    dataMatrix = np.mat(dataMatIn)                # m x n feature matrix
    labelMat = np.mat(labelMatIn).transpose()     # m x 1 label column vector
    m, n = np.shape(dataMatrix)
    alpha = 0.001        # step size
    maxCycles = 500      # number of iterations
    weights = np.ones((n, 1))
    for k in range(maxCycles):
        h = sigmoid(dataMatrix * weights)         # m x 1 predicted probabilities
        error = labelMat - h                      # y - h
        weights = weights + alpha * dataMatrix.transpose() * error
    return weights
Using the computed weights, plot the decision boundary over the data to view the classification result:
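The original plotting code is not shown here; a minimal sketch, assuming matplotlib and the functions defined above (the x1 range of -3 to 3 is an assumption about the data):

import numpy as np
import matplotlib.pyplot as plt

def plotBestFit(weights):
    dataMat, labelMat = loadDataSet()
    dataArr = np.array(dataMat)
    labelArr = np.array(labelMat)
    plt.scatter(dataArr[labelArr == 1, 1], dataArr[labelArr == 1, 2], c='green')
    plt.scatter(dataArr[labelArr == 0, 1], dataArr[labelArr == 0, 2], c='red')
    # decision boundary: w0 + w1*x1 + w2*x2 = 0  =>  x2 = -(w0 + w1*x1) / w2
    x1 = np.arange(-3.0, 3.0, 0.1)
    x2 = (-weights[0] - weights[1] * x1) / weights[2]
    plt.plot(x1, x2)
    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.show()

weights = gradAscent(*loadDataSet())
plotBestFit(np.asarray(weights).flatten())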
Algorithm advantages and disadvantages