Logistic regression in machine learning

Source: Internet
Author: User
Tags: machine learning, logistic regression, objective function, sigmoid function

Logistic regression involves higher mathematics: linear algebra, probability theory, and optimization. This article tries to explain logistic regression in the simplest, most easily understood way, with less discussion of formula derivations and more visualized examples. If you are allergic to mathematical formulas, consider this your warning.

Logistic regression principle and derivation

Although logistic regression has "regression" in its name, it is a classification algorithm. As shown in the figure, there are two classes of data (red and green). To separate them, we can draw a straight line defined by z = w0·x0 + w1·x1 + w2·x2, with x0 fixed at 1 as the intercept term. When a new sample (x1, x2) needs to be predicted, it is plugged into this line function: if the function value is greater than 0, it is classified as a green sample (positive sample); otherwise it is a red sample (negative sample). Generalizing to higher dimensions, we need a hyperplane (a line in two dimensions, a plane in three dimensions, and in n dimensions an (n-1)-dimensional hyperplane) to separate the sample data. Finding that hyperplane really means finding its weight parameter W, which is very similar to regression; hence the name logistic regression.
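As a minimal sketch of this decision rule (the weights below are made up for illustration; in practice they are learned from data):

import numpy as np

# Hypothetical weights (w0, w1, w2) -- not learned, just for illustration.
w = np.array([-4.0, 1.0, 1.0])

def classify(x1, x2):
    # z = w0*x0 + w1*x1 + w2*x2, with x0 fixed at 1 (the intercept term)
    z = np.dot(w, np.array([1.0, x1, x2]))
    return 'positive (green)' if z > 0 else 'negative (red)'

print(classify(3.0, 4.0))   # z = 3  -> positive (green)
print(classify(1.0, 1.0))   # z = -2 -> negative (red)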

Sigmoid function

Of course, we don't use the z value directly. We need to squash z into the interval [0, 1]; the converted value is then the probability that the new sample belongs to the positive class. We use the sigmoid function to perform this conversion: σ(z) = 1 / (1 + e^(-z)). Observing the sigmoid curve, as shown, when z is greater than 0 the σ value is greater than 0.5, and when z is less than 0 the σ value is less than 0.5. By using the sigmoid function, logistic regression is essentially a discriminative model based on conditional probability.
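A quick numeric check of these properties (a small sketch, not part of the original article's code):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))    # 0.5: exactly on the decision boundary
print(sigmoid(2))    # ~0.88: z > 0 maps above 0.5
print(sigmoid(-2))   # ~0.12: z < 0 maps below 0.5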

Objective function

In fact, what we are solving for is W. How do we find it? Look at the figures first: the second figure shows the best line segmentation; in other words, it keeps the sample points as far from the line as possible, so that newly arriving samples are divided cleanly. How do we turn this into an objective function we can compute?

We apply the sigmoid function to z, obtaining the hypothesis h(x) = σ(θ·x) = 1 / (1 + e^(-θ·x)), where θ is the weight vector we are solving for (the W above).

By conditional probability, P(y = 1 | x; θ) = h(x) and P(y = 0 | x; θ) = 1 - h(x). The two cases can be combined into a single formula: P(y | x; θ) = h(x)^y · (1 - h(x))^(1 - y).

Assuming the samples are independent of each other, the probability of the entire sample set is the product of the probabilities of all samples: L(θ) = ∏ᵢ h(xᵢ)^yᵢ · (1 - h(xᵢ))^(1 - yᵢ).

This product is too complicated to differentiate directly, so we apply a log transformation: ℓ(θ) = log L(θ) = Σᵢ [ yᵢ · log h(xᵢ) + (1 - yᵢ) · log(1 - h(xᵢ)) ].

We now require the value of this objective function to be as large as possible; the maximizing θ is our solution.
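As a hedged sketch, this log-likelihood can be evaluated with NumPy as follows (the toy data and θ below are invented for illustration):

import numpy as np

def log_likelihood(theta, X, y):
    # h = sigmoid(X @ theta): predicted probability of the positive class
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    # sum over samples of y*log(h) + (1-y)*log(1-h)
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Toy data: each row is (x0=1, x1, x2); labels are 0/1.
X = np.array([[1.0, 3.0, 4.0],
              [1.0, 1.0, 1.0]])
y = np.array([1, 0])
theta = np.array([-4.0, 1.0, 1.0])
print(log_likelihood(theta, X, y))   # closer to 0 means a better fit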

Gradient ascent method

Before introducing the gradient ascent method, let's revisit a piece of middle-school knowledge: find the value of x at which the function f(x) = -x² takes its maximum.

Function diagram:

Solution: take the derivative of f(x), namely f'(x) = -2x, set it to 0, and we find that the maximum value 0 is attained at x = 0. However, when the function is complex, computing the extremum analytically is difficult, and we need the gradient ascent method to approach the extremum step by step through iteration: x_new = x_old + α · f'(x_old), where α is the step size. We move step by step in the direction of the derivative (the gradient), gradually approaching the extremum.

The gradient ascent algorithm below finds the x value that maximizes the function:

def f(x_old):
    # f here returns the derivative of -x**2, i.e. f'(x) = -2x
    return -2 * x_old

def cal():
    x_old = 0            # dummy previous value, just to enter the loop
    x_new = -6           # starting point of the iteration
    eps = 0.01           # step size (learning rate)
    precision = 0.00001  # stop once successive iterates are this close
    while abs(x_new - x_old) > precision:
        x_old = x_new
        x_new = x_old + eps * f(x_old)
    return x_new

print(cal())   # -0.0004892181072978443, very close to the true maximizer x = 0

Objective function solving

Here, we take the partial derivative of the objective function and obtain the iterative update formula: θⱼ := θⱼ + α · Σᵢ (yᵢ - h(xᵢ)) · xᵢⱼ, or in matrix form θ := θ + α · Xᵀ(y - σ(Xθ)).
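For completeness, a hedged sketch of that derivation in LaTeX (this chain-rule step is standard, though the article itself does not show it):

\frac{\partial \ell(\theta)}{\partial \theta_j}
  = \sum_i \left( \frac{y_i}{h(x_i)} - \frac{1 - y_i}{1 - h(x_i)} \right) \frac{\partial h(x_i)}{\partial \theta_j}
  = \sum_i \left( y_i - h(x_i) \right) x_{ij}

where the second equality uses the sigmoid identity \sigma'(z) = \sigma(z)\,(1 - \sigma(z)) applied to h(x) = \sigma(\theta \cdot x).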

Logistic regression practice

Data situation

Read in the data and plot it:


def loadDataSet():
    dataMat = []; labelMat = []
    fr = open('Data/Logistic/TestSet.txt')
    for line in fr.readlines():
        lineArr = line.strip().split()
        # prepend x0 = 1.0 as the intercept term
        dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
        labelMat.append(int(lineArr[2]))
    return dataMat, labelMat
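The original plot did not survive the page extraction; the following is a minimal matplotlib sketch of what it likely looked like (column and label conventions follow loadDataSet above):

import numpy as np
import matplotlib.pyplot as plt

dataMat, labelMat = loadDataSet()
dataArr = np.array(dataMat)
labels = np.array(labelMat)

# green for positive samples (label 1), red for negative samples (label 0)
plt.scatter(dataArr[labels == 1, 1], dataArr[labels == 1, 2], c='green', label='positive')
plt.scatter(dataArr[labels == 0, 1], dataArr[labels == 0, 2], c='red', label='negative')
plt.xlabel('x1')
plt.ylabel('x2')
plt.legend()
plt.show()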

Training algorithm

Calculate W using the gradient iteration formula:


import numpy as np

def sigmoid(inX):
    return 1.0 / (1 + np.exp(-inX))

def gradAscent(dataMatIn, labelMatIn):
    dataMatrix = np.mat(dataMatIn)              # m x n matrix of samples
    labelMat = np.mat(labelMatIn).transpose()   # m x 1 column of labels
    m, n = np.shape(dataMatrix)
    alpha = 0.001        # step size
    maxCycles = 500      # number of iterations
    weights = np.ones((n, 1))
    for k in range(maxCycles):
        h = sigmoid(dataMatrix * weights)       # predicted probabilities
        error = labelMat - h                    # y - h
        weights = weights + alpha * dataMatrix.transpose() * error
    return weights

Plot the decision boundary given by the computed weights to view the classification results:
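The plotting code is also missing here; a sketch of how the boundary could be drawn (on the boundary, σ(z) = 0.5, i.e. w0 + w1·x1 + w2·x2 = 0, so x2 = -(w0 + w1·x1)/w2):

import numpy as np
import matplotlib.pyplot as plt

dataMat, labelMat = loadDataSet()
weights = np.asarray(gradAscent(dataMat, labelMat)).flatten()

dataArr = np.array(dataMat)
labels = np.array(labelMat)
plt.scatter(dataArr[labels == 1, 1], dataArr[labels == 1, 2], c='green', label='positive')
plt.scatter(dataArr[labels == 0, 1], dataArr[labels == 0, 2], c='red', label='negative')

# decision boundary: w0 + w1*x1 + w2*x2 = 0
x1 = np.linspace(dataArr[:, 1].min(), dataArr[:, 1].max(), 100)
x2 = -(weights[0] + weights[1] * x1) / weights[2]
plt.plot(x1, x2, label='decision boundary')
plt.xlabel('x1')
plt.ylabel('x2')
plt.legend()
plt.show()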


Algorithm advantages and disadvantages

  • Advantages: easy to understand and computationally inexpensive

  • Disadvantages: classification accuracy can be low
