Logistic regression in machine learning

Source: Internet
Author: User
Tags: machine learning, logistic regression, objective function, sigmoid function

Logistic regression involves higher mathematics: linear algebra, probability theory, and optimization. This article tries to explain logistic regression in the simplest, most easily understood way, with less discussion of formula derivations and more visualized examples. If you are allergic to mathematical formulas, consider this your warning.

Logistic regression principle and derivation

Although logistic regression has "regression" in its name, it is a classification algorithm. As shown in the figure, there are two classes of data (red and green). To separate them, we can draw a straight line defined by z = w0·x0 + w1·x1 + w2·x2, with x0 fixed at 1 as the intercept term. When a new sample (x1, x2) needs to be predicted, it is plugged into this line function: if the function value is greater than 0, it is classified as a green sample (positive sample); otherwise it is a red sample (negative sample). Generalizing to higher dimensions, we need a hyperplane (a line in two dimensions, a plane in three dimensions, and in n dimensions an (n-1)-dimensional hyperplane) to separate the sample data. Finding that hyperplane really means finding its weight parameter W, which is very similar to regression; hence the name logistic regression.
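As a minimal sketch of this decision rule (the weights below are made up for illustration; in practice they are learned from data):

import numpy as np

# Hypothetical weights (w0, w1, w2) -- not learned, just for illustration.
w = np.array([-4.0, 1.0, 1.0])

def classify(x1, x2):
    # z = w0*x0 + w1*x1 + w2*x2, with x0 fixed at 1 (the intercept term)
    z = np.dot(w, np.array([1.0, x1, x2]))
    return 'positive (green)' if z > 0 else 'negative (red)'

print(classify(3.0, 4.0))   # z = 3  -> positive (green)
print(classify(1.0, 1.0))   # z = -2 -> negative (red)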

Sigmoid function

Of course, we don't use the z value directly. We need to squash z into the interval [0, 1]; the converted value is then the probability that the new sample belongs to the positive class. We use the sigmoid function to perform this conversion: σ(z) = 1 / (1 + e^(-z)). Observing the sigmoid curve, as shown, when z is greater than 0 the σ value is greater than 0.5, and when z is less than 0 the σ value is less than 0.5. By using the sigmoid function, logistic regression is essentially a discriminative model based on conditional probability.
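A quick numeric check of these properties (a small sketch, not part of the original article's code):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))    # 0.5: exactly on the decision boundary
print(sigmoid(2))    # ~0.88: z > 0 maps above 0.5
print(sigmoid(-2))   # ~0.12: z < 0 maps below 0.5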

Objective function

In fact, what we are solving for is W. How do we find it? Look at the figures first: the second figure shows the best line segmentation; in other words, it keeps the sample points as far from the line as possible, so that newly arriving samples are divided cleanly. How do we turn this into an objective function we can compute?

We apply the sigmoid function to z, obtaining the hypothesis h(x) = σ(θ·x) = 1 / (1 + e^(-θ·x)), where θ is the weight vector we are solving for (the W above).

By conditional probability, P(y = 1 | x; θ) = h(x) and P(y = 0 | x; θ) = 1 - h(x). The two cases can be combined into a single formula: P(y | x; θ) = h(x)^y · (1 - h(x))^(1 - y).

Assuming the samples are independent of each other, the probability of the entire sample set is the product of the probabilities of all samples: L(θ) = ∏ᵢ h(xᵢ)^yᵢ · (1 - h(xᵢ))^(1 - yᵢ).

This product is too complicated to differentiate directly, so we apply a log transformation: ℓ(θ) = log L(θ) = Σᵢ [ yᵢ · log h(xᵢ) + (1 - yᵢ) · log(1 - h(xᵢ)) ].

We now require the value of this objective function to be as large as possible; the maximizing θ is our solution.
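As a hedged sketch, this log-likelihood can be evaluated with NumPy as follows (the toy data and θ below are invented for illustration):

import numpy as np

def log_likelihood(theta, X, y):
    # h = sigmoid(X @ theta): predicted probability of the positive class
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    # sum over samples of y*log(h) + (1-y)*log(1-h)
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Toy data: each row is (x0=1, x1, x2); labels are 0/1.
X = np.array([[1.0, 3.0, 4.0],
              [1.0, 1.0, 1.0]])
y = np.array([1, 0])
theta = np.array([-4.0, 1.0, 1.0])
print(log_likelihood(theta, X, y))   # closer to 0 means a better fit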

Gradient ascent method

Before introducing the gradient ascent method, let's revisit a piece of middle-school knowledge: find the value of x at which the function f(x) = -x² takes its maximum.

Function diagram:

Solution: take the derivative of f(x), namely f'(x) = -2x, set it to 0, and we find that the maximum value 0 is attained at x = 0. However, when the function is complex, computing the extremum analytically is difficult, and we need the gradient ascent method to approach the extremum step by step through iteration: x_new = x_old + α · f'(x_old), where α is the step size. We move step by step in the direction of the derivative (the gradient), gradually approaching the extremum.

The gradient ascent algorithm below finds the x value that maximizes the function:

def f(x_old):
    # f here returns the derivative of -x**2, i.e. f'(x) = -2x
    return -2 * x_old

def cal():
    x_old = 0            # dummy previous value, just to enter the loop
    x_new = -6           # starting point of the iteration
    eps = 0.01           # step size (learning rate)
    precision = 0.00001  # stop once successive iterates are this close
    while abs(x_new - x_old) > precision:
        x_old = x_new
        x_new = x_old + eps * f(x_old)
    return x_new

print(cal())   # -0.0004892181072978443, very close to the true maximizer x = 0

Objective function solving

Here, we take the partial derivative of the objective function and obtain the iterative update formula: θⱼ := θⱼ + α · Σᵢ (yᵢ - h(xᵢ)) · xᵢⱼ, or in matrix form θ := θ + α · Xᵀ(y - σ(Xθ)).
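For completeness, a hedged sketch of that derivation in LaTeX (this chain-rule step is standard, though the article itself does not show it):

\frac{\partial \ell(\theta)}{\partial \theta_j}
  = \sum_i \left( \frac{y_i}{h(x_i)} - \frac{1 - y_i}{1 - h(x_i)} \right) \frac{\partial h(x_i)}{\partial \theta_j}
  = \sum_i \left( y_i - h(x_i) \right) x_{ij}

where the second equality uses the sigmoid identity \sigma'(z) = \sigma(z)\,(1 - \sigma(z)) applied to h(x) = \sigma(\theta \cdot x).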

Logistic regression practice

Data situation

Read in the data and plot it:


def loadDataSet():
    dataMat = []; labelMat = []
    fr = open('Data/Logistic/TestSet.txt')
    for line in fr.readlines():
        lineArr = line.strip().split()
        # prepend x0 = 1.0 as the intercept term
        dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
        labelMat.append(int(lineArr[2]))
    return dataMat, labelMat
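The original plot did not survive the page extraction; the following is a minimal matplotlib sketch of what it likely looked like (column and label conventions follow loadDataSet above):

import numpy as np
import matplotlib.pyplot as plt

dataMat, labelMat = loadDataSet()
dataArr = np.array(dataMat)
labels = np.array(labelMat)

# green for positive samples (label 1), red for negative samples (label 0)
plt.scatter(dataArr[labels == 1, 1], dataArr[labels == 1, 2], c='green', label='positive')
plt.scatter(dataArr[labels == 0, 1], dataArr[labels == 0, 2], c='red', label='negative')
plt.xlabel('x1')
plt.ylabel('x2')
plt.legend()
plt.show()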

Training algorithm

Calculate W using the gradient iteration formula:


import numpy as np

def sigmoid(inX):
    return 1.0 / (1 + np.exp(-inX))

def gradAscent(dataMatIn, labelMatIn):
    dataMatrix = np.mat(dataMatIn)              # m x n matrix of samples
    labelMat = np.mat(labelMatIn).transpose()   # m x 1 column of labels
    m, n = np.shape(dataMatrix)
    alpha = 0.001        # step size
    maxCycles = 500      # number of iterations
    weights = np.ones((n, 1))
    for k in range(maxCycles):
        h = sigmoid(dataMatrix * weights)       # predicted probabilities
        error = labelMat - h                    # y - h
        weights = weights + alpha * dataMatrix.transpose() * error
    return weights

Plot the decision boundary given by the computed weights to view the classification results:
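The plotting code is also missing here; a sketch of how the boundary could be drawn (on the boundary, σ(z) = 0.5, i.e. w0 + w1·x1 + w2·x2 = 0, so x2 = -(w0 + w1·x1)/w2):

import numpy as np
import matplotlib.pyplot as plt

dataMat, labelMat = loadDataSet()
weights = np.asarray(gradAscent(dataMat, labelMat)).flatten()

dataArr = np.array(dataMat)
labels = np.array(labelMat)
plt.scatter(dataArr[labels == 1, 1], dataArr[labels == 1, 2], c='green', label='positive')
plt.scatter(dataArr[labels == 0, 1], dataArr[labels == 0, 2], c='red', label='negative')

# decision boundary: w0 + w1*x1 + w2*x2 = 0
x1 = np.linspace(dataArr[:, 1].min(), dataArr[:, 1].max(), 100)
x2 = -(weights[0] + weights[1] * x1) / weights[2]
plt.plot(x1, x2, label='decision boundary')
plt.xlabel('x1')
plt.ylabel('x2')
plt.legend()
plt.show()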


Algorithm advantages and disadvantages

  • Advantages: easy to understand and computationally inexpensive

  • Disadvantages: classification accuracy can be low
