This article explains logistic regression as applied to classification. Logistic regression solves the **binary classification problem**.

When reprinting, please cite the source: http://www.cnblogs.com/BYRans/

Binary classification problems

A binary classification problem is one in which the predicted value y takes only two values (0 or 1); binary classification can also be extended to multi-class classification. For example, suppose we want to build a spam filtering system: the features x describe properties of an email, and the predicted value y is the type of mail, spam or normal. The two categories are conventionally called the positive class and the negative class; in the spam example, the positive class is normal mail and the negative class is spam.

Logistic regression

**The logistic function**

In the binary classification problem y takes discrete values (0 or 1), but if we used linear regression directly to predict y, the predicted values would not be restricted to 0 or 1. Logistic regression therefore passes the prediction through a function so that the value of y lies in the interval (0, 1). This function is called the **logistic function**, also known as the **sigmoid function**. Its formula is:

g(z) = 1 / (1 + e^(-z))

As z approaches positive infinity, g(z) approaches 1; as z approaches negative infinity, g(z) approaches 0. The graph of the logistic function is an S-shaped curve.
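This limiting behavior can be checked with a minimal Python sketch of the logistic function (the function name `sigmoid` is my own choice for illustration, not from the post):

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# g(0) = 0.5; large positive z pushes g(z) toward 1, large negative z toward 0.
```

Evaluating `sigmoid` at 0 gives exactly 0.5, and at z = ±10 the output is already within about 5e-5 of the limits 1 and 0.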

The derivative of the logistic function has a useful property, which is used in the derivation below:

g'(z) = g(z)(1 - g(z))
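The derivative property g'(z) = g(z)(1 - g(z)) can be verified numerically with a finite-difference sketch (`numeric_derivative` is an illustrative helper of my own, not from the post):

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def numeric_derivative(f, z, eps=1e-6):
    """Central finite-difference approximation of f'(z)."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)

# Check g'(z) = g(z) * (1 - g(z)) at a few sample points.
for z in (-2.0, 0.0, 1.5):
    analytic = sigmoid(z) * (1 - sigmoid(z))
    assert abs(numeric_derivative(sigmoid, z) - analytic) < 1e-6
```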

**The logistic regression expression**

Logistic regression is essentially linear regression with one extra step in the mapping from features to result: the features are combined linearly, and the linear sum is then passed through the function g(z) to form the hypothesis function. g(z) maps any continuous value into the interval (0, 1). Substituting the linear regression model into g(z) yields the expression for logistic regression:

h_θ(x) = g(θ^T x) = 1 / (1 + e^(-θ^T x))
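A sketch of the hypothesis function h_θ(x) = g(θ^T x), assuming plain Python lists for θ and x (the names are mine for illustration):

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x): the linear sum of features passed through g."""
    z = sum(t * x_j for t, x_j in zip(theta, x))
    return sigmoid(z)
```

With θ = 0 the hypothesis outputs 0.5 for any input, and the output always lies strictly between 0 and 1.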

As before, we keep the convention of letting x_0 = 1, so that the expression can be written as:

θ^T x = θ_0 + Σ_{j=1}^{n} θ_j x_j

**Soft classification of logistic regression**

The value of y is now normalized by the logistic function into (0, 1), and this value has a special meaning: it represents the probability that the result is 1. The probabilities that input x is classified as category 1 and as category 0 are therefore:

P(y = 1 | x; θ) = h_θ(x)
P(y = 0 | x; θ) = 1 - h_θ(x)

Merging the two expressions above gives:

p(y | x; θ) = (h_θ(x))^y (1 - h_θ(x))^(1 - y)
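The merged expression p(y | x; θ) = h^y (1 - h)^(1 - y) can be sketched directly; note how it reduces to h when y = 1 and to 1 - h when y = 0 (the function name is mine for illustration):

```python
def bernoulli_prob(h, y):
    """p(y | x; theta) = h^y * (1 - h)^(1 - y), where h = h_theta(x) and y is 0 or 1."""
    return (h ** y) * ((1 - h) ** (1 - y))

# With h = 0.8: probability of y = 1 is 0.8, probability of y = 0 is 0.2.
```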

**Gradient ascent**

Having obtained the expression for logistic regression, the next steps are similar to those for linear regression: construct the likelihood function, apply maximum likelihood estimation, and finally derive the iterative update expression for θ. If this line of reasoning is unclear, please refer to the article "Linear regression, gradient descent". Note, however, that here we use gradient ascent rather than gradient descent, because we want to maximize the likelihood function, not minimize it.

We assume that the training samples are independent of each other, so the likelihood function is:

L(θ) = ∏_{i=1}^{m} p(y^(i) | x^(i); θ) = ∏_{i=1}^{m} (h_θ(x^(i)))^{y^(i)} (1 - h_θ(x^(i)))^{1 - y^(i)}

As usual, taking the log of the likelihood function converts it to:

ℓ(θ) = log L(θ) = Σ_{i=1}^{m} [ y^(i) log h_θ(x^(i)) + (1 - y^(i)) log(1 - h_θ(x^(i))) ]
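The log-likelihood over a training set can be sketched as follows, assuming the per-sample predictions h_θ(x^(i)) have already been computed (the function and parameter names are mine, not the post's):

```python
import math

def log_likelihood(h_vals, y_vals):
    """l(theta) = sum over samples of y * log(h) + (1 - y) * log(1 - h)."""
    return sum(y * math.log(h) + (1 - y) * math.log(1.0 - h)
               for h, y in zip(h_vals, y_vals))
```

Predictions that agree with the labels (h near 1 when y = 1, near 0 when y = 0) give a larger log-likelihood than uninformative predictions of 0.5.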

Taking the partial derivative of the converted likelihood function with respect to θ (here for the case of a single training sample):

∂ℓ(θ)/∂θ_j = (y - h_θ(x)) x_j

The first step of this derivation converts the partial derivative using the differentiation rule for logarithms: if y = ln x, then y' = 1/x.

The second step uses the derivative property of the logistic function, g'(z) = g(z)(1 - g(z)).

The third step is ordinary algebraic simplification.

This gives the update direction for each iteration of gradient ascent, so the iterative update formula for θ is:

θ_j := θ_j + α (y^(i) - h_θ(x^(i))) x_j^(i)
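The per-sample update rule θ_j := θ_j + α (y^(i) - h_θ(x^(i))) x_j^(i) corresponds to stochastic gradient ascent. A minimal sketch, where the function names, learning rate, and iteration count are my own toy choices rather than the post's:

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def gradient_ascent(X, y, alpha=0.1, iterations=1000):
    """Stochastic gradient ascent on the log-likelihood.

    For each sample i: theta_j := theta_j + alpha * (y_i - h_theta(x_i)) * x_ij
    """
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        for x_i, y_i in zip(X, y):
            h = sigmoid(sum(t * x_j for t, x_j in zip(theta, x_i)))
            error = y_i - h  # the (y - h_theta(x)) term from the derivation
            theta = [t + alpha * error * x_j for t, x_j in zip(theta, x_i)]
    return theta
```

With the x_0 = 1 intercept convention, training on a tiny linearly separable data set drives the model's probabilities toward the correct labels on both sides of the boundary.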

This expression looks exactly the same as that of the LMS algorithm, but gradient ascent here is a different algorithm from LMS, because h_θ(x^(i)) is now a nonlinear function of θ^T x^(i).

That two different algorithms share the same update expression is not just a coincidence; there is a deep connection between them. We will answer this question when discussing generalized linear models (GLM).

Logistic regression - Andrew Ng machine learning open course notes 1.4