Most of this series is based on Andrew Ng's Stanford public machine learning course, supplemented with some of my own understanding, programming implementations, and study notes.
Chapter I. Logistic regression
1. Logistic regression
Logistic regression is a supervised classification algorithm. Compared with the linear regression algorithm covered earlier, the difference is that it solves classification problems, which means the target y is no longer a continuous value but a discrete value in {0, 1} (in the binary case).
Of course, this is still a discriminative learning algorithm. A discriminative learning algorithm is one that directly models the posterior $p(y \mid x)$, or directly learns a discriminant function. Generative learning algorithms are the counterpart; the relationship between the two, along with their respective advantages and disadvantages, deserves a separate blog post later.
Back to the point: in the probabilistic interpretation of linear regression, we assumed that the errors were independent and identically distributed (IID) Gaussian variables.
In logistic regression, we instead assume that y given x follows a Bernoulli distribution:

$$P(y=1 \mid x;\theta) = h_\theta(x), \qquad P(y=0 \mid x;\theta) = 1 - h_\theta(x)$$

which can be written more compactly as

$$p(y \mid x;\theta) = \left(h_\theta(x)\right)^{y}\left(1-h_\theta(x)\right)^{1-y}$$

Here the hypothesis is

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1+e^{-\theta^T x}}$$
The function $g(z) = \frac{1}{1+e^{-z}}$ is called the logistic function or sigmoid function, and it is also one of the most commonly used activation functions. [Figure: the S-shaped graph of the sigmoid function.]
It is clear from the graph that the range of $g(z)$ is $(0, 1)$: as $z$ tends to positive infinity, $g(z) \to 1$; as $z$ tends to negative infinity, $g(z) \to 0$.
This function has a property that will make our subsequent derivation easier:

$$g'(z) = g(z)\left(1 - g(z)\right) \tag{1}$$

(The derivation of this property is omitted here; it follows directly from differentiating $g$.)
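As a quick sanity check, here is a minimal sketch (assuming NumPy) that implements the sigmoid and numerically verifies property (1):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Check property (1): g'(z) = g(z) * (1 - g(z)),
# comparing a central finite difference against the closed form.
z = np.linspace(-5.0, 5.0, 11)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = sigmoid(z) * (1 - sigmoid(z))
print(np.allclose(numeric, analytic, atol=1e-8))  # True
```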
We assume the $m$ training samples are independent and identically distributed (IID); the likelihood function is then:

$$L(\theta) = \prod_{i=1}^{m} p\left(y^{(i)} \mid x^{(i)};\theta\right) = \prod_{i=1}^{m} \left(h_\theta(x^{(i)})\right)^{y^{(i)}} \left(1-h_\theta(x^{(i)})\right)^{1-y^{(i)}}$$

The log-likelihood is therefore:

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \left[\, y^{(i)} \log h_\theta(x^{(i)}) + \left(1-y^{(i)}\right) \log\left(1-h_\theta(x^{(i)})\right) \right]$$
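As an illustration, the log-likelihood translates directly into code. This is a sketch assuming NumPy; `theta`, `X`, and `y` are placeholders for the parameter vector, the design matrix, and the 0/1 label vector:

```python
import numpy as np

def log_likelihood(theta, X, y):
    """ell(theta) = sum_i [ y_i * log(h_i) + (1 - y_i) * log(1 - h_i) ]."""
    h = 1.0 / (1.0 + np.exp(-X @ theta))  # h_theta(x^(i)) for every sample
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
```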
The next step is a familiar one: we can maximize the log-likelihood using gradient ascent (equivalently, minimize its negative with gradient descent).
Let us first take the derivative with respect to a single parameter $\theta_j$, for a single training example $(x, y)$:

$$\frac{\partial}{\partial \theta_j} \ell(\theta) = \left(\frac{y}{g(\theta^T x)} - \frac{1-y}{1-g(\theta^T x)}\right) \frac{\partial}{\partial \theta_j} g(\theta^T x)$$

By property (1), $\frac{\partial}{\partial \theta_j} g(\theta^T x) = g(\theta^T x)\left(1-g(\theta^T x)\right) x_j$, so

$$\frac{\partial}{\partial \theta_j} \ell(\theta) = \left(y\left(1-g(\theta^T x)\right) - (1-y)\, g(\theta^T x)\right) x_j = \left(y - h_\theta(x)\right) x_j$$

This yields the stochastic gradient ascent update rule:

$$\theta_j := \theta_j + \alpha \left(y^{(i)} - h_\theta(x^{(i)})\right) x_j^{(i)}$$
Finally, we find that this update rule is exactly the same in form as the one we derived earlier for linear regression.
Of course, this is no coincidence; there is a deeper reason behind it, which will be explained later when we cover generalized linear models.
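To close the loop, here is a minimal training sketch using batch gradient ascent, assuming NumPy; the learning rate `alpha`, the iteration count, and the toy data are illustrative placeholders rather than the course's reference implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient ascent on the log-likelihood.

    X: (m, n) design matrix (include a column of ones for the intercept).
    y: (m,) vector of 0/1 labels.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = sigmoid(X @ theta)      # h_theta(x^(i)) for all samples
        grad = X.T @ (y - h)        # sum_i (y^(i) - h_theta(x^(i))) x^(i)
        theta += alpha * grad / m   # ascend the log-likelihood
    return theta

# Example usage on hypothetical toy data:
rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 1))])
y = (X[:, 1] > 0).astype(float)
theta = train_logistic_regression(X, y)
preds = (sigmoid(X @ theta) >= 0.5).astype(float)  # classify at the 0.5 threshold
print(np.mean(preds == y))  # training accuracy
```

Batch gradient ascent is used here for clarity; the per-example (stochastic) update shown above simply replaces the summed gradient with the gradient of a single sample.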