Machine Learning (4) Logistic Regression
1. Algorithm Derivation
Unlike linear regression (which was fitted with gradient descent), logistic regression addresses a classification problem rather than a regression problem. In regression, y is a continuous variable, while in classification, y takes values in a discrete set of classes. For example, in binary classification y can only take a value in {0, 1}.
Suppose the labels in a group of samples take only the values 0 and 1. If linear regression is used to fit these samples, the fit will be poor. When y can only be in {0, 1}, a classification method should be used instead.
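To do this, let the hypothesis be:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

Here g is the logistic function (also known as the sigmoid function), defined as:

$$g(z) = \frac{1}{1 + e^{-z}}$$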
The curve of the logistic function g(z) is S-shaped. When z is large and positive, g(z) tends to 1; when z is large and negative, g(z) tends to 0; and when z = 0, g(z) = 0.5. Therefore, g(z) is always bounded between 0 and 1. Other functions that rise smoothly from 0 to 1 could also be used as g(z). However, the sigmoid function is the one most commonly used in subsequent chapters.
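The behavior of g(z) described above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name sigmoid and the two-branch form (used here to avoid overflow for very negative z) are choices made for this illustration, not something the original text specifies.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form e^z / (1 + e^z),
    # so that exp() is never called with a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)

for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(z, sigmoid(z))
```

The printed values move from near 0 through exactly 0.5 at z = 0 to near 1, matching the description of the curve.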
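Assume that, given x and the parameter θ, the hypothesis h_θ(x) = g(θ^T x) is the probability that y = 1, so that:

$$P(y = 1 \mid x; \theta) = h_\theta(x)$$

$$P(y = 0 \mid x; \theta) = 1 - h_\theta(x)$$

These two cases can be abbreviated as:

$$p(y \mid x; \theta) = (h_\theta(x))^{y} (1 - h_\theta(x))^{1 - y}$$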
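Assuming that the m training samples are independent, the likelihood function of θ can be written as follows:

$$L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{m} (h_\theta(x^{(i)}))^{y^{(i)}} (1 - h_\theta(x^{(i)}))^{1 - y^{(i)}}$$

It is easier to maximize the log-likelihood of L(θ):

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log (1 - h_\theta(x^{(i)})) \right]$$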
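As an illustration of the log-likelihood, the following Python sketch evaluates ℓ(θ) on a small data set. The toy samples, the parameter values, the helper names (hypothesis, log_likelihood) and the clipping constant eps are all assumptions made for this example; each x carries a leading 1 so that θ_0 acts as the intercept.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for very negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def log_likelihood(theta, X, y, eps=1e-12):
    """Sum over samples of y*log(h) + (1 - y)*log(1 - h)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        # Clip h away from 0 and 1 so that log() never receives 0 (an assumption
        # added for numerical safety, not part of the formula itself).
        h = min(max(hypothesis(theta, x_i), eps), 1.0 - eps)
        total += y_i * math.log(h) + (1 - y_i) * math.log(1.0 - h)
    return total

# Hypothetical toy data: one feature plus a constant 1 for the intercept.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]

print(log_likelihood([0.0, 0.0], X, y))   # theta = 0 gives h = 0.5 for every sample
print(log_likelihood([-4.5, 2.0], X, y))  # a theta that separates the two classes
```

A parameter vector that fits the labels better yields a larger (less negative) log-likelihood, which is the quantity to be maximized.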
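To maximize the likelihood, we proceed as in linear regression, which used gradient descent: take the partial derivative of the log-likelihood ℓ(θ) with respect to θ_j. For a single training sample (x, y), using g'(z) = g(z)(1 − g(z)), that is:

$$\frac{\partial}{\partial \theta_j} \ell(\theta) = \left( \frac{y}{g(\theta^T x)} - \frac{1 - y}{1 - g(\theta^T x)} \right) g(\theta^T x) (1 - g(\theta^T x)) x_j = (y - h_\theta(x)) x_j$$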
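The single-sample derivative of the log-likelihood, (y − h_θ(x)) x_j, can be sanity-checked numerically. The sketch below compares it with a central finite difference; the sample values, the step size and the function names are hypothetical and chosen only for this check.

```python
import math

def sigmoid(z):
    # Plain form; adequate for the moderate values of z used in this check.
    return 1.0 / (1.0 + math.exp(-z))

def log_lik_single(theta, x, y):
    """Log-likelihood of one sample: y*log(h) + (1 - y)*log(1 - h)."""
    h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    return y * math.log(h) + (1 - y) * math.log(1.0 - h)

def analytic_grad(theta, x, y):
    """Closed-form gradient: (y - h_theta(x)) * x_j for each j."""
    h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    return [(y - h) * xj for xj in x]

def numeric_grad(theta, x, y, step=1e-6):
    """Central finite-difference approximation of the same gradient."""
    grads = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += step
        down[j] -= step
        grads.append((log_lik_single(up, x, y) - log_lik_single(down, x, y)) / (2 * step))
    return grads

# Hypothetical values for a single sample.
theta = [0.3, -0.7, 1.1]
x = [1.0, 2.0, -1.5]
y = 1

print(analytic_grad(theta, x, y))
print(numeric_grad(theta, x, y))
```

The two printed lists should agree to several decimal places, which supports the closed-form expression.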
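Note: the gradient descent algorithm uses the update θ := θ − α∇_θ J(θ) to minimize a cost function J(θ), where α is the learning rate. Here we are maximizing ℓ(θ), so this is gradient ascent:

$$\theta := \theta + \alpha \nabla_\theta \ell(\theta)$$

That is, the change in θ between two iterations (or between two samples, in the stochastic version) is proportional to the gradient of ℓ(θ).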
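Then the update rule for a single training sample (x^{(i)}, y^{(i)}) is:

$$\theta_j := \theta_j + \alpha (y^{(i)} - h_\theta(x^{(i)})) x_j^{(i)}$$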
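This is the stochastic gradient ascent rule. It has the same form as the stochastic gradient update for linear regression in the previous lesson, except that it ascends the log-likelihood instead of descending a cost function, so the sign in front of the gradient is opposite. Here, however, h_θ(x) is the logistic function of θ^T x rather than a linear function, so in essence the algorithm is different from linear regression.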
2. Sample Code
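The following is a minimal sketch of logistic regression trained with stochastic gradient ascent, using the update θ_j := θ_j + α (y − h_θ(x)) x_j for each sample. It is not the original author's code: the function names, the toy data, the learning rate alpha, the number of epochs and the fixed random seed are all assumptions made for illustration. Only the Python standard library is used.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for very negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(theta, x):
    """h_theta(x) = g(theta^T x), read as the probability that y = 1."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def train_sga(X, y, alpha=0.1, epochs=200, seed=0):
    """Stochastic gradient ascent on the log-likelihood."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    theta = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)             # visit the samples in random order
        for i in order:
            error = y[i] - predict_proba(theta, X[i])
            for j in range(len(theta)):
                theta[j] += alpha * error * X[i][j]
    return theta

# Hypothetical toy data: a constant 1 (intercept) plus one feature.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 3.0], [1.0, 3.5], [1.0, 4.0]]
y = [0, 0, 0, 1, 1, 1]

theta = train_sga(X, y)
predictions = [1 if predict_proba(theta, x) >= 0.5 else 0 for x in X]
print("theta:", theta)
print("predictions:", predictions)
```

A sample is assigned to class 1 when h_θ(x) ≥ 0.5, which is equivalent to θ^T x ≥ 0. For real data, an established library implementation would normally be preferable to this hand-written loop.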