1. Preface
Today we introduce the famous logistic regression of machine learning. Don't be misled by the word "regression" in its name: it is actually a classification algorithm. It is called logistic regression mainly because it is derived from linear regression through a transformation.
2. Principles of logistic regression
2.1 Origin of logistic regression
The reader may recall that there is a section on generalized linear regression in linear regression, which describes transforming \(y=x\theta\) into \(g(y)=x\theta\) through a link function. In logistic regression, the inverse of this \(g()\) is the famous sigmoid function \(S(x)=\frac{1}{1+e^{-x}}\). The graph of the sigmoid function is shown below. The sigmoid function maps the argument \(x\in(-\infty,+\infty)\) to \(y\in(0,1)\).
The sigmoid function appears everywhere, and its derivative is very concise: \(S^{\prime}(x)=S(x)(1-S(x))\). The graph of the derivative is shown below. You can see that the derivative lies in the range \(S^\prime\in(0,0.25]\). Incidentally, in deep learning, using sigmoid as the activation function leads to the vanishing gradient problem.
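To make the derivative identity concrete, here is a minimal sketch in plain Python (the function names are mine, for illustration). It also shows that the derivative peaks at 0.25 at \(x=0\):

```python
import math

def sigmoid(x):
    """Numerically stable sigmoid: 1 / (1 + e^{-x})."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For very negative x, exp(-x) would overflow; rewrite via exp(x).
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_grad(x):
    """Derivative using the identity S'(x) = S(x) * (1 - S(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

At \(x=0\), `sigmoid(0)` is 0.5 and `sigmoid_grad(0)` is 0.25, the maximum of the derivative; away from zero the gradient shrinks toward 0, which is exactly the vanishing-gradient behavior mentioned above.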
2.2 Model of logistic regression
The model in linear regression is:
\[z = x\theta\]
The sigmoid function is:
\[s(z)=\frac{1}{1+e^{-z}}\]
Composing the two gives the logistic regression model:
\[h_{\theta}(x) = \frac{1}{1+e^{-x\theta}}\]
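The model above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the bias term is already folded into the feature vector as a constant 1; the function names are mine:

```python
import math

def predict_proba(x, theta):
    """h_theta(x) = 1 / (1 + e^{-x.theta}) for a single sample.

    x and theta are plain lists of floats of equal length.
    """
    z = sum(xi * ti for xi, ti in zip(x, theta))
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, theta, threshold=0.5):
    """Classify as 1 when the predicted probability reaches the threshold."""
    return 1 if predict_proba(x, theta) >= threshold else 0
```

For example, with `theta = [0.0]` and `x = [0.0]` the model outputs exactly 0.5, the decision boundary.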
2.3 Loss function of logistic regression
Logistic regression uses the maximum likelihood method to derive its loss function.
We know that logistic regression assumes our sample output belongs to one of two classes, 0 or 1. Then we have:
\[P(y=1|x,\theta) = h_{\theta}(x)\]
\[P(y=0|x,\theta) = 1-h_{\theta}(x)\]
Putting the two cases together gives the following formula:
\[P(y|x,\theta) = h_{\theta}(x)^{y}(1-h_{\theta}(x))^{1-y}\]
Having obtained the probability distribution of \(y\), we can maximize the likelihood to solve for the model coefficients \(\theta\) we need. The likelihood function \(L(\theta)\) is:
\[L(\theta) = \prod\limits_{i=1}^{m}\left(h_{\theta}(x^{(i)})\right)^{y^{(i)}}\left(1-h_{\theta}(x^{(i)})\right)^{1-y^{(i)}}\]
The loss function is the negative of the log-likelihood:
\[J(\theta) =-\ln L(\theta) =-\sum\limits_{i=1}^{m}\left(y^{(i)}\log(h_{\theta}(x^{(i)})) + (1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))\right)\]
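The loss above is the familiar log-loss (cross-entropy). As a minimal sketch in plain Python (function name is mine; probabilities are clamped away from 0 and 1 to avoid `log(0)`):

```python
import math

def log_loss(y_true, y_prob, eps=1e-12):
    """Negative log-likelihood J(theta), summed over the m samples.

    y_true: list of 0/1 labels; y_prob: predicted h_theta(x) per sample.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -total
```

Note that a confident correct prediction contributes almost nothing to the loss, while a maximally uncertain prediction of 0.5 contributes \(\ln 2\) per sample.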
2.4 Optimization method of loss function for logistic regression
There are many methods for minimizing the loss function of logistic regression; the most common are gradient descent, coordinate descent, Newton's method, and so on.
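Of these, gradient descent is the simplest to sketch. The gradient of \(J(\theta)\) works out to \(\sum_i (h_{\theta}(x^{(i)})-y^{(i)})\,x^{(i)}\). Below is a minimal batch gradient descent in plain Python, assuming the bias is folded into the features as a leading 1 (function names and the tiny dataset are mine, for illustration only):

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def gradient_descent(X, y, lr=0.1, n_iters=1000):
    """Minimize the log-loss of logistic regression by batch gradient descent.

    X: list of samples, each a list of features (first entry 1.0 for the bias).
    y: list of 0/1 labels.
    """
    n_features = len(X[0])
    theta = [0.0] * n_features
    for _ in range(n_iters):
        # Gradient of J(theta): sum_i (h_theta(x_i) - y_i) * x_i
        grad = [0.0] * n_features
        for xi, yi in zip(X, y):
            h = sigmoid(sum(a * b for a, b in zip(xi, theta)))
            for j in range(n_features):
                grad[j] += (h - yi) * xi[j]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta
```

On a toy separable dataset such as `X = [[1,0],[1,1],[1,2],[1,3]]`, `y = [0,0,1,1]`, the fitted \(\theta\) places the decision boundary between the two classes.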
2.5 Regularization of logistic regression
Logistic regression also faces the overfitting problem, so we have to consider regularization as well. The common choices are L1 regularization and L2 regularization.
The L1-regularized form:
\[J(\theta) =-\ln L(\theta) + \alpha\|\theta\|_1\]
The L2-regularized form:
\[J(\theta) =-\ln L(\theta) + \frac{1}{2}\alpha\|\theta\|_2^2\]
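For the L2 case, regularization simply adds \(\alpha\theta_j\) to each component of the gradient, shrinking the coefficients toward zero. A minimal sketch of one regularized gradient step in plain Python (function names are mine; the bias handling is simplified, and in practice the bias is usually not regularized):

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def l2_regularized_step(theta, X, y, lr=0.1, alpha=1.0):
    """One gradient step on J(theta) = -ln L(theta) + (alpha/2)*||theta||^2."""
    n = len(theta)
    grad = [alpha * t for t in theta]  # gradient of the (alpha/2)*||theta||^2 term
    for xi, yi in zip(X, y):
        h = sigmoid(sum(a * b for a, b in zip(xi, theta)))
        for j in range(n):
            grad[j] += (h - yi) * xi[j]  # gradient of the log-loss term
    return [t - lr * g for t, g in zip(theta, grad)]
```

Larger \(\alpha\) means stronger shrinkage: even with no signal from the data, each step multiplies the coefficients by roughly \((1-\mathrm{lr}\cdot\alpha)\).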
3. Summary
Logistic regression assumes the data obey a Bernoulli distribution; on the basis of linear regression it applies a sigmoid function for binary classification, derives the loss function using the maximum likelihood method, and optimizes that loss with gradient descent. It is a discriminative classification algorithm. Some advantages and disadvantages of logistic regression are:
3.1 Advantages
- Simple to implement and widely used on industrial problems;
- Training is fast, and so is classification;
- Low memory footprint;
- Outputs a probability score for each sample, which makes the model highly interpretable.
3.2 Disadvantages
- When the feature space is very large, logistic regression does not perform very well;
- Its accuracy is generally not very high;
- It has difficulty handling imbalanced data.