- Form:
Use the sigmoid function: $g(z) = \frac{1}{1 + e^{-z}}$, so the hypothesis is $h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$.
Its derivative is $g'(z) = g(z)\,(1 - g(z))$.
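The sigmoid and its derivative can be sketched in a few lines; the function names are illustrative, and the derivative identity above can be checked numerically against a finite difference:

```python
import math

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z})"""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_deriv(z):
    """g'(z) = g(z) * (1 - g(z))"""
    g = sigmoid(z)
    return g * (1.0 - g)
```

At $z = 0$, $g(0) = 0.5$ and $g'(0) = 0.25$, the sigmoid's maximum slope.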
Assume:
$P(y = 1 \mid x; \theta) = h_\theta(x)$ and $P(y = 0 \mid x; \theta) = 1 - h_\theta(x)$,
which can be written compactly as
$p(y \mid x; \theta) = h_\theta(x)^{y}\,(1 - h_\theta(x))^{1-y}$
If there are $m$ training samples, the likelihood function is:
$L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{m} h_\theta(x^{(i)})^{y^{(i)}}\,(1 - h_\theta(x^{(i)}))^{1 - y^{(i)}}$
Logarithmic form:
$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$
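The log-likelihood sum can be computed directly; this is a minimal sketch (function name and list-of-lists data layout are illustrative, not from the original notes):

```python
import math

def log_likelihood(theta, X, y):
    # ℓ(θ) = Σ_i [ y_i log h(x_i) + (1 − y_i) log(1 − h(x_i)) ]
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(t * xj for t, xj in zip(theta, x_i))  # θᵀx
        h = 1.0 / (1.0 + math.exp(-z))                # h_θ(x) = g(θᵀx)
        total += y_i * math.log(h) + (1 - y_i) * math.log(1.0 - h)
    return total
```

With $\theta = 0$ every sample gets $h = 0.5$, so $\ell(\theta) = m \log 0.5$, a convenient sanity check.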
Use gradient ascent to find its maximum: $\theta := \theta + \alpha \nabla_\theta \ell(\theta)$
Derivation (for a single sample):
$\frac{\partial}{\partial \theta_j} \ell(\theta) = \left( y - h_\theta(x) \right) x_j$
The update rule is:
$\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$
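One stochastic gradient-ascent step over a single sample can be sketched as follows (the function name and the default step size $\alpha = 0.1$ are illustrative choices, not from the original notes):

```python
import math

def sgd_ascent_step(theta, x, y, alpha=0.1):
    # θ_j := θ_j + α (y − h_θ(x)) x_j   — one-sample stochastic update
    z = sum(t * xj for t, xj in zip(theta, x))
    h = 1.0 / (1.0 + math.exp(-z))
    return [t + alpha * (y - h) * xj for t, xj in zip(theta, x)]
```

Sweeping this step over the training set repeatedly climbs the log-likelihood; e.g. from $\theta = 0$ a sample with $y = 1$ pushes $\theta$ toward positive scores on that $x$.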
It can be seen that this update rule has the same form as the LMS update rule; however, the hypotheses $h_\theta(x)$ are completely different ($h_\theta(x)$ is a nonlinear function of $\theta^T x$ in logistic regression). Why the forms coincide is explained in the GLM section.
Note: if $h_\theta(x)$ is not the sigmoid but the threshold function
$h_\theta(x) = \begin{cases} 1 & \text{if } \theta^T x \ge 0 \\ 0 & \text{otherwise} \end{cases}$
then the same update rule gives the perceptron learning algorithm. Although the update rules look similar, it is a fundamentally different algorithm from logistic regression.
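Swapping the sigmoid for the hard threshold turns the same update into a perceptron step; a minimal sketch (names are illustrative):

```python
def perceptron_step(theta, x, y, alpha=0.1):
    # h(x) = 1 if θᵀx ≥ 0 else 0 — hard threshold instead of sigmoid
    h = 1 if sum(t * xj for t, xj in zip(theta, x)) >= 0 else 0
    # Same-looking rule: θ_j := θ_j + α (y − h) x_j, but (y − h) ∈ {−1, 0, 1}
    return [t + alpha * (y - h) * xj for t, xj in zip(theta, x)]
```

Because $(y - h)$ is now discrete, the parameters only move on misclassified samples, which is one concrete way the two algorithms differ.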
- Another way to maximize the likelihood function: Newton's method
- Principle: suppose we want to find a zero of a function $f(\theta)$. We can repeatedly apply the update $\theta := \theta - \frac{f(\theta)}{f'(\theta)}$ to approach it.
Its intuitive interpretation is as follows:
Given an initial point $\theta_0$: if $f(\theta_0)$ and $f'(\theta_0)$ have the same sign, the zero lies to the left of $\theta_0$; otherwise it lies to the right. Update $\theta_0$ to the point where the tangent line at $\theta_0$ crosses zero, and repeat; the successive tangent zero-crossings keep approaching the zero of $f$ itself.
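The tangent-following iteration above can be sketched generically (the function name is illustrative; a fixed iteration count stands in for a proper convergence test):

```python
def newton_zero(f, fprime, theta0, iters=20):
    # θ := θ − f(θ)/f'(θ): jump to where the tangent at θ crosses zero
    theta = theta0
    for _ in range(iters):
        theta = theta - f(theta) / fprime(theta)
    return theta
```

For example, applying it to $f(\theta) = \theta^2 - 2$ from $\theta_0 = 1$ converges to $\sqrt{2}$ in a handful of iterations.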
- Application: in logistic regression we want the maximum of the likelihood function, i.e. a point where the derivative of the log-likelihood is 0, so we apply Newton's method to $\ell'(\theta)$: $\theta := \theta - \frac{\ell'(\theta)}{\ell''(\theta)}$
Since $\theta$ in logistic regression is a vector, this is rewritten as:
$\theta := \theta - H^{-1} \nabla_\theta \ell(\theta)$
where $H$ is the Hessian matrix:
$H_{ij} = \frac{\partial^2 \ell(\theta)}{\partial \theta_i\, \partial \theta_j}$
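For logistic regression the gradient and Hessian of $\ell$ have closed forms: $\nabla_\theta \ell = \sum_i (y^{(i)} - h_\theta(x^{(i)}))\,x^{(i)}$ and $H = -\sum_i h_\theta(x^{(i)})(1 - h_\theta(x^{(i)}))\,x^{(i)} x^{(i)T}$. A minimal sketch of one Newton step, hard-coded to two parameters so the $2 \times 2$ system can be solved by hand without a linear-algebra library (the function name and data layout are illustrative):

```python
import math

def newton_step_lr(theta, X, y):
    # One Newton update θ := θ − H⁻¹ ∇ℓ(θ) for 2-parameter logistic regression
    g = [0.0, 0.0]                    # gradient ∇ℓ(θ)
    H = [[0.0, 0.0], [0.0, 0.0]]      # Hessian of ℓ (negative definite)
    for x_i, y_i in zip(X, y):
        z = theta[0] * x_i[0] + theta[1] * x_i[1]
        h = 1.0 / (1.0 + math.exp(-z))
        for j in range(2):
            g[j] += (y_i - h) * x_i[j]
            for k in range(2):
                H[j][k] -= h * (1.0 - h) * x_i[j] * x_i[k]
    # Solve H d = g for the 2x2 case via the explicit inverse, then θ := θ − d
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    d = [(H[1][1] * g[0] - H[0][1] * g[1]) / det,
         (H[0][0] * g[1] - H[1][0] * g[0]) / det]
    return [theta[0] - d[0], theta[1] - d[1]]
```

On a small non-separable dataset this typically reaches the maximum-likelihood $\theta$ to machine precision within about ten iterations.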
Newton's method tends to converge in fewer iterations than (batch) gradient descent, though each iteration is more expensive because the Hessian must be computed and inverted.
Machine Learning Algorithm Notes 1_2: Classification and Logistic Regression