1 Basic Concepts
1) Definition
The gradient descent method uses the negative gradient direction as the search direction at each iteration, so that each iteration gradually reduces the objective function being optimized.
The gradient descent method is the steepest descent method under the 2-norm. A simple form of the steepest descent method is x^(k+1) = x^(k) - a·g^(k), where a is called the learning rate and can be a small constant, and g^(k) is the gradient of the objective function at x^(k).
The gradient is the vector of the function's partial derivatives with respect to each variable.
2) Example
For the function z = f(x, y), take the partial derivative with respect to x and then with respect to y; the gradient is (∂z/∂x, ∂z/∂y).
For example, if ∂z/∂x = 4x and ∂z/∂y = 6y, then the gradient at the point (2, 4) is (8, 24).
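As a minimal sketch connecting this example to the update rule x^(k+1) = x^(k) - a·g^(k) above, assume the underlying function is z = 2x^2 + 3y^2 (an assumption consistent with the stated partial derivatives 4x and 6y; the original text does not give the function itself):

import numpy as np

def grad(p):
    # Gradient of the assumed example z = 2x^2 + 3y^2: (dz/dx, dz/dy) = (4x, 6y)
    x, y = p
    return np.array([4 * x, 6 * y])

a = 0.1                    # learning rate (step size)
p = np.array([2.0, 4.0])   # current point x^(k)
g = grad(p)                # gradient g^(k) at (2, 4)
p_next = p - a * g         # steepest descent step x^(k+1) = x^(k) - a*g^(k)
print(g, p_next)           # gradient (8, 24) and the new point (1.2, 1.6)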
2 Application of gradient descent in linear regression
Assume the hypothesis function takes the following form:
h_θ(x) = θ_0·x_0 + θ_1·x_1 + ... + θ_n·x_n = θ^T·x   (with x_0 = 1)
The cost function is the squared error loss:
J(θ) = (1/2) · Σ_{i=1..m} ( h_θ(x^(i)) - y^(i) )^2
This error function is one half of the sum, over all m training samples, of the squared difference between the estimate h_θ(x^(i)) and the true value y^(i) (batch gradient descent takes all samples into account). The factor 1/2 in front is purely for convenience when taking derivatives: it cancels the coefficient 2 that appears after differentiation.
Our goal is to choose θ so that the value of the cost function is minimized.
Next we describe the gradient descent procedure, which requires taking the partial derivative of J(θ) with respect to each component θ_j. Because h_θ(x) is linear in θ, the derivative of every term θ_k·x_k with k ≠ j is 0, so
∂J(θ)/∂θ_j = Σ_{i=1..m} ( h_θ(x^(i)) - y^(i) ) · x_j^(i)
The update step moves θ_j in the direction in which the gradient decreases fastest:
θ_j := θ_j - α · ∂J(θ)/∂θ_j
Here θ_j on the right-hand side is the value before the update, the subtracted term is the step taken along the negative gradient direction, and α is the step size (learning rate), i.e. how far to move in the descent direction on each iteration.
An important point to note is that the gradient has a direction: for the vector θ, each component θ_j has its own partial derivative, and together they form the overall gradient direction. By repeatedly moving against this direction, θ changes in the way that decreases J(θ) fastest, until it reaches a minimum, which may be local or global. A runnable sketch of this procedure is given below.
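The following is a minimal sketch of batch gradient descent for linear regression implementing the update above; the feature matrix X, targets y, learning rate alpha, and iteration count are illustrative choices, not values from the original text:

import numpy as np

# Illustrative data: m samples with x_0 = 1 in the first column
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])

theta = np.zeros(X.shape[1])   # parameters theta_0, theta_1
alpha = 0.05                   # learning rate

for _ in range(1000):
    h = X @ theta                    # h_theta(x^(i)) for all samples
    grad = X.T @ (h - y)             # dJ/dtheta_j = sum_i (h - y) * x_j^(i)
    theta = theta - alpha * grad     # theta_j := theta_j - alpha * dJ/dtheta_j

print(theta)   # approaches [1, 1], since y = 1 + x1 in this toy data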
3 Application of gradient descent in logistic regression
The logarithmic (log) loss function:
Cost(h_θ(x), y) = -log(h_θ(x))        if y = 1
Cost(h_θ(x), y) = -log(1 - h_θ(x))    if y = 0
Since y can only be equal to 0 or 1, the two cases of the logistic regression cost function can be combined into a single formula:
Cost(h_θ(x), y) = -y·log(h_θ(x)) - (1 - y)·log(1 - h_θ(x))
Therefore, the cost function of logistic regression can be simplified to:
J(θ) = -(1/m) · Σ_{i=1..m} [ y^(i)·log(h_θ(x^(i))) + (1 - y^(i))·log(1 - h_θ(x^(i))) ]
Note that the expression inside the brackets is exactly the log-likelihood used in the maximum likelihood estimation of logistic regression; maximizing the likelihood yields the estimate of the parameter θ. Conversely, to find a suitable parameter we minimize the cost function, namely:
min_θ J(θ)
For a new input x, the output is computed according to the formula for h_θ(x), i.e. the sigmoid (logistic) function applied to θ^T·x:
h_θ(x) = g(θ^T·x) = 1 / (1 + e^(-θ^T·x))
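A minimal NumPy sketch of this cost function and prediction rule; the function names sigmoid, cost, and predict are illustrative, not from the original text:

import numpy as np

def sigmoid(z):
    # h_theta(x) = 1 / (1 + e^(-theta^T x))
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum( y*log(h) + (1 - y)*log(1 - h) )
    m = len(y)
    h = sigmoid(X @ theta)
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def predict(theta, x_new):
    # Output 1 when h_theta(x) >= 0.5, otherwise 0
    return int(sigmoid(x_new @ theta) >= 0.5)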
Similar to linear regression, here we use the gradient descent algorithm to learn the parameter θ for J(θ).
The goal is to minimize J(θ); the gradient descent algorithm is then:
Repeat {
    θ_j := θ_j - α · ∂J(θ)/∂θ_j
}   (updating all θ_j simultaneously)
After taking the derivative of J(θ), the gradient descent algorithm becomes:
Repeat {
    θ_j := θ_j - α · (1/m) · Σ_{i=1..m} ( h_θ(x^(i)) - y^(i) ) · x_j^(i)
}   (updating all θ_j simultaneously)
Note that this algorithm is almost identical to the gradient descent algorithm in linear regression; the only difference is the definition of h_θ(x).
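The following self-contained sketch implements the update above for logistic regression; the toy data, learning rate, and iteration count are illustrative assumptions, not from the original text:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative data: x_0 = 1 in the first column, binary labels y
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])

m = len(y)
theta = np.zeros(X.shape[1])
alpha = 0.1

for _ in range(5000):
    h = sigmoid(X @ theta)                 # h_theta(x^(i)); only this line differs from linear regression
    grad = (1.0 / m) * (X.T @ (h - y))     # (1/m) * sum_i (h - y) * x_j^(i)
    theta = theta - alpha * grad           # theta_j := theta_j - alpha * dJ/dtheta_j

print(theta)                               # learned parameters
print(sigmoid(X @ theta) >= 0.5)           # predicted labels for the training points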
4 References:
Gradient Descent algorithm, http://blog.sina.com.cn/s/blog_62339a2401015jyq.html
Gradient Descent method, http://deepfuture.iteye.com/blog/1593259
Stanford University machine learning course, lecture 6: "Logistic Regression", http://52opencourse.com/125/coursera%E5%85%AC%E5%BC%80%E8%AF%BE%E7%AC%94%E8%AE%B0-%E6%96%AF%e5%9d%a6%e7%a6%8f%e5%a4%a7%e5%ad%a6%e6%9c%ba%e5%99%a8%e5%ad%a6%e4%b9%a0%e7%ac%ac%e5%85%ad%e8%af%be-%e9%80%bb%E8%be%91%e5%9b%9e%e5%bd%92-logistic-regression
A brief talk on the BP algorithm, http://blog.csdn.net/pennyliang/article/details/6695355
5 Harvest
1) Understood the concept of the gradient;
2) Reviewed derivative formulas and the concept of partial derivatives;
3) Derived the gradient descent formula, and now grasp it more firmly.