The principle of gradient descent and its application in linear regression and logistic regression

1 Basic Concepts

1) Definition

The gradient descent method uses the negative gradient direction as the search direction at each iteration, so that each iteration gradually reduces the objective function being optimized.

The gradient descent method is the steepest descent method under the 2-norm. A simple form of the steepest descent method is

x^{(k+1)} = x^{(k)} - \alpha \, g^{(k)}

where \alpha is called the learning rate, which can be a small constant, and g^{(k)} is the gradient at x^{(k)}.

The gradient is simply the vector of the function's partial derivatives.

2) Example

For a function z = f(x, y), take the partial derivative first with respect to x and then with respect to y; the gradient is \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right).

For example, for z = 2x^2 + 3y^2 the partial derivatives are \frac{\partial z}{\partial x} = 4x and \frac{\partial z}{\partial y} = 6y, so the gradient at the point (2, 4) is (8, 24).
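As an illustration, here is a minimal NumPy sketch (hypothetical, not from the original article) that checks this gradient and runs the steepest-descent iteration x^{(k+1)} = x^{(k)} - \alpha g^{(k)} on the same function:

```python
import numpy as np

def grad(p):
    """Gradient of z = 2x^2 + 3y^2, i.e. (4x, 6y)."""
    x, y = p
    return np.array([4.0 * x, 6.0 * y])

p = np.array([2.0, 4.0])
print(grad(p))                 # [ 8. 24.] -- the gradient at (2, 4)

alpha = 0.1                    # learning rate, a small constant
for _ in range(100):           # x^(k+1) = x^(k) - alpha * g^(k)
    p = p - alpha * grad(p)
print(p)                       # converges toward the minimizer (0, 0)
```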

2 The Application of Gradient Descent in Linear Regression

Assume the hypothesis function has the following form:

h_\theta(x) = \sum_{j=0}^{n} \theta_j x_j = \theta^T x, \quad \text{with } x_0 = 1

For the cost function we use the least mean squares loss:

J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

This error function sums, over all samples, the square of the difference between the estimate h_\theta(x^{(i)}) and the true value y^{(i)} (batch gradient descent takes all samples into account). The factor of 1/2 in front is for the sake of differentiation: the 2 brought down by the square cancels it.

Our goal is to choose suitable parameters \theta so that the value of the cost function is minimized.
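As a sketch of these two formulas (assuming, as is conventional, that each sample x^{(i)} carries a leading component x_0 = 1 for the intercept):

```python
import numpy as np

def h(theta, X):
    """Hypothesis h_theta(x) = theta^T x, evaluated for every row of X."""
    return X @ theta

def cost(theta, X, y):
    """J(theta) = 1/2 * sum_i (h_theta(x^(i)) - y^(i))^2."""
    r = h(theta, X) - y
    return 0.5 * (r @ r)

# Toy data with x_0 = 1; the true relation is y = 1 + 2*x_1.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
print(cost(np.array([1.0, 2.0]), X, y))   # 0.0 at the true parameters
print(cost(np.array([0.0, 0.0]), X, y))   # 17.5 for a worse guess
```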

Next we introduce the gradient descent process, i.e., taking the partial derivatives of the function. Because h_\theta(x) is linear in \theta, when differentiating with respect to a component \theta_j every term not involving \theta_j is 0:

\frac{\partial J(\theta)}{\partial \theta_j} = \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
The update process moves each \theta_j in the direction in which the gradient decreases most:

\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}

Here \theta_j on the right-hand side is the value before the update, the subtracted term is the step taken along the negative gradient direction, and \alpha is the step size, i.e., how far to move in the descent direction at each iteration.
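Combining the derivative and the update rule, a sketch of the batch gradient descent loop (reusing the toy X, y and the cost sketch above):

```python
def gradient_descent(X, y, alpha=0.01, iters=2000):
    """Repeat: theta_j := theta_j - alpha * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        g = X.T @ (X @ theta - y)    # all partial derivatives of J at once
        theta = theta - alpha * g    # step in the negative gradient direction
    return theta

theta = gradient_descent(X, y)
print(theta)   # approaches [1., 2.], the parameters that minimize J
```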


An important point to note is that the gradient has a direction: for a vector \theta, each component \theta_j contributes one partial derivative, and together they form a single direction in parameter space. Updating along the negative of that direction changes \theta in the direction of steepest descent until a minimum point is reached, whether local or global.
3 The Application of Gradient Descent in Logistic Regression

Logistic regression uses the logarithmic loss function:

\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}
Since y can only be equal to 0 or 1, the two cases of the cost function can be combined into a single formula, deduced as follows:

\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))

Therefore, the cost function of logistic regression can be simplified to:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left( 1 - y^{(i)} \right) \log \left( 1 - h_\theta(x^{(i)}) \right) \right]
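A sketch of this combined cost, assuming h_\theta(x) = 1/(1 + e^{-\theta^T x}), the logistic hypothesis given below:

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)); this is h_theta for logistic regression."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    """J(theta) = -1/m * sum_i [ y_i*log(h_i) + (1 - y_i)*log(1 - h_i) ]."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1.0 - y) @ np.log(1.0 - h)) / m

# Toy binary labels (hypothetical data, with x_0 = 1 in the first column).
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(logistic_cost(np.zeros(2), X, y))   # log(2) ~ 0.693 at theta = 0
```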

Note that the expression in the brackets is exactly the log-likelihood from the maximum likelihood estimation of logistic regression; maximizing it yields the estimate of the parameter \theta. Conversely, to find suitable parameters we minimize the cost function, namely:

\min_{\theta} J(\theta)

For a new input x, the prediction is output according to the formula for h_\theta(x):

h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}
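For example, a hypothetical helper (reusing sigmoid from the previous sketch) that thresholds the output probability at 0.5 to classify:

```python
def predict(theta, x):
    """Return h_theta(x) and the 0/1 class (threshold at 0.5)."""
    p = sigmoid(x @ theta)
    return p, int(p >= 0.5)
```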

Similar to linear regression, here we use the gradient descent algorithm to learn the parameter \theta for this J(\theta).

The goal is to minimize J(\theta), so the gradient descent algorithm is as follows:

Repeat { \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) }

After taking the derivative of J(\theta), the gradient descent algorithm becomes:

Repeat { \theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} }

Note that this algorithm is formally identical to the gradient descent algorithm for linear regression; only the form of h_\theta(x) differs.
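A sketch of the full loop, reusing sigmoid, predict, and the toy X, y from the sketches above; apart from the sigmoid, the body is the same as the linear regression loop:

```python
def logistic_gradient_descent(X, y, alpha=0.5, iters=5000):
    """Repeat: theta_j := theta_j - alpha/m * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        g = X.T @ (sigmoid(X @ theta) - y) / m   # gradient of the log loss
        theta = theta - alpha * g                # same update form as before
    return theta

theta = logistic_gradient_descent(X, y)
print(predict(theta, np.array([1.0, 1.5])))      # high probability of class 1
```
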
4 References

Gradient Descent algorithm, http://blog.sina.com.cn/s/blog_62339a2401015jyq.html

Gradient Descent method, http://deepfuture.iteye.com/blog/1593259

Stanford University machine learning, lesson 6: "Logistic Regression", http://52opencourse.com/125/coursera%E5%85%AC%E5%BC%80%E8%AF%BE%E7%AC%94%E8%AE%B0-%E6%96%AF%e5%9d%a6%e7%a6%8f%e5%a4%a7%e5%ad%a6%e6%9c%ba%e5%99%a8%e5%ad%a6%e4%b9%a0%e7%ac%ac%e5%85%ad%e8%af%be-%e9%80%bb%E8%be%91%e5%9b%9e%e5%bd%92-logistic-regression

A brief talk on the BP algorithm, http://blog.csdn.net/pennyliang/article/details/6695355

5 Harvest

1) Understood the concept of the gradient;

2) Reviewed derivative formulas and the concept of partial derivatives;

3) Derived the gradient descent formula, so it is now mastered more firmly.

