[Lnu. Machine Learning.question.1] Some understandings of gradient descent method

Source: Internet
Author: User
Tags: scalar

When I first studied machine learning, my understanding of the gradient descent method used to solve the optimization problem in the regression section was always superficial and literal.

Is the concept of the gradient confusing? Is it a scalar or a vector? Why does a function decrease fastest along the negative gradient direction? To answer these questions clearly, it is really worth digging into the underlying ideas.

I looked up some classic references (including Wikipedia) and several personal blogs, for example:

http://www.codelast.com/?p=2573
http://blog.csdn.net/xmu_jupiter/article/details/22220987

These materials give a fairly intuitive visual interpretation of the gradient descent method. Combining their content with my own experience, let's talk it through.

1. Why introduce the notion of direction when studying the arguments of a multivariate function?

In the one-dimensional case, the argument can be regarded as a scalar, representable by a single real number. If you want to change the value of the argument, it can only decrease or increase; that is, it can only move "left or right."

So in one dimension, the idea of "moving an argument in a certain direction" is not very meaningful. But when the independent variable is n-dimensional (n ≥ 2), the concept becomes practical. Suppose the argument x is 3-dimensional, i.e., x is a point (x1, x2, x3), where x1, x2, and x3 are each real numbers, that is, scalars.

Now suppose you want to change x by moving it from one point to another. How do you move it? There are many ways to choose. For instance, we can keep x1 and x2 fixed and change only x3, or keep x1 and x3 fixed and change only x2, and so on. These possibilities give rise to the concept of "direction": in 3-dimensional space, moving from one point to another is no longer simply "left or right" as in the one-dimensional case; there is a direction of motion. In this setting it becomes essential to find a suitable direction such that, when you move from one point to another, the change in the function value best matches our predetermined requirement (for example, decreasing the function value as much as possible).
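The idea of moving a point along a direction can be sketched in a few lines of NumPy. (This snippet is my own illustration, not from the original post; the variable names are made up.)

```python
import numpy as np

# A 3-dimensional argument: each component x1, x2, x3 is a scalar (a real number).
x = np.array([1.0, 2.0, 3.0])

# One special way to move: keep x1 and x2 fixed and change only x3.
x_axis_move = x + 0.5 * np.array([0.0, 0.0, 1.0])

# The general way: move along an arbitrary unit direction d.
d = np.array([1.0, 1.0, 1.0])
d = d / np.linalg.norm(d)      # normalize so that |d| = 1
x_general_move = x + 0.5 * d   # a step of length 0.5 along d
```

Changing a single coordinate is just one special choice of direction; the unit vector d lets us move toward any point in space.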

2. Why gradient descent (Gradient Descent)?

Based on the definition in Wikipedia: if a real-valued function F is differentiable and defined at a point a, then F decreases fastest from a along the direction of the negative gradient, −∇F(a). (What is a gradient? That is itself a question worth asking.) This is why we use the gradient descent method to find the optimum in the optimization problems derived from regression.




3. So why does the function decrease fastest along the negative gradient direction?

Students who love to ask "why" will want this explained properly. To do so, we need the Taylor expansion of the objective function. Recall the first-order expansion along a direction d:

f(x_k + αd) = f(x_k) + α·Df(x_k; d) + o(α)

where Df(x_k; d) is the directional derivative. Using the concept of the gradient, this formula can be further transformed into:

f(x_k + αd) = f(x_k) + α·∇f(x_k)ᵀd + o(α)    (a)

x_k: the argument at the k-th point (a vector).

d: the unit direction (a vector), i.e., |d| = 1.

α: the step length (a real number).

∇f(x_k): the gradient of the objective function at the point x_k (a vector).

o(α): a higher-order infinitesimal of α.

In formula (a), the o(α) term is negligible when α is small.
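A small numeric sketch makes this concrete. (The objective f(x) = x1² + 2·x2² below is my own example, not from the original post.) For a small step length α, the true change in f is almost exactly the first-order term α·∇f(x_k)ᵀd, and the leftover o(α) part is tiny by comparison:

```python
import numpy as np

def f(x):
    return x[0]**2 + 2 * x[1]**2           # an assumed example objective

def grad_f(x):
    return np.array([2 * x[0], 4 * x[1]])  # its gradient

x_k = np.array([1.0, 1.0])
d = np.array([1.0, -1.0]) / np.sqrt(2.0)   # a unit direction, |d| = 1
alpha = 1e-4                               # a small step length

exact = f(x_k + alpha * d) - f(x_k)        # true change in f
first_order = alpha * grad_f(x_k) @ d      # the alpha * grad(f)^T d term from (a)

# The difference between them is the o(alpha) remainder: on the order of alpha**2.
print(abs(exact - first_order))
```

Shrinking α makes the remainder shrink even faster than the first-order term, which is exactly what "o(α) is negligible" means.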

The so-called steepest descent means choosing the direction d that makes f(x_k + αd) as small as possible:

min over unit vectors d of f(x_k + αd)

In other words, we want to minimize the first-order term α·∇f(x_k)ᵀd in (a). This term has the form of an inner product of vectors; if the angle between the vector d and the negative gradient −∇f(x_k) is θ, then

α·∇f(x_k)ᵀd = −α·|∇f(x_k)|·|d|·cos θ    (b)

Expression (b) attains its minimum if and only if θ = 0, i.e., when the direction d (the direction in which the independent variable changes) points along the negative gradient. This direction gives the most negative change in the function value, i.e., the largest decrease (always keep the concept of direction in mind).
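We can verify this claim numerically by sampling many unit directions and checking which one decreases the function most. (Again, the objective f(x) = x1² + 2·x2² is my own example.)

```python
import numpy as np

def f(x):
    return x[0]**2 + 2 * x[1]**2           # an assumed example objective

def grad_f(x):
    return np.array([2 * x[0], 4 * x[1]])  # its gradient

x_k = np.array([1.0, 1.0])
alpha = 0.01
g = grad_f(x_k)

# Sample unit directions d at one-degree intervals and record f(x_k + alpha*d) - f(x_k).
angles = np.linspace(0.0, 2 * np.pi, 360, endpoint=False)
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
changes = np.array([f(x_k + alpha * d) - f(x_k) for d in dirs])

best = dirs[np.argmin(changes)]        # sampled direction with the largest decrease
neg_grad = -g / np.linalg.norm(g)      # normalized negative gradient

# best and neg_grad point (almost) the same way: their dot product is close to 1.
print(best @ neg_grad)
```

Up to the sampling resolution, the best direction coincides with the normalized negative gradient, just as the θ = 0 argument predicts.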

4. Geometric interpretation

This process can be illustrated with a picture, in which f is defined on the plane and the graph of the function is bowl-shaped.

The blue curves are the contours (level sets), i.e., the curves along which f is constant. The red arrows point in the direction opposite to the gradient at each point. (The gradient direction at a point is perpendicular to the contour that passes through that point.)

Walking along the descending (negative gradient) direction, we eventually reach the bottom of the bowl, that is, the point where the value of f is smallest.


Geometric interpretation of the gradient descent method:
Since our task is to find the minimum of the empirical loss function, the whole process is really a "downhill" process.

At each point, we want to take one step (say a fixed step of 0.5 meters) such that the drop in height is the largest. So we should walk down along the direction in which the slope changes fastest, and this direction is exactly the opposite of the gradient of the empirical loss function at that point.

At every step we compute the gradient of the function at the current point, and then take the opposite direction of the gradient as the direction to walk. As the iterations proceed, the gradient keeps shrinking and finally approaches zero.

This is why it is called the gradient descent method.
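The downhill process described above can be sketched as a short loop. (This is my own minimal sketch on an assumed bowl-shaped function f(x) = x1² + 2·x2², with a fixed step length; it is not code from the original post.)

```python
import numpy as np

def f(x):
    return x[0]**2 + 2 * x[1]**2           # a bowl-shaped example function

def grad_f(x):
    return np.array([2 * x[0], 4 * x[1]])  # its gradient

x = np.array([3.0, -2.0])                  # an arbitrary starting point
alpha = 0.1                                # fixed step length

for _ in range(200):
    g = grad_f(x)
    x = x - alpha * g                      # step along the negative gradient

print(x, f(x))   # x ends up very close to [0, 0], the bottom of the bowl
```

Each iteration recomputes the gradient at the current point and steps against it; as x approaches the bottom of the bowl, the gradient itself shrinks toward zero, so the steps naturally become smaller.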

That's all for now. Typing all these symbols and formulas is exhausting...


Here I express my heartfelt thanks to Mr. Orange, L. Earnhard, and Wikipedia.




