Gradient Descent Method

Last Update: 2014-11-05 Source: Internet

Author: User


I. Basic Concepts

The gradient descent method uses the negative gradient as the search direction at each iteration, so that each iteration gradually decreases the objective function being optimized. Gradient descent is the steepest descent method under the 2-norm. The simplest form of steepest descent is $x^{(k+1)} = x^{(k)} - \alpha\, g^{(k)}$, where $\alpha$ is called the learning rate and can be a small constant, and $g^{(k)}$ is the gradient at $x^{(k)}$.
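As a minimal sketch of the update rule above, the Python function below iterates $x^{(k+1)} = x^{(k)} - \alpha g^{(k)}$ with a constant learning rate. The function name `gradient_descent`, the example objective $f(x, y) = (x - 3)^2 + 2(y + 1)^2$, and the learning rate 0.1 are illustrative assumptions, not taken from the original article.

```python
def gradient_descent(grad, x0, lr=0.1, num_iters=100):
    """Fixed-step gradient descent: x(k+1) = x(k) - lr * g(k).

    grad: function returning the gradient (as a list) at a point.
    x0: starting point (list of floats).
    lr: the learning rate, a constant step multiplier.
    """
    x = list(x0)
    for _ in range(num_iters):
        g = grad(x)  # g(k): gradient at the current iterate x(k)
        x = [xi - lr * gi for xi, gi in zip(x, g)]  # step along -g(k)
    return x


# Hypothetical example objective: f(x, y) = (x - 3)**2 + 2 * (y + 1)**2,
# whose unique minimizer is (3, -1).
def grad_f(v):
    x, y = v
    return [2.0 * (x - 3.0), 4.0 * (y + 1.0)]


x_min = gradient_descent(grad_f, [0.0, 0.0], lr=0.1, num_iters=200)
print(x_min)  # close to [3.0, -1.0]
```

For this objective each step multiplies the error in $x$ by $1 - 2(0.1) = 0.8$ and the error in $y$ by $1 - 4(0.1) = 0.6$, so the iterates converge. A learning rate of 0.5 or more would keep the $y$ error from shrinking, which is why the rate has to be small.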

II. Derivative

(1) Definition

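Let $y = f(x)$ be a function whose domain and values both lie in the real numbers, and suppose $f$ is defined in some neighborhood of a point $x_0$. When the independent variable $x$ receives an increment $\Delta x$ at $x_0$ (with $x_0 + \Delta x$ still in that neighborhood), $y$ receives a corresponding increment $\Delta y = f(x_0 + \Delta x) - f(x_0)$. If the limit of the ratio $\Delta y / \Delta x$ as $\Delta x \to 0$ exists, $f$ is said to be differentiable at $x_0$, and this limit is called the derivative of $f$ at $x_0$, written $f'(x_0)$:

$$f'(x_0) = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} = \lim_{\Delta x \to 0} \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}$$

It can also be written $y'(x_0)$, $\left.\frac{dy}{dx}\right|_{x=x_0}$, or $\frac{df}{dx}(x_0)$.

For a general function, the derivative at $x_0$ can also be defined without using increments, as the limit of the difference quotient as $x$ in the domain approaches $x_0$. That is,

$$f'(x_0) = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}$$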

The derivative reflects the rate of change

The derivative of a function at a point describes the rate of change of the function near that point. In essence, the derivative uses the concept of a limit to approximate the function locally by a linear function. When the independent variable of $f$ receives an increment $h$ at a point $x_0$, the derivative of $f$ at $x_0$ is the limit, if it exists, of the ratio of the increment of the output to the increment $h$ as $h$ tends to 0. It is written $f'(x_0)$, $\frac{df}{dx}(x_0)$, or $\left.\frac{dy}{dx}\right|_{x=x_0}$.
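The limit in this definition can be watched numerically. A minimal sketch, assuming the example $f(x) = x^3$ at $x_0 = 2$ (not from the original): here the ratio of increments is exactly $12 + 6h + h^2$, so it approaches $f'(2) = 12$ as $h$ shrinks. The helper names are illustrative.

```python
def difference_quotient(f, x0, h):
    """Ratio of the output increment f(x0 + h) - f(x0) to the input increment h."""
    return (f(x0 + h) - f(x0)) / h


def cube(x):
    return x ** 3  # example function; its derivative is 3 * x**2


for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    # As h shrinks, the quotient approaches f'(2) = 12.
    print(h, difference_quotient(cube, 2.0, h))
```

In floating point, $h$ cannot be made arbitrarily small: for very small $h$ the subtraction $f(x_0 + h) - f(x_0)$ loses precision, so the numeric quotient eventually gets worse rather than better.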

(2) Geometric meaning

Consider the graph of a real-valued function. The derivative of the function at a point equals the slope of the tangent line to its graph at that point. The derivative is a local property of a function. Not every function has a derivative, and a function need not have a derivative at every point. If a function has a derivative at a point, it is said to be differentiable there; otherwise it is called non-differentiable there. In short, when both the independent variable and the values of a function are real numbers, the derivative at a point is the slope of the tangent to the curve the function represents at that point.

Specifically:

When both the domain and the values of a function are real, the derivative can be read as the slope of a tangent line to the function's curve. As shown in the figure, let $P_0$ be a fixed point on the curve and $P$ a moving point on the curve. If, as $P$ moves along the curve toward $P_0$, the secant line $PP_0$ has a limiting position $P_0T$, then $P_0T$ is called the tangent to the curve at $P_0$.

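If the curve is the graph of a function $y = f(x)$, the slope of the secant $PP_0$ (blue) is

$$\tan\varphi = \frac{\Delta y}{\Delta x} = \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}$$

When the tangent $P_0T$ (red), that is, the limiting position of $PP_0$, exists, we have $\Delta x \to 0$ and $\varphi \to \theta$, and the slope of $P_0T$ is

$$\tan\theta = \lim_{\Delta x \to 0} \tan\varphi = \lim_{\Delta x \to 0} \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}$$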

This is exactly the general definition of the derivative. In other words, geometrically, the derivative is the slope of the tangent to the curve at the point.

(3) The derivative function

A derivative is a number: the derivative of $f$ at $x_0$ is the value of the derivative function at $x_0$. If $f$ is differentiable at every point of some interval $I$ in its domain, $f$ is said to be differentiable on $I$. Then every $x$ in $I$ corresponds to a definite derivative value $f'(x)$, which defines a new function $x \mapsto f'(x)$, called the derivative function of the original function $f$ and written $f'(x)$, $\frac{df}{dx}$, or $\frac{d}{dx}f(x)$. It is also simply called the derivative.
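Because differentiability on an interval yields a new function $x \mapsto f'(x)$, taking the derivative can be modeled as an operation that takes a function and returns a function. The sketch below approximates $f'$ with the symmetric difference quotient $(f(x+h) - f(x-h))/(2h)$, a common numerical substitute for the one-sided limit in the definition (an assumption of this example, with $h = 10^{-5}$ chosen for illustration), and compares the result for $\sin$ against its known derivative $\cos$.

```python
import math


def derivative(f, h=1e-5):
    """Return an approximation of the derivative function x -> f'(x).

    Uses the symmetric difference quotient, whose error for smooth f
    shrinks like h**2 (until floating-point rounding dominates).
    """
    def f_prime(x):
        return (f(x + h) - f(x - h)) / (2 * h)
    return f_prime


sin_prime = derivative(math.sin)  # a new function, like f' in the text
for x in [0.0, 1.0, 2.0]:
    print(x, sin_prime(x), math.cos(x))  # the two columns agree closely
```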

III. Differential of a Function of One Variable

The differential and the derivative are two different concepts, but for a function of one variable, having a differential and having a derivative are completely equivalent. The differential of such a function equals its derivative multiplied by the differential of the independent variable; in other words, the quotient of the differential of the function by the differential of the independent variable equals the derivative. For this reason the derivative is also called the differential quotient, and the differential of $y = f(x)$ can be written $dy = f'(x)\,dx$.

(1) The differential reflects the rate of change

The differential approximately describes how the value of a function changes when its independent variable changes by a sufficiently small amount. When the independent variable $x$ of a function $f$ changes by a small amount $h$, the change in the function can be split into two parts. One part is linear: in one dimension it is proportional to the change $h$, that is, it is the product of $h$ and a quantity that does not depend on $h$ but only on $f$ and $x$; in more general settings it is the value of a linear map applied to $h$. The other part is an infinitesimal of higher order than $h$, meaning it still tends to zero after division by $h$. When $h$ is small, the second part is negligible and the change in the function is approximately equal to the first part, which is called the differential of the function at $x$, written $f'(x)h$ or $df_x(h)$. If a function has this property at a point, it is said to be differentiable at that point.
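The split into a linear part and a higher-order remainder can be checked numerically. This sketch assumes $f(x) = e^x$ at $x = 0$ (so $f'(0) = 1$) purely as an example; the helper name `split_increment` is hypothetical. The remainder divided by $h$ should itself tend to zero, which is what "higher order than $h$" means.

```python
import math


def split_increment(f, f_prime, x, h):
    """Split f(x + h) - f(x) into the linear part f'(x) * h and the remainder."""
    delta = f(x + h) - f(x)      # actual change of the function
    linear = f_prime(x) * h      # linear part: the differential f'(x) h
    remainder = delta - linear   # higher-order part
    return delta, linear, remainder


# Example: f = exp, whose derivative is exp itself.
ratios = []
for h in [1e-1, 1e-2, 1e-3]:
    delta, linear, remainder = split_increment(math.exp, math.exp, 0.0, h)
    ratios.append(remainder / h)  # roughly h / 2 for this f
    print(h, delta, linear, remainder / h)
```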

(2) Definition

Let $y = f(x)$ be defined on some interval. For a point $x_0$ in the interval, let $x_0$ change to a nearby point $x_0 + \Delta x$ (also in the interval). If the increment $\Delta y = f(x_0 + \Delta x) - f(x_0)$ can be expressed as

$$\Delta y = A\,\Delta x + o(\Delta x),$$

where $A$ is a constant independent of $\Delta x$ and $o(\Delta x)$ is an infinitesimal of higher order than $\Delta x$, then $f$ is said to be differentiable at $x_0$, and $A\,\Delta x$ is called the differential of $f$ at $x_0$ corresponding to the increment $\Delta x$, written $dy = A\,\Delta x$. That is, $dy$ is the linear principal part of $\Delta y$.[1]

The increment $\Delta x$ of the independent variable is called the differential of the independent variable, written $dx$; that is, $dx = \Delta x$.

(3) Geometric meaning

Figure: The differential of a function at a point. The red segment is the differential $dy$; adding the gray segment gives the actual change $\Delta y$.

Let $\Delta x$ be the increment of the abscissa at a point $M$ on the curve $y = f(x)$, $\Delta y$ the corresponding increment of the ordinate of the curve, and $dy$ the corresponding increment of the ordinate of the tangent line at $M$. When $|\Delta x|$ is very small, $|\Delta y - dy|$ is much smaller than $|\Delta x|$ (it is a higher-order infinitesimal), so near $M$ the tangent segment can be used to approximate the curve segment.

(4) About infinitesimals

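If a sequence $\{a_n\}$ satisfies the following: for every positive number $\varepsilon$ there exists a positive integer $N$ such that $|a_n| < \varepsilon$ for all $n > N$, or, in limit notation,

$$\lim_{n \to \infty} a_n = 0,$$

then the sequence $\{a_n\}$ is called an infinitesimal.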

B) Comparison of orders

Let $\{a_n\}$ and $\{b_n\}$ be two infinitesimal sequences. Although both tend to zero as $n$ tends to infinity, they may tend to zero at different speeds. Their speeds can be compared as follows.

If for every positive number $\varepsilon$ there exists a positive integer $N$ such that

$$|a_n| \le \varepsilon\,|b_n|$$

always holds when $n > N$, then $a_n$ is called an infinitesimal of higher order than $b_n$, written

$$a_n = o(b_n) \quad (n \to \infty).$$

The "$(n \to \infty)$" is sometimes omitted.

In the above definition, we can also say that the order of the infinitesimal $a_n$ is higher than that of $b_n$, or that $a_n$ tends to zero faster than $b_n$.
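To make the comparison concrete, take $a_n = 1/n^2$ and $b_n = 1/n$ (an example chosen here, not from the article). Both are infinitesimals, and since $|a_n| / |b_n| = 1/n$, the inequality $|a_n| \le \varepsilon |b_n|$ holds for all $n > N$ as soon as $N \ge 1/\varepsilon$, so $a_n = o(b_n)$. The helper names below are hypothetical, and the loop only spot-checks a finite stretch of indices.

```python
import math


def a(n):
    return 1.0 / n ** 2  # infinitesimal: tends to 0 as n -> infinity


def b(n):
    return 1.0 / n  # also infinitesimal, but tends to 0 more slowly


def valid_N(eps):
    """A positive integer N with |a_n| <= eps * |b_n| for every n > N.

    |a_n| / |b_n| = 1/n <= eps exactly when n >= 1/eps, so N = ceil(1/eps) works.
    """
    return max(1, math.ceil(1.0 / eps))


for eps in [0.1, 0.01, 0.001]:
    N = valid_N(eps)
    # Spot-check the defining inequality on a stretch of indices past N.
    print(eps, N, all(a(n) <= eps * b(n) for n in range(N + 1, N + 10_000)))
```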

IV. Differential of a Multivariable Function

(1) Euclidean Space

Let $\mathbb{R}$ denote the field of real numbers. For any positive integer $n$, the set of all $n$-tuples of real numbers forms an $n$-dimensional vector space over $\mathbb{R}$, denoted $\mathbb{R}^n$ and sometimes called real coordinate space. An element of $\mathbb{R}^n$ is written $x = (x_1, x_2, \ldots, x_n)$, where every $x_i$ is a real number. As a vector space, its operations are defined by

$$x + y = (x_1 + y_1, \ldots, x_n + y_n), \qquad a x = (a x_1, \ldots, a x_n), \quad a \in \mathbb{R}.$$

Euclidean space adds something to $\mathbb{R}^n$: a Euclidean structure.
To do Euclidean geometry, we want to be able to talk about the distance between two points and the angle between two lines or vectors. A natural way to do this is to define the "standard inner product" (called the dot product in some literature) of any two vectors $x$ and $y$ in $\mathbb{R}^n$:

$$\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n.$$

That is, any two vectors in $\mathbb{R}^n$ are assigned a real number. The inner product defined this way is called the Euclidean structure on $\mathbb{R}^n$; $\mathbb{R}^n$ is then also called $n$-dimensional Euclidean space, and the inner product $\langle\cdot,\cdot\rangle$ is called the Euclidean inner product.

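Using this inner product, we can define distance, length, and angle.

Vector length:

$$\|x\| = \sqrt{\langle x, x \rangle} = \sqrt{\sum_{i=1}^{n} x_i^2}$$

This length function satisfies the properties required of a norm, so it is also called the Euclidean norm on $\mathbb{R}^n$.

The angle between two nonzero vectors $x$ and $y$ is given by

$$\theta = \arccos\left(\frac{\langle x, y \rangle}{\|x\|\,\|y\|}\right), \qquad 0 \le \theta \le \pi,$$

where $\arccos$ is the inverse cosine function.

Finally, the Euclidean norm can be used to define a distance function, or metric, on $\mathbb{R}^n$:

$$d(x, y) = \|x - y\| = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

This distance function is called the Euclidean metric. It can be seen as a form of the Pythagorean theorem.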

Here $\mathbb{R}^n$ refers only to the space of real vectors; once the Euclidean structure defined above is added, it is called Euclidean space, and some authors denote it by $\mathbb{E}^n$. The Euclidean structure gives $\mathbb{E}^n$ the structures of an inner product space, a Hilbert space, a normed vector space, and a metric space.
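A minimal pure-Python sketch of the Euclidean inner product, norm, angle, and metric defined above, with vectors represented as lists of floats (an assumption of this example; the function names are illustrative). The clamp inside `angle` guards against rounding pushing the cosine slightly outside $[-1, 1]$, where `math.acos` would raise an error.

```python
import math


def inner(x, y):
    """Standard (Euclidean) inner product <x, y> = sum of x_i * y_i."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    """Euclidean norm ||x|| = sqrt(<x, x>)."""
    return math.sqrt(inner(x, x))


def angle(x, y):
    """Angle in [0, pi] between nonzero vectors x and y."""
    cos_theta = inner(x, y) / (norm(x) * norm(y))
    return math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp rounding error


def distance(x, y):
    """Euclidean metric d(x, y) = ||x - y||."""
    return norm([xi - yi for xi, yi in zip(x, y)])


print(norm([3.0, 4.0]))                  # 5.0
print(angle([1.0, 0.0], [0.0, 2.0]))     # pi / 2, about 1.5708
print(distance([1.0, 2.0], [4.0, 6.0]))  # 5.0, a 3-4-5 right triangle
```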

(2) Open sets

An open set is a set that contains none of its own boundary points. Equivalently, for every point of an open set, some sufficiently small neighborhood of that point is also contained in the set. The concept of an open set is closely tied to the concept of a topology: in topology, open sets are usually given first, and the topology is then defined through them.

Analysis

In $\mathbb{R}^n$, a point set is an open set if every point $P$ in it is an interior point.

Interior point

Let $S$ be a subset of Euclidean space. If some open ball centered at $x$ is contained in $S$, then $x$ is called an interior point of $S$.

This definition extends to any subset $S$ of a metric space $X$. Specifically, for a metric space $X$ with metric $d$, $x$ is an interior point of $S$ if there exists $r > 0$ such that $y$ belongs to $S$ whenever $d(x, y) < r$.

Figure: The point $x$ is an interior point of $S$ because it is contained in $S$ together with a ball around it. The point $y$ lies on the boundary of $S$.

Euclidean Space

A subset $U$ of $n$-dimensional space $\mathbb{R}^n$ is an open set if, for every point $x$ in $U$, there exists a real number $\varepsilon > 0$ such that every point $y$ in $\mathbb{R}^n$ whose Euclidean distance from $x$ is less than $\varepsilon$ also belongs to $U$. Equivalently, $U$ is open if every point of $U$ has a neighborhood contained in $U$.
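For the open ball $B(c, r) = \{y : d(y, c) < r\}$, the definition can be satisfied explicitly: for a point $x$ in the ball, $\varepsilon = r - d(x, c)$ works, because by the triangle inequality $d(y, c) \le d(y, x) + d(x, c) < \varepsilon + d(x, c) = r$. The sketch below (function name hypothetical) computes that $\varepsilon$; a point on the boundary sphere has no such $\varepsilon$, which is why the closed ball is not open.

```python
import math


def witness_eps(x, center, r):
    """Return eps > 0 such that the ball of radius eps around x lies inside
    the open ball B(center, r) = {y : dist(y, center) < r}.

    eps = r - dist(x, center) works by the triangle inequality.
    """
    d = math.dist(x, center)
    if d >= r:
        raise ValueError("x is not in the open ball, so no such eps exists")
    return r - d


print(witness_eps([0.5, 0.0], [0.0, 0.0], 1.0))  # 0.5
```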

(3) Definition

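Let $f$ be a function from an open set $U$ in Euclidean space $\mathbb{R}^n$ (or in any inner product space) to $\mathbb{R}^m$. Consider a point $x$ in $U$ and points $x + h$ in a neighborhood of $x$ within $U$. If there exists a linear map $L: \mathbb{R}^n \to \mathbb{R}^m$ such that

$$\lim_{h \to 0} \frac{\|f(x + h) - f(x) - L(h)\|}{\|h\|} = 0,$$

then $f$ is said to be differentiable at $x$, and the linear map $L$ is called the differential of $f$ at $x$, written $df_x$.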

If $f$ is differentiable at a point, it is necessarily continuous there, and its differential at that point is unique. To distinguish it from partial derivatives, the differential of a multivariable function is also called the total differential or total derivative.
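A numerical check of this definition, assuming the example map $f(x, y) = (x^2 y,\ \sin x + y)$ from $\mathbb{R}^2$ to $\mathbb{R}^2$ (not from the article). Its differential at $p$ is the linear map given by the Jacobian matrix $\begin{pmatrix} 2xy & x^2 \\ \cos x & 1 \end{pmatrix}$, and the ratio $\|f(p + h) - f(p) - L(h)\| / \|h\|$ should shrink roughly in proportion to $\|h\|$. The function names are illustrative.

```python
import math


def f(p):
    """Example map from R^2 to R^2: f(x, y) = (x**2 * y, sin(x) + y)."""
    x, y = p
    return [x * x * y, math.sin(x) + y]


def jacobian_f(p):
    """Matrix of the linear map df_p for this f, derived by hand."""
    x, y = p
    return [[2 * x * y, x * x],
            [math.cos(x), 1.0]]


def remainder_ratio(p, h):
    """||f(p + h) - f(p) - L(h)|| / ||h||, with L given by the Jacobian at p."""
    J = jacobian_f(p)
    fp = f(p)
    fph = f([p[0] + h[0], p[1] + h[1]])
    Lh = [J[i][0] * h[0] + J[i][1] * h[1] for i in range(2)]
    r = [fph[i] - fp[i] - Lh[i] for i in range(2)]
    return math.hypot(*r) / math.hypot(*h)


p = [1.0, 2.0]
ratios = []
for t in [1e-1, 1e-2, 1e-3]:
    ratios.append(remainder_ratio(p, [t, -2 * t]))  # shrink h toward 0
print(ratios)  # each entry is roughly ten times smaller than the previous
```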

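When $f$ is differentiable at every point of a region $U$, one can consider the map

$$df: U \to \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m), \qquad x \mapsto df_x,$$

where $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$ denotes the space of linear maps from $\mathbb{R}^n$ to $\mathbb{R}^m$. This function is generally called the differential function.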

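The total differential (total derivative) is a concept of calculus: it is the linear principal part of the total increment of a multivariable function. For example, for a function of two variables $z = f(x, y)$, if $f$ is defined in some neighborhood of the point $(x, y)$ and $(x + \Delta x, y + \Delta y)$ is any point in that neighborhood, the total increment of $f$ at $(x, y)$ can be expressed as

$$\Delta z = f(x + \Delta x, y + \Delta y) - f(x, y) = A\,\Delta x + B\,\Delta y + o(\rho),$$

where $A$ and $B$ depend only on $x$ and $y$, not on $\Delta x$ and $\Delta y$, and $\rho = \sqrt{(\Delta x)^2 + (\Delta y)^2}$. If $o(\rho)$ is an infinitesimal of higher order than $\rho$ as $\rho \to 0$, the function is said to be differentiable at $(x, y)$, and $A\,\Delta x + B\,\Delta y$ is called its total differential at that point:

$$dz = A\,\Delta x + B\,\Delta y,$$

or, since $A = \partial f/\partial x$ and $B = \partial f/\partial y$ at that point, $dz = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy$.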
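A worked numeric instance, assuming the example $z = x^2 y$ at $(x, y) = (1, 2)$ with $\Delta x = 0.01$ and $\Delta y = -0.02$ (values chosen for illustration, not from the article). Here $A = \partial z/\partial x = 2xy = 4$ and $B = \partial z/\partial y = x^2 = 1$, so $dz = 4(0.01) + 1(-0.02) = 0.02$, while the exact increment is $\Delta z = 1.01^2 \cdot 1.98 - 2 = 0.019798$. The gap $0.000202$ is small compared with $\rho \approx 0.0224$.

```python
import math


def z(x, y):
    return x * x * y  # example function z = x**2 * y


def partials(x, y):
    """A = dz/dx = 2xy and B = dz/dy = x**2 for z = x**2 * y."""
    return 2 * x * y, x * x


x, y = 1.0, 2.0
delta_x, delta_y = 0.01, -0.02
A, B = partials(x, y)
delta_z = z(x + delta_x, y + delta_y) - z(x, y)  # total increment Δz
dz = A * delta_x + B * delta_y                    # total differential AΔx + BΔy
rho = math.hypot(delta_x, delta_y)                # ρ = sqrt(Δx² + Δy²)
print(delta_z, dz, abs(delta_z - dz) / rho)       # the last value is small
```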

(4) Neighborhood

Neighborhood is a basic concept in topology. Intuitively, a neighborhood of a point is a set that contains the point and extends around it in every direction: you can "wiggle" the point slightly without leaving the set.

Figure: A set $V$ in the plane is a neighborhood of a point $p$ if a small disc around $p$ is contained in $V$.

If $X$ is a topological space and $p$ is a point of $X$, a neighborhood of $p$ is a set $V$ that contains an open set $U$ containing $p$:

$$p \in U \subseteq V.$$

Note that $V$ itself need not be an open set. If $V$ is open, it is called an open neighborhood. Some authors require neighborhoods to be open, so it is important to pay attention to the convention in use.

The collection of all neighborhoods of a point is called the neighborhood system at that point.

If $S$ is a subset of $X$, a neighborhood of $S$ is a set $V$ that contains an open set $U$ containing $S$. It follows that $V$ is a neighborhood of $S$ if and only if it is a neighborhood of every point of $S$.

In a metric space $M = (X, d)$, a set $V$ is a neighborhood of a point $p$ if there is an open ball centered at $p$ with radius $r > 0$,

$$B_r(p) = \{x \in X : d(x, p) < r\},$$

that is contained in $V$.

$V$ is called a uniform neighborhood of a set $S$ if there exists a positive number $r$ such that for every element $p$ of $S$,

$$B_r(p) = \{x \in X : d(x, p) < r\}$$

is contained in $V$.

For $r > 0$, the $r$-neighborhood $S_r$ of a set $S$ is the set of all points of $X$ whose distance from $S$ is less than $r$ (equivalently, the union of all open balls of radius $r$ centered at points of $S$):

$$S_r = \{x \in X : d(x, S) < r\} = \bigcup_{p \in S} B_r(p).$$

It follows directly that an $r$-neighborhood is a uniform neighborhood, and that a set is a uniform neighborhood if and only if it contains an $r$-neighborhood for some value of $r$.

Figure: A set $S$ in the plane and a uniform neighborhood $V$ of $S$.
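For a finite set $S$ of points in the plane, $d(x, S)$ is just the smallest distance from $x$ to a point of $S$, so membership in the $r$-neighborhood $S_r$ is easy to test. The sketch below (names hypothetical; a finite, nonempty $S$ is an assumption, since in general $d(x, S)$ is an infimum) checks both descriptions of $S_r$: $d(x, S) < r$, and membership in some ball $B_r(p)$ with $p \in S$.

```python
import math


def dist_to_set(x, S):
    """d(x, S) for a finite, nonempty set S: the minimum distance to its points."""
    return min(math.dist(x, p) for p in S)


def in_r_neighborhood(x, S, r):
    """x is in S_r exactly when d(x, S) < r."""
    return dist_to_set(x, S) < r


def in_union_of_balls(x, S, r):
    """Equivalent description: x lies in B_r(p) for some p in S."""
    return any(math.dist(x, p) < r for p in S)


S = [(0.0, 0.0), (3.0, 0.0)]
for x in [(0.5, 0.5), (1.5, 0.0), (2.2, 0.1)]:
    print(x, in_r_neighborhood(x, S, 1.0), in_union_of_balls(x, S, 1.0))
```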

V. Gradient

1. Related Concepts

If the quantity at each point of a space can be represented by a scalar, the space together with these values is a scalar field.

If the quantity at each point of a space can be represented by a vector, it is a vector field.

The gradient at a point of a scalar field points in the direction in which the field increases fastest, and the length of the gradient is that maximum rate of change.

The word gradient is sometimes also used to mean slope, that is, the steepness of a surface along a given direction.

2. Computation

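The gradient of a scalar function $\varphi$ is written

$$\nabla\varphi \quad \text{or} \quad \operatorname{grad}\varphi,$$

where $\nabla$ (nabla) denotes the vector differential operator.

In three dimensions, in Cartesian coordinates, the expression expands to

$$\nabla\varphi = \left(\frac{\partial\varphi}{\partial x}, \frac{\partial\varphi}{\partial y}, \frac{\partial\varphi}{\partial z}\right).$$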
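A sketch that checks the component formula against central finite differences and illustrates the claim that the gradient points in the direction of fastest increase. It assumes the example field $\varphi(x, y, z) = x^2 + 3y + yz$ (not from the article), whose gradient is $(2x,\ 3 + z,\ y)$; at $(1, 2, -1)$ this is $(2, 2, 2)$. The rate of change along a unit vector $u$ is $\nabla\varphi \cdot u$, which by the Cauchy-Schwarz inequality is at most $\|\nabla\varphi\|$, with equality when $u$ points along the gradient. The function names are illustrative.

```python
import math


def phi(p):
    x, y, z = p
    return x * x + 3 * y + y * z  # example scalar field


def grad_phi(p):
    """Analytic gradient (dphi/dx, dphi/dy, dphi/dz) of the example field."""
    x, y, z = p
    return [2 * x, 3 + z, y]


def numerical_gradient(f, p, h=1e-6):
    """Estimate each partial derivative with a central difference."""
    g = []
    for i in range(len(p)):
        up, down = list(p), list(p)
        up[i] += h
        down[i] -= h
        g.append((f(up) - f(down)) / (2 * h))
    return g


def directional_derivative(f, p, u, t=1e-6):
    """Rate of change of f at p along the unit vector u (central difference)."""
    fwd = [pi + t * ui for pi, ui in zip(p, u)]
    bwd = [pi - t * ui for pi, ui in zip(p, u)]
    return (f(fwd) - f(bwd)) / (2 * t)


p = [1.0, 2.0, -1.0]
g = grad_phi(p)
print(g, numerical_gradient(phi, p))  # analytic vs. numerical gradient

g_len = math.sqrt(sum(gi * gi for gi in g))
u_best = [gi / g_len for gi in g]  # unit vector along the gradient
print(directional_derivative(phi, p, u_best), g_len)  # these match
```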

VI. Gradient Descent Method

The gradient descent method is based on the following observation: if a real-valued function $F(x)$ is defined and differentiable at a point $a$, then $F(x)$ decreases fastest from $a$ in the direction of the negative gradient, $-\nabla F(a)$.

Therefore, if

$$b = a - \gamma \nabla F(a)$$

for a sufficiently small $\gamma > 0$, then $F(a) \ge F(b)$.

With this in mind, we can start from an initial estimate $x_0$ of a local minimum of $F$ and consider the sequence $x_0, x_1, x_2, \ldots$ defined by

$$x_{n+1} = x_n - \gamma_n \nabla F(x_n), \quad n \ge 0.$$

Therefore we obtain

$$F(x_0) \ge F(x_1) \ge F(x_2) \ge \cdots$$

If all goes well, the sequence $(x_n)$ converges to the desired extremum. Note that the step size $\gamma$ may change at every iteration.
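The remark that the step size may change at every iteration leaves open how to choose $\gamma_n$. One common choice, used here as an assumption rather than something the article prescribes, is backtracking line search with the Armijo condition: start from a trial step and halve it until $F(x_n - \gamma \nabla F(x_n)) \le F(x_n) - c\,\gamma\,\|\nabla F(x_n)\|^2$. Because every accepted step satisfies this inequality, the recorded values obey $F(x_0) \ge F(x_1) \ge F(x_2) \ge \cdots$. The objective $F(x, y) = x^2 + 10y^2$, the constants, and the function names are illustrative.

```python
def F(v):
    x, y = v
    return x * x + 10.0 * y * y  # example objective, minimum at (0, 0)


def grad_F(v):
    x, y = v
    return [2.0 * x, 20.0 * y]


def gradient_descent_backtracking(f, grad, x0, gamma0=1.0, beta=0.5,
                                  c=1e-4, num_iters=100):
    """Iterate x_{n+1} = x_n - gamma_n * grad f(x_n), choosing gamma_n by
    backtracking: shrink gamma by the factor beta until the Armijo condition
    f(x - gamma * g) <= f(x) - c * gamma * ||g||^2 holds."""
    x = list(x0)
    fx = f(x)
    history = [fx]  # f(x_0), f(x_1), ...
    for _ in range(num_iters):
        g = grad(x)
        g_sq = sum(gi * gi for gi in g)
        if g_sq == 0.0:
            break  # stationary point: no descent direction left
        gamma = gamma0
        while True:
            candidate = [xi - gamma * gi for xi, gi in zip(x, g)]
            f_candidate = f(candidate)
            if f_candidate <= fx - c * gamma * g_sq:
                break  # sufficient decrease achieved with this gamma_n
            gamma *= beta
        x, fx = candidate, f_candidate
        history.append(fx)
    return x, history


x_final, history = gradient_descent_backtracking(F, grad_F, [1.0, 1.0])
print(x_final, history[-1])
```

A fixed step would have to be tuned to the steepest direction of $F$ (here the $y$ direction, with curvature 20); backtracking adapts $\gamma_n$ automatically at the cost of a few extra function evaluations per iteration.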


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
