1. The Steepest Descent Direction
The rate of change of the function f(x) along a direction d at a point x can be described by the directional derivative. For a differentiable function, the directional derivative equals the inner product of the gradient and the direction, that is,
Df(x; d) = ∇f(x)^T d.
Therefore, finding the steepest descent direction of the function f(x) at the point x reduces to solving the following nonlinear program:
min ∇f(x)^T d
s.t. ‖d‖ ≤ 1.
By the Cauchy-Schwarz inequality, ∇f(x)^T d ≥ −‖∇f(x)‖ for every d with ‖d‖ ≤ 1, and when
d = −∇f(x)/‖∇f(x)‖
the equality holds. Therefore the rate of change is smallest in this direction; that is, the negative gradient direction is the steepest descent direction.
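As a quick numerical check of this conclusion, the following sketch samples unit directions for a simple example function (the choice f(x) = x1^2 + 2*x2^2 and the test point are assumptions made only for illustration) and confirms that the directional derivative ∇f(x)^T d is smallest at the normalized negative gradient:

```python
import numpy as np

def grad_f(x):
    # Gradient of the example function f(x) = x1^2 + 2*x2^2
    return np.array([2.0 * x[0], 4.0 * x[1]])

x = np.array([1.0, 1.0])
g = grad_f(x)

# Sample many unit directions d and evaluate the directional derivative g^T d.
angles = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
derivs = directions @ g

d_best = directions[np.argmin(derivs)]   # sampled direction with smallest g^T d
d_neg_grad = -g / np.linalg.norm(g)      # normalized negative gradient

print(d_best, d_neg_grad)                # the two directions agree (up to sampling)
print(derivs.min(), -np.linalg.norm(g))  # the minimum value is -||g||
```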
2. The Steepest Descent Algorithm
The iteration formula of the steepest descent method is
x^(k+1) = x^(k) + λ_k d^(k),
where d^(k) is the search direction from x^(k); here we take the steepest descent direction at x^(k), that is,
d^(k) = −∇f(x^(k)).
λ_k is the step size obtained by a one-dimensional search along d^(k) starting from x^(k), i.e., λ_k satisfies
f(x^(k) + λ_k d^(k)) = min f(x^(k) + λ d^(k)) (λ ≥ 0).
The calculation procedure is as follows:
(1) Choose an initial point x^(1) ∈ R^n and an allowable error ε > 0; set k = 1.
(2) Compute the search direction d^(k) = −∇f(x^(k)).
(3) If ‖d^(k)‖ ≤ ε, stop; otherwise, perform a one-dimensional search along d^(k) starting from x^(k), finding λ_k such that
f(x^(k) + λ_k d^(k)) = min f(x^(k) + λ d^(k)) (λ ≥ 0).
(4) Set x^(k+1) = x^(k) + λ_k d^(k), set k = k + 1, and return to step (2).
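A minimal Python sketch of steps (1) through (4) is given below. The backtracking (Armijo) rule used for the one-dimensional search and the test function are assumptions made for illustration; the procedure itself only requires that λ_k minimize f along d^(k):

```python
import numpy as np

def steepest_descent(f, grad_f, x1, eps=1e-6, max_iter=1000):
    x = np.asarray(x1, dtype=float)
    for k in range(max_iter):
        d = -grad_f(x)                   # step (2): search direction
        if np.linalg.norm(d) <= eps:     # step (3): stopping test
            break
        lam = 1.0                        # backtracking line search along d
        while f(x + lam * d) > f(x) - 1e-4 * lam * np.dot(d, d):
            lam *= 0.5
        x = x + lam * d                  # step (4): update and repeat
    return x

# Usage on a simple quadratic f(x) = x1^2 + 2*x2^2 (illustrative choice):
f = lambda x: x[0] ** 2 + 2.0 * x[1] ** 2
grad_f = lambda x: np.array([2.0 * x[0], 4.0 * x[1]])
print(steepest_descent(f, grad_f, [3.0, -2.0]))   # converges toward [0, 0]
```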
Conjugate Gradient Method
1. Conjugate Directions
The core issue in optimization methods for unconstrained problems is the choice of the search direction.
Taking a positive definite quadratic function as an example, we can observe the geometric meaning of two directions being conjugate with respect to a matrix A.
Consider the quadratic function
f(x) = (1/2)(x − x*)^T A (x − x*),
where A is an n × n symmetric positive definite matrix and x* is a fixed point. The level surface of f(x),
(1/2)(x − x*)^T A (x − x*) = c,
is an ellipsoid centered at x*. Since
∇f(x) = A(x − x*), and hence ∇f(x*) = 0,
and A is positive definite, x* is the minimum point of f(x).
Let x^(1) be a point on a level surface. The normal vector of the level surface at x^(1) is
∇f(x^(1)) = A(x^(1) − x*).
Let d^(1) be a tangent vector of the level surface at x^(1), and denote
d^(2) = x* − x^(1).
Clearly d^(1) and ∇f(x^(1)) are orthogonal, i.e., (d^(1))^T ∇f(x^(1)) = 0, and therefore
(d^(1))^T A d^(2) = 0.
That is, a tangent vector at a point on the level surface is conjugate with respect to A to the vector pointing from that point to the minimum point.
It follows that if we minimize this quadratic function by one-dimensional searches along d^(1) and then d^(2), the minimum point is reached in two iterations.
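The following short sketch illustrates this fact numerically; the matrix A, the point x*, and the chosen points are arbitrary assumptions made only for this example:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])                  # symmetric positive definite
x_star = np.array([1.0, 2.0])
grad = lambda x: A @ (x - x_star)           # gradient of f(x) = 1/2 (x-x*)^T A (x-x*)

x1 = np.array([5.0, -1.0])                  # a point on some level surface
g1 = grad(x1)
d1 = np.array([-g1[1], g1[0]])              # a tangent vector at x1 (orthogonal to g1)
d2 = x_star - x1                            # vector from x1 to the minimum point

print(d1 @ A @ d2)                          # ~0: d1 and d2 are conjugate w.r.t. A

# Two exact line searches along the conjugate pair d1, d2 from any starting
# point reach the minimum of the 2-D quadratic.
x = np.array([-3.0, 4.0])
for d in (d1, d2):
    lam = -(grad(x) @ d) / (d @ A @ d)      # exact minimizer of f along d
    x = x + lam * d
print(x, x_star)                            # x coincides with x* (up to rounding)
```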
2. The Conjugate Gradient Method
The conjugate gradient method was first proposed by Hestenes and Stiefel in 1952 for solving systems of linear equations. Later it was applied to unconstrained optimization problems, and it has become an important optimization method.
The Fletcher-Reeves conjugate gradient method is referred to as the FR method for short.
The basic idea of the conjugate gradient method is to combine conjugacy with the steepest descent method: using the gradients at known points, a set of conjugate directions is constructed, and the minimum point of the objective function is sought by searching along these directions. By the basic properties of conjugate directions, the method has quadratic termination (for a quadratic function it reaches the minimum point in finitely many searches).
For the conjugate gradient method applied to the quadratic convex function
min f(x) = (1/2) x^T A x + b^T x + c,
where x ∈ R^n, A is a symmetric positive definite matrix, b is a vector, and c is a constant, the specific procedure is as follows.
First, given an initial point x^(1), compute the gradient g_1 = ∇f(x^(1)) of the objective function at this point. If ‖g_1‖ = 0, stop; otherwise set
d^(1) = −∇f(x^(1)) = −g_1.
Search along the direction d^(1) to obtain the point x^(2), and compute the gradient g_2 at x^(2). If ‖g_2‖ ≠ 0, construct the second search direction d^(2) from −g_2 and d^(1), and then search along d^(2).
In general, if the point x^(k) and the search direction d^(k) are known, a one-dimensional search along d^(k) starting from x^(k) yields
x^(k+1) = x^(k) + λ_k d^(k),
where the step size λ_k satisfies
f(x^(k) + λ_k d^(k)) = min f(x^(k) + λ d^(k)).
In this case an explicit expression for λ_k can be derived: for the quadratic function above, λ_k = −g_k^T d^(k) / ((d^(k))^T A d^(k)), where g_k = ∇f(x^(k)).
Compute the gradient g_{k+1} of f(x) at x^(k+1). If ‖g_{k+1}‖ = 0, stop; otherwise use −g_{k+1} and d^(k) to construct the next search direction d^(k+1), requiring d^(k+1) and d^(k) to be conjugate with respect to A. To this end, set
d^(k+1) = −g_{k+1} + β_k d^(k).
Left-multiplying both sides of this formula by (d^(k))^T A and imposing conjugacy gives
(d^(k))^T A d^(k+1) = −(d^(k))^T A g_{k+1} + β_k (d^(k))^T A d^(k) = 0,
and therefore
β_k = (d^(k))^T A g_{k+1} / ((d^(k))^T A d^(k)).
Then start from x^(k+1) and search along the direction d^(k+1).
In the FR method, the initial search direction must be the steepest descent direction; this requirement cannot be ignored. The factor β_k can then be simplified to β_k = ‖g_{k+1}‖² / ‖g_k‖².
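The steps above can be assembled into a short routine for the quadratic case. The sketch below uses the exact step length for the quadratic and the simplified FR factor; the test matrix and vector are arbitrary choices for illustration:

```python
import numpy as np

def conjugate_gradient_quadratic(A, b, x1, eps=1e-10):
    """Minimize f(x) = 1/2 x^T A x + b^T x + c for symmetric positive definite A."""
    x = np.asarray(x1, dtype=float)
    g = A @ x + b                          # gradient g_1 = A x + b
    d = -g                                 # first direction: steepest descent
    for _ in range(len(x)):                # at most n searches for a quadratic
        if np.linalg.norm(g) <= eps:
            break
        lam = -(g @ d) / (d @ A @ d)       # exact one-dimensional search along d
        x = x + lam * d
        g_new = A @ x + b                  # gradient g_{k+1}
        beta = (g_new @ g_new) / (g @ g)   # simplified FR factor
        d = -g_new + beta * d              # next conjugate direction
        g = g_new
    return x

# Usage (arbitrary test data): the result matches the solution of A x = -b.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([-1.0, -2.0])
print(conjugate_gradient_quadratic(A, b, x1=[0.0, 0.0]))
print(np.linalg.solve(A, -b))
```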
3. The Nonlinear Conjugate Gradient Method
When the objective function is a continuously differentiable function of order higher than quadratic (that is, the gradient of the objective function is nonlinear), the corresponding equations are nonlinear, the objective function may have local extrema, and the quadratic termination property is lost. With two improvements, the conjugate gradient method can still be used in practical inverse computation; however, it is no longer guaranteed to converge to the global extremum.
(1) First, the conjugate gradient method can no longer be relied on to reach the extreme point within n searches in n-dimensional space; the method must be restarted and the iteration continued in order to locate the extreme point.
(2) Second, when the objective function is complicated, the method relies on a local quadratic approximation, so the Hessian matrix A would have to be computed, which involves a large amount of work, and A may also be ill-conditioned. The most commonly used remedy is due to Fletcher and Reeves, who avoid computing the matrix A by taking the factor in the following form:
β_{k−1} = ‖g_k‖² / ‖g_{k−1}‖²,
where g_{k−1} and g_k are the gradients of the objective function computed at the (k−1)-th and k-th searches.
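A possible sketch of the resulting nonlinear FR conjugate gradient iteration is given below. It never forms the matrix A and it restarts with the steepest descent direction every n iterations; the backtracking line search, the descent safeguard, and the test function are assumptions made only for illustration:

```python
import numpy as np

def fr_conjugate_gradient(f, grad_f, x1, eps=1e-6, max_iter=10000):
    x = np.asarray(x1, dtype=float)
    n = x.size
    g = grad_f(x)
    d = -g                                        # initial direction: steepest descent
    for k in range(max_iter):
        if np.linalg.norm(g) <= eps:
            break
        # Backtracking (Armijo) line search along d; no Hessian is ever formed.
        lam = 1.0
        while f(x + lam * d) > f(x) + 1e-4 * lam * (g @ d):
            lam *= 0.5
        x = x + lam * d
        g_new = grad_f(x)
        if (k + 1) % n == 0:                      # periodic restart every n iterations
            d = -g_new
        else:
            beta = (g_new @ g_new) / (g @ g)      # FR factor, built from gradients only
            d = -g_new + beta * d
            if g_new @ d >= 0:                    # safeguard: keep a descent direction
                d = -g_new
        g = g_new
    return x

# Usage on a non-quadratic test function (arbitrary choice), minimum at (3, -1):
f = lambda x: (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 4
grad_f = lambda x: np.array([2.0 * (x[0] - 3.0), 4.0 * (x[1] + 1.0) ** 3])
print(fr_conjugate_gradient(f, grad_f, [0.0, 0.0]))   # approaches (3, -1)
```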
Steepest Descent Method