Quasi-Newton methods are among the most effective methods for solving nonlinear optimization problems. They were proposed in the 1950s by W. C. Davidon, a physicist at Argonne National Laboratory in the United States. The algorithm Davidon designed was one of the most creative inventions in the field of nonlinear optimization. Soon afterwards, R. Fletcher and M. J. D. Powell confirmed that the new algorithm was much faster and more reliable than existing methods, and the subject of nonlinear optimization leapt forward overnight. Over the next twenty years quasi-Newton methods flourished, producing a large number of variant update formulas and hundreds of related papers.
Like the steepest descent method, a quasi-Newton method only requires the gradient of the objective function at each iteration. By measuring the change in the gradient, it constructs a model of the objective function that is accurate enough to produce superlinear convergence. This makes quasi-Newton methods far better than steepest descent, especially on difficult problems. Moreover, because they do not require second-derivative information, quasi-Newton methods are sometimes more efficient than Newton's method. Today, optimization software contains a large number of quasi-Newton algorithms for solving unconstrained, constrained, and large-scale optimization problems.
The basic idea of a quasi-Newton method is as follows. At the current iterate x_k we construct a quadratic model of the objective function:
m_k(p) = f(x_k) + ∇f(x_k)^T p + (1/2) p^T B_k p,
where B_k is a symmetric positive definite matrix. Taking the minimizer of this quadratic model as the search direction gives p_k = -B_k^{-1} ∇f(x_k), and the new iterate is x_{k+1} = x_k + α_k p_k, where we require the step length α_k to satisfy the Wolfe conditions. This iteration is similar to Newton's method; the difference is that the approximate matrix B_k replaces the true Hessian.
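As a concrete illustration of the Wolfe conditions mentioned above, here is a minimal sketch of a function that checks whether a given step length satisfies them; the function name `satisfies_wolfe` and the test point are my own choices, not part of the original article, and the constants c1, c2 are the usual textbook defaults.

```python
import numpy as np

def satisfies_wolfe(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """Check the (weak) Wolfe conditions for step length alpha along direction p."""
    fx, gx = f(x), grad(x)
    x_new = x + alpha * p
    armijo = f(x_new) <= fx + c1 * alpha * gx.dot(p)    # sufficient decrease
    curvature = grad(x_new).dot(p) >= c2 * gx.dot(p)    # curvature condition
    return bool(armijo and curvature)

# Example on the article's objective f(x, y) = (x - 2)^2 + (y - 1)^2 + 1
f = lambda x: (x[0] - 2) ** 2 + (x[1] - 1) ** 2 + 1
grad = lambda x: np.array([2 * (x[0] - 2), 2 * (x[1] - 1)])
x0 = np.array([0.0, 0.0])
p = -grad(x0)                                 # descent direction
print(satisfies_wolfe(f, grad, x0, p, 0.3))   # → True
```

A full line search would try successive values of alpha until this check passes; the article's implementation below instead uses a fixed step length.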
The key point of a quasi-Newton method is therefore how to update the matrix B_k at each iteration. Suppose we have obtained the new iterate x_{k+1} and build a new quadratic model:
m_{k+1}(p) = f(x_{k+1}) + ∇f(x_{k+1})^T p + (1/2) p^T B_{k+1} p.
To reuse as much information from the previous step as possible, we require the gradient of m_{k+1} to match the gradient of f at both x_k and x_{k+1}. This yields the secant equation
B_{k+1} s_k = y_k,
where s_k = x_{k+1} - x_k and y_k = ∇f(x_{k+1}) - ∇f(x_k).
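A quick sanity check of the secant equation: for a quadratic objective the Hessian is constant, so the gradient change between any two points equals the Hessian applied to the step. The points x_k and x_next below are arbitrary choices of mine for illustration.

```python
import numpy as np

# For f(x, y) = (x - 2)^2 + (y - 1)^2 + 1 the true Hessian is the constant matrix 2I
B = np.array([[2.0, 0.0], [0.0, 2.0]])
grad = lambda x: np.array([2 * (x[0] - 2), 2 * (x[1] - 1)])

x_k = np.array([0.0, 0.0])
x_next = np.array([1.0, 0.5])
s_k = x_next - x_k               # step s_k = x_{k+1} - x_k
y_k = grad(x_next) - grad(x_k)   # gradient change y_k
# For a quadratic objective the secant equation holds exactly: B s_k = y_k
print(np.allclose(B.dot(s_k), y_k))  # → True
```

For a general nonlinear f the equation only holds approximately, which is exactly why B_{k+1} is *required* to satisfy it.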
DFP Method
Write H_k = B_k^{-1} for the inverse-Hessian approximation, and recall s_k = x_{k+1} - x_k, y_k = ∇f(x_{k+1}) - ∇f(x_k). The DFP update formula is
H_{k+1} = H_k - (H_k y_k y_k^T H_k) / (y_k^T H_k y_k) + (s_k s_k^T) / (s_k^T y_k).
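The formula can be written as a one-step update function; this is a minimal sketch (the helper name `dfp_update` and the sample vectors are mine), and it verifies that the updated matrix satisfies the secant equation H_{k+1} y_k = s_k.

```python
import numpy as np

def dfp_update(H, s, y):
    """One DFP update of the inverse-Hessian approximation H:
    H_new = H - (H y y^T H)/(y^T H y) + (s s^T)/(s^T y)."""
    Hy = H.dot(y)
    return H - np.outer(Hy, Hy) / y.dot(Hy) + np.outer(s, s) / s.dot(y)

H = np.eye(2)                 # initial symmetric positive definite guess
s = np.array([1.0, 0.5])      # sample step s_k
y = np.array([2.0, 1.0])      # sample gradient change y_k
H_new = dfp_update(H, s, y)
# The updated matrix satisfies the secant equation for the inverse: H_new y = s
print(np.allclose(H_new.dot(y), s))  # → True
```

Because the update is built from rank-one terms in s and y, H_{k+1} stays symmetric, and it remains positive definite as long as the curvature condition s^T y > 0 holds (guaranteed by the Wolfe conditions).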
Its implementation is as follows:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import random

import numpy as np

# Objective: f(x, y) = (x - 2)^2 + (y - 1)^2 + 1

def solution(grad_func):
    rate = 0.3  # fixed step length (no Wolfe line search in this simple demo)
    # Gk approximates the inverse Hessian and must be symmetric positive definite;
    # the original also tried a random positive diagonal matrix here
    Gk = np.eye(2)
    x = random.uniform(-10000, 10000)
    y = random.uniform(-10000, 10000)
    point = np.array([[x], [y]])  # column vector
    grad_last = grad_func(point[0][0], point[1][0])
    if np.linalg.norm(grad_last) < 1e-6:
        return get_point_coordinate(point)
    for index in range(10000):
        pk = -Gk.dot(grad_last)  # quasi-Newton search direction p_k = -G_k grad_k
        point = point + rate * pk
        grad = grad_func(point[0][0], point[1][0])
        print("grad", grad.transpose())
        if np.linalg.norm(grad) < 1e-6:
            break
        delta_k = rate * pk       # s_k = x_{k+1} - x_k
        y_k = grad - grad_last    # y_k = grad_{k+1} - grad_k
        # DFP update: G_{k+1} = G_k + s s^T / (s^T y) - G y y^T G / (y^T G y)
        Pk = delta_k.dot(delta_k.transpose()) / delta_k.transpose().dot(y_k)
        Qk = -Gk.dot(y_k).dot(y_k.transpose()).dot(Gk) / y_k.transpose().dot(Gk).dot(y_k)
        Gk = Gk + Pk + Qk
        grad_last = grad
    print("times of iterate: %s" % index)
    return get_point_coordinate(point)

def get_point_coordinate(point):
    return point[0][0], point[1][0]

if __name__ == "__main__":
    # (x - 2, y - 1) is the gradient of f/2; the constant factor does not change the minimizer
    x, y = solution(lambda a, b: np.array([[a - 2], [b - 1]]))
    print("minimum point of f(x, y) = (x-2)^2 + (y-1)^2 + 1: (%s, %s)" % (x, y))
The result of the execution is:
grad [[-165661.89194611 -109405.68739563]]
grad [[-100881.92784944  -98677.83692405]]
grad [[ -86031.40619151   24003.06422296]]
... (intermediate iterations omitted; the gradient shrinks geometrically) ...
grad [[ -1.19852338e-06   3.40078523e-07]]
grad [[ -8.38966364e-07   2.38054966e-07]]
times of iterate: 73
minimum point of f(x, y) = (x-2)^2 + (y-1)^2 + 1: (1.99999958052, 1.00000011903)
You can see that the computed solution is very close to the true minimizer (2, 1)!