The Gauss-Newton algorithm is one of the most common algorithms for solving nonlinear optimization problems. I recently came across it while studying the code of an open-source project, so I decided to look into it more deeply. The contents of this post are as follows:
- Basic mathematical concepts
- Newton's method: derivation, algorithm steps, calculation example
- Gauss-Newton method: derivation (how it follows from Newton's method), algorithm steps, programming examples
- Summary of the advantages and disadvantages of the Gauss-Newton method
I. Basic concepts
1. Nonlinear equations and optimization methods
In a nonlinear equation, the relationship between the dependent variable and the independent variables is not linear; it may involve squares, logarithms, exponentials, trigonometric functions, and so on. For such equations it is generally impossible to obtain an exact solution for the optimal point of an n-variable real function f over the whole n-dimensional vector space R^n, so one usually has to settle for an approximate solution.
Most methods for solving this kind of optimization problem are iterative algorithms based on successive one-dimensional searches. The basic idea is to pick a promising search direction at the current approximate point and perform a one-dimensional search along that direction to obtain a new approximate point, repeating the iteration until the prescribed accuracy requirement is met. Depending on how the search direction is chosen, these iterative algorithms fall into two categories:
Analytic methods: require derivative information of the objective function. They include:
- Gradient method, also known as steepest descent: an early analytic method; its convergence is slow.
- Newton's method: converges quickly but is not always stable, and the computation is relatively expensive. The Gauss-Newton method is an improvement on it, although the objective function it handles is different.
- Conjugate gradient method: fast convergence and good performance.
- Variable metric methods: highly efficient; the DFP method (Davidon-Fletcher-Powell) is commonly used.
Direct methods: use only function values, with no derivatives involved. They include the alternating direction method (also called coordinate rotation), the pattern search method, the rotating direction method, Powell's conjugate direction method, and the simplex acceleration method.
2. The nonlinear least squares problem
The nonlinear least squares problem arises from nonlinear regression: given observations of the independent and dependent variables, estimate the coefficients of a nonlinear objective function so that the function model matches the observed data as closely as possible.
The Gauss-Newton method is the most basic method for solving nonlinear least squares problems, and it can only handle quadratic objective functions (the objective must be converted into a sum-of-squares form before the method is used).
Unlike Newton's method, the Gauss-Newton algorithm can only be used to minimize a sum of squared function values.
3. Basic mathematical notation
A. Gradient: the vector formed by all the first-order partial derivatives of a multivariate function.
Taking a function of two variables as an example, the gradient is
$$\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \ \frac{\partial f}{\partial y} \right)^T$$
B. Hessian matrix: the matrix of second-order partial derivatives of a multivariate function; it describes the local curvature of the function. For a function of two variables:
$$H = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\ \dfrac{\partial^2 f}{\partial y \partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{pmatrix}$$
C. Jacobian matrix: the matrix of first-order partial derivatives of a vector-valued multivariate function, arranged in a fixed order; it represents the best linear approximation of a differentiable map at a given point. For a map of two variables $F = (f_1, f_2)$:
$$J = \begin{pmatrix} \dfrac{\partial f_1}{\partial x} & \dfrac{\partial f_1}{\partial y} \\ \dfrac{\partial f_2}{\partial x} & \dfrac{\partial f_2}{\partial y} \end{pmatrix}$$
Extended to the multidimensional case $F: \mathbb{R}^n \to \mathbb{R}^m$, the Jacobian is the $m \times n$ matrix
$$J_F = \begin{pmatrix} \dfrac{\partial F_1}{\partial x_1} & \cdots & \dfrac{\partial F_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial F_m}{\partial x_1} & \cdots & \dfrac{\partial F_m}{\partial x_n} \end{pmatrix}$$
The role of the Jacobian: if $p$ is a point in $\mathbb{R}^n$ and $F$ is differentiable at $p$, then the derivative of $F$ at $p$ is given by $J_F(p)$, and the linear map it describes is the best linear approximation of $F$ near $p$:
$$F(x) \approx F(p) + J_F(p)\,(x - p)$$
D. Residual: the difference between an actual observed value and the estimated (fitted) value.
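As a quick concrete illustration of these objects (with functions of my own choosing, not taken from the original text): for $f(x, y) = x^2 + 3xy + y^3$,
$$\nabla f = \begin{pmatrix} 2x + 3y \\ 3x + 3y^2 \end{pmatrix}, \qquad H = \begin{pmatrix} 2 & 3 \\ 3 & 6y \end{pmatrix},$$
and for the vector-valued map $F(x, y) = (x^2 y, \ 5x + \sin y)$,
$$J_F = \begin{pmatrix} 2xy & x^2 \\ 5 & \cos y \end{pmatrix}.$$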
II. Newton's method
The basic idea of Newton's method is to approximate the given function near the current point by a polynomial (its second-order Taylor expansion), take the minimizer of that polynomial as the estimate of the minimum point, and repeat the process until the required precision is reached.
1. Consider the one-dimensional unconstrained minimization problem $\min_{t \in \mathbb{R}} \varphi(t)$. Approximating $\varphi$ near the current iterate $t_k$ by its second-order Taylor expansion and minimizing that quadratic gives the Newton iteration
$$t_{k+1} = t_k - \frac{\varphi'(t_k)}{\varphi''(t_k)}.$$
The procedure of Newton's method is therefore: choose a starting point $t_0$ and a tolerance $\varepsilon > 0$; at each step evaluate $\varphi'(t_k)$ and $\varphi''(t_k)$ and update $t_{k+1} = t_k - \varphi'(t_k)/\varphi''(t_k)$; stop when $|\varphi'(t_{k+1})| < \varepsilon$ and take $t_{k+1}$ as the approximate minimum point.
Note that when used to find an extremum, Newton's method may fail to converge to a minimum point if the starting point is not chosen well.
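To make the procedure concrete, here is a minimal C++ sketch of the one-dimensional iteration. The function $\varphi(t) = t^4 - 3t^3 + 2$ and the starting point are my own illustrative choices, not taken from the article.

```cpp
// Minimal 1-D Newton's method sketch; phi(t) = t^4 - 3t^3 + 2 is an
// illustrative example function, not the one used in the article.
#include <cstdio>
#include <cmath>

double phi1(double t) { return 4*t*t*t - 9*t*t; }  // phi'(t)
double phi2(double t) { return 12*t*t - 18*t; }    // phi''(t)

int main()
{
    double t = 3.0;            // starting point (must be reasonably good)
    const double eps = 1e-8;   // stopping tolerance on |phi'(t)|

    for (int k = 0; k < 100 && fabs(phi1(t)) >= eps; k++) {
        t -= phi1(t) / phi2(t);             // Newton update t_{k+1} = t_k - phi'/phi''
        printf("iter %d: t = %f\n", k, t);
    }
    printf("minimum point ~= %f\n", t);     // converges to t = 2.25 for this phi
    return 0;
}
```

Starting from $t_0 = 3$ the iteration converges to the minimizer $t = 2.25$ in a few steps, while a poor starting point (for example $t_0 = 1$) drives it toward the stationary point $t = 0$, which is not a minimum, illustrating the warning above.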
2. The multidimensional unconstrained case is as follows:
If the nonlinear objective function $f(x)$ has continuous second-order partial derivatives and $x^{(k)}$ is an approximation of its minimum point, expand $f(x)$ in a Taylor series at $x^{(k)}$:
$$f(x) \approx f(x^{(k)}) + \nabla f(x^{(k)})^T (x - x^{(k)}) + \tfrac{1}{2} (x - x^{(k)})^T H(x^{(k)}) (x - x^{(k)}) \qquad (1)$$
Setting the gradient of the right-hand side to zero yields the Newton iteration
$$x^{(k+1)} = x^{(k)} - H(x^{(k)})^{-1} \nabla f(x^{(k)}) \qquad (2)$$
If $f(x)$ is a quadratic function, its Hessian matrix $H$ is constant and formula (1) is exact (an equality); in that case, starting from an arbitrary point, formula (2) reaches the minimum of $f(x)$ in a single step (assuming the Hessian matrix is positive definite, i.e. all eigenvalues are greater than 0).
If $f(x)$ is not quadratic, formula (1) is only an approximation, and the point obtained from formula (2) is only an approximate minimum point of $f(x)$. In this case the search direction is usually chosen as the Newton direction $d^{(k)} = -H(x^{(k)})^{-1} \nabla f(x^{(k)})$, with the update $x^{(k+1)} = x^{(k)} + \lambda_k d^{(k)}$ for some step length $\lambda_k$.
Newton's method converges quickly and is very effective when the second derivatives of $f(x)$ and the inverse of the Hessian matrix are easy to compute. In practice, however, the Hessian matrix is often very difficult to obtain.
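As a tiny illustration of formula (2), and of the remark that a quadratic function is minimized in a single step, here is a sketch using OpenCV's Mat type; the quadratic $f(x,y) = x^2 + 2y^2 - 4x - 8y$ is my own example, not from the article.

```cpp
// One Newton step for the quadratic f(x,y) = x^2 + 2y^2 - 4x - 8y.
// Since f is quadratic with a constant positive definite Hessian,
// a single step from any starting point lands on the minimizer (2, 2).
#include <cstdio>
#include <opencv2/core/core.hpp>
using namespace cv;

int main()
{
    Mat x = (Mat_<double>(2,1) << 5.0, -3.0);                    // arbitrary starting point
    Mat g = (Mat_<double>(2,1) << 2*x.at<double>(0,0) - 4,
                                  4*x.at<double>(1,0) - 8);      // gradient at x
    Mat H = (Mat_<double>(2,2) << 2, 0,
                                  0, 4);                         // constant Hessian
    x = x - H.inv() * g;                                         // formula (2)
    printf("minimizer: (%f, %f)\n", x.at<double>(0,0), x.at<double>(1,0));
    return 0;
}
```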
3. A practical calculation example is given below.
Example: use Newton's method to find the minimum value.
Solution:
"f (x) is a two-time function, and the H-matrix is a constant, so long as any point is set, the minimum point can be obtained in one step"
III. The Gauss-Newton method
1. How the Gauss-Newton method is derived from Newton's method
Data-fitting problems, for example estimating a camera pose from the reprojection error (with R and t as the unknown coefficients), are often converted into nonlinear least squares problems. The Gauss-Newton method solves such nonlinear least squares problems and is used for data fitting, parameter estimation, and function estimation.
Suppose we study a nonlinear least squares problem of the following form: find the parameter vector $\beta$ that minimizes the sum of squared residuals
$$F(\beta) = \sum_{i=1}^{m} r_i(\beta)^2 \qquad (3)$$
In the camera-pose example, the residual is the reprojection error between the two positions.
Given a large number of observation points (a multidimensional problem), the two camera poses can be computed by choosing reasonable values of (R, t) that minimize the sum of squared residuals. I will not expand on the machine vision side here; the question is how to solve the nonlinear least squares problem.
If Newton's method is applied to equation (3), write $r = (r_1, \dots, r_m)^T$ and let $J$ be the Jacobian of the residuals, $J_{ij} = \partial r_i / \partial \beta_j$. The gradient and Hessian of $F$ are
$$\nabla F = 2 J^T r, \qquad H = 2 \left( J^T J + \sum_{i=1}^{m} r_i \nabla^2 r_i \right),$$
so Newton's iteration formula is
$$\beta^{(k+1)} = \beta^{(k)} - H^{-1} \nabla F(\beta^{(k)}).$$
Seeing this, the difference between Gauss-Newton and Newton's method becomes clear: it lies precisely in this iteration step. The classical Gauss-Newton algorithm simply fixes the step length $\lambda$ at 1.
So why does Gauss-Newton discard the second-derivative part of the Hessian matrix? The main reason is that the second-order term in Newton's Hessian is usually difficult or expensive to compute, while approximating the whole of $H$ with a secant (quasi-Newton) scheme is also not advisable; since computing the gradient already produces $J(x)$, the first-order part of $H$, namely $J^T J$, is essentially available for free. Therefore, to simplify the computation and still obtain an effective algorithm, the second-order term is approximated using only first-derivative information.
Note that this is justified only when the residuals $r$ are close to 0, or the residual function is close to linear, so that the second-order term can be ignored; such problems are usually called small residual problems. Otherwise the Gauss-Newton method may fail to converge.
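Dropping the second-order term $\sum_i r_i \nabla^2 r_i$ leaves $H \approx 2 J^T J$, so the step solved at each Gauss-Newton iteration is
$$\Delta = -\left( J^T J \right)^{-1} J^T r(\beta^{(k)}), \qquad \beta^{(k+1)} = \beta^{(k)} + \lambda \Delta,$$
with the classical choice $\lambda = 1$. (In the code below, the Jacobian `Jf` is taken with respect to the model function rather than the residual $r = y - f$, which flips the sign of $J$, so the computed step is simply added: `params += delta`.)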
3. Programming examples
The code below contains no mechanism to guarantee convergence of the algorithm; the made-up Example 2 shows this weakness. As for the dimension of the independent variable, the code supports multiple dimensions, but both examples are one-dimensional: for instance, Example 1 uses only the year t, although other factors could be added.
Example 1: using United States population data from 1815 to 1885, estimate the parameters A and B in the population model $y = A e^{Bt}$. The known years and total populations (in millions) are listed below, with t indexing the decade (t = 1 corresponds to 1815):

| Year | t | Population (millions) |
|------|---|-----------------------|
| 1815 | 1 | 8.3 |
| 1825 | 2 | 11.0 |
| 1835 | 3 | 14.7 |
| 1845 | 4 | 19.7 |
| 1855 | 5 | 26.7 |
| 1865 | 6 | 35.2 |
| 1875 | 7 | 44.4 |
| 1885 | 8 | 55.9 |
```cpp
// A simple demo of the Gauss-Newton algorithm on a user-defined function
#include <cstdio>
#include <cmath>
#include <vector>
#include <opencv2/core/core.hpp>

using namespace std;
using namespace cv;

const double DERIV_STEP = 1e-5;
const int MAX_ITER = 100;

void GaussNewton(double (*Func)(const Mat &input, const Mat &params), // function pointer
                 const Mat &inputs, const Mat &outputs, Mat &params);

double Deriv(double (*Func)(const Mat &input, const Mat &params),     // function pointer
             const Mat &input, const Mat &params, int n);

// The user defines their function here
double Func(const Mat &input, const Mat &params);

int main()
{
    // For this demo we're going to try and fit to the function
    // F = A*exp(t*B)
    // There are 2 parameters: A, B
    int num_params = 2;
    int total_data = 8;

    Mat inputs(total_data, 1, CV_64F);
    Mat outputs(total_data, 1, CV_64F);

    // load observation data
    for(int i = 0; i < total_data; i++) {
        inputs.at<double>(i,0) = i + 1;        // load year (decade index)
    }

    // load America population
    outputs.at<double>(0,0) = 8.3;
    outputs.at<double>(1,0) = 11.0;
    outputs.at<double>(2,0) = 14.7;
    outputs.at<double>(3,0) = 19.7;
    outputs.at<double>(4,0) = 26.7;
    outputs.at<double>(5,0) = 35.2;
    outputs.at<double>(6,0) = 44.4;
    outputs.at<double>(7,0) = 55.9;

    // Guess the parameters; the guess should be close to the true value,
    // else it can fail for very sensitive functions!
    Mat params(num_params, 1, CV_64F);

    // init guess
    params.at<double>(0,0) = 6;
    params.at<double>(1,0) = 0.3;

    GaussNewton(Func, inputs, outputs, params);

    printf("Parameters from GaussNewton: %f %f\n",
           params.at<double>(0,0), params.at<double>(1,0));

    return 0;
}

double Func(const Mat &input, const Mat &params)
{
    // Assumes input is a single row matrix
    // Assumes params is a column matrix
    double A = params.at<double>(0,0);
    double B = params.at<double>(1,0);
    double x = input.at<double>(0,0);

    return A*exp(x*B);
}

// Calc the n-th parameter's partial derivative; the params are our final target
double Deriv(double (*Func)(const Mat &input, const Mat &params),
             const Mat &input, const Mat &params, int n)
{
    // Assumes input is a single row matrix
    // Returns the derivative of the n-th parameter
    Mat params1 = params.clone();
    Mat params2 = params.clone();

    // Use central difference to get the derivative
    params1.at<double>(n,0) -= DERIV_STEP;
    params2.at<double>(n,0) += DERIV_STEP;

    double p1 = Func(input, params1);
    double p2 = Func(input, params2);

    double d = (p2 - p1) / (2*DERIV_STEP);

    return d;
}

void GaussNewton(double (*Func)(const Mat &input, const Mat &params),
                 const Mat &inputs, const Mat &outputs, Mat &params)
{
    int m = inputs.rows;
    int n = inputs.cols;
    int num_params = params.rows;

    Mat r(m, 1, CV_64F);            // residual matrix
    Mat Jf(m, num_params, CV_64F);  // Jacobian of Func()
    Mat input(1, n, CV_64F);        // single row input

    double last_mse = 0;

    for(int i = 0; i < MAX_ITER; i++) {
        double mse = 0;

        for(int j = 0; j < m; j++) {
            for(int k = 0; k < n; k++) {
                // copy independent variable vector (the year)
                input.at<double>(0,k) = inputs.at<double>(j,k);
            }

            // diff between estimate and observed population
            r.at<double>(j,0) = outputs.at<double>(j,0) - Func(input, params);
            mse += r.at<double>(j,0)*r.at<double>(j,0);

            for(int k = 0; k < num_params; k++) {
                Jf.at<double>(j,k) = Deriv(Func, input, params, k);
            }
        }

        mse /= m;

        // The difference in mse is very small, so quit
        if(fabs(mse - last_mse) < 1e-8) {
            break;
        }

        Mat delta = (Jf.t()*Jf).inv() * Jf.t()*r;
        params += delta;

        printf("%d: mse=%f\n", i, mse);
        last_mse = mse;
    }
}
```
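A note on the implementation: instead of requiring analytic partial derivatives, Deriv() builds each column of the Jacobian numerically by central differences with step DERIV_STEP, so only the model function Func() has to be written by hand. Each iteration then solves the normal equations via `delta = (Jf.t()*Jf).inv() * Jf.t()*r` and stops once the mean squared error changes by less than 1e-8.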
Running results:
A = 7.0, B = 0.26 (initial guess A = 6, B = 0.3); out of the maximum of 100 iterations, it converges by the 4th.
If the initial guess is A = 1, B = 1, it converges after 14 iterations.
Using the values of A and B obtained above, the population model curve can be plotted against the data in MATLAB.
Example 2: fit the model $y = A \sin(Bx) + C \cos(Dx)$.
Lacking real observations, I made up the data: assuming the 4 parameters are known to be A = 5, B = 1, C = 10, D = 2, I generated 100 random numbers as the x observations and computed the corresponding function values. These observations are then used to recover the 4 parameters.
```cpp
// A simple demo of the Gauss-Newton algorithm on a user-defined function
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <vector>
#include <opencv2/core/core.hpp>

using namespace std;
using namespace cv;

const double DERIV_STEP = 1e-5;
const int MAX_ITER = 100;

void GaussNewton(double (*Func)(const Mat &input, const Mat &params),
                 const Mat &inputs, const Mat &outputs, Mat &params);

double Deriv(double (*Func)(const Mat &input, const Mat &params),
             const Mat &input, const Mat &params, int n);

// The user defines their function here
double Func(const Mat &input, const Mat &params);

int main()
{
    // For this demo we're going to try and fit to the function
    // F = A*sin(Bx) + C*cos(Dx)
    // There are 4 parameters: A, B, C, D
    int num_params = 4;

    // Generate random data using these parameters
    int total_data = 100;

    double A = 5;
    double B = 1;
    double C = 10;
    double D = 2;

    Mat inputs(total_data, 1, CV_64F);
    Mat outputs(total_data, 1, CV_64F);

    for(int i = 0; i < total_data; i++) {
        double x = -10.0 + 20.0*rand() / (1.0 + RAND_MAX); // random between [-10, 10]
        double y = A*sin(B*x) + C*cos(D*x);

        // Add some noise
        // y += -1.0 + 2.0*rand() / (1.0 + RAND_MAX);

        inputs.at<double>(i,0) = x;
        outputs.at<double>(i,0) = y;
    }

    // Guess the parameters; the guess should be close to the true value,
    // else it can fail for very sensitive functions!
    Mat params(num_params, 1, CV_64F);

    params.at<double>(0,0) = 1;
    params.at<double>(1,0) = 1;
    params.at<double>(2,0) = 8;   // changing this to 1 would cause it not to find the solution, too far away
    params.at<double>(3,0) = 1;

    GaussNewton(Func, inputs, outputs, params);

    printf("True parameters: %f %f %f %f\n", A, B, C, D);
    printf("Parameters from GaussNewton: %f %f %f %f\n",
           params.at<double>(0,0), params.at<double>(1,0),
           params.at<double>(2,0), params.at<double>(3,0));

    return 0;
}

double Func(const Mat &input, const Mat &params)
{
    // Assumes input is a single row matrix
    // Assumes params is a column matrix
    double A = params.at<double>(0,0);
    double B = params.at<double>(1,0);
    double C = params.at<double>(2,0);
    double D = params.at<double>(3,0);
    double x = input.at<double>(0,0);

    return A*sin(B*x) + C*cos(D*x);
}

// Deriv() and GaussNewton() are identical to the ones in Example 1 above.
```
Running results: the estimated parameters are not ideal, and convergence takes about 50 iterations; the residual does not decrease steadily from one iteration to the next, and some values recur.
4. Advantages and disadvantages
Advantages:
- For zero-residual problems (r = 0), it has a local second-order convergence rate.
- For small-residual problems, where r is small or nearly linear, it has a fast local convergence rate.
- For linear least squares problems, it reaches the minimum in one step.
Disadvantages:
- For problems with moderately large residuals, the local convergence rate is slow.
- For problems with very large residuals, or where r is highly nonlinear, it may not converge at all.
- It is not necessarily globally convergent.
- If J is rank-deficient, the method is not defined.
To remedy these shortcomings, a line search strategy can be added so that the objective function is guaranteed to decrease at every step; the resulting method is locally convergent, and in general globally convergent, for almost all nonlinear least squares problems. This is the so-called damped Gauss-Newton method, whose update is
$$x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)},$$
where $d^{(k)}$ is the Gauss-Newton direction and $\alpha_k$ is the one-dimensional (line) search factor. A sketch of how this can be added to the code above follows.
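Here is a minimal sketch (my own illustrative modification, not part of the original code) of how the damping factor $\alpha_k$ could be wired into the GaussNewton() routine shown earlier: compute the Gauss-Newton direction delta as before, then backtrack on the step length until the sum of squared residuals actually decreases. The helper sumSquares() is hypothetical; it simply re-evaluates the objective for a candidate parameter vector.

```cpp
// Sketch of a damped Gauss-Newton step with a backtracking line search.
// It reuses the Func()/inputs/outputs/params layout of the GaussNewton()
// routine above; sumSquares() is a small helper added for this sketch.
#include <opencv2/core/core.hpp>
using namespace cv;

// Sum of squared residuals for a candidate parameter vector.
double sumSquares(double (*Func)(const Mat&, const Mat&),
                  const Mat &inputs, const Mat &outputs, const Mat &params)
{
    double s = 0;
    Mat input(1, inputs.cols, CV_64F);
    for (int j = 0; j < inputs.rows; j++) {
        for (int k = 0; k < inputs.cols; k++)
            input.at<double>(0,k) = inputs.at<double>(j,k);
        double r = outputs.at<double>(j,0) - Func(input, params);
        s += r*r;
    }
    return s;
}

// Damped update: shrink alpha until the objective decreases, then step.
void dampedStep(double (*Func)(const Mat&, const Mat&),
                const Mat &inputs, const Mat &outputs,
                Mat &params, const Mat &delta)
{
    double alpha = 1.0;                                      // try the full Gauss-Newton step first
    double f0 = sumSquares(Func, inputs, outputs, params);   // current objective value
    while (alpha > 1e-4 &&
           sumSquares(Func, inputs, outputs, params + alpha*delta) >= f0)
        alpha *= 0.5;                                        // backtrack: halve the step
    params = params + alpha*delta;                           // x_{k+1} = x_k + alpha_k * d_k
}
```

Inside the GaussNewton() iteration loop, the line `params += delta;` would then be replaced by `dampedStep(Func, inputs, outputs, params, delta);`.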