Machine Learning Study Notes (2)--another way to find extreme values: Newton's method

"Total Catalog" http://www.cnblogs.com/tbcaaa8/p/4415055.html

1. Solving equations with Newton's method

Newton's method is an iterative algorithm for solving equations, and it is especially useful for nonlinear equations. The idea is to approximate the original equation by its linear part. Without loss of generality, consider the equation f(x) = 0. Expanding f(x) in a Taylor series at x = t gives f(x) = f(t) + f'(t)(x - t) + ...

Replacing f(x) with its linear part in the equation f(x) = 0 gives f(t) + f'(t)(x - t) = 0, which solves to x = t - f(t)/f'(t). Writing this solution in iterative form yields Newton's iteration formula:

x^(k+1) = x^(k) - f(x^(k)) / f'(x^(k))

[Example] Use Newton's method to solve the equation x^3 + x = 2.

Step 1: Determine f(x) and f'(x), i.e. f(x) = x^3 + x - 2 and f'(x) = 3x^2 + 1.

Step 2: Select an initial value for the iteration. The initial value should generally be chosen near the solution, otherwise the algorithm may fail to converge. Here we choose x^(0) = 2.

Step 3: Iterate using the formula and the initial value. The iteration proceeds as follows:

k    x^(k)    f(x^(k))
0    2.00     8.00
1    1.38     2.04
2    1.08     0.35
3    1.00     0.02
4    1.00     0.00

Conclusion: after 4 iterations the function value has reached 0 (to the displayed precision), i.e. a root of the original equation has been found (x = 1).
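The iteration above is straightforward to reproduce in code. The following is a minimal Python sketch, not part of the original post; the function name newton_solve, the tolerance, and the iteration cap are choices made here for illustration:

```python
def newton_solve(f, f_prime, x0, tol=1e-6, max_iter=50):
    """Solve f(x) = 0 by Newton's method: x <- x - f(x) / f'(x)."""
    x = x0
    for k in range(max_iter):
        fx = f(x)
        print(f"k={k}  x={x:.2f}  f(x)={fx:.2f}")  # mirrors the table above
        if abs(fx) < tol:          # function value close enough to 0: done
            return x
        x = x - fx / f_prime(x)    # Newton update
    return x

# The example from the text: f(x) = x^3 + x - 2, f'(x) = 3x^2 + 1, x^(0) = 2
root = newton_solve(lambda x: x**3 + x - 2, lambda x: 3 * x**2 + 1, 2.0)
```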

The convergence conditions and convergence rate of Newton's method are omitted here. In machine learning applications, the risk of non-convergence can be reduced by trying different initial values; when Newton's method does converge, its convergence rate is quadratic, so the number of iterations is markedly smaller than with gradient descent.

2. Solving systems of equations with Newton's method

In the previous article of this series, we used gradient descent to minimize the loss function J. From the description above, Newton's iteration only finds the root of an equation, so what does it have to do with minimizing a multivariate function? In fact, at a minimum of a multivariate function the partial derivative with respect to each variable is 0, and the values of the variables can be found by solving for the points where this holds. The multivariate minimization problem is therefore transformed into the problem of solving a system of multidimensional nonlinear equations.

First, consider the Taylor expansion of a multivariate function. Without loss of generality, take f1(x1, x2, ..., xn) as an example; its Taylor expansion at the point (t1, t2, ..., tn) is as follows (all partial derivatives evaluated at that point):

f1(x1, ..., xn) = f1(t1, ..., tn) + ∂f1/∂x1·(x1 - t1) + ... + ∂f1/∂xn·(xn - tn) + ...

Replacing f1(x) with its linear part and setting it to 0 gives:

f1(t1, ..., tn) + ∂f1/∂x1·(x1 - t1) + ... + ∂f1/∂xn·(xn - tn) = 0

Rearranging into vector form and separating out the variables gives (for brevity, f1 is written below instead of f1(t1, t2, ..., tn)):

(∂f1/∂x1, ..., ∂f1/∂xn) · (x1 - t1, ..., xn - tn)^T = -f1

Suppose the system consists of the equations {f1 = 0, f2 = 0, ..., fn = 0}; they can be organized in matrix form:

[ ∂f1/∂x1  ...  ∂f1/∂xn ]   [ x1 - t1 ]     [ f1 ]
[   ...    ...    ...   ] · [   ...   ] = - [ ... ]
[ ∂fn/∂x1  ...  ∂fn/∂xn ]   [ xn - tn ]     [ fn ]

The n×n matrix in the above formula is the Jacobian matrix, abbreviated as J(F). Meanwhile, write the variables (x1, ..., xn) as X, the point (t1, ..., tn) as T, and (f1, ..., fn) as F:

J(F) · (X - T) = -F

Simplifying gives:

X = T - J(F)^(-1) · F

Writing the solution of the system in iterative form gives Newton's iteration formula for systems of equations:

X^(k+1) = X^(k) - J(F)^(-1) · F, with J(F) and F evaluated at X^(k)

Although Newton's method requires far fewer iterations than gradient descent, the inverse of J(F) must be recomputed at every iteration. If n is the feature dimension, computing the inverse matrix usually takes Θ(n^3) time. This can be reduced to Θ(n^(log2 7)) with Strassen's algorithm, or the inverse can be approximated numerically, but both approaches are still slow when the feature dimension is large. Therefore, Newton's method converges quickly only when the feature dimension is small. In particular, when n = 1, the formula above reduces to the Newton iteration formula for a single equation derived in Section 1.
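As a concrete illustration, here is a minimal NumPy sketch of the iteration for a system of equations (it is not from the original post). Rather than explicitly forming J(F)^(-1), each step solves the linear system J(F)·Δ = -F, which avoids the explicit inverse; the example system at the bottom is a made-up one for demonstration:

```python
import numpy as np

def newton_system(F, J, x0, tol=1e-8, max_iter=50):
    """Solve the system F(X) = 0 by Newton's method.

    Each step solves the linear system J(F) * delta = -F(X) instead of
    computing the inverse of J(F) explicitly."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:        # all equations close enough to 0
            break
        delta = np.linalg.solve(J(x), -Fx)  # Newton step X^(k+1) - X^(k)
        x = x + delta
    return x

# Made-up demonstration system: x^2 + y^2 = 4 and x*y = 1
F = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0] * v[1] - 1.0])
J = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]],
                        [v[1],       v[0]]])
print(newton_system(F, J, [2.0, 0.5]))
```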

3. Using Newton's method to find the extremum of a function

Let ∇f(X) denote the gradient vector of the function f(X). Substituting F = ∇f into the Newton iteration formula above gives an iterative formula for finding the extremum of the function:

X^(k+1) = X^(k) - J(∇f)^(-1) · ∇f(X^(k))

Noting that the Jacobian of the gradient is the Hessian matrix:

J(∇f) = H(f)

the iterative formula can be further simplified to:

X^(k+1) = X^(k) - H(f)^(-1) · ∇f(X^(k))

where H(f) denotes the Hessian matrix of the function f(x1, ..., xn).

For the specific problem in the previous article of this series, minimizing the loss function J(θ), the Newton method described in this article can be used in addition to the gradient descent method described earlier. The corresponding iteration formula is:

θ^(k+1) = θ^(k) - H(J)^(-1) · ∇J(θ^(k))
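As a hedged sketch of how this iteration might look in code (the loss below is an invented quadratic chosen for illustration, not the loss function J from the previous article), one could write:

```python
import numpy as np

def newton_minimize(grad, hess, theta0, tol=1e-8, max_iter=100):
    """Minimize a function via Newton's method applied to its gradient:
    theta <- theta - H^(-1) * grad, with the linear system solved directly."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g = grad(theta)
        if np.linalg.norm(g) < tol:   # gradient near 0: stationary point found
            break
        theta = theta - np.linalg.solve(hess(theta), g)
    return theta

# Invented quadratic loss for illustration (not the J from the series):
# J(theta) = (theta_1 - 3)^2 + 2 * (theta_2 + 1)^2, minimized at (3, -1)
grad = lambda t: np.array([2.0 * (t[0] - 3.0), 4.0 * (t[1] + 1.0)])
hess = lambda t: np.array([[2.0, 0.0], [0.0, 4.0]])
print(newton_minimize(grad, hess, [0.0, 0.0]))  # -> [ 3. -1.]
```

For this quadratic example the Hessian is constant and Newton's method reaches the minimum in a single step, which is exactly the quadratic-convergence behavior discussed in Section 1.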
