Machine Learning Study Notes (2)--another way to find extreme values: Newton's method

"Total Catalog" http://www.cnblogs.com/tbcaaa8/p/4415055.html

1. Solving equations with Newton's method

Newton's method is an iterative algorithm for solving equations, and it is especially useful for nonlinear equations. The idea is to approximate the original equation by its linear part. Without loss of generality, consider the equation f(x) = 0. Expanding f(x) in a Taylor series at x = t gives f(x) = f(t) + f'(t)(x - t) + ...

Replacing f(x) with its linear part in the equation f(x) = 0 gives f(t) + f'(t)(x - t) = 0, which solves to x = t - f(t)/f'(t). Writing this solution in iterative form yields Newton's iteration formula:

x^(k+1) = x^(k) - f(x^(k)) / f'(x^(k))

[Example] Use Newton's method to solve the equation x^3 + x = 2.

Step 1: Determine f(x) and f'(x), i.e. f(x) = x^3 + x - 2 and f'(x) = 3x^2 + 1.

Step 2: Select an initial value for the iteration. The initial value should generally be chosen near the solution, otherwise the algorithm may fail to converge. Here we choose x^(0) = 2.

Step 3: Iterate using the formula and the initial value. The iteration proceeds as follows:

k    x^(k)    f(x^(k))
0    2.00     8.00
1    1.38     2.04
2    1.08     0.35
3    1.00     0.02
4    1.00     0.00

Conclusion: after 4 iterations the function value has reached 0 (to the displayed precision), i.e. a root of the original equation has been found (x = 1).
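The iteration above is straightforward to reproduce in code. The following is a minimal Python sketch, not part of the original post; the function name newton_solve, the tolerance, and the iteration cap are choices made here for illustration:

```python
def newton_solve(f, f_prime, x0, tol=1e-6, max_iter=50):
    """Solve f(x) = 0 by Newton's method: x <- x - f(x) / f'(x)."""
    x = x0
    for k in range(max_iter):
        fx = f(x)
        print(f"k={k}  x={x:.2f}  f(x)={fx:.2f}")  # mirrors the table above
        if abs(fx) < tol:          # function value close enough to 0: done
            return x
        x = x - fx / f_prime(x)    # Newton update
    return x

# The example from the text: f(x) = x^3 + x - 2, f'(x) = 3x^2 + 1, x^(0) = 2
root = newton_solve(lambda x: x**3 + x - 2, lambda x: 3 * x**2 + 1, 2.0)
```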

The convergence conditions and convergence rate of Newton's method are omitted here. In machine learning applications, the risk of non-convergence can be reduced by trying different initial values; when Newton's method does converge, its convergence rate is quadratic, so the number of iterations is markedly smaller than with gradient descent.

2. Solving systems of equations with Newton's method

In the previous article of this series, we used gradient descent to minimize the loss function J. From the description above, Newton's iteration only finds the root of an equation, so what does it have to do with minimizing a multivariate function? In fact, at a minimum of a multivariate function the partial derivative with respect to each variable is 0, and the values of the variables can be found by solving for the points where this holds. The multivariate minimization problem is therefore transformed into the problem of solving a system of multidimensional nonlinear equations.

First, consider the Taylor expansion of a multivariate function. Without loss of generality, take f1(x1, x2, ..., xn) as an example; its Taylor expansion at the point (t1, t2, ..., tn) is as follows (all partial derivatives evaluated at that point):

f1(x1, ..., xn) = f1(t1, ..., tn) + ∂f1/∂x1·(x1 - t1) + ... + ∂f1/∂xn·(xn - tn) + ...

Replacing f1(x) with its linear part and setting it to 0 gives:

f1(t1, ..., tn) + ∂f1/∂x1·(x1 - t1) + ... + ∂f1/∂xn·(xn - tn) = 0

Rearranging into vector form and separating out the variables gives (for brevity, f1 is written below instead of f1(t1, t2, ..., tn)):

(∂f1/∂x1, ..., ∂f1/∂xn) · (x1 - t1, ..., xn - tn)^T = -f1

Suppose the system consists of the equations {f1 = 0, f2 = 0, ..., fn = 0}; they can be organized in matrix form:

[ ∂f1/∂x1  ...  ∂f1/∂xn ]   [ x1 - t1 ]     [ f1 ]
[   ...    ...    ...   ] · [   ...   ] = - [ ... ]
[ ∂fn/∂x1  ...  ∂fn/∂xn ]   [ xn - tn ]     [ fn ]

The n×n matrix in the above formula is the Jacobian matrix, abbreviated as J(F). Meanwhile, write the variables (x1, ..., xn) as X, the point (t1, ..., tn) as T, and (f1, ..., fn) as F:

J(F) · (X - T) = -F

Simplifying gives:

X = T - J(F)^(-1) · F

Writing the solution of the system in iterative form gives Newton's iteration formula for systems of equations:

X^(k+1) = X^(k) - J(F)^(-1) · F, with J(F) and F evaluated at X^(k)

Although Newton's method requires far fewer iterations than gradient descent, the inverse of J(F) must be recomputed at every iteration. If n is the feature dimension, computing the inverse matrix usually takes Θ(n^3) time. This can be reduced to Θ(n^(log2 7)) with Strassen's algorithm, or the inverse can be approximated numerically, but both approaches are still slow when the feature dimension is large. Therefore, Newton's method converges quickly only when the feature dimension is small. In particular, when n = 1, the formula above reduces to the Newton iteration formula for a single equation derived in Section 1.
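As a concrete illustration, here is a minimal NumPy sketch of the iteration for a system of equations (it is not from the original post). Rather than explicitly forming J(F)^(-1), each step solves the linear system J(F)·Δ = -F, which avoids the explicit inverse; the example system at the bottom is a made-up one for demonstration:

```python
import numpy as np

def newton_system(F, J, x0, tol=1e-8, max_iter=50):
    """Solve the system F(X) = 0 by Newton's method.

    Each step solves the linear system J(F) * delta = -F(X) instead of
    computing the inverse of J(F) explicitly."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:        # all equations close enough to 0
            break
        delta = np.linalg.solve(J(x), -Fx)  # Newton step X^(k+1) - X^(k)
        x = x + delta
    return x

# Made-up demonstration system: x^2 + y^2 = 4 and x*y = 1
F = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0] * v[1] - 1.0])
J = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]],
                        [v[1],       v[0]]])
print(newton_system(F, J, [2.0, 0.5]))
```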

3. Using Newton's method to find the extremum of a function

Let ∇f(X) denote the gradient vector of the function f(X). Substituting F = ∇f into the Newton iteration formula above gives an iterative formula for finding the extremum of the function:

X^(k+1) = X^(k) - J(∇f)^(-1) · ∇f(X^(k))

Noting that the Jacobian of the gradient is the Hessian matrix:

J(∇f) = H(f)

the iterative formula can be further simplified to:

X^(k+1) = X^(k) - H(f)^(-1) · ∇f(X^(k))

where H(f) denotes the Hessian matrix of the function f(x1, ..., xn).

For the specific problem in the previous article of this series, minimizing the loss function J(θ), the Newton method described in this article can be used in addition to the gradient descent method described earlier. The corresponding iteration formula is:

θ^(k+1) = θ^(k) - H(J)^(-1) · ∇J(θ^(k))
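As a hedged sketch of how this iteration might look in code (the loss below is an invented quadratic chosen for illustration, not the loss function J from the previous article), one could write:

```python
import numpy as np

def newton_minimize(grad, hess, theta0, tol=1e-8, max_iter=100):
    """Minimize a function via Newton's method applied to its gradient:
    theta <- theta - H^(-1) * grad, with the linear system solved directly."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g = grad(theta)
        if np.linalg.norm(g) < tol:   # gradient near 0: stationary point found
            break
        theta = theta - np.linalg.solve(hess(theta), g)
    return theta

# Invented quadratic loss for illustration (not the J from the series):
# J(theta) = (theta_1 - 3)^2 + 2 * (theta_2 + 1)^2, minimized at (3, -1)
grad = lambda t: np.array([2.0 * (t[0] - 3.0), 4.0 * (t[1] + 1.0)])
hess = lambda t: np.array([[2.0, 0.0], [0.0, 4.0]])
print(newton_minimize(grad, hess, [0.0, 0.0]))  # -> [ 3. -1.]
```

For this quadratic example the Hessian is constant and Newton's method reaches the minimum in a single step, which is exactly the quadratic-convergence behavior discussed in Section 1.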
