Newton Algorithm for Machine Learning (5)


1. Introduction to the Newton iteration algorithm

Let r be a root of the equation f(x) = 0 and take x0 as an initial approximation of r. Draw the tangent line L to the curve y = f(x) at the point (x0, f(x0)); the abscissa of the intersection of L with the x-axis,

x1 = x0 - f(x0)/f'(x0),

is the first approximation of r. Drawing the tangent at (x1, f(x1)) and taking its intersection with the x-axis gives the second approximation x2. Repeating this process produces a sequence of approximations of r:

x_{n+1} = x_n - f(x_n)/f'(x_n)

This is the Newton iteration formula. Solving a nonlinear equation with Newton's method is an approximate method that linearizes the equation: expand f in a Taylor series around the current point, keep only the linear part (the first two terms of the expansion), and set it equal to zero as an approximate equation for the original nonlinear one. Solving this linear approximation for the next iterate yields exactly the iteration relation above.

When the Newton method is applied to machine learning:

To maximize a function ℓ(θ), apply Newton's method to its derivative, i.e. find a zero of f(θ) = ℓ'(θ), which gives the update θ := θ - ℓ'(θ)/ℓ''(θ).

1. Using this method requires f to satisfy certain conditions; it is suitable for logistic regression and generalized linear models.

2. Generally, the parameters are initialized to 0.
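The scalar Newton iteration x_{n+1} = x_n - f(x_n)/f'(x_n) from the introduction can be sketched as follows; the example function and starting guess are hypothetical, chosen only for illustration:

```python
# Minimal sketch of the scalar Newton iteration: x_{n+1} = x_n - f(x_n)/f'(x_n).
# The example function and starting guess are hypothetical.

def newton(f, f_prime, x0, tol=1e-10, max_iter=50):
    """Return an approximate root of f near the initial guess x0."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / f_prime(x)  # tangent-line correction
        x -= step
        if abs(step) < tol:       # stop once the updates become negligible
            break
    return x

# Example: the real root of f(x) = x^3 - x - 2 lies near x = 1.52
root = newton(lambda x: x**3 - x - 2, lambda x: 3 * x**2 - 1, x0=1.5)
```

Note that the iteration needs both f and its derivative f'; this is the price paid for the fast convergence discussed below.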

2. Application to logistic regression

In logistic regression, we want to maximize the log likelihood ℓ(θ). Based on the derivation above, the update rule is:

θ := θ - ℓ'(θ)/ℓ''(θ)
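As a hedged sketch of this update rule θ := θ - ℓ'(θ)/ℓ''(θ), the loop below fits a one-parameter logistic regression (no intercept); the data, iteration count, and variable names are all made up for illustration:

```python
# Hedged sketch: Newton's update theta := theta - l'(theta)/l''(theta) for a
# one-parameter logistic regression without intercept. Data is hypothetical.
import math

xs = [1.0, 2.0, -1.0, -2.0, 0.5]   # features (made up)
ys = [1, 1, 0, 1, 0]               # binary labels (made up)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

theta = 0.0  # common choice: initialize at 0
for _ in range(10):
    # l'(theta)  = sum_i (y_i - h_i) * x_i, where h_i = sigmoid(theta * x_i)
    grad = sum((y - sigmoid(theta * x)) * x for x, y in zip(xs, ys))
    # l''(theta) = -sum_i h_i * (1 - h_i) * x_i^2  (negative: l is concave)
    hess = -sum(sigmoid(theta * x) * (1 - sigmoid(theta * x)) * x * x for x in xs)
    theta -= grad / hess           # Newton step toward l'(theta) = 0
```

Because ℓ'' is negative everywhere, each step moves θ uphill on the concave log likelihood until the gradient vanishes.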

Convergence speed of the Newton method: quadratic convergence.

Each iteration roughly doubles the number of significant digits in the solution. Suppose the current error is 0.01: after one iteration the error is about 0.0001, and after another it is about 0.00000001. This property holds only once the iterate is close enough to the optimum.
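The doubling of significant digits can be observed numerically. The sketch below (illustrative, not from the original text) applies Newton's method to f(x) = x^2 - 2, whose root is sqrt(2), and records the error after each step:

```python
# Quadratic convergence demo: Newton's method for f(x) = x^2 - 2.
# The number of correct digits roughly doubles on every iteration.
import math

x = 1.5  # starting guess near sqrt(2) ~ 1.41421356
errors = []
for _ in range(4):
    x = x - (x * x - 2) / (2 * x)      # Newton step: x - f(x)/f'(x)
    errors.append(abs(x - math.sqrt(2)))
```

Running this gives errors on the order of 1e-3, 1e-6, 1e-12, and then machine precision: each error is roughly the square of the previous one.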

3. Generalization of Newton's method

When θ is a vector rather than a scalar, the general update formula is:

θ := θ - H^(-1) ∇_θ ℓ(θ)

Here ∇_θ ℓ(θ) is the gradient of the objective function, and H is the Hessian matrix, of size n x n, where n is the number of features. Each element of H is a second partial derivative:

H_ij = ∂²ℓ(θ) / (∂θ_i ∂θ_j)

The meaning of the formula above is that the first-derivative vector (the gradient) is multiplied by the inverse of the second-derivative matrix (the Hessian) to obtain the update step.

Advantage: when the number of features and the number of samples are moderate, Newton's method needs far fewer iterations to converge than gradient descent.

Disadvantage: the Hessian matrix must be recomputed and inverted (or its linear system solved) at each iteration. With many features, forming and inverting the n x n matrix H is expensive, roughly O(n^3) per inversion.
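A hedged NumPy sketch of the vector update θ := θ - H^(-1) ∇_θ ℓ(θ) for logistic regression follows; the synthetic data, random seed, parameter values, and iteration count are all assumptions made for illustration:

```python
# Hedged sketch of the vector Newton update for logistic regression:
# theta := theta - H^{-1} grad. The synthetic data below is hypothetical.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 samples, 3 features
true_theta = np.array([1.0, -1.0, 0.5])        # made-up ground truth
y = (rng.random(100) < 1 / (1 + np.exp(-X @ true_theta))).astype(float)

theta = np.zeros(3)                            # initialize at 0
for _ in range(10):
    h = 1 / (1 + np.exp(-X @ theta))           # predicted probabilities
    grad = X.T @ (y - h)                       # gradient of the log likelihood
    H = -(X * (h * (1 - h))[:, None]).T @ X    # n x n Hessian (here 3 x 3)
    theta = theta - np.linalg.solve(H, grad)   # Newton step
```

Using np.linalg.solve instead of np.linalg.inv avoids forming H^(-1) explicitly, which is both cheaper and numerically safer, though the cost per step is still cubic in the number of features.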

