Machine Learning (5): Newton's Algorithm

1. Introduction to the Newton Iteration Method

Let r be a root of f(x) = 0 and take x0 as an initial approximation of r. Draw the tangent line L to the curve y = f(x) at the point (x0, f(x0)). The abscissa of the intersection of L with the x-axis, x1 = x0 - f(x0)/f'(x0), is the first approximation of r. Drawing the tangent at (x1, f(x1)) and taking the abscissa of its intersection with the x-axis gives the second approximation x2 = x1 - f(x1)/f'(x1). Repeating this process produces a sequence of approximations of r, in which x_{n+1} = x_n - f(x_n)/f'(x_n) is called the (n+1)-th approximation of r. This formula is the Newton iteration formula.

Using the Newton iteration method to solve a nonlinear equation is an approximate method that linearizes the nonlinear equation. Expand f in a Taylor series in a neighborhood of a point x0 and keep only the linear part (the first two terms of the expansion): f(x) ≈ f(x0) + f'(x0)(x - x0). Setting this linear part equal to 0 and using it as the approximate equation of the nonlinear one gives x = x0 - f(x0)/f'(x0); iterating this relation yields the Newton iteration formula above.
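The iteration above can be sketched in a few lines of Python (a minimal sketch; the function names, tolerance, and iteration cap are illustrative choices, not part of the original text):

```python
def newton(f, f_prime, x0, tol=1e-10, max_iter=50):
    """Newton iteration x_{n+1} = x_n - f(x_n)/f'(x_n) for a root of f."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / f_prime(x)
        x = x - step
        if abs(step) < tol:  # stop once the update is negligible
            break
    return x

# Example: the positive root of f(x) = x^2 - 2 is sqrt(2).
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
```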
The Newton method applied to machine learning:

1. Using this method requires f to satisfy certain conditions (roughly, twice differentiable with a well-behaved second derivative); it is well suited to logistic regression and generalized linear models.

2. The parameters are generally initialized to 0.
2. Application to Logistic Regression
In logistic regression, we need to maximize the log-likelihood ℓ(θ). Maximizing ℓ means finding a zero of its derivative ℓ'(θ), so applying the Newton iteration to ℓ' gives the update rule:

θ := θ - ℓ'(θ) / ℓ''(θ)
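As a sketch of how this update looks for a full parameter vector in logistic regression (using the standard gradient X^T(y - h) and Hessian -X^T diag(h(1-h)) X of the logistic log-likelihood; the synthetic data, seed, and iteration count are illustrative assumptions):

```python
import numpy as np

def logistic_newton(X, y, n_iter=10):
    """Fit logistic regression by Newton's method.

    For the log-likelihood l(theta): grad = X^T (y - h) and
    H = -X^T diag(h*(1-h)) X, so theta := theta - H^{-1} grad.
    """
    theta = np.zeros(X.shape[1])                 # initialize at 0, as noted above
    for _ in range(n_iter):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))   # sigmoid predictions
        grad = X.T @ (y - h)                     # gradient of log-likelihood
        H = -(X.T * (h * (1.0 - h))) @ X         # Hessian (negative definite)
        theta = theta - np.linalg.solve(H, grad)
    return theta

# Synthetic data: an intercept column plus one feature.
rng = np.random.default_rng(0)
X = np.c_[np.ones(200), rng.normal(size=200)]
p = 1.0 / (1.0 + np.exp(-(X @ np.array([0.5, 2.0]))))
y = (rng.random(200) < p).astype(float)
theta = logistic_newton(X, y)
```

Solving the linear system H·step = grad is preferred over forming H^{-1} explicitly; the result is the same update.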
Convergence speed of the Newton method: quadratic convergence.

Each iteration roughly doubles the number of significant digits of the solution. Suppose the current error is 0.01; after one iteration the error is on the order of 0.0001, and after another iteration on the order of 0.00000001. This property holds only once the iterate is already close enough to the optimum.
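The digit-doubling behavior is easy to observe numerically (a small demonstration on the square-root example; the starting point 1.5 is an arbitrary choice):

```python
import math

# Newton iteration for the root of x^2 - 2: near the root, the error
# roughly squares at every step, doubling the number of correct digits.
x = 1.5
errors = []
for _ in range(3):
    x = x - (x * x - 2) / (2 * x)
    errors.append(abs(x - math.sqrt(2)))
# errors shrink roughly like e -> e^2 at each step
```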
3. Generalization of Newton's Method

In machine learning, θ is a vector rather than a scalar. The general update formula is:

θ := θ - H^{-1} ∇_θ ℓ(θ)

Here ∇_θ ℓ(θ) is the gradient of the objective function and H is the Hessian matrix, of size n × n, where n is the number of features. Each element of H is a second partial derivative:

H_ij = ∂²ℓ(θ) / (∂θ_i ∂θ_j)

The meaning of the formula is that the vector of first derivatives is multiplied by the inverse of the matrix of second derivatives.
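A one-step sketch of this generalized update (the quadratic objective, matrix A, and vector b are illustrative; for a quadratic the Hessian is constant, so a single Newton step reaches the exact minimizer):

```python
import numpy as np

def newton_step(theta, grad, hess):
    """One generalized Newton update: theta := theta - H^{-1} * gradient."""
    return theta - np.linalg.solve(hess(theta), grad(theta))

# Example: for f(theta) = 0.5 theta^T A theta - b^T theta, the gradient is
# A theta - b and the Hessian is A, so one Newton step from any starting
# point lands exactly on the minimizer A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
theta = newton_step(np.zeros(2), grad=lambda t: A @ t - b, hess=lambda t: A)
```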
Advantage: when the numbers of features and samples are moderate, the Newton method needs far fewer iterations than gradient descent.

Disadvantage: the Hessian matrix must be recomputed (and inverted) at each iteration. When there are many features, computing the n × n matrix H is expensive.