Newton's Method, the Exponential Family, and Generalized Linear Models: Stanford ML Open Course, Note 4

Last Update:2015-10-10 Source: Internet

Author: User

Personal Summary:

1. This note consists mainly of proofs, so it contains a relatively large number of formulas. The original note's author omitted some steps, and the note does not connect smoothly with the previous one, so beginners may not follow it easily. I recommend reading it alongside the original Stanford machine learning handout (in English; I could not find a complete Chinese translation). If the derivations are confusing, you probably need to review some basic math first.

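2. Following the gradient descent method covered in the previous note, this note introduces a faster iterative method called Newton's method. Formula (1) in the original note is easy to follow, but how does it suddenly turn into formula (2)? Some readers may be confused here. The original author's notation is in fact misleading: formula (2) should be written in terms of L(θ) rather than f(θ). And what is L(θ)? It is the likelihood function from the previous note (shown here for logistic regression, as in the Stanford handout):

$$L(\theta) = \prod_{i=1}^{m} \left(h_\theta(x^{(i)})\right)^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}}$$

Written in logarithmic form:

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right)$$

After taking the derivative:

$$\frac{\partial \ell(\theta)}{\partial \theta_j} = \sum_{i=1}^{m} \left(y^{(i)} - h_\theta(x^{(i)})\right) x_j^{(i)}$$

So the f(θ) in formula (2) is actually ℓ'(θ), the derivative above, and f'(θ) is the derivative of that expression, ℓ''(θ). The Newton update for maximizing ℓ is therefore θ := θ − ℓ'(θ) / ℓ''(θ).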
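The update θ := θ − ℓ'(θ) / ℓ''(θ) can be tried out on a one-parameter logistic regression. The following is only a minimal sketch: the function name `newton_maximize`, the toy data, and the stopping tolerance are illustrative assumptions, not taken from the original note.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def newton_maximize(xs, ys, theta=0.0, tol=1e-10, max_iter=50):
    """Maximize the 1-D logistic log-likelihood l(theta) with Newton's method."""
    for i in range(max_iter):
        hs = [sigmoid(theta * x) for x in xs]
        # First derivative l'(theta) = sum (y - h) * x
        grad = sum((y - h) * x for x, y, h in zip(xs, ys, hs))
        # Second derivative l''(theta) = -sum h * (1 - h) * x^2
        hess = -sum(h * (1.0 - h) * x * x for x, h in zip(xs, hs))
        step = grad / hess
        theta -= step  # Newton update: theta := theta - l'(theta) / l''(theta)
        if abs(step) < tol:
            break
    return theta, i + 1

# Hypothetical toy data (no intercept); the labels overlap, so the maximum is finite.
xs = [-2.0, -1.0, 0.5, 1.0, 2.0, 3.0]
ys = [0, 1, 0, 1, 1, 1]
theta_hat, iterations = newton_maximize(xs, ys)
print(theta_hat, iterations)
```

At the returned `theta_hat` the derivative ℓ'(θ) is numerically zero, which is exactly the condition Newton's method is solving for.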

3. The original note says that Newton's method iterates quickly and converges quadratically. Many readers will ask what quadratic convergence means and why it is fast. Put simply, Newton's method takes the second derivative (the gradient of the gradient) into account, so on a quadratic function it reaches the optimum in a single step. In effect it fits a quadratic surface to the surface at the current position, whereas gradient descent fits a plane. If this is unclear, consider the comparison diagram on Wikipedia:

In that diagram the red line is Newton's method and the green line is gradient descent. Intuitively, gradient descent is a greedy algorithm: it looks only one step ahead and always moves in the direction where the current gradient is steepest. Newton's method also considers how the gradient itself changes, so it has a more global view: it accounts for whether the gradient will become larger after the step. Its path is therefore closer to the truly optimal descent strategy. For the underlying theory, mathematics, and convex optimization, see "Why does Newton's method need fewer iterations than gradient descent when solving optimization problems?" and "Gradient, Newton, and quasi-Newton optimization algorithms and their implementation".
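The one-step behavior on a quadratic function can be checked directly. The sketch below compares the two methods on f(x, y) = x² + 10y²; the function, the starting point, and the learning rate 0.09 are arbitrary choices made for illustration.

```python
import math

def grad(p):
    """Gradient of f(x, y) = x^2 + 10 * y^2."""
    x, y = p
    return (2.0 * x, 20.0 * y)

def gradient_descent(p, lr=0.09, tol=1e-6, max_iter=10000):
    steps = 0
    while math.hypot(*grad(p)) > tol and steps < max_iter:
        gx, gy = grad(p)
        # Move against the gradient: a local, first-order (plane) approximation.
        p = (p[0] - lr * gx, p[1] - lr * gy)
        steps += 1
    return p, steps

def newton(p, tol=1e-6, max_iter=100):
    steps = 0
    while math.hypot(*grad(p)) > tol and steps < max_iter:
        gx, gy = grad(p)
        # The Hessian of f is diag(2, 20), so H^-1 * g = (gx / 2, gy / 20).
        p = (p[0] - gx / 2.0, p[1] - gy / 20.0)
        steps += 1
    return p, steps

start = (10.0, 1.0)
gd_point, gd_steps = gradient_descent(start)
nt_point, nt_steps = newton(start)
print("gradient descent steps:", gd_steps)
print("Newton steps:", nt_steps)
```

Because the quadratic model Newton's method fits is exact here, it lands on the minimum (0, 0) after one step, while gradient descent needs many small steps along the shallow x direction.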

4. The previous note asked why the final update rules of logistic regression and least squares have such a similar form. This note shows that the distributions behind both belong to the exponential family, which in turn leads to the generalized linear model. This part is heavy on mathematical derivation. If your math background is weak, read the original English handout, and if it still does not make sense, just remember the conclusion.
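As a compact reminder of why the two update rules match, the block below writes the Bernoulli distribution (logistic regression) and the Gaussian with unit variance (least squares) in exponential family form. The notation follows the Stanford handout; fixing σ² = 1 for the Gaussian is a simplifying assumption.

```latex
% Exponential family: natural parameter \eta, sufficient statistic T(y),
% log-partition function a(\eta), base measure b(y)
p(y;\eta) = b(y)\,\exp\!\big(\eta^{T} T(y) - a(\eta)\big)

% Bernoulli(\phi), used by logistic regression
p(y;\phi) = \phi^{y}(1-\phi)^{1-y}
          = \exp\!\Big(y\,\log\tfrac{\phi}{1-\phi} + \log(1-\phi)\Big)
\;\Rightarrow\;
\eta = \log\tfrac{\phi}{1-\phi},\quad T(y) = y,\quad
a(\eta) = \log\!\big(1+e^{\eta}\big),\quad b(y) = 1

% Gaussian N(\mu, 1), used by least squares
p(y;\mu) = \tfrac{1}{\sqrt{2\pi}}\exp\!\big(-\tfrac{1}{2}y^{2}\big)\,
           \exp\!\big(\mu y - \tfrac{1}{2}\mu^{2}\big)
\;\Rightarrow\;
\eta = \mu,\quad T(y) = y,\quad
a(\eta) = \tfrac{1}{2}\eta^{2},\quad
b(y) = \tfrac{1}{\sqrt{2\pi}}\,e^{-y^{2}/2}

% With \eta = \theta^{T}x and h_\theta(x) = E[y \mid x;\theta],
% both models share the same stochastic update rule
\theta_j := \theta_j + \alpha\,\big(y^{(i)} - h_\theta(x^{(i)})\big)\,x_j^{(i)}
```

Inverting η = log(φ / (1 − φ)) gives φ = 1 / (1 + e^{−η}), which is where the sigmoid in logistic regression comes from.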

5. As mentioned above, Newton's method requires the Hessian matrix H, which is n × n (in practice (n+1) × (n+1) once the intercept term x0 is included, where n is the number of features), so n cannot be too large. Newton's method can be combined with stochastic gradient descent: first use stochastic gradient descent to reach a point near the optimum, then switch to Newton's method, which tends to work better.
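The combination described in this point might look like the sketch below: a few epochs of stochastic gradient ascent on the logistic log-likelihood, followed by Newton steps that use the full (n+1) × (n+1) Hessian. The toy data, the hyperparameters, and all function names are hypothetical. The step-halving safeguard is an addition for robustness that the original note does not mention.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def log_likelihood(theta, X, y):
    # l(theta) = sum_i [y_i * z_i - log(1 + exp(z_i))], with z_i = theta . x_i
    total = 0.0
    for xi, yi in zip(X, y):
        z = dot(theta, xi)
        total += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total

def gradient(theta, X, y):
    g = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        err = yi - sigmoid(dot(theta, xi))
        for j, v in enumerate(xi):
            g[j] += err * v
    return g

def hessian(theta, X):
    # H[j][k] = -sum_i h_i * (1 - h_i) * x_ij * x_ik, an (n+1) x (n+1) matrix.
    d = len(theta)
    H = [[0.0] * d for _ in range(d)]
    for xi in X:
        h = sigmoid(dot(theta, xi))
        for j in range(d):
            for k in range(d):
                H[j][k] -= h * (1.0 - h) * xi[j] * xi[k]
    return H

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[r][:] + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def sgd_then_newton(X, y, epochs=20, lr=0.05, newton_steps=25, tol=1e-8, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * len(X[0])
    order = list(range(len(X)))
    # Phase 1: stochastic gradient ascent to get near the optimum cheaply.
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = y[i] - sigmoid(dot(theta, X[i]))
            theta = [t + lr * err * v for t, v in zip(theta, X[i])]
    # Phase 2: Newton steps theta := theta - H^-1 * gradient.
    for _ in range(newton_steps):
        g = gradient(theta, X, y)
        if max(abs(v) for v in g) < tol:
            break
        step = solve(hessian(theta, X), g)
        scale = 1.0
        while scale > 1e-8:  # safeguard: halve the step if l(theta) would drop
            candidate = [t - scale * s for t, s in zip(theta, step)]
            if log_likelihood(candidate, X, y) >= log_likelihood(theta, X, y):
                break
            scale *= 0.5
        theta = candidate
    return theta

# Hypothetical data: x0 = 1 is the intercept, n = 2 features, so H is 3 x 3.
X = [[1.0, 0.0, 0.0], [1.0, 4.0, 0.0], [1.0, 0.0, 4.0],
     [1.0, 1.0, 1.0], [1.0, 4.0, 3.0], [1.0, 5.0, 5.0]]
y = [0, 0, 0, 1, 1, 1]
theta = sgd_then_newton(X, y)
print(theta)
```

Building the Hessian costs on the order of (n+1)² entries per example and solving the linear system costs on the order of (n+1)³, which is why this approach only suits a moderate number of features.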

6. Multi-class classification is really a generalization of binary classification. In practice, multi-class problems are more often handled with tree models such as regression trees and classification trees.
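One way to see the generalization is softmax, which the Stanford handout derives as the multi-class counterpart of the sigmoid. Treating softmax as the generalization meant here is an assumption, and the scores below are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn k real-valued scores into k class probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# With two classes and the second score fixed at 0, softmax is the sigmoid.
z = 0.7
two_class = softmax([z, 0.0])
print(two_class[0], sigmoid(z))

# With three classes the same formula yields one probability per class.
three_class = softmax([2.0, 1.0, 0.1])
print(three_class)
```

The two-class case shows that binary logistic regression is the special case k = 2 of the same model.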

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
