The difference between least squares and gradient descent in machine learning

Source: Internet
Author: User

http://www.zhihu.com/question/20822481

Zhihu user:

Agreeing with @Zhang Ziquan, let me add a little. Judging from the question, the asker is probably just starting to learn machine learning, which is why this question comes up. As others have pointed out, the two methods are not really comparable. Still, I had a similar question when I was learning; mine was: what is the difference between the matrix (closed-form) solution of least squares and the gradient descent method? I will assume that is what is being asked and answer accordingly. If I have misunderstood, just ignore what follows.

In fact, the two differ a great deal in the amount of computation they require, so for a given problem you can choose between them according to the problem's nature.
Specifically, the matrix formula for least squares is $x = (A^\top A)^{-1} A^\top b$, where $A$ is a matrix and $b$ is a vector. If you have $m$ discrete data points $(x_i, y_i)$ and the equation you want to fit is linear in its unknown coefficients, then $A$ is the matrix whose $i$-th row holds the basis terms evaluated at $x_i$ (for a straight-line fit, $(1, x_i)$), and $b$ is the vector $(y_1, \dots, y_m)^\top$. It is also well known that computing the inverse of a matrix is quite time-consuming and numerically unstable (for example, the Hilbert matrix is practically impossible to invert), so this way of computing the solution is sometimes not advisable.
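As an illustration (mine, not part of the original answer), here is a minimal NumPy sketch of the closed-form solution; the variable names and the synthetic data are made up for the example:

```python
import numpy as np

# Synthetic data: m noisy points roughly on y = 1 + 2x (made-up example).
rng = np.random.default_rng(0)
m = 100
x = rng.uniform(-1, 1, size=m)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(m)

# Design matrix A: the i-th row holds the basis terms (1, x_i).
A = np.column_stack([np.ones(m), x])
b = y

# Closed-form least squares via the normal equations x = (A^T A)^{-1} A^T b.
# Forming the explicit inverse is exactly the step the answer warns about;
# solving the linear system (or using np.linalg.lstsq) is more stable.
coef_inverse = np.linalg.inv(A.T @ A) @ A.T @ b      # what the formula says literally
coef_solve   = np.linalg.solve(A.T @ A, A.T @ b)     # same result, no explicit inverse
coef_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # SVD-based, most robust

print(coef_inverse, coef_solve, coef_lstsq)          # all close to [1, 2]
```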
By contrast, the gradient descent method also has its drawbacks: the number of iterations may be rather high, but each iteration is not particularly expensive. Moreover, on the least-squares problem its convergence is guaranteed. So when the data are large, it is gradient descent (or, really, some of the better iterative methods) that deserves to be used.
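For comparison, a minimal sketch of my own (not from the answer) of batch gradient descent on the same squared-error objective; the step size and iteration count are arbitrary choices for the example:

```python
import numpy as np

def gradient_descent_lstsq(A, b, lr=0.1, n_iters=2000):
    """Minimize (1/(2m)) * ||A w - b||^2 by batch gradient descent."""
    m, n = A.shape
    w = np.zeros(n)                      # initial point (gradient descent is sensitive to this)
    for _ in range(n_iters):
        residual = A @ w - b             # current prediction errors
        grad = A.T @ residual / m        # gradient of the mean squared error
        w -= lr * grad                   # step in the steepest-descent direction
    return w

# Small made-up example: fit y ~ w0 + w1 * x.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 50)
A = np.column_stack([np.ones_like(x), x])
b = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(50)
print(gradient_descent_lstsq(A, b))      # approaches the closed-form solution
```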

Of course, gradient descent has other uses as well, for example other extremum problems. In addition, Newton's method is also a good method: it converges in fewer iterations than gradient descent, but each iteration costs more. If the asker is interested, the relevant material is worth looking up.
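A rough sketch of that trade-off (my illustration, not part of the answer): each Newton step must build and solve a system with the Hessian, which costs more than a gradient step; for the quadratic least-squares objective the Hessian is constant and a single step already reaches the minimizer.

```python
import numpy as np

def newton_lstsq(A, b, n_iters=5):
    """Newton's method on f(w) = (1/(2m)) * ||A w - b||^2.

    Each step solves a linear system with the Hessian (A^T A / m),
    which is the expensive part; for this quadratic objective one
    step already lands on the minimum.
    """
    m, n = A.shape
    w = np.zeros(n)
    H = A.T @ A / m                      # Hessian (constant for a quadratic)
    for _ in range(n_iters):
        grad = A.T @ (A @ w - b) / m     # gradient at the current point
        w -= np.linalg.solve(H, grad)    # Newton step: w <- w - H^{-1} grad
    return w

# newton_lstsq(A, b) gives the same coefficients as the closed-form solution.
```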
Zhang Ziquan (Ph.D. candidate in mathematical finance):

Same
1. Essentially the same: both methods, given known data (independent and dependent variables), compute a general estimation function for the dependent variable, and then use it to estimate the dependent variable for new data.
2. The same goal: both try, within the framework of the known data, to make the total squared difference between the estimated values and the actual values as small as possible (in fact, it does not have to be the square). The total squared difference between the estimated and actual values is

$$\Delta = \sum_{i=1}^{n} \big(f_\beta(\bar{x}_i) - y_i\big)^2,$$

where $\bar{x}_i$ is the independent variable of the $i$-th data group, $y_i$ is the dependent variable of the $i$-th data group, and $\beta$ is the coefficient vector.

Different
1. The implementation and the results are different: least squares finds the global minimum directly by taking the derivative and setting it to zero; it is a non-iterative method. Gradient descent is an iterative method: it first chooses an initial point, then keeps adjusting it in the direction of steepest descent, and after a number of iterations finds a local minimum. Its disadvantages are that convergence slows down as it approaches the minimum and that it is very sensitive to the choice of the initial point; most improvements address these two aspects (see the sketch below).

Zhihu user:

Least squares grew out of linear regression and belongs to mathematical statistics. In regression the number of samples n is much larger than the number of variables m, and least squares is meant to solve for m unknowns from n equations. Finding the extremum is not the focus of least squares; the emphasis is on balancing the n samples to obtain m equations and then solving for the m unknown parameters. As for the extremum, the preconditions of linear regression guarantee that there is only one extremum point, namely the global minimum.
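To make the contrast concrete, here is a short sketch of my own (not part of either answer), writing the linear model in matrix form as $A\beta \approx y$ with the coefficient vector $\beta$ from above; it also shows why linear least squares has a single extremum: the objective is a convex quadratic.

```latex
% Least squares: set the gradient of the squared error to zero and solve once.
\[
  \nabla_\beta \lVert A\beta - y\rVert^2 = 2A^\top (A\beta - y) = 0
  \;\Longrightarrow\; A^\top A\,\beta = A^\top y
  \;\Longrightarrow\; \beta^{*} = (A^\top A)^{-1} A^\top y
  \quad\text{(the unique global minimum)}.
\]
% Gradient descent: pick an initial \beta_0 and iterate with step size \alpha:
\[
  \beta_{k+1} = \beta_k - \alpha\, A^\top (A\beta_k - y).
\]
```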

The gradient ascent (descent) method is a simple method for finding extrema; it is used for all sorts of extremum problems and belongs to optimization algorithms. The specifics have already been explained clearly by @Zhang Ziquan.

Jiang Lei (studying computational mathematics):

The steepest descent method is one of the optimization methods for finding extrema. Besides it there are the conjugate gradient method, Newton's method, quasi-Newton methods (which avoid the cost of inverting the Hessian matrix), and so on.
Of course, many systems of linear equations can also be treated as optimization problems, because you only need to optimize the so-called residual, $\lVert Ax - b\rVert$.
The least squares method is essentially an optimization problem, or, as mentioned above, essentially the solution of a system of linear equations. Of course, this system is likely to be overdetermined; simply put, the number of equations is much larger than the number of unknowns. Such a system has no solution in the linear-algebra sense, but we can solve it in a broader sense, for example by minimizing the residual $\lVert Ax - b\rVert$, which is exactly least squares.
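A small sketch of the overdetermined case described here (my example; the numbers are made up): far more equations than unknowns, so there is no exact solution, but there is a unique solution minimizing $\lVert Ax - b\rVert$.

```python
import numpy as np

# Overdetermined system: 200 equations, 2 unknowns, so Ax = b has no exact solution.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 2))
b = A @ np.array([3.0, -1.0]) + 0.05 * rng.standard_normal(200)  # right-hand side with noise

# Minimize the residual ||Ax - b|| in the least-squares sense.
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)                                  # close to [3, -1]
print(np.linalg.norm(A @ x - b))          # small, but not zero: the system is inconsistent
```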
As for the numerical stability mentioned in an earlier answer, and convergence, those are separate issues. I am not sure whether this is clear; discussion is welcome.

Zhihu user (a follower of the Church of Emacs):

The goal of the least squares method is to minimize the sum of squared errors, and it comes in two kinds, linear and nonlinear. Linear least squares has a closed-form solution; nonlinear least squares does not, and is usually solved by iterative methods.
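For the nonlinear case there is no closed form; below is a sketch of my own using SciPy's iterative solver scipy.optimize.least_squares to fit a made-up exponential model $y \approx a\,e^{bt}$ (the model, data, and starting point are all assumptions for the example):

```python
import numpy as np
from scipy.optimize import least_squares

# Made-up data from a nonlinear model y = a * exp(b * t) with noise.
rng = np.random.default_rng(2)
t = np.linspace(0, 2, 50)
y = 1.5 * np.exp(0.8 * t) + 0.05 * rng.standard_normal(t.size)

def residuals(params):
    """Vector of errors between the model and the data."""
    a, b = params
    return a * np.exp(b * t) - y

# No closed-form solution here; the solver iterates from an initial guess.
result = least_squares(residuals, x0=[1.0, 1.0])
print(result.x)     # should be close to (1.5, 0.8)
```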

An iterative method updates an approximation to the unknown solution at every step. It can be used for all kinds of problems (including least squares), for example to minimize not the sum of squared errors but the sum of cubed errors.
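As a toy illustration of that flexibility (my sketch; I read "cubed errors" as the sum of absolute errors cubed), the same iterative machinery can minimize a non-squared error criterion:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up linear data, as in the earlier sketches.
rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 50)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(50)
A = np.column_stack([np.ones_like(x), x])

def cubed_error(w):
    """Sum of |error|^3 instead of the usual sum of squared errors."""
    return np.sum(np.abs(A @ w - y) ** 3)

# An iterative solver does not care which power the error is raised to.
result = minimize(cubed_error, x0=np.zeros(2))
print(result.x)      # roughly [1, 2], similar to the least-squares fit
```

The point is only that the iterative machinery is indifferent to the particular error criterion; least squares is the special case with the squared error.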
