What is the difference between the least squares method and the gradient descent method?

Source: Internet
Author: User

Original: http://www.zhihu.com/question/20822481

Similarities
1. Essentially the same: both methods start from given known data (independent and dependent variables) and compute a general estimation function for the dependent variable, which is then used to estimate the dependent variable for new data.
2. The same goal: both work within the framework of the known data to make the total squared difference between the estimated values and the actual values as small as possible (in fact, it does not have to be the square). The formula for the total squared difference between the estimated and actual values is

$\Delta = \sum_{i=1}^{m} \left( f_\beta(x_i) - y_i \right)^2$

where $x_i$ is the independent variable of the $i$-th data point, $y_i$ is the dependent variable of the $i$-th data point, and $\beta$ is the coefficient vector.
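As a concrete illustration (not part of the original answer), here is a minimal NumPy sketch of this objective for a linear model $f_\beta(x) = X\beta$; the data and parameter values are made up:

```python
import numpy as np

def total_squared_error(X, y, theta):
    """Sum of squared differences between estimates X @ theta and actual values y."""
    residuals = X @ theta - y
    return float(residuals @ residuals)

# Illustrative data: 4 points on the line y = 1 + 2x; first column is the intercept.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

print(total_squared_error(X, y, np.array([1.0, 2.0])))  # 0.0 for a perfect fit
print(total_squared_error(X, y, np.array([0.0, 2.0])))  # 4.0 when the intercept is off
```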

Differences
1. The implementation and the results differ: least squares finds the global minimum directly by setting the derivative to zero; it is a non-iterative method. Gradient descent is an iterative method: it first picks an initial point, then repeatedly adjusts it in the direction of steepest descent, reaching a local minimum after a number of iterations. The drawbacks of gradient descent are that convergence slows down near the minimum and that it is very sensitive to the choice of initial point; most improvements target these two aspects.
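To make the contrast concrete, here is a minimal sketch (data, step size, and iteration count are all illustrative) that solves the same linear least squares problem both ways:

```python
import numpy as np

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Least squares: one direct solve of the normal equations, no iteration.
theta_direct = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent: pick a starting point, step in the steepest-descent direction.
theta_gd = np.zeros(2)
step = 0.05                       # must be small enough for convergence
for _ in range(2000):
    gradient = 2 * X.T @ (X @ theta_gd - y)
    theta_gd -= step * gradient

print(theta_direct)  # [1. 2.] exactly (up to floating point)
print(theta_gd)      # approaches the same point after enough iterations
```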

I agree with the above and will add a little more. Judging from the question, the asker is probably learning machine learning, which is why this question comes up. As others have pointed out, the two methods are not entirely comparable. But I had a similar question when I was learning: what is the difference between the matrix (closed-form) solution of least squares and the gradient descent method? I suspect this is what the asker means, so here is a short answer. If I have misunderstood, ignore what follows.

In fact, the computational cost of the two methods differs greatly, so for a given problem you can choose between them according to the nature of the problem.
Specifically, the matrix formula for least squares is $x = (A^T A)^{-1} A^T b$, where $A$ is a matrix and $b$ is a vector. If there are $m$ discrete data points and the equation to be fitted is a linear combination of $n$ basis functions, then $A$ is an $m \times n$ matrix whose $i$-th row is built from the $i$-th data point, and $b$ is the vector of the $m$ observed values. It is well known that computing the inverse of a matrix is quite time-consuming, and that inversion is numerically unstable (for example, the Hilbert matrix is practically impossible to invert accurately). So this way of computing the solution is sometimes not advisable.
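In practice the explicit inverse is avoided. Below is a sketch (sizes are illustrative) showing how fast the Hilbert matrix's condition number grows, and a factorization-based solver as the alternative:

```python
import numpy as np
from scipy.linalg import hilbert

# The Hilbert matrix mentioned above: its condition number explodes with size,
# so explicitly inverting it soon becomes meaningless in double precision.
for n in (4, 8, 12):
    print(n, np.linalg.cond(hilbert(n)))  # roughly 1e4, 1e10, 1e16

# For least squares, prefer a factorization-based solver to forming an inverse:
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))             # illustrative data
b = rng.normal(size=100)
x, res, rank, svals = np.linalg.lstsq(A, b, rcond=None)  # SVD-based, no explicit inverse
```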
By contrast, although gradient descent has its drawbacks and the number of iterations may be relatively high, each iteration is computationally cheap. Moreover, on least squares problems its convergence is guaranteed (the objective is convex). So when it comes to big data, it is gradient descent (or, really, one of the better iterative methods) that deserves to be used.
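To give a rough sense of the cost difference (the sizes, step size, and iteration count below are illustrative assumptions): the direct route pays O(m*n^2) to form A^T A plus O(n^3) to solve it, while each gradient descent iteration costs only two matrix-vector products, O(m*n):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100_000, 200                   # illustrative "large" sizes
X = rng.normal(size=(m, n))
y = X @ rng.normal(size=n) + 0.1 * rng.normal(size=m)

# Direct solve: O(m * n^2) to form X.T @ X, then O(n^3) to solve it.
theta_direct = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent: each iteration is two matrix-vector products, O(m * n).
theta_gd = np.zeros(n)
step = 0.5 / m                        # illustrative; must stay below 1/lambda_max
for _ in range(100):
    theta_gd -= step * (2 * X.T @ (X @ theta_gd - y))

print(np.max(np.abs(theta_direct - theta_gd)))  # small after enough iterations
```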

Of course, gradient descent has other uses as well, such as solving other extremum problems. Newton's method is also a good method; it converges in fewer iterations than gradient descent, but each iteration is more expensive. If the asker is interested, relevant references can be consulted.

The goal of the least squares method is to minimize the sum of squared errors, and it comes in two kinds: linear and nonlinear. Linear least squares has a closed-form solution, while nonlinear least squares does not and is usually solved by an iterative method.

An iterative method updates an approximation of the solution at each step. It can be applied to many kinds of problems (including least squares), for example an objective that is not the sum of squared errors but the sum of cubed errors.
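For instance (an illustrative sketch, not from the original answer): minimizing the sum of cubed absolute errors has no closed form, but a general-purpose iterative optimizer handles it directly:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative data, roughly on the line y = 1 + 2x.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Sum of *cubed* absolute errors: not least squares, no closed-form solution.
loss = lambda theta: np.sum(np.abs(X @ theta - y) ** 3)

result = minimize(loss, x0=np.zeros(2))  # BFGS by default, an iterative method
print(result.x)                          # roughly [1, 2]
```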

Gradient descent is one such iterative method and can be used to solve least squares problems (both linear and nonlinear). The Gauss-Newton method is another iterative method, often used for nonlinear least squares (to some extent it can be considered the standard method for nonlinear least squares).
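A bare-bones Gauss-Newton sketch, fitting the illustrative model y ≈ a * exp(b * x) (the model, data, starting point, and iteration count are all assumptions for demonstration):

```python
import numpy as np

# Illustrative data generated from y = 2 * exp(0.5 * x) plus small noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * np.exp(0.5 * x) + np.array([0.05, -0.1, 0.1, -0.05, 0.02])

a, b = 1.5, 0.4                      # starting guess (Gauss-Newton needs a decent one)
for _ in range(20):
    f = a * np.exp(b * x)            # model predictions
    r = y - f                        # residuals
    J = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])  # Jacobian of f
    delta = np.linalg.solve(J.T @ J, J.T @ r)  # solve the linearized least squares
    a, b = a + delta[0], b + delta[1]

print(a, b)  # close to (2.0, 0.5)
```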

There is also an iterative method called Levenberg-Marquardt, used for nonlinear least squares problems, which combines gradient descent and the Gauss-Newton method.
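In code one rarely writes Levenberg-Marquardt by hand; SciPy exposes it through scipy.optimize.least_squares (same illustrative exponential model as above):

```python
import numpy as np
from scipy.optimize import least_squares

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * np.exp(0.5 * x) + np.array([0.05, -0.1, 0.1, -0.05, 0.02])

# Residual function for the model y ≈ a * exp(b * x); theta = (a, b).
def residuals(theta):
    return theta[0] * np.exp(theta[1] * x) - y

result = least_squares(residuals, x0=[1.0, 0.1], method="lm")  # Levenberg-Marquardt
print(result.x)  # close to (2.0, 0.5)
```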

So if we regard least squares as an optimization problem, then gradient descent is one method of solving it; Gauss-Newton and Levenberg-Marquardt are suited to the nonlinear case, while linear least squares can also be solved directly in closed form.

For details, refer to Wikipedia (Least squares, Gradient descent, Gauss-Newton algorithm, Levenberg-Marquardt algorithm).

The gradient descent algorithm is sensitive to local extrema, but the linear regression objective has only a global extremum and no local extrema, so in this case the algorithm always converges.

The stochastic gradient descent algorithm converges faster than the batch gradient descent algorithm, but it oscillates with large amplitude near the minimum, so it may never settle at the true minimum [1].
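A minimal stochastic gradient descent sketch (illustrative data, step size, and decay schedule), updating on one sample at a time; the decaying step size is one common way to damp the oscillation near the minimum:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(1000), rng.uniform(0, 10, size=1000)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.5, size=1000)

theta = np.zeros(2)
step = 0.001
for epoch in range(20):
    for i in rng.permutation(len(y)):              # one random sample per update
        grad_i = 2 * X[i] * (X[i] @ theta - y[i])  # gradient of one squared error
        theta -= step * grad_i
    step *= 0.8                                    # decay damps the oscillation
print(theta)  # near [1, 2], though not exactly the batch minimizer
```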
