On the difference between the least squares method and gradient descent in machine learning, here is my own take:
In fact, the two differ greatly in computational cost, so when facing a given problem you can choose between the two methods according to the nature of the problem.
Specifically, the matrix formula for least squares is $\theta = (A^{\top}A)^{-1}A^{\top}b$, where $A$ is a matrix and $b$ is a vector. If you have $m$ discrete data points $(x_i, y_i)$ and the equation you want to fit is roughly of the form $y = \theta_1 x + \theta_0$, then $A$ is an $m \times 2$ matrix whose $i$-th row is $(x_i, 1)$, and $b$ is the vector $(y_1, \ldots, y_m)^{\top}$. It is also well known that computing the inverse of a matrix is quite time-consuming, and that inversion is numerically unstable; for example, it is practically impossible to invert the Hilbert matrix accurately. So this way of computing the solution is sometimes not advisable.
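As a minimal sketch of this closed-form route (the line model, data, and variable names here are illustrative assumptions, not from the original), in Python with NumPy:

```python
import numpy as np

# Illustrative data: m noisy samples of the line y = 2x + 1.
rng = np.random.default_rng(0)
m = 50
x = np.linspace(0.0, 1.0, m)
y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(m)

# Design matrix A: i-th row is (x_i, 1); b is the vector of y_i.
A = np.column_stack([x, np.ones(m)])
b = y

# Closed-form least squares: theta = (A^T A)^{-1} A^T b.
theta = np.linalg.inv(A.T @ A) @ (A.T @ b)
print(theta)  # approximately (2, 1)

# In practice, prefer a solver that avoids the explicit inverse:
theta_stable, *_ = np.linalg.lstsq(A, b, rcond=None)
print(theta_stable)

# The instability mentioned above: the 12x12 Hilbert matrix is so
# ill-conditioned that computing its inverse is essentially hopeless.
n = 12
H = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1.0)
print(np.linalg.cond(H))  # a condition number on the order of 1e16
```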
In contrast, although gradient descent has drawbacks of its own, and the number of iterations can be fairly high, the cost per iteration is not particularly large. Moreover, on the least-squares objective the convergence of gradient descent is guaranteed, since the cost function is convex. So when it comes to big data, it is gradient descent (or, really, one of the better iterative methods) that deserves to be used; a sketch follows.
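A minimal sketch of that iterative alternative: plain gradient descent on the same least-squares cost (the fixed step size and iteration count are illustrative assumptions, not tuned values):

```python
import numpy as np

def gd_least_squares(A, b, lr=0.5, n_iters=2000):
    """Plain gradient descent on J(theta) = ||A @ theta - b||^2 / (2m).

    A sketch only: a small enough fixed step size guarantees
    convergence here because the cost is convex and quadratic.
    """
    m, n = A.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        grad = A.T @ (A @ theta - b) / m  # gradient of the quadratic cost
        theta -= lr * grad
    return theta

# Same line-fitting setup as the previous sketch:
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(50)
A = np.column_stack([x, np.ones(50)])

print(gd_least_squares(A, y))  # approaches the closed-form (2, 1)
```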
Of course, gradient descent has other uses as well, for example in other extremum problems. In addition, Newton's method is a good option: it converges in fewer iterations than gradient descent, but each iteration is computationally more expensive, since it needs the second derivatives (the Hessian).
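To illustrate that trade-off, here is a sketch of undamped Newton's method applied to the classic Rosenbrock test function (the function choice and all names are assumptions for illustration; practical implementations usually add damping or a line search for robustness far from the optimum):

```python
import numpy as np

def newton_minimize(grad, hess, x0, n_iters=50, tol=1e-10):
    """Undamped Newton's method for an extremum: x <- x - H(x)^{-1} g(x).

    Near the optimum it converges quadratically (far fewer iterations
    than gradient descent), but each step requires the Hessian and a
    linear solve, which is the higher per-iteration cost noted above.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        step = np.linalg.solve(hess(x), grad(x))
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

# Classic Rosenbrock function f(x, y) = (1-x)^2 + 100*(y-x^2)^2,
# whose minimum is at (1, 1); used here purely as an illustration.
def rosen_grad(v):
    x, y = v
    return np.array([-2 * (1 - x) - 400 * x * (y - x**2),
                     200 * (y - x**2)])

def rosen_hess(v):
    x, y = v
    return np.array([[2 - 400 * (y - 3 * x**2), -400 * x],
                     [-400 * x, 200.0]])

print(newton_minimize(rosen_grad, rosen_hess, x0=[-1.2, 1.0]))  # ~ (1, 1)
```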