Machine Learning Open Course Notes: Week 9, Gradient Descent Algorithms for Big Data


One, Stochastic gradient descent (Stochastic Gradient Descent)

When the training set is large, ordinary batch gradient descent (Batch Gradient Descent) is slow, because every update of \(\theta\) requires computing the derivative term over all m examples in the training set.

Batch gradient descent computes the derivatives over all m examples at once and then updates \(\theta\) once; every derivative within a single update uses the same \(\theta\), and the method converges to the global minimum (for a convex cost function).

Stochastic gradient descent computes the derivative for the m examples one at a time, updating \(\theta\) m times per pass; each update uses only the most recent single example, and the method ends up at a point very close to (though generally not exactly at) the global minimum.

In practice, 1-10 passes over the data are usually enough.
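The contrast above can be sketched in code. This is a minimal illustration for linear regression, not the course's reference implementation; the function names and default hyperparameters are my own choices.

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, epochs=100):
    """One theta update per full pass: the gradient sums over all m examples."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        grad = X.T @ (X @ theta - y) / m   # uses all m examples at once
        theta -= alpha * grad
    return theta

def stochastic_gradient_descent(X, y, alpha=0.01, epochs=5):
    """m theta updates per pass, one per (shuffled) training example."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        for i in np.random.permutation(m):
            grad = (X[i] @ theta - y[i]) * X[i]  # one example only
            theta -= alpha * grad
    return theta
```

Note that the stochastic version touches only one row of `X` per update, so it starts making progress immediately instead of waiting for a full scan of the data.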

Two, Mini-batch gradient descent (Mini-batch Gradient Descent)

A comparison of the three gradient descent methods:

Mini-batch gradient descent updates \(\theta\) using b examples at a time (typically b is between 2 and 100, often 10), performing \(\lceil \frac{m}{b} \rceil\) updates per pass; it sits between stochastic gradient descent and batch gradient descent.

Mini-batch gradient descent can be faster than stochastic gradient descent: it still updates \(\theta\) frequently, but the derivative over a mini-batch can be vectorized (that is, computed as a matrix multiplication), which speeds up the computation.
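A minimal sketch of the mini-batch variant, continuing the linear-regression setting above; the names and defaults are illustrative assumptions, not from the course:

```python
import numpy as np

def minibatch_gradient_descent(X, y, b=10, alpha=0.05, epochs=20):
    """ceil(m / b) theta updates per pass, each vectorized over b examples."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        idx = np.random.permutation(m)
        for start in range(0, m, b):
            batch = idx[start:start + b]
            # vectorized gradient over the mini-batch (one matrix multiply)
            grad = X[batch].T @ (X[batch] @ theta - y[batch]) / len(batch)
            theta -= alpha * grad
    return theta
```

The inner gradient is a single matrix product over b rows, which is where the vectorization speedup mentioned above comes from.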

Three, Verifying convergence of the cost function

Compute \(cost(\theta, (x^{(i)}, y^{(i)}))\) before each update of \(\theta\).

Because stochastic gradient descent updates \(\theta\) on every example, there is no guarantee that \(cost(\theta, (x^{(i)}, y^{(i)}))\) decreases at each step; it only trends downward overall while oscillating. So instead we average \(cost(\theta, (x^{(i)}, y^{(i)}))\) over the last 1000 examples, say, and plot that average.
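The monitoring scheme just described can be sketched as follows; this is my own illustration (the window size of 1000 matches the text, everything else is assumed):

```python
import numpy as np

def sgd_with_monitoring(X, y, alpha=0.01, window=1000):
    """SGD for linear regression that records the cost averaged over each
    consecutive window of examples, measured BEFORE each theta update."""
    m, n = X.shape
    theta = np.zeros(n)
    recent_costs, averages = [], []
    for i in np.random.permutation(m):
        err = X[i] @ theta - y[i]
        recent_costs.append(0.5 * err ** 2)        # cost before updating theta
        theta -= alpha * err * X[i]
        if len(recent_costs) == window:
            averages.append(np.mean(recent_costs))  # one point on the plot
            recent_costs = []
    return theta, averages
```

Plotting `averages` against the window index gives exactly the kind of convergence curve discussed below.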

The two plots in the top row are fairly normal stochastic gradient descent curves. For the lower-left plot, increase the number of samples averaged over (1000 → 5000) and check whether the curve then shows convergence. The lower-right plot is clearly increasing: try a smaller learning rate \(\alpha\) or different features.

We can also shrink the learning rate as the number of iterations grows, so that the cost function converges:

\(\alpha = \frac{const1}{iterationNumber + const2}\)
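As a tiny concrete example of this schedule (the constants 5.0 and 50.0 are arbitrary illustrative choices, not values from the course):

```python
def decaying_alpha(iteration_number, const1=5.0, const2=50.0):
    """alpha = const1 / (iterationNumber + const2).
    const1 and const2 are tunable; these defaults are purely illustrative."""
    return const1 / (iteration_number + const2)
```

Early iterations get a large step size, and the step size decays smoothly toward zero, damping the oscillation around the minimum.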

Four, Online learning

Online learning works without a pre-collected data set: examples arrive in a real-time data stream, and the model updates \(\theta\) from each example as it arrives. The advantages:

1. No need to store a large amount of data locally.

2. \(\theta\) adapts in real time as the characteristics of the data change.

In essence, this is similar to stochastic gradient descent.

Another example of online learning: based on the features of a user's search keywords, learn in real time which results to return, updating \(\theta\) according to which results the user clicks.
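One way to sketch this is an online logistic-regression model predicting whether a result gets clicked; the class, its API, and the hyperparameters here are all my own assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OnlineLogisticRegression:
    """Online learning: update theta from each (x, y) pair as it arrives,
    then discard the example. Each update is one SGD step on the logistic
    loss, matching the 'similar to stochastic gradient descent' remark."""

    def __init__(self, n_features, alpha=0.1):
        self.theta = np.zeros(n_features)
        self.alpha = alpha

    def update(self, x, y):
        # gradient of the logistic loss for a single streamed example
        grad = (sigmoid(x @ self.theta) - y) * x
        self.theta -= self.alpha * grad

    def predict_proba(self, x):
        """Predicted probability that this result will be clicked."""
        return sigmoid(x @ self.theta)
```

Because each example is used once and discarded, storage stays constant no matter how long the stream runs, and \(\theta\) tracks any drift in user behavior.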

