Machine Learning (1): Implementation and Process Analysis of the Gradient Descent Algorithm


Because an algorithm is best understood when applied to a practical problem, let me first describe one that gradient descent helps solve. Given a data set of (area, price) pairs recording the floor area and sale price of a number of houses (Andrew Ng's course uses this example, and that course was my introduction to the subject), my goal is to use a learning algorithm to obtain a function relating house price to floor area, and then, given a new house's area, use this function to predict its price. As shown below:

My approach to the solution is broadly as follows:

1. I found a very small data set with two features, x1 and x2, and one output y.

2. Based on my data, I assume that my prediction function is a linear function h(x):
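The formula image from the original post is missing here; a reconstruction of the linear hypothesis for two features, following the standard convention from Andrew Ng's course, would be:

```latex
h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2
```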

(Why a linear function? On the one hand, the distribution of my data points suggests this model; on the other, I want a function simple enough to make clear what the gradient descent algorithm is actually doing. This is a linear regression problem.)

3. My goal now is to obtain the parameters of the prediction function h(x). A reasonable prediction function should differ from the actual values as little as possible, so I define a cost function J(θ) measuring this difference: the smaller J(θ), the better the prediction. The combination of parameters that minimizes J(θ) gives my prediction function h(x).
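The cost function image is also missing; for linear regression, the standard squared-error cost over m training examples (again following Ng's convention, with the 1/2 factor included to simplify the derivative) is:

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
```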

4. How do we find the minimum of J(θ)? This is where the gradient descent algorithm comes in. For each parameter, set an initial point, then repeatedly update the value according to the following rule:
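The update-rule image is missing; the standard simultaneous update for each parameter θj, consistent with the description that follows, is:

```latex
\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)
```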

As the expression shows, by differentiating J(θ) and starting from the initial point θ, each update moves the parameters in a descending direction. Here α is the learning rate: the derivative selects the descent direction, and the learning rate determines the step size. From the results of the algorithm you will find that different values of α produce different results, and an appropriate step size gives the best one. Substituting the derivative result gives the following:
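The substituted form is missing from the post; carrying out the differentiation of the squared-error cost above gives, for each parameter θj:

```latex
\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```

where x0 is taken to be 1 so that θ0 fits the same pattern.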

So what we do next is iterate, updating every parameter simultaneously, until the algorithm converges to a local optimum (for linear regression the cost function is convex, so this is also the global optimum):

The next step is the implementation of the algorithm: once the parameter values have been learned, the prediction function can make its final predictions.

This is the core piece of my algorithm implementation:

This involves some vector calculations: after the data set is given, the inputs X and the outputs Y are represented as vectors (this is the essence!). In addition, loss records the difference between the prediction function and the actual values after each iteration, and gradient records each descent step's gradient, so it is more intuitive to see what the algorithm is doing. It is also worth noting the number of iterations, maxiteration, which together with α determines whether the parameters reach the optimal solution.
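The original code screenshot is missing, so here is a minimal sketch of the batch gradient descent just described. The variable names (X, Y, theta, loss, gradient, alpha, maxiteration) follow the post, but the function itself is my reconstruction, not the author's original code:

```python
import numpy as np

def gradient_descent(X, Y, alpha=0.001, maxiteration=1000):
    """Fit h(x) = theta0 + theta1*x1 + theta2*x2 by batch gradient descent."""
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend a column of 1s for theta0
    theta = np.zeros(Xb.shape[1])          # initial point: all parameters 0
    loss = 0.0
    for _ in range(maxiteration):
        error = Xb @ theta - Y             # h(x) - y for every sample
        loss = (error @ error) / (2 * m)   # cost J(theta) at this step
        gradient = Xb.T @ error / m        # dJ/dtheta, one entry per parameter
        theta -= alpha * gradient          # descend one step of size alpha
    return theta, loss
```

A prediction for a new input would then be `np.array([1.0, x1, x2]) @ theta`.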

This is my test data set, for which I entered x1 and x2:

Here are the results I got with different values of α and maxiteration:

α = 0.05, maxiteration = 10 (because I output more auxiliary information this time, I chose only a few iterations, to show you the process):

You will find that the loss grows with every iteration, the parameter values become more and more outrageous, and the predictions are nothing to boast about, after just 10 iterations. I tried again with 1000 iterations, outputting only the predicted value and no other information:

The result is NaN.

Then I changed my α value to α = 0.001, iterating 10 times and outputting some auxiliary information:

You will see clearly that the loss keeps getting smaller, the gradient behaves very reasonably, and the results are quite satisfactory. Iterating 1000 times to see the results:

Well, it's acceptable. If you take a closer look at the figure, you will find that the loss decreases slowly, which likely indicates that the α value is too small, so I gradually increased α and output the results:

α= 0.005, maxiteration = 1000

α= 0.01, maxiteration = 1000

α= 0.01, maxiteration = 2000

α= 0.01, maxiteration = 5000

We find that as the number of iterations increases, the output no longer changes, which should demonstrate what is called convergence!
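The experiments above can be reproduced in miniature. The sketch below runs the same update rule with a large and a small learning rate on a toy data set (stand-in values, not the author's original data): the large α makes the loss blow up just as in the α = 0.05 run, while the small α makes it shrink every iteration:

```python
import numpy as np

def run(alpha, iters):
    X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0]])
    Y = np.array([9.0, 8.0, 16.0, 15.0])        # roughly 1 + 2*x1 + 3*x2
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])        # column of 1s for theta0
    theta = np.zeros(3)
    losses = []
    for _ in range(iters):
        error = Xb @ theta - Y
        losses.append((error @ error) / (2 * m))  # record J(theta) each step
        theta -= alpha * Xb.T @ error / m         # one gradient descent step
    return losses

big = run(alpha=0.5, iters=10)     # too large: the loss diverges
small = run(alpha=0.01, iters=10)  # small enough: the loss shrinks each step
```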

For the experts this is child's play, but for a novice just starting out, it is still very satisfying, and it feels quite magical.

Finally, if you have any suggestions, I am very happy to hear them, and I hope we can learn together!

(Oh yes, if you also want to study algorithms and machine learning with me, or if you have wild ideas of your own, you are welcome to follow my WeChat public account! Hee hee.)

Note: The original data set and test data set are from https://www.jianshu.com/p/9bf3017e2487#
