Machine Learning (1): Implementation and Process Analysis of the Gradient Descent Algorithm


Because an algorithm is best understood when applied to a practical problem, let me first describe one that gradient descent helps solve. Given a data set of (area, price) pairs recording the floor area and sale price of a number of houses (Andrew Ng's course uses this example, and that course was my introduction to the subject), my goal is to use a learning algorithm to obtain a function relating house price to floor area, and then, given a new house's area, use this function to predict its price. As shown below:

My approach to the solution is broadly as follows:

1. I found a very small data set with two features, x1 and x2, and one output y.

2. Based on my data, I assume that my prediction function is a linear function h(x):
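The formula image from the original post is missing here; a reconstruction of the linear hypothesis for two features, following the standard convention from Andrew Ng's course, would be:

```latex
h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2
```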

(Why a linear function? On the one hand, the distribution of my data points suggests this model; on the other, I want a function simple enough to make clear what the gradient descent algorithm is actually doing. This is a linear regression problem.)

3. My goal now is to obtain the parameters of the prediction function h(x). A reasonable prediction function should differ from the actual values as little as possible, so I define a cost function J(θ) measuring this difference: the smaller J(θ), the better the prediction. The combination of parameters that minimizes J(θ) gives my prediction function h(x).
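The cost function image is also missing; for linear regression, the standard squared-error cost over m training examples (again following Ng's convention, with the 1/2 factor included to simplify the derivative) is:

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
```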

4. How do we find the minimum of J(θ)? This is where the gradient descent algorithm comes in. For each parameter, set an initial point, then repeatedly update the value according to the following rule:
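The update-rule image is missing; the standard simultaneous update for each parameter θj, consistent with the description that follows, is:

```latex
\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)
```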

As the expression shows, by differentiating J(θ) and starting from the initial point θ, each update moves the parameters in a descending direction. Here α is the learning rate: the derivative selects the descent direction, and the learning rate determines the step size. From the results of the algorithm you will find that different values of α produce different results, and an appropriate step size gives the best one. Substituting the derivative result gives the following:
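The substituted form is missing from the post; carrying out the differentiation of the squared-error cost above gives, for each parameter θj:

```latex
\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```

where x0 is taken to be 1 so that θ0 fits the same pattern.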

So what we do next is iterate, updating every parameter simultaneously, until the algorithm converges to a local optimum (for linear regression the cost function is convex, so this is also the global optimum):

The next step is the implementation of the algorithm: once the parameter values have been learned, the prediction function can make its final predictions.

This is the core piece of my algorithm implementation:

This involves some vector calculations: after the data set is given, the inputs X and the outputs Y are represented as vectors (this is the essence!). In addition, loss records the difference between the prediction function and the actual values after each iteration, and gradient records each descent step's gradient, so it is more intuitive to see what the algorithm is doing. It is also worth noting the number of iterations, maxiteration, which together with α determines whether the parameters reach the optimal solution.
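The original code screenshot is missing, so here is a minimal sketch of the batch gradient descent just described. The variable names (X, Y, theta, loss, gradient, alpha, maxiteration) follow the post, but the function itself is my reconstruction, not the author's original code:

```python
import numpy as np

def gradient_descent(X, Y, alpha=0.001, maxiteration=1000):
    """Fit h(x) = theta0 + theta1*x1 + theta2*x2 by batch gradient descent."""
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend a column of 1s for theta0
    theta = np.zeros(Xb.shape[1])          # initial point: all parameters 0
    loss = 0.0
    for _ in range(maxiteration):
        error = Xb @ theta - Y             # h(x) - y for every sample
        loss = (error @ error) / (2 * m)   # cost J(theta) at this step
        gradient = Xb.T @ error / m        # dJ/dtheta, one entry per parameter
        theta -= alpha * gradient          # descend one step of size alpha
    return theta, loss
```

A prediction for a new input would then be `np.array([1.0, x1, x2]) @ theta`.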

This is my test data set, for which I entered x1 and x2:

Here are the results I got with different values of α and maxiteration:

α = 0.05, maxiteration = 10 (because I output more auxiliary information this time, I chose only a few iterations, to show you the process):

You will find that the loss grows with every iteration, the parameter values become more and more outrageous, and the predictions are nothing to boast about, after just 10 iterations. I tried again with 1000 iterations, outputting only the predicted value and no other information:

The result is NaN.

Then I changed my α value to α = 0.001, iterating 10 times and outputting some auxiliary information:

You will see clearly that the loss keeps getting smaller, the gradient behaves very reasonably, and the results are quite satisfactory. Iterating 1000 times to see the results:

Well, it's acceptable. If you take a closer look at the figure, you will find that the loss decreases slowly, which likely indicates that the α value is too small, so I gradually increased α and output the results:

α= 0.005, maxiteration = 1000

α= 0.01, maxiteration = 1000

α= 0.01, maxiteration = 2000

α= 0.01, maxiteration = 5000

We find that as the number of iterations increases, the output no longer changes, which should demonstrate what is called convergence!
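The experiments above can be reproduced in miniature. The sketch below runs the same update rule with a large and a small learning rate on a toy data set (stand-in values, not the author's original data): the large α makes the loss blow up just as in the α = 0.05 run, while the small α makes it shrink every iteration:

```python
import numpy as np

def run(alpha, iters):
    X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0]])
    Y = np.array([9.0, 8.0, 16.0, 15.0])        # roughly 1 + 2*x1 + 3*x2
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])        # column of 1s for theta0
    theta = np.zeros(3)
    losses = []
    for _ in range(iters):
        error = Xb @ theta - Y
        losses.append((error @ error) / (2 * m))  # record J(theta) each step
        theta -= alpha * Xb.T @ error / m         # one gradient descent step
    return losses

big = run(alpha=0.5, iters=10)     # too large: the loss diverges
small = run(alpha=0.01, iters=10)  # small enough: the loss shrinks each step
```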

For the experts this is child's play, but for a novice just starting out, it is still very satisfying, and it feels quite magical.

Finally, if you have any suggestions, I am very happy to hear them, and I hope we can learn together!

(Oh yes, if you also want to study algorithms and machine learning with me, or if you have wild ideas of your own, you are welcome to follow my WeChat public account! Hee hee.)

Note: The original data set and test data set are from https://www.jianshu.com/p/9bf3017e2487#
