Machine Learning (1): Gradient Descent

Source: Internet
Author: User

Preface: I have recently been studying Andrew Ng's machine learning course, and these are my notes.

 

Gradient descent is usually introduced through linear regression. First, consider the classic house-price example:

Area (ft^2)   Number of rooms   Price ($1000)
2104          3                 400
1600          3                 330
2400          3                 369
1416          2                 232
3000          4                 540
...           ...               ...

 

The area and number of rooms in the table above are the input features, each denoted by x; the price is the output, denoted by y. Each row of the table is one training sample. What we want to do is predict the price for other combinations of area and room count based on these samples. More formally: given a training set, learn a function h so that h(x) is a good predictor of the corresponding value y.

 

The hypothesis for a sample can be written as:

h(x) = θ0 + θ1*x1 + θ2*x2

Here θ denotes the weights that map x to y, and x denotes the features. If we define x0 = 1, the formula above can be written compactly as:

h(x) = Σ (i = 0..n) θi*xi = θ^T x
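As a small Python sketch, the hypothesis with x0 = 1 is just a dot product of the weight vector and the feature vector (the θ values below are made up purely for illustration):

```python
def h(theta, x):
    """Linear hypothesis h(x) = theta^T x, where x[0] == 1 absorbs the intercept."""
    return sum(t * xi for t, xi in zip(theta, x))

# Illustrative weights: intercept, weight per ft^2, weight per room
theta = [50.0, 0.1, 20.0]
x = [1.0, 2104.0, 3.0]   # x0 = 1, then area and number of rooms
print(h(theta, x))       # 50 + 0.1*2104 + 20*3 = 320.4
```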

We use x^(j) and y^(j) to denote the j-th sample. The goal of the computation is to make the predicted value as close as possible to the actual value y, so the cost function uses the least mean squares (LMS) criterion:

J(θ) = (1/2) Σ (j = 1..m) (h(x^(j)) − y^(j))^2
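The LMS cost function can be sketched in a few lines of Python (a minimal version; the 1/2 factor follows the convention in the notes because it cancels when differentiating):

```python
def cost(theta, X, Y):
    """LMS cost J(theta) = 1/2 * sum over samples of (h(x) - y)^2."""
    total = 0.0
    for x, y in zip(X, Y):
        pred = sum(t * xi for t, xi in zip(theta, x))  # h(x) = theta^T x
        total += (pred - y) ** 2
    return total / 2.0

# With all-zero weights, the single-sample cost is (0 - 400)^2 / 2
print(cost([0.0, 0.0, 0.0], [[1.0, 2104.0, 3.0]], [400.0]))  # 80000.0
```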

To find the minimum of J(θ), take the partial derivative of J(θ) with respect to each θj and step against the gradient:

θj := θj − α * ∂J(θ)/∂θj

Here j indexes a coefficient (weight) θj in the formula above. Assigning the value on the right to θj on the left completes one iteration.

The update for a single training sample works out to:

θj := θj + α * (y^(i) − h(x^(i))) * xj^(i)

The update over all m training samples is:

θj := θj + α * Σ (i = 1..m) (y^(i) − h(x^(i))) * xj^(i)   (for every j)

The formula above is the batch gradient descent algorithm (batch gradient descent): each update uses every training sample. Iteration stops when the algorithm converges. What is convergence? Ideally, the values of two successive iterations no longer change; in practice, a tolerance is set in advance, and iteration ends when the difference between two successive iterations is smaller than that tolerance. Note the following:

(1) α is the learning rate, which determines the step size of the descent. If it is too small, finding the minimum of the function will be slow; if it is too large, the algorithm may overshoot the minimum.
(2) Different initial points can lead to different minima; in general, gradient descent only finds a local minimum.
(3) The closer to the minimum, the smaller the gradient and the slower the descent.

The batch gradient descent algorithm can be summarized in the following steps:
(1) Choose the step size of each move, i.e., the learning rate α.
(2) Pick an initial θ vector, usually the zero vector.
(3) Determine the descent direction, take a step of the chosen size, and update the θ vector.
(4) When the decrease is smaller than a predefined value, stop.

 

