Andrew Ng Machine Learning Notes 2 -- Gradient Descent and Least Squares Fitting


Today we formally began studying machine learning algorithms. The teacher opened with an example: given a dataset of house areas and prices for a region, how do we predict the price of a house with a given area? What most of us would think of is to draw a scatter plot of house area versus price, fit a curve of price against area, and then, for a known house area, read the predicted price off the fitted curve. This kind of problem is called regression.

To treat this problem mathematically, we must first define some notation:
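(The image with the definitions is missing; the standard notation from this lecture is presumably:)

```latex
\begin{aligned}
x^{(i)} &: \text{input variables (features), e.g. the house area} \\
y^{(i)} &: \text{output or target variable, e.g. the house price} \\
(x^{(i)}, y^{(i)}) &: \text{the $i$-th training example} \\
m &: \text{the number of training examples} \\
h &: \text{the hypothesis, mapping an input $x$ to an estimated $y$}
\end{aligned}
```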


With the notation defined, let us look at what we need to solve:


First we take a training set and feed it to the learning algorithm, which produces an output function. We denote this function by h (for hypothesis); it accepts an input and outputs an estimate of the true value, i.e., it maps inputs to estimates. Next we must decide how to represent this hypothesis. To keep the analysis simple, the relationship between price and house area is assumed to be linear, which makes this a linear regression problem.
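(The formula image is missing; with features x_1, ..., x_n and the convention x_0 = 1, the linear hypothesis is presumably written as:)

```latex
h_\theta(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n
            = \sum_{j=0}^{n} \theta_j x_j = \theta^T x,
\qquad x_0 = 1
```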


Based on the description above, we need to choose an appropriate θ so that, given an input x, the hypothesis h produces a good estimate of the true value y. With m training samples, the problem can be expressed as:
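(Reconstructing the missing formula: the criterion is presumably to choose θ minimizing the sum of squared errors over the m training examples:)

```latex
\min_\theta \; \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)^2
```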


So that the factor of 2 produced by differentiating the square cancels later, the formula is conventionally multiplied by 1/2. The problem to solve can then be expressed as:
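(Again reconstructing the missing image, the cost function with the conventional 1/2 factor is presumably:)

```latex
J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)^2,
\qquad \min_\theta \; J(\theta)
```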


As for how to find the θ that minimizes the objective function, the teacher presented two methods: least squares and gradient descent.

Gradient descent is introduced first:


As shown in the figure, imagine standing at the point marked with a star, looking all the way around, and asking yourself: if I may take only one small step, which direction gets me downhill fastest? This is exactly how the gradient descent algorithm works: the direction it walks is the negative gradient direction, and it keeps walking until it reaches a local minimum of the function. Now go back, pick a starting point slightly to the right of the original starred point, and walk again: you may arrive at a different local minimum. In other words, gradient descent can depend on the initial value of the parameters.

Question:

- How do you look all the way around, 360 degrees, and find the direction of fastest descent?

- In fact, you do not look around at all; you only compute the partial derivatives of the function. Since we are seeking a minimum, the direction of fastest descent is the direction opposite to the gradient. For the problem described above:

Given an initial value of θ, we obtain an initial hypothesis h (since h is determined by the input features and θ), and hence an initial function relating output to input. Then, for each training sample, we compute the squared deviation between the estimate and the true value and accumulate these, which gives the current value of the objective function. Next, θ is assigned a new value: from the original θ we subtract a multiple of the gradient of the objective function at that point. The objective value is computed again, and the process iterates until the objective function reaches a local minimum. The formula derivation process is described below.

First, consider the case of a single training sample:
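(The derivation image is missing; the standard LMS derivation for one example (x, y), which it presumably showed, is:)

```latex
\frac{\partial}{\partial \theta_j} J(\theta)
  = \frac{\partial}{\partial \theta_j}\,\frac{1}{2}\bigl(h_\theta(x) - y\bigr)^2
  = \bigl(h_\theta(x) - y\bigr)\,\frac{\partial}{\partial \theta_j}\Bigl(\textstyle\sum_{k=0}^{n}\theta_k x_k - y\Bigr)
  = \bigl(h_\theta(x) - y\bigr)\,x_j
```

so the update rule for a single sample, with learning rate α, is

```latex
\theta_j := \theta_j + \alpha\,\bigl(y - h_\theta(x)\bigr)\,x_j
```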


When the training set is extended to m samples (m > 1):
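(Reconstructing the missing formula: the batch gradient descent rule presumably sums the single-sample term over all m samples, repeated for every j until convergence:)

```latex
\theta_j := \theta_j + \alpha \sum_{i=1}^{m} \bigl(y^{(i)} - h_\theta(x^{(i)})\bigr)\, x_j^{(i)}
```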


For the vector θ, whose dimension equals the number of sample features, each component θ_j yields one partial derivative; taken together these form the gradient, and its opposite is the overall direction of fastest descent.

In the algorithm above, two loops are required: one traverses all the training samples, and the other traverses each dimension of the vector θ, so as to obtain the overall direction of fastest descent.
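As a concrete illustration, here is a minimal NumPy sketch of batch gradient descent for this linear regression setting. Everything in it (the function name, the toy data, the learning rate, the iteration count) is illustrative, not from the original note:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.05, n_iters=5000):
    """Batch gradient descent for linear regression.

    X: (m, n) design matrix whose first column is all ones (x_0 = 1).
    y: (m,) vector of target values.
    Returns the learned parameter vector theta, shape (n,).
    """
    m, n = X.shape
    theta = np.zeros(n)                 # initial value of theta
    for _ in range(n_iters):
        errors = X @ theta - y          # h_theta(x^(i)) - y^(i) for all i at once
        gradient = X.T @ errors         # sum_i (h - y) x_j^(i), one entry per j
        theta -= alpha * gradient       # step in the negative gradient direction
    return theta

# Toy usage: fit price ~ area on made-up data (illustrative only).
areas = np.array([50.0, 80.0, 100.0, 120.0])
prices = np.array([150.0, 230.0, 290.0, 350.0])
X = np.column_stack([np.ones_like(areas), areas / 100.0])  # rescale for stability
print(batch_gradient_descent(X, prices))
```

Note that every iteration touches all m training samples; the stochastic variant below trades that for one sample per update.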

Stochastic gradient descent:

The principle of stochastic gradient descent can be expressed with the following pseudo-code:
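(The original pseudo-code image did not survive. A minimal NumPy sketch of the same idea, assuming the same design matrix X and targets y as in the batch version above:)

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, n_epochs=200, seed=0):
    """Stochastic (incremental) gradient descent for linear regression.

    Unlike the batch version, theta is updated after every individual
    training sample, so each step costs O(n) instead of O(m * n).
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        for i in rng.permutation(m):        # visit samples in random order
            error = X[i] @ theta - y[i]     # h_theta(x^(i)) - y^(i)
            theta -= alpha * error * X[i]   # update from this single sample
    return theta
```

With a fixed learning rate the iterates hover around the minimum rather than settling exactly on it; decaying alpha over time is a common remedy.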

The least squares fitting method:

To derive the formula below, some preliminary knowledge is required:
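(The image listing the prerequisites is missing; the derivation below needs, at minimum, these standard vector-derivative facts, assuming that is what the original listed:)

```latex
\nabla_\theta\, b^T \theta = b,
\qquad
\nabla_\theta\, \theta^T A\, \theta = 2A\theta \quad (A \text{ symmetric})
```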

Formula derivation process:
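(The derivation images are missing; in matrix form, with X the m-by-(n+1) design matrix whose i-th row is x^{(i)T} and y the vector of targets, the standard derivation is presumably:)

```latex
J(\theta) = \frac{1}{2}\,(X\theta - y)^T (X\theta - y)
          = \frac{1}{2}\bigl(\theta^T X^T X\,\theta - 2\,y^T X \theta + y^T y\bigr)
```

Setting the gradient to zero and solving gives the closed-form (normal equation) solution:

```latex
\nabla_\theta J(\theta) = X^T X\,\theta - X^T y = 0
\;\Longrightarrow\;
\theta = (X^T X)^{-1} X^T y
```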

