Open Course Notes for Stanford Machine Learning (I): Linear Regression with a Single Variable

Last Update: 2018-12-03 Source: Internet

Author: User


Public course address: Https://class.coursera.org/ml-003/class/index

Instructor: Andrew Ng

1. Model Representation (Model Creation)

Consider a question: how can we predict the price of a house in a given region from its area, given existing data on house prices and areas? This is a linear regression problem. The given data serve as training samples: we train on them to obtain a model (in practice, a function) that represents the relationship between price and area, and then use that function for prediction. The basic process is as follows: the training set is fed to a learning algorithm, which outputs a function $h$; $h$ then takes the area of a house as input and outputs a predicted price.

2. Cost Function

P.S. Strictly speaking, the cost function itself does not appear yet in this section.

Since we have established that all we need to do is train a function, the first step is to assume a form for that function. Here we assume the simplest one, a linear function:

$$h_\theta(x) = \theta_0 + \theta_1 x$$

The question then becomes how to find the values of $\theta_0$ and $\theta_1$. We already have some training data. We do not know whether the data are really linearly related as we assume, but a little deviation is acceptable: we only need the function values to be as close as possible to the actual values. Here $x$ is the area and $y$ is the price, so the training data appear as a series of points on a two-dimensional plane.
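As a minimal sketch, assuming the linear form $h(x) = \theta_0 + \theta_1 x$ with $x$ the area and $y$ the price, the hypothesis could be written in Python as below. The function name `hypothesis` and the numbers are illustrative assumptions, not taken from the course.

```python
def hypothesis(theta0, theta1, x):
    """Linear hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


# Hypothetical parameters and input area, chosen only for illustration.
predicted_price = hypothesis(50.0, 2.0, 100.0)  # 50 + 2 * 100
print(predicted_price)
```

Different choices of the two parameters give different lines; training amounts to picking the pair whose line lies closest to the data points.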

3. Cost Function Intuition 1

We have already said that the value of the function $h$ should be as close as possible to the actual value. Stated more precisely:

$$\min_{\theta_0,\theta_1} J(\theta_0,\theta_1), \qquad J(\theta_0,\theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

The cost function $J$ represents the error mentioned above. It is written in this form, with the factor $\frac{1}{2}$, to simplify the differentiation that follows. Our goal is to minimize $J$. Note that in $J$ the parameters $\theta_0$ and $\theta_1$ are now the variables, and $m$ is the number of training samples (that is, the number of points in the coordinate plane).
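To make the definition concrete, here is a small sketch of the cost computation, assuming the usual squared-error form $J = \frac{1}{2m}\sum_i (h(x^{(i)}) - y^{(i)})^2$ and a linear $h$. The data set is a made-up toy example.

```python
def cost(theta0, theta1, xs, ys):
    """Half the mean squared error of h(x) = theta0 + theta1 * x on the data."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        error = (theta0 + theta1 * x) - y  # h(x) minus the actual value
        total += error ** 2
    return total / (2 * m)


# Hypothetical training set lying exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

print(cost(0.0, 1.0, xs, ys))  # perfect fit, so the cost is zero
print(cost(0.0, 0.0, xs, ys))  # (1 + 4 + 9) / (2 * 3)
```

With the first parameter fixed at 0, varying only the slope traces out a one-dimensional curve of $J$ whose lowest point sits at the best-fitting slope.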

4. Cost Function Intuition 2

For each pair of values $\theta_0$ and $\theta_1$ we get a corresponding function $h$ (a line) and a cost $J$ (a number). Putting the two side by side, we can see clearly that the cost is smallest when $h$ fits the sample points best, and that this minimum is globally unique. Plotted over $\theta_0$ and $\theta_1$, the cost function $J$ is approximately a bowl-shaped surface in three dimensions.

When $h$ fits the sample points best, the cost function $J$ also attains its minimum, and the corresponding values of $\theta_0$ and $\theta_1$ can be read off as the horizontal and vertical coordinates of that point.
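One way to see this without a plot is to evaluate $J$ on a coarse grid of parameter values and look for the smallest entry. The sketch below assumes the squared-error cost with a linear $h$; the data and the grid are hypothetical.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J for the line h(x) = theta0 + theta1 * x."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# Grid of candidate values from -3.0 to 3.0 in steps of 0.5 for each parameter.
grid = [i * 0.5 for i in range(-6, 7)]

# Pick the grid point with the smallest cost.
best_cost, best_theta0, best_theta1 = min(
    (cost(t0, t1, xs, ys), t0, t1) for t0 in grid for t1 in grid
)
print(best_cost, best_theta0, best_theta1)
```

The lowest grid point is the pair whose line passes through the data, which is the same point a contour or surface plot of $J$ would show at the bottom of the bowl. A grid scan is only for intuition; it does not scale as a training method.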

5. Gradient Descent

To minimize $J$, the idea is to change $\theta$ and thereby change the value of $J$. No particular initial value of $\theta$ is required; only the changes matter. Gradient descent means changing $\theta$ in the direction opposite to the gradient of $J$, the direction of steepest descent, so that the value of $J$ decreases. Visually, it is like walking downhill on the surface of $J$ one step at a time until a low point is reached.

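With a little calculus, this descent process can be written as the following update rule, repeated until convergence:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad (j = 0, 1)$$

Here $\alpha$ is the learning rate (greater than 0), which can be understood as the step size of each descent step and must be set manually. Both parameters are updated simultaneously.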
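The update can be sketched generically as follows, assuming the rule "parameter minus learning rate times partial derivative" applied to both parameters at once. The function `gradient_descent_step` and the example cost are hypothetical, chosen so the minimum is known in advance.

```python
def gradient_descent_step(theta0, theta1, grad0, grad1, alpha):
    """One simultaneous update: theta_j := theta_j - alpha * dJ/dtheta_j."""
    # Evaluate both partial derivatives at the current point
    # before changing either parameter.
    g0 = grad0(theta0, theta1)
    g1 = grad1(theta0, theta1)
    return theta0 - alpha * g0, theta1 - alpha * g1


# Hypothetical cost J = (theta0 - 1)^2 + (theta1 + 2)^2 with minimum at (1, -2).
def grad0(t0, t1):
    return 2.0 * (t0 - 1.0)


def grad1(t0, t1):
    return 2.0 * (t1 + 2.0)


theta0, theta1 = 0.0, 0.0
for _ in range(100):
    theta0, theta1 = gradient_descent_step(theta0, theta1, grad0, grad1, alpha=0.1)
print(theta0, theta1)  # approaches (1, -2)
```

Computing both derivatives before assigning either parameter is what makes the update simultaneous; updating one parameter first and then using its new value for the other would be a different procedure.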

6. Gradient Descent Intuition

We can run a simple check on the gradient descent formula above. When $\theta$ is larger than the minimizing value, the gradient is positive, so each iteration decreases $\theta$ and the value of $J$ decreases as well. When $\theta$ is smaller than the minimizing value, the gradient is negative, so each iteration increases $\theta$ and the value of $J$ again decreases. The idea of gradient descent is therefore sound.
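This check can be reproduced numerically. The sketch below assumes a hypothetical one-parameter cost $J(\theta) = (\theta - 3)^2$ and takes a single step from each side of the minimum.

```python
def J(theta):
    """Hypothetical cost with its minimum at theta = 3."""
    return (theta - 3.0) ** 2


def dJ(theta):
    """Derivative of J with respect to theta."""
    return 2.0 * (theta - 3.0)


alpha = 0.1
results = []
for theta in (5.0, 1.0):  # one start to the right of the minimum, one to the left
    new_theta = theta - alpha * dJ(theta)
    results.append((theta, dJ(theta), new_theta, J(theta), J(new_theta)))
    print(theta, dJ(theta), new_theta, J(theta), J(new_theta))
```

From the right the derivative is positive and the parameter moves down; from the left it is negative and the parameter moves up. In both cases the cost after the step is lower than before.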

Choosing $\alpha$, by contrast, is not so simple. It must be moderate; too large and too small are both bad.

If $\alpha$ is too small, each step is too small, many steps are needed to reach the minimum, and convergence is slow. If $\alpha$ is too large, each step is too large: the iterates overshoot and oscillate around the minimum without ever reaching it, and may even diverge. Moreover, even with a moderate $\alpha$, for a general cost function we may get stuck in a local minimum and fail to reach the global minimum.
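The effect of the learning rate can be illustrated on a hypothetical cost $J(\theta) = \theta^2$, whose derivative is $2\theta$ and whose minimum is at 0. The four rates below are arbitrary examples, not recommended values.

```python
def run(alpha, steps=50, theta=1.0):
    """Run gradient descent on the hypothetical cost J(theta) = theta^2."""
    for _ in range(steps):
        theta = theta - alpha * 2.0 * theta  # dJ/dtheta = 2 * theta
    return theta


for alpha in (0.001, 0.1, 1.0, 1.1):
    print(alpha, run(alpha))
```

With the smallest rate the parameter has barely moved after 50 steps. With 0.1 it is very close to 0. With 1.0 each step jumps to the mirror-image point, so the iterate bounces between 1 and -1 forever. With 1.1 each step lands farther from the minimum than the last and the iterates diverge.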

One more point: even with $\alpha$ fixed, the step size shrinks automatically during gradient descent, because the gradient becomes smaller as we approach the minimum. There is therefore no need to decrease $\alpha$ near the minimum to avoid overshooting the lowest point.
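A short sketch of this effect, again assuming the hypothetical cost $J(\theta) = \theta^2$ and a fixed learning rate of 0.1:

```python
alpha = 0.1   # fixed learning rate, never changed during the run
theta = 4.0   # arbitrary starting point
step_sizes = []
for _ in range(5):
    step = alpha * 2.0 * theta  # alpha times dJ/dtheta for J(theta) = theta^2
    theta -= step
    step_sizes.append(abs(step))
print(step_sizes)
```

Each recorded step is smaller than the one before it, even though the learning rate is constant, because the derivative shrinks as the parameter approaches the minimum.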

7. Gradient Descent for Linear Regression

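For linear regression we already have the cost function $J$ in the following form:

$$J(\theta_0,\theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right)^2$$

Substituting its partial derivatives into the gradient descent update, it is clearer to write the algorithm like this (repeat until convergence, updating both parameters simultaneously):

$$\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right)$$

$$\theta_1 := \theta_1 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right) x^{(i)}$$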

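With a suitable $\alpha$, iterating converges to the minimum of $J$. For a general cost function this need not be the global minimum, but the squared-error cost of linear regression is convex, so here the minimum reached is the global one.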
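Putting the pieces together, a minimal sketch of batch gradient descent for single-variable linear regression might look like this. It assumes the squared-error cost with $h(x) = \theta_0 + \theta_1 x$; the function name, the learning rate, the iteration count, and the data are illustrative assumptions.

```python
def fit_linear_regression(xs, ys, alpha=0.05, iterations=5000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction error on every training sample at the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                             # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dtheta1
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

theta0, theta1 = fit_linear_regression(xs, ys)
prediction = theta0 + theta1 * 4.0  # predicted y for a new input x = 4
print(theta0, theta1, prediction)
```

On this toy data the fitted parameters come out close to 1 and 2. With real data the inputs are often rescaled first so that a single learning rate works well for both parameters.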

----------------------------------------

The above is the first article, on linear regression with a single variable. The idea is quite clear. First, a form for the model function is chosen based on the training data, and the cost function is obtained by measuring the error against the actual values. Then the parameters of the model function are determined by gradient descent on the cost function. Once the parameters are determined, we can use the model to make predictions. However, because the model is linear and has only one variable, it is not very accurate. The natural next step is to introduce multiple variables and nonlinear cases, which will certainly be more complicated.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
