Stanford Machine Learning Open Course Notes (4): Regularization

Last Update: 2018-12-03 Source: Internet

Author: User


Public course address: https://class.coursera.org/ml-003/class/index

Instructor: Andrew Ng

1. The problem of overfitting

Let us return to the linear regression problem we first discussed: predicting house prices from housing area. The simplest model is a straight line, but in many cases a linear relationship does not fit the data, and we need to introduce second-order, third-order, and higher-order terms. Higher-order terms bring a new problem, however. The model may fit the sample data very well while its predictions on new data cannot be trusted. This is overfitting:

In the figure, the linear fit on the left is poor and shows a large bias (underfitting), while the function on the right is clearly of too high an order and shows a large variance (overfitting). Only the second-order function in the middle is a good fit. We can also judge that the function on the right is unlikely to predict new data correctly, whereas the trend of the middle function follows a regular pattern and is likely to be correct. Logistic regression suffers from the same overfitting problem.
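The three fits described above can be reproduced with a minimal sketch. The data below is hypothetical (a noisy quadratic standing in for price versus area), and the degrees 1, 2 and 5 are chosen only for illustration; none of these values come from the course.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: price depends quadratically on area, plus noise.
area = np.linspace(0.5, 3.0, 8)
price = 1.0 + 2.0 * area - 0.3 * area**2 + rng.normal(0.0, 0.1, area.size)

# Noise-free points from the same curve, used only for evaluation.
area_new = np.linspace(0.6, 2.9, 20)
price_new = 1.0 + 2.0 * area_new - 0.3 * area_new**2

def mse(coeffs, x, y):
    """Mean squared error of a polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

train_err = {}
new_err = {}
for degree in (1, 2, 5):
    coeffs = np.polyfit(area, price, degree)  # least-squares polynomial fit
    train_err[degree] = mse(coeffs, area, price)
    new_err[degree] = mse(coeffs, area_new, price_new)
    print(f"degree {degree}: train MSE {train_err[degree]:.4f}, "
          f"new-data MSE {new_err[degree]:.4f}")
```

The training error can only go down as the degree grows, because each higher-degree model contains the lower-degree ones. The error on new data carries no such guarantee, and that gap is what overfitting looks like in practice.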

So how can we solve the issue of overfitting? The following methods exist:

One option is to reduce the number of features, either by deciding manually which features to keep or by using an algorithm to select them. The other is regularization: keep all the features, but make the values of the parameters θ in the function as small as possible. The aim is to shrink the contribution of the terms in the function; a term whose θ is 0 disappears altogether. The details are described below.

2. Cost function

Once again the cost function cannot be avoided. Since, as mentioned above, we want the values of θ to be as small as possible, the cost function has to be modified.

We add two extra terms to the cost function that penalize the higher-order coefficients θ3 and θ4. These terms have a large effect on the overall cost, so when the cost is minimized both coefficients are driven as close to 0 as possible, and the higher-order terms in the function become close to 0 as well. We can write this in a more general form. Note that θ0 is not included in the penalty:
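In the usual notation of the course, the general form can be written as follows. The symbols m (number of training examples), n (number of features) and h_θ (the hypothesis) are assumed here, since the text above does not define them.

```latex
J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]
```

The penalty sum starts at j = 1, so θ0 is left out of it.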

Compared with plain linear regression, one term has been added at the end; this is the regularized cost function. Here λ is the regularization parameter, which you set yourself. If λ is too small, it does little to prevent overfitting, because it cannot push θ toward 0. If λ is too large, the following problem may occur:
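A minimal sketch of the regularized cost for linear regression, assuming a design matrix X whose first column is all ones and the convention that θ0 is not penalized. The function name, the argument name lam and the data are illustrative only.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized squared-error cost for linear regression.

    X is an (m, n+1) design matrix whose first column is all ones,
    theta has n+1 entries, and lam is the regularization parameter.
    """
    m = y.size
    residual = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)  # theta[0] is not penalized
    return float((residual @ residual + penalty) / (2 * m))

# Tiny hypothetical data set that follows y = 1 + 2x exactly.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta = np.array([1.0, 2.0])

print(regularized_cost(theta, X, y, lam=0.0))  # perfect fit, no penalty
print(regularized_cost(theta, X, y, lam=3.0))  # only the penalty remains
```

With a perfect fit the squared-error part is zero, so the second call shows the penalty on its own: the larger λ is, the more a nonzero θ1 costs.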

Overfitting is gone, but the model now underfits, and the house price prediction becomes:

This constant-valued function is obviously useless as a predictor.

3. Regularized linear regression

The cost function has changed, so gradient descent changes as well. For every θ other than θ0, one more term is subtracted in the update:
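Assuming the regularized squared-error cost with a 1/(2m) factor and a learning rate α (both taken from the usual course notation rather than from the text here), the updates can be written as:

```latex
\begin{aligned}
\theta_0 &:= \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)} \\
\theta_j &:= \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\,\theta_j\right], \qquad j = 1,\dots,n
\end{aligned}
```

The extra term (λ/m)θ_j is the derivative of the penalty, and it shrinks each θ_j slightly on every step.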

If the normal equation is used instead of gradient descent, setting the derivative of J with respect to θ to 0 lets you solve for θ directly (the derivation is omitted here):
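Under the same assumptions (design matrix X with a leading column of ones, target vector y, θ0 not penalized), the closed-form solution is usually written as:

```latex
\theta = \left(X^{T}X + \lambda L\right)^{-1}X^{T}y,
\qquad
L = \operatorname{diag}(0, 1, 1, \dots, 1) \in \mathbb{R}^{(n+1)\times(n+1)}
```

The leading 0 in L corresponds to θ0, which carries no penalty.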

It can be shown that, as long as λ > 0, the matrix in the parentheses has full rank, so it is always invertible.
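The invertibility claim can be checked numerically with a small sketch. The function name ridge_normal_equation and the data are hypothetical; the duplicated feature column is there on purpose so that the unpenalized matrix is singular.

```python
import numpy as np

def ridge_normal_equation(X, y, lam):
    """Solve (X^T X + lam * L) theta = X^T y, with L = diag(0, 1, ..., 1)."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0  # do not penalize the intercept theta_0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Hypothetical design matrix with a duplicated feature column,
# so X^T X on its own is singular (rank 2 instead of 3).
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x, x])
y = np.array([1.0, 3.0, 5.0, 7.0])

rank_without_penalty = np.linalg.matrix_rank(X.T @ X)
print(rank_without_penalty)

# With lam > 0 the system has a unique solution.
theta = ridge_normal_equation(X, y, lam=1.0)
print(theta)

# With a very large lam the penalized coefficients shrink toward 0
# and the prediction collapses to a nearly constant value.
theta_big = ridge_normal_equation(X, y, lam=1e8)
print(theta_big)
```

The last call also illustrates the underfitting case from the cost function section: when λ is huge, only the intercept survives and the model predicts roughly the mean of y for every input.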

4. Regularized logistic regression

Regularizing logistic regression is similar to regularizing linear regression. The first step is again to modify the cost function:
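Assuming the standard cross-entropy cost for logistic regression with the sigmoid hypothesis, the regularized version is usually written as below; as before, the symbols m, n and h_θ are assumed from the course notation.

```latex
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2,
\qquad
h_\theta(x) = \frac{1}{1 + e^{-\theta^{T}x}}
```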

Then modify the gradient descent process:
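A minimal sketch of gradient descent for regularized logistic regression, assuming the cross-entropy cost with an L2 penalty on every parameter except θ0. The function names, the data set, and the values of lam and alpha are illustrative choices, not taken from the course.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Regularized cross-entropy cost; theta[0] is not penalized."""
    m = y.size
    h = sigmoid(X @ theta)
    data_term = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    return float(data_term + lam / (2 * m) * np.sum(theta[1:] ** 2))

def gradient_step(theta, X, y, lam, alpha):
    """One gradient descent update for regularized logistic regression."""
    m = y.size
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    grad[1:] += (lam / m) * theta[1:]  # extra term for every theta except theta_0
    return theta - alpha * grad

# Hypothetical one-feature data set with overlapping classes;
# the first column of X is the constant 1 for the intercept.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 0.5],
              [1.0, -0.5], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

theta = np.zeros(2)
costs = [logistic_cost(theta, X, y, lam=1.0)]
for _ in range(200):
    theta = gradient_step(theta, X, y, lam=1.0, alpha=0.1)
    costs.append(logistic_cost(theta, X, y, lam=1.0))

print(costs[0], costs[-1], theta)
```

The update has the same shape as in the linear case; only the hypothesis inside the gradient changes from a linear function to the sigmoid.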

--------------------------------------------------------------------------------

Regularization is a method for preventing overfitting. Adding a penalty term on the parameters keeps their values as small as possible, so that the high-order terms contribute less to the function. The choice of the regularization coefficient, that is, the penalty coefficient, needs care, however. Like the learning rate in gradient descent, it works poorly when it is either too large or too small.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
