Stanford Machine Learning Open Course Notes (4): Regularization



Course address: https://class.coursera.org/ml-003/class/index

Instructor: Andrew Ng

1. The Problem of Overfitting

Go back to the linear regression problem we started with: predicting house prices from house area. The simplest model is a linear one, but in many cases a straight line does not fit the data well, so we introduce second-order, third-order, and higher-order terms. Higher-order terms bring a new problem, however: the model can fit the training data very well, yet there is no guarantee that it predicts new data correctly. This is overfitting.

In the three fits shown for this example, the linear fit on the left is poor and has a large bias (underfitting), while the function on the right has too high an order and a large variance (overfitting). Only the quadratic function in the middle is a good fit. The high-order curve on the right is unlikely to predict new prices correctly, whereas the quadratic in the middle follows the overall trend of the data and is likely to generalize. Logistic regression can overfit in the same way.

So how can we address overfitting? There are two main options.

The first is to reduce the number of features, either by choosing by hand which features to keep or by using a model-selection algorithm. The second is regularization: keep all the features, but make the parameters θj small. This shrinks the contribution of the higher-order terms, ideally pushing their θ values close to 0. The details are described below.

2. Cost Function

Once again, the cost function is the place to start. Since we want the θ values to be as small as possible, we need to modify the cost function.


In the lecture's example, two terms involving the higher-order coefficients θ3 and θ4 are added to the cost function, each multiplied by a large weight. Because these terms have a large influence on the total cost, minimizing it drives θ3 and θ4 close to 0, so the higher-order terms of the hypothesis also become close to 0. Written in a more general form, a penalty is added on every parameter except θ0, which by convention is not penalized.
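Reconstructed in the course's notation (in the example, the hypothesis is hθ(x) = θ0 + θ1x + θ2x² + θ3x³ + θ4x⁴; in general there are m training examples and n features), the example objective with the lecture's weight of 1000 and the general regularized cost are:

```latex
\min_{\theta}\ \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2

J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]
```

The penalty sum starts at j = 1, which is how θ0 is left out.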


Compared with the ordinary linear regression cost, one term has been added at the end; this is the regularized cost function. λ is the regularization parameter, and you choose its value yourself. If λ is too small, it does little to prevent overfitting, because the θ values are not pushed toward 0. If λ is too large, a different problem appears.


The model no longer overfits, but it now underfits: nearly all θj (j ≥ 1) are driven to 0, so the house-price prediction collapses to hθ(x) ≈ θ0, a horizontal line.

A constant prediction like this is obviously useless.
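To make the role of θ0 concrete, here is a minimal sketch in plain Python of the regularized linear-regression cost. It assumes each feature row already contains the bias entry x0 = 1. The names `hypothesis`, `regularized_cost` and `lam` (used because `lambda` is a Python keyword), as well as the toy data, are hypothetical and only for illustration.

```python
def hypothesis(theta, x):
    """Linear hypothesis h_theta(x) = theta_0*x_0 + ... + theta_n*x_n, with x_0 = 1."""
    return sum(t * xi for t, xi in zip(theta, x))


def regularized_cost(theta, X, y, lam):
    """J(theta) = (1/2m) * [sum of squared errors + lam * sum_{j>=1} theta_j^2].

    X is a list of feature rows that already include the bias entry x_0 = 1.
    theta[0] is deliberately excluded from the penalty.
    """
    m = len(y)
    squared_errors = sum((hypothesis(theta, x) - yi) ** 2 for x, yi in zip(X, y))
    penalty = lam * sum(t ** 2 for t in theta[1:])  # skip theta_0
    return (squared_errors + penalty) / (2 * m)


# Hypothetical toy data: y = 1 + 2x exactly, so theta = [1, 2] has zero squared error.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]
theta = [1.0, 2.0]

print(regularized_cost(theta, X, y, lam=0.0))  # → 0.0 (perfect fit, no penalty)
print(regularized_cost(theta, X, y, lam=3.0))  # → 2.0 (only the penalty on theta_1 remains)
```

Changing θ0 alters the squared-error part but never the penalty, which is exactly the convention described above.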

3. Regularized Linear Regression

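Because the cost function has changed, gradient descent changes as well. θ0 is updated exactly as before, while for every other θj (j = 1, ..., n) the update subtracts one extra term, α(λ/m)θj. Equivalently, each step first shrinks θj by the factor (1 − αλ/m), which is slightly less than 1, and then applies the usual update.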
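Written out in the course's notation (learning rate α, m training examples, x0^(i) = 1), the regularized updates, applied simultaneously on every iteration, are:

```latex
\begin{aligned}
\theta_0 &:= \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)} \\
\theta_j &:= \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\,\theta_j\right] \qquad (j = 1, \dots, n) \\
         &\;= \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}
\end{aligned}
```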


If we use the normal equation instead of gradient descent, we set the partial derivatives of J with respect to each θj to 0 and solve for θ (the derivation is omitted here).
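In matrix form, with X the m × (n+1) design matrix whose first column is all ones and y the vector of targets, the result is:

```latex
\theta = \left(X^{T}X + \lambda L\right)^{-1}X^{T}y,
\qquad
L = \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix} \in \mathbb{R}^{(n+1)\times(n+1)}
```

The 0 in the top-left corner of L is the matrix form of not penalizing θ0.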


For λ > 0, the matrix in the parentheses can be shown to be invertible, even when XᵀX alone is singular (for example, when there are no more training examples than features, m ≤ n).
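The two routes to the regularized solution can be compared in a small sketch, again in plain Python with hypothetical names and toy data. `normal_equation` builds XᵀX + λL and solves the linear system with a simple Gaussian elimination (a stand-in for a proper linear-algebra library), and `gradient_descent` applies the regularized update described in this section. Both should reach the same θ.

```python
def solve_linear_system(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting (A is square)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v


def normal_equation(X, y, lam):
    """theta = (X^T X + lam * L)^(-1) X^T y, with L = diag(0, 1, ..., 1)."""
    n = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)] for i in range(n)]
    for j in range(1, n):  # L has 0 in the top-left corner: theta_0 is not penalized
        A[j][j] += lam
    b = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(n)]
    return solve_linear_system(A, b)


def gradient_descent(X, y, lam, alpha=0.1, iterations=5000):
    """Batch gradient descent on the regularized cost; theta_0 gets no (lam/m) term."""
    m, n = len(y), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        errors = [sum(t * xi for t, xi in zip(theta, x)) - yi for x, yi in zip(X, y)]
        grad = [sum(e * x[j] for e, x in zip(errors, X)) / m for j in range(n)]
        for j in range(1, n):
            grad[j] += lam / m * theta[j]
        theta = [t - alpha * g for t, g in zip(theta, grad)]  # simultaneous update
    return theta


# Hypothetical toy data: bias column x_0 = 1 plus one feature x_1.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.5, 5.0, 6.5]

print(normal_equation(X, y, lam=1.0))
print(gradient_descent(X, y, lam=1.0))  # should agree closely with the line above
print(normal_equation(X, y, lam=1e6))   # theta_1 near 0, theta_0 near mean(y)
```

With a very large λ, θ1 ends up nearly 0 and θ0 nearly the mean of y: the flat, underfitting line described in section 2. Gradient descent would need a much smaller α in that case, because a large λ makes the cost much steeper along θ1.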

4. Regularized Logistic Regression

Regularizing logistic regression works the same way as for linear regression. First, the same penalty term is added to the cost function.
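In the course's notation, with the sigmoid hypothesis hθ(x) = 1/(1 + e^(−θᵀx)), the regularized logistic cost is the usual cross-entropy cost plus the same penalty, again summed from j = 1:

```latex
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2
```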

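Then the gradient descent update is modified in the same way. It looks identical to the update for regularized linear regression, except that hθ(x) is now the sigmoid function.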
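A minimal sketch of regularized logistic regression in plain Python follows. The update has the same form as for linear regression; only hθ(x) changes to the sigmoid. The function names, the fixed learning rate and iteration count, and the toy data set are hypothetical choices for illustration, not values from the course.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def regularized_logistic_cost(theta, X, y, lam):
    """Cross-entropy cost plus (lam / 2m) * sum_{j>=1} theta_j^2."""
    m = len(y)
    total = 0.0
    for x, yi in zip(X, y):
        h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        total += -yi * math.log(h) - (1 - yi) * math.log(1 - h)
    return total / m + lam / (2 * m) * sum(t * t for t in theta[1:])


def regularized_logistic_gradient(theta, X, y, lam):
    """Gradient of the regularized logistic cost; theta_0 gets no (lam/m) term."""
    m, n = len(y), len(theta)
    errors = [sigmoid(sum(t * xi for t, xi in zip(theta, x))) - yi for x, yi in zip(X, y)]
    grad = [sum(e * x[j] for e, x in zip(errors, X)) / m for j in range(n)]
    for j in range(1, n):
        grad[j] += lam / m * theta[j]
    return grad


def train_logistic(X, y, lam, alpha=0.5, iterations=10000):
    """Batch gradient descent with simultaneous updates of all theta_j."""
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        grad = regularized_logistic_gradient(theta, X, y, lam)
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical toy data: bias column x_0 = 1 plus one feature (not linearly separable).
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0], [1.0, 2.5], [1.0, 3.0]]
y = [0, 0, 1, 0, 1, 1]

for lam in (0.1, 10.0):
    theta = train_logistic(X, y, lam)
    print(lam, [round(t, 3) for t in theta])
```

Running it with a small and a large λ shows the penalty at work: the larger λ pulls θ1 toward 0 and flattens the decision curve, while θ0, which is not penalized, is free to adjust.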

--------------------------------------------------------------------------

Regularization is a method for preventing overfitting. Adding a penalty term on the parameters keeps their values small, so that the hypothesis is not dominated by many high-order terms. The choice of the regularization coefficient λ (the penalty coefficient) needs care: much like the learning rate α in gradient descent, it works poorly if it is either too large or too small.
