Understanding the meaning of regularization and normalization

Regularization and normalization (the latter sometimes also called standardization) are ways of preprocessing data. Their purpose is to make the data more convenient for our calculations, or to obtain results that generalize better, without changing the nature of the problem. Below is a brief rundown of what each one does; if anything is incorrect, please point it out!

Objective

Note that the meanings of these terms differ between fields; this article refers only to their usage in machine learning.

First, regularization

Dr. Hang Li mentions in Statistical Learning Methods that the three elements of statistical learning are the model, the strategy, and the algorithm. In the field of machine learning, the "model" is the probability distribution or decision function that we want to solve for.

Suppose we are now solving a logistic regression problem. The first thing we need to do is assume a hypothesis function that can cover all the possibilities: $y = wx$, where $w$ is the parameter vector and $x$ is the vector of known samples. If $y_i$ denotes the true value of the $i$-th sample and $f(x_i)$ denotes its predicted value, then our loss function can be defined as:

$$L(y_i, f(x_i)) = y_i - \mathrm{sigmoid}(x_i)$$

There is no need to worry about what this particular function means; it is enough that it represents the error. The average loss over all samples for the model $y = wx$ is called the "empirical risk" (empirical risk) or "empirical loss" (empirical loss). Clearly, minimizing the empirical risk (empirical risk minimization, ERM) is the principle by which the optimal model is solved. Taken to the extreme, however, minimizing the empirical risk alone drives the model to become ever more complex, so that it applies only to the current sample set (i.e., overfitting).
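As a minimal sketch of these definitions (assuming a scalar parameter $w$, NumPy arrays of samples, and the loss form above; the function names are mine, not from the article):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def empirical_risk(w, x, y):
    """Empirical risk: the average per-sample loss over all N samples."""
    losses = y - sigmoid(w * x)   # L(y_i, f(x_i)) in the form used above
    return np.mean(losses)
```

Minimizing this quantity alone, with a flexible enough model, is exactly what leads to overfitting.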

To solve the overfitting problem, there are usually two methods: the first is to reduce the number of sample features (that is, the dimensionality); the second is what we discuss here, "regularization" (also known as a "penalty").

The general form of regularization is to add a regularization term to the average loss function (shown here for L2-norm regularization; other forms of regularization exist and have different effects):

$$R_{erm} = \frac{1}{N}\left(\sum_{i}^{N} L(y_i, f(x_i)) + \sum_{i}^{n} \lambda w_i^2\right)$$

The term $\sum_{i}^{n} \lambda w_i^2$ at the end is the regularization term. The larger $\lambda$ is, the heavier the penalty; $\lambda = 0$ means no penalty at all. Here $N$ is the number of samples and $n$ is the number of parameters.
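Continuing the sketch above under the same assumptions, the regularized risk simply adds the penalty term; setting lam to 0 recovers the plain empirical risk:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_risk(w, x, y, lam):
    """R_erm: average loss plus an L2 penalty on the parameters."""
    losses = y - sigmoid(w * x)            # per-sample losses, as before
    penalty = lam * np.sum(np.square(w))   # L2 regularization term
    return (np.sum(losses) + penalty) / len(y)
```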

The effect of the regularization term can be seen clearly in the following figures:

    • λ = 0: no regularization at all
    • λ = 1: an appropriate penalty
    • λ = +∞: an excessive penalty, leading to underfitting

As mentioned above, there are other forms of regularization, such as L1-norm regularization, which can be used to filter (select) parameters; these will be introduced in later articles.
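As a quick preview (a sketch of my own, not the content of those articles), the only difference between the two lies in the form of the penalty term:

```python
import numpy as np

def l2_penalty(w, lam):
    """L2 penalty: shrinks all weights smoothly toward zero."""
    return lam * np.sum(np.square(w))

def l1_penalty(w, lam):
    """L1 penalty: tends to drive some weights exactly to zero,
    which is why it can be used to filter (select) parameters."""
    return lam * np.sum(np.abs(w))
```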

Second, normalization

When analyzing data, we often encounter individual features measured on very different scales. Take, for example, the linear regression problem of predicting house prices: suppose the price is affected by three factors, the area of the house (square meters), its age (years), and the number of bedrooms, and that the information for one house is as follows:

    • Size (S): 150 sqm
    • Age (Y): 5 years

Suppose we solve this as a linear regression problem $y = wx$, and use gradient descent to find the optimal value of $w$.

For gradient descent to work efficiently, each descent step should bring us as close as possible to the optimum. Suppose the distance covered by a single step is:

$$distance = \lambda \delta^*$$

where $\delta^*$ denotes the modulus of the gradient and $\lambda$ denotes the step size. If the ranges of the two feature vectors differ greatly, the contour plot over the two variables will look "slender" (elongated).
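Rendered as code, a single descent step moves the parameters against the gradient by exactly that distance; a minimal sketch (the function name is mine):

```python
def gradient_descent_step(w, grad, step_size):
    """One update: move w against the gradient. The distance moved is
    step_size * |grad|, matching distance = lambda * delta above."""
    return w - step_size * grad
```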

When searching for the optimum, the gradient is perpendicular to the contour lines, so over such "slender" contours it zigzags back and forth; the greater the difference between the ranges of the two dimensions, the slower gradient descent becomes, and it may never converge.

To solve this problem, we can normalize all of the data into the interval 0 to 1 (other ranges such as 0 to 10 are also possible, but 0 to 1 is the usual choice), for example using the following normalization formula:

$$x_i^* = \frac{x_i - \bar{x}}{x_{max} - x_{min}}$$
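A minimal NumPy sketch of this formula, applied column by column (the extra rows of house data are invented for illustration):

```python
import numpy as np

def normalize(X):
    """Mean normalization per feature column: (x - mean) / (max - min)."""
    x_mean = X.mean(axis=0)
    x_range = X.max(axis=0) - X.min(axis=0)
    return (X - x_mean) / x_range

# Columns: [size in square meters, age in years]
X = np.array([[150.0,  5.0],
              [ 80.0, 20.0],
              [120.0, 12.0]])
print(normalize(X))   # every column now lies in a comparable, small range
```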

The contour plot then becomes much more "round".

With rounder contours, gradient descent finds the optimum much more quickly.

Postscript

In fact, before writing this article I struggled for a long time with the concept of "standardization" (standardization). After consulting many people, I found that the two most commonly used concepts are still normalization and regularization. People use different names on different occasions; in the end it is least ambiguous to stick to the English terms: normalization and regularization.

Reprinted from http://sobuhu.com/ml/2012/12/29/normalization-regularization.html
