Repost: Normalization and Regularization

Source: Internet
Author: User

Regularization and normalization: an analysis of their meanings

2012-12-29

Regularization and normalization (the latter also called standardization) are both data-preprocessing methods. They aim to make the data easier to compute with, or to yield more general results, without changing the nature of the problem. Below is a short popular-science look at what each of them does; if anything is wrong, corrections are welcome!

Preface

Note that these terms carry different meanings in different fields; here they refer only to their usage in machine learning research.

I. Regularization

Dr. Li Hang notes in his *Statistical Learning Methods* that the three elements of statistical learning are the model, the strategy, and the algorithm. In machine learning, the "model" is the probability distribution or decision function to be learned.

Suppose we face a logistic regression problem. The first thing we need to do is assume a hypothesis that can cover all possibilities: $y = wx$, where $w$ is the parameter vector and $x$ is the feature vector of a known sample. If $y_{i}$ denotes the actual value of the $i$-th sample and $f(x_{i})$ denotes its predicted value, then our loss function can be defined as:

$L(y_{i}, f(x_{i})) = y_{i} - \mathrm{sigmoid}(x_{i})$

You don't need to care what this function means exactly; just know that it represents an error. The average loss of the model $y = wx$ over all samples is called the "empirical risk" or "empirical loss". Obviously, the principle for finding the optimal model is to minimize the empirical risk (empirical risk minimization, ERM). Pursued alone, this goal makes the model more and more complex, until in the end it fits only the current sample set (that is, it over-fits).
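As a concrete illustration, here is a minimal sketch of computing the empirical risk as the average loss over a sample set. It uses the article's simplified per-sample loss (the gap between the label and $\mathrm{sigmoid}$ of the score), taken in absolute value; the data and weights are made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def empirical_risk(w, X, y):
    # Average, over all N samples, of the per-sample loss
    # |y_i - sigmoid(w . x_i)| (the article's simplified loss).
    preds = sigmoid(X @ w)
    return np.mean(np.abs(y - preds))

# Made-up sample set: 3 samples, 2 features.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, 0.0])
w = np.array([0.5, 0.5])

risk = empirical_risk(w, X, y)  # a scalar; minimizing it over w is ERM
```

Minimizing `risk` over `w` with no further constraint is exactly the ERM principle described above.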

There are two ways to combat over-fitting: the first is to reduce the number of features (dimensions) of the samples; the second is the so-called "regularization" (also called "penalty").

The general form of regularization is to append a regular term to the average loss function (this is L2-norm regularization; other forms of regularization exist and serve different purposes):

$R_{erm} = \frac{1}{N}\left(\sum_{i}^{N} L(y_{i}, f(x_{i})) + \sum_{i}^{n} \lambda w_{i}^{2}\right)$

$\sum_{i}^{n} \lambda w_{i}^{2}$ is the regularization term. A larger $\lambda$ means a heavier penalty; if it equals 0, there is no penalty at all. $N$ is the number of samples, and $n$ is the number of parameters.
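The regularized objective above can be sketched directly, with the L2 penalty folded into the averaged loss exactly as in the formula (the data and weights are illustrative only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_risk(w, X, y, lam):
    # (1/N) * ( sum_i L(y_i, f(x_i)) + sum_j lam * w_j^2 )
    preds = sigmoid(X @ w)
    data_loss = np.sum(np.abs(y - preds))
    penalty = lam * np.sum(w ** 2)
    return (data_loss + penalty) / len(y)

# Made-up sample set and weights.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, 0.0])
w = np.array([0.5, 0.5])

r0 = regularized_risk(w, X, y, lam=0.0)  # lam = 0: plain empirical risk
r1 = regularized_risk(w, X, y, lam=1.0)  # lam > 0: large weights cost extra
```

With `lam=0` the function reduces to the plain empirical risk; any positive `lam` charges the model for large weights, which is what discourages the over-complex models described above.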

The role of regularization can be seen clearly in the figure below (three panels, images not reproduced here):

    • λ = 0: no regularization
    • λ = 1: an appropriate penalty
    • λ = 100: an excessive penalty, causing under-fitting

As mentioned above, there are other forms of regularization, such as L1-norm regularization, which can be used to select parameters (drive some of them to zero). It will be introduced later in this series.
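For a quick numerical feel of the difference, here are the two penalty terms computed on a hypothetical weight vector (the L1 term sums absolute values, which is what lets it drive some parameters exactly to zero):

```python
import numpy as np

w = np.array([0.5, -0.2, 0.0, 3.0])   # hypothetical parameter vector
lam = 0.1

l2_penalty = lam * np.sum(w ** 2)     # L2: smooth shrinkage (used above)
l1_penalty = lam * np.sum(np.abs(w))  # L1: promotes sparsity / parameter selection
```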

II. Normalization

When analyzing data, we often find that different features of a single sample have very different dimensions (units). Take the linear regression problem of predicting house prices: suppose the price is affected by the house's area (square meters), its age (years), and its number of living rooms. The information for one house might be:

    • Area: 150 m²
    • Age: 5 years

Suppose we solve this as a linear regression problem $y = wx$, using gradient descent to find the optimal value of $w$.

For gradient descent to be efficient, each step should move as directly as possible toward the optimum. Suppose the distance covered by one descent step is:

$distance = \lambda \Delta^{*}$

where $\Delta^{*}$ is the modulus of the gradient and $\lambda$ is the step size. If the value ranges of the two features differ greatly, the contour plot of the loss surface becomes very "slim":

When searching for the optimum along the gradient, the "slim" contours force the path to zig-zag back and forth across the valley. The larger the difference between the two dimensions, the slower gradient descent becomes, and in the worst case it may take extremely long to converge.

To solve this problem, we normalize all data into the range 0 to 1 (other ranges, such as 0 to 10, are possible, but 0 to 1 is the usual choice), for example with the min-max normalization formula:

$x_{i}^{*} = \frac{x_{i} - x_{min}}{x_{max} - x_{min}}$
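The formula can be sketched directly. The values below are the hypothetical house features from above plus some made-up neighbours, normalized column by column:

```python
import numpy as np

def min_max_normalize(x):
    # x*_i = (x_i - x_min) / (x_max - x_min): maps the column into [0, 1]
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Made-up columns: areas (m^2) and ages (years) of several houses.
areas = min_max_normalize([150.0, 60.0, 90.0, 120.0])
ages = min_max_normalize([5.0, 30.0, 10.0, 1.0])
```

After this step, both features live on the same 0-to-1 scale, so neither dominates the gradient.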

The contours then become much "rounder":

We can clearly see that gradient descent will now head toward the optimum quickly.
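The effect can be demonstrated on a toy quadratic loss $f(w) = \frac{1}{2}\sum_j c_j w_j^2$, a stand-in for the real loss surface rather than the article's exact setup: with curvatures differing 100-fold the contours are "slim" and gradient descent needs hundreds of steps, while with equal curvatures ("round" contours, as after normalization) it converges almost immediately:

```python
import numpy as np

def gd_iterations(curvatures, w0, tol=1e-6, max_iter=100000):
    # Gradient descent on f(w) = 0.5 * sum(c_j * w_j^2), whose gradient
    # is c * w, using the optimal fixed step 2 / (c_max + c_min).
    # Returns the number of steps until ||w|| < tol.
    c = np.asarray(curvatures, dtype=float)
    lr = 2.0 / (c.max() + c.min())
    w = np.asarray(w0, dtype=float)
    for k in range(max_iter):
        if np.linalg.norm(w) < tol:
            return k
        w = w - lr * c * w
    return max_iter

# "Slim" contours: curvatures differ 100x (unnormalized features).
slim_steps = gd_iterations([1.0, 100.0], [1.0, 1.0])
# "Round" contours: equal curvatures (normalized features).
round_steps = gd_iterations([1.0, 1.0], [1.0, 1.0])
```

Even with the best possible fixed step size for each problem, the ill-conditioned ("slim") case takes several hundred iterations where the well-conditioned one needs a single step.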

Postscript

In fact, before writing this article I was tangled up in the concept of "standardization" for a long time. After consulting many people, I found that the two most common concepts are normalization and regularization; different people use different names on different occasions. In English, at least, there is no ambiguity: normalization and regularization.

