Repost: Normalization and Regularization

Source: Internet
Author: User

Regularization and normalization: an analysis of their meanings

2012-12-29

Regularization and normalization (the latter also called standardization) are both data-preprocessing methods. They aim to make the data easier to compute with, or to yield more general results, without changing the nature of the problem. Below is a short popular-science look at what each of them does; if anything is wrong, corrections are welcome!

Preface

Note that these terms carry different meanings in different fields; here they refer only to their usage in machine learning research.

I. Regularization

Dr. Li Hang notes in his *Statistical Learning Methods* that the three elements of statistical learning are the model, the strategy, and the algorithm. In machine learning, the "model" is the probability distribution or decision function to be learned.

Suppose we face a logistic regression problem. The first thing we need to do is assume a hypothesis that can cover all possibilities: $y = wx$, where $w$ is the parameter vector and $x$ is the feature vector of a known sample. If $y_{i}$ denotes the actual value of the $i$-th sample and $f(x_{i})$ denotes its predicted value, then our loss function can be defined as:

$L(y_{i}, f(x_{i})) = y_{i} - \mathrm{sigmoid}(x_{i})$

You don't need to care what this function means exactly; just know that it represents an error. The average loss of the model $y = wx$ over all samples is called the "empirical risk" or "empirical loss". Obviously, the principle for finding the optimal model is to minimize the empirical risk (empirical risk minimization, ERM). Pursued alone, this goal makes the model more and more complex, until in the end it fits only the current sample set (that is, it over-fits).
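As a concrete illustration, here is a minimal sketch of computing the empirical risk as the average loss over a sample set. It uses the article's simplified per-sample loss (the gap between the label and $\mathrm{sigmoid}$ of the score), taken in absolute value; the data and weights are made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def empirical_risk(w, X, y):
    # Average, over all N samples, of the per-sample loss
    # |y_i - sigmoid(w . x_i)| (the article's simplified loss).
    preds = sigmoid(X @ w)
    return np.mean(np.abs(y - preds))

# Made-up sample set: 3 samples, 2 features.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, 0.0])
w = np.array([0.5, 0.5])

risk = empirical_risk(w, X, y)  # a scalar; minimizing it over w is ERM
```

Minimizing `risk` over `w` with no further constraint is exactly the ERM principle described above.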

There are two ways to combat over-fitting: the first is to reduce the number of features (dimensions) of the samples; the second is the so-called "regularization" (also called "penalty").

The general form of regularization is to append a regular term to the average loss function (this is L2-norm regularization; other forms of regularization exist and serve different purposes):

$R_{erm} = \frac{1}{N}\left(\sum_{i}^{N} L(y_{i}, f(x_{i})) + \sum_{i}^{n} \lambda w_{i}^{2}\right)$

$\sum_{i}^{n} \lambda w_{i}^{2}$ is the regularization term. A larger $\lambda$ means a heavier penalty; if it equals 0, there is no penalty at all. $N$ is the number of samples, and $n$ is the number of parameters.
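The regularized objective above can be sketched directly, with the L2 penalty folded into the averaged loss exactly as in the formula (the data and weights are illustrative only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_risk(w, X, y, lam):
    # (1/N) * ( sum_i L(y_i, f(x_i)) + sum_j lam * w_j^2 )
    preds = sigmoid(X @ w)
    data_loss = np.sum(np.abs(y - preds))
    penalty = lam * np.sum(w ** 2)
    return (data_loss + penalty) / len(y)

# Made-up sample set and weights.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, 0.0])
w = np.array([0.5, 0.5])

r0 = regularized_risk(w, X, y, lam=0.0)  # lam = 0: plain empirical risk
r1 = regularized_risk(w, X, y, lam=1.0)  # lam > 0: large weights cost extra
```

With `lam=0` the function reduces to the plain empirical risk; any positive `lam` charges the model for large weights, which is what discourages the over-complex models described above.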

The role of regularization can be seen clearly in the figure below (three panels, images not reproduced here):

    • λ = 0: no regularization
    • λ = 1: an appropriate penalty
    • λ = 100: an excessive penalty, causing under-fitting

As mentioned above, there are other forms of regularization, such as L1-norm regularization, which can be used to select parameters (drive some of them to zero). It will be introduced later in this series.
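For a quick numerical feel of the difference, here are the two penalty terms computed on a hypothetical weight vector (the L1 term sums absolute values, which is what lets it drive some parameters exactly to zero):

```python
import numpy as np

w = np.array([0.5, -0.2, 0.0, 3.0])   # hypothetical parameter vector
lam = 0.1

l2_penalty = lam * np.sum(w ** 2)     # L2: smooth shrinkage (used above)
l1_penalty = lam * np.sum(np.abs(w))  # L1: promotes sparsity / parameter selection
```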

II. Normalization

When analyzing data, we often find that different features of a single sample have very different dimensions (units). Take the linear regression problem of predicting house prices: suppose the price is affected by the house's area (square meters), its age (years), and its number of living rooms. The information for one house might be:

    • Area: 150 m²
    • Age: 5 years

Suppose we solve this as a linear regression problem $y = wx$, using gradient descent to find the optimal value of $w$.

For gradient descent to be efficient, each step should move as directly as possible toward the optimum. Suppose the distance covered by one descent step is:

$distance = \lambda \Delta^{*}$

where $\Delta^{*}$ is the modulus of the gradient and $\lambda$ is the step size. If the value ranges of the two features differ greatly, the contour plot of the loss surface becomes very "slim":

When searching for the optimum along the gradient, the "slim" contours force the path to zig-zag back and forth across the valley. The larger the difference between the two dimensions, the slower gradient descent becomes, and in the worst case it may take extremely long to converge.

To solve this problem, we normalize all data into the range 0 to 1 (other ranges, such as 0 to 10, are possible, but 0 to 1 is the usual choice), for example with the min-max normalization formula:

$x_{i}^{*} = \frac{x_{i} - x_{min}}{x_{max} - x_{min}}$
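The formula can be sketched directly. The values below are the hypothetical house features from above plus some made-up neighbours, normalized column by column:

```python
import numpy as np

def min_max_normalize(x):
    # x*_i = (x_i - x_min) / (x_max - x_min): maps the column into [0, 1]
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Made-up columns: areas (m^2) and ages (years) of several houses.
areas = min_max_normalize([150.0, 60.0, 90.0, 120.0])
ages = min_max_normalize([5.0, 30.0, 10.0, 1.0])
```

After this step, both features live on the same 0-to-1 scale, so neither dominates the gradient.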

The contours then become much "rounder":

We can clearly see that gradient descent will now head toward the optimum quickly.
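The effect can be demonstrated on a toy quadratic loss $f(w) = \frac{1}{2}\sum_j c_j w_j^2$, a stand-in for the real loss surface rather than the article's exact setup: with curvatures differing 100-fold the contours are "slim" and gradient descent needs hundreds of steps, while with equal curvatures ("round" contours, as after normalization) it converges almost immediately:

```python
import numpy as np

def gd_iterations(curvatures, w0, tol=1e-6, max_iter=100000):
    # Gradient descent on f(w) = 0.5 * sum(c_j * w_j^2), whose gradient
    # is c * w, using the optimal fixed step 2 / (c_max + c_min).
    # Returns the number of steps until ||w|| < tol.
    c = np.asarray(curvatures, dtype=float)
    lr = 2.0 / (c.max() + c.min())
    w = np.asarray(w0, dtype=float)
    for k in range(max_iter):
        if np.linalg.norm(w) < tol:
            return k
        w = w - lr * c * w
    return max_iter

# "Slim" contours: curvatures differ 100x (unnormalized features).
slim_steps = gd_iterations([1.0, 100.0], [1.0, 1.0])
# "Round" contours: equal curvatures (normalized features).
round_steps = gd_iterations([1.0, 1.0], [1.0, 1.0])
```

Even with the best possible fixed step size for each problem, the ill-conditioned ("slim") case takes several hundred iterations where the well-conditioned one needs a single step.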

Postscript

In fact, before writing this article I was tangled up in the concept of "standardization" for a long time. After consulting many people, I found that the two most common concepts are normalization and regularization; different people use different names on different occasions. In English, at least, there is no ambiguity: normalization and regularization.

