When does a machine learning algorithm need normalization?

Source: Internet
Author: User
Tags: mathematical functions

Reprint: http://www.cnblogs.com/LBSer/p/4440590.html

Machine learning models are widely used in the Internet industry. In machine learning applications, most of the time is usually spent on feature processing, and one key step is normalizing the feature data. Why normalize? Wikipedia gives two reasons: 1) normalization speeds up gradient descent in finding the optimal solution, and 2) normalization can improve accuracy. Below I briefly expand on these two points.

1 Why does normalization speed up gradient descent?
The Stanford machine learning video explains this well: HTTPS://CLASS.COURSERA.ORG/ML-003/LECTURE/21

In the figure from the video, the blue rings are contour lines of the cost function over two features. In the left figure the ranges of the two features X1 and X2 differ greatly (one of them spans just [1,5]), so the contours are long and narrow. Gradient descent then tends to follow a zigzag route, because each step is perpendicular to the contour line, and many iterations are needed to converge.

In the right figure the two original features have been normalized, so the contours are nearly circular and gradient descent converges quickly.

Therefore, if a machine learning model is trained with gradient descent, normalization is often necessary; otherwise convergence is slow, or the method may fail to converge at all.
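The effect of feature scale on convergence can be sketched on a toy problem. The snippet below is a minimal illustration, not the example from the video: it assumes a bowl-shaped cost f(w) = 0.5 * (c1*w1^2 + c2*w2^2), where the curvatures c1 and c2 stand in for the squared scales of two features, and the helper name gradient_descent_steps is hypothetical.

```python
def gradient_descent_steps(curvatures, start, tol=1e-6, max_steps=100_000):
    """Count gradient-descent steps needed to minimise
    f(w) = 0.5 * sum(c_i * w_i**2), whose minimum is at w = 0."""
    # Assumption: use the step size 1 / (largest curvature), which keeps
    # the steepest direction from overshooting.
    lr = 1.0 / max(curvatures)
    w = list(start)
    for step in range(1, max_steps + 1):
        grad = [c * wi for c, wi in zip(curvatures, w)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
        if max(abs(wi) for wi in w) < tol:
            return step
    return max_steps


# Feature scales 30 and 1 give curvatures 900 and 1 (elongated contours).
unscaled_steps = gradient_descent_steps([900.0, 1.0], start=[1.0, 1.0])
# After normalization both curvatures are equal (circular contours).
scaled_steps = gradient_descent_steps([1.0, 1.0], start=[1.0, 1.0])

print(unscaled_steps, scaled_steps)
```

With equal curvatures the first step lands on the minimum, while the elongated bowl needs thousands of small steps along its flat direction.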

2 Normalization can improve accuracy
Some classifiers, such as KNN, need to compute distances between samples (for example, the Euclidean distance). If one feature has a very large range, the distance is determined mainly by that feature, which may not match the actual situation (for example, when the feature with the small range is in fact the more important one).
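A small sketch of how the nearest neighbor can change once features are put on a common scale. The samples, the feature names (income, age), and the assumed ranges are all hypothetical and chosen only for illustration.

```python
import math


def min_max_scale(point, ranges):
    """Map each coordinate to [0, 1] using its (low, high) range."""
    return tuple((v - lo) / (hi - lo) for v, (lo, hi) in zip(point, ranges))


def nearest(query, candidates):
    """Return the name of the candidate with the smallest Euclidean distance."""
    return min(candidates, key=lambda name: math.dist(query, candidates[name]))


# Hypothetical samples: (income, age).
query = (50_000, 25)
candidates = {"a": (50_500, 60), "b": (52_000, 26)}
# Assumed feature ranges used for scaling.
ranges = [(0, 100_000), (0, 100)]

nearest_raw = nearest(query, candidates)
nearest_scaled = nearest(
    min_max_scale(query, ranges),
    {name: min_max_scale(p, ranges) for name, p in candidates.items()},
)
print(nearest_raw, nearest_scaled)
```

On the raw values the income difference dominates and sample "a" is nearest, even though its age is very different; after scaling, sample "b" is nearest.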

3 Types of Normalization
1) Linear normalization

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$

This method suits data whose values are fairly concentrated. Its weakness is that if max and min are unstable, the normalized results are unstable too, which makes downstream results unstable. In practice, max and min can be replaced with empirical constants.
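A minimal sketch of linear (min-max) normalization, including the option of fixed empirical bounds mentioned above. The function name min_max_normalize and its lo/hi parameters are assumptions made for this example.

```python
def min_max_normalize(values, lo=None, hi=None):
    """Linearly rescale values so that lo maps to 0 and hi maps to 1.

    lo and hi default to the observed min and max; passing fixed
    empirical constants keeps the mapping stable across batches.
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        # All values identical: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


batch = [10, 20, 30]
print(min_max_normalize(batch))                # observed min and max
print(min_max_normalize(batch + [1000]))       # one outlier shifts everything
print(min_max_normalize(batch, lo=0, hi=40))   # fixed empirical bounds
```

The second call shows the instability: a single large value changes max, and the original three values are squeezed close to 0.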

2) Standard deviation normalization (z-score standardization)
The processed data has mean 0 and standard deviation 1 (it follows the standard normal distribution when the original data is normally distributed). The conversion function is:

$$x' = \frac{x - \mu}{\sigma}$$

where μ is the mean of all sample data and σ is the standard deviation of all sample data.
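The standardization step can be sketched with Python's standard library. The function name standardize is hypothetical, and the sketch assumes the population standard deviation is intended, since the text speaks of all sample data.

```python
import statistics


def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    mu = statistics.fmean(values)
    # Population standard deviation, matching "all sample data" in the text.
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # Constant feature: every z-score is defined here as 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


z = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(z)
print(statistics.fmean(z), statistics.pstdev(z))
```

The second print checks the property stated above: the transformed values have mean close to 0 and standard deviation close to 1.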

3) Nonlinear Normalization
Often used when the data is widely spread, with some values very large and others very small. The original values are mapped through a mathematical function such as log, exponential, or tangent. The shape of the nonlinear function, for example log(V, 2) or log(V, 10), has to be chosen according to the distribution of the data.
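As a rough sketch of the log variant, the snippet below maps values spanning several orders of magnitude with a chosen base. The function name log_normalize is hypothetical, and the sketch assumes strictly positive inputs, since the logarithm is undefined otherwise.

```python
import math


def log_normalize(values, base=10):
    """Apply a logarithm of the given base to strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("log is only defined for positive values")
    return [math.log(v, base) for v in values]


raw = [1, 10, 100, 1_000, 10_000]
print(log_normalize(raw))           # base 10: roughly evenly spaced
print(log_normalize(raw, base=2))   # base 2: the same values times a constant
```

Changing the base only rescales the result by a constant factor, so the choice mainly affects the numeric range of the output.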

In addition, it is important to know which models need normalization before training: for example, an SVM needs normalization, while a decision tree (DT) does not. Knowing this is part of applying the algorithms well.
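One way to see why a decision tree is insensitive to scaling is to look at a single threshold split. The sketch below is a simplified illustration, assuming distinct feature values, binary 0/1 labels, and a misclassification-count criterion (tree libraries typically use Gini or entropy instead); the helper best_split and the data are hypothetical.

```python
def best_split(values, labels):
    """Return the indices sent to the left branch by the best
    single-threshold split of one feature with 0/1 labels."""
    order = sorted(range(len(values)), key=lambda i: values[i])

    def errors(side):
        # A leaf predicts its majority class; the minority is misclassified.
        ones = sum(labels[i] for i in side)
        return min(ones, len(side) - ones)

    best_k = min(
        range(1, len(order)),
        key=lambda k: errors(order[:k]) + errors(order[k:]),
    )
    return frozenset(order[:best_k])


# Hypothetical feature values and binary labels.
values = [1200, 300, 1800, 450, 900, 1500]
labels = [1, 0, 1, 0, 0, 1]

raw_left = best_split(values, labels)
scaled_left = best_split([v / 1800 for v in values], labels)
print(raw_left == scaled_left)
```

The split depends only on the ordering of the values, and rescaling by a positive constant keeps that ordering, so the same samples go left before and after scaling.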
