This post mainly summarizes the following article on the difference between the two:
http://www.zhaokv.com/2016/01/normalization-and-standardization.html
Normalization
The original data are linearly mapped to the [0, 1] interval:

x' = (x - x_min) / (x_max - x_min)
Because the minimum and maximum are highly sensitive to outliers, this method is not very robust, so it is better suited to traditional small-data scenarios.
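The mapping above can be sketched in a few lines (the function name `min_max_normalize` is mine, not from the article):

```python
import numpy as np

def min_max_normalize(x):
    """Linearly rescale a 1-D array to the [0, 1] interval."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

data = np.array([2.0, 4.0, 6.0, 10.0])
print(min_max_normalize(data))  # [0.   0.25 0.5  1.  ]
```

Note that a single extreme value in `data` would compress all the other points into a tiny sub-interval, which is exactly the robustness problem mentioned above.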
Standardization
The most commonly used method is z-score standardization, which transforms the data to have mean 0 and standard deviation 1:

x' = (x - μ) / σ
Here μ is the sample mean and σ is the sample standard deviation; this method is appropriate when the sample size is large enough.
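A minimal sketch of z-score standardization, estimating μ and σ from the sample itself:

```python
import numpy as np

def z_score(x):
    """Shift a 1-D array to mean 0 and scale it to standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
z = z_score(data)
print(z.mean(), z.std())  # approximately 0.0 and 1.0
```

Unlike min-max normalization, the result is not confined to a fixed interval; an outlier shifts μ and inflates σ but does not squash the rest of the data onto one end of [0, 1].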
Theoretical explanation
The rationale for normalization is very simple: it removes the influence of units (dimension) on the result, making different variables comparable.
The rationale for standardization is more involved. A standardized value expresses how many standard deviations the original value lies from the mean; it is a relative quantity, so it also removes the dimension, and it brings two additional benefits: the mean becomes 0 and the variance becomes 1.
A mean of 0 brings many conveniences. For example, performing SVD on centered data is equivalent to performing PCA on the original data; and in machine learning many functions, such as sigmoid, tanh and softmax, are centered around 0 (though not necessarily symmetric), which deserves further elaboration.
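The SVD/PCA equivalence can be checked numerically: on centered data, the squared singular values divided by (n - 1) equal the eigenvalues of the covariance matrix, and the right singular vectors are the principal axes. A sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Xc = X - X.mean(axis=0)          # center each column (mean 0)

# SVD of the centered data matrix
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# PCA via eigendecomposition of the sample covariance matrix
cov = Xc.T @ Xc / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)

# Squared singular values / (n - 1) match the covariance eigenvalues
print(np.allclose(np.sort(S**2 / (len(X) - 1)), np.sort(eigvals)))  # True
```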
The benefit of standard deviation 1: consider the distance between two points x and y, written as a sum of per-attribute contributions:

d(x, y) = Σ_{j=1}^{p} d_j(x_j, y_j)

where d_j(x_j, y_j) is the distance between the two points on attribute j, and its share of the total is the weight of that attribute in the distance. Note that even though we cannot make every attribute contribute to the same degree for each individual pair of points, for a given dataset the average distance between all pairs of points is a constant:

d̄ = (1/N²) Σ_{i,k} d(x_i, x_k) = Σ_{j=1}^{p} d̄_j,  where  d̄_j = (1/N²) Σ_{i,k} d_j(x_{ij}, x_{kj})

Here d̄_j measures the effect of attribute j on the overall average distance, so if all the d̄_j are equal, every attribute contributes the same amount to the average distance over the whole dataset. Now further assume the squared Euclidean distance (the squared 2-norm), i.e. d_j(x_{ij}, x_{kj}) = (x_{ij} - x_{kj})²; then

d̄_j = (1/N²) Σ_{i,k} (x_{ij} - x_{kj})² = 2 s_j²

where s_j² is the sample estimate of the variance of attribute j. In other words, each attribute's contribution to the average distance is proportional to its variance on the dataset. If we make every standard deviation equal to 1, every dimension carries the same weight when distances are computed.
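The identity above, average squared pairwise difference = 2 × variance, and the resulting equal-contribution property after z-scoring, can be verified directly (the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.0, size=200)   # one attribute, N samples
n = len(x)

# Average squared difference over all N^2 ordered pairs of points
avg_sq_diff = np.sum((x[:, None] - x[None, :]) ** 2) / n**2

# It equals twice the (population, ddof=0) variance of the attribute
print(np.isclose(avg_sq_diff, 2 * x.var()))  # True

# After z-scoring, the attribute's average contribution is the constant 2,
# so every standardized dimension contributes equally to average distance
z = (x - x.mean()) / x.std()
print(np.isclose(np.sum((z[:, None] - z[None, :]) ** 2) / n**2, 2.0))  # True
```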
If you want each dimension to play the same role in computing distances, choose standardization; if you want to preserve the potential weight relationships reflected by the standard deviations of the original data, choose normalization. Standardization is better suited to modern, noisy big-data scenarios.