`特征归一化` (Feature Normalization)

This technique goes by several names, such as `特征缩放` (feature scaling), `Feature Normalization`, and `Feature Scaling`.

Data standardization (normalization) is a basic step in data mining. Different evaluation indicators often have different dimensions and units, which affects the results of data analysis. To eliminate the dimensional differences between indicators, the data must be standardized so that the indicators become comparable. After standardization, all indicators are on the same order of magnitude and can be compared and evaluated together.

Why `特征归一化` (feature normalization) matters:

- It puts every feature on a consistent scale, which is required by algorithms that rely on distance measures
- It accelerates the convergence of gradient descent
- In the SVM algorithm, consistently scaled features speed up the search for support vectors
- Different machine learning algorithms accept different ranges of input values

Here are two common normalization methods:

- Min-max normalization (linear normalization)

Also known as dispersion normalization, this is a linear transformation of the original data that maps the result into the range [0, 1]. The conversion function is:

`x^* = \frac{x - min}{max - min}`

This method scales the original data proportionally, where `x^*` is the normalized value, `x` is the original value, `max` is the maximum of the sample data, and `min` is the minimum of the sample data.
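The formula above can be sketched in a few lines of plain Python (a minimal illustration; `min_max` is just a helper written for this post, not a library function):

```python
# Min-max normalization of a single feature column: (x - min) / (max - min)
def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# feature 1 from the worked example later in this post
feature_1 = [10001, 16020, 12008, 13131]
print([round(v, 4) for v in min_max(feature_1)])
# [0.0, 1.0, 0.3334, 0.52]
```

The minimum maps to 0, the maximum to 1, and everything else lands proportionally in between.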

Disadvantages:

- When new data is added, `max` and `min` may change and must be recomputed.
- It is unstable when the data contains outliers or a lot of noise.

Advantages:

- When feature values need to be mapped into a given range [a, b], choose `MinMaxScaler`.

- Zero-mean standardization (z-score standardization)

This method standardizes the data using the mean and standard deviation of the original data. The original data set is transformed into one with mean 0 and variance 1. The conversion function is:

`x^* = \frac{x - \mu}{\sigma}`

where `μ` and `σ` are the mean and standard deviation of the original data set, respectively. This method works best when the distribution of the original data is approximately Gaussian; otherwise the normalization is less effective.
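A plain-Python sketch of the z-score formula (again an illustrative helper, not a library function; it uses the population standard deviation, which is what scikit-learn's `StandardScaler` uses):

```python
import math

def z_score(values):
    mu = sum(values) / len(values)
    # population standard deviation (divide by n, i.e. ddof=0)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [(v - mu) / sigma for v in values]

# feature 2 from the worked example later in this post
feature_2 = [2, 4, 6, 8]
print([round(v, 4) for v in z_score(feature_2)])
# [-1.3416, -0.4472, 0.4472, 1.3416]
```

The output column has mean 0 and standard deviation 1 by construction.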

Advantages:

- Suitable when the minimum and maximum values of the data are unknown, or when outliers are present.

- Comparison

These are two common normalization techniques; in which scenarios is each preferred? When is the first method better, and when is the second? A brief summary:

- In classification and clustering algorithms, when distance is used to measure similarity, or when PCA is used for dimensionality reduction, the second method (z-score standardization) performs better.
- The first method (or another normalization method) can be used when distance metrics and covariance calculations are not involved and the data does not follow a Gaussian distribution. For example, in image processing, after an RGB image is converted to grayscale, its values are limited to the range [0, 255].

Below, we implement the above in `Python`.

For example, suppose there are 4 samples with the following features:

Sample | Feature 1 | Feature 2
---|---|---
1 | 10001 | 2
2 | 16020 | 4
3 | 12008 | 6
4 | 13131 | 8

Before normalization, feature 1 and feature 2 are clearly not on the same order of magnitude. After min-max normalization, the features become:

Sample | Feature 1 | Feature 2
---|---|---
1 | 0 | 0
2 | 1 | 0.33
3 | 0.33 | 0.67
4 | 0.52 | 1

Min-max normalization with `sklearn.preprocessing.MinMaxScaler`

In `sklearn`, `sklearn.preprocessing.MinMaxScaler` is a method for feature normalization. An example of its use:

```python
from sklearn.preprocessing import MinMaxScaler

x = [[10001, 2], [16020, 4], [12008, 6], [13131, 8]]
min_max_scaler = MinMaxScaler()
X_train_minmax = min_max_scaler.fit_transform(x)
X_train_minmax  # normalized result
# array([[ 0.        ,  0.        ],
#        [ 1.        ,  0.33333333],
#        [ 0.33344409,  0.66666667],
#        [ 0.52001994,  1.        ]])
```

By default each feature is normalized to [0, 1], but the output range is adjustable via the `feature_range` parameter of `MinMaxScaler`. The following code normalizes the features to [-1, 1].

```python
min_max_scaler = MinMaxScaler(feature_range=(-1, 1))
X_train_minmax = min_max_scaler.fit_transform(x)
X_train_minmax  # normalized result
# array([[-1.        , -1.        ],
#        [ 1.        , -0.33333333],
#        [-0.33311182,  0.33333333],
#        [ 0.04003988,  1.        ]])
```

The implementation formula of `MinMaxScaler` is as follows:

```python
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_scaled = X_std * (max - min) + min
```

This is a vectorized expression in which X is a matrix, where:

- `X_std` is X normalized to [0, 1]
- `X.min(axis=0)` is the column-wise minimum
- `max` and `min` are the bounds given by the `feature_range` parameter of `MinMaxScaler`, i.e. the range of the final result

The following example shows the calculation process (with max=1, min=0).

Sample | Feature 1 | Feature 2
---|---|---
1 | 10001 | 2
2 | 16020 | 4
3 | 12008 | 6
4 | 13131 | 8
X.max | 16020 | 8
X.min | 10001 | 2

The normalization process is as follows, with the normalized matrix denoted s:

- s11 = (10001-10001)/(16020-10001) = 0
- s21 = (16020-10001)/(16020-10001) = 1
- s31 = (12008-10001)/(16020-10001) = 0.333444
- s41 = (13131-10001)/(16020-10001) = 0.52002
- s12 = (2-2)/(8-2) = 0
- s22 = (4-2)/(8-2) = 0.3333
- s32 = (6-2)/(8-2) = 0.6667
- s42 = (8-2)/(8-2) = 1

The results agree with the `MinMaxScaler` output above.
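The two-step formula, including `feature_range`, can also be sketched per column without scikit-learn (a minimal illustration; `min_max_scale` and the `f_min`/`f_max` names are made up for this post):

```python
def min_max_scale(column, f_min=0.0, f_max=1.0):
    lo, hi = min(column), max(column)
    # step 1: X_std = (X - X.min) / (X.max - X.min)
    std = [(v - lo) / (hi - lo) for v in column]
    # step 2: X_scaled = X_std * (max - min) + min, with (min, max) = feature_range
    return [s * (f_max - f_min) + f_min for s in std]

feature_2 = [2, 4, 6, 8]
print([round(v, 2) for v in min_max_scale(feature_2, -1, 1)])
# [-1.0, -0.33, 0.33, 1.0]
```

With the default `f_min=0, f_max=1`, step 2 is a no-op and the result is plain min-max normalization.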

Zero-mean standardization with `StandardScaler`

In `sklearn`, `sklearn.preprocessing.StandardScaler` is a method for feature standardization (see also `sklearn.preprocessing.robust_scale` for an outlier-robust variant). An example of its use:

```python
from sklearn.preprocessing import StandardScaler

x = [[10001, 2], [16020, 4], [12008, 6], [13131, 8]]
X_scaler = StandardScaler()
X_train = X_scaler.fit_transform(x)
X_train  # standardized result
# array([[-1.2817325 , -1.34164079],
#        [ 1.48440157, -0.4472136 ],
#        [-0.35938143,  0.4472136 ],
#        [ 0.15671236,  1.34164079]])
```

After standardization, each column of the matrix has mean 0 and standard deviation 1. Note that the standard deviation here is the population standard deviation (delta degrees of freedom, ddof = 0), which differs from the conventional sample formula that divides by n-1. (In NumPy, `np.std()` computes this population standard deviation by default.)
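The difference between the two standard deviations can be seen with the standard library's `statistics` module (population vs. sample, on feature 2 of the example):

```python
import statistics

x = [2, 4, 6, 8]
# population standard deviation (divide by n): what StandardScaler uses,
# and what np.std() computes by default (ddof=0)
print(statistics.pstdev(x))  # 2.236...
# sample standard deviation (divide by n-1): the conventional formula
print(statistics.stdev(x))   # 2.581...
```

With only 4 samples the two values differ noticeably; for large n the difference shrinks.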

`StandardScaler` normalizes each feature by subtracting the column mean and then dividing by the column standard deviation.

The following example shows the calculation process; note that the standard deviation is computed as `np.std()` would compute it.

Sample | Feature 1 | Feature 2
---|---|---
1 | 10001 | 2
2 | 16020 | 4
3 | 12008 | 6
4 | 13131 | 8
Column mean | 12790 | 5
Column standard deviation | 2175.96 | 2.236

The standardization process is as follows, with the standardized matrix denoted s:

- s11 = (10001-12790)/2175.96 = -1.2817
- s21 = (16020-12790)/2175.96 = 1.4844
- s31 = (12008-12790)/2175.96 = -0.3594
- s41 = (13131-12790)/2175.96 = 0.1567
- s12 = (2-5)/2.236 = -1.3416
- s22 = (4-5)/2.236 = -0.4472
- s32 = (6-5)/2.236 = 0.4472
- s42 = (8-5)/2.236 = 1.3416
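The column-1 values worked out above can be reproduced with a short check (pure Python, using the population standard deviation as before):

```python
import statistics

feature_1 = [10001, 16020, 12008, 13131]
mu = statistics.mean(feature_1)       # 12790
sigma = statistics.pstdev(feature_1)  # ~2175.96
print([round((v - mu) / sigma, 4) for v in feature_1])
# [-1.2817, 1.4844, -0.3594, 0.1567]
```

These match the `StandardScaler` output above up to rounding.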

Some thoughts on feature normalization