1. Centering data
Centering data means subtracting the mean of the dataset from every value in the dataset.
For example, the dataset 1, 2, 3, 6, 3 has a mean of 3, so the centered dataset is 1-3, 2-3, 3-3, 6-3, 3-3, i.e. -2, -1, 0, 3, 0.
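The same centering can be written directly in R without any helper function. This is a minimal sketch; the variable names `x` and `centered` are only illustrative.

```r
# Centering by hand: subtract the mean from every element of the vector.
# `x` and `centered` are illustrative names for the example dataset.
x <- c(1, 2, 3, 6, 3)
centered <- x - mean(x)  # mean(x) is 3
print(centered)          # [1] -2 -1  0  3  0
```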
2. Standardizing data
Standardizing data means dividing the centered data by the standard deviation of the dataset: subtract the mean of the dataset from each value, then divide by the (sample) standard deviation of the dataset.
For example, the dataset 1, 2, 3, 6, 3 has a mean of 3 and a standard deviation of 1.87, so the standardized dataset is (1-3)/1.87, (2-3)/1.87, (3-3)/1.87, (6-3)/1.87, (3-3)/1.87, i.e. -1.069, -0.535, 0, 1.604, 0.
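As a worked check of the example, the 1.87 is the sample standard deviation, which divides the sum of squared deviations by n - 1 = 4 rather than by n = 5 (dividing by n would give about 1.673 instead):

```latex
\bar{x} = \frac{1 + 2 + 3 + 6 + 3}{5} = 3, \qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
  = \sqrt{\frac{4 + 1 + 0 + 9 + 0}{4}} = \sqrt{3.5} \approx 1.8708,
\qquad
z_i = \frac{x_i - \bar{x}}{s}, \quad \text{e.g. } z_1 = \frac{-2}{1.8708} \approx -1.069 .
```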
Centering and standardization serve the same purpose: removing the influence of measurement units (dimensions) on the structure of the data.
The scale function in R can be used to center and standardize data:
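To illustrate the point about units, the sketch below uses R's built-in scale() function on two variables measured on very different scales. The variable names and values (heights in cm, weights in kg) are made up purely for illustration; after standardization both columns have mean 0 and standard deviation 1, so neither dominates just because of its units.

```r
# Hypothetical data: two variables in different units (values are made up).
m <- cbind(height_cm = c(160, 170, 180), weight_kg = c(50, 65, 80))

# scale() centers and standardizes each column separately
z <- scale(m)

print(colMeans(z))      # both columns have mean 0
print(apply(z, 2, sd))  # both columns have standard deviation 1
```

Here both columns end up as -1, 0, 1: the height column had standard deviation 10 and the weight column 15, and dividing by those removes the difference in units.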
> # Print numbers with 3 significant digits
> options(digits = 3)
> data <- c(1, 2, 3, 6, 3)
> # Centering
> scale(data, center = TRUE, scale = FALSE)
     [,1]
[1,]   -2
[2,]   -1
[3,]    0
[4,]    3
[5,]    0
attr(,"scaled:center")
[1] 3
> # Standardization
> scale(data, center = TRUE, scale = TRUE)
       [,1]
[1,] -1.069
[2,] -0.535
[3,]  0.000
[4,]  1.604
[5,]  0.000
attr(,"scaled:center")
[1] 3
attr(,"scaled:scale")
[1] 1.87
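To check that scale() computes the same thing as the formula in section 2, the sketch below compares it with a hand-written version built from mean() and sd() (sd() uses the n - 1 denominator, which matches the 1.87 above). The names z_scale, z_manual and s are illustrative only.

```r
data <- c(1, 2, 3, 6, 3)

# scale() returns a one-column matrix; as.vector() drops the dim and attributes
z_scale <- as.vector(scale(data, center = TRUE, scale = TRUE))

# Hand-written standardization: subtract the mean, divide by the sample sd
z_manual <- (data - mean(data)) / sd(data)

stopifnot(isTRUE(all.equal(z_scale, z_manual)))

# The constants that were used are stored as attributes of scale()'s result
s <- scale(data)
print(attr(s, "scaled:center"))  # [1] 3
print(attr(s, "scaled:scale"))   # sqrt(3.5), printed as 1.870829 by default
```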
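The two parameters of scale, center and scale, work as follows:
1. center and scale both default to TRUE (T also works; R is case-sensitive, so lowercase t and f do not).
2. center = TRUE centers the data by subtracting the mean.
3. scale = TRUE divides the data by the standard deviation, so center = TRUE together with scale = TRUE standardizes the data. With center = FALSE, scale = TRUE divides by the root-mean-square sqrt(sum(x^2)/(n-1)) instead of the standard deviation.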
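The last point is easy to miss: scale = TRUE on its own does not standardize. The sketch below (variable names rms and only_scaled are illustrative) checks that with center = FALSE the divisor is the root-mean-square of the raw values, as described in R's documentation for scale().

```r
data <- c(1, 2, 3, 6, 3)

# With center = FALSE, scale() divides by sqrt(sum(x^2) / (n - 1)),
# not by the standard deviation.
rms <- sqrt(sum(data^2) / (length(data) - 1))  # sqrt(59 / 4)
only_scaled <- scale(data, center = FALSE, scale = TRUE)

stopifnot(isTRUE(all.equal(as.vector(only_scaled), data / rms)))
print(attr(only_scaled, "scaled:scale"))  # sqrt(14.75), about 3.8406
```

No "scaled:center" attribute is attached in this case, because no centering was done.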
Standardizing and centering data, and the scale function in R (reposted)