1. Data standardization (mean removal and variance scaling)
After standardization, each feature (column) of the data has zero mean and unit variance.
    from sklearn import preprocessing

    X = [[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]]
    X_scaled = preprocessing.scale(X)
    print(X_scaled)
    # [[ 0.         -1.22474487  1.33630621]
    #  [ 1.22474487  0.         -0.26726124]
    #  [-1.22474487  1.22474487 -1.06904497]]
    print(X_scaled.mean(axis=0))
    # [0. 0. 0.]
    print(X_scaled.std(axis=0))
    # [1. 1. 1.]
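To make explicit what scale computes, here is a minimal NumPy sketch. It assumes the population standard deviation (ddof=0), which matches the values in the output above; the names col_mean, col_std and X_manual are illustrative and do not come from the original.

```python
import numpy as np

X = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])

# Per-column (per-feature) mean and population standard deviation (ddof=0).
col_mean = X.mean(axis=0)
col_std = X.std(axis=0)

# Standardize: subtract the column mean, then divide by the column std.
X_manual = (X - col_mean) / col_std

print(np.round(X_manual, 8))
```

This sketch does not guard against a constant column, where col_std would be zero and the division would fail.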
The same transformation is also available through the Scaler utility class (StandardScaler in version 0.15 and later) provided by the preprocessing module:
    scaler = preprocessing.StandardScaler().fit(X)
    print(scaler)
    # StandardScaler(copy=True, with_mean=True, with_std=True)
    print(scaler.mean_)
    # [1.         0.         0.33333333]
    print(scaler.scale_)  # scaler.std_ in previous versions
    # [0.81649658 0.81649658 1.24721913]
    print(scaler.transform(X))
    # [[ 0.         -1.22474487  1.33630621]
    #  [ 1.22474487  0.         -0.26726124]
    #  [-1.22474487  1.22474487 -1.06904497]]
Note: calling fit and then transform, as above, is equivalent to a single call to fit_transform:
    # fit_transform returns the transformed array, not the scaler object
    X_scaled = preprocessing.StandardScaler().fit_transform(X)
    print(X_scaled)
    # [[ 0.         -1.22474487  1.33630621]
    #  [ 1.22474487  0.         -0.26726124]
    #  [-1.22474487  1.22474487 -1.06904497]]
    print(X_scaled.mean(axis=0))
    # [0. 0. 0.]
    print(X_scaled.std(axis=0))
    # [1. 1. 1.]
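One practical reason to keep a fitted scaler object rather than calling scale directly is that the stored statistics can be reapplied to data that was not seen during fitting. The sketch below illustrates this under the assumption of a recent scikit-learn release where the scale_ attribute exists; X_new is a hypothetical sample that does not appear in the original.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1., -1., 2.],
                    [2., 0., 0.],
                    [0., 1., -1.]])

# Hypothetical new sample, not part of the original example.
X_new = np.array([[-1., 1., 0.]])

# fit learns mean_ and scale_ from X_train only.
scaler = StandardScaler().fit(X_train)

# transform reuses the stored statistics on the new sample.
X_new_scaled = scaler.transform(X_new)

# The same result computed by hand from the fitted attributes.
by_hand = (X_new - scaler.mean_) / scaler.scale_

print(X_new_scaled)
```

The new sample is scaled with the training mean and standard deviation, so its values are not expected to have zero mean or unit variance themselves.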
2. Data normalization
Normalization rescales each sample (row) to unit norm, using the L2 norm by default, so every value in a sample falls within [-1, 1].
    X = [[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]]
    X_normalized = preprocessing.normalize(X)
    print(X_normalized)
    # [[ 0.40824829 -0.40824829  0.81649658]
    #  [ 1.          0.          0.        ]
    #  [ 0.          0.70710678 -0.70710678]]
Equivalent to:
    normalizer = preprocessing.Normalizer().fit(X)
    print(normalizer)
    # Normalizer(copy=True, norm='l2')
    print(normalizer.transform(X))
    # [[ 0.40824829 -0.40824829  0.81649658]
    #  [ 1.          0.          0.        ]
    #  [ 0.          0.70710678 -0.70710678]]
Note: calling fit and then transform, as above, is equivalent to a single call to fit_transform:
    # fit_transform returns the normalized array, not the Normalizer object
    X_normalized = preprocessing.Normalizer().fit_transform(X)
    print(X_normalized)
    # [[ 0.40824829 -0.40824829  0.81649658]
    #  [ 1.          0.          0.        ]
    #  [ 0.          0.70710678 -0.70710678]]
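For reference, L2 normalization can be reproduced by hand by dividing each row by its Euclidean length. The following is a minimal NumPy sketch of that idea; row_norms and X_manual are illustrative names, and rows consisting entirely of zeros are not handled.

```python
import numpy as np

X = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])

# Euclidean (L2) norm of each row, kept as a column so it broadcasts.
row_norms = np.linalg.norm(X, axis=1, keepdims=True)

# Divide every row by its own norm.
X_manual = X / row_norms

print(X_manual)
```

After this operation every row has length 1, which is why no individual value can fall outside [-1, 1].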
3. Binarization
Binarization converts numeric data to two-valued (Boolean) data according to a threshold: values greater than the threshold become 1, and all other values become 0.
    X = [[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]]

    binarizer = preprocessing.Binarizer().fit(X)  # the default threshold is 0.0
    print(binarizer)
    # Binarizer(copy=True, threshold=0.0)
    print(binarizer.transform(X))
    # [[1. 0. 1.]
    #  [1. 0. 0.]
    #  [0. 1. 0.]]

    binarizer = preprocessing.Binarizer(threshold=1.1)  # set the threshold to 1.1
    print(binarizer.transform(X))
    # [[0. 0. 1.]
    #  [1. 0. 0.]
    #  [0. 0. 0.]]
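The thresholding rule itself is a single comparison. The sketch below reproduces both results above with plain NumPy; binarize_manual is a hypothetical helper written for illustration and is not part of scikit-learn.

```python
import numpy as np

X = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])


def binarize_manual(data, threshold=0.0):
    """Return 1.0 where data > threshold and 0.0 elsewhere (hypothetical helper)."""
    return (np.asarray(data) > threshold).astype(float)


default_result = binarize_manual(X)              # threshold 0.0
high_result = binarize_manual(X, threshold=1.1)  # threshold 1.1

print(default_result)
print(high_result)
```

Note that the comparison is strict, so a value exactly equal to the threshold maps to 0.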
4. Label preprocessing
4.1) Label binarization
LabelBinarizer is typically used to create a label indicator matrix from a list of multi-class labels.
    lb = preprocessing.LabelBinarizer()
    print(lb.fit([1, 2, 6, 4, 2]))
    # LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
    print(lb.classes_)
    # [1 2 4 6]
    print(lb.transform([1, 6]))
    # [[1 0 0 0]
    #  [0 0 0 1]]
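To show what a label indicator matrix is, the following NumPy sketch builds one by comparing each label against the sorted list of classes. The variable names are illustrative. The sketch always produces one column per class, whereas LabelBinarizer returns a single column when there are exactly two classes.

```python
import numpy as np

labels_seen = [1, 2, 6, 4, 2]
classes = np.unique(labels_seen)  # sorted unique labels: [1 2 4 6]

y = np.array([1, 6])

# One row per label, one column per class; 1 where the label equals the class.
indicator = (y[:, None] == classes).astype(int)

# Recover the original labels from the position of the 1 in each row.
recovered = classes[indicator.argmax(axis=1)]

print(indicator)
print(recovered)
```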
4.2) Label encoding
LabelEncoder maps each distinct label to an integer between 0 and n_classes - 1.
    le = preprocessing.LabelEncoder()
    print(le.fit([1, 2, 2, 6]))
    # LabelEncoder()
    print(le.classes_)
    # [1 2 6]
    print(le.transform([1, 1, 2, 6]))
    # [0 0 1 2]
    print(le.inverse_transform([0, 0, 1, 2]))
    # [1 1 2 6]
LabelEncoder can also be used to convert non-numeric labels into numeric labels:
    le = preprocessing.LabelEncoder()
    print(le.fit(["Paris", "Paris", "Tokyo", "Amsterdam"]))
    # LabelEncoder()
    print(list(le.classes_))
    # ['Amsterdam', 'Paris', 'Tokyo']
    print(le.transform(["Tokyo", "Tokyo", "Paris"]))
    # [2 2 1]
    print(list(le.inverse_transform([2, 2, 1])))
    # ['Tokyo', 'Tokyo', 'Paris']
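The mapping between labels and integers is simply each label's position in the sorted list of distinct labels. A minimal NumPy sketch of that idea follows; the variable names are illustrative, and unlike LabelEncoder this sketch does not reject labels that were never seen during fitting.

```python
import numpy as np

cities = ["Paris", "Paris", "Tokyo", "Amsterdam"]

# classes holds the sorted distinct labels; codes holds each label's index in classes.
classes, codes = np.unique(cities, return_inverse=True)

# Encode new labels by locating them in the sorted class array.
new_codes = np.searchsorted(classes, ["Tokyo", "Tokyo", "Paris"])

# Decode by indexing back into the class array.
decoded = classes[new_codes]

print(classes, codes)
print(new_codes, decoded)
```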