Machine learning tool scikit-learn: data preprocessing in Python

Source: Internet
Author: User
Tags: normalizer

1. Data standardization (standardization, or mean removal and variance scaling)

After standardization, each feature (column) of the data has zero mean and unit variance.

    from sklearn import preprocessing

    X = [[1., -1., 2.],
         [2., 0., 0.],
         [0., 1., -1.]]
    X_scaled = preprocessing.scale(X)
    print(X_scaled)
    # [[ 0.         -1.22474487  1.33630621]
    #  [ 1.22474487  0.         -0.26726124]
    #  [-1.22474487  1.22474487 -1.06904497]]
    print(X_scaled.mean(axis=0))
    print(X_scaled.std(axis=0))
    # [0. 0. 0.]
    # [1. 1. 1.]
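To make the "zero mean, unit variance" statement concrete, here is a minimal sketch that reproduces the result by hand with NumPy and compares it with preprocessing.scale. The variable names (mean, std, X_manual) are illustrative and not part of the original article.

```python
import numpy as np
from sklearn import preprocessing

X = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])

# Column-wise mean and population standard deviation (ddof=0).
mean = X.mean(axis=0)
std = X.std(axis=0)

# Manual z-score: subtract each column's mean, then divide by its std.
X_manual = (X - mean) / std

# Compare with scikit-learn, allowing for floating-point rounding.
X_scaled = preprocessing.scale(X)
print(np.allclose(X_manual, X_scaled))
```

The comparison uses np.allclose rather than exact equality because the two computations may differ in the last few bits.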

We can also do this with the Scaler utility class provided by the preprocessing module (StandardScaler from version 0.15 onward):

    scaler = preprocessing.StandardScaler().fit(X)
    print(scaler)
    # StandardScaler(copy=True, with_mean=True, with_std=True)
    # (recent versions print only non-default parameters)
    print(scaler.mean_)
    # [1.         0.         0.33333333]
    print(scaler.scale_)  # scaler.std_ in previous versions
    # [0.81649658 0.81649658 1.24721913]
    print(scaler.transform(X))
    # [[ 0.         -1.22474487  1.33630621]
    #  [ 1.22474487  0.         -0.26726124]
    #  [-1.22474487  1.22474487 -1.06904497]]

Note: the code above is equivalent to the following, which fits and transforms in a single call:

    X_scaled = preprocessing.StandardScaler().fit_transform(X)
    print(X_scaled)
    # [[ 0.         -1.22474487  1.33630621]
    #  [ 1.22474487  0.         -0.26726124]
    #  [-1.22474487  1.22474487 -1.06904497]]
    print(X_scaled.mean(axis=0))
    # [0. 0. 0.]
    print(X_scaled.std(axis=0))
    # [1. 1. 1.]
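One practical reason to use the class rather than the scale function is that a fitted scaler remembers the mean and scale it learned, so the same transformation can be applied to other data later. The sketch below assumes a hypothetical new sample, X_new, that is not in the original article.

```python
import numpy as np
from sklearn import preprocessing

X_train = np.array([[1., -1., 2.],
                    [2., 0., 0.],
                    [0., 1., -1.]])
# Hypothetical unseen sample, for illustration only.
X_new = np.array([[-1., 1., 0.]])

# Learn the per-column mean and scale from the training data only.
scaler = preprocessing.StandardScaler().fit(X_train)

# Apply the stored statistics to the new sample.
X_new_scaled = scaler.transform(X_new)
print(X_new_scaled)

# inverse_transform undoes the scaling and recovers the original values.
X_back = scaler.inverse_transform(X_new_scaled)
print(np.allclose(X_back, X_new))
```

Note that the new sample is scaled with the training statistics, so its values need not have zero mean or unit variance.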

2. Data Normalization (normalization)

Normalization scales each sample (row) of the dataset independently to unit norm (L2 by default), so every value ends up in the range [-1, 1].

    X = [[1., -1., 2.],
         [2., 0., 0.],
         [0., 1., -1.]]
    X_normalized = preprocessing.normalize(X)
    print(X_normalized)
    # [[ 0.40824829 -0.40824829  0.81649658]
    #  [ 1.          0.          0.        ]
    #  [ 0.          0.70710678 -0.70710678]]

Equivalent to:

    normalizer = preprocessing.Normalizer().fit(X)  # fit does nothing: Normalizer is stateless
    print(normalizer)
    # Normalizer(copy=True, norm='l2')
    print(normalizer.transform(X))
    # [[ 0.40824829 -0.40824829  0.81649658]
    #  [ 1.          0.          0.        ]
    #  [ 0.          0.70710678 -0.70710678]]

Note: the code above is equivalent to the following, which fits and transforms in a single call:

    X_normalized = preprocessing.Normalizer().fit_transform(X)
    print(X_normalized)
    # [[ 0.40824829 -0.40824829  0.81649658]
    #  [ 1.          0.          0.        ]
    #  [ 0.          0.70710678 -0.70710678]]
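As a rough illustration of what L2 normalization does, the following sketch divides each row by its Euclidean length and checks the result against preprocessing.normalize. It assumes no row is all zeros (the manual division would fail there, while scikit-learn leaves such rows unchanged). The names row_norms and X_manual are illustrative.

```python
import numpy as np
from sklearn import preprocessing

X = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])

# Euclidean (L2) length of each row, kept as a column for broadcasting.
row_norms = np.linalg.norm(X, axis=1, keepdims=True)

# Manual normalization: divide every row by its own length.
X_manual = X / row_norms

X_normalized = preprocessing.normalize(X, norm='l2')
print(np.allclose(X_manual, X_normalized))

# After normalization, every row has length 1.
print(np.linalg.norm(X_normalized, axis=1))
```

Unlike standardization, this works row by row, so it does not depend on the other samples in the dataset.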

3. Binarization (feature binarization)

Binarization converts numeric data to Boolean (0/1) values according to a configurable threshold: values greater than the threshold become 1, and all other values become 0.

    X = [[1., -1., 2.],
         [2., 0., 0.],
         [0., 1., -1.]]
    binarizer = preprocessing.Binarizer().fit(X)  # the default threshold is 0.0
    print(binarizer)
    # Binarizer(copy=True, threshold=0.0)
    print(binarizer.transform(X))
    # [[1. 0. 1.]
    #  [1. 0. 0.]
    #  [0. 1. 0.]]
    binarizer = preprocessing.Binarizer(threshold=1.1)  # set the threshold to 1.1
    print(binarizer.transform(X))
    # [[0. 0. 1.]
    #  [1. 0. 0.]
    #  [0. 0. 0.]]
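The thresholding rule can be sketched as a plain element-wise comparison. The example below is only an illustration of the rule, with the hypothetical names threshold and X_manual, and checks that it matches Binarizer.

```python
import numpy as np
from sklearn import preprocessing

X = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])
threshold = 1.1

# Values strictly greater than the threshold become 1.0, the rest 0.0.
X_manual = (X > threshold).astype(float)

X_bin = preprocessing.Binarizer(threshold=threshold).fit_transform(X)
print(np.array_equal(X_manual, X_bin))
```

Because the comparison is strict, a value exactly equal to the threshold maps to 0.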

4. Label preprocessing (label preprocessing)

4.1) Label binarization (label binarization)

LabelBinarizer is typically used to create a label indicator matrix from a list of multi-class labels.

    lb = preprocessing.LabelBinarizer()
    print(lb.fit([1, 2, 6, 4, 2]))
    # LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
    print(lb.classes_)
    # [1 2 4 6]
    print(lb.transform([1, 6]))
    # [[1 0 0 0]
    #  [0 0 0 1]]
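Each row of the indicator matrix has a single 1 in the column of its class, in the order given by classes_. The following sketch shows how the original labels can be recovered, both with inverse_transform and by hand. The names indicator, recovered and manual are illustrative.

```python
import numpy as np
from sklearn import preprocessing

lb = preprocessing.LabelBinarizer()
lb.fit([1, 2, 6, 4, 2])

# One row per label, one column per class in lb.classes_.
indicator = lb.transform([1, 6])

# Map the indicator rows back to the original labels.
recovered = lb.inverse_transform(indicator)
print(recovered)

# The same mapping by hand: the position of the 1 selects the class.
manual = lb.classes_[indicator.argmax(axis=1)]
print(manual)
```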

4.2) Label encoding (label encoding)

    le = preprocessing.LabelEncoder()
    print(le.fit([1, 2, 2, 6]))
    # LabelEncoder()
    print(le.classes_)
    # [1 2 6]
    print(le.transform([1, 1, 2, 6]))
    # [0 0 1 2]
    print(le.inverse_transform([0, 0, 1, 2]))
    # [1 1 2 6]

LabelEncoder can also convert non-numeric labels to numeric labels:

    le = preprocessing.LabelEncoder()
    print(le.fit(["Paris", "Paris", "Tokyo", "Amsterdam"]))
    # LabelEncoder()
    print(list(le.classes_))
    # ['Amsterdam', 'Paris', 'Tokyo']
    print(le.transform(["Tokyo", "Tokyo", "Paris"]))
    # [2 2 1]
    print(list(le.inverse_transform([2, 2, 1])))
    # ['Tokyo', 'Tokyo', 'Paris']
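The integer codes are simply positions in the sorted list of distinct labels. As a minimal sketch, the same mapping can be obtained with np.unique. The label "Berlin" below is a hypothetical unseen value, added only to show that transform rejects labels it was not fitted on.

```python
import numpy as np
from sklearn import preprocessing

labels = ["Paris", "Paris", "Tokyo", "Amsterdam"]

le = preprocessing.LabelEncoder().fit(labels)
codes = le.transform(labels)

# np.unique returns the sorted distinct labels and, for each input
# label, its index in that sorted list.
classes, manual_codes = np.unique(labels, return_inverse=True)
print(list(classes))
print(np.array_equal(codes, manual_codes))

# A label that was not seen during fit cannot be encoded.
unseen_error = None
try:
    le.transform(["Berlin"])  # hypothetical unseen label
except ValueError as exc:
    unseen_error = exc
print(unseen_error is not None)
```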
