Data preprocessing in MATLAB: normalization (mapminmax) and standardization (mapstd)

Last Update:2018-07-26 Source: Internet

Author: User

First, mapminmax

Process matrices by mapping row minimum and maximum values to [-1 1]

This means that each row of the matrix is mapped into the interval [-1,1]. For pattern recognition or other statistical work, the data should be arranged so that each column is a sample and each row holds the same dimension (feature) across all samples. That is, for an m*n matrix, the sample dimension is m and the number of samples is n: n columns in total, one per sample.
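
As a small illustration of this layout, the sketch below starts from a hypothetical table `data` stored with one sample per row (an assumption made here, not part of the original article) and transposes it so that each column is a sample. It uses only base MATLAB functions.

```matlab
% Hypothetical data: 5 samples (rows), 2 features (columns).
data = [2 7; 3 8; 4 9; 5 10; 6 11];

% Transpose so that rows are features and columns are samples.
x = data.';

[m, n] = size(x);        % m = sample dimension, n = number of samples
rowMin = min(x, [], 2);  % per-feature minimum over all samples
rowMax = max(x, [], 2);  % per-feature maximum over all samples
fprintf('dimension m = %d, samples n = %d\n', m, n);
```

After the transpose, `x` is the same 2-by-5 matrix used in the examples throughout this article.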

The main invocation forms are:

1. [Y,PS] = mapminmax(X,ymin,ymax)

2. [Y,PS] = mapminmax(X,FP)

3. Y = mapminmax('apply',X,PS)

4. X = mapminmax('reverse',Y,PS)

5. dx_dy = mapminmax('dx_dy',X,Y,PS)

For forms 1 and 2, X is the data to be preprocessed, ymin and ymax are the desired minimum and maximum of each row, and FP is a struct whose main fields are FP.ymin and FP.ymax. The struct takes the place of the ymin and ymax arguments, so forms 1 and 2 differ only in how the parameters are passed.

Code:

x=[2,3,4,5,6;7,8,9,10,11];
mapminmax(x,0,1)      % form 1: pass ymin and ymax directly
fp.ymin=0;
fp.ymax=1;
mapminmax(x,fp)       % form 2: pass the same parameters in a struct

For form 3, as used in pattern recognition or statistics, PS holds the mapping obtained from the training samples, that is, the per-row maximum and minimum of the training data. Here X is the test data: test samples must be preprocessed consistently with the training samples, so the maximum and minimum used are those of the training set. Assuming y holds two test samples, the code is as follows:

x=[2,3,4,5,6;7,8,9,10,11];    % training samples
y=[2,3;4,5];                  % two test samples
[xx,ps]=mapminmax(x,0,1);     % ps stores the training-set mapping
mapminmax('apply',y,ps)
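
The effect of 'apply' can be reproduced by hand with base MATLAB functions. The following is a minimal sketch of the arithmetic, not the toolbox implementation; the variable names `xmin`, `xmax`, and `k` are chosen here for illustration. It shows that test values lying outside the training range map outside [0,1].

```matlab
x = [2,3,4,5,6; 7,8,9,10,11];   % training samples (one per column)
y = [2,3; 4,5];                 % two test samples (one per column)
ymin = 0; ymax = 1;

% The statistics come from the training set only.
xmin = min(x, [], 2);           % [2; 7]
xmax = max(x, [], 2);           % [6; 11]

% Apply the training-set mapping to the test samples.
k = size(y, 2);
yy = (ymax - ymin) .* (y - repmat(xmin, 1, k)) ./ repmat(xmax - xmin, 1, k) + ymin;
disp(yy)   % [0 0.25; -0.75 -0.5]
```

The second row is negative because 4 and 5 lie below the training minimum of 7 for that dimension.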

Form 4 reverses the preprocessing, recovering the original data from the preprocessed data.

x=[2,3,4,5,6;7,8,9,10,11];
y=[2,3;4,5];
[xx,ps]=mapminmax(x,0,1);
yy=mapminmax('apply',y,ps);
mapminmax('reverse',yy,ps)    % recovers the original y
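
The reverse step amounts to solving the forward formula for the original value. The sketch below does this by hand with base MATLAB functions, under the assumption that no training row is constant (so xmax > xmin); the variable names are illustrative and it is not the toolbox code.

```matlab
x = [2,3,4,5,6; 7,8,9,10,11];   % training samples
y = [2,3; 4,5];                 % test samples
ymin = 0; ymax = 1;
xmin = min(x, [], 2);
xmax = max(x, [], 2);
k = size(y, 2);

% Forward mapping using the training-set minimum and maximum.
yy = (ymax - ymin) .* (y - repmat(xmin,1,k)) ./ repmat(xmax - xmin,1,k) + ymin;

% Inverse mapping: undo the offset, rescale, and add the minimum back.
yrec = (yy - ymin) .* repmat(xmax - xmin,1,k) ./ (ymax - ymin) + repmat(xmin,1,k);
disp(yrec)   % matches y up to rounding
```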

For form 5, the reverse derivative is computed from the given matrix X, the normalized matrix Y, and the mapping PS. If X and Y are m-by-n matrices, the result dx_dy is a 1-by-n array, each element of which is an m-by-m diagonal matrix. This form is rarely used, so no example is given here.

Second, how mapminmax works and how to implement it

The formula for mapminmax is y = (ymax-ymin) * (x-xmin) / (xmax-xmin) + ymin, where xmin and xmax are the minimum and maximum of the row. If all values in a row are identical, then xmax = xmin and the divisor is 0; in that case the data is left unchanged.

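A MATLAB implementation is:

function out = mymapminmax(x,ymin,ymax)
% Map each row of x linearly so that its minimum becomes ymin and its maximum ymax.
xmin = min(x,[],2);
xmax = max(x,[],2);
out = (ymax-ymin) .* (x-repmat(xmin,1,size(x,2))) ./ repmat(xmax-xmin,1,size(x,2)) + ymin;
% Constant rows give 0/0 = NaN; leave those entries unchanged.
index = isnan(out);
out(index) = x(index);
end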

Note that the code above assumes that each sample is a column of x.
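
To see the constant-row handling in action, the following sketch repeats the same arithmetic inline on a hypothetical input whose second row is constant. The input values and variable names are assumptions made for illustration.

```matlab
x = [2,3,4,5,6; 7,7,7,7,7];   % second row is constant, so xmax == xmin
ymin = 0; ymax = 1;
n = size(x, 2);
xmin = min(x, [], 2);
xmax = max(x, [], 2);

% The second row evaluates to 0/0, which is NaN in MATLAB.
out = (ymax - ymin) .* (x - repmat(xmin,1,n)) ./ repmat(xmax - xmin,1,n) + ymin;

% Restore the original values wherever the division produced NaN.
index = isnan(out);
out(index) = x(index);
disp(out)   % [0 0.25 0.5 0.75 1; 7 7 7 7 7]
```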

Third, standardization with mapstd

Process matrices by mapping each row's means to 0 and deviations to 1: each row of the matrix is mapped to data with mean 0 and standard deviation 1.

The main invocation forms are:

1. [Y,PS] = mapstd(X,ymean,ystd)

2. [Y,PS] = mapstd(X,FP)

3. Y = mapstd('apply',X,PS)

4. X = mapstd('reverse',Y,PS)

5. dx_dy = mapstd('dx_dy',X,Y,PS)

As with mapminmax, forms 1 and 2 standardize the data X, where ymean and ystd are the desired mean and standard deviation of each row. The parameters can likewise be passed in a struct with fields ymean and ystd.

x=[2,3,4,5,6;7,8,9,10,11];
[xx,ps]=mapstd(x,0,1)     % form 1: pass ymean and ystd directly
fp.ymean=0;
fp.ystd=1;
[xx,ps]=mapstd(x,fp)      % form 2: pass the same parameters in a struct

Form 3 preprocesses test data using the mean and standard deviation of the training data; form 4 reverses the preprocessing.

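x=[2,3,4,5,6;7,8,9,10,11];    % training samples
y=[2,3;4,5];                  % two test samples
[xx,ps]=mapstd(x,0,1);        % ps stores the training-set mean and standard deviation
yy=mapstd('apply',y,ps);
mapstd('reverse',yy,ps)       % recovers the original y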
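
The same apply and reverse steps can be written out by hand with base MATLAB functions. This is a minimal sketch of the arithmetic under the assumption that no training row is constant (so the row standard deviation is nonzero); the names `xmean`, `xstd`, and `k` are illustrative, and it is not the toolbox code.

```matlab
x = [2,3,4,5,6; 7,8,9,10,11];   % training samples
y = [2,3; 4,5];                 % test samples
ymean = 0; ystd = 1;

% Training-set statistics, computed per row.
xmean = mean(x, 2);             % [4; 9]
xstd  = std(x, 0, 2);           % normalized by n-1: sqrt(2.5) for both rows
k = size(y, 2);

% Apply: standardize the test samples with the training statistics.
yy = (y - repmat(xmean,1,k)) .* repmat(ystd ./ xstd,1,k) + ymean;

% Reverse: undo the shift and scaling.
yrec = (yy - ymean) .* repmat(xstd ./ ystd,1,k) + repmat(xmean,1,k);
disp(yy)     % approximately [-1.2649 -0.6325; -3.1623 -2.5298]
disp(yrec)   % matches y up to rounding
```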

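Fourth, implementing mapstd standardization

The formula is y = (x-xmean) * (ystd/xstd) + ymean. If ystd is set to 0, every output equals ymean; if the values in a row are all identical (xstd = 0 at this point), the formula divides by zero. The implementation below does not treat either case specially.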

function out = mymapstd(x,ymean,ystd)
% Shift each row of x to mean ymean and scale it to standard deviation ystd.
out = (x-repmat(mean(x,2),1,size(x,2))) ./ repmat(std(x,0,2),1,size(x,2)) .* ystd + ymean;
end
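
As a quick check of the formula, the sketch below repeats the same arithmetic inline with hypothetical target values ymean = 5 and ystd = 2 (chosen here for illustration) and confirms that each output row has that mean and standard deviation.

```matlab
x = [2,3,4,5,6; 7,8,9,10,11];
ymean = 5; ystd = 2;          % hypothetical targets for this check
n = size(x, 2);

% Same arithmetic as the function body above.
out = (x - repmat(mean(x,2),1,n)) ./ repmat(std(x,0,2),1,n) .* ystd + ymean;

rowMean = mean(out, 2);       % approximately [5; 5]
rowStd  = std(out, 0, 2);     % approximately [2; 2]
disp([rowMean rowStd])
```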

Fifth, notes on mean, std, and related functions

By default, mean computes the mean of each column; mean(x,2) computes the mean of each row. By default, std returns the standard deviation based on the unbiased variance estimate (normalized by N-1). It has three call forms: s = std(x), s = std(x,flag), and s = std(x,flag,dim).

Here flag selects the normalization: flag=0 (the default) normalizes by N-1, giving the unbiased estimate, and flag=1 normalizes by N, giving the biased estimate. dim is the dimension along which the standard deviation is computed, so std(x,0,2) computes the N-1-normalized standard deviation of each row of x.
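
A short example of these calls on the matrix used throughout this article (the variable names are illustrative):

```matlab
x = [2,3,4,5,6; 7,8,9,10,11];

colMean = mean(x);           % mean of each column: [4.5 5.5 6.5 7.5 8.5]
rowMean = mean(x, 2);        % mean of each row: [4; 9]

sUnbiased = std(x, 0, 2);    % divide by N-1: sqrt(10/4) for each row
sBiased   = std(x, 1, 2);    % divide by N:   sqrt(10/5) for each row

disp(colMean); disp(rowMean);
disp([sUnbiased sBiased]);   % approximately [1.5811 1.4142] in each row
```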
