Data preprocessing in MATLAB: normalization (mapminmax) and standardization (mapstd)

Source: Internet
Author: User
First, mapminmax


Process matrices by mapping row minimum and maximum values to [-1 1]



In other words, each row of the matrix is mapped into the interval [-1, 1]. For pattern recognition and other statistical work, the data should be arranged with one sample per column, so that each row holds the same dimension (feature) across all samples. For an M-by-N matrix, the sample dimension is M and the number of samples is N: there are N columns, one per sample.
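A minimal sketch of this layout convention. It assumes the raw data arrive one sample per row, as is common for tables read from a file. The variable name samplesByRow is hypothetical.

```matlab
% Hypothetical raw data stored one sample per row: 5 samples, 2 features.
samplesByRow = [2 7; 3 8; 4 9; 5 10; 6 11];

% mapminmax and mapstd work row by row with one sample per column, so transpose.
X = samplesByRow';

% M = number of features (rows), N = number of samples (columns).
[M, N] = size(X);
```

After the transpose, X is the 2-by-5 matrix used in the examples below.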



The main invocation forms are:



1. [Y,ps] = mapminmax(X,ymin,ymax)

2. [Y,ps] = mapminmax(X,FP)

3. Y = mapminmax('apply',X,ps)

4. X = mapminmax('reverse',Y,ps)

5. dx_dy = mapminmax('dx_dy',X,Y,ps)






In forms 1 and 2, X is the data to be preprocessed, and ymin and ymax are the desired minimum and maximum of each row after mapping. FP is a struct whose main fields are FP.ymin and FP.ymax. It replaces the separate ymin and ymax arguments, so forms 1 and 2 differ only in how the parameters are passed. Both return the mapped data Y and the settings ps that the other forms use.



Code:


X = [2,3,4,5,6; 7,8,9,10,11];
mapminmax(X,0,1)        % map each row to [0,1]
FP.ymin = 0;
FP.ymax = 1;
mapminmax(X,FP)         % same result, parameters passed as a struct
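The result of form 1 can be checked by hand. This sketch reproduces mapminmax(X,0,1) for the example matrix without the toolbox. It uses the formula given in the second section, y = (ymax-ymin)*(x-xmin)/(xmax-xmin) + ymin, and assumes no row is constant.

```matlab
X = [2,3,4,5,6; 7,8,9,10,11];
ymin = 0; ymax = 1;

% Per-row minimum and maximum (each row is one feature).
xmin = min(X, [], 2);
xmax = max(X, [], 2);

% Apply y = (ymax-ymin)*(x-xmin)/(xmax-xmin) + ymin to every row.
N = size(X, 2);
Y = (ymax - ymin) * (X - repmat(xmin, 1, N)) ./ repmat(xmax - xmin, 1, N) + ymin;

disp(Y)   % both rows become 0, 0.25, 0.5, 0.75, 1
```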


Form 3 is for test data. In pattern recognition or statistics, ps is the mapping obtained from the training samples; that is, ps stores the per-row minimum and maximum of the training data. Here the input of form 3 is the test data. Test samples must be preprocessed exactly like the training samples, so the minimum and maximum used for them must be those of the training set, not of the test data. Assuming Y holds two test samples (two columns), the code is:


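X = [2,3,4,5,6; 7,8,9,10,11];   % training data: 2 features x 5 samples
Y = [2,3; 4,5];                 % test data: 2 features x 2 samples
[XX,ps] = mapminmax(X,0,1);     % learn the mapping from the training data
mapminmax('apply',Y,ps)         % apply the same mapping to the test data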
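A toolbox-free sketch of what "apply" means here: the test data are mapped with the training minimum and maximum. In this sketch nothing is clipped, so test values can land outside [ymin, ymax] when they lie outside the training range.

```matlab
X = [2,3,4,5,6; 7,8,9,10,11];   % training data (columns are samples)
Y = [2,3; 4,5];                 % two test samples
ymin = 0; ymax = 1;

% Statistics come from the training data only.
xmin = min(X, [], 2);           % [2; 7]
xmax = max(X, [], 2);           % [6; 11]

% Map the test data with the training statistics.
N = size(Y, 2);
Ytest = (ymax - ymin) * (Y - repmat(xmin, 1, N)) ./ repmat(xmax - xmin, 1, N) + ymin;

disp(Ytest)   % [0 0.25; -0.75 -0.5]
```

The second test row (4 and 5) is below the training minimum of 7, so it maps to negative values.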


Form 4 reverses the preprocessing, recovering the original data from the mapped data.


X = [2,3,4,5,6; 7,8,9,10,11];   % training data
Y = [2,3; 4,5];                 % test data
[XX,ps] = mapminmax(X,0,1);
YY = mapminmax('apply',Y,ps);   % map the test data
mapminmax('reverse',YY,ps)      % map back; returns the original Y
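The reverse mapping is the forward formula solved for x: x = (y - ymin) * (xmax - xmin) / (ymax - ymin) + xmin. This minimal sketch does not use the toolbox. It does a forward and reverse round trip on the test data.

```matlab
X = [2,3,4,5,6; 7,8,9,10,11];
Y = [2,3; 4,5];
ymin = 0; ymax = 1;
xmin = min(X, [], 2);
xmax = max(X, [], 2);
N = size(Y, 2);

% Forward mapping with the training statistics (as in 'apply').
YY = (ymax - ymin) * (Y - repmat(xmin, 1, N)) ./ repmat(xmax - xmin, 1, N) + ymin;

% Reverse mapping: solve the formula for x.
Yback = (YY - ymin) .* repmat(xmax - xmin, 1, N) / (ymax - ymin) + repmat(xmin, 1, N);

disp(Yback)   % recovers [2 3; 4 5] up to rounding
```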





Form 5 returns the reverse derivative dx/dy, computed from the original matrix X, the normalized matrix Y and the mapping ps. If X and Y are M-by-N matrices (N samples), dx_dy contains one M-by-M diagonal matrix per sample, giving N matrices in a 1-by-N array. This form is rarely used in practice.
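The reverse derivative follows directly from the formula. Each x_i depends only on y_i, so dx/dy for one sample is diagonal, with entries (xmax_i - xmin_i) / (ymax - ymin), and it is the same for every sample. This sketch computes it by hand. Storing one copy per sample in a 1-by-N cell array is an assumption for illustration; MATLAB's exact output container may differ between versions.

```matlab
X = [2,3,4,5,6; 7,8,9,10,11];
ymin = 0; ymax = 1;
xmin = min(X, [], 2);
xmax = max(X, [], 2);

% x_i depends only on y_i, so the Jacobian dx/dy is diagonal (M-by-M).
d = diag((xmax - xmin) / (ymax - ymin));   % here diag([4 4])

% The mapping is linear, so the same matrix applies to every sample (column of X).
N = size(X, 2);
dx_dy = repmat({d}, 1, N);                 % assumed 1-by-N cell array
```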


Second, how mapminmax works and how to implement it



The mathematical formula for mapminmax is y = (ymax - ymin) * (x - xmin) / (xmax - xmin) + ymin, where xmin and xmax are the minimum and maximum of the row containing x. If all values in a row are equal, then xmax = xmin and the divisor is 0. In that case the row is left unchanged.



A MATLAB implementation of this formula:


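function out = mymapminmax(x, ymin, ymax)
% Map each row of x linearly onto [ymin, ymax]; each column is one sample.
N = size(x, 2);
xmin = repmat(min(x, [], 2), 1, N);
xmax = repmat(max(x, [], 2), 1, N);
out = (ymax - ymin) .* (x - xmin) ./ (xmax - xmin) + ymin;
% A constant row gives 0/0 = NaN; leave those entries unchanged.
index = isnan(out);
out(index) = x(index);
end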


Note that the code above assumes each sample in x is stored as a column.
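To see the constant-row rule in action, this sketch repeats the steps of mymapminmax inline, so it runs without a separate function file. It uses a hypothetical matrix whose second row is constant.

```matlab
% Hypothetical data: the second row is constant, so xmax == xmin there.
X = [2,3,4,5,6; 5,5,5,5,5];
ymin = -1; ymax = 1;

N = size(X, 2);
xmin = repmat(min(X, [], 2), 1, N);
xmax = repmat(max(X, [], 2), 1, N);
out = (ymax - ymin) .* (X - xmin) ./ (xmax - xmin) + ymin;

% 0/0 produced NaN in the constant row; copy the original values back.
index = isnan(out);
out(index) = X(index);

disp(out)   % row 1: -1 -0.5 0 0.5 1; row 2 unchanged: 5 5 5 5 5
```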


Third, standardization with mapstd



Process matrices by mapping each row's means to 0 and deviations to 1. That is, each row of the matrix is mapped to data with mean 0 and standard deviation 1 (equivalently, variance 1).



The main invocation forms are:



1. [Y,ps] = mapstd(X,ymean,ystd)

2. [Y,ps] = mapstd(X,FP)

3. Y = mapstd('apply',X,ps)

4. X = mapstd('reverse',Y,ps)

5. dx_dy = mapstd('dx_dy',X,Y,ps)



As with mapminmax, forms 1 and 2 standardize the data X. Here ymean and ystd are the desired mean and standard deviation of each row after mapping. Likewise, the two values can be passed in a struct FP with fields FP.ymean and FP.ystd.


X = [2,3,4,5,6; 7,8,9,10,11];
[XX,ps] = mapstd(X,0,1)     % each row: mean 0, standard deviation 1
FP.ymean = 0;
FP.ystd = 1;
[XX,ps] = mapstd(X,FP)      % same result, parameters passed as a struct
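This toolbox-free sketch shows what forms 1 and 2 compute. It applies the formula from the fourth section, y = (x - xmean) * (ystd / xstd) + ymean, row by row. Like the author's implementation, it uses the unbiased row standard deviation std(X,0,2).

```matlab
X = [2,3,4,5,6; 7,8,9,10,11];
ymean = 0; ystd = 1;

N = size(X, 2);
xmean = mean(X, 2);        % [4; 9]
xstd  = std(X, 0, 2);      % sqrt(2.5) for both rows (divides by N-1)

% y = (x - xmean) * (ystd / xstd) + ymean, row by row.
Y = (X - repmat(xmean, 1, N)) .* repmat(ystd ./ xstd, 1, N) + ymean;

disp(mean(Y, 2))     % approximately [0; 0]
disp(std(Y, 0, 2))   % approximately [1; 1]
```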


Form 3 preprocesses test data using the mean and standard deviation of the training data, and form 4 reverses the preprocessing:


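X = [2,3,4,5,6; 7,8,9,10,11];   % training data
Y = [2,3; 4,5];                 % test data
[XX,ps] = mapstd(X,0,1);        % learn mean and std from the training data
YY = mapstd('apply',Y,ps);      % standardize the test data with them
mapstd('reverse',YY,ps)         % recover the original test data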
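The same apply-and-reverse round trip can be written by hand. In this sketch the training mean and standard deviation standardize the test data. The reverse step solves the formula for x: x = (y - ymean) * (xstd / ystd) + xmean.

```matlab
X = [2,3,4,5,6; 7,8,9,10,11];   % training data
Y = [2,3; 4,5];                 % test data
ymean = 0; ystd = 1;

% Training statistics.
xmean = mean(X, 2);
xstd  = std(X, 0, 2);
N = size(Y, 2);

% 'apply': standardize the test data with the training mean and std.
YY = (Y - repmat(xmean, 1, N)) .* repmat(ystd ./ xstd, 1, N) + ymean;

% 'reverse': undo the mapping to recover the original test data.
Yback = (YY - ymean) .* repmat(xstd ./ ystd, 1, N) + repmat(xmean, 1, N);
```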

Fourth, implementing mapstd standardization


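The formula is y = (x - xmean) * (ystd / xstd) + ymean, where xmean and xstd are the mean and standard deviation of the row containing x. Two cases are degenerate. Setting ystd = 0 maps every value to ymean. A row whose values are all equal has xstd = 0, so the formula divides by zero. The implementation below does not guard against the second case.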





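function out = mymapstd(x, ymean, ystd)
% Map each row of x to mean ymean and standard deviation ystd; each column is one sample.
% std(x,0,2) is the unbiased (N-1) standard deviation of each row.
N = size(x, 2);
out = (x - repmat(mean(x, 2), 1, N)) ./ repmat(std(x, 0, 2), 1, N) .* ystd + ymean;
end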
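One possible guard for constant rows mirrors the choice made in mymapminmax: leave such rows unchanged. This is an assumed policy for illustration, not necessarily what MATLAB's mapstd does. The sketch inlines the steps of mymapstd and adds the NaN check.

```matlab
% Hypothetical data: the second row is constant, so its xstd is 0.
X = [2,3,4,5,6; 5,5,5,5,5];
ymean = 0; ystd = 1;

N = size(X, 2);
xmean = repmat(mean(X, 2), 1, N);
xstd  = repmat(std(X, 0, 2), 1, N);
out = (X - xmean) ./ xstd .* ystd + ymean;

% 0/0 gives NaN in the constant row; keep the original values there
% (assumed policy, mirroring mymapminmax).
index = isnan(out);
out(index) = X(index);
```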

Fifth, notes on mean, std and related functions


By default, mean averages each column, and mean(x,2) averages each row. By default, std returns the unbiased estimate of the standard deviation. It has three calling forms: s = std(x), s = std(x,flag) and s = std(x,flag,dim).



Here flag selects the normalization. flag = 0 (the default) gives the unbiased estimate, dividing by N-1, and flag = 1 gives the biased estimate, dividing by N. dim selects the dimension along which the standard deviation is computed, so std(x,0,2) computes the unbiased standard deviation of each row of x.
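A short sketch of these calls on the example matrix used throughout this article. It shows the column and row means and the difference between the N-1 and N normalizations.

```matlab
x = [2,3,4,5,6; 7,8,9,10,11];

m_cols = mean(x);       % column means: [4.5 5.5 6.5 7.5 8.5]
m_rows = mean(x, 2);    % row means: [4; 9]

s_unbiased = std(x, 0, 2);   % divides by N-1: sqrt(10/4), about 1.5811 per row
s_biased   = std(x, 1, 2);   % divides by N:   sqrt(10/5), about 1.4142 per row
```

Each row's squared deviations from its mean are 4, 1, 0, 1, 4 (sum 10), which gives the two values in the comments.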







