Normalization and standardization in data specification: MATLAB

Last Update:2018-08-23 Source: Internet

Author: User


A. Normalization vs. standardization

Normalization: Restrict the data to be processed (by means of some algorithm) to a required range. Normalization is done first to make later data processing more convenient, and second to help the program converge faster at run time. It generally means rescaling the data to [0 1].

Mapping values into the interval (0,1) is done mainly to simplify data processing; data mapped to 0-1 is more convenient and faster to handle.

It also converts a dimensional expression into a dimensionless one, so that the value becomes a pure number.

In general, min-max normalization is used, which is a linear transformation of the original data: x* = (x - xmin) / (xmax - xmin)
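
The formula above can be sketched in a few lines of base MATLAB. The data vector is made up for illustration, and the sketch assumes the data is not constant (xmax > xmin), otherwise the division is by zero.

```matlab
% Min-max normalization of a vector to [0, 1]:
%   x* = (x - xmin) / (xmax - xmin)
x = [2 4 6 10];                        % example data (illustrative only)

xmin = min(x);
xmax = max(x);
xnorm = (x - xmin) / (xmax - xmin);    % assumes xmax > xmin

disp(xnorm)                            % values: 0, 0.25, 0.5, 1
```

The smallest value maps to 0 and the largest to 1; everything in between keeps its relative spacing because the transformation is linear.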

Standardization: Scale the raw data so that it falls within a certain range. It generally means transforming the data to have mean 0 and variance 1. This method can be used even if the data does not follow a normal distribution, and the standardized data can take both positive and negative values.

Because the indicators in a credit index system are measured in different units, they must be standardized and mapped onto a common numerical range by a function transformation before they can take part in the evaluation calculation.

Same-trend processing of data: This addresses indicators of different natures. Directly adding indicators that point in different directions cannot correctly reflect their combined effect, so the inverse indicators must first be transformed so that all indicators act on the evaluation in the same direction. Only then does the sum give a correct result.

Dimensionless processing: This addresses the comparability of data.

Z-score standardization is generally used: after the transformation the data has mean 0 and variance 1 (a standard normal distribution if the original data is normally distributed).
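
As a minimal sketch of z-score standardization in base MATLAB: subtract the mean and divide by the standard deviation. The data is made up, and the sketch uses the default sample standard deviation (normalized by N-1), which is an assumption; std(x,1) would give the population version instead.

```matlab
% Z-score standardization of a vector: z = (x - mean) / std
x = [2 4 4 4 5 5 7 9];        % example data (illustrative only)

mu    = mean(x);              % sample mean
sigma = std(x);               % sample standard deviation (N-1), assumed > 0
z     = (x - mu) / sigma;     % standardized data

fprintf('mean(z) = %.4f, std(z) = %.4f\n', mean(z), std(z));
```

Values below the original mean become negative after the transformation, which is why standardized data is not confined to [0 1].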

In MATLAB, there are three ways to normalize data:
(1) premnmx, postmnmx, tramnmx. premnmx normalizes data to [-1 1], tramnmx transforms test-set inputs with the same mapping, and postmnmx converts outputs back to the original scale.
(2) prestd, poststd, trastd. prestd normalizes data to zero mean and unit variance.

(3) Writing your own code. Self-written normalization generally maps the data to [0.1 0.9].
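
Option (3) might look like the following sketch, which generalizes min-max normalization to an arbitrary target range [a b]. The variable names and data are illustrative, not from the original article, and the data is assumed not to be constant.

```matlab
% Hand-written normalization to the range [a, b] (here [0.1, 0.9]):
%   y = a + (b - a) * (x - xmin) / (xmax - xmin)
x = [10 20 30 50];            % example data (illustrative only)
a = 0.1;                      % lower bound of the target range
b = 0.9;                      % upper bound of the target range

y = a + (b - a) * (x - min(x)) / (max(x) - min(x));   % assumes max(x) > min(x)

disp(y)                       % values: 0.1, 0.3, 0.5, 0.9
```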

B. Why should I use normalization?

Singular sample data refers to sample vectors that are especially large or especially small relative to the other input samples. Singular samples increase network training time and may prevent the network from converging, so a training set that contains singular samples should be normalized before training. If there are no singular samples, normalizing beforehand is not required.

C. Normalization can also be done with mapminmax.

This function normalizes each row of a matrix to [a b]. The default is [-1 1].
[y1,ps] = mapminmax(x1,a,b), where x1 is the matrix to be normalized, y1 is the result, and ps holds the mapping settings.
When another data set needs to be normalized in the same way, for example when the training data for an SVM was normalized with the call above, the test data can be processed with the same mapping: y2 = mapminmax('apply',x2,ps)
When you need to restore normalized data to its original scale, use the following command: x1_again = mapminmax('reverse',y1,ps)
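
The three calls above can be combined as in the sketch below. It assumes the toolbox that provides mapminmax is installed, and the matrices are made up for illustration, with one variable per row and one sample per column.

```matlab
% Normalize training data, reuse the mapping on new data, then undo it.
% Requires the toolbox that provides mapminmax.
x1 = [1 2 3 4; 10 20 30 50];             % training data, one variable per row

[y1, ps] = mapminmax(x1, 0, 1);          % scale each row to [0, 1]

x2 = [2.5; 30];                          % one new sample with the same variables
y2 = mapminmax('apply', x2, ps);         % uses the row minima/maxima of x1

x1_again = mapminmax('reverse', y1, ps); % recover the original training data

disp(y2)                                 % both entries are 0.5
```

Applying the stored settings ps, rather than normalizing the test data on its own, keeps training and test data on the same scale.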

D. MATLAB command descriptions

1. mean: computes the mean. mean(x,1) returns the mean of each column, mean(x,2) returns the mean of each row, and mean2(x) returns the mean of all elements of the matrix.

2. std: computes the standard deviation. std(x,0,1) returns the standard deviation of each column, std(x,0,2) returns the standard deviation of each row, and std2(x) returns the standard deviation of all elements of the matrix.

3. var: computes the variance, var(x).
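
A short sketch of these commands on a small made-up matrix. mean2 and std2 belong to a separate toolbox, so the sketch uses mean(x(:)) and std(x(:)) as base MATLAB stand-ins for the whole-matrix versions; that substitution is an assumption for illustration.

```matlab
% mean, std and var along columns and rows of a small matrix
x = [1 2; 3 4; 5 9];          % example data (illustrative only)

colMean = mean(x, 1);         % mean of each column: [3 5]
rowMean = mean(x, 2);         % mean of each row:    [1.5; 3.5; 7]

colStd  = std(x, 0, 1);       % sample std of each column: [2 sqrt(13)]
rowStd  = std(x, 0, 2);       % sample std of each row

colVar  = var(x);             % variance of each column, equals colStd.^2

allMean = mean(x(:));         % mean of all elements: 4
allStd  = std(x(:));          % std of all elements
```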

4. SSE: sum of squared errors, SSE(x). The closer it is to 0, the better the fit and the more successful the data prediction.

5. MSE: mean squared error, MSE(x) = SSE(x)/n. It is interpreted in the same way as SSE.

6. R-square: coefficient of determination. It characterizes the quality of a fit by how much of the variation in the data the fit explains. From its definition, the normal range of the coefficient of determination is [0 1]; the closer it is to 1, the greater the power of the equation's variables to explain y, and the better the model fits the data.
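
The three fit measures can be computed by hand as sketched below. The observed and predicted values are made up, and R-square is computed with the commonly used definition 1 - SSE/SST, where SST is the total sum of squares about the mean; the original article does not show its expression, so this definition is an assumption.

```matlab
% SSE, MSE and R-square for a set of predictions
y    = [1 2 3 4 5];               % observed values (illustrative only)
yhat = [1.1 1.9 3.2 3.8 5.0];     % predicted values (illustrative only)

err = y - yhat;                   % residuals
SSE = sum(err.^2);                % sum of squared errors:  0.10
MSE = SSE / numel(y);             % mean squared error:     0.02

SST = sum((y - mean(y)).^2);      % total sum of squares:   10
R2  = 1 - SSE / SST;              % coefficient of determination: 0.99

fprintf('SSE = %.4f, MSE = %.4f, R2 = %.4f\n', SSE, MSE, R2);
```

Here SSE is close to 0 and R2 is close to 1, which both indicate a good fit.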

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
