The first step of correlation analysis: checking whether the population distribution of a variable is normal (statistics)

Testing for normality
1. Visual inspection: histogram
Let x be the data vector to be examined.

hist(x); % frequency histogram: check by eye whether it is roughly symmetric, with most values in the middle and fewer toward both tails
2. Visual inspection: fitted curve and normal probability plot
histfit(x); % histogram with a fitted normal density curve
normplot(x); % normal probability plot: if the points lie close to a straight line, the sample is consistent with a normal distribution; marked curvature suggests non-normality
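What normplot displays can also be checked numerically. The sketch below is written in Python (standard library only) rather than MATLAB. It pairs each sorted observation with a standard normal quantile and uses a correlation coefficient to measure how close the points are to a straight line. The plotting positions (k - 0.5)/n and the helper names normal_plot_points and straightness are assumptions for illustration. They are not necessarily what normplot does internally.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def normal_plot_points(x):
    """Return (standard normal quantile, sorted value) pairs for a normal probability plot."""
    n = len(x)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # The k-th smallest value (k = 1..n) gets plotting position (k - 0.5)/n (an assumed
    # convention); with the 0-based index i from enumerate this is (i + 0.5)/n.
    return [(std_normal.inv_cdf((i + 0.5) / n), v) for i, v in enumerate(sorted(x))]


def straightness(points):
    """Pearson correlation of the plot points; values close to 1 mean the points lie near a line."""
    qs = [q for q, _ in points]
    vs = [v for _, v in points]
    mq, mv = fmean(qs), fmean(vs)
    sqv = sum((q - mq) * (v - mv) for q, v in zip(qs, vs))
    sqq = sum((q - mq) ** 2 for q in qs)
    svv = sum((v - mv) ** 2 for v in vs)
    return sqv / sqrt(sqq * svv)


# Usage: a normal sample versus a right-skewed (exponential) sample of the same size.
rng = random.Random(1)
normal_sample = [rng.gauss(10.0, 2.0) for _ in range(200)]
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]
r_normal = straightness(normal_plot_points(normal_sample))
r_skewed = straightness(normal_plot_points(skewed_sample))
print(f"normal sample: r = {r_normal:.4f}, skewed sample: r = {r_skewed:.4f}")
```

Curvature in the plot shows up as a correlation noticeably below 1. How far below 1 is acceptable depends on n, so this number is a visual aid rather than a formal test.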
Note on method 2: the following commands are sometimes suggested as part of this method, but they cannot verify normality. The ttest function tests the mean of a single normal population when the variance is unknown. It assumes the population is normal; it does not test whether the data are normal. The normfit function only estimates the mean, the standard deviation, and their confidence intervals for data that are already assumed to be normal.
% parameter estimates
[muhat,sigmahat,muci,sigmaci] = normfit(x); % muhat: mean; sigmahat: standard deviation; muci, sigmaci: 95% confidence intervals for the mean and the standard deviation
% hypothesis test with unknown variance: is the mean equal to muhat?
[h,sig,ci] = ttest(x,muhat); % h = 0: the null hypothesis "mean = muhat" is not rejected; h = 1: it is rejected
% sig is the p-value: if sig is larger than the significance level (0.05), the null hypothesis cannot be rejected; ci is the 95% confidence interval for the mean.
Moreover, muhat is the sample mean of x itself. The t statistic is therefore exactly 0 and h is always 0, so this test says nothing about the shape of the distribution.
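Why ttest(x,muhat) cannot detect non-normality can be checked with a few lines of code. This is a Python sketch (standard library only), not MATLAB, and normfit_like and one_sample_t are hypothetical helper names. It computes the same point estimates as normfit (sample mean and n - 1 standard deviation) and the one-sample t statistic that ttest is based on. It then shows that testing against the sample's own mean gives t = 0, even for clearly skewed data.

```python
from math import sqrt
from statistics import fmean, stdev


def normfit_like(x):
    """Point estimates like normfit's muhat and sigmahat: sample mean and n - 1 standard deviation."""
    return fmean(x), stdev(x)


def one_sample_t(x, mu0):
    """One-sample t statistic t = (mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    mean, s = normfit_like(x)
    return (mean - mu0) / (s / sqrt(len(x)))


# A clearly right-skewed data set (made-up numbers for illustration).
skewed = [1.0, 1.1, 1.2, 1.3, 1.5, 2.0, 3.5, 9.0, 25.0]
muhat, sigmahat = normfit_like(skewed)
t = one_sample_t(skewed, muhat)
print(muhat, sigmahat, t)  # t is 0.0: the hypothesis "mean = muhat" can never be rejected
```

A t statistic of 0 gives a two-sided p-value of 1, so h = 0 regardless of the data.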
Parameter estimation and hypothesis testing usually assume that the population is normally distributed. This assumption is often reasonable. It should still be checked when an important estimate or test depends on it, or when there is serious doubt about it. There are many methods for testing population normality; the procedures provided in the MATLAB Statistics Toolbox are briefly introduced below.
3. Jarque-Bera test
The test uses the sample skewness g1 and kurtosis g2 (a normal distribution has skewness 0 and kurtosis 3) to construct the statistic JB = n/6 * (g1^2 + (g2 - 3)^2/4). Under H0 this statistic asymptotically follows a chi-square distribution with 2 degrees of freedom. At significance level alpha, H0 (the population is normal) is accepted when JB is below the chi-square critical value (the 1 - alpha quantile). Otherwise H0 is rejected, i.e., the population deviates from normality. The test relies on a large-sample approximation and should be used with caution when the sample is small. MATLAB commands: h = jbtest(x), [h,p,jbstat,cv] = jbtest(x,alpha).
H0: the population follows N(μ, σ²), with μ and σ² unspecified.
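A minimal Python sketch of the statistic described above (standard library only; this is not MATLAB's jbtest). It uses the moment-based sample skewness and kurtosis and the large-sample chi-square reference distribution with 2 degrees of freedom. For that distribution the upper-tail probability has the closed form exp(-t/2), and the critical value is -2 ln(alpha). MATLAB's jbtest may obtain critical values differently for small samples. The function names here are assumptions.

```python
from math import exp, log


def jarque_bera(x):
    """Return (JB, p) with the asymptotic chi-square(2) p-value."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # central moments with divisor n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    g1 = m3 / m2 ** 1.5  # sample skewness (0 for a normal distribution)
    g2 = m4 / m2 ** 2    # sample kurtosis (3 for a normal distribution)
    jb = n / 6 * (g1 ** 2 + (g2 - 3) ** 2 / 4)
    p = exp(-jb / 2)  # P(chi-square(2) > jb)
    return jb, p


def chi2_2df_critical(alpha):
    """(1 - alpha) quantile of chi-square(2): the c that solves exp(-c/2) = alpha."""
    return -2 * log(alpha)


def jb_decision(x, alpha=0.05):
    """Return (h, p, jb); h = 1 rejects H0 (normality) at level alpha, h = 0 does not."""
    jb, p = jarque_bera(x)
    return int(jb > chi2_2df_critical(alpha)), p, jb


print(chi2_2df_critical(0.05))  # critical value at the 5% level, about 5.991
print(jb_decision([2.3, 1.7, 3.1, 2.8, 2.2, 1.9, 2.6, 3.4, 2.0, 2.5]))
```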

4. Kolmogorov-Smirnov test
The test compares the empirical distribution function of the sample with a given distribution function, to decide whether the sample could come from a population with that distribution. For a sample of size n, the empirical distribution function F_n(x) is the proportion of sample values that are less than or equal to x. Write the given distribution function as G(x). The test statistic is D_n = sup_x |F_n(x) - G(x)|, the maximum absolute difference between the two distribution functions. The hypothesis is H0: the population follows the given distribution G(x). For a given significance level alpha, the critical value for D_n is obtained from the limiting distribution of √n·D_n as n → ∞. H0 is accepted if D_n is below this critical value and rejected otherwise.
This test requires a fully specified G(x). As a result, the basic command h = kstest(x) tests only standard normality when used as a normality test, i.e., H0: the population follows the standard normal distribution.
H0: the population follows N(0, 1).
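The statistic and its large-sample p-value can be sketched in Python (standard library only) as follows. This is not MATLAB's kstest, which may compute p-values differently (for example, exactly for small n). The series P(K > λ) = 2 Σ (-1)^(k-1) exp(-2k²λ²) is the standard Kolmogorov limiting distribution. The function names and the sample data are made up for illustration.

```python
from math import exp, sqrt
from statistics import NormalDist


def ks_statistic(x, cdf):
    """D_n = sup |F_n(x) - G(x)| for a continuous reference distribution function G."""
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, v in enumerate(xs):
        g = cdf(v)
        # F_n jumps from i/n to (i + 1)/n at v, so check both sides of the jump.
        d = max(d, (i + 1) / n - g, g - i / n)
    return d


def kolmogorov_pvalue(d, n, terms=100):
    """Asymptotic p-value from the limiting (Kolmogorov) distribution of sqrt(n) * D_n."""
    lam = sqrt(n) * d
    if lam < 0.2:
        return 1.0  # P(K <= 0.2) is below 1e-12, and the series converges slowly here
    s = sum((-1) ** (k - 1) * exp(-2 * k * k * lam * lam) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * s))


# H0: the data follow the standard normal distribution N(0, 1), as in h = kstest(x).
x = [-1.2, -0.4, 0.1, 0.3, 0.8, 1.5, -0.9, 0.05, 2.1, -0.2]
d = ks_statistic(x, NormalDist().cdf)
print(d, kolmogorov_pvalue(d, len(x)))
```

At the 5% level the asymptotic critical value is about 1.358/√n, since P(K > 1.358) ≈ 0.05.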

5. Lilliefors test

This test modifies the Kolmogorov-Smirnov test so that it can test general normality. The hypothesis is H0: the population follows a normal distribution whose mean and variance are estimated by the sample mean and sample variance.

MATLAB commands: h = lillietest(x), [h,p,lstat,cv] = lillietest(x,alpha).
H0: the population follows N(μ, σ²).
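A Python sketch (standard library only, not MATLAB's lillietest) of the Lilliefors idea. The KS distance is computed against a normal distribution whose mean and standard deviation are estimated from the sample. This makes the usual KS critical values too large, so the sketch estimates the critical value by simulation instead. lillietest may obtain its critical values differently (for example from precomputed tables). The Monte Carlo approach, the number of replications, and the function names are assumptions of this sketch.

```python
import random
from statistics import NormalDist, fmean, stdev


def lilliefors_statistic(x):
    """KS distance between the sample and N(mean, s^2), with mean and s estimated from x itself."""
    g = NormalDist(fmean(x), stdev(x)).cdf
    xs = sorted(x)
    n = len(xs)
    return max(max((i + 1) / n - g(v), g(v) - i / n) for i, v in enumerate(xs))


def lilliefors_critical_value(n, alpha=0.05, reps=2000, seed=0):
    """Monte Carlo estimate of the (1 - alpha) quantile of the statistic under H0.

    Under H0 the statistic does not depend on the true mu and sigma, so simulating
    samples from N(0, 1) is enough.
    """
    rng = random.Random(seed)
    sims = sorted(
        lilliefors_statistic([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)
    )
    return sims[min(reps - 1, int((1 - alpha) * reps))]


cv = lilliefors_critical_value(100)
print(cv)  # should be close to the large-sample approximation 0.886 / sqrt(100) = 0.0886
```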

Description of the lillietest function
Syntax:
h = lillietest(x) % Lilliefors test of the input vector x at the 0.05 significance level
h = lillietest(x,alpha) % Lilliefors test at level alpha instead of 5%; alpha must lie between 0.01 and 0.2
[h,p,lstat,cv] = lillietest(x,alpha) % p is the p-value: the closer p is to 0, the more strongly the null hypothesis of normality can be rejected; lstat is the value of the test statistic; cv is the critical value for rejecting the null hypothesis
Explanation: h is the test result. If h = 0, the hypothesis that x follows a normal distribution is not rejected. If h = 1, the hypothesis is rejected, i.e., x does not follow a normal distribution.
Example 4-81
>> y = chi2rnd(10,100,1);
>> [h,p,l,cv] = lillietest(y)
h =
     1
p =
    0.0175
l =
    0.1062
cv =
    0.0886
Explanation: h = 1 means the hypothesis of normality is rejected. p = 0.0175 means that, if the data were normal, a statistic at least this large would occur with a probability of only about 1.75%. The statistic l = 0.1062 is larger than the critical value cv = 0.0886, so the hypothesis is rejected at the 5% significance level.
>> hist(y)
The resulting histogram is right-skewed, as expected for a chi-square distribution, which indicates that y does not follow a normal distribution.
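The example can be mimicked in Python (standard library only). Here a Monte Carlo p-value replaces lillietest's own computation. A chi-square(10) variable is generated as Gamma(shape 5, scale 2). The seed, the replication count, and the function names are assumptions. The random sample differs from the one in the MATLAB session, so h, p, and the statistic will not match the numbers shown above.

```python
import random
from statistics import NormalDist, fmean, stdev


def lilliefors_statistic(x):
    """KS distance between the sample and N(mean, s^2) fitted to the sample."""
    g = NormalDist(fmean(x), stdev(x)).cdf
    xs = sorted(x)
    n = len(xs)
    return max(max((i + 1) / n - g(v), g(v) - i / n) for i, v in enumerate(xs))


def lilliefors_mc_test(x, alpha=0.05, reps=1000, seed=0):
    """Return (h, p, stat) with a Monte Carlo p-value; h = 1 rejects normality at level alpha."""
    rng = random.Random(seed)
    n = len(x)
    stat = lilliefors_statistic(x)
    # Count simulated normal samples whose statistic is at least as large as the observed one.
    exceed = sum(
        lilliefors_statistic([rng.gauss(0.0, 1.0) for _ in range(n)]) >= stat
        for _ in range(reps)
    )
    p = (exceed + 1) / (reps + 1)
    return int(p < alpha), p, stat


# Analogue of y = chi2rnd(10,100,1): chi-square(10) equals Gamma(shape 5, scale 2).
data_rng = random.Random(42)
y = [data_rng.gammavariate(5.0, 2.0) for _ in range(100)]
h, p, stat = lilliefors_mc_test(y)
print(h, p, stat)
```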

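6. Another approach is to standardize the data first with z = zscore(x). The Kolmogorov-Smirnov test from method 4, kstest(z), then checks whether z follows the standard normal distribution. This is similar in spirit to the Lilliefors modification in method 5. However, the mean and standard deviation are estimated from the same data, so the standard Kolmogorov-Smirnov critical values are too large. This approach is therefore conservative: it rejects normality less often than the nominal significance level suggests. The Lilliefors test corrects exactly this.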
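A Python sketch (standard library only) of this standardize-then-KS approach. Here zscore mimics MATLAB's default (sample mean and n - 1 standard deviation), and the function names are hypothetical. The sketch shows that the KS distance of the z-scores to N(0, 1) is the same number as the Lilliefors statistic, so the two approaches differ only in the critical value they use. The constants 1.358 and 0.886 are the usual large-sample 5% approximations for the KS and Lilliefors tests.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def zscore(x):
    """Standardize with the sample mean and the n - 1 standard deviation."""
    m, s = fmean(x), stdev(x)
    return [(v - m) / s for v in x]


def ks_to_standard_normal(z):
    """D_n between the empirical distribution of z and N(0, 1)."""
    g = NormalDist().cdf
    zs = sorted(z)
    n = len(zs)
    return max(max((i + 1) / n - g(v), g(v) - i / n) for i, v in enumerate(zs))


def approx_critical_values(n):
    """Large-sample 5% critical values: plain KS versus Lilliefors (roughly valid for n > 30)."""
    return 1.358 / sqrt(n), 0.886 / sqrt(n)


ks_cv, lillie_cv = approx_critical_values(100)
print(f"n = 100: KS cv = {ks_cv:.4f}, Lilliefors cv = {lillie_cv:.4f}")
```

For n = 100 the plain KS cutoff (about 0.136) is much larger than the Lilliefors cutoff (about 0.089). A statistic like 0.1062 from Example 4-81 exceeds the Lilliefors cutoff but not the plain KS cutoff. This is why standardizing and then using KS critical values tends not to reject.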

In addition, the Chinese national standard GB/T 4882 describes several more methods. These are directional tests (the skewness test, the kurtosis test, and the multidirectional test), non-directional tests (the Shapiro-Wilk test, i.e., the W test, and the Epps-Pulley test), and joint test methods.
