The importance of data distribution patterns
In data analysis, the distribution pattern of the data directly affects the choice of analysis strategy, so it is important to determine how a data series is distributed. Common distribution patterns include the normal, uniform, Poisson and exponential distributions. The most important of these in practice is the normal distribution, because many analysis techniques for interval-scale (or higher-level) variables assume that the data are normally distributed.
Below we introduce three methods commonly used in SPSS to check whether data are normally distributed.
Judging the data distribution in SPSS
Histogram with a normal curve
In SPSS, use the menu "Analyze" - "Descriptive Statistics" - "Frequencies", and in the chart options select a histogram with a normal curve.
A histogram with a normal curve lets you judge whether a data series is close to normal by how well the histogram fits the curve. The two figures below show the Chinese and mathematics scores of one class, each drawn as a histogram with a normal curve; the curve on each graph is the normal curve with the same mean and standard deviation as the data. The Chinese scores follow their normal curve closely, while the mathematics scores are far from theirs. In this way, the fit between a histogram and its normal curve indicates whether the data series conforms to a normal distribution.
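SPSS leaves the fit to visual inspection. As a rough numeric counterpart, the Python sketch below (standard library only) compares the observed count in each histogram bin with the count expected under a normal curve that uses the sample mean and standard deviation. The function name `histogram_vs_normal`, the equal-width bins and the scores are hypothetical, made up for illustration rather than taken from the class data in the figures.

```python
from statistics import NormalDist, mean, stdev

def histogram_vs_normal(data, bins=5):
    """Observed bin counts next to the counts expected under a normal curve
    with the same mean and standard deviation as the data (hypothetical helper)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    # Equal-width bin edges; the last edge is exactly the maximum.
    edges = [lo + i * width for i in range(bins)] + [hi]
    fitted = NormalDist(mean(data), stdev(data))  # sample mean and SD
    rows = []
    for i in range(bins):
        left, right = edges[i], edges[i + 1]
        last = i == bins - 1
        # Half-open bins [left, right); the last bin also includes the maximum.
        observed = sum(left <= x < right or (last and x == right) for x in data)
        # Expected count = N * area under the fitted curve over the bin.
        expected = len(data) * (fitted.cdf(right) - fitted.cdf(left))
        rows.append((left, right, observed, expected))
    return rows

# Hypothetical scores for illustration only (not the class data in the figures).
scores = [62, 65, 66, 68, 69, 70, 70, 71, 72, 73, 74, 75, 78]
for left, right, obs, exp in histogram_vs_normal(scores, bins=4):
    print(f"{left:6.1f}-{right:6.1f}  observed={obs:2d}  expected={exp:5.2f}")
```

Large gaps between observed and expected counts correspond to the histogram departing from the curve; like the visual check, this is a descriptive aid rather than a formal test.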
Q-Q and P-P plots
In SPSS, use the menu "Analyze" - "Descriptive Statistics" - "P-P Plots" (or "Q-Q Plots").
Q-Q and P-P plots can also be used to judge whether a data series is close to a normal distribution. The two plots work the same way and differ only in the units of their axes: a P-P plot compares cumulative proportions (probabilities), while a Q-Q plot compares quantiles. The P-P plot is used as the example here. In the two plots on the left, the points of the Chinese P-P plot lie along the diagonal, so that data series conforms to a normal distribution; the points of the mathematics P-P plot deviate from the diagonal, so that series does not.
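To make the two plot types concrete, here is a minimal Python sketch that computes the coordinates of both plots against a normal distribution with the sample mean and standard deviation. The function name `pp_qq_points` is hypothetical, and the cumulative proportion for the r-th smallest of n values uses Blom's formula (r - 3/8)/(n + 1/4), one common choice assumed here; other plotting-position formulas give slightly different points.

```python
from statistics import NormalDist, mean, stdev

def pp_qq_points(data):
    """Return (P-P points, Q-Q points) against a normal distribution with the
    sample mean and standard deviation (hypothetical helper)."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    pp, qq = [], []
    for r, x in enumerate(xs, start=1):
        p = (r - 3 / 8) / (n + 1 / 4)        # Blom plotting position for rank r (assumed)
        pp.append((p, fitted.cdf(x)))        # observed vs expected cumulative probability
        qq.append((x, fitted.inv_cdf(p)))    # observed value vs expected normal quantile
    return pp, qq

pp, qq = pp_qq_points([1, 2, 3, 4, 5])
for (p, e), (x, q) in zip(pp, qq):
    print(f"P-P: ({p:.3f}, {e:.3f})   Q-Q: ({x}, {q:.2f})")
```

If the data are close to normal, both lists of points lie near the diagonal y = x of their respective plots.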
The two plots on the right are detrended normal P-P plots. Their horizontal axis is the cumulative probability and their vertical axis is the deviation from the normal distribution, so a perfectly normal series would lie on the horizontal line through zero. Both plots have many points scattered on either side of this line, but the vertical axis for Chinese spans only about -0.06 to 0.06, while that for mathematics spans about -0.3 to 0.3. Relative to a cumulative-probability scale of 0 to 1, the deviation of the Chinese scores is very small, so they can be considered basically normal.
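The detrended plot is the P-P plot with the diagonal subtracted. The sketch below assumes Blom's plotting position for the observed cumulative probability and defines the deviation as observed minus expected; both choices, and the function name `detrended_pp`, are assumptions for illustration, and SPSS's sign convention may differ. It also returns the largest absolute deviation, which is the quantity read off the vertical-axis range when comparing the Chinese (about 0.06) and mathematics (about 0.3) plots.

```python
from statistics import NormalDist, mean, stdev

def detrended_pp(data):
    """Points of a detrended normal P-P plot as (cumulative probability,
    deviation from normal), plus the largest absolute deviation (hypothetical helper)."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    points = []
    for r, x in enumerate(xs, start=1):
        observed = (r - 3 / 8) / (n + 1 / 4)   # Blom plotting position (assumed)
        expected = fitted.cdf(x)
        # Deviation taken as observed minus expected (sign convention assumed).
        points.append((observed, observed - expected))
    max_dev = max(abs(d) for _, d in points)
    return points, max_dev

_, symmetric_dev = detrended_pp([1, 2, 3, 4, 5])
_, skewed_dev = detrended_pp([1, 1, 1, 1, 2, 2, 3, 10])
print(f"symmetric: {symmetric_dev:.3f}  skewed: {skewed_dev:.3f}")
```

A skewed sample produces a visibly larger maximum deviation than a symmetric one, mirroring the wider vertical range of the mathematics plot.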
K-S normality test
In SPSS, use the menu "Analyze" - "Nonparametric Tests" - "Legacy Dialogs" - "1-Sample K-S".
The one-sample Kolmogorov-Smirnov (K-S) test in SPSS checks whether a data series is close to a normal distribution. Judging normality from a histogram, Q-Q plot or P-P plot relies mainly on the analyst's subjective judgment, whereas the K-S test decides whether the series is normal by measuring the largest difference between its cumulative distribution and that of a normal distribution (with mean and standard deviation calculated from the data). The following table shows the SPSS output of the K-S test. In the last row, the p-value for Chinese is 0.200, greater than 0.05, indicating no significant difference between the Chinese scores and a normal distribution (footnote d notes that 0.200 is a lower bound of the true significance). For mathematics the p-value is 0.000, less than 0.05, so the mathematics scores differ significantly from a normal distribution.
One-Sample Kolmogorov-Smirnov Test

| | | Chinese | Mathematics |
|---|---|---|---|
| N | | 40 | 40 |
| Normal Parameters (a,b) | Mean | 69.9825 | 78.0874 |
| | Std. Deviation | 5.15620 | 12.84712 |
| Most Extreme Differences | Absolute | .103 | .280 |
| | Positive | .069 | .219 |
| | Negative | -.103 | -.280 |
| Test Statistic | | .103 | .280 |
| Asymp. Sig. (2-tailed) | | .200 (c,d) | .000 (c) |

a. Test distribution is Normal.
b. Calculated from data.
c. Lilliefors Significance Correction.
d. This is a lower bound of the true significance.
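The "Most Extreme Differences" rows in the table are the largest gaps between the empirical cumulative distribution and the fitted normal CDF: Positive is where the empirical curve lies above the normal curve, Negative where it lies below, and Absolute (which SPSS also reports as the test statistic) is the larger of the two in magnitude. The Python sketch below computes these three numbers with the mean and standard deviation estimated from the data. Because the parameters are estimated, the ordinary K-S p-value is too large, which is what the Lilliefors correction (footnote c) addresses; as a stand-in, `lilliefors_p` estimates the corrected p-value by Monte Carlo simulation. Both function names are hypothetical, and the simulated p-value will not match SPSS's output exactly.

```python
import random
from statistics import NormalDist, mean, stdev

def ks_normal(data):
    """Most extreme differences between the empirical CDF of `data` and a normal
    CDF whose mean and SD are estimated from the data (hypothetical helper)."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d_plus = d_minus = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        d_plus = max(d_plus, i / n - f)          # empirical CDF above the normal CDF
        d_minus = max(d_minus, f - (i - 1) / n)  # empirical CDF below the normal CDF
    absolute = max(d_plus, d_minus)
    # Same layout as the SPSS table: Absolute, Positive, Negative (with a minus sign).
    return absolute, d_plus, -d_minus

def lilliefors_p(data, reps=2000, seed=0):
    """Monte Carlo p-value for the Lilliefors-corrected test (a swapped-in method):
    simulate normal samples of the same size, re-estimate mean/SD each time, and
    count how often the simulated statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = ks_normal(data)[0]
    n = len(data)
    hits = 0
    for _ in range(reps):
        # The statistic is location/scale invariant, so N(0, 1) samples suffice.
        sim = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if ks_normal(sim)[0] >= observed:
            hits += 1
    return (hits + 1) / (reps + 1)

absolute, positive, negative = ks_normal([1, 2, 3, 4, 5])
print(f"Absolute={absolute:.3f} Positive={positive:.3f} Negative={negative:.3f}")
print(f"Monte Carlo p ~ {lilliefors_p([1, 2, 3, 4, 5], reps=500):.3f}")
```

The decision rule is the same as in the text: a p-value above 0.05 gives no evidence against normality, while one below 0.05 indicates a significant departure from it.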