This article sets out to answer three questions:
first, what the numerical characteristics of a sample are; second, how the sample variance differs from the variance of the sample mean; and third, how to construct sampling distributions.
One
For simplicity, suppose the population is normally distributed, ξ ~ N(µ, σ²), and imagine that we draw a random sample of size n from it: ξ1, ..., ξn.
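At this point we can define a sample mean and a sample variance.
The sample mean is easy to understand; isn't it just the arithmetic average:
$$\bar{\xi} = \frac{1}{n}\sum_{i=1}^{n}\xi_i$$
As for the sample variance, going by our earlier understanding of variance, shouldn't it be:
$$\frac{1}{n}\sum_{i=1}^{n}(\xi_i - \bar{\xi})^2$$
In fact, the sample variance is:
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\xi_i - \bar{\xi})^2$$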
The difference is the denominator, which is actually n-1. Of course, some readers will point out that in statistics the sample variance has n-1 degrees of freedom.
Fine, but even granting that it is n-1, why is that?
This takes us back to the heart of the problem: why do we sample at all? Of course, it is to estimate the properties of the population from the properties of the sample.
Therefore, because we want an unbiased estimator, we use S² as the sample variance, i.e. S² satisfies E(S²) = σ². For the mathematical proof, see the link below.
PS: For a mathematical proof of why the sample variance has n-1 degrees of freedom, see http://www.zhihu.com/question/20099757
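The unbiasedness claim can also be checked numerically. The following is only a minimal simulation sketch, not a proof: the function name `average_variance_estimates` and the chosen values (n = 5, σ = 2) are assumptions made for illustration. It averages both versions of the estimator over many normal samples; the 1/n version should settle near (n-1)/n · σ², while S² should settle near σ².

```python
import random

def average_variance_estimates(n, sigma, trials, seed=0):
    """Average the 1/n and 1/(n-1) variance estimates over many normal samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    biased_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        squared_deviations = sum((x - mean) ** 2 for x in sample)
        biased_total += squared_deviations / n          # denominator n
        unbiased_total += squared_deviations / (n - 1)  # denominator n - 1, i.e. S^2
    return biased_total / trials, unbiased_total / trials

biased, unbiased = average_variance_estimates(n=5, sigma=2.0, trials=20000)
print(f"true variance sigma^2 : {2.0 ** 2:.3f}")
print(f"average with 1/n      : {biased:.3f}")
print(f"average with 1/(n-1)  : {unbiased:.3f}")
```

With a small n the gap between the two averages is easy to see; as n grows, the factor (n-1)/n approaches 1 and the two estimators become nearly indistinguishable.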
Two
Okay, now we have the sample variance S².
At this point I want to ask: what is the variance of the sample mean?
Recall why we study samples in the first place: to use a sample statistic T(ξ1, ..., ξn) to infer the distribution and numerical characteristics of the population ξ. A sample statistic is essentially a function of random variables.
The difference between the sample variance and the variance of the sample mean is as follows:
Sample variance: this is the sample statistic formed by dividing the sum of squared deviations of ξ1, ..., ξn by n-1. It shares the sum-of-squared-deviations form with variance in the ordinary sense.
Note, however, that it is a sample statistic built from the sum of squared deviations; it is a random variable, constructed in order to estimate the population variance.
Variance of the sample mean: the sample mean is also a sample statistic, and it is an unbiased estimator of the population mean. The variance of the sample mean is simply the variance of the sample mean regarded as a random variable.
Suppose the population is ξ ~ N(µ, σ²) and ξ1, ..., ξn is a sample of size n drawn from it. Because this is simple random sampling, the observations are independent and each has the same distribution as the population.
For a normal population ξ, the distribution of the sample mean can be worked out: because independent normal variables are additive (a sum of independent normal variables is again normal), the sample mean follows N(µ, σ²/n).
From this distribution, the variance of the sample mean is σ²/n.
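The σ²/n result can be checked by treating the sample mean as a random variable and observing it many times. This is a rough sketch under assumed values (n = 10, µ = 1, σ = 3); the function name `simulated_variance_of_mean` is hypothetical. It draws many samples, records each sample mean, and then computes the variance of those means.

```python
import random
import statistics

def simulated_variance_of_mean(n, mu, sigma, trials, seed=0):
    """Variance of the sample mean, estimated from `trials` independent samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [
        sum(rng.gauss(mu, sigma) for _ in range(n)) / n  # one sample mean
        for _ in range(trials)
    ]
    return statistics.pvariance(means)  # spread of the sample means themselves

n, mu, sigma = 10, 1.0, 3.0
print(f"theoretical sigma^2 / n : {sigma ** 2 / n:.3f}")
print(f"simulated variance      : {simulated_variance_of_mean(n, mu, sigma, 20000):.3f}")
```

Note that this is a different quantity from S²: S² is computed inside one sample, whereas the variance here is taken across the means of many samples.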
Three
Before talking about constructing sampling distributions, we must first introduce the three major distributions of statistics:
(1) Chi-square distribution
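Definition: if X1, ..., Xn are independent standard normal random variables, then
$$\chi^2 = \sum_{i=1}^{n} X_i^2$$
follows the chi-square distribution with n degrees of freedom, written χ² ~ χ²(n).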
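Properties: for χ² ~ χ²(n), E(χ²) = n and Var(χ²) = 2n; if χ1² ~ χ²(n1) and χ2² ~ χ²(n2) are independent, then χ1² + χ2² ~ χ²(n1 + n2).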
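(2) t distribution
Definition: if X ~ N(0,1) and Y ~ χ²(n) are independent, then
$$T = \frac{X}{\sqrt{Y/n}}$$
follows the t distribution with n degrees of freedom, written T ~ t(n).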
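Properties: the density of t(n) is symmetric about 0 and has heavier tails than the standard normal; for T ~ t(n), E(T) = 0 when n > 1 and Var(T) = n/(n-2) when n > 2; as n → ∞, t(n) approaches N(0,1).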
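(3) F distribution
Definition: if X ~ χ²(m) and Y ~ χ²(n) are independent, then
$$F = \frac{X/m}{Y/n}$$
follows the F distribution with (m, n) degrees of freedom, written F ~ F(m, n).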
Properties: if F ~ F(m, n), then 1/F ~ F(n, m); if T ~ t(n), then T² ~ F(1, n).
All three distributions are closely related to the standard normal distribution.
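To make that relationship concrete, each of the three distributions can be generated from nothing but standard normal draws. The sketch below follows the standard textbook constructions; the helper names (`chi_square_draw`, `t_draw`, `f_draw`, `simulated_mean`) and the degrees of freedom used are assumptions chosen for illustration. The simulated means can be compared with the known theoretical means: k for chi-square(k), 0 for t(k), and d2/(d2-2) for F(d1, d2).

```python
import math
import random

def chi_square_draw(rng, k):
    """Sum of k squared independent standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

def t_draw(rng, k):
    """Standard normal divided by sqrt(independent chi-square / k)."""
    z = rng.gauss(0.0, 1.0)
    return z / math.sqrt(chi_square_draw(rng, k) / k)

def f_draw(rng, d1, d2):
    """Ratio of two independent chi-squares, each divided by its degrees of freedom."""
    return (chi_square_draw(rng, d1) / d1) / (chi_square_draw(rng, d2) / d2)

def simulated_mean(draw, trials, seed=0):
    """Average of `trials` draws produced by the callable `draw(rng)`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return sum(draw(rng) for _ in range(trials)) / trials

trials = 20000
print("chi-square(4) mean:", simulated_mean(lambda r: chi_square_draw(r, 4), trials))
print("t(10) mean        :", simulated_mean(lambda r: t_draw(r, 10), trials))
print("F(5, 11) mean     :", simulated_mean(lambda r: f_draw(r, 5, 11), trials))
```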
Okay, enough preamble; let's get down to constructing sampling distributions.
Continuing with the assumptions above: the population is ξ ~ N(µ, σ²), and we draw a random sample of size n from it, ξ1, ..., ξn.
The sample mean follows N(µ, σ²/n), and after standardization $(\bar{\xi} - \mu)/(\sigma/\sqrt{n})$ follows N(0,1).
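(1) We know that the chi-square distribution takes the form of a sum of squares of standard normal variables, and the sample variance is a function of random variables with exactly this sum-of-squares form.
So we try to work µ and σ² in to bring it into standard normal form.
The final result is
$$\frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n}\left(\frac{\xi_i - \bar{\xi}}{\sigma}\right)^2 \sim \chi^2(n-1)$$
One degree of freedom is lost because the deviations are taken from the sample mean rather than from µ. This is the distribution of the sample variance.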
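As a quick numerical check of this result, recall that a chi-square variable with k degrees of freedom has mean k and variance 2k, so (n-1)S²/σ² should have mean n-1 and variance 2(n-1). The following is a minimal sketch under assumed values (n = 6, µ = 0, σ = 1.5); the function name `scaled_sample_variances` is hypothetical.

```python
import random
import statistics

def scaled_sample_variances(n, mu, sigma, trials, seed=0):
    """Return `trials` values of (n-1) * S^2 / sigma^2 from normal samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        s2 = statistics.variance(sample)  # sample variance, denominator n - 1
        values.append((n - 1) * s2 / sigma ** 2)
    return values

n = 6
values = scaled_sample_variances(n, mu=0.0, sigma=1.5, trials=20000)
print(f"simulated mean     : {statistics.fmean(values):.3f}  (chi-square mean n-1 = {n - 1})")
print(f"simulated variance : {statistics.pvariance(values):.3f}  (chi-square variance 2(n-1) = {2 * (n - 1)})")
```

Matching the first two moments does not prove that the distribution is chi-square, but a mean near n-1 rather than n is a direct reflection of the lost degree of freedom.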
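(2) We often run into the situation where the population mean µ of ξ is known but the population variance σ² is not.
In that case we naturally cannot use N(µ, σ²/n) directly. So, by analogy, we construct a statistic that uses S in place of σ to find the distribution of the sample mean.
Looking at its form, the t distribution comes to mind:
$$T = \frac{\bar{\xi} - \mu}{S/\sqrt{n}} = \frac{(\bar{\xi} - \mu)/(\sigma/\sqrt{n})}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2} \Big/ (n-1)}} \sim t(n-1)$$
This relies on the fact that, for a normal sample, the sample mean and S² are independent.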
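The effect of replacing σ with S shows up clearly in simulation: the statistic (sample mean − µ)/(S/√n) has heavier tails than N(0,1). The sketch below is illustrative only; the function name `t_statistics` and the values n = 8, µ = 5, σ = 2 are assumptions. It compares the simulated variance with the t(n-1) variance, which is (n-1)/(n-3), and counts how often |T| exceeds 1.96, which would happen about 5% of the time under a standard normal.

```python
import math
import random
import statistics

def t_statistics(n, mu, sigma, trials, seed=0):
    """Return `trials` values of (sample mean - mu) / (S / sqrt(n))."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        s = statistics.stdev(sample)  # square root of S^2, denominator n - 1
        values.append((mean - mu) / (s / math.sqrt(n)))
    return values

n = 8
values = t_statistics(n, mu=5.0, sigma=2.0, trials=20000)
tail_share = sum(abs(t) > 1.96 for t in values) / len(values)
print(f"simulated variance of T : {statistics.pvariance(values):.3f}")
print(f"t(n-1) variance         : {(n - 1) / (n - 3):.3f}")
print(f"share with |T| > 1.96   : {tail_share:.3f}")
```

Note that σ is used only to generate the data; the statistic itself needs just µ, the sample mean, and S, which is exactly why it is useful when σ² is unknown.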
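(3) There is also the F distribution, which is constructed as the distribution of the ratio of the sample variances from two independent samples of normal populations with known variances σ1² and σ2²:
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1 - 1,\, n_2 - 1)$$
Here n1 and n2 are the sizes of the samples drawn from the two populations.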
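The variance-ratio construction can be simulated in the same way. This is a sketch under assumed values (n1 = 6, σ1 = 2, n2 = 12, σ2 = 0.5) with a hypothetical function name `variance_ratios`: each sample variance is divided by its population variance, the ratio of the two is taken, and the simulated mean is compared with the mean of F(d1, d2), which is d2/(d2-2) for d2 > 2.

```python
import random
import statistics

def variance_ratios(n1, sigma1, n2, sigma2, trials, seed=0):
    """Return `trials` values of (S1^2 / sigma1^2) / (S2^2 / sigma2^2)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    ratios = []
    for _ in range(trials):
        sample1 = [rng.gauss(0.0, sigma1) for _ in range(n1)]
        sample2 = [rng.gauss(0.0, sigma2) for _ in range(n2)]
        scaled1 = statistics.variance(sample1) / sigma1 ** 2  # S1^2 / sigma1^2
        scaled2 = statistics.variance(sample2) / sigma2 ** 2  # S2^2 / sigma2^2
        ratios.append(scaled1 / scaled2)
    return ratios

n1, n2 = 6, 12
ratios = variance_ratios(n1, 2.0, n2, 0.5, trials=20000)
d2 = n2 - 1  # denominator degrees of freedom
print(f"simulated mean of the ratio : {statistics.fmean(ratios):.3f}")
print(f"F(n1-1, n2-1) mean          : {d2 / (d2 - 2):.3f}")
```

When the two population variances are equal they cancel, and the statistic reduces to the plain ratio of the two sample variances.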
Of course, other sampling distributions can be constructed from these three distributions, depending on the specific problem you are working on.