Why is the denominator of the sample variance n-1?

Source: Internet
Author: User

(One extra note: the variance estimator the asker refers to is usually obtained by the method of moments. If you use the maximum likelihood (ML) method instead, do not read too much into it: the expectation of that variance estimator is biased as well. Interested readers can check this themselves with the normal distribution.)

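Originally, by definition, the estimator of the variance should be:

$$S^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$$

However, this estimator is biased, because:

$$E(S^2) = \frac{n-1}{n}\sigma^2$$

and (n-1)/n · σ² ≠ σ². So, to avoid using a biased estimator, we usually use its corrected value s²:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$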
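
The bias described above can be checked numerically. The following is a minimal simulation sketch, not part of the original answer: the function names, the normal distribution, and the values n = 5 and σ = 2 are all illustrative assumptions. It averages both estimators over many samples; the n-denominator version should settle near (n-1)/n · σ² = 3.2 and the (n-1)-denominator version near σ² = 4.

```python
import random

def naive_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def corrected_variance(xs):
    """Same sum of squares, divided by n - 1 instead of n."""
    n = len(xs)
    return naive_variance(xs) * n / (n - 1)

def average_estimates(n=5, sigma=2.0, trials=100_000, seed=0):
    """Average both estimators over many samples (illustrative parameters)."""
    rng = random.Random(seed)
    naive_total = 0.0
    corrected_total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        naive_total += naive_variance(xs)
        corrected_total += corrected_variance(xs)
    return naive_total / trials, corrected_total / trials

naive_avg, corrected_avg = average_estimates()
print("divide by n:    ", naive_avg)
print("divide by n - 1:", corrected_avg)
```

A simulation only illustrates the expectation; it does not replace the proof.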

An answer above explains it clearly: the denominator in the sample variance formula is n-1 so that the estimate of the variance is unbiased. That an unbiased estimator is better than a biased estimator is intuitive, although some statisticians argue that minimizing the mean squared error (MSE) is more meaningful, a question we will not discuss here. What is not intuitive is why the denominator has to be n-1 rather than n to make the estimate unbiased. I believe this is what really confuses the asker.

To answer this question, the lazy way is to point the asker to the mathematical proof of the following equation:

$$E\left[\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = \sigma^2.$$

But that answer is clearly not intuitive (in textbooks, statisticians seem to arrive at the equation above as if by magic).
Below I will provide a slightly more friendly explanation.
==================================================================
===================== the split line of the answer ===================================
==================================================================
First, we assume that the mathematical expectation μ of a random variable X is known, but its variance σ² is unknown. Under this condition, by the definition of variance, we have

$$E\left[(X_i - \mu)^2\right] = \sigma^2, \quad i = 1, \ldots, n.$$

This gives

$$E\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2\right] = \sigma^2.$$

Therefore (1/n) Σ (X_i − μ)² is an unbiased estimate of the variance. Note that the denominator in this expression is exactly n!
The result is intuitive and mathematically obvious.

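Now consider the case where the mathematical expectation μ of the random variable is unknown. At this point, we are tempted to replace μ in the above formula with the sample mean X̄ without thinking. What is the consequence of this? The consequence is that
if (1/n) Σ (X_i − X̄)² is used directly as an estimate, you will tend to underestimate the variance!
This is because:

$$
\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2
&= \frac{1}{n}\sum_{i=1}^{n}\left[(X_i - \mu) + (\mu - \bar{X})\right]^2 \\
&= \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2 + \frac{2}{n}\sum_{i=1}^{n}(X_i - \mu)(\mu - \bar{X}) + \frac{1}{n}\sum_{i=1}^{n}(\mu - \bar{X})^2 \\
&= \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2 + 2(\bar{X} - \mu)(\mu - \bar{X}) + (\mu - \bar{X})^2 \\
&= \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2 - (\mu - \bar{X})^2
\end{aligned}
$$

In other words, unless X̄ happens to equal μ exactly, we must have

$$\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2 < \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2,$$

and the right-hand side of the inequality is the "correct" estimate of the variance!
This inequality illustrates why using (1/n) Σ (X_i − X̄)² directly leads to an underestimate of the variance.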
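
The decomposition behind this inequality can be verified on a single sample. Below is a small sketch under assumed values (a normal population with μ = 10, σ = 3, and n = 8; none of these come from the original answer). It checks that the mean squared deviation around the sample mean equals the mean squared deviation around the true mean minus the squared gap between the two means.

```python
import random

rng = random.Random(1)
mu, sigma, n = 10.0, 3.0, 8          # illustrative values, not from the text
xs = [rng.gauss(mu, sigma) for _ in range(n)]
xbar = sum(xs) / n

# Mean squared deviation around the sample mean (what we can compute in practice).
around_sample_mean = sum((x - xbar) ** 2 for x in xs) / n
# Mean squared deviation around the true mean (needs mu, usually unknown).
around_true_mean = sum((x - mu) ** 2 for x in xs) / n
# Squared gap between the true mean and the sample mean.
gap = (mu - xbar) ** 2

print(around_sample_mean)
print(around_true_mean - gap)   # agrees with the line above up to rounding
```

Because the gap term is a square, it can never be negative, which is exactly why the sample-mean version cannot exceed the true-mean version.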

So, without knowing the true mathematical expectation of the random variable, how do we estimate the variance "correctly"? The answer is to replace the denominator n in the above formula with n-1. This "amplifies" the originally too-small estimate a little, and we obtain a correct estimate of the variance:

$$E\left[\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = E\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2\right] = \sigma^2.$$

As for why the denominator is n-1 rather than n-2 or some other number, it is best to look at the actual mathematical proof, because the fundamental purpose of a mathematical proof is to tell people "why". For the time being, I have no way of giving a more elementary explanation.

Both the sample variance and the sample mean are random variables; each has its own distribution, and each may have its own expectation and variance. With the denominator n-1, the expectation of the sample variance equals the population variance, i.e. the sample variance defined this way is an unbiased estimate of the population variance. A simple way to understand it: computing the variance uses the sample mean, which costs one degree of freedom, so it is natural to divide by (n-1).
If that is still hard to follow, here is a more concrete picture. If only one sample is drawn from the population, i.e. n=1, then both the numerator and the denominator of the sample variance formula are 0, and the variance is completely indeterminate. This is easy to accept: the sample variance estimates how much individuals in the population vary, and with only one individual you of course cannot see any variation. Conversely, if the denominator of the formula were n rather than n-1, the computed variance would be 0, which is unreasonable, because you cannot conclude from a single individual that the variation among individuals in the population is 0.
I am not sure whether this is clear; the detailed derivation can be found in the relevant textbooks.
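
The n=1 case can be seen directly with Python's standard `statistics` module, where `pvariance` divides by n and `variance` divides by n-1. This is only an illustration; the data values below are made up.

```python
import statistics

single = [4.2]   # a made-up sample containing one observation

# Dividing by n: a single observation reports zero spread.
pop_var_single = statistics.pvariance(single)
print("divide by n:", pop_var_single)

# Dividing by n - 1: the module refuses, since 0/0 is indeterminate.
try:
    statistics.variance(single)
    sample_var_defined = True
except statistics.StatisticsError as exc:
    sample_var_defined = False
    print("divide by n - 1: undefined,", exc)

# With more data both are defined, and the n - 1 version is slightly larger.
data = [2, 4, 4, 4, 5, 5, 7, 9]
pop_var = statistics.pvariance(data)
sample_var = statistics.variance(data)
print(pop_var, sample_var)
```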

Because the sample mean differs from the true mean.
If the denominator is n, the variance estimated from the sample will tend to be smaller than the true variance.
The detailed calculation is on Wikipedia:
http://en.wikipedia.org/wiki/unbiased_estimator#sample_variance

Sample variance

The sample variance of a random variable demonstrates two aspects of estimator bias: firstly, the naive estimator is biased, which can be corrected by a scale factor; second, the unbiased estimator is not optimal in terms of mean squared error (MSE), which can be minimized by using a different scale factor, resulting in a biased estimator with lower MSE than the unbiased estimator. Concretely, the naive estimator sums the squared deviations and divides by n, which is biased. Dividing instead by n − 1 yields an unbiased estimator. Conversely, MSE can be minimized by dividing by a different number (depending on distribution), but this results in a biased estimator. This number is always larger than n − 1, so this is known as a shrinkage estimator, as it "shrinks" the unbiased estimator towards zero; for the normal distribution the optimal value is n + 1.
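
The MSE claim for the normal distribution can be illustrated with a short simulation. This is a sketch under assumed settings (n = 5, σ = 1, and the function name `mse_by_divisor` are all chosen here for illustration). Each simulated sample is scored with the three divisors n − 1, n, and n + 1, so the comparison uses the same data for all three.

```python
import random

def mse_by_divisor(n=5, sigma=1.0, trials=100_000, seed=0):
    """Estimate the MSE of (sum of squared deviations) / k for k = n-1, n, n+1."""
    rng = random.Random(seed)
    divisors = (n - 1, n, n + 1)
    squared_error = {k: 0.0 for k in divisors}
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)
        for k in divisors:
            # Squared distance between this estimate and the true variance.
            squared_error[k] += (ss / k - sigma ** 2) ** 2
    return {k: total / trials for k, total in squared_error.items()}

mse = mse_by_divisor()
for k, value in mse.items():
    print("divisor", k, "estimated MSE", value)
```

In this setting the divisor n + 1 should show the smallest estimated MSE and n − 1 the largest, even though only n − 1 gives an unbiased estimate.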

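Suppose X1, ..., Xn are independent and identically distributed (i.i.d.) random variables with expectation μ and variance σ². If the sample mean and uncorrected sample variance are defined as

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2,$$

then S² is a biased estimator of σ², because

$$
\begin{aligned}
E[S^2] &= E\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right]
= E\left[\frac{1}{n}\sum_{i=1}^{n}\left((X_i - \mu) - (\bar{X} - \mu)\right)^2\right] \\
&= E\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2 - 2(\bar{X} - \mu)\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu) + (\bar{X} - \mu)^2\right] \\
&= E\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2 - (\bar{X} - \mu)^2\right]
= \sigma^2 - E\left[(\bar{X} - \mu)^2\right] < \sigma^2.
\end{aligned}
$$

In other words, the expected value of the uncorrected sample variance does not equal the population variance σ², unless multiplied by a normalization factor. The sample mean, on the other hand, is an unbiased[1] estimator of the population mean μ.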

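The reason that S² is biased stems from the fact that the sample mean is an ordinary least squares (OLS) estimator for μ: X̄ is the number that makes the sum Σ (X_i − X̄)² as small as possible. That is, when any other number is plugged into this sum, the sum can only increase. In particular, the choice μ ≠ X̄ gives

$$\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2 < \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2,$$

and then

$$E[S^2] = E\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] < E\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2\right] = \sigma^2.$$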

Note that the usual definition of sample variance is

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2,$$

and this is an unbiased estimator of the population variance. This can be seen by noting the following formula, which follows from the Bienaymé formula, for the term in the inequality for the expectation of the uncorrected sample variance above:

$$E\left[(\bar{X} - \mu)^2\right] = \frac{1}{n}\sigma^2.$$

The ratio between the biased (uncorrected) and unbiased estimates of the variance is known as Bessel's correction.
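
For readers who want the last step written out, here is a short worked derivation. It is a sketch that uses only the standard fact that the mean of n i.i.d. variables has variance σ²/n; S² denotes the uncorrected (divide by n) sample variance and s² the usual (divide by n − 1) one.

```latex
\begin{aligned}
E[S^2] &= \sigma^2 - E\left[(\bar{X} - \mu)^2\right]
        = \sigma^2 - \frac{\sigma^2}{n}
        = \frac{n-1}{n}\,\sigma^2, \\
E[s^2] &= E\left[\frac{n}{n-1}\,S^2\right]
        = \frac{n}{n-1}\cdot\frac{n-1}{n}\,\sigma^2
        = \sigma^2 .
\end{aligned}
```

So the factor n/(n − 1) is exactly what cancels the bias, which is why the denominator is n − 1 and not any other number.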
