We all know that, given a sample set $\{x_i,\; i = 1, 2, \ldots, N\}$ whose points are drawn from a single-peaked (unimodal) Gaussian distribution, the maximum likelihood estimates of the parameters are:

Mean:

\[ \hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i \]

Variance:

\[ \hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N}\bigl(x_i - \hat{\mu}\bigr)^2 \]

However, have you noticed that the definition of variance we have been taught since childhood differs from the maximum likelihood estimate? One denominator is N-1, the other is N. Does this mean the maximum likelihood estimate is inaccurate? If so, how can that inaccuracy be measured? And approaching from another angle: why does the defining formula of the sample variance divide by N-1 in the first place? This article starts from that last question and answers these questions step by step.
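The gap between the two denominators can be checked empirically. Below is a minimal NumPy sketch (the true variance 4.0, the sample size 5, and the trial count are arbitrary choices) that averages both estimators over many repeated draws:

```python
import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0            # sigma^2 of the generating Gaussian (arbitrary)
n, trials = 5, 200_000    # a small n makes the bias clearly visible

samples = rng.normal(0.0, np.sqrt(true_var), size=(trials, n))
mle_var = samples.var(axis=1, ddof=0)        # divide by N   (maximum likelihood)
unbiased_var = samples.var(axis=1, ddof=1)   # divide by N-1 (the textbook definition)

print(np.mean(mle_var))        # ≈ (n-1)/n * 4.0 = 3.2: biased low
print(np.mean(unbiased_var))   # ≈ 4.0: centered on the true value
```

Averaged over many trials, the N-denominator estimator systematically underestimates the true variance by the factor (N-1)/N, while the N-1 version centers on the true value.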

This article mainly uses Chapter 2, Section 1 of *Modern Signal Processing* as a reference, supplemented by web materials; references are marked in the text.

**1. N-1 reflects the real world more accurately**

Suppose the sample set contains only one sample. Question: what is the variance of the Gaussian distribution in this case, 0 or infinite? If it is infinite, then we cannot predict where the second sample point will fall (infinite variance can be viewed as a uniform spread over the entire real axis); if it is 0, then we would be certain that the second sample point equals $x_1$ exactly (zero variance makes it a deterministic event). Clearly, infinite variance better matches the real world and our intuition. With N-1 in the denominator, the single-sample variance is 0/0, i.e., undefined, rather than the overconfident 0 that the denominator N produces; in this sense N-1 better reflects the real world.
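The single-sample case can be checked directly in NumPy, where the `ddof` argument selects the denominator $N - \text{ddof}$. A small sketch (the sample value 3.0 is arbitrary):

```python
import warnings
import numpy as np

x = np.array([3.0])  # a sample set containing a single observation

with warnings.catch_warnings():
    warnings.simplefilter("ignore")    # NumPy warns about zero degrees of freedom
    var_n = np.var(x, ddof=0)          # denominator N:   0/1 = 0
    var_n1 = np.var(x, ddof=1)         # denominator N-1: 0/0 -> nan

print(var_n)    # 0.0 -- false certainty after one observation
print(var_n1)   # nan -- "undefined", matching the intuition above
```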

There is another intuitive understanding, though it is difficult to make rigorous, from http://blog.csdn.net/feliciafay/article/details/5878036. The excerpt below is easy to follow, and its author also explains why this intuitive explanation is hard to justify.

"The sample size is smaller than the population size, so extreme data are unlikely to be drawn. For example, if you gather a group of people as a sample to measure height, the chance of a giant appearing in the sample is very small, so the measured spread may be smaller than the actual one. To compensate for this deficiency, the denominator is made smaller so as to better reflect the real data. Objection: this explanation does not hold up. Since we may fail to sample a very tall person just as we may fail to sample a very short child, there would seem to be equal reason to adjust the denominator the other way. I don't think this explains the problem."

**2. Interpretation of degrees of freedom**

This explanation is also common online, and it is fairly intuitive. In the definition of the mean, the numerator contains N independent variables, so it is divided by N. In the definition of the variance, however, the mean has already constrained the N sample values: only N-1 of them are independent, because once N-1 sample points and the mean are known, the N-th sample point can be computed. Therefore the denominator in the definition of variance is N-1.
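The constraint can be seen concretely: the deviations from the sample mean always sum to zero, so the last one is determined by the others. A small NumPy sketch (the sample values are arbitrary):

```python
import numpy as np

x = np.array([2.0, 5.0, 7.0, 10.0])   # arbitrary sample, N = 4
deviations = x - x.mean()

# The deviations sum to zero by construction, so knowing any N-1 of them
# fixes the last one: only N-1 of the N deviations are independent.
print(deviations.sum())                    # ~0 up to floating-point error

last_from_others = -deviations[:-1].sum()  # recover the last deviation
print(np.isclose(last_from_others, deviations[-1]))  # True
```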

Question: why should the N-1 independent terms in the numerator be matched by an N-1 in the denominator?

**3. An explanation in mathematical language**

First, let us introduce the idea of evaluating estimator performance. There are various parameter estimation methods: maximum likelihood estimation, maximum a posteriori (MAP) estimation, Bayesian estimation, minimum mean square error estimation, and so on. How do we evaluate the performance of these estimators? This leads to the concepts of unbiased estimation and asymptotically unbiased estimation.

A so-called unbiased estimator reflects the fact that, if a parameter is estimated many times, the average of those estimates approximates the true value of the parameter well. The rigorous mathematical definition:

\[ E\!\left[\hat{\theta}_N\right] = \theta \]

Note: the expectation of the estimate is, in general, a function of the sample-set size N.

A so-called asymptotically unbiased estimator reflects the fact that the more samples we draw, the more accurate the estimate should become. Therefore, when samples are plentiful, we can evaluate an estimator by the deviation between its expectation and the true value as the sample-set size N tends to infinity. The rigorous mathematical definition:

\[ \lim_{N \to \infty} E\!\left[\hat{\theta}_N\right] = \theta \]

Note: an unbiased estimator is necessarily asymptotically unbiased, but an asymptotically unbiased estimator is not necessarily unbiased.
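As a concrete instance (a standard result, stated here for illustration), the maximum likelihood variance estimator from the opening formulas is biased for every finite N but asymptotically unbiased:

\[
E\!\left[\hat{\sigma}^2_{\mathrm{ML}}\right]
  = E\!\left[\frac{1}{N}\sum_{i=1}^{N}\bigl(x_i - \hat{\mu}\bigr)^2\right]
  = \frac{N-1}{N}\,\sigma^2
  \;\xrightarrow{\;N \to \infty\;}\; \sigma^2 .
\]

The bias $-\sigma^2/N$ is nonzero for every finite N but vanishes in the limit.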

Next, starting from the definition of unbiasedness, we prove that the defining formulas of the mean and the variance (although called definitions, they are computed on a particular sample set and are in essence still estimates of the true values) are unbiased estimates of the true values.

Typesetting the formulas here is laborious, so the derivation was worked out on paper instead. Apologies.
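Since the handwritten derivation is not reproduced here, the standard proof can be sketched as follows (with $E[x_i] = \mu$, $\operatorname{Var}(x_i) = \sigma^2$, and $\bar{x} = \frac{1}{N}\sum_{i} x_i$). For the mean:

\[
E[\bar{x}] = \frac{1}{N}\sum_{i=1}^{N} E[x_i] = \mu .
\]

For the variance, use $\sum_i (x_i - \bar{x})^2 = \sum_i x_i^2 - N\bar{x}^2$ together with $E[x_i^2] = \sigma^2 + \mu^2$ and $E[\bar{x}^2] = \sigma^2/N + \mu^2$:

\[
E\!\left[\sum_{i=1}^{N}(x_i - \bar{x})^2\right]
  = N(\sigma^2 + \mu^2) - N\!\left(\frac{\sigma^2}{N} + \mu^2\right)
  = (N-1)\,\sigma^2 ,
\]

so dividing by $N-1$ (rather than $N$) yields an estimator whose expectation is exactly $\sigma^2$.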

It can be seen that the mean and variance as defined are unbiased estimators. In addition, note that a biased but asymptotically unbiased estimator is not necessarily worse than an unbiased one; the comparison also involves practical computability (e.g., whether a matrix is full rank) and computational complexity.

**4. Summary**

From the above three perspectives, we can understand why N-1 is used as the denominator in the definition of variance. At the same time, we should know that although the maximum likelihood estimate of the variance is biased, its asymptotic unbiasedness makes maximum likelihood a widely used parameter estimation method.

2012-06-12

Xjs.xjtu@gmail.com