Why the maximum likelihood estimate of variance is smaller than the true variance: E[σ²_ML] = ((N−1)/N)·σ² — machine learning / pattern recognition


As is well known, given N sample points {x_i, i = 1, 2, …, N} on the one-dimensional real line, and assuming the samples obey a unimodal Gaussian distribution, the maximum likelihood estimates of the parameters are:

Mean: $\mu_{ML} = \frac{1}{N}\sum_{i=1}^{N} x_i$

Variance: $\sigma^2_{ML} = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu_{ML})^2$
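These two estimators, together with the N−1 version discussed below, can be sketched in a few lines of plain Python (a minimal illustration; the function names are my own):

```python
def ml_mean(xs):
    """Maximum likelihood estimate of the mean: divide by N."""
    return sum(xs) / len(xs)

def ml_var(xs):
    """Maximum likelihood estimate of the variance: divide by N (biased)."""
    m = ml_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def unbiased_var(xs):
    """Sample variance as usually defined: divide by N - 1 (unbiased)."""
    m = ml_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

xs = [1.0, 2.0, 3.0]
print(ml_var(xs))        # 2/3
print(unbiased_var(xs))  # 1.0
```

These agree with `statistics.pvariance` and `statistics.variance` in Python's standard library, respectively.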


However, have you ever noticed that the definition of variance we learned in school is not the same as the maximum likelihood estimate: one denominator is N−1, the other is N. Does this mean the maximum likelihood estimator is inaccurate? How should we measure this inaccuracy? And, going further, why is the definition of variance divided by N−1? This article starts from the last question and answers these questions step by step.


This article mainly uses Chapter 2, Section 1 of Zhang Xianda's Modern Signal Processing (2nd edition) as reference material, supplemented by some web materials; the references are marked in the text.


1. N−1 more accurately reflects the real world

If there is only one sample x₁ in the sample set, what is the variance of the Gaussian distribution: 0 or infinity? If it is infinity, then when a second sample point arrives we cannot predict where it will fall (an infinitely large variance can be viewed as a uniform distribution over the whole real axis); if it is 0, then we would be certain the second sample point still equals x₁ (zero variance can be viewed as a deterministic event). Clearly, infinite variance is more consistent with the real world and with our intuition. Note that with N = 1, the denominator N gives a definite variance of 0, whereas the denominator N−1 gives 0/0, leaving the variance undetermined. That is to say, the denominator N−1 better reflects the real world.

There is another intuitive understanding, though it is hard to justify, from http://blog.csdn.net/feliciafay/article/details/5878036. It is excerpted directly here and is also easy to follow; the author also explains why this intuitive explanation is difficult to justify.

"Since the sample is smaller than the whole population, extreme data are less likely to be drawn. For example, if you take a sample of people to measure height, the chance that a giant appears in the sample is very small, so the result may be smaller than the actual value. To compensate, make the denominator smaller so the estimate better reflects the actual data. Objection: this explanation is actually not very reasonable. Since you may fail to draw a tall person, you may equally fail to draw a short one; by the same reasoning the denominator should just as well become bigger. I do not think this point of view can be justified."


2. The degrees-of-freedom explanation

This explanation is also widespread online and likewise belongs to the more intuitive kind. In the definition of the mean, the numerator has N independent variables, so the denominator divides by N. In the definition of the variance, however, the mean has already constrained the N sample values, so only N−1 of them are independent: once N−1 sample points and the mean are known, the Nth sample point can be computed. Therefore the denominator in the definition of variance divides by N−1.

Objection: why should having N−1 independent variables imply dividing by N−1 in the denominator?
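The constraint itself can be seen concretely: once the mean and any N−1 of the samples are fixed, the last sample is determined. A small sketch, with made-up numbers:

```python
xs = [2.0, 4.0, 6.0, 8.0]   # N = 4 samples
n = len(xs)
mean = sum(xs) / n           # 5.0

# Knowing the mean and the first N-1 samples pins down the last one:
recovered = n * mean - sum(xs[:-1])
print(recovered)  # 8.0, equal to xs[-1]
```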


3. The mathematical explanation

First, we introduce criteria for estimator performance. There are many parameter estimation methods, such as maximum likelihood estimation, maximum a posteriori estimation, Bayesian estimation, minimum mean square error estimation, and so on. How do we evaluate the performance of these estimators? We introduce the concepts of unbiased estimation and asymptotically unbiased estimation.

An unbiased estimate reflects the fact that if a parameter is estimated many times, the average of the resulting estimates approximates the true value of the parameter well. The rigorous mathematical definition is: $E[\hat{\theta}_N] = \theta$.

Note: the expected value of an estimate is, in general, a function of the sample-set size N.

An asymptotically unbiased estimate reflects the intuitive fact that the more samples we use, the more accurate the estimate. So when enough samples are available, we can evaluate an estimator by how its deviation from the true value behaves as the sample-set size N tends to infinity. The rigorous mathematical definition is: $\lim_{N\to\infty} E[\hat{\theta}_N] = \theta$.

Note: an unbiased estimator is necessarily asymptotically unbiased, but an asymptotically unbiased estimator is not necessarily unbiased.
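A quick Monte Carlo check of the bias factor in the title, E[σ²_ML] = ((N−1)/N)·σ², using only the standard library (the seed, N, and trial count are arbitrary choices of mine):

```python
import random

random.seed(0)
N, trials = 5, 200_000
# Draw from a standard normal, so the true variance is 1.0
# and the predicted bias factor is (N - 1) / N = 0.8.

total = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(N)]
    m = sum(xs) / N
    total += sum((x - m) ** 2 for x in xs) / N  # ML (biased) variance

avg_ml_var = total / trials
print(avg_ml_var)  # close to 0.8, not 1.0
```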

Below, starting from the definition of unbiasedness, we prove that the defining formulas of the mean and variance (although called definitions, they are defined on a particular sample set, so they are still essentially estimates of the true values) are unbiased estimates of the true values.

Typesetting the formulas is quite laborious, so they were written out directly on paper; apologies if they are a little hard to read.
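The handwritten derivation was an image and does not survive in this copy; a reconstruction of the standard argument, consistent with the definitions above, is:

```latex
\begin{align*}
% Key identity: re-center the sum of squares at the true mean \mu.
\sum_{i=1}^{N}(x_i-\mu_{ML})^2
  &= \sum_{i=1}^{N}(x_i-\mu)^2 \;-\; N(\mu_{ML}-\mu)^2 .\\
% Take expectations, using E[(x_i-\mu)^2]=\sigma^2 and
% E[(\mu_{ML}-\mu)^2]=\operatorname{Var}(\mu_{ML})=\sigma^2/N:
E\!\left[\sigma^2_{ML}\right]
  &= \frac{1}{N}\Big(N\sigma^2 - N\cdot\frac{\sigma^2}{N}\Big)
   = \frac{N-1}{N}\,\sigma^2 .\\
% Dividing by N-1 instead of N removes the bias:
E\!\left[\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\mu_{ML})^2\right]
  &= \sigma^2 .
\end{align*}
```

The first line shows that σ²_ML is biased low precisely because the estimated mean μ_ML is closer to the samples than the true mean μ is; the factor (N−1)/N matches the title.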



It can be seen that the defining formulas of the mean and the variance are unbiased estimators. In addition, note that a biased but asymptotically unbiased estimator is not necessarily worse than an unbiased one; the difference shows up mainly in practical computability (matrix full rank, etc.), computational complexity, and so on.


4. Summary

From the above three perspectives we can understand why the definition of variance uses N−1 as the denominator. We should also know that although the maximum likelihood variance estimate is biased, it is asymptotically unbiased, and maximum likelihood remains a widely used parameter estimation method.
