Mahout Series: Dirichlet distribution

A Dirichlet distribution can be seen as a distribution over distributions. To understand this sentence, consider an example: suppose we have a die with six faces, labeled {1, 2, 3, 4, 5, 6}. We throw it 10,000 times, and the six faces come up {2000, 2000, 2000, 2000, 1000, 1000} times respectively. If we estimate each face's probability by the ratio of its count to the total number of throws, we get the probability vector {0.2, 0.2, 0.2, 0.2, 0.1, 0.1}. Now suppose we are not satisfied and repeat the whole experiment 10,000 times, throwing the die 10,000 times in each experiment. We want to know how likely it is, across these experiments, that the estimated six-face probability vector comes out as {0.2, 0.2, 0.2, 0.2, 0.1, 0.1} (the next experiment might instead give {0.1, 0.1, 0.2, 0.2, 0.2, 0.2}). In other words, we are asking about the distribution of the probability distribution over the die's six faces, and such a distribution is a Dirichlet distribution.
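To make the dice story concrete, here is a minimal sketch, assuming numpy and a die whose true face probabilities are {0.2, 0.2, 0.2, 0.2, 0.1, 0.1}: each repeated batch of throws yields a slightly different empirical probability vector, and it is exactly this variation over probability vectors that the Dirichlet distribution describes.

import numpy as np

# Repeat the 10,000-throw experiment a few times; each run produces one
# empirical probability vector, i.e. one point on the simplex.
rng = np.random.default_rng(0)
true_p = np.array([0.2, 0.2, 0.2, 0.2, 0.1, 0.1])   # assumed true face probabilities

for i in range(5):                                   # 10,000 repeats in the text; 5 here
    counts = rng.multinomial(10_000, true_p)         # face counts for one experiment
    empirical_p = counts / counts.sum()              # estimated probability vector
    print(f"experiment {i}: {np.round(empirical_p, 3)}")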

With the paragraph above as an intuitive picture, here is some background material:

The Wikipedia article on the Dirichlet distribution seemed too involved for someone without enough background. I found a CMU slide deck, Dirichlet Distribution, Dirichlet Process and Dirichlet Process Mixture, and a University of Washington tutorial, Introduction to the Dirichlet Distribution and Related Processes.

In that CMU deck, the statement "the beta distribution is the conjugate prior of the binomial distribution" is what made things click: since the beta distribution is the conjugate prior of the binomial distribution, the Dirichlet distribution is the conjugate prior of the multinomial distribution. To understand the Dirichlet distribution, then, we first need to understand the multinomial distribution, and to understand that relationship we should first look at how the beta distribution relates to the Bernoulli and binomial distributions. So the multinomial distribution, the beta distribution, and conjugacy are the three key pieces of background for understanding the Dirichlet distribution, and all of them are covered in PRML Section 2.1, which is where these notes come from.
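As a quick reminder of the beta-binomial case that everything else builds on, here is a minimal sketch with made-up numbers: with a Beta(a, b) prior on a coin's heads probability and h heads out of n tosses, the posterior is simply Beta(a + h, b + n - h).

# Beta-binomial conjugate update (illustrative numbers only).
a, b = 2.0, 2.0              # assumed prior pseudo-counts for heads / tails
heads, tails = 7, 3          # assumed observed tosses

a_post, b_post = a + heads, b + tails        # posterior is again a beta distribution
posterior_mean = a_post / (a_post + b_post)  # E[heads probability | data]

print(f"posterior: Beta({a_post}, {b_post}), mean = {posterior_mean:.3f}")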

Now we formally turn to the Dirichlet distribution, starting with the parameter μ of the multinomial distribution. In the Bernoulli distribution, the parameter μ is the probability of a coin toss landing on a particular side, because the Bernoulli state space is just {0, 1}. In the multinomial case the state space has K values, so μ becomes a vector μ = (μ_1, ..., μ_K)^T. The likelihood function of the multinomial distribution has the form ∏_{k=1}^{K} μ_k^{m_k}, so, by analogy with choosing the beta distribution as the conjugate prior in the Bernoulli/binomial case, the conjugate prior for the multinomial should have the functional form:

p(μ|α) ∝ ∏_{k=1}^{K} μ_k^{α_k − 1}    (eq. 2.37)

Here ∑_k μ_k = 1, and α = (α_1, ..., α_K)^T is the parameter vector of the Dirichlet distribution. Normalizing equation 2.37 finally gives the Dirichlet distribution proper:

Dir(μ|α) = Γ(α_0) / (Γ(α_1) ⋯ Γ(α_K)) · ∏_{k=1}^{K} μ_k^{α_k − 1}

where α_0 = ∑_{k=1}^{K} α_k. This function looks a bit like the beta distribution (it reduces to the beta distribution when K = 2) and also a bit like the multinomial distribution. Like the beta distribution, the Dirichlet distribution is a distribution over the parameter μ of the corresponding multinomial distribution, except that μ is now a vector. The figure below shows an example of a Dirichlet probability density function when μ = (μ_1, μ_2, μ_3) has only three components. The triangle in the middle of the diagram is the simplex lying in a plane; its three vertices correspond to μ = (1, 0, 0), μ = (0, 1, 0) and μ = (0, 0, 1), so any point inside the triangle is a valid value of μ, and the vertical axis gives the probability density (PDF) at that μ on the simplex.
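To connect the formula to the K = 3 picture, here is a small sketch (standard library only, with an arbitrarily chosen α) that evaluates Dir(μ|α) directly from the gamma-function form at a few points on the simplex.

from math import gamma, prod

def dirichlet_pdf(mu, alpha):
    # Dir(mu | alpha) evaluated straight from the formula above;
    # mu must lie on the simplex (non-negative, summing to 1).
    a0 = sum(alpha)
    norm = gamma(a0) / prod(gamma(a) for a in alpha)
    return norm * prod(m ** (a - 1) for m, a in zip(mu, alpha))

alpha = [2.0, 2.0, 2.0]                              # assumed symmetric parameters
for mu in ([1/3, 1/3, 1/3], [0.6, 0.3, 0.1], [0.98, 0.01, 0.01]):
    print(mu, "->", round(dirichlet_pdf(mu, alpha), 4))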

To estimate the parameter μ, we use the fact that posterior ∝ likelihood × prior, which gives the functional form:

p(μ|D, α) ∝ p(D|μ) p(μ|α) ∝ ∏_{k=1}^{K} μ_k^{α_k + m_k − 1}

From this form we can see that the posterior is also a Dirichlet distribution. Normalizing it, just as we did for the posterior in the beta-distribution case, we get:

p(μ|D, α) = Dir(μ|α + m) = Γ(α_0 + N) / (Γ(α_1 + m_1) ⋯ Γ(α_K + m_K)) · ∏_{k=1}^{K} μ_k^{α_k + m_k − 1}

where m = (m_1, ..., m_K)^T are the observed counts of each outcome and N = ∑_{k=1}^{K} m_k is the total number of observations.
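Putting the pieces together for the dice example, here is a minimal sketch of this conjugate update (prior α chosen arbitrarily, counts m taken from the experiment at the top of the post): the posterior is Dir(μ|α + m), and its mean is (α_k + m_k) / (α_0 + N).

import numpy as np

# Dirichlet-multinomial conjugate update for the dice counts above.
alpha = np.ones(6)                                  # assumed uniform prior Dir(1, ..., 1)
m = np.array([2000, 2000, 2000, 2000, 1000, 1000])  # observed face counts

alpha_post = alpha + m                              # posterior parameters: Dir(mu | alpha + m)
posterior_mean = alpha_post / alpha_post.sum()      # (alpha_k + m_k) / (alpha_0 + N)

print("posterior parameters:", alpha_post)
print("posterior mean      :", np.round(posterior_mean, 3))
# With this much data the posterior mean is essentially the empirical
# frequencies {0.2, 0.2, 0.2, 0.2, 0.1, 0.1}, and samples drawn with
# np.random.default_rng(0).dirichlet(alpha_post) concentrate tightly around them.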
