The Dirichlet Distribution (PRML 2.2.1)

Source: Internet
Author: User

The Dirichlet distribution can be seen as a distribution over distributions. To understand this sentence, take an example: suppose we have a six-sided die with faces {1, 2, 3, 4, 5, 6}, and we throw it 10000 times. If we use the ratio of the number of occurrences of each face to the total number of throws to estimate the probability of that face, we obtain a probability for each of the six faces, say {0.2, 0.2, 0.2, 0.2, 0.1, 0.1}. Now suppose we are not satisfied with a single experiment. We run 10000 experiments, each consisting of 10000 throws, and ask: how probable is it that an experiment leads us to believe the face probabilities are {0.2, 0.2, 0.2, 0.2, 0.1, 0.1}? (The next experiment might just as well give {0.1, 0.1, 0.2, 0.2, 0.2, 0.2}.) In this way, we are considering a distribution over the probability distributions on the six faces of the die. Such a distribution is the Dirichlet distribution.
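
The repeated-experiment idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "true" face probabilities, the helper name `estimate_face_probs`, and the reduced experiment sizes (200 experiments of 1000 throws, to keep the run short) are all hypothetical choices, not part of the original text.

```python
import random
from collections import Counter

def estimate_face_probs(true_probs, n_throws, rng):
    """Throw a die n_throws times and return the observed frequency of each face."""
    faces = range(len(true_probs))
    throws = rng.choices(faces, weights=true_probs, k=n_throws)
    counts = Counter(throws)  # faces that never appear count as 0
    return [counts[f] / n_throws for f in faces]

rng = random.Random(0)  # fixed seed so the run is repeatable
true_probs = [0.2, 0.2, 0.2, 0.2, 0.1, 0.1]  # assumed "true" die, for illustration

# Each experiment produces one estimated probability vector.
estimates = [estimate_face_probs(true_probs, 1000, rng) for _ in range(200)]

# Print the first few estimates: each is a different point on the simplex.
for est in estimates[:3]:
    print([round(p, 3) for p in est])
```

Every entry of `estimates` is itself a probability vector, so the collection as a whole is a sample from a distribution over distributions, which is the kind of object the Dirichlet distribution describes.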

The paragraph above is meant to give an intuitive impression. Next, some reference material:

The Wikipedia introduction to the Dirichlet distribution is complex and not elementary enough. I found a set of CMU slides, "Dirichlet Distribution, Dirichlet Process and Dirichlet Process Mixture", and an "Introduction to the Dirichlet Distribution and Related Processes" from the University of Washington.

The CMU slides mention that the beta distribution is the conjugate prior of the binomial distribution, which suggests an analogy: just as the beta distribution is the conjugate prior of the binomial distribution, the Dirichlet distribution is the conjugate prior of the multinomial distribution. Therefore, to understand the Dirichlet distribution, we first need to understand the multinomial distribution. And to understand the relationship between the Dirichlet and multinomial distributions, we first need to understand the relationship between the beta and binomial distributions. The binomial distribution, the beta distribution, and conjugacy are therefore the key background for understanding the Dirichlet distribution. This background is covered throughout PRML Section 2.1.

Next, we formally introduce the Dirichlet distribution. First, consider the parameter $\boldsymbol{\mu}$ of the multinomial distribution. In the Bernoulli distribution, the parameter $\mu$ is the probability that a coin toss lands on one particular side, because the state space of the Bernoulli distribution is only {0, 1}. In the multinomial distribution, however, the parameter becomes a vector because the state space has $K$ values: $\boldsymbol{\mu} = (\mu_1, \dots, \mu_K)^T$. The likelihood function of the multinomial distribution has the form $\prod_{k=1}^{K} \mu_k^{m_k}$, where $m_k$ is the number of observations of state $k$. So, just as when we chose the conjugate prior for the Bernoulli distribution, the Dirichlet distribution should have the following functional form:

$$p(\boldsymbol{\mu} \mid \boldsymbol{\alpha}) \propto \prod_{k=1}^{K} \mu_k^{\alpha_k - 1} \qquad \text{(Formula 2.37)}$$

In the formula above, $\sum_k \mu_k = 1$, and $\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_K)^T$ is the parameter of the Dirichlet distribution. Finally, normalizing Formula 2.37 gives the proper Dirichlet distribution:

$$\mathrm{Dir}(\boldsymbol{\mu} \mid \boldsymbol{\alpha}) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_K)} \prod_{k=1}^{K} \mu_k^{\alpha_k - 1}$$
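
As a rough sketch, the normalized density can be evaluated directly from this formula using only the Python standard library. The function names `dirichlet_log_pdf` and `dirichlet_pdf` are hypothetical, and the code assumes every component of the probability vector is strictly positive so that the logarithms are defined.

```python
import math

def dirichlet_log_pdf(mu, alpha):
    """Log of Dir(mu | alpha), for mu on the simplex with all components > 0."""
    if len(mu) != len(alpha):
        raise ValueError("mu and alpha must have the same length")
    if any(a <= 0 for a in alpha):
        raise ValueError("all alpha_k must be positive")
    if any(m <= 0 for m in mu) or abs(sum(mu) - 1.0) > 1e-9:
        raise ValueError("mu must be strictly positive and sum to 1")
    alpha0 = sum(alpha)
    # log of Gamma(alpha_0) / (Gamma(alpha_1) ... Gamma(alpha_K))
    log_norm = math.lgamma(alpha0) - sum(math.lgamma(a) for a in alpha)
    # log of the product of mu_k ** (alpha_k - 1)
    log_kernel = sum((a - 1.0) * math.log(m) for a, m in zip(alpha, mu))
    return log_norm + log_kernel

def dirichlet_pdf(mu, alpha):
    """Density Dir(mu | alpha), computed via the log density for stability."""
    return math.exp(dirichlet_log_pdf(mu, alpha))

# With alpha = (1, 1, 1) the density is constant over the simplex.
print(dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))
# With K = 2 the value matches the Beta(2, 3) density at 0.4.
print(dirichlet_pdf([0.4, 0.6], [2.0, 3.0]))
```

The two printed values should be close to 2 and 1.728 respectively. Working in log space with `math.lgamma` is a design choice to avoid overflow in the gamma functions when the parameters are large.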

where $\alpha_0 = \sum_{k=1}^{K} \alpha_k$. This function looks a bit like the beta distribution (when $K = 2$, it is exactly the beta distribution). It also resembles the multinomial distribution. Just like the beta distribution, the Dirichlet distribution is a distribution over the parameter of the corresponding likelihood, here the multinomial parameter $\boldsymbol{\mu}$, except that $\boldsymbol{\mu}$ is now a vector. The figure shows the Dirichlet probability density function when $\boldsymbol{\mu} = (\mu_1, \mu_2, \mu_3)$ has only three components. The triangle in the middle of the figure represents the flat simplex, and the three vertices of the triangle represent $\boldsymbol{\mu} = (1, 0, 0)$, $\boldsymbol{\mu} = (0, 1, 0)$, and $\boldsymbol{\mu} = (0, 0, 1)$, so any point inside the triangle is a value of $\boldsymbol{\mu}$. The vertical axis is the probability density (PDF) at that value of $\boldsymbol{\mu}$ on the simplex.
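
To get a feel for points on the simplex, one can draw samples from a Dirichlet distribution. The sketch below uses the standard construction of normalizing independent gamma draws. The function name `sample_dirichlet` and the parameter values are hypothetical choices for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dir(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(1)  # fixed seed so the run is repeatable

# Five samples with K = 3: each one is a point inside the triangle.
samples = [sample_dirichlet([2.0, 2.0, 2.0], rng) for _ in range(5)]
for mu in samples:
    print([round(x, 3) for x in mu], "sum =", round(sum(mu), 6))
```

Each sample has three non-negative components that sum to 1, so it corresponds to a single point of the triangle described above.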

When estimating the parameter $\boldsymbol{\mu}$, the posterior is proportional to the likelihood times the prior, which takes the following form:

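$$p(\boldsymbol{\mu} \mid D, \boldsymbol{\alpha}) \propto p(D \mid \boldsymbol{\mu})\, p(\boldsymbol{\mu} \mid \boldsymbol{\alpha}) \propto \prod_{k=1}^{K} \mu_k^{\alpha_k + m_k - 1}$$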

From this form we can see that the posterior is also a Dirichlet distribution. As with the posterior of the beta distribution, we normalize it to obtain:

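$$p(\boldsymbol{\mu} \mid D, \boldsymbol{\alpha}) = \mathrm{Dir}(\boldsymbol{\mu} \mid \boldsymbol{\alpha} + \mathbf{m}) = \frac{\Gamma(\alpha_0 + N)}{\Gamma(\alpha_1 + m_1) \cdots \Gamma(\alpha_K + m_K)} \prod_{k=1}^{K} \mu_k^{\alpha_k + m_k - 1}$$

where $\mathbf{m} = (m_1, \dots, m_K)^T$ collects the observed counts and $N = \sum_{k=1}^{K} m_k$ is the total number of observations.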
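
In code, the conjugate update amounts to adding the observed counts to the prior parameters. The sketch below is a minimal illustration: the function names, the uniform prior, and the counts (chosen to match the die example) are all assumptions. The posterior mean uses the standard result that the mean of a Dirichlet distribution is each parameter divided by the sum of the parameters, which the text above does not state.

```python
def dirichlet_posterior(alpha, counts):
    """Posterior parameters alpha + m for a Dirichlet prior and multinomial counts."""
    if len(alpha) != len(counts):
        raise ValueError("alpha and counts must have the same length")
    return [a + m for a, m in zip(alpha, counts)]

def dirichlet_mean(alpha):
    """Mean of Dir(alpha): each alpha_k divided by alpha_0 (standard result)."""
    alpha0 = sum(alpha)
    return [a / alpha0 for a in alpha]

prior = [1.0] * 6  # assumed uniform prior over the six faces
counts = [2000, 2000, 2000, 2000, 1000, 1000]  # hypothetical counts from 10000 throws

posterior = dirichlet_posterior(prior, counts)
print(posterior)
print([round(p, 4) for p in dirichlet_mean(posterior)])
```

With this many observations the posterior mean stays very close to the observed frequencies, and the prior has only a small effect.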
