Multinomial distribution
The corresponding multivariate distribution can be obtained by extending the binary (Bernoulli) case to multiple states.
First, extend the Bernoulli distribution to the multivariate case: a discrete variable $x$ can take one of $K$ possible states, so an observation of $x$ is represented as a $K$-dimensional vector satisfying $\sum_{k=1}^{K} x_k = 1$, with exactly one component equal to 1 and the rest equal to 0. The probability mass function of $x$ is:

$$P(x \mid \mu) = \prod_{k=1}^{K} \mu_k^{x_k}$$
Here $\mu$ is also a $K$-dimensional vector, with $p(x_k = 1) = \mu_k$ and $\sum_{k=1}^{K} \mu_k = 1$. The corresponding expectation is

$$E[x \mid \mu] = \sum_x p(x \mid \mu)\, x = (\mu_1, \dots, \mu_K)^T = \mu$$
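To make the one-hot representation concrete, here is a minimal sketch (not from the original post) in Python/NumPy; the parameter values are chosen purely for illustration. It evaluates $P(x \mid \mu) = \prod_k \mu_k^{x_k}$ for a one-hot observation and checks empirically that the mean of such observations approaches $\mu$.

```python
import numpy as np

mu = np.array([0.2, 0.5, 0.3])   # assumed example parameters; must sum to 1
x = np.array([0, 1, 0])          # one-hot observation: state k = 2 was observed

# P(x | mu) = prod_k mu_k^{x_k} just picks out mu_k for the active component
print(np.prod(mu ** x))          # 0.5

# The empirical mean of one-hot draws approaches E[x | mu] = mu
rng = np.random.default_rng(0)
samples = rng.multinomial(1, mu, size=100_000)  # each row is a one-hot vector
print(samples.mean(axis=0))      # close to [0.2, 0.5, 0.3]
```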
If the dataset $D$ consists of $N$ independent observations, the corresponding likelihood function is

$$P(D \mid \mu) = \prod_{n=1}^{N} \prod_{k=1}^{K} \mu_k^{x_{nk}} = \prod_{k=1}^{K} \mu_k^{\sum_n x_{nk}} = \prod_{k=1}^{K} \mu_k^{m_k}$$
Here $m_k = \sum_n x_{nk}$ is the number of the $N$ observations in which state $k$ was observed; these counts are the sufficient statistics of the distribution. To estimate $\mu$ by maximum likelihood under the constraint $\sum_k \mu_k = 1$, use a Lagrange multiplier and maximize
$$\sum_k m_k \ln \mu_k + \lambda \left( \sum_{k=1}^{K} \mu_k - 1 \right)$$
Setting the derivative with respect to $\mu_k$ to zero gives $m_k / \mu_k + \lambda = 0$, i.e. $\mu_k = -m_k / \lambda$.
Substituting into $\sum_{k=1}^{K} \mu_k = 1$ gives $\lambda = -N$, so $\mu_k = m_k / N$.
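As a quick check of the result $\mu_k = m_k / N$, here is a small sketch with synthetic (hypothetical) data: the counts $m_k$ are computed from one-hot observations and the closed-form MLE is compared to the generating parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
true_mu = np.array([0.1, 0.6, 0.3])      # assumed generating parameters
N = 10_000
X = rng.multinomial(1, true_mu, size=N)  # N one-hot observations, shape (N, K)

m = X.sum(axis=0)    # sufficient statistics m_k = sum_n x_{nk}
mu_mle = m / N       # maximum-likelihood estimate from the Lagrangian solution
print(m, mu_mle)     # mu_mle should be close to true_mu
```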
The distribution of $m_1, m_2, \dots, m_K$ is the multinomial distribution, and its PMF is:

$$\mathrm{Mult}(m_1, m_2, \dots, m_K \mid \mu, N) = \binom{N}{m_1}\binom{N-m_1}{m_2}\cdots\binom{m_K}{m_K}\prod_{k=1}^{K} \mu_k^{m_k} = \frac{N!}{m_1!\, m_2! \cdots m_K!}\prod_{k=1}^{K} \mu_k^{m_k}$$
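For reference, `scipy.stats.multinomial` evaluates this PMF directly; the sketch below (with arbitrarily chosen $N$, $\mu$, and counts) compares it against the binomial-coefficient form written above.

```python
import numpy as np
from math import comb
from scipy.stats import multinomial

N = 10
mu = np.array([0.2, 0.5, 0.3])
m = np.array([2, 5, 3])                   # counts must sum to N

print(multinomial.pmf(m, n=N, p=mu))      # Mult(m | mu, N)

# Same value from the chained binomial coefficients C(N, m1) C(N-m1, m2) ...
coef = comb(10, 2) * comb(8, 5) * comb(3, 3)
print(coef * np.prod(mu ** m))
```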
Dirichlet distribution
As with the beta distribution, the Dirichlet distribution is used to describe the prior distribution of $\mu$ in the multivariate case; it is conjugate to the likelihood because it has the same functional form. Its PDF is:

$$\mathrm{Dir}(\mu \mid \alpha) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_K)} \prod_{k=1}^{K} \mu_k^{\alpha_k - 1}$$
where $\alpha_0 = \sum_{k=1}^{K} \alpha_k$. Here $\alpha$ is also a $K$-dimensional vector; it is the hyperparameter that characterizes the Dirichlet distribution.
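A minimal sketch using `scipy.stats.dirichlet`, with hyperparameters chosen only for illustration, shows how $\alpha$ controls the prior over $\mu$:

```python
import numpy as np
from scipy.stats import dirichlet

alpha = np.array([2.0, 3.0, 5.0])   # assumed hyperparameters; alpha_0 = 10
mu = np.array([0.2, 0.3, 0.5])      # a point on the simplex (sums to 1)

print(dirichlet.pdf(mu, alpha))     # Dir(mu | alpha)
print(dirichlet.mean(alpha))        # prior mean: alpha_k / alpha_0
```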
Multiplying the likelihood by the prior gives

$$p(\mu \mid D, \alpha) \propto p(D \mid \mu)\, p(\mu \mid \alpha) \propto \prod_{k=1}^{K} \mu_k^{\alpha_k + m_k - 1}$$
The posterior is therefore still a Dirichlet distribution, so

$$p(\mu \mid D, \alpha) = \mathrm{Dir}(\mu \mid \alpha + m) = \frac{\Gamma(\alpha_0 + N)}{\Gamma(\alpha_1 + m_1)\cdots\Gamma(\alpha_K + m_K)} \prod_{k=1}^{K} \mu_k^{\alpha_k + m_k - 1}$$

where $m = (m_1, \dots, m_K)^T$.
$\alpha_k$ can be interpreted as an effective prior count of observations for which $x_k = 1$.
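The conjugate update itself is a one-liner: add the counts to the prior hyperparameters. The sketch below uses illustrative numbers only.

```python
import numpy as np
from scipy.stats import dirichlet

alpha = np.array([2.0, 3.0, 5.0])   # prior hyperparameters
m = np.array([20, 50, 30])          # observed counts, N = 100

alpha_post = alpha + m              # posterior is Dir(mu | alpha + m)
print(dirichlet.mean(alpha_post))   # posterior mean: (alpha_k + m_k) / (alpha_0 + N)
```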
from:http://bucktoothsir.github.io/blog/2015/11/17/multinomialanddirichlet/