Multinomial variables
Binary variables describe quantities that can take only two possible values. When a discrete variable can instead take one of K possible states, we can represent it with a K-dimensional vector x in which exactly one component x_k equals 1 and all the others are 0 (the 1-of-K scheme). The parameter μ_k is the probability that x_k = 1, so p(x | μ) = ∏_k μ_k^{x_k}, where μ_k ≥ 0 and Σ_k μ_k = 1. This distribution can be seen as a generalization of the Bernoulli distribution.
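As a minimal sketch (the function name `one_of_k_prob` is my own, not from the article), the probability p(x | μ) = ∏_k μ_k^{x_k} simply picks out the μ_k of the active component:

```python
def one_of_k_prob(x, mu):
    """Probability of a one-hot vector x under parameters mu (sum(mu) == 1)."""
    assert sum(x) == 1 and all(v in (0, 1) for v in x)  # valid 1-of-K vector
    prob = 1.0
    for xk, muk in zip(x, mu):
        prob *= muk ** xk  # only the component with xk = 1 contributes
    return prob

mu = [0.2, 0.5, 0.3]
print(one_of_k_prob([0, 1, 0], mu))  # picks out mu_2 = 0.5
```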
Now consider a data set D = {x1, ..., xN} of N independent observations. The likelihood function is p(D | μ) = ∏_n ∏_k μ_k^{x_nk} = ∏_k μ_k^{m_k}, where the count m_k = Σ_n x_nk records how many observations fall in state k; the likelihood depends on the data only through these K counts.
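Maximizing this likelihood subject to Σ_k μ_k = 1 gives the familiar estimate μ_k = m_k / N. A small sketch (helper name `ml_estimate` is mine, assuming data arrives as one-hot vectors):

```python
def ml_estimate(data):
    """data: list of one-hot vectors; returns the ML estimate mu_k = m_k / N."""
    N = len(data)
    K = len(data[0])
    m = [sum(x[k] for x in data) for k in range(K)]  # sufficient statistics m_k
    return [mk / N for mk in m]

data = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
print(ml_estimate(data))  # [0.25, 0.5, 0.25]
```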
The multinomial distribution
Now consider the joint distribution of the counts m1, ..., mK, conditioned on the parameters μ and the total number of observations N. This is the multinomial distribution: Mult(m1, ..., mK | μ, N) = (N! / (m1! ⋯ mK!)) ∏_k μ_k^{m_k}.
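This pmf can be evaluated directly with the standard library; a sketch, assuming counts m and parameters μ are given as lists (the function name is hypothetical):

```python
import math

def multinomial_pmf(m, mu):
    """Mult(m1..mK | mu, N) = N!/(m1!...mK!) * prod_k mu_k^{m_k}."""
    N = sum(m)
    coef = math.factorial(N)
    for mk in m:
        coef //= math.factorial(mk)  # multinomial coefficient
    p = float(coef)
    for mk, muk in zip(m, mu):
        p *= muk ** mk
    return p

# 4 draws with counts (2, 1, 1): coefficient 12, product 0.5^2 * 0.25 * 0.25
print(multinomial_pmf([2, 1, 1], [0.5, 0.25, 0.25]))  # 0.1875
```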
The Dirichlet distribution
For convenience, we choose a prior distribution with the same functional form as the likelihood, so that the posterior is obtained simply by adding the observed counts to the exponents of the parameters; the functional form does not change, the prior and posterior belong to the same family, and the calculation is greatly simplified. Such a prior is called a conjugate prior. For the multinomial likelihood the conjugate prior is the Dirichlet distribution, Dir(μ | α) ∝ ∏_k μ_k^{α_k − 1}, which yields the posterior Dir(μ | α + m).
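The conjugate update is just elementwise addition of the prior pseudo-counts α and the observed counts m, and the posterior mean follows immediately; a sketch (helper names are mine):

```python
def posterior_params(alpha, m):
    """Conjugate update: Dir(mu | alpha) prior + counts m -> Dir(mu | alpha + m)."""
    return [a + mk for a, mk in zip(alpha, m)]

def posterior_mean(alpha, m):
    """Posterior mean: (alpha_k + m_k) / (alpha_0 + N)."""
    post = posterior_params(alpha, m)
    total = sum(post)
    return [a / total for a in post]

alpha = [1.0, 1.0, 1.0]  # uniform prior over the simplex
m = [2, 1, 1]            # observed counts
print(posterior_params(alpha, m))  # [3.0, 2.0, 2.0]
print(posterior_mean(alpha, m))    # each (alpha_k + m_k) / 7
```

Note that the α_k act like pseudo-counts: prior "observations" that are pooled with the real ones.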
The following figure shows the Dirichlet distribution over three variables, plotted on the simplex; in the left panel {αk} = 0.1, in the middle panel {αk} = 1, and in the right panel {αk} = 10:
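The qualitative effect of α shown in those panels can be reproduced by sampling: a Dirichlet draw can be generated by normalizing independent Gamma(α_k, 1) variates (a standard construction, not from the article). Small α_k pushes samples toward the corners of the simplex; large α_k concentrates them near the center:

```python
import random

def sample_dirichlet(alpha, rng=random):
    """Draw one sample from Dir(alpha) by normalizing Gamma(alpha_k, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

random.seed(0)
for a in (0.1, 1.0, 10.0):  # the three settings from the figure
    print(a, sample_dirichlet([a] * 3))
```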
Maximum a posteriori (MAP) estimation
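The MAP estimate is the mode of the posterior Dir(μ | α + m), which for α_k + m_k > 1 is μ_k = (α_k + m_k − 1) / (α_0 + N − K). A sketch under those assumptions (function name is mine):

```python
def map_estimate(alpha, m):
    """Mode of Dir(mu | alpha + m); assumes every alpha_k + m_k > 1."""
    K = len(alpha)
    denom = sum(alpha) + sum(m) - K  # alpha_0 + N - K
    return [(a + mk - 1) / denom for a, mk in zip(alpha, m)]

alpha = [2.0, 2.0, 2.0]  # smoothing prior (add-one style pseudo-counts)
m = [2, 1, 1]
print(map_estimate(alpha, m))  # [3/7, 2/7, 2/7]
```

With the uniform prior α_k = 1 the MAP estimate reduces to the maximum-likelihood estimate m_k / N, which makes the role of the prior as a smoother easy to see.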
When reprinting, please credit the author Jason Ding and indicate the source:
GitHub home page (http://jasonding1354.github.io/)
CSDN Blog (http://blog.csdn.net/jasonding1354)
Jianshu homepage (http://www.jianshu.com/users/2bd9b48f6ea8/latest_articles)
"Mathematics in machine learning" polynomial distribution and its conjugate distribution