1. MCMC Overview
From the name we can see that MCMC consists of two "MC"s: the Monte Carlo method (Monte Carlo Simulation, abbreviated MC) and the Markov chain (Markov Chain, also abbreviated MC). The Monte Carlo method has been introduced previously; here we cover Markov chains and then the sampling algorithms that combine the two.
2. Markov chain
The concept of Markov chains has come up in many places. Its core idea is that the probability of a state transition at a given moment depends only on the previous state, not on the earlier history.
We can describe this with a mathematical definition. Suppose our sequence of states is $\ldots, X_{t-2}, X_{t-1}, X_t, X_{t+1}, \ldots$. Then the conditional probability of the state at time $t+1$ depends only on the state at time $t$, i.e.:

$$P(X_{t+1} \mid \ldots, X_{t-2}, X_{t-1}, X_t) = P(X_{t+1} \mid X_t)$$
Since the probability of a state transition at each moment depends only on the previous state, the Markov chain model is fully determined once we know the transition probability between any two states in the system. The state transitions are as shown in the figure.

These probabilities are collected in the state transition matrix $P$, whose entry $P(i, j) = P(X_{t+1} = j \mid X_t = i)$ gives the probability of moving from state $i$ to state $j$.
At this point, if we give the chain an initial state distribution and repeatedly apply the state transition matrix, the distribution eventually converges to a stationary one, as stated by the Markov chain convergence theorem.
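This convergence is easy to check numerically. The sketch below uses a made-up 3-state transition matrix (the specific numbers are an illustrative assumption, not from the text) and shows that two very different initial distributions are driven to the same stationary distribution:

```python
import numpy as np

# A toy 3-state transition matrix: row i gives P(next = j | current = i).
# The specific values are an illustrative assumption.
P = np.array([[0.90, 0.075, 0.025],
              [0.15, 0.80,  0.05 ],
              [0.25, 0.25,  0.50 ]])

def propagate(start, P, steps=100):
    """Push an initial distribution through the chain `steps` times."""
    dist = np.asarray(start, dtype=float)
    for _ in range(steps):
        dist = dist @ P
    return dist

a = propagate([1.0, 0.0, 0.0], P)  # start surely in state 0
b = propagate([0.0, 0.0, 1.0], P)  # start surely in state 2

# Both initial states converge to the same stationary distribution.
print(np.allclose(a, b))
```

The stationary distribution depends only on $P$, not on where the chain starts, which is exactly what the convergence theorem asserts.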
Since a Markov chain can converge to a stationary distribution, this suggests an idea: if we can construct a Markov chain whose transition matrix $P$ has stationary distribution exactly $\pi(x)$, then starting from any initial state $x_0$ and moving along the chain gives a sequence $x_0, x_1, x_2, \ldots, x_n, x_{n+1}, \ldots$. If the Markov chain has converged by step $n$, then $x_n, x_{n+1}, \ldots$ are samples from $\pi(x)$ (that is, from the $n$th step onward the states all follow the same stationary distribution, which we arrange to be our target sampling distribution).
It can be seen from the above that whether the Markov chain converges to the desired stationary distribution depends mainly on the state transition matrix, so the key is how to construct a transition matrix whose stationary distribution is the distribution we want. This relies mainly on the detailed balance condition: if $\pi(i)P(i, j) = \pi(j)P(j, i)$ for all $i, j$, then $\pi$ is a stationary distribution of the chain.
3. MCMC sampling and M-H sampling
In MCMC sampling, we first pick an arbitrary state transition matrix $Q$. This matrix generally does not satisfy the detailed balance condition, so we correct it by introducing an acceptance rate $\alpha(i, j) = \pi(j)Q(j, i)$: with this correction, $\pi(i)Q(i, j)\alpha(i, j) = \pi(j)Q(j, i)\alpha(j, i)$, and detailed balance holds.
The specific flow of the MCMC sampling algorithm is as follows.
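The sampling loop can be sketched in plain Python. The toy 4-state target distribution and the uniform proposal $Q$ below are illustrative assumptions; the acceptance rate is the uncorrected $\alpha(i, j) = \pi(j)Q(j, i)$ from the text:

```python
import random

# Toy target distribution pi over 4 states (an illustrative assumption).
pi = [0.1, 0.2, 0.3, 0.4]
n_states = len(pi)

def mcmc_sample(n_iter=200000, burn_in=50000, seed=0):
    rng = random.Random(seed)
    x = 0                     # arbitrary initial state
    counts = [0] * n_states
    for t in range(n_iter):
        j = rng.randrange(n_states)        # proposal Q(i, j) = 1 / n_states
        alpha = pi[j] * (1.0 / n_states)   # acceptance rate alpha(i, j) = pi(j) Q(j, i)
        if rng.random() < alpha:
            x = j                          # accept the transition
        if t >= burn_in:                   # keep samples after convergence
            counts[x] += 1
    total = sum(counts)
    return [c / total for c in counts]

est = mcmc_sample()  # empirical frequencies should approximate pi
```

Note how small the acceptance rate is here (at most $\pi(j)/4 = 0.1$): most proposals are rejected, which is precisely the inefficiency the M-H algorithm addresses next.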
However, MCMC sampling converges too slowly, because the acceptance rate $\alpha(i, j) = \pi(j)Q(j, i)$ is often very small and most proposed transitions are rejected. Improving on this basis leads to the M-H (Metropolis-Hastings) sampling algorithm, which scales the acceptance rate up to $\alpha(i, j) = \min\left(\frac{\pi(j)Q(j, i)}{\pi(i)Q(i, j)},\, 1\right)$ while still preserving detailed balance.
The specific flow of the M-H algorithm is as follows.
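A minimal M-H sketch, under assumptions not in the original: the target is an unnormalized standard normal density, and the proposal is a symmetric random walk, so the ratio $Q(j, i)/Q(i, j)$ cancels in the acceptance rate:

```python
import math
import random

def target_pdf(x):
    """Unnormalized standard normal density (illustrative target)."""
    return math.exp(-0.5 * x * x)

def metropolis_hastings(n_iter=100000, burn_in=10000, step=1.0, seed=0):
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for t in range(n_iter):
        # Symmetric random-walk proposal, so Q(j, i) / Q(i, j) = 1
        # and alpha = min(1, pi(y) / pi(x)).
        y = x + rng.gauss(0.0, step)
        alpha = min(1.0, target_pdf(y) / target_pdf(x))
        if rng.random() < alpha:
            x = y
        if t >= burn_in:
            samples.append(x)
    return samples

s = metropolis_hastings()
mean = sum(s) / len(s)
var = sum((v - mean) ** 2 for v in s) / len(s)
# mean and var should be close to 0 and 1, the moments of the target.
```

Because only the ratio $\pi(y)/\pi(x)$ enters the acceptance rate, the target density never needs to be normalized, which is a large part of M-H's practical appeal.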
The M-H algorithm is also applicable in the high-dimensional case.
Generally speaking, the M-H sampling algorithm is more widely used than plain MCMC sampling, but in the era of big data the M-H algorithm faces two problems:

1) In high dimensions the computation is very heavy and the algorithm is inefficient; moreover, rejected transitions waste work and add further computational cost.

2) Because the feature dimension is large, it is often hard to obtain the joint distribution of the target features, while the conditional probability distributions between the individual features are easy to obtain (which raises the question of whether we can sample knowing only the conditional distributions).
4. Gibbs sampling
Following this idea, the flow of the Gibbs sampling algorithm in the two-dimensional case is as follows.
In the multidimensional case, for an $n$-dimensional probability distribution $\pi(x_1, x_2, \ldots, x_n)$, we obtain a new sample by rotating through the $n$ coordinate axes. When it is axis $i$'s turn, the state transition probability of the Markov chain is $P(x_i \mid x_1, x_2, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$; that is, we fix the other $n-1$ coordinates and move along one axis. The flow of the Gibbs sampling algorithm in the multidimensional case is as follows.
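As a concrete two-dimensional illustration (the target here is an assumption for this sketch): for a standard bivariate normal with correlation $\rho$, the conditionals are known in closed form, $x_1 \mid x_2 \sim N(\rho x_2,\, 1 - \rho^2)$ and symmetrically for $x_2$, so each axis step is a single draw:

```python
import random

def gibbs_bivariate_normal(rho=0.8, n_iter=50000, burn_in=5000, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho
    (an illustrative target whose conditionals are available in closed form)."""
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5
    x1, x2 = 0.0, 0.0
    samples = []
    for t in range(n_iter):
        x1 = rng.gauss(rho * x2, sd)   # sample axis 1 with axis 2 fixed
        x2 = rng.gauss(rho * x1, sd)   # sample axis 2 with axis 1 fixed
        if t >= burn_in:
            samples.append((x1, x2))
    return samples

s = gibbs_bivariate_normal()
# Empirical E[x1 * x2] should recover the target correlation rho = 0.8.
emp_corr = sum(a * b for a, b in s) / len(s)
```

Every proposal is accepted, so nothing is wasted on rejections, and only one-dimensional conditional draws are ever needed, which is exactly why Gibbs sampling answers problem 2) above.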
Because of these advantages of Gibbs sampling with high-dimensional features, the MCMC sampling we usually refer to nowadays is Gibbs sampling. Of course, Gibbs sampling evolved from M-H sampling, and it requires the data to have at least two dimensions; for sampling a one-dimensional probability distribution Gibbs sampling cannot be used, and M-H sampling remains the method of choice.