MCMC Algorithm of machine learning

Source: Internet
Author: User

1. MCMC Overview

As the name suggests, MCMC consists of two "MC"s: the Monte Carlo method (Monte Carlo Simulation, abbreviated MC) and the Markov chain (Markov Chain, also abbreviated MC). The Monte Carlo method has already been introduced; what follows covers Markov chains and then the sampling algorithms that combine the two.

2. Markov chain

The concept of Markov chains has been mentioned in many places, and its core idea is that the probability of a state transition at a given moment depends only on its previous state.

We describe it with a mathematical definition. Suppose our state sequence is ..., X_{t-2}, X_{t-1}, X_t, X_{t+1}, .... Then the conditional probability of the state at time t+1 depends only on the state at time t, i.e.:

  P(X_{t+1} | ..., X_{t-2}, X_{t-1}, X_t) = P(X_{t+1} | X_t)

Since the probability of a state transition at any moment depends only on the previous state, the Markov chain model is fully determined as soon as we know the transition probabilities between any two states in the system. An example state transition diagram is shown below.

[Figure: example state transition diagram]

The state transition matrix can be represented as

  P_{ij} = P(X_{t+1} = j | X_t = i)

where entry (i, j) of the matrix P is the probability of moving from state i to state j.

At this point, given an initial state distribution, repeated application of the state transition matrix eventually converges to a stationary distribution, as stated by the Markov chain convergence theorem:

  If an aperiodic, irreducible Markov chain has transition matrix P, then lim_{n→∞} (P^n)_{ij} = π(j), independent of the starting state i, where the stationary distribution π = (π(1), π(2), ...) is the unique non-negative solution of π P = π with Σ_j π(j) = 1.

Since a Markov chain converges to a stationary distribution, this suggests an idea: if we can construct a Markov chain whose transition matrix P has stationary distribution exactly π(x), then starting from any initial state x_0 and moving along the chain gives a sequence x_0, x_1, x_2, ..., x_n, x_{n+1}, .... If the chain has converged by step n, then x_n, x_{n+1}, ... are samples from π(x) (that is, from step n onward every state follows the same stationary distribution, which we arrange to be our target sampling distribution).
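As a quick numerical illustration of this convergence (a sketch using a made-up 3-state transition matrix), repeatedly applying P drives any two initial distributions to the same stationary π:

```python
import numpy as np

# A hypothetical 3-state transition matrix (rows sum to 1); any
# aperiodic, irreducible chain would do -- the numbers are made up.
P = np.array([[0.90, 0.075, 0.025],
              [0.15, 0.80,  0.05],
              [0.25, 0.25,  0.50]])

# Start from two very different initial distributions.
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 0.0, 1.0])

# Repeatedly apply the transition matrix; both converge to the same pi.
for _ in range(100):
    a = a @ P
    b = b @ P

print(a)                      # the stationary distribution
print(np.allclose(a, b))      # True: the limit is independent of the start
print(np.allclose(a, a @ P))  # True: pi P = pi, so pi is stationary
```

For this particular matrix the limit works out to π = (0.625, 0.3125, 0.0625), which indeed satisfies π P = π.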

It can be seen from the above that the convergence of a Markov chain to a stationary distribution depends mainly on the state transition matrix, so the key question is how to construct a state transition matrix whose stationary distribution is the distribution we want. This relies mainly on the detailed balance condition:

  π(i) P(i, j) = π(j) P(j, i)   for all i, j

If this holds, then π is a stationary distribution of P, since Σ_i π(i) P(i, j) = Σ_i π(j) P(j, i) = π(j).
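The following sketch (a hypothetical 4-state chain built with the Metropolis construction over a symmetric proposal) verifies numerically that detailed balance holds and that it implies stationarity:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random target distribution pi over 4 states (values are arbitrary).
n = 4
pi = rng.random(n)
pi /= pi.sum()

# Symmetric proposal: move to any other state uniformly.
Q = np.full((n, n), 1.0 / (n - 1))
np.fill_diagonal(Q, 0.0)

# Metropolis-style acceptance gives P(i, j) = Q(i, j) * min(1, pi[j]/pi[i]).
P = Q * np.minimum(1.0, pi[None, :] / pi[:, None])
np.fill_diagonal(P, 1.0 - P.sum(axis=1))  # rejected mass stays put

# Detailed balance: pi(i) P(i, j) == pi(j) P(j, i) for all i, j.
flow = pi[:, None] * P
print(np.allclose(flow, flow.T))  # True

# Detailed balance implies stationarity: pi P = pi.
print(np.allclose(pi @ P, pi))    # True
```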

3. MCMC sampling and M-H sampling

In MCMC sampling, a state transition matrix Q is first chosen at random. However, this matrix generally does not satisfy the detailed balance condition, so an acceptance probability is introduced to correct it, as follows:

  π(i) Q(i, j) α(i, j) = π(j) Q(j, i) α(j, i)

which holds if we take the acceptance probability to be

  α(i, j) = π(j) Q(j, i)

so that the corrected transition matrix P(i, j) = Q(i, j) α(i, j) satisfies detailed balance with respect to π.

The specific flow of the MCMC sampling algorithm is as follows

1) Choose a proposal transition matrix Q, a burn-in length n_1, and a desired sample count n_2; initialize the state x_0.
2) For t = 0, 1, ..., n_1 + n_2 - 1:
   a) Sample a candidate x* from Q(x_t, ·) and a number u from the uniform distribution U[0, 1].
   b) If u < α(x_t, x*) = π(x*) Q(x*, x_t), accept the move and set x_{t+1} = x*; otherwise set x_{t+1} = x_t.
3) The states (x_{n_1}, ..., x_{n_1 + n_2 - 1}) are the output samples.

However, MCMC sampling as described converges too slowly: the acceptance probability α(i, j) = π(j) Q(j, i) is usually very small, so most proposals are rejected. Improving on this leads to the Metropolis-Hastings (M-H) sampling algorithm, with the enlarged acceptance probability:

  α(i, j) = min{ π(j) Q(j, i) / (π(i) Q(i, j)), 1 }

Scaling the pair α(i, j), α(j, i) by a common factor preserves detailed balance, so this larger acceptance probability still targets π while rejecting far fewer proposals.
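A quick numeric sketch of why the M-H rate helps (toy two-state numbers, symmetric proposal assumed): the basic rate π(j)Q(j, i) is tiny, while the M-H rule rescales the pair so the larger one reaches 1, yet both choices preserve detailed balance:

```python
# Toy comparison of the basic MCMC acceptance rate with the M-H rate;
# the pi and Q values below are made up purely for illustration.
pi_i, pi_j = 0.3, 0.6          # target probabilities at states i, j
q_ij = q_ji = 0.1              # symmetric proposal probabilities

# Basic construction: alpha(i, j) = pi(j) * Q(j, i) -- often tiny.
alpha_basic_ij = pi_j * q_ji   # about 0.06
alpha_basic_ji = pi_i * q_ij   # about 0.03

# M-H construction: alpha(i, j) = min(1, pi(j)Q(j,i) / (pi(i)Q(i,j))).
alpha_mh_ij = min(1.0, (pi_j * q_ji) / (pi_i * q_ij))  # capped at 1.0
alpha_mh_ji = min(1.0, (pi_i * q_ij) / (pi_j * q_ji))  # about 0.5

# Both choices satisfy pi(i)Q(i,j)alpha(i,j) == pi(j)Q(j,i)alpha(j,i),
# but the M-H rates are an order of magnitude larger.
assert abs(pi_i * q_ij * alpha_basic_ij - pi_j * q_ji * alpha_basic_ji) < 1e-12
assert abs(pi_i * q_ij * alpha_mh_ij - pi_j * q_ji * alpha_mh_ji) < 1e-12
print(alpha_basic_ij, alpha_mh_ij)
```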

The specific flow of the M-H algorithm is as follows

1) Choose a proposal transition matrix Q, a burn-in length n_1, and a desired sample count n_2; initialize the state x_0.
2) For t = 0, 1, ..., n_1 + n_2 - 1:
   a) Sample a candidate x* from Q(x_t, ·) and a number u from U[0, 1].
   b) If u < α(x_t, x*) = min{ π(x*) Q(x*, x_t) / (π(x_t) Q(x_t, x*)), 1 }, set x_{t+1} = x*; otherwise set x_{t+1} = x_t.
3) The states (x_{n_1}, ..., x_{n_1 + n_2 - 1}) are the output samples.
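The steps above can be sketched as a random-walk M-H sampler (a minimal sketch, not the article's own code): the Gaussian proposal is symmetric, so the acceptance ratio reduces to π(x*)/π(x_t), and the target only needs to be known up to a normalizing constant:

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_burn, n_samples, step=1.0, seed=0):
    """Random-walk M-H: symmetric Gaussian proposal, so the acceptance
    ratio is pi(x*) / pi(x_t), computed here in log space."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_samples)
    for t in range(n_burn + n_samples):
        x_star = x + step * rng.normal()        # propose from Q(x, .)
        # accept with probability min(1, pi(x*) / pi(x))
        if np.log(rng.random()) < log_target(x_star) - log_target(x):
            x = x_star
        if t >= n_burn:                         # keep post-burn-in states
            samples[t - n_burn] = x
    return samples

# Target: standard normal, known only up to a constant.
log_pi = lambda x: -0.5 * x * x

samples = metropolis_hastings(log_pi, x0=5.0, n_burn=2000, n_samples=20000)
print(samples.mean(), samples.std())  # close to 0 and 1
```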

The M-H algorithm also applies in high-dimensional settings: the candidate state and acceptance probability are simply defined over vectors rather than scalars.


Generally speaking, the M-H sampling algorithm is more widely used than the basic MCMC acceptance sampling above, but in the era of big data the M-H algorithm faces two problems:

1) In high dimensions the per-step computation is very large and the algorithm is inefficient; moreover, rejected transitions waste that computation, increasing the cost further.

2) Because the feature dimension is large, it is often hard to obtain the joint distribution of the target features, whereas the conditional probability distributions between the features are easy to obtain (which raises the question of whether we can sample using only the conditional distributions).

4. Gibbs sampling

Consider a two-dimensional target distribution π(x, y), and two points A(x_1, y_1) and B(x_1, y_2) that share the same first coordinate. Then

  π(x_1, y_1) π(y_2 | x_1) = π(x_1) π(y_1 | x_1) π(y_2 | x_1)
  π(x_1, y_2) π(y_1 | x_1) = π(x_1) π(y_2 | x_1) π(y_1 | x_1)

The right-hand sides are equal, so π(x_1, y_1) π(y_2 | x_1) = π(x_1, y_2) π(y_1 | x_1): detailed balance holds automatically along the line x = x_1 if we use the conditional distribution π(y | x_1) as the transition probability. The same argument applies along any line y = y_1 with π(x | y_1).

As a result, in the two-dimensional case the flow of the Gibbs sampling algorithm is as follows

1) Choose a burn-in length n_1 and a desired sample count n_2; initialize x_0, y_0.
2) For t = 0, 1, ..., n_1 + n_2 - 1:
   a) Sample y_{t+1} from the conditional distribution P(y | x_t).
   b) Sample x_{t+1} from the conditional distribution P(x | y_{t+1}).
3) The pairs {(x_{n_1}, y_{n_1}), ..., (x_{n_1 + n_2 - 1}, y_{n_1 + n_2 - 1})} are the output samples.
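As a concrete sketch of these steps (the standard bivariate-normal example, chosen here because both conditionals are known in closed form):

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_burn, n_samples, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Both full conditionals are available in closed form:
        x | y ~ N(rho * y, 1 - rho^2),  y | x ~ N(rho * x, 1 - rho^2)."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0
    out = np.empty((n_samples, 2))
    for t in range(n_burn + n_samples):
        y = rng.normal(rho * x, sd)   # sample y from P(y | x)
        x = rng.normal(rho * y, sd)   # sample x from P(x | y)
        if t >= n_burn:
            out[t - n_burn] = (x, y)
    return out

samples = gibbs_bivariate_normal(rho=0.8, n_burn=1000, n_samples=20000)
print(np.corrcoef(samples.T)[0, 1])  # close to 0.8
```

Note that every proposal is accepted: with the full conditional as the proposal, the M-H acceptance ratio is always 1, which is what makes Gibbs sampling efficient.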

In the multidimensional case, for an n-dimensional probability distribution π(x_1, x_2, ..., x_n), we can obtain a new sample by sampling along the n coordinate axes in turn. When it is the turn of axis x_i, the state transition probability of the Markov chain is P(x_i | x_1, x_2, ..., x_{i-1}, x_{i+1}, ..., x_n); that is, the other n-1 coordinates are held fixed while we move along one axis. In the multidimensional case the flow of the Gibbs sampling algorithm is as follows

1) Choose a burn-in length n_1 and a desired sample count n_2; initialize (x_1^(0), ..., x_n^(0)).
2) For t = 0, 1, ..., n_1 + n_2 - 1, sample each coordinate in turn from its full conditional:
   x_1^(t+1) ~ P(x_1 | x_2^(t), x_3^(t), ..., x_n^(t))
   x_2^(t+1) ~ P(x_2 | x_1^(t+1), x_3^(t), ..., x_n^(t))
   ...
   x_n^(t+1) ~ P(x_n | x_1^(t+1), x_2^(t+1), ..., x_{n-1}^(t+1))
3) The vectors from step n_1 onward are the output samples.

Because of these advantages in high-dimensional settings, the MCMC sampling usually meant today is Gibbs sampling. Gibbs sampling evolved from M-H sampling, but it requires data of at least two dimensions; for sampling a one-dimensional probability distribution, Gibbs sampling does not apply and M-H sampling remains the method to use.

