Random sampling methods collected and explained (MCMC, Gibbs sampling, etc.)


This article is a summary and reworking of a number of materials on sampling, kept for my own later reference. Honestly, the referenced materials are better than what I write here, so do read them; good things deserve to be shared! Chapter 11 of PRML also covers sampling; when I have time I will write it up in my PRML notes. :)

Background

Random simulation is also called Monte Carlo simulation. The approach was developed in the 1940s and is closely tied to the Manhattan Project, which built the atomic bomb: several heavyweights, including Ulam, von Neumann, Fermi, Feynman, and Nicholas Metropolis, were studying the neutron chain reaction in fissile material at the Los Alamos laboratory, began to use statistical simulation methods, and implemented them on the earliest computers. [3]

An important problem in stochastic simulation is: given a probability distribution p(x), how do we generate samples from it on a computer? Samples from the uniform distribution Uniform(0,1) are relatively easy to generate: pseudo-random numbers can be produced with a linear congruential generator. After such a deterministic algorithm generates a sequence of pseudo-random numbers in [0,1], the sequence's various statistical indicators are very close to the theoretical values for Uniform(0,1). Such pseudo-random sequences have good statistical properties and can be used as if they were genuine random numbers.

The topics summarized here are:

1. Monte Carlo numerical integration

2. Uniform distribution and the Box-Muller transform

3. The Monte Carlo principle

4. Acceptance-rejection sampling

5. Importance sampling

6. Markov chains and the stationary distribution

7. MCMC: the Metropolis-Hastings algorithm

8. MCMC: the Gibbs sampling algorithm

1. Monte Carlo Numerical Integration

Suppose we want to compute the integral of f(x), such as ∫ f(x) dx over some interval [a, b].

When the form of f(x) is too complex for the integral to be computed analytically, an approximate result can be obtained numerically. The common method is Monte Carlo integration: rewrite the integral as ∫ (f(x)/q(x)) · q(x) dx.

Treat q(x) as a probability distribution of x on the interval, and the fraction f(x)/q(x) in front of it as a function; then draw N samples x_1, …, x_N from q(x). When N is large enough, the integral can be approximated by the sample mean: ∫ f(x) dx ≈ (1/N) Σ_i f(x_i)/q(x_i).

So all that is needed is a q(x) from which samples are easy to draw. The core of the stochastic simulation method is how to obtain samples from a given probability distribution, i.e., sampling. The common sampling methods are introduced below.
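As a concrete illustration, here is a minimal Python sketch of the estimator above under the simplest choice q = Uniform(a, b); the integrand f is an arbitrary example, not one from the original text.

```python
import numpy as np

# Minimal sketch of Monte Carlo integration with q = Uniform(a, b).
# The integrand f below is just an illustrative choice.

def f(x):
    return np.exp(-x ** 2)  # no simple closed-form antiderivative

def mc_integrate(f, a, b, n=100_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(a, b, size=n)   # x_i ~ q(x) = Uniform(a, b)
    q = 1.0 / (b - a)               # the density q(x) on [a, b]
    return np.mean(f(x) / q)        # (1/N) * sum_i f(x_i) / q(x_i)

print(mc_integrate(f, 0.0, 1.0))    # ~0.7468 for this integrand
```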

2. Uniform Distribution and the Box-Muller Transform

A sequence of pseudo-random numbers in [0,1] generated by a computer can be viewed as samples from a uniform distribution. There are many methods for generating random numbers; one of the simplest is the linear congruential generator, x_{n+1} = (a · x_n + c) mod m.

Of course, random numbers generated by a computer are pseudo-random, but in general they are good enough.
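Here is a minimal sketch of such a generator; the constants a, c, m are the well-known Numerical Recipes values, used purely for illustration.

```python
# Minimal sketch of a linear congruential generator (LCG).
# The constants a, c, m are the "Numerical Recipes" values,
# chosen here purely for illustration.

def lcg(seed, a=1664525, c=1013904223, m=2 ** 32):
    x = seed
    while True:
        x = (a * x + c) % m   # x_{n+1} = (a * x_n + c) mod m
        yield x / m           # scale into [0, 1)

gen = lcg(seed=2024)
print([round(next(gen), 4) for _ in range(5)])
```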

[Box-Muller transform] If the random variables U1 and U2 are independent with U1, U2 ~ Uniform[0,1], let Z0 = √(−2 ln U1) · cos(2π U2) and Z1 = √(−2 ln U1) · sin(2π U2).

Then Z0 and Z1 are independent and each follows the standard normal distribution.
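A minimal sketch of the transform in Python, checking the empirical mean and standard deviation:

```python
import numpy as np

# Minimal sketch of the Box-Muller transform: two independent
# Uniform[0,1] samples become two independent N(0, 1) samples.

def box_muller(n, seed=0):
    rng = np.random.default_rng(seed)
    u1 = rng.uniform(size=n)
    u2 = rng.uniform(size=n)
    r = np.sqrt(-2.0 * np.log(u1))       # shared radius term
    z0 = r * np.cos(2.0 * np.pi * u2)
    z1 = r * np.sin(2.0 * np.pi * u2)
    return z0, z1

z0, _ = box_muller(100_000)
print(z0.mean(), z0.std())  # ~0 and ~1, matching N(0, 1)
```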

3. Monte Carlo principle

The focus of Monte Carlo sampling is computing the expected value of a random variable: let X be a random variable following the probability distribution p(x). To compute the expectation of f(X), we just keep drawing samples x_i from p(x), and then approximate E[f(X)] by averaging the values f(x_i): E[f(X)] ≈ (1/N) Σ_i f(x_i).
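A minimal sketch, with the illustrative choices p = N(0, 1) and f(x) = x²:

```python
import numpy as np

# Minimal sketch of the Monte Carlo principle: approximate E[f(X)] by
# averaging f over samples from p. Illustrative choice: p = N(0, 1)
# and f(x) = x**2, whose true expectation is 1.

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)   # x_i ~ p(x)
print(np.mean(x ** 2))         # (1/N) * sum_i f(x_i)  ->  ~1.0
```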

4. Acceptance-Rejection Sampling [2]

In many practical problems, p(x) is difficult to sample from directly, so other means are needed. Since p(x) is too complex to sample in a program, we instead sample from a distribution q(x) that is easy to sample, such as a Gaussian, and then reject some of the samples according to a certain rule, so that the accepted samples approximate the distribution p(x). Here q(x) is called the proposal distribution.

To do this, choose a convenient sampling function q(x) and a constant k such that p(x) always lies below k·q(x). Then:

    • x-axis direction: sample a from the distribution q(x). (If q is Gaussian, the Box-Muller method described earlier can be used.)
    • y-axis direction: sample u from the uniform distribution Uniform(0, k·q(a)).
    • If the point falls into the grey (rejection) region, i.e., u > p(a), reject it; otherwise accept the sample a.
    • Repeat the above process. (A code sketch follows this list.)
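A minimal Python sketch of this procedure; the target p (a two-component Gaussian mixture), the proposal q = N(0, 2), and the bound k = 3 are illustrative choices, not from the original text.

```python
import numpy as np

# Minimal sketch of acceptance-rejection sampling. The target p (a
# Gaussian mixture) and the bound k = 3 are illustrative choices;
# k is set so that k * q(x) >= p(x) everywhere.

def norm_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def p(x):
    return 0.5 * norm_pdf(x, -2, 1) + 0.5 * norm_pdf(x, 2, 1)

def q_pdf(x):
    return norm_pdf(x, 0, 2)   # proposal q = N(0, 2)

def rejection_sample(n, k=3.0, seed=0):
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < n:
        a = rng.normal(0, 2)                 # x-direction: a ~ q
        u = rng.uniform(0, k * q_pdf(a))     # y-direction: u ~ U(0, k*q(a))
        if u <= p(a):                        # below p(a): accept
            out.append(a)
    return np.array(out)

print(rejection_sample(5))
```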

In high dimensions, rejection sampling runs into two problems: first, a suitable proposal distribution q is hard to find; second, a reasonable value of k is hard to determine. Both problems lead to a high rejection rate and a lot of wasted computation.

5. Importance Sampling [2]

Importance sampling likewise uses an easy-to-sample proposal distribution q to attack the problem, working directly from the formula: E[f] = ∫ f(z) p(z) dz = ∫ f(z) (p(z)/q(z)) q(z) dz ≈ (1/N) Σ_i f(z_i) · p(z_i)/q(z_i), where the z_i are drawn from q(z).

Here p(z)/q(z) can be regarded as the importance weight. Looking at the formula above, p and f are fixed; what remains is to choose q. What kind of distribution makes the sampling most effective? Intuitively, the smaller the variance of the samples, the faster the estimate converges to the expectation. For example, if one sample is 0 and another is 1000, the average is 500, but the estimate is very unreliable; if one sample is 499 and another is 501, and you claim the expectation is 500, the credibility is much higher. So in the formula above, we want the variance of p(z)·f(z)/q(z) to be as small as possible; wherever |p(z)·f(z)| is large, the proposal distribution q(z) should also be large. To give a slightly extreme example:

The first figure shows the distribution p; in the second figure, f = 1 in the shaded region and f = 0 elsewhere. A good q should then place high probability mass on the region indicated by the left arrow, since sampling in the other regions is effectively wasted. This shows that importance sampling can even be more effective than sampling from the original distribution p.

Unfortunately, finding a suitable q in a high-dimensional space is very difficult. Even with the advent of adaptive importance sampling and sampling-importance-resampling (SIR), finding a proposal distribution that is both easy to sample from and a good approximation of p is often close to impossible.
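A minimal Python sketch of importance sampling; all distributions here are illustrative choices, not from the original text.

```python
import numpy as np

# Minimal sketch of importance sampling. Target p = N(0, 1) with
# f(z) = z**4 (true E_p[f] = 3); proposal q = N(0, 2). Each sample
# carries the importance weight p(z)/q(z).

def norm_pdf(z, mu, sigma):
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
z = rng.normal(0.0, 2.0, size=200_000)       # z_i ~ q(z)
w = norm_pdf(z, 0, 1) / norm_pdf(z, 0, 2)    # importance weights p/q
print(np.mean(w * z ** 4))                   # ~3.0 = E_p[z^4]
```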

6. Markov Chains and the Stationary Distribution

Before discussing the Markov chain Monte Carlo method, we must first talk about Markov chains. The mathematical definition of a Markov chain is: P(X_{t+1} = x | X_t, X_{t−1}, …) = P(X_{t+1} = x | X_t).

That is, the next state depends only on the current state and is independent of all earlier states. A Markov chain describes transitions over a state space, where the next state is determined only by the current one (one can relate this to how a web crawler works: it follows a hyperlink on the current page to reach the next page). For example:

For example, if the current state distribution is u(x) = (0.5, 0.2, 0.3), then the distribution at the next step is u(x)T = (0.18, 0.64, 0.18). Iterating this transition matrix, the system eventually approaches the stable state (0.22, 0.41, 0.37) (only two significant digits are kept here). And it turns out that no matter which state you start from, after running the Markov chain long enough it converges to this same distribution. [2]

For another example, sociologists often divide people into three classes according to economic status: the lower class, the middle class, and the upper class; we take these three classes as the states of a Markov chain. Sociologists have found that the most important factor determining a person's income class is the income class of their parents. If a person's income belongs to the lower class, the probability that their child belongs to the lower class is 0.65, the probability of the middle class is 0.28, and the probability of the upper class is 0.07. In general, the transition probabilities of income class from parent to child are as follows.

Written in matrix form, the transition probability matrix is denoted P.

We find that from the 7th generation onward, the distribution is stable; in fact, in this problem, starting from any initial probability distribution, the chain converges to this same stable result.
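A minimal sketch of this convergence. The first row of P is the lower-class transition probabilities quoted above; the other two rows follow the example in [3] and should be treated as illustrative here.

```python
import numpy as np

# Minimal sketch of Markov chain convergence. Row 1 of P holds the
# lower-class probabilities quoted in the text; rows 2 and 3 follow
# the example in [3] and are illustrative here.
P = np.array([[0.65, 0.28, 0.07],
              [0.15, 0.67, 0.18],
              [0.12, 0.36, 0.52]])

u = np.array([1.0, 0.0, 0.0])       # any initial distribution works
for generation in range(1, 11):
    u = u @ P                       # u_{t+1} = u_t P
    print(generation, np.round(u, 3))
# The printed rows settle to the same stationary vector regardless of
# the starting distribution u.
```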

Note: this requires the state graph to be connected (no isolated points), and that there be no connected subgraph with no edges leading out (like a black hole).

This convergence theorem for Markov chains is very important; all MCMC (Markov Chain Monte Carlo) methods are based on it.

For a given probability distribution p(x), we want a convenient way to generate its samples. Since a Markov chain can converge to a stationary distribution, a very beautiful idea is: if we can construct a Markov chain whose transition matrix has stationary distribution exactly p(x), then starting from any initial state x0 and transitioning along the chain, we obtain a sequence x0, x1, x2, …, xn, xn+1, …; if the chain has converged by step n, then xn, xn+1, … are samples from π(x) = p(x).

This brilliant idea goes back to Metropolis in 1953. To study the equilibrium properties of particle systems, Metropolis considered the sampling problem for the Boltzmann distribution, which is common in physics, first proposed the Markov-chain-based Monte Carlo method, namely the Metropolis algorithm, and implemented it on the earliest computers. The Metropolis algorithm is the first general-purpose sampling method, and it inspired a whole series of MCMC methods, so it is regarded as the starting point of the takeoff of random simulation techniques. Metropolis's paper has been counted among the major breakthroughs in statistics, and the Metropolis algorithm was selected as one of the 10 most important algorithms of the 20th century.

7. MCMC: The Metropolis-Hastings Algorithm

The MCMC algorithm introduced next is an improved variant of the Metropolis algorithm: the commonly used Metropolis-Hastings algorithm. From the example and theorem of the previous section, the convergence behavior of a Markov chain is determined mainly by its transition matrix P, so the key problem in Markov-chain-based sampling is how to construct a transition matrix P whose stationary distribution is exactly the distribution p(x) we want. How can this be done? We mainly rely on the detailed balance condition: if π(i) P(i, j) = π(j) P(j, i) for all i, j, then π is the stationary distribution of the chain.

Markov chain transitions and the acceptance probability

Assume we already have a transition matrix Q (with elements q(i, j)). Organizing the above process yields the following algorithm for sampling from the probability distribution p(x): from the current state i, propose a candidate state j with probability q(i, j), then accept the move with probability α(i, j) = min{ (p(j) q(j, i)) / (p(i) q(i, j)), 1 }; otherwise stay at i.
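A minimal Python sketch of Metropolis-Hastings with a symmetric Gaussian random-walk proposal (so q(i, j) = q(j, i) and the acceptance ratio reduces to p(y)/p(x)); the unnormalized bimodal target is an illustrative choice.

```python
import numpy as np

# Minimal sketch of Metropolis-Hastings with a symmetric random-walk
# proposal, so the acceptance ratio reduces to p(y)/p(x). The
# unnormalized bimodal target p is an illustrative choice.

def p(x):
    return np.exp(-0.5 * (x - 3) ** 2) + np.exp(-0.5 * (x + 3) ** 2)

def metropolis_hastings(n, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = 0.0
    samples = np.empty(n)
    for i in range(n):
        y = x + rng.normal(0.0, step)     # propose y ~ q(x, .)
        alpha = min(p(y) / p(x), 1.0)     # acceptance probability
        if rng.uniform() < alpha:
            x = y                         # accept; otherwise keep x
        samples[i] = x
    return samples

s = metropolis_hastings(50_000)
print(s.mean())   # ~0, since the two modes are symmetric about 0
```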

8. MCMC: The Gibbs Sampling Algorithm

Constructing the transition matrix of a Markov chain in the plane

The above algorithm converges to yield samples from the probability distribution p(x1, x2, …, xn). Of course these samples are not independent, but what is required here is only that the samples follow the given distribution, not that they be independent. Likewise, the coordinate-axis rotation in the above algorithm is not necessary: randomness can be introduced into the choice of axis, in which case the transition probability between any two points in the transition matrix Q includes the probability of selecting the axis. In the usual Gibbs sampling algorithm, however, the axis rotation is deterministic; that is, at a given time t, the probability of transitioning along one fixed axis is 1.
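A minimal sketch of Gibbs sampling for an illustrative two-dimensional target, a standard bivariate normal with correlation rho, whose full conditionals are known in closed form.

```python
import numpy as np

# Minimal sketch of Gibbs sampling. The illustrative target is a
# bivariate normal with correlation rho; its full conditionals are
# x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2 | x1.

def gibbs(n, rho=0.8, seed=0):
    rng = np.random.default_rng(seed)
    x1 = x2 = 0.0
    sd = np.sqrt(1.0 - rho ** 2)
    samples = np.empty((n, 2))
    for i in range(n):
        x1 = rng.normal(rho * x2, sd)   # draw x1 from p(x1 | x2)
        x2 = rng.normal(rho * x1, sd)   # draw x2 from p(x2 | x1)
        samples[i] = (x1, x2)
    return samples

s = gibbs(50_000)
print(np.corrcoef(s[:, 0], s[:, 1])[0, 1])   # ~0.8, the target rho
```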

Resources

[1] http://blog.csdn.net/xianlingmao/article/details/7768833

[2] http://www.cnblogs.com/daniel-D/p/3388724.html

[3] http://cos.name/2013/01/lda-math-mcmc-and-gibbs-sampling/

[4] An Introduction to MCMC for Machine Learning, 2003

[5] Introduction to Monte Carlo Methods
