The Basic Idea of Random Simulation and Common Sampling Methods


In practice we often encounter problems that cannot be solved exactly by analytical means, for example an integral or equation whose form does not admit a closed-form solution.

In such cases, people resort to methods that yield an approximate solution (the closer to the exact solution, the better). An approximation algorithm is more valuable if its closeness to the exact solution can be measured by a formula or by upper and lower bounds, because then the user knows how good the answer is; indeed, after an approximation algorithm is proposed, people usually try to evaluate or characterize its approximation quality.

Random simulation, the topic of this article, is one such family of approximate methods. It is a powerful idea: its origins can be traced back to the needle problem of the French mathematician Buffon in the 18th century (a simulation-based way of estimating \pi), but its first truly large-scale application came during the Second World War, when it was used to attack the hard computational problems the United States faced in developing the atomic bomb; this is when the Monte Carlo method was put forward.

This article is organized in two parts. First we discuss the basic idea of random simulation; then we turn to its core, sampling from a distribution, and introduce the common sampling methods: 1) accept-reject sampling; 2) importance sampling; 3) MCMC (Markov Chain Monte Carlo). For MCMC, this article mainly introduces two well-known sampling algorithms: the Metropolis-Hastings algorithm and its special case, the Gibbs sampling algorithm.

I. Basic Idea of Random Simulation

Let us first look at an example. Suppose we have a rectangular region R whose area is known, and inside it an irregular region M whose area cannot be computed directly by a formula. How do we find the area of M? There are many approaches. One is to divide the irregular region M into many small regular regions and approximate the area of M by their sum. Another approximate method is sampling: grab a handful of soybeans and scatter them uniformly over the rectangular region. If the total number of soybeans is S and the number that fall inside the irregular region M is S1, then the area of M can be estimated as M = S1 * R / S.
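Here is a minimal sketch of the soybean idea in Python, assuming for illustration that the irregular region M is a unit disk sitting inside a 2 x 2 square (these concrete shapes are not from the original example, just a convenient stand-in):

```python
import random

def estimate_area(total_beans=100_000):
    """Estimate the area of an irregular region M inside a known rectangle R
    by scattering points uniformly and counting how many land inside M."""
    rect_area = 4.0                        # R: the square [-1, 1] x [-1, 1]
    inside = 0                             # S1: beans that fall inside M
    for _ in range(total_beans):           # S: total number of beans
        x = random.uniform(-1.0, 1.0)
        y = random.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:           # membership test for M (here a unit disk)
            inside += 1
    return inside / total_beans * rect_area   # M is about S1 * R / S

print(estimate_area())                     # close to pi (about 3.14159) for the unit disk
```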

As another example, in machine learning and statistical computing we often encounter the following problem: how to compute a definite integral \int_a^b f(x) dx, for instance a normalization factor.

How can such problems be solved? If the integral can be evaluated analytically, we simply do so; if not, we can only settle for an approximate solution. The common approximate method is Monte Carlo integration, that is, rewriting the integral as:

\int_a^b f(x) dx = \int_a^b [f(x) / g(x)] * g(x) dx

Here f(x)/g(x) is treated as a function, while g(x) is treated as a probability density on [a, b]. After drawing n samples x_1, ..., x_n from g(x), the integral above can be approximated by (1/n) * \sum_{i=1}^{n} f(x_i)/g(x_i). As n tends to infinity, by the law of large numbers this sum converges to the desired definite integral, so an approximate solution can be obtained by sampling.
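A minimal sketch of this estimate, taking g(x) to be the uniform density on [a, b]; the particular integrand and interval below are illustrative choices, not part of the original text:

```python
import math
import random

def mc_integral(f, a, b, n=100_000):
    """Estimate the integral of f over [a, b] by drawing x_i from g = Uniform(a, b),
    for which f(x_i)/g(x_i) = f(x_i) * (b - a)."""
    total = 0.0
    for _ in range(n):
        x = random.uniform(a, b)
        total += f(x) * (b - a)            # f(x)/g(x) with g(x) = 1/(b - a)
    return total / n

# Example: the integral of exp(-x^2) over [0, 1] is about 0.7468
print(mc_integral(lambda x: math.exp(-x * x), 0.0, 1.0))
```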

Through these two examples we can roughly grasp the basic idea of solving problems by sampling: transform the problem to be solved into one that can be solved by some sampling scheme. How to do the transformation is a creative step, and there is no fixed recipe. The core of the random simulation method is therefore how to draw samples from a given probability distribution (sampling). The next section introduces the common sampling methods.


II. Common Sampling Methods

2.0 Direct Sampling

Direct sampling is relatively simple and can only handle very simple cases, typically one-dimensional distributions; a common direct method is inverse transform sampling, which inverts the cumulative distribution function.
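For instance, a minimal sketch of inverse transform sampling for the one-dimensional exponential distribution, whose CDF can be inverted in closed form (the exponential case is an illustrative choice):

```python
import math
import random

def sample_exponential(lam):
    """Direct sampling of Exp(lam): invert the CDF F(x) = 1 - exp(-lam * x)."""
    u = random.random()                    # u ~ U(0, 1)
    return -math.log(1.0 - u) / lam        # x = F^{-1}(u)

samples = [sample_exponential(2.0) for _ in range(10_000)]
print(sum(samples) / len(samples))         # close to 1/lam = 0.5
```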

2.1 accept-reject sampling (acceptance-rejection sampling)

This is also called "rejection sampling", and it is intuitive to understand: to obtain samples from a target distribution, we first generate many candidate samples by some mechanism, then keep some of them as valid samples (i.e., samples from the distribution we want) and discard the rest as invalid. The basic idea of the algorithm is: we want to sample from a distribution f(x), but sampling from it directly is difficult; so we draw samples from another, easy-to-sample distribution g(x) and discard some of them according to a certain rule, so that the remaining samples follow the desired distribution f(x).

The method has several requirements: 1) for every x, f(x) <= M * g(x) for some constant M; 2) g(x) is easy to sample from; 3) ideally, g(x) is close to f(x) in shape. The specific sampling procedure is as follows (a code sketch follows the steps):

1. Sample from g(x) to obtain a candidate sample x_i, x_i ~ g(x);

2. Draw a sample from the uniform distribution, u_i ~ U(0, 1);

3. If u_i <= f(x_i) / [M * g(x_i)], accept x_i as a valid sample; otherwise discard it. (This step is exactly what the name accept-reject refers to.)

4. Repeat steps 1-3 until the required number of samples has been obtained.
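A minimal sketch of steps 1-4, assuming the target f(x) is the Beta(2, 2) density on [0, 1] and the proposal g(x) is uniform on [0, 1] with M = 1.5 (these concrete choices are illustrative only):

```python
import random

def target_f(x):
    return 6.0 * x * (1.0 - x)             # Beta(2, 2) density; its maximum is 1.5

def rejection_sample(n_samples, M=1.5):
    """Accept-reject sampling with proposal g = Uniform(0, 1), so f(x) <= M * g(x)."""
    samples = []
    while len(samples) < n_samples:
        x = random.random()                # step 1: x ~ g(x)
        u = random.random()                # step 2: u ~ U(0, 1)
        if u <= target_f(x) / (M * 1.0):   # step 3: accept if u <= f(x) / (M * g(x))
            samples.append(x)
    return samples                         # step 4: repeat until enough samples

samples = rejection_sample(10_000)
print(sum(samples) / len(samples))         # mean of Beta(2, 2) is 0.5
```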

The method can be illustrated by the following figure:

(Note: this figure is taken from elsewhere and was not drawn by the author, so some symbols are inconsistent with the text: \pi(x) in the figure is the f(x) of the text, and q(x) in the figure is the g(x) of the text.)

2.2 Importance Sampling

Importance sampling is closely related to Monte Carlo integration. Consider the integral:

\int f(x) dx = \int f(x) * (1/g(x)) * g(x) dx. If g(x) is a probability density and we draw n samples from it, the integral is approximately (1/n) * \sum_i f(x_i) * (1/g(x_i)). This amounts to assigning a weight to each sample: where g(x_i) is large, the probability of drawing x_i is high, so such points appear many times among the n samples and collectively carry much of the weight; this is why the method is called importance sampling.

The sampling procedure is as follows (a code sketch follows the steps):

1. Select an easy-to-sample distribution g(x) and draw n samples from g(x);
2. Compute (1/n) * \sum_i f(x_i) * (1/g(x_i)) to obtain the approximate value of the integral.
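A minimal sketch of these two steps, estimating the integral of x^2 * exp(-x) over [0, inf) (which equals 2) with an Exp(1) proposal; the integrand and the proposal are illustrative choices:

```python
import math
import random

def importance_sampling(n=100_000):
    """Estimate the integral of x^2 * exp(-x) over [0, inf) with proposal g = Exp(1)."""
    total = 0.0
    for _ in range(n):
        x = -math.log(1.0 - random.random())   # step 1: x ~ g(x) = exp(-x)
        f = x * x * math.exp(-x)                # integrand f(x)
        g = math.exp(-x)                        # proposal density g(x)
        total += f / g                          # importance weight f(x)/g(x)
    return total / n                            # step 2: (1/n) * sum_i f(x_i)/g(x_i)

print(importance_sampling())                    # close to the true value 2
```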


2.3 MCMC Sampling Method

Both rejection sampling and importance sampling are independent sampling methods: each sample is drawn independently of the others. Such sampling can be quite inefficient; in rejection sampling, for instance, most of the candidate samples may be discarded. MCMC methods instead produce correlated samples: the next sample depends on the current one, which can make sampling much more efficient. The basic idea of MCMC is to construct a Markov chain whose stationary distribution is exactly the distribution f(x) we want to sample from. Once the chain has reached its stationary state, every sample drawn from the chain is a sample from f(x), which achieves our goal. The core question is: how do we construct a Markov chain with the required stationary distribution? (We assume the reader already knows what a Markov chain and a stationary distribution are.)
This article introduces two famous MCMC sampling methods, each of which gives an easy way to construct a Markov chain that meets this requirement.

A). Metropolis-Hastings Algorithm

The algorithm is named after two authors, whose contributions did not come from the same paper: one is the original 1953 paper by Metropolis, and the other is a later paper by Hastings that extended that work; hence the algorithm carries both names.

Suppose the distribution we want to sample from is \pi(x). Assume there is a conditional distribution p(y|x) such that \pi(x) * p(y|x) = \pi(y) * p(x|y); this equation is called the detailed balance condition. Detailed balance ensures that the Markov chain has \pi(x) as its stationary distribution. [Author's note: it is a sufficient but not necessary condition.] The key, therefore, is to construct a transition distribution p(y|x) that satisfies detailed balance. Now suppose we have an easy-to-sample distribution q(y|x), called the proposal distribution. For the current sample x, we can obtain the next proposed sample y through q(y|x). The proposal y is accepted or rejected according to a certain probability, the acceptance ratio \alpha(x, y) = min{1, q(x|y) * \pi(y) / [q(y|x) * \pi(x)]}. That is, given the current sample x_i, how do we obtain the next sample x_{i+1}? We use q(y|x_i) to draw a proposed sample y, and then use \alpha(x_i, y) to decide whether x_{i+1} = y or x_{i+1} = x_i. It can be proved that the transition kernel q(y|x) * \alpha(x, y) satisfies detailed balance, so the samples obtained in this way are samples from the distribution \pi(x). The procedure is as follows (a code sketch follows the steps):

1. Choose a starting sample x_0 and a proposal distribution q(y|x);

2. For the i-th sample x_i, use q(y|x_i) to draw a proposed sample y, and compute the acceptance ratio \alpha(x_i, y) = min{1, q(x_i|y) * \pi(y) / [q(y|x_i) * \pi(x_i)]};

3. Draw a uniform sample u_i ~ U(0, 1). If u_i <= \alpha(x_i, y), set x_{i+1} = y; otherwise set x_{i+1} = x_i;

4. Repeat steps 2-3 until the desired number of samples has been obtained.
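A minimal sketch of steps 1-4 for a one-dimensional target, assuming the target \pi(x) is proportional to a standard normal density and the proposal is a symmetric Gaussian random walk (both choices are illustrative; because the proposal is symmetric, the q terms cancel in the ratio):

```python
import math
import random

def metropolis_hastings(n_samples, x0=0.0, step=1.0):
    """Metropolis-Hastings with target pi(x) proportional to exp(-x^2/2) and a
    symmetric Gaussian random-walk proposal q(y|x) = N(x, step^2)."""
    def pi(x):                              # unnormalized target density
        return math.exp(-0.5 * x * x)

    x = x0                                  # step 1: starting sample
    samples = [x]
    for _ in range(n_samples - 1):
        y = random.gauss(x, step)           # step 2: propose y ~ q(y|x)
        alpha = min(1.0, pi(y) / pi(x))     # symmetric q, so q(x|y)/q(y|x) = 1
        if random.random() <= alpha:        # step 3: accept with probability alpha
            x = y
        samples.append(x)                   # step 4: record the state and repeat
    return samples

chain = metropolis_hastings(50_000)
print(sum(chain) / len(chain))              # close to 0, the mean of the target
```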

If the proposal distribution q(y|x) is symmetric, i.e., q(y|x) = q(x|y), then the ratio simplifies to \alpha(x, y) = min{1, \pi(y)/\pi(x)}; this is the original algorithm of 1953. Hastings later extended the algorithm so that the proposal need not be symmetric, which yields the algorithm above. The algorithm does have a drawback: the sampling efficiency is not high, because some proposed samples are discarded. This motivates the Gibbs sampling algorithm.

B). Gibbs Sampling Algorithm

The idea is very simple: replace sampling from the full joint distribution with sampling from the conditional distributions. For example, suppose x = {x_1, x_2, ..., x_n} follows a joint distribution p(x); how can we sample from p(x)? Suppose we know its conditional distributions p(x_1 | x_{-1}), ..., p(x_i | x_{-i}), ..., where x_{-i} denotes all components of x except x_i. If these conditional distributions are easy to sample from, we can sample from the joint distribution p(x) by sampling from the conditional distributions.

The Gibbs sampling procedure is as follows (a code sketch follows the steps):

1. Choose an initial sample x^0 = {x_1^0, x_2^0, ..., x_n^0};

2. Given the current sample x^i = {x_1^i, x_2^i, ..., x_n^i}, sample x_1^{i+1} ~ p(x_1 | x_2^i, x_3^i, ..., x_n^i);

3. Sample x_2^{i+1} ~ p(x_2 | x_1^{i+1}, x_3^i, ..., x_n^i);

... (continue in the same way for each component in turn) ...

4. Sample x_n^{i+1} ~ p(x_n | x_1^{i+1}, x_2^{i+1}, ..., x_{n-1}^{i+1});

5. Steps 2-4 produce one new sample of x; repeating steps 2-4 yields samples of x continuously.
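A minimal sketch of steps 1-5 for a two-dimensional case, assuming the target is a bivariate standard normal with correlation rho, whose full conditionals are one-dimensional normals (the bivariate normal is an illustrative choice):

```python
import math
import random

def gibbs_bivariate_normal(n_samples, rho=0.8):
    """Gibbs sampling from a bivariate standard normal with correlation rho.
    Full conditionals: x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2."""
    sd = math.sqrt(1.0 - rho * rho)
    x1, x2 = 0.0, 0.0                      # step 1: initial sample
    samples = []
    for _ in range(n_samples):
        x1 = random.gauss(rho * x2, sd)    # step 2: x1 ~ p(x1 | x2)
        x2 = random.gauss(rho * x1, sd)    # steps 3-4: last component, x2 ~ p(x2 | x1)
        samples.append((x1, x2))           # step 5: one new sample of x; repeat
    return samples

chain = gibbs_bivariate_normal(50_000)
print(sum(s[0] for s in chain) / len(chain))   # close to 0, the mean of x1
```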

Of course, both the Metropolis-Hastings algorithm and the Gibbs algorithm have a burn-in period. The reason is that both are Markov chain algorithms, and the chain needs a certain number of steps to reach its stationary state; this initial phase is the burn-in. Only samples drawn after the chain has reached equilibrium are samples from the target distribution, so the samples produced during burn-in must be discarded. There is no mature method for deciding whether the chain has reached equilibrium. The common practice is to check whether the sampled values look stable, for example by plotting them and seeing whether they stop drifting over a long stretch, in which case the chain has most likely converged. Of course, this cannot prove that the chain is at equilibrium, and counterexamples can be constructed, but in practice there is no better way.
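A rough sketch of this kind of stability check: discard a burn-in prefix and compare the means of successive windows of the remaining chain. The window size and tolerance below are illustrative, and this heuristic is not a rigorous convergence test:

```python
def discard_burn_in_and_check(chain, burn_in=5_000, window=5_000, tol=0.1):
    """Drop the burn-in samples, then heuristically check stability by comparing
    the means of consecutive windows of the remaining chain."""
    kept = chain[burn_in:]
    means = [sum(kept[i:i + window]) / window
             for i in range(0, len(kept) - window + 1, window)]
    stable = all(abs(a - b) <= tol for a, b in zip(means, means[1:]))
    return kept, means, stable

# e.g. applied to the Metropolis-Hastings chain from the sketch above
kept, means, stable = discard_burn_in_and_check(metropolis_hastings(60_000))
print(stable)
```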

It can be proved that the Gibbs algorithm is a special case of the Metropolis-Hastings algorithm, namely the special case in which the acceptance ratio \alpha(x, y) = 1. The detailed proof is not given here.
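A brief sketch of the argument, using the notation above and treating the Gibbs update of a single component i as the Metropolis-Hastings proposal: take q(y|x) = p(y_i | x_{-i}) with y_{-i} = x_{-i} (only component i changes). Since p(y) = p(y_i | y_{-i}) * p(y_{-i}) = p(y_i | x_{-i}) * p(x_{-i}), p(x) = p(x_i | x_{-i}) * p(x_{-i}), and q(x|y) = p(x_i | y_{-i}) = p(x_i | x_{-i}), the acceptance ratio becomes

\alpha(x, y) = min{1, q(x|y) * p(y) / [q(y|x) * p(x)]}
             = min{1, [p(x_i | x_{-i}) * p(y_i | x_{-i}) * p(x_{-i})] / [p(y_i | x_{-i}) * p(x_i | x_{-i}) * p(x_{-i})]}
             = 1,

so every Gibbs proposal is accepted.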


 

[Reposted from] http://blog.sina.com.cn/s/blog_8a7a0d5501015bxe.html
