From: http://blog.csdn.net/xianlingmao/article/details/7768833
Introduction
We will encounter many problems and cannot use the analytical method to obtain the exact solution. For example, the formula is special and cannot be solved. In this case, you need to find a method to obtain its approximate solution, and have a means to measure the approximate degree of the solution (such as gradual progress, upper and lower limits, or something)
Basic Idea of Random Simulation
Now let's assume that we have a region R (the size is known) of a rectangle, and there is an irregular area M in this region (that is, it cannot be calculated directly using the formula). Now we need to take the area of M? How can this problem be solved? There are many similar methods, such as dividing the irregular area M into many small rule areas and using the area sum of these rule areas to approximate M, another approximate method is the sampling method. We grab a handful of soy beans and place them evenly in the rectangular area. If we know the total number of soy beans s, as long as the number of soy beans in the irregular area M is S1, we can find the m area: M = S1 * R/S.
In the field of machine learning or statistical computing, we often encounter the following problems: how to obtain a fixed point: \ INF _ A ^ B f (x) dx, such as normalization factor.
How can we solve such problems? Of course, if the given points can be parsed and obtained directly, if not, you can only obtain the approximate solution. The common approximate method is to use Monte Carlo points, that is, rewrite the formula:
\ INF _ A ^ B f (x) * g (x)/g (x) dx = \ INF _ A ^ B (1/g (x )) * f (x) * g (x) dx
Then, f (x)/g (x) is used as a function, while G (x) is treated as a probability distribution on [a, B]. After N samples are extracted, the formula above can be further written as: {\ sum _ 1 ^ N [F (x_ I)/g (x_ I)]}/n. When n tends to be infinite, according to the theorem of big numbers, the substatement is equal to the required definite integral substatement. Therefore, an approximate solution can be obtained using the sampling method.
Through the above two examples, we can probably understand the basic idea of the sampling method to solve the problem,The basic idea is to transform the problem to be solved into a problem that can be solved through a certain sampling method. As for how to convert it, it is quite creative and there is no way to determine it.Therefore, the core of the random simulation method is how to obtain a sample (sampling) for a probability distribution ).
Common Sampling Methods
Direct Sampling Method
Because it is relatively simple and can only solve simple problems, it is generally a one-dimensional distribution problem.
Accept-reject sampling (acceptance-rejection sampling)
In order to get a distributed sample, we get a lot of preliminary samples through some mechanism, and then some of the preliminary samples will be used as valid samples (that is, the distribution samples to be extracted ), some preliminary samples will be considered invalid samples to be discarded. The basic idea of this algorithm is: we need to sample a distributed f (x), but it is difficult to directly sample it, therefore, we want to use another easily sampled g (x) distribution sample to remove some samples by some mechanism, so that the remaining samples are derived from and requested distribution f (x).
It has several conditions: 1) for any X, f (x) <= m * g (x); 2) g (x) is easy to sample; 3) g (x) is better to be close to f (x) in shape ). The specific sampling process is as follows:
1. Sample g (x) to obtain a sample Xi, Xi ~ G (x );
2. For the uniform distribution sampling UI ~ U (A, B );
3. if the UI <= f (x)/[M * g (x)], Xi is considered as a valid sample; otherwise, this sample is discarded; (# This step fully reflects the method name: Accept-reject)
4. Repeat steps 1 ~ 3, until the required samples meet the requirements.
This method can be:
(Note: This figure is obtained from other places. It is not drawn by yourself. Some symbols are inconsistent with those in the text. \ Pi (x) is the F (x) in the text ), Q (x) is the G (x) in the text ))
Importance Sampling)
Importance Sampling is closely related to Monte Carlo points. Let's look at the points:
\ INF f (x) dx = \ INF f (x) * (1/g (x) * g (x) dx. If G (x) is a probability distribution, N samples are extracted from G (x). The formula is approximately equal to (1/n) * \ sum F (xi) * (1/g (XI )). This is equivalent to assigning a weight to each sample. A large value of G (xi) indicates a high probability. Therefore, N contains many samples with Xi, that is, these samples have a high weight, so it is called importance sampling.
The sampling procedure is as follows:
1. Select an easy-to-sample distribution g (x) and extract n samples from G (x;
2. Calculate (1/n) * \ sum F (xi) * (1/g (XI) to obtain the approximate solution.
(PKU, sewm, shinning)
MCMC Sampling Method
Both sampling rejection and sampling importance belong to independent sampling, that is, the sample is independent from the sample. Such sampling efficiency is relatively low, for example, sampling rejection, most of the extracted samples are invalid, so the efficiency is relatively low. The MCMC method is associated with sampling, that is, the next sample is related to this sample, so that the sampling efficiency is high. The basic idea of the MCMC method is to buildMarkov ChainMakeStable DistributionIs the distribution f (x) We want to sample ). If this Markov chain reaches a stable state, each sample from this chain is a sample of f (x), so as to achieve the purpose of sampling.
Metropolis-Hasting Algorithm
Assume that the probability distribution to be sampled is \ Pi (x), and now assume that there is a probability distribution P (Y | X), so that \ Pi (x) * P (Y | X) = \ Pi (y) * p (x | Y) is trueMeticulous balancing formulaThis careful balance formula is a necessary condition for a Markov chain to achieve stable distribution. Therefore, the key is to construct a probability distribution P (Y | x) so that it can meet the meticulous balance. Now suppose we have an easily sampled distribution Q (Y | X) (called suggested distribution). For the current sample X, it can pass Q (Y | X) obtain the next suggestion sample Y. The suggestion sample y is accepted or not accepted according to a certain probability, which is called ratio \ alpha (x, y) = min {1, Q (x | y) * \ Pi (y)/[q (Y | x) * \ Pi (x)]}. That is, if you know sample Xi, how do you know what the next sample X _ {I + 1} is? Q (Y | xi) is used to obtain a recommended sample y, and then according to \ alpha (XI, Y) determines whether X _ {I + 1} = Y or X _ {I + 1} = xi. It can be proved that the distribution Q (Y | x) * \ alpha (x, y) meets the meticulous balance, and the samples obtained from this extraction are samples with the distribution \ Pi (X. The procedure is as follows:
1. Given a starting sample x_0 and a suggested distribution Q (Y | X );
2. for the I-th sample Xi, Use Q (Y | xi) to obtain a suggested sample y; calculation ratio \ alpha (XI, y) = min {1, Q (XI | Y) * \ Pi (y)/[q (Y | xi) * \ Pi (xi)]};
3. Extract A uniform distribution sample UI ~ U (). If UI <= \ alpha (XI, Y), X _ {I + 1} = y; otherwise, X _ {I + 1} = XI;
4. Repeat Step 2 ~ 3. Until the desired number of samples is extracted.
If the distribution Q (Y | X) is recommended to meet the following requirements: Q (Y | X) = Q (x | Y), that is, symmetric. At this time, the ratio \ alpha (x, y) = min {1, \ Pi (y)/\ Pi (x)} is the original algorithm in 1953. Later, hasting extended the algorithm, and distributed symmetry is not required, the above algorithm is obtained. However, this algorithm has a disadvantage: the sampling efficiency is not high, and some samples will be discarded. Thus, the Gaussian algorithm is generated.
Gaussian Sampling Algorithm
It's very simple, it'sSampling with conditional distribution instead of sampling with full Probability Distribution. For example, if X = {x1, x2,... xn} satisfies the distribution of p (x), how can we sample p (x? If we know its conditional distribution P (X1 | X _ {-1 }),..., P (XI | X _ {-I }),...., X _ {-I} indicates all the variables of X except Xi. If these conditional distributions are easily sampled, We can sample the total probability distribution p (x) by sampling the conditional distributions.
The following describes how to use the Gaussian Sampling Algorithm:
1. Specify an initial sample X0 = {x10, x20,..., xn0}
2. A sample xi = {x1i, x2i,..., xni} is known, and X1 _ {I + 1} is sampled. X1 _ {I + 1 }~ P (X1 | Xi _ {-1 })
3. Sample X2 _ {I + 1}, and X2 _ {I + 1 }~ P (X2 | X1 _ {I + 1}, x3i,... xni)
........................................ ........................
4. Sample XN _ {I + 1}, XN _ {I + 1 }~ P (XN | X1 _ {I + 1}, X2 _ {I + 1},... X _ {n-1} _ {I + 1 })
5. Step 2 ~ 4. You can obtain a sample of x and repeat Step 2 ~ 4. The samples of X can be continuously obtained.
Of course, either the Metropolis-Hasting algorithm or the gibalgorithm has a burn in process. The so-called burn in process is because both algorithms are Markov Chain Algorithms, A certain step is required to reach the stable state. Therefore, a burn in process is required. Only the samples obtained when the equilibrium state is reached can be the target distribution samples in the equilibrium state. Therefore, samples generated during Burn in must be discarded. There is no mature method to determine whether a process has reached the balance state. Currently, the common method is to check whether the status is stable (for example, drawing a graph, if there is not much change in a long process, it is likely that it has been balanced.) Of course, this method cannot be sure whether a state is balanced. You can give a counterexample, but there is actually no way.
Basic Idea of Random Simulation and common sampling methods (sampling)