Demystifying Markov chain Monte Carlo: an introduction without heavy math

Source: Internet
Author: User


First, what is the Markov chain Monte Carlo (MCMC) method?


The shortest answer is:


"MCMC is a method to approximate the posterior distribution of interested parameters by random sampling in probabilistic space."


In this article, I will explain that short answer without using any mathematics.



Basic terminology of Bayesian theory system


First, some terminology.


A parameter of interest is just a number that summarizes a phenomenon we care about. We generally use statistical methods to estimate such parameters. For example, if we want to know the height of adults, our parameter of interest might be the average height in inches.


A distribution is a mathematical representation of every possible value of a parameter and how likely we are to observe each one.


The best example is the bell-shaped curve:




In Bayesian statistics, distributions have an additional interpretation. Instead of just representing the possible values of a parameter and how likely each is to be the true one, a distribution describes our degree of belief about the parameter. Therefore, the bell curve above shows that we are fairly sure the parameter's value is close to zero, and we believe the true value is equally likely to be above or below that.


As it happens, human heights do follow a normal curve, so suppose we believe the true value of average human height follows the bell curve below:





Clearly, the person whose beliefs this chart represents has lived among giants for years, because as far as they know, the most likely average adult height is 6'2".


Let's imagine this person goes and collects some data, observing a group of people between 5 and 6 feet tall. We can represent that data with another normal curve, which shows which values of average height best explain the data:





In Bayesian statistics, the distribution representing our degree of belief about a parameter is called the prior distribution, because it captures our beliefs before seeing any data.


The likelihood distribution summarizes what the data tell us: for each possible parameter value, it gives the probability of observing the data we saw. The parameter value that maximizes the likelihood answers the question: which parameter value makes it most likely that we would observe the data we observed? In the absence of prior beliefs, we might stop there.


The key to Bayesian analysis, however, is to combine the prior and the likelihood to obtain the posterior distribution. This tells us which parameter values maximize the chance of observing the particular data we did, taking our prior beliefs into account. In our example, the posterior distribution looks like this:





In the diagram above, the red line represents the posterior distribution. You can think of it as a kind of average of the prior and likelihood distributions. Because the prior is shorter and more spread out, it represents a comparatively "uncertain" set of beliefs about the true average human height. Meanwhile, the likelihood summarizes the data within a relatively narrow range, so it represents a "more certain" guess about the true parameter value.


When the prior and likelihood are combined, the data (represented by the likelihood) overwhelm the weak prior beliefs of our hypothetical individual who grew up among giants. Although that person still believes the average height is slightly higher than the data alone suggest, he is mostly convinced by the data.


In the case of two bell-shaped curves, solving for the posterior distribution is very easy: there is a simple equation for combining the two. But what if our prior and likelihood distributions are not so well behaved?
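As a sketch of how that combination works numerically, the code below multiplies a prior and a likelihood on a grid of candidate values and normalizes the result. The specific means and standard deviations (a prior centered at 74 inches, data centered at 67 inches) are made-up numbers chosen to mirror the giant-raised-person example, not values taken from the article.

```python
import numpy as np

# Candidate values for average adult height, in inches.
grid = np.linspace(60, 80, 401)

def normal_pdf(x, mean, std):
    """Density of a normal distribution (no SciPy needed)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

# Hypothetical numbers: a wide, uncertain prior near 6'2" (74 in)
# and a narrow likelihood from data averaging 5'7" (67 in).
prior = normal_pdf(grid, mean=74, std=4)
likelihood = normal_pdf(grid, mean=67, std=1)

# Posterior is proportional to prior times likelihood; normalize on the grid.
posterior = prior * likelihood
posterior /= posterior.sum() * (grid[1] - grid[0])

# The posterior peak lies between the prior and the likelihood,
# pulled strongly toward the data because the likelihood is narrower.
print(grid[np.argmax(posterior)])
```

This grid trick only works because the parameter is one-dimensional and the range is known; for nastier distributions it breaks down, which is exactly where MCMC comes in.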


Sometimes the most accurate model of our data or our prior beliefs uses distributions that do not have convenient shapes. What if our likelihood were best represented by a distribution with two peaks, and for some reason we wanted to account for a really odd prior distribution? I have drawn an ugly prior by hand below:




Visualization in Matplotlib, enhanced using MS Paint


As before, there exists some posterior distribution that gives the likelihood of each parameter value. But it is hard to see what the shape of that distribution might look like, and it cannot be solved analytically.


Enter MCMC methods.



MCMC methods


MCMC methods allow us to estimate the shape of the posterior distribution when we cannot compute it directly. Recall that MCMC stands for Markov chain Monte Carlo. To understand how these methods work, I will first introduce Monte Carlo estimation, then discuss Markov chains.


Monte Carlo Estimation


Monte Carlo estimation is a way of estimating a fixed parameter by repeatedly generating random numbers. When computing the parameter directly is impractical, taking random samples and performing a computation on them can provide a good approximation.


Suppose we want to estimate the area of the following circle:





Because the circle is inscribed in a square with 10-inch sides, its area is easy to compute directly: 78.5 square inches (π × 5²). Alternatively, we can drop 20 points at random inside the square, count the fraction of points that fall within the circle, and multiply that fraction by the square's area. The result is a very good approximation of the circle's area.





Since 15 of the 20 points lie inside the circle, the estimated area is about 75 square inches. Not bad for a Monte Carlo simulation with only 20 random points.
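The circle estimate above is easy to sketch in code. The following is a minimal Monte Carlo simulation, assuming a circle of radius 5 centered in a 10 × 10 square; the point counts and seed are arbitrary choices for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def estimate_circle_area(n_points):
    """Estimate the area of a radius-5 circle inside a 10 x 10 square."""
    inside = 0
    for _ in range(n_points):
        x = random.uniform(-5, 5)
        y = random.uniform(-5, 5)
        if x * x + y * y <= 25:   # point falls inside the circle
            inside += 1
    # fraction of points inside the circle, times the square's area (100)
    return (inside / n_points) * 100

print(estimate_circle_area(20))       # crude, like the 20-point sketch above
print(estimate_circle_area(100_000))  # converges toward ~78.5
```

With more points, the estimate hugs the true area more tightly; that trade-off between sample count and accuracy is the essence of Monte Carlo estimation.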


Now, imagine we want to compute the area of the shape produced by the Batman equation:





This is a shape whose area we were never taught an equation for, so finding the area of the bat signal directly is very hard. However, by dropping random points inside a rectangle that contains the bat shape, Monte Carlo simulation can easily approximate its area.


Monte Carlo simulation is not only used to estimate the areas of complicated shapes. By generating lots of random numbers, it can model very complicated processes. In practice, the method is used to forecast the weather and to estimate the probability of winning an election, among many other things.


Markov chain


The second element of understanding MCMC methods is the Markov chain: a sequence of events that are probabilistically related to one another. Each event comes from a set of outcomes, and each outcome determines which outcome occurs next, according to a fixed set of probabilities.


An important property of Markov chains is that they are memoryless: everything you could possibly need to predict the next event is contained in the current state, and knowing the history of past events adds no new information. A game like Chutes and Ladders exhibits this memorylessness, or Markov property.


Few events in the real world truly work this way. Nevertheless, Markov chains are a powerful way to understand the world.


In the 19th century, the bell curve was observed as a common pattern in nature. (We have already noted, for example, that human heights follow a bell curve.) Galton boards, which simulate the averages of repeated random events by dropping marbles through a board fitted with pegs, reproduce the normal curve in the distribution of the marbles:





Pavel Nekrasov, a Russian mathematician and theologian, argued that the bell curve and, more generally, the law of large numbers were simply artifacts of children's games and trivial puzzles, where every event is completely independent. He believed that things in the real world, such as human actions, are interdependent, and therefore do not conform to nice mathematical patterns or distributions.


Andrey Markov set out to prove that non-independent events can also conform to such patterns. One of his best-known examples required counting thousands of two-character pairs in a work of Russian poetry. Using those pairs, he computed the conditional probability of each character: given a preceding letter or space, there is a certain chance that the next letter will be an A, a T, or a space.


Using these probabilities, Markov could simulate an arbitrarily long sequence of characters. This is a Markov chain.


Although the first few characters depend largely on the choice of the starting character, Markov showed that in the long run, the distribution of characters settles into a pattern. Thus, even interdependent events, if subject to fixed probabilities, conform to an average.
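A toy version of Markov's experiment can be sketched as follows. The conditional-probability table below is invented for illustration (it is not derived from Russian poetry), but the sampling procedure is the same: pick the next character based only on the current one.

```python
import random

# Made-up conditional probabilities: given the current character,
# how likely is each possible next character?
transitions = {
    "a": {"t": 0.6, " ": 0.4},
    "t": {"a": 0.5, " ": 0.5},
    " ": {"a": 0.7, "t": 0.3},
}

def sample_chain(start, length, rng):
    """Generate a character sequence one step at a time (a Markov chain)."""
    chars = [start]
    for _ in range(length - 1):
        options = transitions[chars[-1]]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        chars.append(nxt)
    return "".join(chars)

rng = random.Random(42)
print(sample_chain("a", 40, rng))
```

Run a long enough chain and the overall frequency of each character stabilizes, regardless of the starting character, which is exactly the long-run pattern Markov demonstrated.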


To give a more relatable example, suppose you live in a house with five rooms: a bedroom, a bathroom, a living room, a dining room, and a kitchen.


Let's say we gather some data and find that the room you occupy at any point in time is all we need to predict which room you are likely to enter next. For instance, if you are in the kitchen, you have a 30% chance of staying in the kitchen, a 30% chance of going into the dining room, a 20% chance of going into the living room, a 10% chance of going into the bathroom, and a 10% chance of going into the bedroom. Using a set of such probabilities for every room, we can construct a Markov chain that predicts which room you will occupy next.


Predicting where someone in the house will be a little while after they were in the kitchen is a reasonable use of a Markov chain. But since our predictions are based only on observations of a single person in a single house, they are not very reliable.
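The room example can be simulated directly. Note that only the kitchen's probabilities come from the text above; the other four rows of the transition table are hypothetical, filled in so the chain is complete.

```python
import random

rooms = ["kitchen", "dining room", "living room", "bathroom", "bedroom"]

# Transition probabilities: row = current room, entries = chance of each
# next room (in the order of `rooms`). Each row sums to 1.
transitions = {
    "kitchen":     [0.3, 0.3, 0.2, 0.1, 0.1],  # from the example above
    "dining room": [0.3, 0.2, 0.3, 0.1, 0.1],  # hypothetical
    "living room": [0.2, 0.2, 0.3, 0.1, 0.2],  # hypothetical
    "bathroom":    [0.2, 0.1, 0.3, 0.1, 0.3],  # hypothetical
    "bedroom":     [0.1, 0.1, 0.3, 0.2, 0.3],  # hypothetical
}

rng = random.Random(0)
room = "kitchen"
visits = {r: 0 for r in rooms}

# Walk the chain for many steps and record how often each room is visited.
for _ in range(100_000):
    room = rng.choices(rooms, weights=transitions[room])[0]
    visits[room] += 1

# In the long run, the visit frequencies settle into a fixed pattern
# (the chain's stationary distribution), whatever the starting room.
for r in rooms:
    print(r, visits[r] / 100_000)
```

This long-run settling is the property MCMC exploits: a chain built so that its stationary distribution matches the posterior lets us sample the posterior just by walking the chain.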

