Before this, I tried to learn about Markov random fields from a Coursera course, but I could not follow it. The reason is that the instructor gave only a brief introduction to Markov random fields and then jumped straight into graph cuts. But what exactly is a Markov random field? Why can it be defined and factorized the way it is? None of this was clear, so I listened in a muddle. Having now systematically studied probabilistic graphical models, I have a clearer understanding of these questions, and here I briefly write down my own understanding.
Unless stated otherwise, all variables in what follows are discrete binary variables taking the values 0 and 1.
I. Probability Graphs
A so-called probability graph expresses probabilistic relationships with a graph. By "probabilistic relationship" we essentially mean independence. What does that mean? Suppose I have two variables X1 and X2, and the two are independent; the graph describes them as two isolated nodes with no connection between them. If the two variables are related (not independent), there is an edge between them. Why do this? Because the graph makes the description very clear: independence is visible at a glance, and the representation is concise. For example, to describe two variables without the graph, we would have to give their joint distribution p(x1 = i, x2 = j); for binary variables that is 4 entries (actually 3 free parameters, since they sum to 1). With the probability graph, we draw two isolated nodes to indicate that the variables are independent, and then we only need to give P(x1 = i) and P(x2 = j), which takes just 2 parameters.
You may find this too abstract, so here is a simple example. Suppose I want to reason about the probability that it is sunny in Beijing and sunny in London today. Common sense tells us these two events should be independent, so the probability graph is just two isolated nodes. We then only have to specify the probability of sunny weather in Beijing and in London separately, without considering their joint distribution. You might object that this example proves nothing, because even without a probability graph you know you can simply multiply the individual probabilities. That is because the example is deliberately simple (to illustrate the core idea); real problems often involve many nodes and many-valued variables, and that is where the advantage of probability graphs becomes obvious. Imagine we have 26 variables, A through Z, with many independence relations among them — could you guarantee not to get them muddled without a probability graph? Draw the graph, and the relationships are clear at a glance.
There is a very important question here: a probability graph and the probabilistic relations it is meant to describe are not automatically equivalent. Only after settling this question can we use probability graphs boldly and with confidence. Through research, people have worked out exactly which probability relations a given graph can describe; for Markov random fields this is detailed in the next section.
Besides the probabilistic relationships (independence), we also need to put the probability values into the graph — something like P(x1 = 1, x2 = 1, x3 = 1) = 0.1 — so that we can actually compute with the probability graph (the earlier example implicitly included this step, though I did not emphasize it). You might say that once we add these probability values, the graph is no different from what we started with, since the original description was also a pile of probability values. The point is that the probabilities we add are different. If we have not determined the independence relations beforehand, the probabilities must involve all variables jointly: with three variables, we must list P(x1, x2, x3) over all 8 configurations, which takes 7 free parameters to describe. Now suppose we know that x1, x2, x3 are mutually independent. We first express that relationship with the graph (the three nodes are not connected), and then describe the distribution with the simplified probabilities P(x1), P(x2), P(x3) — only 3 free parameters are required.
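The parameter counts above are easy to verify in code. Here is a minimal sketch (the function names are my own, not from any library): a full joint over n binary variables needs 2^n − 1 free parameters, while n mutually independent variables need only n, and the joint table can then be rebuilt from the marginals.

```python
from itertools import product

def n_params_full(n):
    """Free parameters of a full joint over n binary variables."""
    return 2 ** n - 1

def n_params_independent(n):
    """Free parameters when all n binary variables are mutually independent."""
    return n

def joint_from_marginals(marginals):
    """Rebuild the joint P(x1, ..., xn) from the marginals P(xi = 1),
    assuming all variables are independent."""
    table = {}
    for assign in product((0, 1), repeat=len(marginals)):
        p = 1.0
        for xi, pi in zip(assign, marginals):
            p *= pi if xi == 1 else 1.0 - pi
        table[assign] = p
    return table
```

With three variables this gives 7 versus 3 parameters, matching the example above, and the rebuilt table always sums to 1.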
As the example shows, modeling with a probability graph has two steps. First, describe the independence relations with the structure of the graph. Second, add the probability values to complete the description.
II. Markov Random Fields
(Figure: a grid-structured Markov random field, taken from a teacher's probabilistic graphical models course slides.)
Why did I say all this about probability graphs? Because a Markov random field is a kind of probability graph — an undirected one, that is, a graph whose nodes are joined by edges without direction. Remember that the purpose of our probability graph is to describe probabilistic relationships: in a Markov random field, two nodes joined by an edge are not independent. But are nodes without a direct edge necessarily independent? The answer is no: as long as there is a path between two nodes, they are (indirectly) dependent. Think about it carefully — if any path implies dependence, then every node in the Markov random field depends on every other node. If, say, I wanted the probability that one node takes the value 0, I would need to account for the influence of all the other variables on it, which is anything but simple. So Markov random fields have an additional property: once the directly adjacent nodes of a point are determined, that point is conditionally independent of all non-adjacent nodes. This is the Markov property. In other words, in the grid above, if X12 and X21 are known, then the value of X11 depends only on them, regardless of the remaining nodes.
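On a grid, the Markov property says that a node's "Markov blanket" is simply its 4-neighborhood. Here is a tiny sketch with 1-indexed nodes x_ij on a 3×4 grid (the grid size is my reading of the figure, and the function name is my own):

```python
def neighbors(i, j, rows=3, cols=4):
    """4-neighborhood (Markov blanket) of grid node x_ij, 1-indexed.
    Once these nodes are fixed, x_ij is conditionally independent
    of every other node in the field."""
    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return {(r, c) for (r, c) in candidates if 1 <= r <= rows and 1 <= c <= cols}

blanket_of_x11 = neighbors(1, 1)   # -> {(1, 2), (2, 1)}, i.e. x12 and x21
```

For a corner node like x11 the blanket has two members; for an interior node it has four.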
At this point we have completed the first step — describing the independence relations, which amounts to determining the skeleton of the probability graph. In the second step we need to add the probability values.
The most general way to describe the probability values is the complete joint distribution, i.e. probabilities of the form p(x11 = i, ..., x34 = j) = c. But we have already fixed the independence relations through the skeleton, so the joint distribution has some special structure. This is the most important part of Markov random fields.
Look at the random field above, and suppose we want the probability that x23 = 1, i.e. P(x23 = 1). By the Markov property, once x13, x22, x24 and x33 are determined, this probability is determined. In other words, we need the joint distribution P(x23, x13, x22, x24, x33). Even that formula is still unwieldy, so we would like to break it into something more concise. According to some theory (discussed below), we only need to consider two kinds of terms, called unary potentials (one variable) and pairwise potentials (two variables). That is:

P(x) = (1/Z) · ∏_i P(x_i) · ∏_{(i,j) adjacent} P(x_i, x_j)
The ∏ means multiplication: we multiply together the unary terms and the pairwise terms. Here Z is a normalizing factor obtained by summing over all configurations (because the probabilities must sum to 1). This is a generalization of probability, because the P(x_i) and P(x_i, x_j) above are not themselves probabilities (only after multiplying them and normalizing do we get probabilities). So the question is: why is the joint distribution described using only unary and pairwise terms? Why don't we need terms like P(x_i, x_j, x_k)? The answer comes from graph theory; see the references for the proof (honestly, I do not really know how to prove it myself).
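The factorization can be made concrete by brute force on a tiny grid: multiply the unary and pairwise potentials for every configuration, sum them to get Z, and check that the normalized values form a proper distribution. A minimal sketch — the 2×2 grid size and the potential values are my own assumptions, not from the post:

```python
from itertools import product

ROWS, COLS = 2, 2
CELLS = [(i, j) for i in range(ROWS) for j in range(COLS)]
# 4-neighbor edges of the grid (right and down from each cell):
EDGES = [(a, b) for a in CELLS for b in CELLS
         if (b[0] - a[0], b[1] - a[1]) in ((1, 0), (0, 1))]

def unary(v):
    """Unary potential P(x_i) — made-up values, not a probability."""
    return 2.0 if v == 1 else 1.0

def pairwise(a, b):
    """Pairwise potential P(x_i, x_j) — favors equal neighbors."""
    return 2.0 if a == b else 1.0

def product_of_potentials(x):
    """Product of all unary and pairwise terms for one configuration."""
    p = 1.0
    for c in CELLS:
        p *= unary(x[c])
    for a, b in EDGES:
        p *= pairwise(x[a], x[b])
    return p

configs = [dict(zip(CELLS, vals)) for vals in product((0, 1), repeat=len(CELLS))]
Z = sum(product_of_potentials(x) for x in configs)       # normalizing factor
probs = [product_of_potentials(x) / Z for x in configs]  # a real distribution
```

After dividing by Z the 16 values sum to 1, which is exactly why the potentials need not be probabilities themselves.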
The random field above is, after all, a graph, with nodes and edges. The theory says that when we compute the joint distribution, we only need to consider terms over the fully connected subgraphs. What is a fully connected subgraph? A set of nodes that are pairwise connected. For example, a single node is a fully connected subgraph; two nodes need only the edge between them to form one; three nodes must be pairwise connected, requiring 3 edges; four nodes require 6 edges. In other words, we only need to assign a term P(x_i, ..., x_j) to each fully connected subgraph.
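The edge counts above (k nodes need k·(k−1)/2 edges to be fully connected) are easy to check mechanically. This sketch builds a small 4-connected grid — the 3×4 size is my assumption — and tests which node sets are fully connected:

```python
from itertools import combinations

def grid_edges(rows, cols):
    """Edges of a 4-connected grid; nodes are (row, col) pairs."""
    edges = set()
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:
                edges.add(((i, j), (i + 1, j)))
            if j + 1 < cols:
                edges.add(((i, j), (i, j + 1)))
    return edges

def fully_connected(nodes, edges):
    """k nodes are fully connected iff all k*(k-1)//2 pairs are edges."""
    und = edges | {(b, a) for (a, b) in edges}
    return all((a, b) in und for a, b in combinations(nodes, 2))

edges = grid_edges(3, 4)
nodes = [(i, j) for i in range(3) for j in range(4)]

# Every adjacent pair forms a fully connected subgraph...
pairs = [p for p in combinations(nodes, 2) if fully_connected(p, edges)]
# ...but no triple of grid nodes does:
triples = [t for t in combinations(nodes, 3) if fully_connected(t, edges)]
```

On a 4-connected grid `triples` comes out empty, which is why the factorization stops at pairwise terms.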
Now look again at the random field above. Every single node is a fully connected subgraph, which gives us the P(x_i) terms, and every pair of adjacent nodes forms a fully connected subgraph, which gives P(x_i, x_j). However, no three nodes in the grid are fully connected, so the terms stop at the pairwise ones. This yields exactly the decomposition above, and the factorization theorem (the Hammersley–Clifford theorem) tells us that this kind of decomposition is valid. In addition, since Z is a sum over all possible configurations, it has the same value no matter which configuration we are considering, so it can be dropped when comparing them. Even with Z removed, the form above is still awkward to work with, because it is a long chain of multiplications. A common way to handle this is to take the negative logarithm.
Taking the negative logarithm turns the product into a sum:

E(x) = −∑_i log P(x_i) − ∑_{(i,j)∈N} log P(x_i, x_j)

where N denotes the set of adjacent (neighboring) pairs. At last we have arrived at the energy-function form most commonly used with MRFs: maximizing the probability is the same as minimizing the energy E(x). In later posts I will discuss how MRFs are solved (graph cuts) and how this material relates to deep learning.
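The equivalence between maximizing probability and minimizing energy can be checked by brute force on a tiny grid. In this sketch the 2×2 grid size and the potential values are my own assumptions; it builds the energy as the negative log of the product of potentials and verifies that the most probable configuration is exactly the lowest-energy one:

```python
import math
from itertools import product

ROWS, COLS = 2, 2
CELLS = [(i, j) for i in range(ROWS) for j in range(COLS)]
EDGES = [(a, b) for a in CELLS for b in CELLS
         if (b[0] - a[0], b[1] - a[1]) in ((1, 0), (0, 1))]

def phi(v):                 # unary potential (made-up values)
    return 2.0 if v == 1 else 1.0

def psi(a, b):              # pairwise potential: favors equal neighbors
    return 2.0 if a == b else 1.0

def prob_unnorm(x):
    """Unnormalized probability: product of all potentials."""
    p = 1.0
    for c in CELLS:
        p *= phi(x[c])
    for a, b in EDGES:
        p *= psi(x[a], x[b])
    return p

def energy(x):
    """Negative log of the product of potentials -> a sum of terms."""
    e = -sum(math.log(phi(x[c])) for c in CELLS)
    e -= sum(math.log(psi(x[a], x[b])) for a, b in EDGES)
    return e

configs = [dict(zip(CELLS, vals)) for vals in product((0, 1), repeat=len(CELLS))]
best_p = max(configs, key=prob_unnorm)   # most probable configuration
best_e = min(configs, key=energy)        # lowest-energy configuration
assert best_p == best_e
```

Note that Z never appears in the comparison: because it is the same for every configuration, dropping it does not change which configuration wins.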