Bayesian Networks in Machine Learning


Bayesian networks, Markov random fields (MRF, Markov random field), and factor graphs are all graphical models, and so all belong to the family of probabilistic graphical models (PGM, probabilistic graphical model) in machine learning.

One: Definition

Bayesian networks, also known as belief networks (belief network, BN), are a kind of directed graphical model, composed of a directed acyclic graph (DAG, directed acyclic graphical model) and conditional probability distributions, i.e. the probabilities P(xi | parent(xi)), where parent(xi) denotes the direct parent nodes of xi. A Bayesian network is a model of uncertainty that simulates the causal relationships in human reasoning, and its network topology is a directed acyclic graph (DAG).

So, given samples (containing both features and labels), why do we build a Bayesian network at all?

A simple example illustrates the point: given training samples whose features are Smoking, Bronchitis, Cancer, X-ray and whose label is Dyspnoea (all binary variables), how do we obtain the joint probability distribution?

One option is to estimate the 2^5 - 1 = 31 probabilities directly from the data (one probability per 5-bit binary assignment, e.g. the probability of 01111). Then, when someone asks for the probability of a particular assignment, such as P(s=1, b=1, c=0, x=0, d=1), you look it up among those 31 entries. With a Bayesian network, however, far fewer parameters are needed to compute the joint probability. In the network topology for this example, D is conditioned on (C, B) and therefore needs 4 probability parameters; likewise S needs 1, B needs 2, C needs 2, and X needs 4, for a total of 1 + 2 + 2 + 4 + 4 = 13. Once these 13 parameters are estimated from the sample data, we can compute the full joint probability distribution. The Bayesian network therefore greatly simplifies the computation: given the network and the corresponding conditional probability tables, we can obtain any probability of interest.
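To make the counting concrete, here is a minimal sketch. The parent sets below are one plausible reading of the original (missing) figure, chosen only so that the per-node counts match the text; a binary node with k binary parents needs 2^k free parameters:

```python
# Hypothetical parent sets, chosen to match the counts quoted in the text
# (S: 1, B: 2, C: 2, X: 4, D: 4 => 13 total). The original figure is not shown.
parents = {
    "S": [],
    "B": ["S"],
    "C": ["S"],
    "X": ["S", "C"],
    "D": ["B", "C"],
}

# A binary node with k binary parents needs one free parameter per parent
# configuration: 2^k. The full joint over 5 binary variables needs 2^5 - 1.
bn_params = sum(2 ** len(ps) for ps in parents.values())
full_joint_params = 2 ** len(parents) - 1
print(bn_params, full_joint_params)  # 13 vs 31
```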


The joint probability distribution of a general Bayesian network is:

$$P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \text{parent}(x_i))$$
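Given CPTs for the 13 parameters above, any joint probability such as P(s=1, b=1, c=0, x=0, d=1) is just a product of five lookups. A minimal sketch with invented CPT values (and the same hypothetical parent sets as above):

```python
# CPTs as P(node = 1 | parent assignment); all numbers are invented.
cpt = {
    "S": {(): 0.3},
    "B": {(0,): 0.2, (1,): 0.6},                                   # parent: S
    "C": {(0,): 0.1, (1,): 0.4},                                   # parent: S
    "X": {(0, 0): 0.05, (0, 1): 0.8, (1, 0): 0.1, (1, 1): 0.9},    # parents: S, C
    "D": {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.6, (1, 1): 0.9},     # parents: B, C
}
parents = {"S": [], "B": ["S"], "C": ["S"], "X": ["S", "C"], "D": ["B", "C"]}

def joint(assign):
    """P(assignment) = product over nodes of P(node | its parents)."""
    p = 1.0
    for node, ps in parents.items():
        p1 = cpt[node][tuple(assign[q] for q in ps)]
        p *= p1 if assign[node] == 1 else 1.0 - p1
    return p

print(joint({"S": 1, "B": 1, "C": 0, "X": 0, "D": 1}))
```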
The workflow of a Bayesian network model is:

o Establish the topology of the Bayesian network and the conditional probability distribution parameters of its nodes from the given sample data. This often needs to be accomplished with the help of prior knowledge and maximum likelihood estimation.

o Once the node topology and conditional probability distributions of the Bayesian network are determined, use the network to compute conditional or posterior probabilities of unknown variables, achieving the goal of diagnosis, prediction, or classification.

Note: the nodes in the network topology include both the features and the labels of the training samples. In the example Bayesian network topology (for judging whether an account is authentic), "account is authentic" is the label and the other three nodes are features.


Two: The construction of Bayesian networks

Determine the topological relationships among the random variables to form a DAG. This step usually requires domain experts, and building a good topology often takes repeated iteration and refinement.

The algorithm for constructing the DAG is given below:

Algorithm process:

(1) Select a reasonable ordering of the variables: x1, x2, ..., xn

(2) For i = 1 to n: add node xi to the network, and choose xi's parents from x1, x2, ..., xi-1 such that

$$P(x_i \mid \text{parent}(x_i)) = P(x_i \mid x_1, \ldots, x_{i-1})$$
(3) This construction method clearly guarantees the global semantics, by the chain rule:

$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_1, \ldots, x_{i-1}) = \prod_{i=1}^{n} P(x_i \mid \text{parent}(x_i))$$
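A brute-force sketch of this construction (assuming oracle access to the full joint distribution, here generated from a known chain A → B → C so the expected answer is clear): for each variable in the ordering, search for a minimal parent set among its predecessors that preserves the conditional distribution.

```python
import itertools

# Hypothetical joint distribution over binary A, B, C, generated from the
# chain A -> B -> C, so the algorithm should pick parents(C) = {B}.
P_A = {0: 0.6, 1: 0.4}
P_B_given_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
P_C_given_B = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}

joint = {}
for a, b, c in itertools.product([0, 1], repeat=3):
    joint[(a, b, c)] = P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

order = ["A", "B", "C"]  # variable ordering chosen in step (1)
index = {v: i for i, v in enumerate(order)}

def cond_prob(var, val, given):
    """P(var = val | given), where `given` maps variables to values."""
    num = sum(p for assign, p in joint.items()
              if assign[index[var]] == val
              and all(assign[index[g]] == gv for g, gv in given.items()))
    den = sum(p for assign, p in joint.items()
              if all(assign[index[g]] == gv for g, gv in given.items()))
    return num / den

def is_sufficient(var, parents, predecessors):
    """Check P(var | parents) == P(var | all predecessors) everywhere."""
    for pred_vals in itertools.product([0, 1], repeat=len(predecessors)):
        full = dict(zip(predecessors, pred_vals))
        sub = {v: full[v] for v in parents}
        for val in [0, 1]:
            if abs(cond_prob(var, val, full) - cond_prob(var, val, sub)) > 1e-9:
                return False
    return True

# Step (2): for each xi, pick a minimal parent set among its predecessors.
for i, var in enumerate(order):
    preds = order[:i]
    chosen = preds  # fall back to all predecessors
    for size in range(len(preds) + 1):
        found = next((list(s) for s in itertools.combinations(preds, size)
                      if is_sufficient(var, list(s), preds)), None)
        if found is not None:
            chosen = found
            break
    print(f"parents({var}) = {chosen}")
# Expected: parents(A) = [], parents(B) = ['A'], parents(C) = ['B']
```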
Note: the resulting structure depends on the initialization order of the variables, and is not necessarily the best Bayesian network; with a poor ordering, the conditional probability tables need more parameters.

Three: Conditional independence determined by the Bayesian network

A Bayesian network determines three basic forms of conditional independence:

(1) Form 1: head-to-head

In the first structural form, a and b are both parents of c: a → c ← b.
The joint then factorizes as P(a,b,c) = P(a) · P(b) · P(c|a,b). Summing both sides over c gives

$$P(a,b) = \sum_c P(a,b,c) = P(a)\,P(b)\,\sum_c P(c \mid a,b) = P(a)\,P(b)$$
That is, when c is unknown, the path between a and b is blocked, and they are independent. This is called head-to-head conditional independence, corresponding to "x1 and x2 are independent" in the figure at the beginning of this section.
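A quick numeric check (CPT values invented for illustration, not from the original post) confirms both directions of this claim for a → c ← b:

```python
import itertools

# Hypothetical CPTs for the head-to-head structure a -> c <- b (all binary).
P_a = [0.3, 0.7]
P_b = [0.6, 0.4]
P_c_given_ab = {(0, 0): [0.9, 0.1], (0, 1): [0.5, 0.5],
                (1, 0): [0.4, 0.6], (1, 1): [0.2, 0.8]}

joint = {(a, b, c): P_a[a] * P_b[b] * P_c_given_ab[(a, b)][c]
         for a, b, c in itertools.product([0, 1], repeat=3)}

# Marginally, a and b are independent: P(a,b) == P(a)P(b).
for a, b in itertools.product([0, 1], repeat=2):
    P_ab = sum(joint[(a, b, c)] for c in [0, 1])
    assert abs(P_ab - P_a[a] * P_b[b]) < 1e-12

# But given c, a and b become dependent: P(a,b|c) != P(a|c)P(b|c) in general.
c = 1
P_c = sum(joint[(a, b, c)] for a, b in itertools.product([0, 1], repeat=2))
P_ab_c = joint[(0, 0, c)] / P_c
P_a_c = sum(joint[(0, b, c)] for b in [0, 1]) / P_c
P_b_c = sum(joint[(a, 0, c)] for a in [0, 1]) / P_c
print(P_ab_c, P_a_c * P_b_c)  # differ -> a, b dependent once c is observed
```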

(2) Form 2: tail-to-tail

In the second structural form, c is a common parent of a and b: a ← c → b.
Consider the two cases: c unknown and c known.

When c is unknown, the joint factorizes as

$$P(a,b,c) = P(c)\,P(a \mid c)\,P(b \mid c)$$

from which P(a,b) = P(a)P(b) cannot be derived; that is, when c is unknown, a and b are not independent.

When c is known, P(a,b|c) = P(a,b,c)/P(c); substituting the factorization above gives

$$P(a,b \mid c) = \frac{P(a,b,c)}{P(c)} = \frac{P(c)\,P(a \mid c)\,P(b \mid c)}{P(c)} = P(a \mid c)\,P(b \mid c)$$

so when c is known, a and b are independent.

Therefore, given c, the path between a and b is blocked, and they are independent. This is called tail-to-tail conditional independence, corresponding to "x6 and x7 are independent given x4" in the first figure of this section.
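The analogous numeric check for a ← c → b (again with invented CPTs):

```python
import itertools

# Hypothetical CPTs for the tail-to-tail structure a <- c -> b (all binary).
P_c = [0.5, 0.5]
P_a_given_c = {0: [0.8, 0.2], 1: [0.3, 0.7]}
P_b_given_c = {0: [0.6, 0.4], 1: [0.1, 0.9]}

joint = {(a, b, c): P_c[c] * P_a_given_c[c][a] * P_b_given_c[c][b]
         for a, b, c in itertools.product([0, 1], repeat=3)}

# Given c, a and b are independent: P(a,b|c) == P(a|c)P(b|c).
for c in [0, 1]:
    for a, b in itertools.product([0, 1], repeat=2):
        P_ab_c = joint[(a, b, c)] / P_c[c]
        assert abs(P_ab_c - P_a_given_c[c][a] * P_b_given_c[c][b]) < 1e-12

# Marginally, a and b are dependent: P(a,b) != P(a)P(b) in general.
P_ab = sum(joint[(0, 0, c)] for c in [0, 1])
P_a0 = sum(joint[(0, b, c)] for b, c in itertools.product([0, 1], repeat=2))
P_b0 = sum(joint[(a, 0, c)] for a, c in itertools.product([0, 1], repeat=2))
print(P_ab, P_a0 * P_b0)  # differ -> marginal dependence
```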


(3) Form 3: head-to-tail

In the third structural form, c sits between a and b: a → c → b.
Again consider the two cases: c unknown and c known.

When c is unknown, the joint factorizes as

$$P(a,b,c) = P(a)\,P(c \mid a)\,P(b \mid c)$$

but P(a,b) = P(a)P(b) cannot be derived; that is, when c is unknown, a and b are not independent.

When c is known, P(a,b|c) = P(a,b,c)/P(c), and since P(a,c) = P(a)P(c|a) = P(c)P(a|c), this reduces to

$$P(a,b \mid c) = \frac{P(a)\,P(c \mid a)\,P(b \mid c)}{P(c)} = \frac{P(a,c)\,P(b \mid c)}{P(c)} = P(a \mid c)\,P(b \mid c)$$
Therefore, given c, the path between a and b is blocked, and they are independent. This is called head-to-tail conditional independence.

In a word: the head-to-tail structure is actually a chain network, x1 → x2 → ... → xn.

Based on the previous explanation of head-to-tail, we already know that, given xi, the distribution of xi+1 is conditionally independent of x1, x2, ..., xi-1. What does this mean? It means the distribution of xi+1 depends only on xi and is conditionally independent of the other variables. In layman's terms, the current state depends only on the previous state, not on anything earlier. A stochastic process that evolves sequentially in this way is called a Markov chain (Markov chain). Accordingly:

$$P(x_{n+1} = x \mid x_1, x_2, \ldots, x_n) = P(x_{n+1} = x \mid x_n)$$
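A quick simulation (transition probabilities invented for illustration) shows the Markov property empirically: the distribution of x3 given x2 does not change with x1.

```python
import random

random.seed(0)

# A minimal two-state Markov chain sketch; numbers are made up.
trans = {0: 0.2, 1: 0.7}  # P(next = 1 | current)
p_init = 0.5              # P(x1 = 1)

def sample_chain(n):
    x = [1 if random.random() < p_init else 0]
    for _ in range(n - 1):
        x.append(1 if random.random() < trans[x[-1]] else 0)
    return x

samples = [sample_chain(3) for _ in range(200_000)]

# Estimate P(x3=1 | x2=1, x1=0) and P(x3=1 | x2=1, x1=1): they should match,
# since x3 is conditionally independent of x1 given x2.
for x1 in [0, 1]:
    hits = [s for s in samples if s[0] == x1 and s[1] == 1]
    print(x1, sum(s[2] for s in hits) / len(hits))  # both approx trans[1] = 0.7
```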

Four: A general method for probabilistic inference in Bayesian networks

Bayesian networks can be used to compute conditional or posterior probabilities of unknown variables, for diagnosis, prediction, or classification. In general, the target probability can be reduced to marginal probabilities. Here we give a general method for computing marginal probabilities with a Bayesian network:

o Convert the Bayesian network into a factor graph (factor graph)

o Compute the probability with the sum-product algorithm, using the idea of message passing on the factor graph

The factor graph and the sum-product algorithm are explained below.

Factor graph:

Wikipedia defines the factor graph as follows: given a factorization of a function

$$g(x_1, \ldots, x_n) = \prod_{j=1}^{m} f_j(S_j), \qquad S_j \subseteq \{x_1, \ldots, x_n\},$$

the corresponding factor graph is a bipartite graph with a variable node for each variable xi, a factor node for each factor fj, and an edge between xi and fj if and only if xi appears in Sj.

In practice, however, we often use a variant of the factor graph called the Forney-style factor graph. To construct one from a Bayesian network: each factor (conditional probability) in the Bayesian network corresponds to a node in the factor graph; each variable corresponds to an edge or half-edge; and node g connects to edge x if and only if variable x appears in factor g.

For example:
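The original example figure is not reproduced here; as a stand-in, here is a minimal sketch (node names are hypothetical, echoing the smoking example earlier) that lists the factors and variable connections of the standard bipartite factor graph for a small network:

```python
# Build a (standard, bipartite) factor graph from a Bayesian network structure.
# The network is hypothetical: S with children B and C, and D with parents B, C.
parents = {
    "S": [],
    "B": ["S"],
    "C": ["S"],
    "D": ["B", "C"],
}

# One factor per node: f_X = P(X | parents(X)); its scope is {X} ∪ parents(X).
factors = {f"f_{x}": [x] + ps for x, ps in parents.items()}

# Bipartite adjacency: factor node g is linked to variable x
# iff x appears in factor g's scope.
edges = [(g, x) for g, scope in factors.items() for x in scope]

for g, scope in factors.items():
    print(f"{g}({', '.join(scope)})")
print("edges:", edges)
```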

Sum-product algorithm:

Once the factor graph is obtained, the marginal probability distribution can be computed using the message-passing idea of the sum-product algorithm (also called belief propagation).

In fact, the marginal probability of a random variable xk can be obtained from the joint probability of x1, x2, ..., xn by summing out all the other variables:

$$P(x_k) = \sum_{x_1} \cdots \sum_{x_{k-1}} \sum_{x_{k+1}} \cdots \sum_{x_n} P(x_1, x_2, \ldots, x_n)$$
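Taken literally, this summation is brute-force enumeration over all joint assignments; a tiny sketch (with a randomly generated joint table, purely for illustration) makes the exponential cost explicit:

```python
import itertools
import random

random.seed(1)

n = 4  # number of binary variables x1..x4
# A made-up joint distribution, stored explicitly as a table over all 2^n states.
weights = {xs: random.random() for xs in itertools.product([0, 1], repeat=n)}
Z = sum(weights.values())
joint = {xs: w / Z for xs, w in weights.items()}

def marginal(k):
    """P(x_k) by summing the joint over every other variable: 2^n terms."""
    out = [0.0, 0.0]
    for xs, p in joint.items():
        out[xs[k]] += p
    return out

print(marginal(2))  # marginal distribution of x3 (0-indexed k = 2)
```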

Usually we can compute this directly from the Bayesian network topology and the conditional probability tables, but that quickly becomes confusing. Is there an easier way? Yes: the idea of message passing, explained through an example below.

For example, suppose we now need to compute the result of the following expression:

The corresponding factor graph is:

Using the distributive law, we can extract the common factors:

Because the marginal probability of a variable equals the product of all the messages passed to it by the factor nodes connected to it, the calculation is:

Looking closely at the calculation above, you can see that it uses a message-passing-like approach, in two steps.

First, for the decomposition of f, messages are passed out of the two regions enclosed by the blue dashed box and the red dashed box:

The calculation is:

Second, messages are passed within the two regions enclosed by the blue dashed box and the red dashed box:

Finally, the general framework of the sum-product algorithm. On a tree-structured factor graph, variable nodes and factor nodes exchange messages according to

$$\mu_{x \to f}(x) = \prod_{h \in n(x) \setminus \{f\}} \mu_{h \to x}(x)$$

$$\mu_{f \to x}(x) = \sum_{\sim \{x\}} \Big( f(x_1, \ldots, x_m) \prod_{y \in n(f) \setminus \{x\}} \mu_{y \to f}(y) \Big)$$

where n(·) denotes the neighbors of a node and the second sum runs over all variables of f except x; the marginal of a variable is then the product of all messages arriving at it.
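As a concrete illustration of these updates (a hand-rolled sketch, not from the original post), consider the three-variable chain x1 -- f12 -- x2 -- f23 -- x3 with made-up factor tables, checked against brute-force marginalization:

```python
import itertools

# Chain factor graph: x1 -- f12 -- x2 -- f23 -- x3 (binary variables).
# Factor tables are invented; f12[a][b] is the potential for (x1=a, x2=b).
f12 = [[0.9, 0.1], [0.4, 0.6]]
f23 = [[0.7, 0.3], [0.2, 0.8]]

# Leaf variable nodes send the all-ones message.
msg_x1_to_f12 = [1.0, 1.0]
msg_x3_to_f23 = [1.0, 1.0]

# Factor-to-variable messages into x2: sum out the far variable.
msg_f12_to_x2 = [sum(f12[a][b] * msg_x1_to_f12[a] for a in [0, 1]) for b in [0, 1]]
msg_f23_to_x2 = [sum(f23[b][c] * msg_x3_to_f23[c] for c in [0, 1]) for b in [0, 1]]

# Marginal of x2 = normalized product of its incoming messages.
unnorm = [msg_f12_to_x2[b] * msg_f23_to_x2[b] for b in [0, 1]]
Z = sum(unnorm)
marg_x2 = [u / Z for u in unnorm]

# Brute-force check: sum the full product over all assignments.
brute = [0.0, 0.0]
for a, b, c in itertools.product([0, 1], repeat=3):
    brute[b] += f12[a][b] * f23[b][c]
Zb = sum(brute)
print(marg_x2, [u / Zb for u in brute])  # should agree
```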

Five: Handling loops in Bayesian networks

Observe that if the Bayesian network contains a "ring" (a loop when edge directions are ignored), the constructed factor graph will contain a cycle. With plain message passing, messages circulate around the loop indefinitely, which defeats the probability calculation.

There are three solutions:

1: Delete some edges of the Bayesian network so that it contains no undirected loop, using the idea of the maximum weight spanning tree algorithm (MSWT). We will not expand on this here; see the references at the end if interested.

2: Reconstruct a loop-free Bayesian network, for example by adjusting the variable ordering during construction to obtain a better network.

3: Use the loopy belief propagation algorithm (roughly, an iterative version of the sum-product algorithm). It picks a message on the loop, assigns it a random initial value, and then applies the sum-product updates iteratively; because of the loop, the updates eventually come back to the initialized message, which gets updated in turn, and iteration continues until no message changes. Its only drawback is that convergence is not guaranteed, although in most cases the algorithm does converge. A sketch is given after this list.
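Here is a minimal loopy-BP sketch on the smallest loopy model: a triangle of binary variables with invented pairwise potentials. For simplicity it initializes all messages uniformly and updates them in parallel ("flooding"), a common variant of the single-message schedule described above; the exact marginals are printed for comparison, since loopy BP is only approximate:

```python
import itertools
import math

# Pairwise potentials on the triangle 0-1-2 (values are made up).
psi = {
    (0, 1): [[1.0, 0.5], [0.5, 2.0]],
    (1, 2): [[2.0, 1.0], [1.0, 0.5]],
    (0, 2): [[1.5, 1.0], [1.0, 1.5]],
}

def pot(i, j, xi, xj):
    """Pairwise potential with symmetric lookup."""
    return psi[(i, j)][xi][xj] if (i, j) in psi else psi[(j, i)][xj][xi]

nodes = [0, 1, 2]
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

# All directed messages start uniform; iterate updates until they stop changing.
msgs = {(i, j): [0.5, 0.5] for i in nodes for j in nbrs[i]}
for _ in range(200):
    new = {}
    for i, j in msgs:
        m = [sum(pot(i, j, xi, xj)
                 * math.prod(msgs[(k, i)][xi] for k in nbrs[i] if k != j)
                 for xi in [0, 1])
             for xj in [0, 1]]
        z = sum(m)
        new[(i, j)] = [v / z for v in m]
    converged = all(max(abs(a - b) for a, b in zip(new[e], msgs[e])) < 1e-10
                    for e in msgs)
    msgs = new
    if converged:
        break

# Approximate marginal of each node: product of its incoming messages.
for i in nodes:
    b = [math.prod(msgs[(k, i)][xi] for k in nbrs[i]) for xi in [0, 1]]
    z = sum(b)
    print("loopy BP", i, [v / z for v in b])

# Exact marginals by enumeration, for comparison.
for i in nodes:
    b = [0.0, 0.0]
    for xs in itertools.product([0, 1], repeat=3):
        w = pot(0, 1, xs[0], xs[1]) * pot(1, 2, xs[1], xs[2]) * pot(0, 2, xs[0], xs[2])
        b[xs[i]] += w
    z = sum(b)
    print("exact   ", i, [v / z for v in b])
```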

References:

1: http://blog.csdn.net/v_july_v/article/details/40984699 (From the Bayesian method to Bayesian networks)
2: http://blog.jobbole.com/86441/?from=Timeline&isappinstalled=0 (Algorithm grocery store: the Bayesian network classification algorithm)
