Bayesian Network and Bayesian Network Model


Bayesian networks, Markov random fields (MRF), and factor graphs all belong to the family of probabilistic graphical models (PGM) in machine learning.

I. Definition

A Bayesian network, also known as a belief network (BN) or a directed acyclic graphical model, consists of a directed acyclic graph (DAG) together with a conditional probability distribution for each node, i.e. P(xi | parents(xi)), where parents(xi) are the nodes with edges pointing directly to xi. It is a model for reasoning under uncertainty that simulates the causal relationships used in human inference, and its network topology is a DAG.

So why do we need to establish a Bayesian network for a given sample set (features plus labels)?

Let's take a simple example. Suppose we are given training samples whose features are Smoking (S), Bronchitis (B), Cancer (C), and X-ray (X), and whose label is Dyspnoea (D), all binary variables. How can we obtain their joint probability distribution?

From the data we could estimate the 2^5 - 1 = 31 probabilities of the full joint distribution (one for each assignment of a 5-bit binary vector, such as 01111, minus one because they must sum to 1). Then, when someone asks for the probability of a particular outcome, say p(S=1, B=1, C=0, X=0, D=1), we could look it up among those 31 values. A Bayesian network, however, does not need that many parameters to compute the same joint probability. For example, suppose the network topology gives D the parents (B, C): then D needs four conditional probability parameters, one per configuration of its parents. Likewise, B needs two, C needs two, X needs four, and S needs one, for a total of 1 + 2 + 2 + 4 + 4 = 13. Once the probability tables for these 13 parameters are estimated from the sample data, the full joint distribution can be computed, so the Bayesian network greatly simplifies the calculation. Given the network structure and the corresponding conditional probability tables, the probability of any query can be obtained.
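To make this concrete, here is a minimal sketch that computes the joint probability from exactly these 13 parameters. The topology (S -> B, S -> C, (B, C) -> D, (S, C) -> X) is an assumption for illustration, since the text only states D's parents explicitly, and every CPT value below is made up.

```python
# Minimal sketch: joint probability from conditional probability tables (CPTs).
# Assumed topology (hypothetical): S -> B, S -> C, (B, C) -> D, (S, C) -> X.
# All variables are binary; all CPT values are invented for illustration.

p_s = 0.3                                                       # P(S=1): 1 parameter
p_b = {0: 0.10, 1: 0.40}                                        # P(B=1 | S): 2 parameters
p_c = {0: 0.02, 1: 0.15}                                        # P(C=1 | S): 2 parameters
p_d = {(0, 0): 0.05, (0, 1): 0.70, (1, 0): 0.60, (1, 1): 0.90}  # P(D=1 | B, C): 4
p_x = {(0, 0): 0.10, (0, 1): 0.85, (1, 0): 0.20, (1, 1): 0.95}  # P(X=1 | S, C): 4

def bern(p_one, value):
    """P(var = value) for a binary variable with P(var = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one

def joint(s, b, c, x, d):
    """P(s, b, c, x, d) = P(s) P(b|s) P(c|s) P(x|s,c) P(d|b,c)."""
    return (bern(p_s, s) * bern(p_b[s], b) * bern(p_c[s], c)
            * bern(p_x[(s, c)], x) * bern(p_d[(b, c)], d))

# e.g. P(S=1, B=1, C=0, X=0, D=1), using only the 13 CPT entries above
print(joint(1, 1, 0, 0, 1))
```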


The joint probability distribution of a general Bayesian network is:

P(x1, x2, ..., xn) = ∏_i P(xi | parents(xi))

The workflow of a Bayesian network model is as follows:

• Establish the topology of the Bayesian network and the conditional probability distribution parameters of each node from the given sample data. This usually requires prior knowledge and maximum likelihood estimation (a sketch of the estimation step follows below).

• Once the node topology and conditional probability distributions are fixed, use the network to compute conditional or posterior probabilities for unknown data, which is what enables diagnosis, prediction, or classification.

Note: the nodes of the network topology include both the features and the label of the training samples. For example, in one such Bayesian network topology (figure omitted), "whether the account is genuine" is the label and the other three nodes are features.
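As a sketch of the estimation step mentioned in the first bullet above: with the structure fixed, each CPT can be estimated by maximum likelihood, i.e. frequency counting. The (b, c, d) samples below are made up for illustration, and the table estimated is the hypothetical P(D | B, C).

```python
from collections import Counter

# Made-up (b, c, d) training triples for illustration
samples = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1),
           (0, 0, 0), (1, 1, 1), (0, 1, 0), (1, 0, 0)]

totals = Counter()     # totals[(b, c)]   : samples with that parent configuration
positives = Counter()  # positives[(b, c)]: those samples that also have d = 1
for b, c, d in samples:
    totals[(b, c)] += 1
    positives[(b, c)] += d

# Maximum likelihood: P(D=1 | B=b, C=c) = count(b, c, d=1) / count(b, c)
cpt_d = {parents: positives[parents] / totals[parents] for parents in totals}
print(cpt_d)   # {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 1.0}
```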


II. Construction of Bayesian Networks

Determine the topological relationships among the random variables to form a DAG. This step usually requires domain experts, and arriving at a good topology usually takes repeated iteration and refinement.

The following is an algorithm used to construct a DAG:

Algorithm process:

(1) Choose a reasonable ordering of the variables: X1, X2, ..., Xn.

(2) For i = 1 to n: add node Xi to the network, and choose parents(Xi) from {X1, X2, ..., Xi-1} such that

P(Xi | parents(Xi)) = P(Xi | X1, X2, ..., Xi-1)

(3) This construction clearly guarantees the global semantics:

P(X1, X2, ..., Xn) = ∏_i P(Xi | parents(Xi))

Note: the final result depends on the initial ordering of the variables, and is not necessarily the most elegant Bayesian network; a poor ordering can require far more parameters in the conditional probability tables. A sketch of the construction appears below.
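A sketch of this construction algorithm in Python. The helper screens_off is hypothetical: in practice it would be backed by conditional independence tests on data; here it only fixes the shape of the loop.

```python
from itertools import combinations

def build_dag(variables, screens_off):
    """Sequential construction sketch. `variables` is the chosen ordering
    X1..Xn; `screens_off(xi, parents, predecessors)` is a caller-supplied
    (hypothetical) test for P(xi | parents) == P(xi | predecessors).
    Returns a mapping from each variable to its chosen parent set."""
    dag = {}
    for i, xi in enumerate(variables):
        predecessors = variables[:i]
        # Try parent sets from smallest to largest; the full predecessor
        # set always passes the test, so the search must terminate.
        for size in range(len(predecessors) + 1):
            chosen = next((set(p) for p in combinations(predecessors, size)
                           if screens_off(xi, set(p), set(predecessors))), None)
            if chosen is not None:
                dag[xi] = chosen
                break
    return dag
```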

III. Conditional Independence Determined by Bayesian Networks

A Bayesian network gives rise to three basic forms of conditional independence:

(1) Form 1: head-to-head

The first structure of a Bayesian network is head-to-head, a -> c <- b (figure omitted).

So P(a, b, c) = P(a) * P(b) * P(c | a, b) holds. Summing both sides over c and using ∑_c P(c | a, b) = 1, we obtain:

P(a, b) = P(a) * P(b)

That is, when c is unknown, a and b are blocked from each other and hence independent: this is head-to-head conditional independence. It corresponds to "x1 and x2 are independent" in the example network at the start of this section (figure omitted).
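A quick numeric check of the head-to-head case with made-up CPTs: marginally, a and b come out independent, but once c is observed they become dependent (the "explaining away" effect).

```python
# Head-to-head a -> c <- b: P(a, b, c) = P(a) P(b) P(c | a, b)
p_a, p_b = 0.4, 0.7
p_c = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.9}  # P(c=1 | a, b)

def p(a, b, c):
    pa = p_a if a else 1 - p_a
    pb = p_b if b else 1 - p_b
    pc = p_c[(a, b)] if c else 1 - p_c[(a, b)]
    return pa * pb * pc

# c unknown: P(a=1, b=1) equals P(a=1) * P(b=1)  -> independent
print(sum(p(1, 1, c) for c in (0, 1)), p_a * p_b)           # 0.28 0.28

# c = 1 observed: P(a=1, b=1 | c=1) differs from P(a=1 | c=1) * P(b=1 | c=1)
pc1 = sum(p(a, b, 1) for a in (0, 1) for b in (0, 1))
p_ab = p(1, 1, 1) / pc1
p_a1 = sum(p(1, b, 1) for b in (0, 1)) / pc1
p_b1 = sum(p(a, 1, 1) for a in (0, 1)) / pc1
print(p_ab, p_a1 * p_b1)                                    # 0.466... vs 0.494...
```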

(2) Form 2: tail-to-tail

The second structure is tail-to-tail, a <- c -> b (figure omitted).

Consider the two cases, c unknown and c known:

  1. When c is unknown: P(a, b, c) = P(c) * P(a | c) * P(b | c), from which P(a, b) = P(a) * P(b) cannot be derived; that is, when c is unknown, a and b are not independent.
  2. When c is known: P(a, b | c) = P(a, b, c) / P(c); substituting P(a, b, c) = P(c) * P(a | c) * P(b | c) gives P(a, b | c) = P(c) * P(a | c) * P(b | c) / P(c) = P(a | c) * P(b | c); that is, when c is known, a and b are independent.

So, given c, a and b are blocked from each other and hence independent: this is tail-to-tail conditional independence. It corresponds to "x6 and x7 are independent given x4" in the example network at the start of this section (figure omitted).
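The analogous check for the tail-to-tail case, again with made-up CPTs: a and b are dependent while c is unknown, and become independent once c is given.

```python
# Tail-to-tail a <- c -> b: P(a, b, c) = P(c) P(a | c) P(b | c)
p_c = 0.5
p_a = {0: 0.2, 1: 0.8}   # P(a=1 | c)
p_b = {0: 0.3, 1: 0.9}   # P(b=1 | c)

def p(a, b, c):
    pc = p_c if c else 1 - p_c
    pa = p_a[c] if a else 1 - p_a[c]
    pb = p_b[c] if b else 1 - p_b[c]
    return pc * pa * pb

# c unknown: P(a=1, b=1) differs from P(a=1) * P(b=1) -> dependent
p_ab = sum(p(1, 1, c) for c in (0, 1))
p_a1 = sum(p(1, b, c) for b in (0, 1) for c in (0, 1))
p_b1 = sum(p(a, 1, c) for a in (0, 1) for c in (0, 1))
print(p_ab, p_a1 * p_b1)                      # 0.39 vs 0.30

# c = 1 given: P(a=1, b=1 | c=1) equals P(a=1 | c=1) * P(b=1 | c=1)
print(p(1, 1, 1) / p_c, p_a[1] * p_b[1])      # 0.72 0.72
```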


(3) Form 3: head-to-tail

The third structure is head-to-tail, a -> c -> b (figure omitted).

Again there are two cases, c unknown and c known:

  1. When c is unknown: P(a, b, c) = P(a) * P(c | a) * P(b | c), but P(a, b) = P(a) * P(b) cannot be derived; that is, when c is unknown, a and b are not independent.
  2. When c is known: P(a, b | c) = P(a, b, c) / P(c); using P(a, c) = P(a) * P(c | a) = P(c) * P(a | c), this simplifies to:

P(a, b | c) = P(a, b, c) / P(c) = P(a) * P(c | a) * P(b | c) / P(c) = P(a | c) * P(b | c)
So, given c, a and b are blocked from each other and hence independent: this is head-to-tail conditional independence.

A quick aside: this head-to-tail structure is really a chain network, x1 -> x2 -> ... -> xn (figure omitted).

From the head-to-tail analysis above, we know that given xi, the distribution of xi+1 is conditionally independent of x1, x2, ..., xi-1. What does that mean? It means that the distribution of xi+1 depends only on xi and is conditionally independent of all the earlier variables. In plain terms, the current state depends only on the previous state, not on anything before that. A random process that evolves sequentially in this way is called a Markov chain, and it satisfies:

P(xn+1 | x1, x2, ..., xn) = P(xn+1 | xn)
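A small check of this Markov property on a three-variable chain x1 -> x2 -> x3, with made-up transition tables: P(x3 | x1, x2) turns out not to depend on x1 at all.

```python
# Chain x1 -> x2 -> x3: P(x1, x2, x3) = P(x1) P(x2 | x1) P(x3 | x2)
p_x1 = 0.6
t12 = {0: 0.3, 1: 0.8}   # P(x2=1 | x1)
t23 = {0: 0.4, 1: 0.7}   # P(x3=1 | x2)

def joint(x1, x2, x3):
    p1 = p_x1 if x1 else 1 - p_x1
    p2 = t12[x1] if x2 else 1 - t12[x1]
    p3 = t23[x2] if x3 else 1 - t23[x2]
    return p1 * p2 * p3

for x1 in (0, 1):
    # P(x3=1 | x1, x2=1) -- identical for both values of x1
    cond = joint(x1, 1, 1) / (joint(x1, 1, 0) + joint(x1, 1, 1))
    print(x1, cond)      # both lines print 0.7 == P(x3=1 | x2=1)
```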

IV. A General Method for Probabilistic Inference in Bayesian Networks

Bayesian networks can be used to compute conditional or posterior probabilities of unknown data for diagnosis, prediction, or classification. In general, the target probability can be reduced to marginal probabilities, so here is a general method for computing marginal probabilities with a Bayesian network:

• Convert the Bayesian network into a factor graph.

• Compute the probability by message passing on the factor graph (the sum-product algorithm).

The factor graph and the sum-product algorithm are described below.

Factor graph:

Wikipedia's definition of a factor graph is, in essence, the following: a factor graph is a bipartite graph representing the factorization of a function, with one class of nodes for the variables and another for the factors, and an edge between a factor node and a variable node whenever the factor depends on that variable.

Here is an example (figure omitted).

In practice, however, we use a variant called the Forney-style factor graph. A Bayesian network is converted into one as follows:

  • Each factor in the Bayesian network's factorization corresponds to a node in the factor graph.
  • Each variable in the Bayesian network corresponds to an edge or a half-edge in the factor graph.
  • Node g is connected to edge x if and only if variable x appears in factor g.

An example (figure omitted):

Sum-product algorithm

Once the factor graph is obtained, the sum-product algorithm (also called belief propagation) can be used to compute marginal probability distributions.

In fact, the marginal probability P(xk) of a random variable xk can be obtained from the joint probability of x1, x2, ..., xn by summing out all the other variables:

P(xk) = ∑_{all variables except xk} P(x1, x2, ..., xn)
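Computed naively, this sum runs over every assignment of the remaining variables, which is exponential in n. A brute-force sketch (reusing the hypothetical five-variable joint from the sketch in Section I) makes the cost visible; the sum-product algorithm exists precisely to avoid it.

```python
from itertools import product

def marginal(joint_fn, n, k, value):
    """Brute force: P(x_k = value) = sum of the joint over all other variables.
    Enumerates all 2**n assignments, so this scales exponentially with n."""
    return sum(joint_fn(*assignment)
               for assignment in product((0, 1), repeat=n)
               if assignment[k] == value)

# e.g. P(D=1) with the earlier joint(s, b, c, x, d) from Section I:
#   marginal(joint, n=5, k=4, value=1)
```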
In principle we can compute it directly from the topology and the conditional probability tables of the Bayesian network, but the number of terms quickly becomes overwhelming. Is there a simpler method? Here an example leads to the idea of message passing.

For example, suppose we need to compute the result of the following expression (formula omitted):

whose factor graph is (figure omitted):

Borrowing the distributive law, we can pull out the common factors:

Since the marginal probability of a variable equals the product of the messages passed to it by all the function nodes connected to it, the calculation proceeds as follows (formula omitted):

Looking carefully at the calculation above, we can see that it uses an idea very much like "message passing", and that there are two steps in total.

Step 1: on the factor graph, pass the messages that originate outside the two regions enclosed by the blue and red dashed boxes (figure omitted):

Computing (formula omitted):

Step 2: pass the messages inside the two regions enclosed by the blue and red dashed boxes (figure omitted):

Finally (formula omitted), the overall framework of the sum-product algorithm follows this same pattern: a variable node sends a neighboring function node the product of the messages arriving from its other neighbors, and a function node sends a neighboring variable node its local factor multiplied by the messages from its other neighbors, summed over all of its other variables.
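To make the two message-passing steps concrete, here is a minimal sum-product sketch on a small chain-shaped factor graph (f1 -- x1 -- f12 -- x2 -- f23 -- x3). The factor tables are made up; on a chain each pass reduces to a matrix product, and the result is checked against brute-force enumeration.

```python
import numpy as np

f1 = np.array([0.6, 0.4])                  # unary factor on x1
f12 = np.array([[0.7, 0.3], [0.2, 0.8]])   # f12[x1, x2]
f23 = np.array([[0.5, 0.5], [0.1, 0.9]])   # f23[x2, x3]

# Forward pass (left to right): each step sums out the earlier variable.
m_f12_x2 = f1 @ f12                        # sum_x1 f1[x1] * f12[x1, x2]
m_f23_x3 = m_f12_x2 @ f23                  # sum_x2 m(x2) * f23[x2, x3]

# Backward pass (right to left): with no evidence, x3 sends an all-ones message.
m_f23_x2 = f23 @ np.ones(2)                # sum_x3 f23[x2, x3] * 1

# Marginal of a variable = product of its incoming messages, normalized.
p_x2 = m_f12_x2 * m_f23_x2
p_x2 = p_x2 / p_x2.sum()
print(p_x2)

# Sanity check against brute-force enumeration of the full joint:
full = f1[:, None, None] * f12[:, :, None] * f23[None, :, :]
print(full.sum(axis=(0, 2)) / full.sum())  # matches p_x2
```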

V. Handling Loops in Bayesian Networks

Notice that if the Bayesian network contains a "ring" (a cycle when edge directions are ignored), the factor graph constructed from it will contain a cycle, and with plain message passing the messages would circulate indefinitely, which defeats the probability calculation.

There are three solutions:

1. Delete some edges of the Bayesian network so that it no longer contains undirected cycles, for example with the maximum weight spanning tree (MWST) algorithm. For details, see the references at the end of this article.

2. Re-construct a cycle-free Bayesian network, e.g. by adjusting the variable ordering used during construction to obtain a better network.

3. Use the loopy belief propagation algorithm (which can be loosely understood as an iterative version of the sum-product algorithm). It picks a message on the cycle, assigns it a random initial value, and then iterates the sum-product updates; because of the cycle, the updates eventually come back around to the initially assigned message, which is then refreshed, and the iteration continues until no message changes any more. Its one drawback is that convergence is not guaranteed, although in most cases the algorithm does converge. A schematic follows below.
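A schematic of that iteration. The update_message callback stands in for the standard sum-product update rules; this is a skeleton under those assumptions, not a full implementation, and, as noted, convergence is not guaranteed.

```python
def loopy_bp(edges, update_message, max_iters=100, tol=1e-6):
    """Skeleton of loopy belief propagation. `edges` lists the directed
    message slots of the factor graph; `update_message(edge, messages)` is a
    caller-supplied (hypothetical) sum-product update returning the new
    message for `edge` as a list of probabilities."""
    messages = {edge: [0.5, 0.5] for edge in edges}   # uniform initialization
    for _ in range(max_iters):
        change = 0.0
        for edge in edges:
            new = update_message(edge, messages)
            change = max(change, max(abs(u - v) for u, v in zip(new, messages[edge])))
            messages[edge] = new
        if change < tol:        # stop when no message moves any more
            break               # (termination, not correctness, is the goal)
    return messages
```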

 

References:

1. bytes
2. Algorithm grocery store: Bayesian networks (classification algorithms series), http://blog.jobbole.com/86441?from=timeline&isappinstalled=0
