Probabilistic Graphical Model (PGM) Learning Notes (IV) - Bayesian Networks - Bernoulli Naive Bayes - Multinomial Naive Bayes


Before going further, let us emphasize an important distinction: the difference between the chain rule of conditional probability and the chain rule of Bayesian networks.

Chain rule of conditional probability:

$$P(d,i,g,s,l) = P(d)\,P(i \mid d)\,P(g \mid d,i)\,P(s \mid d,i,g)\,P(l \mid d,i,g,s)$$

Chain rule of Bayesian networks, shown in Figure 1:


Figure 1
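For reference, assuming Figure 1 is the usual five-node student network (Difficulty, Intelligence, Grade, SAT, Letter) used throughout these notes, its Bayesian network chain rule reads

$$P(d,i,g,s,l) = P(d)\,P(i)\,P(g \mid d,i)\,P(s \mid i)\,P(l \mid g)$$

Each factor is the conditional probability of a node given only its parents in the graph.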



At first glance it is easy to think that the Bayesian network chain rule is just the ordinary chain rule we already know; in fact it is not, as explained in detail below.

The previous post discussed the factorization of probability distributions.


Conditional independence can be read off directly from the expression of the probability distribution.

We use a graphical model G to represent the probabilistic relationships; can independence be read directly from the graph?

Of course. The previous post explained how influence flows through a probabilistic graph.

For example, when G is observed, S and D influence each other.

The following defines a new concept, dependency separation.

Dependency separation (d-separation)

If, when Z is observed, there is no active path between X and Y (no path along which influence can flow), then X and Y are said to be d-separated given Z, written $\mathrm{d\text{-}sep}_G(X; Y \mid Z)$.


Now introduce a theorem, which we can informally call the "blocked in the graph implies independent" theorem (the name is only for ease of understanding).

The theorem says: if the graph satisfies the d-separation condition

$$\mathrm{d\text{-}sep}_G(X; Y \mid Z)$$

then X and Y are conditionally independent given Z:

$$\left( X \perp Y \mid Z \right)$$


To see why, we apply the Bayesian network chain rule; the computation is shown in Figure 2.


Figure 2


The trick is to split the summation. Note that the summation indices are g, i, and l.

Split the expression into three parts: the sums over l and over g each equal 1, but the part involving i does not, because that factor still contains s while the summation index is i, so it cannot be collapsed any further.

Now recall the equivalence condition for independence from the previous post:

$$P(x, y, z) \propto \phi_1(x, z)\,\phi_2(y, z)$$

And we are done: D and S are indeed independent. This illustrates the "blocked in the graph implies independent" theorem.
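For completeness, here is a sketch of the computation Figure 2 presumably shows, written out with the student-network factorization assumed above:

$$P(d,s) = \sum_{g,i,l} P(d)\,P(i)\,P(g \mid d,i)\,P(s \mid i)\,P(l \mid g) = P(d) \sum_i P(i)\,P(s \mid i) \sum_g P(g \mid d,i) \sum_l P(l \mid g) = P(d) \sum_i P(i)\,P(s \mid i)$$

The sums over l and over g each equal 1, so the result is proportional to $\phi_1(d)\,\phi_2(s)$, and by the equivalence condition above $D \perp S$.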

One cannot help asking: under what circumstances is a node blocked from (d-separated from) the others?

A first conclusion: when its parents are observed, a node is d-separated from every node other than its descendants.

We can call this the "non-descendant principle".

This is easiest to see in a picture; look at Figure 3.


Figure 3


Take the Letter node as an example. Its parent is Grade and its descendants are Job and Happy, so given Grade it is d-separated from all the remaining nodes: SAT, Intelligence, Difficulty, and Coherence.

Roughly speaking, the paths going up through Grade are blocked because Grade is observed, and the paths going down are blocked because Job is not observed. The flow rules behind this analysis were described in detail in the previous post.
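Stated formally, this is the local Markov property of a Bayesian network (the notation here is mine, not from the original figure): for every node $X$,

$$X \perp \mathrm{NonDescendants}(X) \mid \mathrm{Pa}(X)$$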

Defining an I-map

The graph encodes independence: if every independence implied by d-separation in a graph G also holds in a probability distribution P, we call G an I-map (independency map) of P.

If a probability distribution P factorizes according to a graph G, then G is an I-map of P.

Conversely, if G is an I-map of a probability distribution P, then P factorizes according to G.

So there are two equivalent viewpoints on a probability graph:

1. The probability graph G is used to represent the probability distribution P.

2. P is used to express the independence relations shown by the probability graph G.

Let us prove that the probability graph and the probability distribution really describe the same thing.
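In symbols (writing $\mathcal{I}(G)$ for the independencies implied by d-separation in $G$ and $\mathcal{I}(P)$ for those holding in $P$; this notation is assumed, not from the original post):

$$G \text{ is an I-map of } P \;\Longleftrightarrow\; \mathcal{I}(G) \subseteq \mathcal{I}(P), \qquad P \text{ factorizes over } G \;\Longleftrightarrow\; G \text{ is an I-map of } P$$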

First, for the network in Figure 1, write P with the ordinary chain rule of conditional probability, as Figure 4 shows; using the structure of G, it can be reduced to the Bayesian network chain rule.


Figure 4


Pay particular attention to why

$$P(l \mid d,i,g,s) = P(l \mid g)$$

The "non-Principle" described earlier is used here. L in the case of known D, G, I, S, his non-descendant nodes (he also does not have a descendant node) is D, I, S, so directly removed.

This shows that the independence relations of the distribution and the probability graph are really one and the same thing.


The naive Bayes model is described below.

Naive Bayes (Naïve Bayes) is also jokingly called Idiot Bayes ...

The naive Bayes model is shown in Figure 5.


Figure 5


All the $x_i$ are conditionally independent given the class C, i.e.

$$\left( x_i \perp x_j \mid C \right), \quad \forall i \neq j$$

By the chain rule of Bayesian networks it follows immediately that

$$P(C, x_1, \ldots, x_n) = P(C) \prod_{i=1}^{n} P(x_i \mid C)$$
Two types of naive Bayes models are commonly used.

Let us use an example to show how the two models work. Suppose we have a document made up of very many words, and two candidate categories, "finance-related" and "pet-related". We must classify the document into one of them.

Type one: Bernoulli naive Bayes (Bernoulli Naive Bayes)

Bernoulli naive Bayes is shown in Figure 6.


Figure 6


This is essentially a "look up the dictionary" approach. It uses words such as cat, dog, and buy as the entries of the dictionary.

It is called Bernoulli because, when analysing the article, we only record whether each dictionary word appears or not, no matter how many times it occurs: each dictionary entry is a 0-1 (Bernoulli) random variable.

The ratio of the probabilities that the document belongs to the two categories takes the same posterior-odds form as in the multinomial case below, with each $x_i$ a 0-1 indicator:

$$\frac{P(C = c^1 \mid x_1, \ldots, x_n)}{P(C = c^2 \mid x_1, \ldots, x_n)} = \frac{P(C = c^1)}{P(C = c^2)} \prod_{i=1}^{n} \frac{P(x_i \mid C = c^1)}{P(x_i \mid C = c^2)}$$

Each small factor in the product means something like "given that this is a finance document, the probability that the word cat appears is 0.001".


Why is this naive? Because it assumes that whether each dictionary word appears in the article is unaffected by the other words.
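As a concrete illustration (not from the original post), here is a minimal Python sketch of Bernoulli naive Bayes scoring; the dictionary, the class priors, and the per-class word probabilities p_word are hypothetical numbers chosen only for the example:

import math

# Hypothetical dictionary and made-up per-class probabilities:
# p_word[c][w] = P(word w appears at least once | class c)
p_word = {
    "finance": {"cat": 0.001, "dog": 0.002, "buy": 0.4, "sell": 0.5},
    "pets":    {"cat": 0.6,   "dog": 0.7,   "buy": 0.05, "sell": 0.02},
}
prior = {"finance": 0.5, "pets": 0.5}  # assumed class priors

def bernoulli_log_score(words_in_doc, cls):
    # log P(class) plus, for every dictionary entry, log P(x_w | class),
    # where x_w is the 0-1 indicator: a present word contributes p, an absent word 1 - p.
    present = set(words_in_doc)
    score = math.log(prior[cls])
    for w, p in p_word[cls].items():
        score += math.log(p) if w in present else math.log(1.0 - p)
    return score

doc = ["cat", "dog", "dog", "buy"]  # repeated words count only once
for c in ("finance", "pets"):
    print(c, bernoulli_log_score(doc, c))

The score only asks "is this dictionary word present or not", which is exactly the 0-1 behaviour described above.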

Type two: multinomial naive Bayes (Multinomial Naïve Bayes)

This approach is fundamentally different from the Bernoulli one; see Figure 7.


Figure 7


The units $w$ are no longer dictionary entries but the actual words, position by position, of the article to be classified.

If the article contains 1991 words, then there are 1991 variables $w$.

The ratio of the probabilities that the document belongs to the two categories is still

$$\frac{P(C = c^1 \mid x_1, \ldots, x_n)}{P(C = c^2 \mid x_1, \ldots, x_n)} = \frac{P(C = c^1)}{P(C = c^2)} \prod_{i=1}^{n} \frac{P(x_i \mid C = c^1)}{P(x_i \mid C = c^2)}$$

Each small factor now means "given that this is a finance document, the probability that the word at a random position of the article is cat is 0.001". The formula looks the same as before, but the meaning is completely different: the probabilities of cat, dog, buy, sell, and so on must now add up to 1 over the whole vocabulary, whereas the Bernoulli model has no such restriction and each word's probability can be anything. This difference is very important.

Why is this model naive? Because it assumes that the probability of cat appearing follows the same distribution at every position of the article, which is clearly not true in practice.

Just as "beloved" must always be the beginning of the day. Who will write in the article half to sentence this ...

In short, naive Bayes really is naive: strictly speaking it should only be used when the random variables are weakly correlated. But in a great many cases the correlations really are weak, so naive Bayes turns out to be surprisingly effective.

Naive Bayes is widely used in many fields and has quite a few advantages; we will not expand on them here.


Discussion is welcome; you can also follow this blog and my Weibo homepage.

For reprints, please respect the author's labour: keep the full text and a link to the original article. Thank you for your support!

