Before we forget, let's emphasize an important distinction: the difference between the chain rule of conditional probability and the chain rule of Bayesian networks.
Chain rule of conditional probability:
P(d,i,g,s,l) = P(d)\,P(i \mid d)\,P(g \mid d,i)\,P(s \mid d,i,g)\,P(l \mid d,i,g,s)
Chain rule of Bayesian networks: see Figure 1.
Figure 1
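For reference, the formula in Figure 1 is presumably the standard Bayesian network chain rule for this student network (my reconstruction, since only the caption survives here): each variable is conditioned only on its parents in G,

P(d,i,g,s,l) = P(d)\,P(i)\,P(g \mid d,i)\,P(s \mid i)\,P(l \mid g)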
At first glance it is easy to think that the Bayesian network chain rule is something other than the chain rule we already know; in fact the opposite is true, as detailed below.
The previous post discussed the factorization of probability distributions. There, conditional independence could be read directly from the algebraic expression of the probability distribution.
Now that we use a graph G to represent the probabilistic relationships in a probabilistic graphical model, can independence be read directly from the graph? Of course. The previous post explained how probabilistic influence flows through such a graph. For example, when G is known, influence can flow between S and D, so they affect each other.
The following defines the concept of d-separation.
D-separation (dependency separation)
If, given Z, there is no active path between X and Y in the graph,
then X and Y are said to be d-separated by Z, written d-sep_G(X; Y | Z).
Now introduce a theorem, which I will call the "blocked graph implies independence" theorem (the name is only for intuition, of course).
The theorem says: if the probabilistic graph satisfies the d-separation d-sep_G(X; Y | Z),
then X and Y are conditionally independent given Z.
To prove it, the tool is again the Bayesian network chain rule; see Figure 2.
Figure 2
The trick is to split the summation. Note that the summation here runs over g, i, and l.
Split the sum into three parts: the sums over l and over g each equal 1, but the part involving i does not, because that factor also contains s while the summation index is i, so it cannot be collapsed any further.
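Written out explicitly (a reconstruction of the derivation Figure 2 presumably contains, assuming the student network of Figure 1):

P(d,s) = \sum_{g,i,l} P(d)\,P(i)\,P(g \mid d,i)\,P(s \mid i)\,P(l \mid g) = P(d)\,\sum_i P(i)\,P(s \mid i)\,\sum_g P(g \mid d,i)\,\sum_l P(l \mid g)

The inner sums \sum_l P(l \mid g) and \sum_g P(g \mid d,i) each equal 1, leaving P(d,s) = P(d)\,\sum_i P(i)\,P(s \mid i) = P(d)\,\phi(s).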
At this point, recall the equivalent condition for independence from the previous post:
P(x, y, z) \propto \phi_1(x, z)\,\phi_2(y, z)
So we are done: d and s are indeed independent. This proves (for this example) the "blocked graph implies independence" theorem.
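As a quick sanity check (not part of the original post), here is a small Python script that builds the joint distribution of the student network by brute force and verifies numerically that D and S come out independent. All CPT numbers below are invented for illustration; the independence holds no matter which numbers are chosen.

    # Brute-force check that D and S are marginally independent in the
    # student network D -> G <- I -> S, G -> L.  CPT values are invented.
    import itertools

    P_D = {0: 0.6, 1: 0.4}                                    # Difficulty
    P_I = {0: 0.7, 1: 0.3}                                    # Intelligence
    P_G = {(0, 0): {0: 0.3, 1: 0.7}, (0, 1): {0: 0.9, 1: 0.1},
           (1, 0): {0: 0.05, 1: 0.95}, (1, 1): {0: 0.5, 1: 0.5}}   # Grade | D, I
    P_S = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.2, 1: 0.8}}        # SAT | I
    P_L = {0: {0: 0.1, 1: 0.9}, 1: {0: 0.6, 1: 0.4}}          # Letter | G

    def joint(d, i, g, s, l):
        # Bayesian network chain rule: P(d) P(i) P(g|d,i) P(s|i) P(l|g)
        return P_D[d] * P_I[i] * P_G[(d, i)][g] * P_S[i][s] * P_L[g][l]

    vals = [0, 1]
    for d, s in itertools.product(vals, vals):
        p_ds = sum(joint(d, i, g, s, l)
                   for i, g, l in itertools.product(vals, repeat=3))
        p_d = sum(joint(d, i, g, s2, l)
                  for i, g, s2, l in itertools.product(vals, repeat=4))
        p_s = sum(joint(d2, i, g, s, l)
                  for d2, i, g, l in itertools.product(vals, repeat=4))
        assert abs(p_ds - p_d * p_s) < 1e-12                  # D and S are independent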
One cannot help asking: in which situations exactly is the graph blocked?
A first conclusion: given its parents, a node is blocked from every node other than its descendants.
Call this the "non-descendant principle".
That's a mouthful in words; it is much clearer from a picture, see Figure 3.
Figure 3
Take the Letter node as an example. Its parent is Grade and its descendants are Job and Happy, so it is blocked from all the remaining nodes: SAT, Intelligence, Difficulty, and Coherence.
Roughly: the paths going around the upper part of the graph are blocked because Grade is known, while the path below is blocked because Job is not known. The detailed rules for this analysis were described in the previous post.
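In formula form this is the standard local Markov property (the notation here is mine, not the original post's):

\left( X \perp \mathrm{NonDescendants}(X) \mid \mathrm{Pa}(X) \right)

For the Letter node of Figure 3 this reads (l \perp c, d, i, s \mid g), which is exactly the statement above.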
Now define an I-map.
Since a blocked graph implies independence: if all the independencies implied by a graph G hold in a probability distribution P, we call G an I-map (independency map) of P.
Theorem: if a probability distribution P factorizes according to a graph G, then G is an I-map of P.
Conversely, if G is an I-map of a probability distribution P, then P factorizes according to G.
So a probabilistic graph can be read in two equivalent ways:
1. The graph G represents the probability distribution P (factorization).
2. P exhibits the independence relations encoded by the graph G.
Let us convince ourselves that the probabilistic graph and the probability distribution really are one and the same thing.
First write out P for the graph in Figure 1 using the chain rule of conditional probability (see Figure 4); using the independencies that can be read off G, it reduces to the Bayesian network chain rule.
Figure 4
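The reduction shown in Figure 4 is presumably the following (my reconstruction, applying the non-descendant principle to each factor of the conditional-probability chain rule):

P(d,i,g,s,l) = P(d)\,P(i \mid d)\,P(g \mid d,i)\,P(s \mid d,i,g)\,P(l \mid d,i,g,s) = P(d)\,P(i)\,P(g \mid d,i)\,P(s \mid i)\,P(l \mid g)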
Pay particular attention to why
P(l \mid d,i,g,s) = P(l \mid g)
The "non-Principle" described earlier is used here. L in the case of known D, G, I, S, his non-descendant nodes (he also does not have a descendant node) is D, I, S, so directly removed.
This shows that the relationship between probability independent relationship and probability graph is actually one thing.
The naive Bayes model is described below.
Naive Bayes (Naïve Bayes) is also jokingly called Idiot Bayes...
The structure of the naive Bayes model is shown in Figure 5.
Figure 5
All the x_i are conditionally independent given the class C, i.e.
(x_i \perp x_j \mid C), \quad \forall\, i \neq j
From the chain rule of Bayesian networks it is then easy to obtain the joint distribution.
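Written out (with class node C and observed nodes x_1, ..., x_n as in Figure 5), the chain rule gives

P(C, x_1, \ldots, x_n) = P(C)\,\prod_{i = 1}^n P(x_i \mid C)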
Two kinds of naive Bayes model are commonly used.
Let's use an example to show how the two models each work. Suppose we have a document consisting of many words, and there are two candidate categories, "finance-related" and "pet-related". We have to classify this document.
First: Bernoulli naive Bayes (Bernoulli Naive Bayes)
Bernoulli naive Bayes is shown in Figure 6.
Figure 6
This is essentially a "dictionary look-up" approach: it takes cat, dog, buy, and so on as the entries of a dictionary.
It is called Bernoulli because, in this approach, all that matters is whether a dictionary word appears in the analyzed article at all, no matter how many times it appears: each dictionary entry is a 0-1 Bernoulli random variable.
The ratio of the probabilities that the document belongs to the two categories is
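Presumably this is the same ratio as in the multinomial case below, with x_i the 0-1 indicator of the i-th dictionary word:

\frac{P(C = c^1 \mid x_1, \ldots, x_n)}{P(C = c^2 \mid x_1, \ldots, x_n)} = \frac{P(C = c^1)}{P(C = c^2)}\,\prod\limits_{i = 1}^n \frac{P(x_i \mid C = c^1)}{P(x_i \mid C = c^2)}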
Each factor in the product means something like: "given that this is a finance document, the probability that the word cat appears is 0.001".
Why is this naive? Because it assumes that whether each dictionary word appears is not affected by any of the other words.
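To make this concrete, here is a minimal Python sketch of the Bernoulli classifier for the finance-vs-pets example; the dictionary, priors, and word probabilities are all invented for illustration.

    # Bernoulli naive Bayes sketch: each dictionary entry is a 0/1 variable
    # ("does this word appear in the document at all?").  Numbers are made up.
    vocabulary = ["cat", "dog", "buy", "sell"]
    prior = {"finance": 0.5, "pets": 0.5}
    p_appear = {  # P(word appears in the document | class)
        "finance": {"cat": 0.001, "dog": 0.002, "buy": 0.8, "sell": 0.7},
        "pets":    {"cat": 0.6,   "dog": 0.7,   "buy": 0.2, "sell": 0.1},
    }

    def bernoulli_score(doc_words, c):
        """Unnormalized P(C=c, x_1..x_n): every dictionary entry contributes,
        whether it is present in the document or absent."""
        present = set(doc_words)
        score = prior[c]
        for w in vocabulary:
            p = p_appear[c][w]
            score *= p if w in present else (1.0 - p)
        return score

    doc = ["my", "cat", "and", "dog", "play", "together"]
    scores = {c: bernoulli_score(doc, c) for c in prior}
    print(max(scores, key=scores.get))   # -> "pets" with these made-up numbers

Note that a word occurring five times contributes exactly the same factor as a word occurring once, which is precisely the Bernoulli assumption described above.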
Second: multinomial naive Bayes (Multinomial Naïve Bayes)
This approach differs fundamentally from the Bernoulli one; see Figure 7.
Figure 7
The nodes W are no longer dictionary entries but the actual words of the article to be classified.
If the article contains 1991 words, then there are 1991 W nodes.
The ratio of the probabilities that the document belongs to the two categories is still
\frac{P(C = c^1 \mid x_1, \ldots, x_n)}{P(C = c^2 \mid x_1, \ldots, x_n)} = \frac{P(C = c^1)}{P(C = c^2)}\,\prod\limits_{i = 1}^n \frac{P(x_i \mid C = c^1)}{P(x_i \mid C = c^2)}
Each factor in the product now means: "given that this is a finance document, the probability that a randomly chosen position in the article holds the word cat is 0.001". At first glance this looks just like the Bernoulli statement, but it is completely different: now the probabilities of cat + dog + buy + sell + ... must add up to 1, whereas Bernoulli has no such restriction and each probability can be anything. This difference is very important.
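A matching Python sketch for the multinomial model (again with invented numbers) makes the difference visible: every word position in the document contributes a factor, and for each class the word probabilities sum to 1 over the whole vocabulary.

    # Multinomial naive Bayes sketch: each word *position* is a draw from a
    # per-class distribution over the vocabulary.  Numbers are made up, and each
    # row sums to 1 (an "other" bucket stands in for all remaining words).
    prior = {"finance": 0.5, "pets": 0.5}
    p_word = {  # P(word at a given position | class)
        "finance": {"cat": 0.001, "dog": 0.001, "buy": 0.30, "sell": 0.30, "other": 0.398},
        "pets":    {"cat": 0.25,  "dog": 0.25,  "buy": 0.05, "sell": 0.01, "other": 0.44},
    }

    def multinomial_score(doc_words, c):
        """Unnormalized P(C=c, w_1..w_n): one factor per word position."""
        score = prior[c]
        for w in doc_words:
            score *= p_word[c].get(w, p_word[c]["other"])
        return score

    doc = ["buy", "cat", "food", "for", "my", "cat"]
    scores = {c: multinomial_score(doc, c) for c in prior}
    print(max(scores, key=scores.get))   # -> "pets" with these made-up numbers

In practice one would work with log-probabilities to avoid underflow on long documents, but the structure is the same.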
Why is this naive? Because it assumes that the word cat is equally likely to appear at every position in the article, which is obviously not true in reality.
Just as "Dear ..." almost always appears at the very beginning; who would drop such a phrase halfway through an article...
In short, naive Bayes really is naive: strictly speaking it should only be used when the random variables are weakly correlated. But in a great many real cases the correlations are indeed weak... so naive Bayes is surprisingly effective.
Naive Bayes is widely used in many fields and has quite a few advantages; we will not expand on that here.