Bayesian Networks

Preface
The weekend after writing the post on naive Bayesian classification I was tied up with classes seven days a week, four of them running until nine at night, so there was little time to study Bayesian networks and the update came slowly. I used the two-day Qingming Festival holiday, about seven or eight hours in total, to write this post. The account example carries over from the earlier naive Bayes article; the rest is drawn from Introduction to Bayesian Networks. I will keep things easy to understand rather than overly formal, and cover most of the main knowledge points of Bayesian networks, so that after reading you will have a general picture of what they are. Deeper topics, such as training a Bayesian network, I will leave out for now, because they involve more advanced mathematics that I myself have not fully digested.
In my spare time these days I watched a TV series, STB Super Teacher. It has no big-name stars, but I personally think it is quite good, fairly nutritious, part comedy and part inspirational. The story is roughly about a child from a poor family who did not finish high school and drifted into the underworld. More than ten years later, the lead and several of his brothers have made a name for themselves there, but he regrets never finishing school and is tired of the daily fighting, so he goes to a high school to be a homeroom teacher. The class he takes over, however, is the worst in the school: it has already gone through sixteen homeroom teachers, and some people were even driven to a mental hospital. Although the students scheme against him at first, his unrelenting persistence eventually wins every one of them over. Because he never received a formal education and has no teaching certificate, some teachers try to push him out of the school, but with his own persistence and the students' help he finally passes the teacher qualification exam.
A few lines from the lead character, shared here as mutual encouragement:

Growing up means stumbling along, getting hurt along the way, and getting stronger along the way.

You were not born to live for exam results; they prove nothing. What really matters is to live well and to find your own real answer.

No matter what kind of person you are, as long as you put your heart into it and are willing to work hard, you can reach your goal.
Introduction

In the earlier post on naive Bayesian classification there was a standing premise: every random variable (feature attribute) is independent of every other one. In real life, however, the situations we run into do not always have mutually independent random variables; the variables are often related and influence one another.
In the previous article we talked about an example of using naive Bayes to detect fake accounts. Whether an account is real or fake was judged from three feature attributes:
F1: number of log entries / number of days since registration
F2: number of friends / number of days since registration
F3: whether a real profile photo is used (real photo = 1, non-real photo = 0)
By computing P(F1|C) P(F2|C) P(F3|C) P(C) for each class C we decide whether an account is real or fake. The premise, however, is that these three features are mutually independent.
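As a quick recap, here is a minimal sketch of that naive Bayes scoring step; the probability values are hypothetical placeholders rather than the numbers from the earlier post:

```python
# Minimal naive Bayes scoring sketch for the fake-account example.
# All probability values below are hypothetical placeholders.

# P(C): prior over account class, C=1 real, C=0 fake
prior = {1: 0.9, 0: 0.1}

# P(Fi | C): conditional probability of the observed (discretized) features given the class
cond = {
    1: {"F1": 0.5, "F2": 0.7, "F3": 0.8},  # features of this account, given a real account
    0: {"F1": 0.1, "F2": 0.2, "F3": 0.3},  # same features, given a fake account
}

def naive_bayes_score(c):
    """Unnormalized P(F1|C) * P(F2|C) * P(F3|C) * P(C)."""
    score = prior[c]
    for f in ("F1", "F2", "F3"):
        score *= cond[c][f]
    return score

scores = {c: naive_bayes_score(c) for c in (0, 1)}
predicted = max(scores, key=scores.get)
print(scores, "->", "real" if predicted == 1 else "fake")
```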
Now suppose we discover the following correlations among the three feature attributes:
1. Real accounts tend to have a higher log density and a higher friend density than fake accounts, and are more likely to use a real profile photo;
2. Given the authenticity of the account, log density is independent of friend density, and log density is independent of whether a real photo is used;
3. Users with a real profile photo tend to have a higher friend density than users without one.
Because the feature attributes now depend on one another, naive Bayesian classification can no longer handle this problem. The solution is shown in the figure below: a directed acyclic graph in which each node represents a random variable and each arc represents a dependency between two random variables, with the parent node influencing the child node.
The figure above, however, only gives a qualitative description of the relationships among the random variables (feature attributes). To make it quantitative we also need some numbers: for each node, the conditional probability of that node given its direct parents, and for nodes with no parents, a prior probability. These numbers can be obtained by training, for example the probability distribution of using a real profile photo given that the account is known to be real or fake.
In these tables the row header is the conditioning variable and the column header is the random variable. The first table gives the probability that an account is real or fake, and the second gives the probability that the profile photo is real or fake given the authenticity of the account; they are the probability tables for "whether the account is real" and "whether the photo is real". With these numbers we can reason not only qualitatively but also quantitatively, using Bayes' theorem. For example, draw an account at random and suppose all we know is that its photo is fake. Writing C for account authenticity and H for photo authenticity, the probability that the account is fake is

P(C=0 | H=0) = P(H=0 | C=0) P(C=0) / P(H=0)

That is, knowing only that the photo is fake, there is about a 35.7% probability that the account is fake as well. The denominator P(H=0) in this calculation is expanded using the total probability formula.
Here is the definition of the total probability formula from Baidu Encyclopedia:
If the events B1, B2, ..., Bn form a complete event group (mutually exclusive and exhaustive) and each has positive probability, then for any event A:

P(A) = P(AB1) + P(AB2) + ... + P(ABn) = P(A|B1) P(B1) + P(A|B2) P(B2) + ... + P(A|Bn) P(Bn)

This is the total probability formula.
In particular, for any two random events A and B the following holds: P(A) = P(A|B) P(B) + P(A|¬B) P(¬B).
If we are given the conditional probability table for every node (these can be obtained by training on samples) together with the relationships between the nodes, we can compute the probability of any node we are interested in. The method used above is exactly how a Bayesian network works.
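As a small sketch of this inference step, the snippet below applies Bayes' theorem with the denominator expanded by the total probability formula; the table values here are hypothetical placeholders, not the numbers from the tables above:

```python
# Posterior P(C=0 | H=0) via Bayes' theorem and the total probability formula.
# C: account authenticity (1 real, 0 fake); H: photo authenticity (1 real, 0 fake).
# All table values below are hypothetical placeholders.

p_c = {1: 0.9, 0: 0.1}                      # P(C)
p_h_given_c = {1: {1: 0.8, 0: 0.2},         # P(H | C=1)
               0: {1: 0.4, 0: 0.6}}         # P(H | C=0)

# Total probability: P(H=0) = sum_c P(H=0 | C=c) P(C=c)
p_h0 = sum(p_h_given_c[c][0] * p_c[c] for c in (0, 1))

# Bayes' theorem: P(C=0 | H=0) = P(H=0 | C=0) P(C=0) / P(H=0)
posterior = p_h_given_c[0][0] * p_c[0] / p_h0
print(round(posterior, 3))
```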
Uncertainty inference and the joint probability distribution

The general recipe for uncertainty inference with probabilistic methods is:
1. Describe the problem with a set of random variables X = {X1, X2, ..., Xn};
2. Express the knowledge about the problem as a joint probability distribution P(X);
3. Carry out the reasoning by calculation according to the rules of probability theory.
Let's look at an example.

The alarm problem: Professor Pearl lives in Los Angeles, where both earthquakes (E) and burglaries (B) occur. His house has an alarm (A); either an earthquake or a burglary can trigger it, and two neighbors, Mary (M) and John (J), may call him when they hear it. One day Professor Pearl receives a call from Mary saying the alarm is ringing, and he wants to know the probability that his house has been burgled.
Follow the steps above:
Describe the problem with a set of random variables X = {X1, X2, ..., Xn}:
The problem involves 5 random variables: burglary (B), earthquake (E), alarm (A), receiving a call from John (J), and receiving a call from Mary (M).
Each variable takes the value Y or N. The relationships between the variables are uncertain: burglaries and earthquakes occur randomly with certain probabilities, and when they occur they do not necessarily trigger the alarm; Mary and John may fail to hear the alarm for some reason (say, listening to rock music or poor hearing), and sometimes they mistake other sounds for the alarm. Suppose Professor Pearl's assessment of the joint probability distribution P(B,E,A,J,M) over these 5 variables is given in Table 2.1. We want to compute, after receiving Mary's call (M=Y), Professor Pearl's degree of belief that his house has been burgled, i.e. P(B=Y | M=Y).
Starting from the joint distribution P(B,E,A,J,M), we first compute the marginal distributions needed, for example P(M=Y) and P(B=Y, M=Y), and then P(B=Y | M=Y) = P(B=Y, M=Y) / P(M=Y).
The difficulty of doing uncertainty inference directly from the joint distribution is obvious: the complexity is too high. With 5 binary random variables, the full joint distribution already contains 1 + 2 + 4 + 8 + 16 = 2^5 − 1 = 31 independent parameters. In general, the joint distribution of n binary variables contains 2^n − 1 independent parameters, so the size of the joint distribution grows exponentially in the number of variables. Once there are many variables, both storage and computation become difficult.
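As a sketch of this brute-force approach, the snippet below enumerates a full (hypothetical) joint table over the 5 binary variables to answer the query; Table 2.1 is not reproduced here, so the entries are arbitrary made-up numbers used only to show the mechanics:

```python
from itertools import product

# Hypothetical joint distribution P(B,E,A,J,M) over 5 binary variables,
# stored as a dict from value tuples to probabilities (a stand-in for Table 2.1).
raw = {vals: 1.0 + sum(vals) for vals in product((0, 1), repeat=5)}
total = sum(raw.values())
joint = {vals: p / total for vals, p in raw.items()}  # 2^5 = 32 entries, 31 free parameters

# Brute-force inference: P(B=1 | M=1) = P(B=1, M=1) / P(M=1)
p_m1 = sum(p for (b, e, a, j, m), p in joint.items() if m == 1)
p_b1_m1 = sum(p for (b, e, a, j, m), p in joint.items() if b == 1 and m == 1)
print(p_b1_m1 / p_m1)
```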
Is there a way around this? There is: reduce the complexity of the joint distribution. Read on.
Conditional independence and the decomposition of the joint distribution

The five random variables in the alarm problem above are all uncertain, which drives up the computational complexity. However, by exploiting conditional independence between the variables, the joint distribution can be decomposed into the product of several lower-complexity probability distributions, which reduces the complexity of the model.
By the chain rule we can always write the joint probability distribution P(B,E,A,J,M) as

P(B,E,A,J,M) = P(B) P(E|B) P(A|B,E) P(J|B,E,A) P(M|B,E,A,J)

Written this way it still has 31 independent parameters, so the computation is just as complex.
But background knowledge tells us more:

Earthquakes presumably have nothing to do with burglaries, so we can assume E and B are independent, i.e. P(E|B) = P(E), and the factor P(E|B) simplifies to P(E).

Whether John or Mary calls depends directly only on whether they hear the alarm. So we can assume that given A, J is conditionally independent of B and E, and M is conditionally independent of B, E and J; that is, P(J|B,E,A) = P(J|A) and P(M|B,E,A,J) = P(M|A).
Substituting these independence assumptions into the joint distribution formula above gives

P(B,E,A,J,M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A)

The new formula is the product of several lower-complexity probability distributions and contains only 1 + 1 + 4 + 2 + 2 = 10 independent parameters.
The relationships among the variables can then be represented by a directed acyclic graph: each dependency is drawn as a directed edge, where B -> A means that A depends on B. The probability distribution of each node (random variable) in the graph is obtained by training on samples, giving:
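Putting the factorized form to work, here is a minimal sketch of the alarm network as code: each factor is stored as a small table, and the query P(B=1 | M=1) is answered by enumerating the factorized joint. All CPT numbers are hypothetical placeholders, not trained values:

```python
from itertools import product

# Bayesian network for the alarm example, factorized as
# P(B,E,A,J,M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A).
# All CPT numbers below are hypothetical placeholders.
p_b = {1: 0.01, 0: 0.99}
p_e = {1: 0.02, 0: 0.98}
p_a = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1 | B, E)
p_j = {1: 0.90, 0: 0.05}   # P(J=1 | A)
p_m = {1: 0.70, 0: 0.01}   # P(M=1 | A)

def joint(b, e, a, j, m):
    """Factorized joint probability of one assignment."""
    pa = p_a[(b, e)] if a == 1 else 1 - p_a[(b, e)]
    pj = p_j[a] if j == 1 else 1 - p_j[a]
    pm = p_m[a] if m == 1 else 1 - p_m[a]
    return p_b[b] * p_e[e] * pa * pj * pm

# P(B=1 | M=1) by enumerating the factorized joint.
num = sum(joint(1, e, a, j, 1) for e, a, j in product((0, 1), repeat=3))
den = sum(joint(b, e, a, j, 1) for b, e, a, j in product((0, 1), repeat=4))
print(num / den)
```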
Constructing a Bayesian network

There are two steps: determining the network topology and determining the network parameters.

Determining the network topology
1. Choose a set of random variables {X1, X2, ..., Xn} that describes the problem;
2. Choose an ordering of the variables, say a = {X1, X2, ..., Xn};
3. Starting from an empty graph, add the variables to the network one by one in the order a;
4. When variable Xi is added, the variables already in the network are X1, X2, ..., X(i-1):
4.1. Using background knowledge of the problem, choose a subset C(Xi) of these variables that is as small as possible, such that it is reasonable to assume Xi is conditionally independent of the other variables in the network given C(Xi);
4.2. Add a directed edge from each node in C(Xi) to Xi.
If that is hard to follow, look at the example below, which again uses the alarm problem from above.
The problem involves five random variables. Suppose we build the Bayesian network structure with the ordering A1 = {B, E, A, M, J}. The process is as follows:
1. First add B to the empty graph, giving figure (a);
2. Then add E: we assume B and E are independent, so C(E) is the empty set and no edge is needed, giving figure (b);
3. Then add A: we assume A depends on both B and E, so C(A) = {B, E}, and we draw an edge from B to A and from E to A, giving figure (c);
4. Then add M: we assume that given A, M is conditionally independent of B and E, so C(M) = {A}, and we draw an edge from A to M, giving figure (d);
5. Finally add J: we assume that given A, J is conditionally independent of B, E and M, so C(J) = {A}, and we draw an edge from A to J, giving figure (e).
Different orderings produce different network structures. In practice we usually determine the variable ordering from causal relationships. For example, here the earthquake (E) and the burglary (B) are direct causes of the alarm (A), so we get the edges E -> A and B -> A; the alarm (A) is the direct cause of Mary and John calling, so we get A -> M and A -> J. The network can therefore be built directly from the causal relationships, as in the sketch below.
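As a tiny sketch, the construction procedure can be written as code that adds the variables in the chosen order and records the parent set C(Xi) picked for each one (the independence judgments themselves come from background knowledge, so here they are simply hard-coded as data):

```python
# Build the alarm-network structure by adding variables in the order B, E, A, M, J.
# For each variable we record the parent set C(Xi) chosen from the nodes already added.
order = ["B", "E", "A", "M", "J"]
chosen_parents = {"B": [], "E": [], "A": ["B", "E"], "M": ["A"], "J": ["A"]}

graph = {}                       # node -> list of parents (directed edges parent -> node)
for x in order:
    parents = chosen_parents[x]
    assert all(p in graph for p in parents), "parents must already be in the network"
    graph[x] = parents           # add node x and an edge from each parent to x

for node, parents in graph.items():
    for p in parents:
        print(f"{p} -> {node}")
```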
Determining the network parameters

The parameters of a Bayesian network are the probability distributions of its variables, which are usually estimated from training samples. Sometimes, however, they can be written down directly from the nature of the problem. Here is an example.
The stud farm: consider the genetic relationship between a stallion, a mare, and their offspring at a stud farm. Suppose a is a recessive disease gene and A is the corresponding dominant gene. With no other information, the genotype of any horse with respect to this disease is one of three: aa (sick), Aa (carrier), AA (normal). By the laws of inheritance we can directly write down the conditional probability P(Gc | Gp, Gm) relating the genotype Gc of any horse to the genotypes Gp and Gm of its father and mother. For example, when the stallion is aa: if the mare is also aa, the offspring can only be aa, so the distribution over (aa, Aa, AA) is (1, 0, 0); if the mare is Aa, the offspring cannot be AA; and if the mare is AA, the offspring must be Aa.
The full conditional distributions are given in the table below.
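As a sketch, this conditional probability table can also be generated programmatically from Mendelian inheritance, since each parent passes on one of its two alleles with equal probability (the genotype encoding below is my own choice):

```python
from itertools import product
from collections import Counter

# Genotypes encoded as pairs of alleles; "a" is the recessive disease allele.
GENOTYPES = {"aa": ("a", "a"), "Aa": ("A", "a"), "AA": ("A", "A")}

def cpt_row(father, mother):
    """P(Gc | Gp=father, Gm=mother) by enumerating which allele each parent passes on."""
    counts = Counter()
    for allele_f, allele_m in product(GENOTYPES[father], GENOTYPES[mother]):
        child = "".join(sorted((allele_f, allele_m)))   # normalize "aA" to "Aa"
        counts[child] += 1
    return {g: counts.get(g, 0) / 4 for g in ("aa", "Aa", "AA")}

for gp, gm in product(GENOTYPES, repeat=2):
    print(f"P(Gc | Gp={gp}, Gm={gm}) =", cpt_row(gp, gm))
```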
Three structural forms in Bayesian networks

Consider the basic situation where two variables A and B are connected indirectly through a third variable C. It can take three forms: serial connection (head-to-tail), diverging connection (tail-to-tail), and converging connection (head-to-head).

Serial connection
For the chain A -> C -> B we have P(A,B,C) = P(A) P(C|A) P(B|C).

Dividing both sides by P(C) and simplifying gives P(A,B|C) = P(A|C) P(B|C).

That is, once C is known, knowledge about A no longer changes our belief about C and therefore no longer changes our belief about B: the information channel between A and B is blocked, and A and B are conditionally independent given C. When C is unknown, however, knowledge about A does change our belief about C, which in turn changes our belief about B.
In the alarm network there is a serial structure from burglary (B) to alarm (A) to Mary (M): B -> A -> M.

When the state of A is unknown, receiving a call from Mary raises our belief that the alarm went off, which in turn raises our belief that a burglary occurred. Conversely, being told that a burglary happened raises our belief in the alarm, which raises our belief that Mary will call.

On the other hand, if we already know the alarm was switched off that morning, then learning of a burglary does not change our belief about the alarm, and so does not raise our belief that Mary will call. Likewise, learning that Mary called does not raise our belief in a burglary, because the alarm was off; we would sooner think that Mary had listened to too much rock music and imagined the sound.

So when A is unknown, B and M are dependent; when A is known, B and M are conditionally independent.
Diverging connection

For the structure A <- C -> B we have P(A,B,C) = P(C) P(A|C) P(B|C). Since P(A,B|C) = P(A,B,C) / P(C), substituting gives P(A,B|C) = P(A|C) P(B|C).

So when C is unknown, information can flow between A and B and they are dependent; when C is known, information can no longer flow between A and B, the path is blocked, and A and B are conditionally independent.
In the alarm network, once the alarm (A) goes off, Mary (M) and John (J) may both call, giving the diverging structure M <- A -> J.

If we receive a call from Mary, our belief that the alarm went off increases, and we then also expect a call from John. So when A is unknown, M and J are dependent. But if we already knew beforehand that the alarm had been switched off, this reasoning disappears; that is, when A is known, M and J are conditionally independent.
Converging connection

For the structure A -> C <- B we have P(A,B,C) = P(A) P(B) P(C|A,B). Summing over C gives P(A,B) = P(A) P(B).

This connection behaves exactly opposite to the previous two cases: when C is known, A and B become dependent; when C is unknown, the path between A and B is blocked and they are independent. (A slightly crude example: your parents and you. Before you were born they were independent of each other; once you were born, they are connected through you and no longer independent.)
In the alarm network, both a burglary (B) and an earthquake (E) can set off the alarm (A), giving the converging structure E -> A <- B.

When A is unknown, B and E are independent of each other: learning that an earthquake occurred does not change our belief that a burglary occurred.

But if we know the alarm went off, then learning that there was an earthquake gives the alarm a plausible explanation, which lowers our belief in a burglary; conversely, if we learn that a burglary occurred, the alarm is already explained, which lowers our belief that an earthquake occurred.

So when A is known, B and E are dependent, and when A is unknown they are independent, the opposite of the previous two cases.
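A small self-contained check of this converging case ("explaining away") with hypothetical numbers: B and E are independent by construction, but once A = 1 is observed, learning E = 1 lowers the belief in B = 1:

```python
from itertools import product

# Converging structure B -> A <- E with hypothetical CPTs.
p_b1, p_e1 = 0.1, 0.2                                          # P(B=1), P(E=1); independent a priori
p_a1 = {(0, 0): 0.01, (0, 1): 0.4, (1, 0): 0.8, (1, 1): 0.95}  # P(A=1 | B, E)

def joint(b, e, a):
    pb = p_b1 if b else 1 - p_b1
    pe = p_e1 if e else 1 - p_e1
    pa = p_a1[(b, e)] if a else 1 - p_a1[(b, e)]
    return pb * pe * pa

# Given A=1: compare P(B=1 | A=1) with P(B=1 | A=1, E=1).
p_a1_total = sum(joint(b, e, 1) for b, e in product((0, 1), repeat=2))
p_b1_given_a1 = sum(joint(1, e, 1) for e in (0, 1)) / p_a1_total
p_b1_given_a1_e1 = joint(1, 1, 1) / sum(joint(b, 1, 1) for b in (0, 1))
print(p_b1_given_a1, p_b1_given_a1_e1)   # with these numbers, learning E=1 lowers belief in B=1
```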
Well, that is all for today; it is already dark outside and I have not eaten yet. The next data mining post will cover decision tree algorithms, including ID3 and C4.5.

Writing this up was not easy; if you repost it, please credit the source: http://blog.csdn.net/tanggao1314/article/details/69055442
References:
Introduction to Bayesian Networks
Theory and Concepts of Data Mining
The Beauty of Mathematics
http://www.cnblogs.com/leoo2sk/archive/2010/09/18/bayes-network.html
http://blog.csdn.net/zdy0_2004/article/details/41096141
Data Mining Algorithms: Bayesian Networks