Over the weekend I gave a talk on Bayesian networks to colleagues. Each time after such a session it feels like a pity that nothing gets recorded, so I have written up the notes, materials, and key points from the talk as this article.
1. Definition of Bayesian networks
A Bayesian network is a directed acyclic graph (DAG) in which each node represents a variable and each edge represents a dependency between variables; each node stores the conditional probability distribution of its variable given the node's parents.
Each node is influenced by its parents; that is, a parent node represents a cause and its child node represents an effect.
Mathematically, the joint probability distribution of the variables in a Bayesian network equals the product, over all nodes, of each node's conditional probability given its parents.
That is:

P(x1, x2, …, xn) = ∏i P(xi | Parents(xi))
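As a concrete sketch of this factorization, here is a tiny hand-built chain network x1 → x2 → x3; the node names and CPT numbers are invented for illustration:

```python
# Minimal sketch: the joint probability is the product of per-node
# conditionals. The network x1 -> x2 -> x3 and all CPT values are invented.

# P(x1)
p_x1 = {0: 0.6, 1: 0.4}
# P(x2 | x1): outer key is the parent's value
p_x2_given_x1 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
# P(x3 | x2)
p_x3_given_x2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

def joint(x1, x2, x3):
    """P(x1, x2, x3) = P(x1) * P(x2 | x1) * P(x3 | x2)."""
    return p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x2[x2][x3]

# Sanity check: the joint distribution sums to 1 over all assignments.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)  # 1.0
```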
2. Inference in Bayesian networks
Inference in a Bayesian network means answering arbitrary probability queries over the network's variables, for example P(x2=0), P(x3=1 | x2=0), or P(x2=0, x3=1, x4=0).
(1) Exact inference
For a relatively small Bayesian network, exact inference can be used. From the network structure we can write down the joint probability distribution, and then, using the law of total probability and Bayes' theorem, derive any probability query over the network.
For example, consider a Bayesian network such as the following:
The derivation of a probability query on it proceeds as follows:
Exact inference can be optimized with dynamic programming during the computation (such as the variable elimination method), or with techniques from graph theory (such as clique-tree based inference).
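Before any such optimization, exact inference can always be done by brute-force enumeration over the joint distribution, applying the total probability and Bayes formulas directly. A sketch on a toy chain network x1 → x2 → x3 (all CPT numbers invented):

```python
from itertools import product

# Toy chain x1 -> x2 -> x3; all CPT values are invented for illustration.
p_x1 = {0: 0.6, 1: 0.4}
p_x2_given_x1 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_x3_given_x2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

def joint(x1, x2, x3):
    return p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x2[x2][x3]

def prob(query, evidence=None):
    """P(query | evidence) by brute-force enumeration over the joint.

    query/evidence are dicts like {'x2': 0}; this only scales to
    tiny networks, which is exactly why elimination methods exist.
    """
    evidence = evidence or {}
    num = den = 0.0
    for x1, x2, x3 in product((0, 1), repeat=3):
        world = {'x1': x1, 'x2': x2, 'x3': x3}
        p = joint(x1, x2, x3)
        if all(world[k] == v for k, v in evidence.items()):
            den += p                      # total probability of the evidence
            if all(world[k] == v for k, v in query.items()):
                num += p                  # joint of query and evidence
    return num / den                      # Bayes' theorem

print(prob({'x2': 0}))                    # P(x2=0)
print(prob({'x3': 1}, {'x2': 0}))         # P(x3=1 | x2=0)
```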
(2) Approximate inference
Sometimes the Bayesian network is too large for exact inference, and approximate inference must be used instead.
There are many approximate inference methods; here we describe inference with Gibbs sampling, an MCMC (Markov chain Monte Carlo) method.
I. Samples
A sample consists of observed data and unobserved data, i.e. x1, x2, ?, x3, ?, …, xn, where "?" marks an unobserved value. The goal of inference is to find the probability distribution of the unknown nodes given the observed values, i.e. P(? | x1, x2, …, xn).
II. Markov blanket
In a Bayesian network, the Markov blanket of a node x consists of x's parents, x's children, and the other parents of x's children (excluding x itself). Below, MB(x) denotes the Markov blanket of node x.
III. Algorithm Flow
Initialization: initialize a conditional probability distribution for each unknown variable, sample from it, and assign the sampled values to the unknown nodes.
(1) Randomly select an unknown node.
(2) Compute the node's conditional distribution given its Markov blanket, P(?) = P(? | MB(?)).
(3) Sample from this distribution and assign the sampled value to the node.
(4) Return to step (1) and iterate until convergence.
3. Training of Bayesian networks
(1) Known structure, complete samples
Use maximum likelihood estimation (which for discrete values reduces to counting frequencies) to obtain the conditional probability distribution of each node.
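A minimal sketch of this counting, assuming a toy structure where node x2 has a single parent x1 (the variable names and data are invented):

```python
from collections import Counter

# Counting-based MLE for one node's CPT. Assumed toy structure: x1 -> x2.
# The complete samples below are invented for illustration.
samples = [
    {'x1': 0, 'x2': 0}, {'x1': 0, 'x2': 0}, {'x1': 0, 'x2': 1},
    {'x1': 1, 'x2': 1}, {'x1': 1, 'x2': 1}, {'x1': 1, 'x2': 0},
    {'x1': 0, 'x2': 0}, {'x1': 1, 'x2': 1},
]

def estimate_cpt(samples, child, parent):
    """P(child | parent) estimated by relative frequencies (the MLE)."""
    pair = Counter((s[parent], s[child]) for s in samples)
    total = Counter(s[parent] for s in samples)
    return {pv: {cv: pair[(pv, cv)] / total[pv] for cv in (0, 1)}
            for pv in (0, 1)}

cpt = estimate_cpt(samples, 'x2', 'x1')
print(cpt[0][0])  # P(x2=0 | x1=0) = 3/4
```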
(2) Known structure, incomplete samples
If some nodes cannot be observed (i.e. the samples are incomplete), the EM algorithm can be used for training. The rough procedure is as follows:
Initialization: randomly initialize the conditional probability distribution of each node.
E-step: using the current conditional probability distributions of the nodes, fill in the missing values of each sample (for a continuous variable, fill in the mean; for a discrete variable, fill in the most probable value).
M-step: using the "completed" observations, re-estimate each node's probability distribution by maximum likelihood (counting), replacing the previous estimate.
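A sketch of this fill-in-then-recount loop (a "hard" EM variant, matching the most-probable-value completion described above) for a single binary node x2 with parent x1, where None marks an unobserved x2; data and starting distribution are invented:

```python
from collections import Counter

# Hard-EM sketch for one binary node 'x2' with parent 'x1'; None marks an
# unobserved x2. The data and the starting CPT are invented.
data = [
    {'x1': 0, 'x2': 0}, {'x1': 0, 'x2': None}, {'x1': 0, 'x2': 0},
    {'x1': 1, 'x2': 1}, {'x1': 1, 'x2': None}, {'x1': 1, 'x2': 1},
]
# Initialization: an arbitrary starting CPT P(x2 | x1).
cpt = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}

for _ in range(5):  # a few EM rounds suffice for this toy example
    # E-step: fill each missing x2 with its most probable value under cpt.
    completed = [dict(s, x2=max((0, 1), key=lambda v: cpt[s['x1']][v]))
                 if s['x2'] is None else s for s in data]
    # M-step: re-estimate the CPT by counting on the completed data.
    pair = Counter((s['x1'], s['x2']) for s in completed)
    total = Counter(s['x1'] for s in completed)
    cpt = {pv: {cv: pair[(pv, cv)] / total[pv] for cv in (0, 1)}
           for pv in (0, 1)}

print(cpt[0][0], cpt[1][1])
```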
(3) Unknown structure
There are roughly three ways to obtain a Bayesian network's structure:
I. Modeling by domain experts.
II. Correlation-based structure learning
The general idea is to compute a correlation measure between each pair of variables (such as mutual information or the chi-square test), add edges between strongly correlated nodes, and then determine the direction of each edge by how well the resulting model fits the samples.
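As a sketch of the first step, here is empirical mutual information between two discrete variables, the kind of pairwise score used to decide which nodes to connect (the sample data is invented):

```python
from collections import Counter
from math import log2

# Empirical mutual information I(X; Y) between two discrete variables,
# estimated from paired samples. The data below is invented.
xs = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [0, 0, 0, 1, 1, 1, 1, 0]

def mutual_information(xs, ys):
    """I(X; Y) = sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) p(y)))."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

print(round(mutual_information(xs, ys), 4))  # correlated -> well above 0
print(mutual_information([0, 1, 0, 1], [0, 0, 1, 1]))  # independent -> 0
```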
III. Score-based structure learning
First, define a scoring function (such as MDL) that measures how good a Bayesian network is; it usually trades off the network's structural complexity (the simpler, the better) against its fit to the samples (the better the fit, the better).
Then use a heuristic search algorithm (such as simulated annealing) to search the space of network structures, returning a local optimum as the result.
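A sketch of such a scoring function on two binary variables, comparing the structure with no edge against x1 → x2. The score used here is an MDL-style trade-off, data log-likelihood minus a (k/2)·log2(N) complexity penalty, where k is the number of free CPT parameters; the data and candidate structures are invented:

```python
from collections import Counter
from math import log2

# MDL-style score: log-likelihood minus (k/2)*log2(N), k = free parameters.
# Strongly correlated invented data over two binary variables (x1, x2).
data = [(0, 0)] * 40 + [(0, 1)] * 10 + [(1, 0)] * 10 + [(1, 1)] * 40

def score_independent(data):
    """Structure with no edge: P(x1) * P(x2); k = 2 free parameters."""
    n = len(data)
    c1 = Counter(a for a, _ in data)
    c2 = Counter(b for _, b in data)
    ll = sum(log2(c1[a] / n) + log2(c2[b] / n) for a, b in data)
    return ll - (2 / 2) * log2(n)

def score_edge(data):
    """Structure x1 -> x2: P(x1) * P(x2 | x1); k = 3 free parameters."""
    n = len(data)
    c1 = Counter(a for a, _ in data)
    c12 = Counter(data)
    ll = sum(log2(c1[a] / n) + log2(c12[(a, b)] / c1[a]) for a, b in data)
    return ll - (3 / 2) * log2(n)

# Correlated data: the edge's extra parameter pays for itself, so the
# edged structure should score higher despite the larger penalty.
print(score_independent(data), score_edge(data))
```

A heuristic search (hill climbing, simulated annealing) would repeatedly propose adding, removing, or reversing an edge and keep changes that improve this score.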