A factor (a function or table) assigns a value to each combination of assignments to its variables (the scope of the factor). In a Bayesian network, every factor is a conditional probability distribution (CPD); but a factor does not always correspond to a probability (in particular, its values need not lie in 0 ~ 1), for example in a Markov random field (MRF). Much like operations on a database table, the basic operations on a factor are factor product, factor marginalization, and factor reduction.
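The three basic factor operations can be sketched in a few lines. This is a minimal illustration, not any particular library's API: a factor is represented as a (scope, table) pair, where the scope is a tuple of variable names and the table maps joint assignments (here binary, for brevity) to values.

```python
from itertools import product

def factor_product(f, g):
    """Multiply two factors over the union of their scopes (binary variables)."""
    (sf, tf), (sg, tg) = f, g
    scope = sf + tuple(v for v in sg if v not in sf)
    table = {}
    for assign in product([0, 1], repeat=len(scope)):
        env = dict(zip(scope, assign))
        table[assign] = tf[tuple(env[v] for v in sf)] * tg[tuple(env[v] for v in sg)]
    return scope, table

def factor_marginalize(f, var):
    """Sum a variable out of a factor's scope."""
    scope, table = f
    i = scope.index(var)
    new_table = {}
    for assign, val in table.items():
        key = assign[:i] + assign[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + val
    return scope[:i] + scope[i + 1:], new_table

def factor_reduce(f, var, value):
    """Restrict a factor to the assignments consistent with var = value."""
    scope, table = f
    i = scope.index(var)
    new_table = {a[:i] + a[i + 1:]: v for a, v in table.items() if a[i] == value}
    return scope[:i] + scope[i + 1:], new_table
```

For example, multiplying a unary factor on A with a pairwise factor on (A, B) and then summing out A yields the unnormalized marginal over B, exactly mirroring the database-table analogy of join followed by aggregation.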
In practice, the most common models are probably those with shared structure and shared parameters. Take named entity recognition, a sequence model in NLP, as an example: the parameters tying an entity type to a latent variable are independent of the position in the sequence (we assume the position is irrelevant to the parameter). The benefits are:
- Reuse of parameters
- Allow us to apply the same model to sequences of varying length
Template models are languages that specify how variables inherit a dependency model from a template, i.e. a representation that allows us to solve multiple problems using the same exact model.
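Parameter sharing in a sequence model can be sketched concretely. In this illustrative (not NER-accurate) example, one emission table and one transition table are reused at every position, so the same parameters score sequences of any length; the tag and word names are made up:

```python
# Shared parameters: one emission table and one transition table,
# reused at every position of the sequence (template-model style).
EMIT = {("PER", "john"): 2.0, ("O", "john"): 0.1,
        ("PER", "runs"): 0.1, ("O", "runs"): 1.5}
TRANS = {("PER", "O"): 1.0, ("O", "PER"): 0.5,
         ("PER", "PER"): 0.8, ("O", "O"): 1.2}

def sequence_score(tags, words):
    """Unnormalized score: product of the shared emission and transition factors."""
    score = 1.0
    for t, w in zip(tags, words):
        score *= EMIT[(t, w)]        # same emission parameters at each position
    for prev, cur in zip(tags, tags[1:]):
        score *= TRANS[(prev, cur)]  # same transition parameters at each step
    return score
```

Because `sequence_score` only loops over positions, the same two tables handle a 2-word or a 200-word sentence, which is exactly the "varying length" benefit listed above.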
================================================================
Introduction to online Bayesian probit regression: the factor graph
A factor graph is a probabilistic graphical model. There are many kinds of probabilistic graphical models; the most common are the Bayesian Network and the Markov Random Field. In a probabilistic graphical model, finding the marginal distribution of a variable is a common problem, and there are many ways to solve it. One of them is to convert the Bayesian network or Markov random field into a factor graph, and then run the sum-product algorithm on it.
On a factor graph, the sum-product algorithm can efficiently compute the marginal distribution of each variable.
The sum-product algorithm, also known as belief propagation, passes two kinds of messages: messages from a variable to a function (that is, from a circle to a square), written μ_{x→f}(x), and messages from a function to a variable (from a square to a circle), written μ_{f→x}(x).
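The two message types can be demonstrated on the smallest useful factor graph, the chain f1 — x1 — f2 — x2 (the factor tables below are illustrative). A variable-to-function message is the product of the messages the variable receives from its other factors; a function-to-variable message sums the factor over its other variables, weighted by the incoming variable messages:

```python
# Sum-product on the chain  f1 -- x1 -- f2 -- x2  (illustrative tables).
f1 = {0: 0.6, 1: 0.4}                    # unary factor on x1
f2 = {(0, 0): 0.9, (0, 1): 0.1,          # pairwise factor on (x1, x2)
      (1, 0): 0.2, (1, 1): 0.8}

# Leaf factor f1 sends its own table to x1:  mu_{f1 -> x1}.
mu_f1_x1 = dict(f1)
# x1 has no other factor neighbors, so it forwards the product of its
# incoming messages (just this one) to f2:  mu_{x1 -> f2}.
mu_x1_f2 = mu_f1_x1
# f2 sums x1 out, weighting by the incoming message:  mu_{f2 -> x2}.
mu_f2_x2 = {x2: sum(f2[(x1, x2)] * mu_x1_f2[x1] for x1 in (0, 1))
            for x2 in (0, 1)}

# The marginal of x2 is the (normalized) product of its incoming messages.
Z = sum(mu_f2_x2.values())
p_x2 = {k: v / Z for k, v in mu_f2_x2.items()}
```

Running this by hand: μ_{f2→x2}(0) = 0.9·0.6 + 0.2·0.4 = 0.62, which matches direct enumeration of Σ_{x1} f1(x1) f2(x1, x2).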
================================================================
About factor graphs
A factor graph represents the factorization of a function. It generally contains two kinds of nodes: variable nodes and function nodes. A global function can often be decomposed into a product of several local functions, and the factor graph makes these local functions and the variables they depend on explicit. For example, suppose we have a global function whose factorization is
which we write in this convenient shorthand.
The corresponding factor graph is as follows:
The graph is equivalent:
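Since the concrete factorization equation is not shown here, consider a hypothetical global function g(x1, x2, x3) = fA(x1, x2) · fB(x2, x3) · fC(x3) as a stand-in (the local functions below are made up). Each local function becomes a square (function) node connected to the circle (variable) nodes in its scope:

```python
# Hypothetical local functions; only their scopes matter for the graph:
# fA touches {x1, x2}, fB touches {x2, x3}, fC touches {x3}.
def fA(x1, x2): return 1.0 if x1 == x2 else 0.5
def fB(x2, x3): return 2.0 * x2 + x3 + 1.0
def fC(x3): return 0.1 + x3

def g(x1, x2, x3):
    """Global function as the product of its local factors."""
    return fA(x1, x2) * fB(x2, x3) * fC(x3)
```

The factor graph for this g has three variable nodes (x1, x2, x3) and three function nodes (fA, fB, fC), with an edge wherever a variable appears in a factor's argument list.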
In a factor graph, every vertex is either a variable node or a function node, and edges express the functional relationships between them. When explaining naive Bayes and Markov models, we used the ψ (psi) function notation: ψ denotes the compatibility function between x and y in our model. The ψ function means different things in different settings, which is why such functions can be tricky to explain. A dynamic model, or any other graphical probability model, can be expressed as a factor graph, with ψ denoting a probability or a conditional probability. Factor graphs and the ψ function notation are common in machine learning papers.
Explanation of the potential function:
Note: at the request of some readers, let's talk about the potential function. Here we can regard potential functions as the factors in a normalized decomposition of the joint probability density; the scope of a potential function is a maximal clique. Anyone who has studied graph theory will have heard of cliques. For a given graph G = (V, E), where V = {1, ..., n} is the vertex set of G and E is the edge set, a clique of G is a set of vertices every pair of which is connected by an edge. If a clique is not contained in any other clique, that is, it is not a proper subset of any other clique, it is called a maximal clique of G. The clique with the most vertices is called the maximum clique of G. For more information, see Wikipedia. In fact, in theory, unless the model is a chain CRF, potential functions can be defined over every clique in the graph, not just the maximal cliques.
More formally, a potential function is a non-negative real-valued function that represents the state of the corresponding clique. For example, for a Markov network, the joint probability distribution can be written as

P(X) = (1/Z) ∏_k ψ_k(x^(k)),

where x^(k) denotes the state of the k-th clique, i.e. the joint state of the variables that appear in that clique, and Z is the normalization constant.
Each clique in the graph has a state, expressed by its potential function. That state is typically a weighted combination of multiple features, because a clique contains multiple nodes, and each node with its corresponding random variable contributes features. As mentioned later in the article, our analysis will use the simplest binary model as the feature for each point.
Yes, this is the factor graph of the hidden Markov model.
================================================================