Chapter 11: Conditional Random Fields

A conditional random field (CRF) is a conditional probability distribution model of a set of output random variables given a set of input random variables, characterized by the assumption that the output random variables form a Markov random field. Conditional random fields can be used for various prediction problems; this chapter mainly discusses the linear-chain conditional random field and its application to labeling problems. The problem then becomes a discriminative model that predicts an output sequence from an input sequence, in the form of a log-linear model, whose learning method is usually maximum likelihood estimation or regularized maximum likelihood estimation.

11.1 Probabilistic Undirected Graphical Models

A probabilistic undirected graphical model, also known as a Markov random field, is a joint probability distribution that can be represented by an undirected graph.
Model definition
A graph is a set of nodes and edges connecting the nodes. Nodes and edges are denoted v and e, and the sets of nodes and edges are denoted V and E, respectively, so the graph is written G = (V, E). An undirected graph is a graph whose edges carry no direction.
A probabilistic graphical model is a probability distribution represented by a graph. Suppose there is a joint probability distribution P(Y), where Y is a set of random variables. The distribution P(Y) is represented by the undirected graph G = (V, E) as follows: each node v in G represents a random variable Y_v, and each edge e represents a probabilistic dependency between random variables.
Given a joint probability distribution P(Y) and the graph G that represents it, we first define the pairwise Markov property, the local Markov property, and the global Markov property that hold among the random variables represented by the undirected graph.
Pairwise Markov property: let u and v be any two nodes of the graph G that are not connected by an edge, with corresponding random variables Y_u and Y_v, and let O be the set of all other nodes, with corresponding random variables Y_O. The pairwise Markov property states that, given the random variables Y_O, the random variables Y_u and Y_v are conditionally independent, i.e.

P(Y_u, Y_v | Y_O) = P(Y_u | Y_O) P(Y_v | Y_O)
Local Markov property: let v be any node of the undirected graph G, let W be the set of all nodes connected to v by an edge, and let O be the set of all nodes other than v and W; Y_v, Y_W and Y_O denote the corresponding random variables. The local Markov property states that, given the random variables Y_W, the random variable Y_v is independent of the random variables Y_O, i.e.

P(Y_v, Y_O | Y_W) = P(Y_v | Y_W) P(Y_O | Y_W)
Global Markov property: let A and B be any node sets of the graph G that are separated by the node set C, as shown in Figure 11.2; Y_A, Y_B and Y_C denote the random variables corresponding to A, B and C. The global Markov property states that, given the random variables Y_C, the random variables Y_A and Y_B are conditionally independent, i.e.

P(Y_A, Y_B | Y_C) = P(Y_A | Y_C) P(Y_B | Y_C)
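The Markov properties above can be checked numerically on a tiny example. The sketch below (a minimal illustration; the binary state space and potential values are invented) builds a 3-node chain Y1 - Y2 - Y3 from edge potentials and verifies the pairwise Markov property for the unconnected pair (Y1, Y3):

```python
import itertools
import numpy as np

psi12 = np.array([[1.0, 2.0], [0.5, 1.5]])  # potential on edge (Y1, Y2), invented
psi23 = np.array([[2.0, 1.0], [1.0, 3.0]])  # potential on edge (Y2, Y3), invented

# Joint distribution from the factorization over the cliques (edges).
joint = np.zeros((2, 2, 2))
for y1, y2, y3 in itertools.product(range(2), repeat=3):
    joint[y1, y2, y3] = psi12[y1, y2] * psi23[y2, y3]
joint /= joint.sum()

# Pairwise Markov: P(Y1, Y3 | Y2) = P(Y1 | Y2) * P(Y3 | Y2) for each y2.
for y2 in range(2):
    cond = joint[:, y2, :] / joint[:, y2, :].sum()       # P(Y1, Y3 | Y2 = y2)
    prod = np.outer(cond.sum(axis=1), cond.sum(axis=0))  # P(Y1|Y2) P(Y3|Y2)
    assert np.allclose(cond, prod)
print("pairwise Markov property holds for (Y1, Y3) given Y2")
```

Because Y1 and Y3 share no edge, the conditional joint factorizes exactly, whatever positive potentials are chosen.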
The above pairwise, local, and global Markov properties are equivalent definitions.
Definition 11.1 (probabilistic undirected graphical model): suppose a joint probability distribution P(Y) is represented by an undirected graph G = (V, E), where the nodes represent random variables and the edges represent dependencies between random variables. If the joint probability distribution P(Y) satisfies the pairwise, local, or global Markov property, it is called a probabilistic undirected graphical model, or Markov random field.
For a given probabilistic undirected graphical model, we want to write the whole joint probability as a product of several sub-joint probabilities, that is, to factorize the joint probability, which facilitates learning and computation with the model. In fact, the most important feature of probabilistic undirected graphical models is that they are easy to factorize.
Factorization of the probabilistic undirected graphical model
Definition 11.2 (clique and maximal clique): a subset of nodes of the undirected graph G in which any two nodes are connected by an edge is called a clique. If C is a clique of G and no node of G can be added to C to make a larger clique, C is called a maximal clique. For example:
Figure 11.3 shows an undirected graph with 4 nodes. The graph has 5 cliques of 2 nodes: {Y1, Y2}, {Y2, Y3}, {Y3, Y4}, {Y4, Y2} and {Y1, Y3}. There are 2 maximal cliques: {Y1, Y2, Y3} and {Y2, Y3, Y4}. {Y1, Y2, Y3, Y4} is not a clique, because there is no edge connecting Y1 and Y4.

Writing the joint probability distribution of a probabilistic undirected graphical model as a product of functions of the random variables on its maximal cliques is called the factorization of the model. Given a probabilistic undirected graphical model with graph G, let C be a maximal clique of G and Y_C the random variables corresponding to C. Then the joint probability distribution P(Y) can be written as a product of functions over all maximal cliques C:

P(Y) = (1/Z) ∏_C Ψ_C(Y_C)

where Z is the normalization factor,

Z = ∑_Y ∏_C Ψ_C(Y_C)

which guarantees that P(Y) is a probability distribution. The function Ψ_C(Y_C) is called a potential function; it is required to be strictly positive and is usually defined as an exponential function:

Ψ_C(Y_C) = exp{−E(Y_C)}
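A minimal numerical sketch of this factorization, using the two maximal cliques of Figure 11.3 with an invented energy function (and hence an invented potential Ψ_C(Y_C) = exp{−E(Y_C)}) and binary variables:

```python
import itertools
import math

def energy(clique_values):
    # invented energy; any real-valued function works here
    return 0.1 * sum(clique_values)

def psi(clique_values):
    # strictly positive potential function: exp{-E(Y_C)}
    return math.exp(-energy(clique_values))

states = list(itertools.product(range(2), repeat=4))  # binary Y1..Y4

def unnormalized(y):
    y1, y2, y3, y4 = y
    # product over the maximal cliques {Y1, Y2, Y3} and {Y2, Y3, Y4}
    return psi((y1, y2, y3)) * psi((y2, y3, y4))

Z = sum(unnormalized(y) for y in states)  # normalization factor
P = {y: unnormalized(y) / Z for y in states}
print(f"Z = {Z:.4f}, total probability = {sum(P.values()):.4f}")
```

Dividing by Z is what turns the product of positive potentials into a proper probability distribution.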
Theorem 11.1 (Hammersley-Clifford theorem): the joint probability distribution P(Y) of a probabilistic undirected graphical model can be expressed in the form

P(Y) = (1/Z) ∏_C Ψ_C(Y_C)

Z = ∑_Y ∏_C Ψ_C(Y_C)

where C is a maximal clique of the undirected graph, Y_C are the random variables corresponding to the nodes of C, and Ψ_C(Y_C) is a strictly positive function defined on C; the product runs over all maximal cliques of the undirected graph.

11.2 Definition and Form of the Conditional Random Field
Definition of the conditional random field
A conditional random field is the Markov random field of the random variable Y given the random variable X. Here we mainly introduce conditional random fields defined on a linear chain, called linear-chain conditional random fields. In the conditional probability model P(Y|X), Y is the output variable, representing the mark (state) sequence, and X is the input variable, representing the observation sequence to be labeled. During learning, the conditional probability model P̂(Y|X) is obtained from the training data set by maximum likelihood estimation or regularized maximum likelihood estimation; during prediction, for a given input sequence x, the output sequence ŷ with the largest conditional probability is computed.
Definition 11.3 (conditional random field): let X and Y be random variables and P(Y|X) the conditional probability distribution of Y given X. Suppose the random variables Y form a Markov random field represented by the undirected graph G = (V, E), i.e.

P(Y_v | X, Y_w, w ≠ v) = P(Y_v | X, Y_w, w ~ v)

holds for every node v; then the conditional probability distribution P(Y|X) is called a conditional random field. In the formula, w ~ v denotes all nodes w connected to node v by an edge in the graph G = (V, E), w ≠ v denotes all nodes other than v, and Y_v, Y_u and Y_w are the random variables corresponding to nodes v, u and w. In practice, it is generally assumed that X and Y have the same graph structure. For the linear-chain conditional random field, the situation is:
In this case, the maximal cliques are the sets of two adjacent nodes, as shown in the figure.
Definition 11.4 (linear-chain conditional random field): let X = (X_1, X_2, …, X_n) and Y = (Y_1, Y_2, …, Y_n) be sequences of random variables represented by linear chains. If, given the random variable sequence X, the conditional probability distribution P(Y|X) of the random variable sequence Y forms a conditional random field, i.e. satisfies the Markov property

P(Y_i | X, Y_1, …, Y_{i−1}, Y_{i+1}, …, Y_n) = P(Y_i | X, Y_{i−1}, Y_{i+1}), i = 1, 2, …, n

(only one side is considered at i = 1 and i = n), then P(Y|X) is called a linear-chain conditional random field.
Parametric form of the conditional random field
i.e. its factorization: each factor is a function defined on two adjacent nodes.
Theorem 11.2 (parametric form of the linear-chain conditional random field): let P(Y|X) be a linear-chain conditional random field. Then, given that the random variable X takes value x, the conditional probability that the random variable Y takes value y has the following form:

P(y|x) = (1/Z(x)) exp( ∑_{i,k} λ_k t_k(y_{i−1}, y_i, x, i) + ∑_{i,l} μ_l s_l(y_i, x, i) )

where t_k and s_l are feature functions, and λ_k and μ_l are the corresponding weights. Z(x) is a normalization factor, and the summation runs over all possible output sequences:

Z(x) = ∑_y exp( ∑_{i,k} λ_k t_k(y_{i−1}, y_i, x, i) + ∑_{i,l} μ_l s_l(y_i, x, i) )

The formula above is the basic form of the linear-chain conditional random field model, which gives the conditional probability of the output sequence y for a given input sequence x. t_k is a feature function defined on the edges, called a transition feature, which depends on the current and previous positions; s_l is a feature function defined on the nodes, called a state feature, which depends on the current position. Both depend on the position and are local feature functions. Typically the feature functions t_k and s_l take values 1 or 0: the value is 1 when the feature condition is met and 0 otherwise. The conditional random field is completely determined by the feature functions and the corresponding weights.
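The parametric form can be made concrete by brute-force enumeration on a toy model. The feature functions t_1 and s_1 and the weights below are invented for illustration, not taken from the book:

```python
import itertools
import math

labels = (0, 1)   # values each y_i can take
n = 3             # sequence length
x = "toy"         # observation sequence (ignored by these toy features)

def t1(y_prev, y_cur, x, i):   # transition feature: fires when the label repeats
    return 1 if y_prev == y_cur else 0

def s1(y_cur, x, i):           # state feature: fires when the label at i is 1
    return 1 if y_cur == 1 else 0

lam, mu = 1.0, 0.5             # weights lambda_1 and mu_1 (invented)

def unnormalized(y):
    s = sum(mu * s1(y[i], x, i) for i in range(n))
    s += sum(lam * t1(y[i - 1], y[i], x, i) for i in range(1, n))
    return math.exp(s)

Z = sum(unnormalized(y) for y in itertools.product(labels, repeat=n))

def P(y):                      # conditional probability P(y | x)
    return unnormalized(y) / Z

total = sum(P(y) for y in itertools.product(labels, repeat=n))
print(f"P((1,1,1) | x) = {P((1, 1, 1)):.4f}, total = {total:.4f}")
```

Enumerating all 2^n sequences is only feasible for tiny n; the later sections replace it with matrix products and dynamic programming.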
Simplified form of the conditional random field
One can sum each feature over all positions, transforming the local feature functions into global feature functions, so that the conditional random field can be written as an inner product of a weight vector and a feature vector; this is the simplified form of the conditional random field. First, the transition features, state features, and their weights are represented by a uniform notation. Suppose there are K_1 transition features and K_2 state features, and let K = K_1 + K_2; write

f_k(y_{i−1}, y_i, x, i) = t_k(y_{i−1}, y_i, x, i), k = 1, 2, …, K_1
f_k(y_{i−1}, y_i, x, i) = s_l(y_i, x, i),          k = K_1 + l; l = 1, 2, …, K_2

Then, summing the transition and state features over all positions i gives

f_k(y, x) = ∑_{i=1}^n f_k(y_{i−1}, y_i, x, i), k = 1, 2, …, K

With the corresponding weights

w_k = λ_k, k = 1, 2, …, K_1;  w_{K_1+l} = μ_l, l = 1, 2, …, K_2

the conditional random field can be written as

P_w(y|x) = (1/Z_w(x)) exp ∑_{k=1}^K w_k f_k(y, x)

Z_w(x) = ∑_y exp ∑_{k=1}^K w_k f_k(y, x)

In vector form, with w = (w_1, w_2, …, w_K)^T and F(y, x) = (f_1(y, x), f_2(y, x), …, f_K(y, x))^T, the conditional random field takes the inner-product form

P_w(y|x) = exp(w · F(y, x)) / Z_w(x)
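A sketch of the simplified form on a toy model: the local features are summed over positions into a global feature vector F(y, x), and the non-normalized score is the inner product w · F(y, x). The two features and the weight vector are invented for illustration:

```python
import itertools
import math
import numpy as np

labels, n = (0, 1), 3

# Two local features (invented): f_1 is a transition feature, f_2 a state feature.
def f1(y_prev, y_cur, i):
    return 1 if y_prev == y_cur else 0

def f2(y_cur, i):
    return 1 if y_cur == 1 else 0

w = np.array([1.0, 0.5])  # weight vector w = (w_1, w_2)^T

def F(y):                 # global feature vector F(y, x): sum over positions
    return np.array([
        sum(f1(y[i - 1], y[i], i) for i in range(1, n)),
        sum(f2(y[i], i) for i in range(n)),
    ])

scores = {y: math.exp(w @ F(y)) for y in itertools.product(labels, repeat=n)}
Zw = sum(scores.values())
Pw = {y: s / Zw for y, s in scores.items()}
print(f"Z_w(x) = {Zw:.4f}")
```

The inner-product form is what makes the model log-linear: the log of the non-normalized probability is linear in the weights w.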
Matrix form of the conditional random field
Introduce special start and stop state marks y_0 = start and y_{n+1} = stop. For each position i = 1, 2, …, n+1 of the observation sequence x, define an m-th order matrix (m is the number of values the mark y_i can take):

M_i(x) = [M_i(y_{i−1}, y_i | x)]
M_i(y_{i−1}, y_i | x) = exp(W_i(y_{i−1}, y_i | x))
W_i(y_{i−1}, y_i | x) = ∑_{k=1}^K w_k f_k(y_{i−1}, y_i, x, i)

The conditional probability is then

P_w(y|x) = (1/Z_w(x)) ∏_{i=1}^{n+1} M_i(y_{i−1}, y_i | x)

Note that y_0 = start and y_{n+1} = stop denote the start state and the stop state, and Z_w(x) is the (start, stop) element of the product of the n+1 matrices:

Z_w(x) = (M_1(x) M_2(x) ⋯ M_{n+1}(x))_{start, stop}

i.e. the sum of the non-normalized probabilities of all paths from start to stop passing through the states y_1, y_2, …, y_n.

11.3 Probability Computation Problems of Conditional Random Fields

The probability computation problem is: given the conditional random field P(Y|X), the input sequence x and the output sequence y, compute the conditional probabilities P(Y_i = y_i | x) and P(Y_{i−1} = y_{i−1}, Y_i = y_i | x) and the corresponding mathematical expectations.
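The matrix form can be verified on a small example: Z_w(x) computed as the (start, stop) entry of the matrix product must equal the brute-force sum over all label sequences. The features and weights are invented toy choices:

```python
import itertools
import math
import numpy as np

lam, mu = 1.0, 0.5   # invented weights for one transition and one state feature
n = 3

# M_1(x): start -> y_1 (only the state feature fires at i = 1).
M1 = np.array([[math.exp(mu * (y == 1)) for y in (0, 1)]])
# M_i(x), i = 2..n: exp(lam * t_1 + mu * s_1) for each (y_{i-1}, y_i).
Mi = np.array([[math.exp(lam * (a == b) + mu * (b == 1))
                for b in (0, 1)] for a in (0, 1)])
# M_{n+1}(x): y_n -> stop contributes no features, so all entries are 1.
Mstop = np.ones((2, 1))

Z_matrix = (M1 @ Mi @ Mi @ Mstop)[0, 0]   # (start, stop) entry of the product

# Brute-force check: sum of non-normalized probabilities over all sequences.
def unnormalized(y):
    s = sum(mu * (y[i] == 1) for i in range(n))
    s += sum(lam * (y[i - 1] == y[i]) for i in range(1, n))
    return math.exp(s)

Z_brute = sum(unnormalized(y) for y in itertools.product((0, 1), repeat=n))
assert abs(Z_matrix - Z_brute) < 1e-10
print(f"Z_w(x) = {Z_matrix:.4f}")
```

The matrix product performs the sum over all 2^n paths in O(n m^2) operations instead of O(2^n).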
Forward-backward algorithm
For each index i = 0, 1, …, n+1, define the forward vector α_i(x):

α_0(y|x) = 1 if y = start, 0 otherwise
α_i^T(x) = α_{i−1}^T(x) M_i(x), i = 1, 2, …, n+1

α_i(y_i|x) denotes the non-normalized probability of the partial mark sequence up to position i with the mark at position i being y_i; since y_i can take m values, α_i(x) is an m-dimensional vector. Similarly, define the backward vector β_i(x):

β_{n+1}(y_{n+1}|x) = 1 if y_{n+1} = stop, 0 otherwise
β_i(x) = M_{i+1}(x) β_{i+1}(x)

β_i(y_i|x) denotes the non-normalized probability of the mark sequence from positions i+1 to n, given that the mark at position i is y_i. From the forward and backward vectors, it is easy to compute the conditional probability that the mark at position i is y_i, and that the marks at positions i−1 and i are y_{i−1} and y_i:

P(Y_i = y_i | x) = α_i^T(y_i|x) β_i(y_i|x) / Z(x)
P(Y_{i−1} = y_{i−1}, Y_i = y_i | x) = α_{i−1}^T(y_{i−1}|x) M_i(y_{i−1}, y_i|x) β_i(y_i|x) / Z(x)

where Z(x) = α_n^T(x) · 1 = 1^T · β_1(x). Using the forward and backward vectors, one can also compute the mathematical expectations of the feature functions with respect to the joint distribution P(X, Y) and the conditional distribution P(Y|X). For a given observation sequence x and mark sequence y, all probabilities and feature expectations can be computed with one forward scan and one backward scan.

11.4 Learning Algorithms for Conditional Random Fields

The conditional random field model is in fact a log-linear model defined on sequential data; its learning methods include maximum likelihood estimation and regularized maximum likelihood estimation.
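The forward-backward recursions can be sketched with the same kind of toy transition matrices (invented weights, two labels); the node marginals obtained from α_i and β_i must sum to 1 at every position:

```python
import math
import numpy as np

lam, mu = 1.0, 0.5   # invented weights for one transition and one state feature
n = 3
Mi = np.array([[math.exp(lam * (a == b) + mu * (b == 1))
                for b in (0, 1)] for a in (0, 1)])        # M_i(x), i = 2..n
M1 = np.array([math.exp(mu * (y == 1)) for y in (0, 1)])  # from start to y_1

# Forward vectors alpha_1..alpha_n: alpha_i^T = alpha_{i-1}^T M_i.
alphas = [M1]
for i in range(1, n):
    alphas.append(alphas[-1] @ Mi)

# Backward vectors beta_1..beta_n: beta_i = M_{i+1} beta_{i+1}, beta_n = 1.
betas = [np.ones(2)]
for i in range(1, n):
    betas.append(Mi @ betas[-1])
betas.reverse()

Z = alphas[-1].sum()                      # Z_w(x) = alpha_n^T . 1
for i in range(n):
    marginal = alphas[i] * betas[i] / Z   # P(Y_i = y_i | x)
    assert abs(marginal.sum() - 1.0) < 1e-10
print(f"Z_w(x) = {Z:.4f}; node marginals sum to 1 at every position")
```

Note that α_i^T β_i = Z(x) for every i, which is why each marginal vector normalizes exactly.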
Improved iterative scaling
The model parameters are obtained by maximizing the log-likelihood function of the training data.
The log-likelihood function of the training data under the conditional random field model P_w is

L(w) = ∑_{x,y} P̃(x, y) ∑_{k=1}^K w_k f_k(y, x) − ∑_x P̃(x) log Z_w(x)

where P̃ denotes the empirical distribution of the training data.
The improved iterative scaling method iteratively optimizes a lower bound of the log-likelihood function, thereby maximizing the log-likelihood. By derivation, the update equation for a transition feature is

∑_{x,y} P̃(x) P(y|x) ∑_{i=1}^{n+1} t_k(y_{i−1}, y_i, x, i) exp(δ_k T(x, y)) = E_P̃[t_k], k = 1, 2, …, K_1

and the update equation for a state feature is

∑_{x,y} P̃(x) P(y|x) ∑_{i=1}^{n} s_l(y_i, x, i) exp(δ_{K_1+l} T(x, y)) = E_P̃[s_l], l = 1, 2, …, K_2

Here T(x, y) is the total number of features appearing in the data (x, y):

T(x, y) = ∑_k f_k(y, x) = ∑_{k=1}^K ∑_{i=1}^{n+1} f_k(y_{i−1}, y_i, x, i)
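As a worked special case (following the general improved-iterative-scaling derivation, not stated explicitly here): when T(x, y) equals the same constant T for every training pair, each update equation can be solved in closed form for the increment δ_k:

```latex
% If T(x, y) = T for every (x, y), the update equation reduces to
% E_{\tilde P}[f_k] = E_P[f_k] \, e^{\delta_k T}, giving
\delta_k = \frac{1}{T}\log\frac{E_{\tilde P}[f_k]}{E_P[f_k]},
\qquad w_k \leftarrow w_k + \delta_k .
```

When T(x, y) varies across the data, δ_k must instead be found numerically, e.g. by Newton's method on the one-dimensional update equation.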
Quasi-Newton method
For the conditional random field model

P_w(y|x) = exp( ∑_{k=1}^K w_k f_k(y, x) ) / Z_w(x)

the optimization objective function of learning is

f(w) = ∑_x P̃(x) log ∑_y exp( ∑_{k=1}^K w_k f_k(y, x) ) − ∑_{x,y} P̃(x, y) ∑_{k=1}^K w_k f_k(y, x)

and its gradient function is

g(w) = ∑_{x,y} P̃(x) P_w(y|x) F(y, x) − E_P̃[F]

11.5 Prediction Algorithm for Conditional Random Fields

The prediction problem of the conditional random field is: given the conditional random field P(Y|X) and an input (observation) sequence x, find the output (mark) sequence y* with the largest conditional probability, i.e. label the observation sequence. Based on the vector form of the conditional random field, the prediction problem becomes the problem of finding the path with the largest non-normalized probability, which is solved by the Viterbi algorithm.
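A minimal sketch of Viterbi decoding over non-normalized scores W_i(y_{i−1}, y_i | x); the score tables are randomly generated toy values, and the result is checked against exhaustive search:

```python
import itertools
import numpy as np

n, m = 4, 2                         # sequence length, number of labels
rng = np.random.default_rng(0)
node = rng.normal(size=(n, m))      # toy state-feature scores per position
edge = rng.normal(size=(m, m))      # toy transition-feature scores

# Viterbi recursion: delta_i(b) = max_a [delta_{i-1}(a) + edge[a,b]] + node[i,b].
delta = node[0].copy()
back = np.zeros((n, m), dtype=int)  # backpointers
for i in range(1, n):
    cand = delta[:, None] + edge + node[i][None, :]
    back[i] = cand.argmax(axis=0)
    delta = cand.max(axis=0)

# Trace back the optimal path y*.
y = [int(delta.argmax())]
for i in range(n - 1, 0, -1):
    y.append(int(back[i][y[-1]]))
y.reverse()

# Exhaustive-search check of the non-normalized path score.
def path_score(p):
    s = sum(node[i][p[i]] for i in range(n))
    s += sum(edge[p[i - 1]][p[i]] for i in range(1, n))
    return s

best = max(itertools.product(range(m), repeat=n), key=path_score)
assert abs(path_score(tuple(y)) - path_score(best)) < 1e-12
print("Viterbi path y* =", y)
```

Because exp is monotone, maximizing the non-normalized score is equivalent to maximizing P_w(y|x), so normalization by Z_w(x) is never needed for prediction.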
Statistical Learning Methods, Li Hang --- Chapter 11: Conditional Random Fields