1.1 Introduction

Relational data has two characteristics: (1) the entities to be modeled are not statistically independent of one another, and (2) each entity often has many features that can help classify it. For example, when classifying a web page, the text on the page provides a lot of information, but the hyperlinks between pages also help with classification [Taskar et al., 2002].

Graphical models are a natural formalism for exploiting the probabilistic dependencies between entities. Typically, a graphical model is used to represent the joint distribution p(y, x), where y is the prediction we want and x represents the observed entities to be modeled, analogous to the inputs x(i) and class labels y(i) in machine learning. However, modeling the joint distribution of non-independent variables is difficult: by the product rule, p(y, x) = p(x) p(y | x), so modeling the joint requires modeling p(x) as well. Because the components of x are not independent, their dependencies must be taken into account; we cannot simply treat x as independent the way naive Bayes does, since doing so would lose a lot of information that would benefit classification.

The solution to these problems is to model the conditional distribution p(y | x) directly, which is sufficient for classification; this is the approach taken by CRFs. A CRF is a conditional distribution p(y | x) together with an associated graphical structure, and because the model is conditional, it avoids modeling x. In the NLP field, for example, useful features include neighboring words, bigrams (e.g., previous word + current word), prefixes, capitalization, domain-specific words, and semantic information from resources such as WordNet. Recently, CRFs have become very popular in text processing, biology, and computer vision.

Article structure:

1. Introduce current methods for training and using CRFs, introduce the important special case of the linear-chain CRF, and relate these models to general graphical models

2. Discuss applications of general CRFs, such as information extraction from unstructured text. Unlike linear-chain CRFs, general CRFs can capture dependencies between very distant entities

3.

1.2 Graphical models

1.2.1 Definition

Consider a set of random variables V = X ∪ Y, where X is a set of observed input variables and Y is a set of output variables we want to predict. An assignment to V gives each random variable v ∈ V a value; the variables may be discrete or continuous, but here we discuss only the discrete case.

Notation:

An assignment to the random variables X is written x

An assignment to a subset A ⊂ X is written xA

1{x = x'} is an indicator function that equals 1 when x = x' holds and 0 otherwise

A graphical model is a family of probability distributions that factorize according to a graph. The main idea is to represent a distribution over many random variables as a product of local functions, each of which depends on only a small subset of the variables. Given a collection of subsets a ⊂ V, an undirected graphical model defines the distributions that can be written as:

p(v) = (1/Z) ∏a Ψa(va)    (1.1)

In Equation 1.1, F = {Ψa}, where each Ψa : Vn → R+ (the set F is also called the set of local functions, or compatibility functions). We use the term "random field" to refer to a particular distribution defined by a graphical model; the term "model" in this article refers to the family of such distributions, and a random field is one member of that family. The constant Z in Equation 1.1 is a normalization factor, computed as follows:

Z = Σv ∏a Ψa(va)

Z is often intractable to compute exactly, but many methods exist for approximating it.
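To make the factorization and the normalization constant Z concrete, here is a minimal sketch that computes Z by brute-force enumeration for a toy factor graph over three binary variables. The factor tables and scopes are made-up numbers for illustration only; real models are far too large for this enumeration, which is why Z is hard to compute in general.

```python
import itertools

# Toy factor graph over three binary variables v1, v2, v3 with two
# local functions. The factor values are arbitrary positive numbers.
factors = [
    # Psi_a(v1, v2): scope is variable indices (0, 1)
    ((0, 1), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}),
    # Psi_b(v2, v3): scope is variable indices (1, 2)
    ((1, 2), {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 1.0, (1, 1): 1.0}),
]

def unnormalized(assignment):
    """Product of all local functions Psi_a(v_a) at one full assignment."""
    p = 1.0
    for scope, table in factors:
        p *= table[tuple(assignment[i] for i in scope)]
    return p

# Z sums the unnormalized product over every joint assignment (Eq. 1.1).
Z = sum(unnormalized(v) for v in itertools.product((0, 1), repeat=3))

def prob(assignment):
    """Normalized probability p(v) = (1/Z) * prod_a Psi_a(v_a)."""
    return unnormalized(assignment) / Z

# After dividing by Z, the values form a proper distribution.
total = sum(prob(v) for v in itertools.product((0, 1), repeat=3))
```

Note that the enumeration loop grows exponentially in the number of variables, which is exactly why approximate methods for Z are needed in practice.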

Equation 1.1 can be expressed in the form of a factor graph. A factor graph is a bipartite graph G = (V, E, F) in which a variable node vs ∈ V is connected to a factor node Ψa ∈ F whenever vs is an argument of the function Ψa. The right side of Figure 1.1 shows a factor graph: the circular nodes are variable nodes and the square nodes are factor nodes.

In this article, we assume that each local function has the following form:

Ψa(va) = exp( Σk θak fak(va) )

Here θa is a vector of real-valued parameters, and {fak} is a set of feature functions. This choice ensures that the family of distributions over V parameterized by θ is an exponential family; most of the models discussed in this article are exponential families.

Turning to directed graphical models: a Bayesian network is based on a directed graph G = (V, E), and represents a distribution that factorizes as:

p(v) = ∏v∈V p(v | π(v))

where π(v) denotes the parents of v in the graph G, as shown in Figure 1.1.
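As a minimal sketch of the directed factorization, consider a chain of three binary variables v1 → v2 → v3, so that p(v1, v2, v3) = p(v1) p(v2 | v1) p(v3 | v2). The conditional probability tables below are made-up numbers for illustration.

```python
# Hypothetical CPTs for a chain Bayesian network v1 -> v2 -> v3.
p_v1 = {0: 0.6, 1: 0.4}
p_v2_given_v1 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_v3_given_v2 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.5, 1: 0.5}}

def joint(v1, v2, v3):
    """Product of p(v | parents(v)) terms, following the graph."""
    return p_v1[v1] * p_v2_given_v1[v1][v2] * p_v3_given_v2[v2][v3]

# Unlike Eq. 1.1, a directed model is already normalized: summing the
# joint over all assignments gives 1 with no separate Z.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
```

The absence of a partition function is a key practical difference between the directed factorization here and the undirected factorization of Equation 1.1.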

The term "generative model" describes a directed graphical model in which the output variables topologically precede the input variables; that is, no x ∈ X can be a parent of an output y ∈ Y. In essence, a generative model describes exactly how the outputs probabilistically generate the inputs.

1.2.2 Applications of graphical models

This section elaborates on some applications in the NLP field, including HMMs, because they are closely related to linear-chain CRFs.

1.2.2.1 Classification

First, consider the classification problem: given an input x = (x1, x2, ..., xK), predict a class label y. A simple solution is to assume that, once the class label is known, the features in x are independent of each other, as the naive Bayes classifier does:

p(y, x) = p(y) ∏k p(xk | y)

This model can be represented in the form shown on the left of Figure 1.1.
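The naive Bayes factorization above can be sketched in a few lines. The class prior, word probabilities, and vocabulary below are all made-up illustrations; the point is that classification only needs argmax over y of p(y, x), since p(x) is constant in y.

```python
# Hypothetical naive Bayes parameters: p(y) and p(x_k | y).
p_y = {"spam": 0.3, "ham": 0.7}
p_xk_given_y = {
    "spam": {"offer": 0.8, "meeting": 0.1},
    "ham":  {"offer": 0.2, "meeting": 0.6},
}

def joint(y, words):
    """p(y, x) = p(y) * prod_k p(x_k | y), features independent given y."""
    p = p_y[y]
    for w in words:
        p *= p_xk_given_y[y][w]
    return p

def classify(words):
    # argmax_y p(y | x) = argmax_y p(y, x), because p(x) does not
    # depend on y.
    return max(p_y, key=lambda y: joint(y, words))
```

This independence assumption is exactly what the joint-modeling discussion in Section 1.1 warns about: it makes p(x | y) tractable, but it discards any dependencies among the features.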

Getting Started with Conditional Random Fields