July Algorithm - December Machine Learning Online Class - Lesson 18 Notes - Conditional Random Fields (CRF)
July Algorithm (julyedu.com), December Machine Learning Online Class study notes, http://www.julyedu.com
1. Log-linear model
The odds of an event is the ratio of the probability that the event occurs to the probability that it does not occur, i.e. odds = p / (1 - p).
1.1 General form of the log-linear model
Let x be a sample and y a possible label of x. The general form of the log-linear model is P(y|x; w) = exp(Σ_j w_j F_j(x, y)) / Z(x, w), where the F_j are feature functions, the w_j are their weights, and Z(x, w) is the normalizing factor. Logistic/softmax regression is the special case obtained with a particular choice of feature functions.
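A minimal sketch of this general form (the function and argument names below are illustrative, not from the original notes; features is a list of feature functions F_j and labels is the set of candidate labels):

import math

def log_linear_prob(x, y, labels, features, w):
    # P(y|x; w) = exp(sum_j w_j * F_j(x, y)) / Z(x, w)
    def score(y_):
        return sum(w_j * F_j(x, y_) for w_j, F_j in zip(w, features))
    Z = sum(math.exp(score(y_)) for y_ in labels)   # normalizer over all candidate labels
    return math.exp(score(y)) / Z

Softmax regression is recovered by choosing features of the form F_j(x, y) = x_d * [y == c], one per input dimension d and class c.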
Selection of feature functions, e.g. in natural language processing (a concrete sketch follows this list):
1. The feature functions can be chosen almost arbitrarily; features may even overlap;
2. Each feature depends on the part of speech of the current word and at most on the parts of speech of the adjacent words;
3. A feature may, however, depend on all of the words (the model stays a chain because only adjacent tags are coupled).
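As an illustration of how freely feature functions can be chosen, here is a hypothetical local feature for POS tagging (the tag names and the rule itself are made up for illustration):

def f_example(prev_tag, tag, words, i):
    # Fires when a determiner is followed by a noun whose surface form ends in 's'.
    # It looks only at the two adjacent tags but may read the whole word sequence.
    return 1.0 if prev_tag == "DT" and tag == "NN" and words[i].endswith("s") else 0.0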
Part-of-speech (POS) tagging
1. This is structured prediction.
2. The tags of adjacent words influence each other; they are not independent.
2. Linear-chain conditional random field
2.1 A linear-chain CRF can be expressed as a log-linear model
Given the parameters w, how do we estimate the probability P(y|x, w)?
Let x = (x_1, ..., x_n) be a sequence of n words and y = (y_1, ..., y_n) the corresponding part-of-speech tags.
Each feature F_j(x, y) is made up of a number of local sub-features along the chain: F_j(x, y) = Σ_i f_j(y_{i-1}, y_i, x, i).
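A sketch of this decomposition and of the resulting sequence score w·F(x, y), reusing local features like f_example above (the helper names and the <START> tag are illustrative assumptions):

def global_feature(f_j, x, y):
    # F_j(x, y) = sum_i f_j(y_{i-1}, y_i, x, i); a dummy <START> tag handles the first position
    tags = ["<START>"] + list(y)
    return sum(f_j(tags[i - 1], tags[i], x, i - 1) for i in range(1, len(tags)))

def sequence_score(x, y, features, w):
    # w . F(x, y): the unnormalized log-score of tag sequence y for sentence x
    return sum(w_j * global_feature(f_j, x, y) for w_j, f_j in zip(w, features))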
2.2 Parameter Training
Two difficulties in inference:
1. Given x and w, how do we find the most likely tag sequence y?
2. Given x and w, how is P(y|x, w) itself computed?
2.3 State transition matrix
The global feature F_j(x, y) can be replaced by the sum of its local features over positions, so the total score becomes a sum of per-position scores g_k(u, v) = Σ_j w_j f_j(u, v, x, k), where u and v are the tags of words k-1 and k.
2.3.1 Using forward scores to select the maximum-scoring tag sequence
Let U(k, v) be the forward score: the maximum score over all taggings of the first k words that tag word k as v (when the score is normalized it becomes a probability). It satisfies the recursion U(k, v) = max_u [U(k-1, u) + g_k(u, v)].
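A minimal sketch of this recursion (Viterbi-style decoding), assuming a local score function g(k, u, v) = Σ_j w_j f_j(u, v, x, k) as defined above; the function and variable names are illustrative:

def viterbi(x, tag_set, g):
    # Returns the highest-scoring tag sequence for sentence x.
    # g(k, u, v): local score of tagging word k as v when word k-1 is tagged u.
    n = len(x)
    U = {v: g(0, "<START>", v) for v in tag_set}        # U(0, v)
    back = []
    for k in range(1, n):
        prev, U, ptr = U, {}, {}
        for v in tag_set:
            u_best = max(prev, key=lambda u: prev[u] + g(k, u, v))
            U[v] = prev[u_best] + g(k, u_best, v)       # U(k, v) = max_u U(k-1, u) + g_k(u, v)
            ptr[v] = u_best
        back.append(ptr)
    v = max(U, key=U.get)                               # best final tag
    path = [v]
    for ptr in reversed(back):                          # trace the argmax pointers backwards
        v = ptr[v]
        path.append(v)
    return list(reversed(path))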
2.3.2 Derivation of the state transition matrix form
Time complexity: O(n), i.e. linear in the sentence length n.
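The same forward idea with max replaced by sum computes the normalizer Z(x, w) by n matrix products, which is what makes the cost linear in n (a sketch under the same assumed local score g as above; numpy is used only for the matrix form):

import numpy as np

def partition_function(n, tags, g):
    # Z = sum over all tag sequences of exp(total score), via forward matrix products.
    # M_k[u, v] = exp(g(k, u, v)) is the state transition matrix at position k.
    tags = list(tags)
    alpha = np.array([np.exp(g(0, "<START>", v)) for v in tags])    # forward vector at k = 0
    for k in range(1, n):
        M = np.array([[np.exp(g(k, u, v)) for v in tags] for u in tags])
        alpha = alpha @ M              # alpha_k[v] = sum_u alpha_{k-1}[u] * M_k[u, v]
    return float(alpha.sum())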
3. Parameter training
Given a set of training samples (x, y), find the weight vector w that maximizes the conditional log-likelihood, i.e. w* = argmax_w Σ_(x,y) log P(y|x, w).
Method: find the stationary point of the logarithmic objective function.
Objective function (per sample): L(w) = Σ_j w_j F_j(x, y) - log Z(x, w), whose partial derivative is ∂L/∂w_j = F_j(x, y) - Σ_{y'} P(y'|x, w) F_j(x, y').
Here the prime in y' does not denote a derivative, it is only a tick: y and y' represent two different tag sequences.
Finally, use gradient ascent to learn the parameters.
The y_i are not independent of each other; they are connected along the chain.
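A brute-force sketch of one gradient-ascent step on a single training pair, reusing the global_feature and sequence_score helpers sketched in 2.1 (enumerating all y' is only feasible for tiny examples; real CRF training computes the expectation with the forward-backward algorithm):

import itertools, math

def gradient_step(x, y, tag_set, features, w, lr=0.1):
    # One step of gradient ascent on log P(y|x; w) for one training pair (x, y).
    n = len(x)
    candidates = list(itertools.product(tag_set, repeat=n))          # every possible y'
    scores = [sequence_score(x, s, features, w) for s in candidates]
    m = max(scores)
    probs = [math.exp(s - m) for s in scores]
    Z = sum(probs)
    probs = [p / Z for p in probs]                                    # P(y'|x; w)
    new_w = []
    for j, f_j in enumerate(features):
        observed = global_feature(f_j, x, y)                          # F_j(x, y)
        expected = sum(p * global_feature(f_j, x, s) for p, s in zip(probs, candidates))
        new_w.append(w[j] + lr * (observed - expected))               # ascend the gradient
    return new_w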
4. Undirected graphical model (UGM): Markov random field / Markov network
A directed graphical model is also known as a Bayesian network (Directed Graphical Model, DGM, Bayesian Network).
Probabilistic graphical models thus split into probabilistic directed graphical models and probabilistic undirected graphical models.
4.1 From Bayesian network to Markov random field
To convert a Bayesian network into a Markov random field (moralization):
Directly connect the common parents of each child node, then remove all arrow directions.
This moral-graph construction does not preserve the information completely: some conditional independence relations are destroyed.
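A sketch of the moralization step, assuming the Bayesian network is given as a mapping from each node to the set of its parents (the representation is an illustrative choice):

from itertools import combinations

def moralize(parents):
    # Build the moral (undirected) graph of a Bayesian network.
    # parents: dict mapping each node to the set of its parent nodes.
    edges = set()
    for child, ps in parents.items():
        for p in ps:
            edges.add(frozenset((p, child)))      # drop the arrow directions
        for a, b in combinations(ps, 2):
            edges.add(frozenset((a, b)))          # "marry" the common parents
    return edges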
4.2 Properties of MRF
1. Pairwise Markov property
2. Local Markov property
3. Global Markov property
The above three properties are equivalent (for strictly positive distributions).
4.3 Cliques and maximal cliques
Definition: In an undirected graph G, a subgraph S in which every two nodes are joined by an edge is called a clique of G.
Maximal clique: if C is a clique of G and no further node of G can be added to C while keeping it a clique, then C is called a maximal clique of G.
In the figure the maximal cliques are {1,2,3}, {2,3,4}, {3,5}; being maximal has nothing to do with the number of nodes,
only with whether it is still possible to add another node of G while keeping the subset a clique.
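These maximal cliques can be recovered programmatically. The edge list below is a reconstruction consistent with the cliques quoted above (the original figure is not reproduced here), and networkx.find_cliques enumerates the maximal cliques:

import networkx as nx

# A graph whose maximal cliques are {1,2,3}, {2,3,4}, {3,5}
G = nx.Graph([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (3, 5)])
print(sorted(sorted(c) for c in nx.find_cliques(G)))   # [[1, 2, 3], [2, 3, 4], [3, 5]]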
4.4 Hammersley-Clifford theorem
The joint distribution of a UGM can be written as a product of potential functions, one over the random variables of each maximal clique: P(X) = (1/Z) Π_C ψ_C(X_C), where Z is the normalizing constant;
this operation is called the factorization of the UGM.
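A toy sketch of this factorization for binary variables (the data structures and the choice of potentials are illustrative placeholders, only to show the product-over-maximal-cliques structure):

from itertools import product

def joint_from_potentials(variables, cliques, psi):
    # P(assignment) = (1/Z) * prod over maximal cliques C of psi[C](x_C)
    def unnorm(assign):
        p = 1.0
        for c in cliques:
            p *= psi[c](tuple(assign[v] for v in c))
        return p
    states = [dict(zip(variables, s)) for s in product([0, 1], repeat=len(variables))]
    Z = sum(unnorm(s) for s in states)                 # normalizing constant
    return lambda assign: unnorm(assign) / Z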
A linear-chain conditional random field can be used for sequence labeling and similar problems.
Summary
A conditional random field can be expressed as a log-linear model.
Loosely speaking, the linear-chain conditional random field can be regarded as a generalization of the hidden Markov model, and the hidden Markov model can be regarded as a special case of the linear-chain conditional random field.
Disadvantage: the parameters are learned by supervised learning, and parameter learning is slow.