Last year I built an HMM-based word segmenter, and at @jnduan's request I have now written it up.
This article uses more formal notation than usual, because I want to approach word segmentation from the model's perspective. It assumes the reader already has some background in the underlying mathematics. (I recommend the word-segmentation chapter of Dr. Wu Jun's *The Beauty of Mathematics* as a prerequisite for this article.)
1. Markov Process: a sequence of random variables that are not mutually independent, each depending on the previous one
Given a finite set of N states S = {s_1, s_2, ..., s_N} and a random variable sequence Q = (q_1, q_2, ..., q_T), where each variable takes a value in S, the probability that the system is in state s_j at time t is:

P(q_t = s_j | q_{t-1} = s_i, q_{t-2} = s_k, ...)    (1)
If the state of the system at time t depends only on the state at time t-1, formula (1) simplifies to formula (2), giving a discrete first-order Markov chain:

P(q_t = s_j | q_{t-1} = s_i, q_{t-2} = s_k, ...) = P(q_t = s_j | q_{t-1} = s_i)    (2)
a_ij is the state transition probability from s_i to s_j, where:

a_ij = P(q_t = s_j | q_{t-1} = s_i), with a_ij ≥ 0 and Σ_{j=1}^{N} a_ij = 1
The number of transition probabilities depends on the size of the state set and the order of the Markov process. For example, a first-order process with N states has N² transition probabilities.
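As a minimal sketch of a first-order chain, here is a hypothetical two-state example in Python; the states and all probabilities are invented for illustration:

```python
import numpy as np

# A hypothetical 2-state first-order Markov chain: N = 2 states,
# so there are N^2 = 4 transition probabilities.
states = ["sunny", "rainy"]
A = np.array([
    [0.8, 0.2],   # a_ij = P(q_t = s_j | q_{t-1} = s_i): sunny -> {sunny, rainy}
    [0.4, 0.6],   # rainy -> {sunny, rainy}
])

# Each row is a probability distribution over next states.
assert np.allclose(A.sum(axis=1), 1.0)

# Probability of the transition path sunny -> sunny -> rainy,
# given that we start in "sunny": a_00 * a_01
p = A[0, 0] * A[0, 1]
print(p)
```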
2. Hidden Markov Model
In a Markov model, the state s at time t is an observable event q_t, whose value depends on the preceding event q_{t-1}.
In a hidden Markov model, the underlying state process is not observed directly. Instead, each hidden state (s) emits an observation symbol (v) through a random function b_j(k): the value of the observed event at time t (o_t = v) is determined by the hidden event (q_t = s), while the value of the hidden event q_t is in turn determined by the previous hidden event q_{t-1}. This is equivalent to a doubly embedded Markov process.
In a Markov model, the model is μ = (S, A, π), where:
S = state set
A = state transition probability matrix
π = initial state probability distribution
In the hidden Markov model, the model is μ = (S, V, A, B, π), where:
S = hidden state set, of size N
V = set of observation symbols, of size M
A = state transition probability matrix, where a_ij is the transition probability from state s_i to state s_j; its size is N × N
B = observation probability matrix, where b_j(k) is the probability of emitting observation symbol v_k from hidden state s_j; its size is N × M
π = initial state probability distribution, of size N
Q and O are the output sequences over the time series T drawn from the hidden state set S and the observation symbols V, respectively.
State transition probability matrix:

a_ij = P(q_t = s_j | q_{t-1} = s_i),  1 ≤ i, j ≤ N

Observation probability matrix:

b_j(k) = P(o_t = v_k | q_t = s_j),  1 ≤ j ≤ N, 1 ≤ k ≤ M
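To make μ = (S, V, A, B, π) concrete, here is a toy model using the B/M/E/S tagset (Begin/Middle/End of a word, Single-character word) commonly used for Chinese word segmentation; every probability below is invented for illustration, not learned from data:

```python
import numpy as np

S = ["B", "M", "E", "S"]          # N = 4 hidden states
V = ["我", "们", "是"]             # M = 3 observation symbols (a toy vocabulary)

pi = np.array([0.6, 0.0, 0.0, 0.4])   # a word can only start with B or S

A = np.array([                        # N x N transition matrix a_ij
    [0.0, 0.3, 0.7, 0.0],             # B -> M or E
    [0.0, 0.4, 0.6, 0.0],             # M -> M or E
    [0.5, 0.0, 0.0, 0.5],             # E -> B or S (a new word starts)
    [0.5, 0.0, 0.0, 0.5],             # S -> B or S
])

B = np.array([                        # N x M observation matrix b_j(k)
    [0.5, 0.2, 0.3],
    [0.3, 0.4, 0.3],
    [0.2, 0.5, 0.3],
    [0.4, 0.2, 0.4],
])

# Rows of A, rows of B, and pi must each sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```

Note how the zeros in A and π encode the structural constraints of the tagset (e.g. M can only follow B or M).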
Any model implicitly assumes that the data obey a set of modeling assumptions. The hidden Markov model makes two basic assumptions.
Assumption 1 (homogeneous Markov assumption): the hidden state q_t at any time t depends only on the hidden state q_{t-1} at the previous time; it is independent of the hidden states and observations at all other times, and of the time t itself. In short:

P(q_t | q_{t-1}, o_{t-1}, ..., q_1, o_1) = P(q_t | q_{t-1})
Assumption 2 (observation independence assumption): the observation o_t at any time depends only on the hidden state q_t of the Markov chain at that time; it is independent of all other observations and hidden states. In short:

P(o_t | q_T, o_T, ..., q_t, ..., q_1, o_1) = P(o_t | q_t)
3. Hidden Markov Problems
The hidden Markov model is built around three basic problems. Word segmentation mainly uses the pipeline from model learning to predicting the segmentation result:
A. Probability calculation: given the model μ = (A, B, π) and an observation sequence O = (o_1, o_2, ..., o_T), compute the likelihood of O under μ, i.e. P(O | μ).
B. Model learning: given an observation sequence O = (o_1, o_2, ..., o_T), estimate the parameters of the model μ = (A, B, π) that maximize the likelihood of O under μ, i.e. maximum-likelihood estimation of P(O | μ).
C. Prediction, also known as the decoding problem: given the model μ = (A, B, π) and the observation sequence O = (o_1, o_2, ..., o_T), find the hidden state sequence Q that maximizes P(Q | O, μ).
3.1 Probability Calculation Problems
Direct Calculation Method
The probability of the state sequence Q = (q_1, q_2, ..., q_T) is

P(Q | μ) = π_{q_1} a_{q_1 q_2} a_{q_2 q_3} ⋯ a_{q_{T-1} q_T}

The probability of the observation sequence O given Q is

P(O | Q, μ) = b_{q_1}(o_1) b_{q_2}(o_2) ⋯ b_{q_T}(o_T)

The joint probability of O and Q is

P(O, Q | μ) = P(O | Q, μ) P(Q | μ)

Expanding and summing over all possible state sequences Q:

P(O | μ) = Σ_Q P(O | Q, μ) P(Q | μ) = Σ_{q_1, ..., q_T} π_{q_1} b_{q_1}(o_1) a_{q_1 q_2} b_{q_2}(o_2) ⋯ a_{q_{T-1} q_T} b_{q_T}(o_T)

Direct evaluation sums over all N^T state sequences, i.e. O(T · N^T) work, which is why the forward algorithm is needed in practice.
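The direct method can be sketched as brute-force enumeration over every possible hidden sequence; the toy model's numbers below are invented, and this is only practical for very small T:

```python
import itertools
import numpy as np

def direct_likelihood(obs, pi, A, B):
    """Brute-force P(O | mu): sum P(O | Q, mu) * P(Q | mu) over every
    possible hidden state sequence Q. Costs O(T * N^T) time."""
    N, T = len(pi), len(obs)
    total = 0.0
    for Q in itertools.product(range(N), repeat=T):
        p = pi[Q[0]] * B[Q[0], obs[0]]          # initial state + first emission
        for t in range(1, T):
            p *= A[Q[t-1], Q[t]] * B[Q[t], obs[t]]  # transition + emission
        total += p
    return total

# Toy 2-state, 2-symbol model (made-up numbers).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
print(direct_likelihood([0, 1, 0], pi, A, B))
```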
Forward Algorithm
Define the forward probability α_t(i) = P(o_1, o_2, ..., o_t, q_t = s_i | μ): the probability of having observed the partial sequence o_1, ..., o_t and being in hidden state s_i at time t. With the transition probabilities a_ij and observation probabilities b_j(k) defined above, the forward probabilities and P(O | μ) are obtained recursively:
Initialization

α_1(i) = π_i b_i(o_1),  1 ≤ i ≤ N

Recursion

α_{t+1}(i) = [ Σ_{j=1}^{N} α_t(j) a_ji ] b_i(o_{t+1}),  t = 1, 2, ..., T-1

Termination

P(O | μ) = Σ_{i=1}^{N} α_T(i)
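The three steps above can be sketched as follows, using a toy 2-state model whose numbers are invented for illustration:

```python
import numpy as np

def forward_likelihood(obs, pi, A, B):
    """P(O | mu) via the forward algorithm in O(T * N^2) time.
    alpha[t, i] = P(o_1..o_t, q_t = s_i | mu)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                    # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]  # recursion
    return alpha[-1].sum()                          # termination

# Toy 2-state, 2-symbol model (made-up numbers).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
print(forward_likelihood([0, 1, 0], pi, A, B))
```

The recursion reuses each α_t instead of re-enumerating paths, which is what brings the cost down from O(T · N^T) to O(T · N²).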
3.2 Learning Problems
Supervised Learning Method
Suppose the training data consist of S samples, each containing a hidden state sequence and its corresponding observation sequence of the same length. We estimate the model μ = {A, B, π} by maximum likelihood:
The estimated transition probability â_ij is

â_ij = A_ij / Σ_{j=1}^{N} A_ij

where A_ij is the number of times state s_i is followed by state s_j in the samples.

The estimated observation probability b̂_j(k) is

b̂_j(k) = B_jk / Σ_{k=1}^{M} B_jk

where B_jk is the number of times state s_j emits symbol v_k.

The estimated initial state probability π̂_i is the frequency with which the S samples start in state s_i.
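The counting estimates above can be sketched as follows; the tagged toy data use the B/M/E/S tagset and are invented for illustration:

```python
from collections import Counter, defaultdict

def estimate_hmm(tagged_sentences):
    """Supervised maximum-likelihood estimation of mu = (A, B, pi):
    each probability is a normalized count over the labeled samples.
    `tagged_sentences` is a list of [(observation, state), ...] sequences."""
    trans = defaultdict(Counter)   # trans[s_i][s_j]: count of s_i -> s_j
    emit = defaultdict(Counter)    # emit[s_j][v_k]: count of s_j emitting v_k
    init = Counter()               # init[s_i]: count of sequences starting in s_i

    for sent in tagged_sentences:
        init[sent[0][1]] += 1
        for obs, state in sent:
            emit[state][obs] += 1
        for (_, s1), (_, s2) in zip(sent, sent[1:]):
            trans[s1][s2] += 1

    def normalize(counter):
        total = sum(counter.values())
        return {k: v / total for k, v in counter.items()}

    A = {s: normalize(c) for s, c in trans.items()}
    B = {s: normalize(c) for s, c in emit.items()}
    pi = normalize(init)
    return A, B, pi

# Two toy labeled segmentations (made-up data):
data = [
    [("我", "S"), ("爱", "S"), ("北", "B"), ("京", "E")],
    [("我", "S"), ("们", "S")],
]
A, B, pi = estimate_hmm(data)
print(pi["S"])        # both samples start with S
print(A["S"]["S"])    # S -> S twice, S -> B once
```

A real segmenter would also need smoothing for unseen characters and transitions; that is omitted here.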
Unsupervised learning method (the Baum-Welch / EM algorithm; omitted here)
3.3 Prediction Problems
Viterbi Algorithm
The Viterbi algorithm uses dynamic programming to find, for a given observation sequence O = (o_1, o_2, ..., o_T) and model μ, the optimal hidden path Q* = (q_1*, q_2*, ..., q_T*): at each time t it determines the best path to each state from the best paths at time t-1, obtains the optimal end node i_T* at the final time T, and then backtracks.
1. Initialization
The probability of each candidate hidden state i at time t = 1 is δ_1(i):

δ_1(i) = π_i b_i(o_1),  i = 1, 2, ..., N

Initialize the optimal-path predecessor variable:

ψ_1(i) = 0,  i = 1, 2, ..., N
2. Recursion, t = 2, 3, ..., T
The probability δ_t(i) of candidate hidden state i at time t is obtained by multiplying each δ_{t-1}(j) over the hidden states j at time t-1 by the corresponding transition probability a_ji, taking the maximum over j, and then multiplying by the observation probability b_i(o_t):

δ_t(i) = max_{1 ≤ j ≤ N} [ δ_{t-1}(j) a_ji ] · b_i(o_t)

The state j achieving that maximum at time t-1 is recorded as the predecessor on the optimal path:

ψ_t(i) = argmax_{1 ≤ j ≤ N} [ δ_{t-1}(j) a_ji ]
3. Termination
At the final time T, the maximum probability and the end node of the optimal path are:

P* = max_{1 ≤ i ≤ N} δ_T(i)
i_T* = argmax_{1 ≤ i ≤ N} δ_T(i)

The full path is then recovered by backtracking: q_t* = ψ_{t+1}(q_{t+1}*), for t = T-1, ..., 1.
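A minimal implementation of the three steps plus backtracking, again on a toy 2-state model with invented numbers:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most probable hidden path argmax_Q P(Q | O, mu) by dynamic programming.
    delta[t, i]: best path probability ending in state i at time t.
    psi[t, i]:   predecessor state on that best path (for backtracking)."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                  # 1. initialization
    for t in range(1, T):                         # 2. recursion
        scores = delta[t-1][:, None] * A          # scores[j, i] = delta_{t-1}(j) * a_ji
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]              # 3. termination: best end node
    for t in range(T - 1, 0, -1):                 # backtracking via psi
        path.append(int(psi[t][path[-1]]))
    return delta[-1].max(), path[::-1]

# Toy 2-state, 2-symbol model (made-up numbers).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
prob, path = viterbi([0, 1, 0], pi, A, B)
print(prob, path)
```

For word segmentation, the decoded state path (e.g. B/M/E/S tags) is then mapped back onto the input characters to produce word boundaries. In practice the products are computed in log space to avoid underflow on long sentences.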
(To be continued)