Reference: Introduction to Hidden Markov Model (HMM)
First, consider a deterministic state sequence, in which the state transitions are fixed. For example,
a traffic light must change in the order green light → red light → yellow light.
Of course, there are also uncertain state sequences. For example,
the weather is sunny today, but you cannot be sure whether tomorrow will be sunny or rainy.
So we use probability to represent this uncertainty; such a sequence is described by a Markov process. The order of a Markov process indicates how many previous states the current state depends on. For simplicity, a first-order Markov process is usually used, in which the current state depends only on the immediately preceding state.
A Markov process consists of a state set, an initial state, and a state transition matrix.
Take the weather as an example (a concrete sketch follows below):
The state set is {sunny, cloudy, rainy}.
The initial state is sunny.
The state transition matrix gives the probability of moving from each state to every other state.
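To make this concrete, here is a minimal sketch of a first-order Markov process for the weather example; the transition probabilities in the matrix are illustrative assumptions, not values from the text.

```python
import numpy as np

# Illustrative first-order Markov process for the weather example.
states = ["sunny", "cloudy", "rainy"]
initial_state = "sunny"

# A[i][j] = probability of moving from states[i] to states[j]; each row sums to 1.
A = np.array([
    [0.6, 0.3, 0.1],   # sunny  -> sunny / cloudy / rainy
    [0.3, 0.4, 0.3],   # cloudy -> sunny / cloudy / rainy
    [0.2, 0.4, 0.4],   # rainy  -> sunny / cloudy / rainy
])

def sample_sequence(length, seed=0):
    """Sample a state sequence; each state depends only on the previous one."""
    rng = np.random.default_rng(seed)
    seq = [states.index(initial_state)]
    for _ in range(length - 1):
        seq.append(rng.choice(len(states), p=A[seq[-1]]))
    return [states[i] for i in seq]

print(sample_sequence(7))
```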
A Markov process can already model sequences of observable states. But if you want to look beneath the surface and uncover the underlying nature of things, you need a Hidden Markov Model (HMM).
Hidden Markov Model
A hidden Markov model describes the relationship between two state sequences: the underlying (hidden) state sequence and the observable state sequence.
With a hidden Markov model, we can predict the observed state sequence from the hidden state sequence, or, conversely, infer the hidden state sequence from the observed state sequence.
Take the weather example again:
Suppose a person is locked in a house and cannot go out, so he cannot observe the weather directly. But there is a plant in the house whose state changes with the weather.
What he can do is guess the weather from the state of the plant. This is an example of inferring the hidden essence from the observable appearance.
Here,
the hidden state set is {sun, cloud, rain};
the observed state set is {soggy, damp, dryish, dry};
the transition matrix of the hidden states is A;
the emission matrix relating hidden states to observed states is B, where $b_j(k)$ is the probability of seeing observed state $v_k$ when the hidden state is $q_j$.
The formal representation of a hidden Markov model:
$a_{ij} = P(i_{t+1} = q_j \mid i_t = q_i)$: the probability that the state changes to $q_j$ at time t+1 given that the state is $q_i$ at time t.
$b_j(k) = P(o_t = v_k \mid i_t = q_j)$: the probability that the observed state is $v_k$ given that the hidden state is $q_j$ at time t.
$\pi_i = P(i_1 = q_i)$: the probability that the state is $q_i$ at time 1.
A hidden Markov model can thus be expressed by the triple $\lambda = (A, B, \pi)$.
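As a sketch, the house/plant model above can be written down as this triple; all probabilities here are illustrative assumptions, not values from the text.

```python
import numpy as np

hidden_states = ["sun", "cloud", "rain"]              # q_1 .. q_N
observed_states = ["soggy", "damp", "dryish", "dry"]  # v_1 .. v_M

# lambda = (A, B, pi); the numbers are made up for illustration.
A = np.array([            # a_ij = P(hidden state j at t+1 | hidden state i at t)
    [0.5, 0.3, 0.2],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
])
B = np.array([            # b_j(k) = P(observed state k | hidden state j)
    [0.05, 0.15, 0.30, 0.50],   # sun
    [0.20, 0.30, 0.30, 0.20],   # cloud
    [0.50, 0.30, 0.15, 0.05],   # rain
])
pi = np.array([0.4, 0.3, 0.3])  # pi_i = P(hidden state i at time 1)
```

The later sketches in this note reuse these A, B, and pi arrays; observations are represented by their indices into observed_states.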
In addition, the hidden Markov model makes two assumptions, written out formally below.
Homogeneous Markov assumption: the hidden state at time t depends only on the hidden state at time t-1, and is independent of the hidden states at other times and of the observed states at any time.
Observation independence assumption: the observed state at time t depends only on the hidden state at time t, and is independent of the observed or hidden states at other times.
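In symbols, following the notation above:

$$P(i_t \mid i_{t-1}, o_{t-1}, \ldots, i_1, o_1) = P(i_t \mid i_{t-1})$$

$$P(o_t \mid i_T, o_T, i_{T-1}, o_{T-1}, \ldots, i_1, o_1) = P(o_t \mid i_t)$$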
Three basic problems of Hidden Markov Model
1. Probability Calculation
Compute $P(O \mid \lambda)$, that is, the probability of the observed sequence $O$ under the model $\lambda$.
2. Learning Problems
Given observed sequences, estimate the model parameters $\lambda$ by maximum likelihood, i.e., choose the parameters that maximize the probability of the observed sequences.
3. Prediction Problems
Given the model $\lambda$ and an observed state sequence $O$, find the hidden state sequence $I$ that maximizes $P(I \mid O)$.
Next, let's take a look at the solutions to each problem.
Probability Calculation Problems
The direct method enumerates every possible hidden state sequence of length T, computes for each the joint probability of that hidden sequence and the observed sequence, and sums the results. As you can imagine, the complexity is $O(TN^T)$.
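A minimal sketch of the direct method, reusing the A, B, pi arrays defined earlier; obs is a list of observation indices. It is feasible only for tiny T, since it enumerates all $N^T$ hidden sequences.

```python
from itertools import product

def prob_direct(obs, A, B, pi):
    """P(O | lambda) by summing over every hidden state sequence of length T.

    Complexity O(T * N^T): there are N^T hidden sequences, each costing O(T).
    """
    N, T = len(pi), len(obs)
    total = 0.0
    for hidden in product(range(N), repeat=T):       # all N^T hidden sequences
        p = pi[hidden[0]] * B[hidden[0], obs[0]]     # start, then emit o_1
        for t in range(1, T):
            p *= A[hidden[t - 1], hidden[t]] * B[hidden[t], obs[t]]
        total += p
    return total
```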
Forward and Backward Algorithms
Define the forward probability $\alpha_t(i) = P(o_1, o_2, \ldots, o_t, i_t = q_i \mid \lambda)$,
that is, the probability that the observations up to time t are $o_1, \ldots, o_t$ and the state at time t is $q_i$.
The forward algorithm is a recursive algorithm: starting from $\alpha_1(i) = \pi_i b_i(o_1)$, it uses the forward probabilities at the previous time step to compute those at the current step, $\alpha_{t+1}(i) = \left[ \sum_{j=1}^{N} \alpha_t(j) a_{ji} \right] b_i(o_{t+1})$.
Consider an example with N = 3 hidden states and T = 3.
With the direct method, $3^3 = 27$ hidden state sequences must be evaluated, because there are 27 possible hidden sequences of length 3.
With the forward algorithm, only about 3×3 + 3×3 = 18 computations are needed, because repeated computation is avoided.
We can see that the computational complexity is reduced from $O(TN^T)$ to $O(TN^2)$.
The core idea of the forward algorithm is to avoid repeated computation; it is in fact a dynamic programming algorithm, as sketched below.
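A sketch of the forward algorithm under the same conventions (obs is a list of observation indices; A, B, pi as defined earlier):

```python
import numpy as np

def forward(obs, A, B, pi):
    """Return alpha, a (T, N) table of forward probabilities.

    alpha[t, i] = P(o_1..o_t, state i at time t | lambda).
    Each row reuses the previous one, so the total cost is O(T * N^2).
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                    # initialization
    for t in range(1, T):
        # sum over predecessors j (alpha[t-1] @ A), then emit o_t from state i
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def prob_forward(obs, A, B, pi):
    """P(O | lambda) = sum_i alpha_T(i)."""
    return forward(obs, A, B, pi)[-1].sum()
```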
The backward algorithm follows the same idea as the forward algorithm.
Define the backward probability $\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid i_t = q_i, \lambda)$: given that the state at time t is $q_i$, the probability of observing the partial sequence from time t+1 through T.
The backward algorithm:
First, initialization. At the final time T there are no subsequent observations, so $\beta_T(i) = 1$ for every state.
Then a recursion computes the backward probabilities of the previous step: $\beta_t(i) = \sum_{j=1}^{N} a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)$. When time 1 is reached, summing over all states gives $P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i b_i(o_1) \beta_1(i)$.
We can see that either the forward or the backward algorithm computes the final result $P(O \mid \lambda)$ recursively; a backward sketch follows.
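A matching sketch of the backward algorithm; on the same inputs, prob_backward agrees with prob_forward from the previous sketch up to floating-point error.

```python
import numpy as np

def backward(obs, A, B, pi):
    """Return beta, a (T, N) table of backward probabilities.

    beta[t, i] = P(o_{t+1}..o_T | state i at time t, lambda).
    """
    T, N = len(obs), len(pi)
    beta = np.ones((T, N))                  # beta_T(i) = 1 by definition
    for t in range(T - 2, -1, -1):
        # transition to j, emit o_{t+1} from j, then continue with beta[t+1, j]
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def prob_backward(obs, A, B, pi):
    """P(O | lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)."""
    beta = backward(obs, A, B, pi)
    return (pi * B[:, obs[0]] * beta[0]).sum()
```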
Learning Algorithms
Supervised Learning Algorithm
Suppose the training set contains S observed sequences of the same length together with their corresponding hidden state sequences.
Then we fit the parameters directly by maximum likelihood, that is, by counting (see the sketch after these formulas):
$\hat{a}_{ij} = \dfrac{A_{ij}}{\sum_{j=1}^{N} A_{ij}}$, where $A_{ij}$ is the number of transitions from state $q_i$ to state $q_j$,
that is, the number of times state i transitions to state j divided by the total number of transitions out of state i.
Similarly, $\hat{b}_j(k) = \dfrac{B_{jk}}{\sum_{k=1}^{M} B_{jk}}$, where $B_{jk}$ is the number of times hidden state $q_j$ is paired with observed state $v_k$; and $\hat{\pi}_i$ is the frequency with which sequences begin in state $q_i$.
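A sketch of these counting estimates; hidden_seqs and obs_seqs are assumed to be parallel lists of index sequences, with N hidden states and M observed states.

```python
import numpy as np

def fit_supervised(hidden_seqs, obs_seqs, N, M):
    """Estimate (A, B, pi) by counting labeled (hidden, observed) sequences."""
    A = np.zeros((N, N))
    B = np.zeros((N, M))
    pi = np.zeros(N)
    for hidden, obs in zip(hidden_seqs, obs_seqs):
        pi[hidden[0]] += 1                        # which state the sequence starts in
        for t in range(len(hidden)):
            B[hidden[t], obs[t]] += 1             # emission count
            if t + 1 < len(hidden):
                A[hidden[t], hidden[t + 1]] += 1  # transition count
    # normalize counts into probabilities; max(..., 1) avoids 0/0 for unseen states
    A /= np.maximum(A.sum(axis=1, keepdims=True), 1)
    B /= np.maximum(B.sum(axis=1, keepdims=True), 1)
    pi /= pi.sum()
    return A, B, pi
```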
This method is simple, but such a labeled training set (with hidden state annotations) is hard to obtain.
Therefore, we turn to unsupervised algorithms.
Unsupervised Algorithm: the Baum-Welch Algorithm
Now only S observed sequences of length T are available, without the corresponding hidden state sequences.
We still need to fit the hidden Markov model parameters $\lambda = (A, B, \pi)$.
This is a probability model with hidden variables: $P(O \mid \lambda) = \sum_{I} P(O \mid I, \lambda) P(I \mid \lambda)$.
The typical algorithm for this kind of problem is the EM algorithm. The detailed steps are not listed here; you can study them more closely later.
Prediction Problems
Approximation Algorithm
The approximation algorithm finds, at each time t, the single hidden state that is individually most likely, yielding a sequence $I^* = (i_1^*, i_2^*, \ldots, i_T^*)$.
Let's first look at how to compute $\gamma_t(i) = P(i_t = q_i \mid O, \lambda)$.
From the definitions of the forward and backward probabilities, $\gamma_t(i) = \dfrac{\alpha_t(i)\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\beta_t(j)}$.
Therefore, we only need $i_t^* = \arg\max_{1 \le i \le N} \gamma_t(i)$ for each t.
This algorithm is easy to compute, but it cannot guarantee that the predicted state sequence is the most likely sequence as a whole; a sketch follows.
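A sketch of the approximation algorithm, built on the forward and backward sketches above:

```python
import numpy as np

def approximate_decode(obs, A, B, pi):
    """Pick argmax_i gamma_t(i) at each t; not guaranteed globally optimal.

    Relies on forward() and backward() from the earlier sketches.
    """
    alpha = forward(obs, A, B, pi)
    beta = backward(obs, A, B, pi)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)   # normalize each time step
    return gamma.argmax(axis=1)                 # most likely state at each t
```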
Viterbi Algorithm
In essence, it uses dynamic programming to find the optimal path.
Prediction is an important and common application of hidden Markov models, because HMMs are often used to mine the hidden states behind an observed state sequence.
For example, we used the state of the plant in the house to guess the weather.
Or, in natural language processing, we can infer the part of speech of each word from an observed word sequence.
The key to a dynamic programming algorithm is to write down the recursion.
Suppose at time t-1 we already know the optimal paths reaching each of the three states A, B, and C.
To find the optimal path reaching a node x at time t, we only need to compare the three candidates (optimal path to A) + A→x, (optimal path to B) + B→x, and (optimal path to C) + C→x, and keep the best one.
Formal representation:
Define $\delta_t(i) = \max_{i_1, \ldots, i_{t-1}} P(i_t = q_i, i_{t-1}, \ldots, i_1, o_t, \ldots, o_1 \mid \lambda)$,
the probability of the optimal path among all paths that reach state $q_i$ at time t, where the optimal path is the one with the highest probability.
So we get the recursion $\delta_{t+1}(i) = \max_{1 \le j \le N} \left[ \delta_t(j) a_{ji} \right] b_i(o_{t+1})$:
at time t, take each state's optimal path probability × the probability of transitioning into state i × the probability that state i emits the observed value $o_{t+1}$, and keep the maximum.
This gives the probability of the optimal path at time t+1.
When the optimal paths to all state nodes at the final time T have been found, the one with the highest probability, $P^* = \max_{1 \le i \le N} \delta_T(i)$, is the global optimal path.
But at this point we only know the final state of the optimal path; the intermediate states must be recovered by backtracking.
Therefore, while solving, we need to record, for each state at each time step, the previous node on its optimal path, that is, the state it came from at the previous time step: $\psi_t(i) = \arg\max_{1 \le j \le N} \left[ \delta_{t-1}(j) a_{ji} \right]$.
Compared with the recursion for $\delta$, the factor $b_i(o_t)$ is missing here, because it is the same for all candidates j once i is fixed, so it does not affect the argmax and can be omitted.
Finally, the complete Viterbi algorithm, sketched below:
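A sketch of the algorithm under the same conventions as the earlier snippets, with delta holding the best-path probabilities and psi the back-pointers:

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Return the most probable hidden state sequence for obs.

    delta[t, i]: highest probability of any single path ending in state i at t.
    psi[t, i]:   predecessor state on that best path (the back-pointer).
    """
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        trans = delta[t - 1][:, None] * A      # trans[j, i] = delta_{t-1}(j) * a_ji
        psi[t] = trans.argmax(axis=0)          # best predecessor j for each i
        delta[t] = trans.max(axis=0) * B[:, obs[t]]
    # trace back from the most probable final state
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]
```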
Reference: Statistical Learning Methods notes: Hidden Markov Model