- Introduction
- Generating Patterns
- Implicit Patterns
- Hidden Markov Models
- Forward Algorithm
- Viterbi Algorithm
- Forward-Backward Algorithm
- Summary
Viterbi Algorithm
Finding the most probable hidden sequence
We usually have a specific HMM and want to find, for a given observed sequence, the hidden sequence most likely to have generated it.
1. Exhaustive search
The relationship between each hidden state and each observation can be seen in the trellis figure.
By computing the probability of every possible hidden sequence, we can find the most probable one. The most probable hidden sequence is the one that maximizes Pr(observed sequence | hidden state combination). For example, for the observed sequence (dry, damp, soggy), the most likely hidden sequence is the one with the largest of the following probabilities:
Pr(dry, damp, soggy | sunny, sunny, sunny), Pr(dry, damp, soggy | sunny, sunny, cloudy), Pr(dry, damp, soggy | sunny, sunny, rainy), ..., Pr(dry, damp, soggy | rainy, rainy, rainy)
Each term is understood to include the probability of the hidden sequence itself, so what is compared is the joint probability of the observations and the hidden states.
This approach works, but it is expensive: the number of candidate sequences grows exponentially with the length of the sequence. As with the forward algorithm, we can exploit the time invariance of the transition probabilities to reduce the computational complexity.
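The exhaustive search can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the numbers in `PI`, `A`, and `B` are assumed values chosen for the example (the text gives no parameters), and the names `joint_probability` and `exhaustive_search` are hypothetical. Each candidate is scored by the joint probability of the observations and the hidden sequence.

```python
from itertools import product

# Illustrative parameters (assumed for this sketch, not taken from the text).
STATES = ["sunny", "cloudy", "rainy"]
OBSERVATIONS = ["dry", "damp", "soggy"]
PI = [0.6, 0.2, 0.2]          # initial state probabilities
A = [[0.5, 0.3, 0.2],         # A[j][i] = Pr(state i at t | state j at t-1)
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]
B = [[0.7, 0.2, 0.1],         # B[i][k] = Pr(observation k | state i)
     [0.3, 0.4, 0.3],
     [0.1, 0.3, 0.6]]


def joint_probability(path, obs):
    """Pr(observations, hidden path) for two index sequences of equal length."""
    p = PI[path[0]] * B[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
    return p


def exhaustive_search(obs):
    """Score every hidden sequence; there are len(STATES) ** len(obs) of them."""
    candidates = product(range(len(STATES)), repeat=len(obs))
    best = max(candidates, key=lambda path: joint_probability(path, obs))
    return list(best), joint_probability(best, obs)


obs = [OBSERVATIONS.index(name) for name in ("dry", "damp", "soggy")]
best_path, best_prob = exhaustive_search(obs)
print([STATES[i] for i in best_path], best_prob)
```

With three states and three observations there are only 27 candidates, but the count multiplies by the number of states for every additional time step, which is why the recursive approach is preferred.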
2. Using recursion to reduce complexity
Given an observed sequence and an HMM, we can find the most likely hidden sequence recursively. We first define a partial probability, which is the probability of reaching a particular intermediate state. We then discuss how to calculate these partial probabilities at t = 1 and at t = n (n > 1).
Note that the partial probability here differs from the partial probability in the forward algorithm. Here it is the probability of the single most probable path to a state at time t, rather than the sum of the probabilities of all paths to that state.
2a. Partial probabilities and partial optimal paths
Consider the trellis figure, which shows the states and first-order transitions for the observed sequence (dry, damp, soggy).
Each intermediate or final state (t = 3) has a most probable path leading to it. For example, each of the three states at t = 3 has its own most probable path.
We call these paths partial optimal paths. Each partial optimal path has a probability, the partial probability. Unlike the partial probabilities in the forward algorithm, this is the probability of the most probable path only, not the total over all paths.
We write $\delta(i, t)$ for the highest probability among all sequences (paths) that reach state i at time t; the partial optimal path is the path that attains this maximum. Such a probability and partial optimal path exist for every state at every time.
Finally, we calculate the maximum probability and partial optimal path of each state at t = T, and select the state with the highest probability. Its partial optimal path is the global optimal path.
2b. Calculating the partial probabilities at t = 1
At t = 1 there is no earlier state, so no most probable path into a state exists yet. Instead, we use the initial probability of being in state i at t = 1 together with the probability of observing $k_1$ from that state:
$$\delta(i, 1) = \pi(i)\, b_{i k_1}$$
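As a small worked example of this initialization step, the sketch below multiplies each initial state probability by the probability of the first observation. The parameter values are assumed for illustration only, and `initial_partial_probabilities` is a hypothetical helper name.

```python
# Illustrative parameters (assumed for this sketch, not taken from the text).
PI = [0.6, 0.2, 0.2]          # initial probabilities: sunny, cloudy, rainy
B = [[0.7, 0.2, 0.1],         # B[i][k] = Pr(observation k | state i)
     [0.3, 0.4, 0.3],         # observation columns: dry, damp, soggy
     [0.1, 0.3, 0.6]]


def initial_partial_probabilities(pi, b, first_obs):
    """Partial probability of each state at t = 1: pi(i) * b[i][k_1]."""
    return [pi[i] * b[i][first_obs] for i in range(len(pi))]


DRY = 0
delta_1 = initial_partial_probabilities(PI, B, DRY)
print(delta_1)  # approximately [0.42, 0.06, 0.02]
```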
2c. Calculating the partial probabilities at t > 1
Next, we calculate the partial probabilities at time t from the partial probabilities at time t-1.
We can calculate the probability of every path to state X and pick the most probable one, that is, the partial optimal path. Every path to X must pass through A, B, or C at time t-1, so we can reuse the earlier results. The most probable path to X is one of the following three:
(sequence of states), ..., A, X
(sequence of states), ..., B, X
(sequence of states), ..., C, X
What we need is the most probable path ending in AX, BX, or CX.
Under the first-order Markov assumption, the probability of a state occurring depends only on the state immediately before it. The probability of the most probable path that ends in A followed by X is therefore:
Pr(most probable path to A) · Pr(X | A) · Pr(observation | X)
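Using this expression with the results from time t-1, the state transition matrix, and the confusion matrix, the probability of the most probable path to X is:
$$\Pr(\text{most probable path to } X) = \max_{i \in \{A, B, C\}} \Pr(\text{most probable path to } i) \cdot \Pr(X \mid i) \cdot \Pr(\text{observation} \mid X)$$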
Generalizing this expression, the maximum partial probability of state i at time t, when the observation at time t is $k_t$, is:
$$\delta(i, t) = \max_j \left( \delta(j, t-1)\, a_{ji}\, b_{i k_t} \right)$$
Here $a_{ji}$ is the probability of a transition from state j to state i, and $b_{i k_t}$ is the probability that state i produces the observation $k_t$.
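One step of this recursion might look as follows in Python. The matrices are assumed illustrative values, `viterbi_step` is a hypothetical name, and the starting vector `delta_1` is what the t = 1 rule gives for these assumed parameters when the first observation is "dry".

```python
# Illustrative parameters (assumed for this sketch, not taken from the text).
A = [[0.5, 0.3, 0.2],         # A[j][i] = Pr(state i at t | state j at t-1)
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]
B = [[0.7, 0.2, 0.1],         # B[i][k] = Pr(observation k | state i)
     [0.3, 0.4, 0.3],
     [0.1, 0.3, 0.6]]


def viterbi_step(delta_prev, a, b, obs_k):
    """Partial probabilities at time t from those at time t-1.

    For each state i, take the best predecessor j (max, not sum) and then
    multiply by the probability that state i produces observation obs_k.
    """
    n = len(delta_prev)
    return [
        max(delta_prev[j] * a[j][i] for j in range(n)) * b[i][obs_k]
        for i in range(n)
    ]


DAMP = 1
delta_1 = [0.42, 0.06, 0.02]   # partial probabilities at t = 1 (observation "dry")
delta_2 = viterbi_step(delta_1, A, B, DAMP)
print(delta_2)  # approximately [0.042, 0.0504, 0.0252]
```

Replacing `max` with `sum` in `viterbi_step` would give the forward algorithm's partial probabilities instead, which is the only difference between the two recursions.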
2d. Back pointers
Every intermediate and final state has a partial probability $\delta(i, t)$. Our goal, however, is to find the most likely sequence of hidden states, so we need a way to remember the nodes along each partial optimal path.
To calculate the partial probability at time t we only need the partial probabilities at time t-1. Once it is calculated, we can record which preceding state produced it, that is, which state the system must have been in at time t-1 in order to arrive optimally at state i at time t. We use a back pointer $\phi$ to record this preceding state for each state. Formally:
$$\phi(i, t) = \arg\max_j \left( \delta(j, t-1)\, a_{ji} \right)$$
Here argmax selects the value of j that maximizes the expression. Note that the expression involves only the partial probabilities at time t-1 and the transition probabilities. The back pointer only answers the question "where did I come from?", which does not depend on the observation, so the confusion-matrix factor is not needed here.
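The following sketch computes the back pointers for one time step and checks, for this example, that including the observation factor does not change which predecessor wins. The parameter values and the vector `delta_2` are assumed for illustration, and `back_pointers` is a hypothetical name.

```python
# Illustrative parameters (assumed for this sketch, not taken from the text).
A = [[0.5, 0.3, 0.2],         # A[j][i] = Pr(state i at t | state j at t-1)
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]
B = [[0.7, 0.2, 0.1],         # B[i][k] = Pr(observation k | state i)
     [0.3, 0.4, 0.3],
     [0.1, 0.3, 0.6]]


def back_pointers(delta_prev, a):
    """For each state i, the predecessor j that maximizes delta_prev[j] * a[j][i]."""
    n = len(delta_prev)
    return [max(range(n), key=lambda j: delta_prev[j] * a[j][i]) for i in range(n)]


SOGGY = 2
delta_2 = [0.042, 0.0504, 0.0252]   # assumed partial probabilities at t = 2
phi_3 = back_pointers(delta_2, A)

# The observation factor B[i][SOGGY] is the same for every candidate j,
# so multiplying by it leaves the winning predecessor unchanged.
phi_3_with_observation = [
    max(range(3), key=lambda j: delta_2[j] * A[j][i] * B[i][SOGGY])
    for i in range(3)
]
print(phi_3, phi_3_with_observation)
```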
2e. Advantages
Using the Viterbi algorithm to decode an observed sequence has two important advantages:
- Recursion reduces the computational complexity, as in the forward algorithm.
- It finds the best hidden sequence given the entire observed sequence. By contrast, a simpler alternative decodes from left to right: starting from the initial vector, it chooses each state using only the choice made at the previous step.
2f. Supplement
When noise corrupts part of the sequence, a left-to-right method can drift far from the correct answer.
The Viterbi algorithm, in contrast, examines the entire sequence before deciding on the most likely final state, and then follows the back pointers to recover the preceding states. This makes it robust to isolated noise.
3. Summary
The Viterbi algorithm provides an efficient way to find the most likely hidden sequence for an observed sequence. It uses recursion to reduce computational complexity and, because it bases its decision on the whole sequence, it tolerates noise.
During the computation, the algorithm calculates a partial probability for each state at each time step and uses a back pointer to record the most likely preceding state. At the end, the most likely final state is taken as the last state of the hidden sequence, and the back pointers are followed to recover all the states of the sequence.
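Putting the steps of this summary together, a compact Viterbi decoder could be sketched as below. The function name `viterbi` and all parameter values are assumptions made for the example. For long sequences a practical version would usually work with log probabilities to avoid underflow, which this sketch omits for clarity.

```python
# Illustrative parameters (assumed for this sketch, not taken from the text).
STATES = ["sunny", "cloudy", "rainy"]
PI = [0.6, 0.2, 0.2]          # initial state probabilities
A = [[0.5, 0.3, 0.2],         # A[j][i] = Pr(state i at t | state j at t-1)
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]
B = [[0.7, 0.2, 0.1],         # B[i][k] = Pr(observation k | state i)
     [0.3, 0.4, 0.3],         # observation columns: dry, damp, soggy
     [0.1, 0.3, 0.6]]


def viterbi(obs, pi, a, b):
    """Return (most probable state index path, its probability)."""
    n = len(pi)
    # t = 1: initial probability times observation probability.
    delta = [pi[i] * b[i][obs[0]] for i in range(n)]
    pointers = []  # pointers[t-1][i] = best predecessor of state i at step t
    for k in obs[1:]:
        # Back pointer: best predecessor j for each state i.
        prev = [max(range(n), key=lambda j: delta[j] * a[j][i]) for i in range(n)]
        # Partial probability via that predecessor, times observation probability.
        delta = [delta[prev[i]] * a[prev[i]][i] * b[i][k] for i in range(n)]
        pointers.append(prev)
    # Pick the most probable final state, then follow the back pointers.
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for prev in reversed(pointers):
        path.append(prev[path[-1]])
    path.reverse()
    return path, delta[last]


path, prob = viterbi([0, 1, 2], PI, A, B)   # observations: dry, damp, soggy
print([STATES[i] for i in path], prob)
```

The work per time step is proportional to the square of the number of states, so the total cost grows linearly with the sequence length rather than exponentially.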
Forward-Backward Algorithm
The problems discussed so far are evaluating how well a model explains an observed sequence (the forward algorithm) and finding the hidden sequence behind an observed sequence (the Viterbi algorithm). Both require the HMM parameters to be known: the transition matrix, the confusion matrix, and the initial π vector.
In many practical cases, however, these parameters cannot be measured directly and must be estimated. This is the learning problem. The forward-backward algorithm estimates the parameters of an HMM from a set of observed sequences. A possible example is a large speech-processing database: the underlying speech can be modeled as a Markov chain of hidden states and the observations as recognizable states, but the remaining parameters cannot be obtained directly.
The forward-backward algorithm is not difficult to understand, but it is more complex than the forward algorithm and the Viterbi algorithm, so we do not describe it in detail here. In outline, the algorithm starts from an initial guess of the parameters, assesses how well they fit the training data, and then revises them to reduce the error. This is similar in spirit to iterative optimization methods such as gradient descent in machine learning, although strictly it is an instance of expectation-maximization (EM).
The forward-backward algorithm takes its name from the two quantities it computes for each state: the "forward" probability of arriving at that state and the "backward" probability of generating the rest of the sequence up to the final state. Both can be computed recursively. Adjusting the HMM parameters improves these intermediate probabilities, and these adjustments form the basis of the algorithm's iterations.
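The two recursions can be sketched as follows. This shows only the forward and backward probabilities, not the parameter re-estimation that the full algorithm builds on them. The parameter values and the function names `forward` and `backward` are assumptions made for the example. A useful sanity check is that combining the two quantities at any time step gives the same overall probability of the observed sequence.

```python
# Illustrative parameters (assumed for this sketch, not taken from the text).
PI = [0.6, 0.2, 0.2]          # initial state probabilities
A = [[0.5, 0.3, 0.2],         # A[j][i] = Pr(state i at t | state j at t-1)
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]
B = [[0.7, 0.2, 0.1],         # B[i][k] = Pr(observation k | state i)
     [0.3, 0.4, 0.3],
     [0.1, 0.3, 0.6]]


def forward(obs, pi, a, b):
    """alpha[t][i] = Pr(observations up to step t, state i at step t)."""
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for k in obs[1:]:
        prev = alpha[-1]
        alpha.append([
            sum(prev[j] * a[j][i] for j in range(n)) * b[i][k]
            for i in range(n)
        ])
    return alpha


def backward(obs, a, b):
    """beta[t][i] = Pr(observations after step t | state i at step t)."""
    n = len(a)
    beta = [[1.0] * n]            # nothing left to generate after the last step
    for k in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [
            sum(a[i][j] * b[j][k] * nxt[j] for j in range(n))
            for i in range(n)
        ])
    return beta


obs = [0, 1, 2]                   # dry, damp, soggy
alpha = forward(obs, PI, A, B)
beta = backward(obs, A, B)
# Pr(observed sequence) computed at every time step; all values should agree.
likelihoods = [
    sum(alpha[t][i] * beta[t][i] for i in range(len(PI)))
    for t in range(len(obs))
]
print(likelihoods)
```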
Summary
Patterns usually do not appear in isolation but as part of a sequence over time. We make an assumption about such a process: the occurrence of a state depends only on the states of the preceding n time units. This gives an n-th order Markov chain, and the simplest case is the first-order Markov chain.
In many cases the real state sequence cannot be observed directly, but can be observed indirectly with a certain probability. What we observe is a second sequence that is probabilistically related to the hidden one. This lets us define a hidden Markov model, which has proved valuable in several fields, especially speech recognition.
Three problems are associated with such models of real sequences:
- Evaluation: what is the probability that a given model generated an observed sequence? This problem can be solved using the forward algorithm.
- Decoding: given a model and an observed sequence, what is the most likely hidden sequence? This problem can be solved using the Viterbi algorithm.
- Learning: given an observed sequence, how can we estimate the parameters of the model? This problem can be solved using the forward-backward algorithm.
Hidden Markov models have great value in analyzing real systems, but they also have drawbacks. One of the biggest is the oversimplification introduced by the underlying assumptions: a state depends only on the state immediately before it, and this dependency does not change over time.
For more details about HMMs, see:
L. R. Rabiner and B. H. Juang, 'An Introduction to HMMs', IEEE ASSP Magazine, 3, 4-16.