http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/viterbi_algorithm/s1_pg1.html

**Finding the most probable sequence of hidden states**

Given an HMM and an observation sequence, we often want to find the sequence of hidden states that most probably generated the observations.

1. Exhaustive search for a solution

We can visualise the relationship between hidden states and observations using an execution trellis:

The hidden sequence with the greatest probability can be found by listing every possible hidden sequence and calculating, for each one, the probability of the observed sequence.

For example, for the observation sequence in the trellis above, the probabilities of the possible hidden sequences are:

This method is very intuitive, but the amount of computation is very large. As with the forward algorithm, we can reduce the computational complexity by exploiting the time invariance of the probabilities.
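As a concrete illustration of the exhaustive approach, here is a minimal sketch in Python. The two-state weather HMM below (state names, symbols, and all probabilities) is invented for this example and is not taken from the article:

```python
from itertools import product

# Hypothetical two-state HMM (all numbers invented for illustration).
states = ["sunny", "rainy"]
pi = {"sunny": 0.6, "rainy": 0.4}                 # initial state probabilities
a = {"sunny": {"sunny": 0.7, "rainy": 0.3},       # transition probabilities a(j, i)
     "rainy": {"sunny": 0.4, "rainy": 0.6}}
b = {"sunny": {"dry": 0.8, "damp": 0.2},          # observation probabilities b(i, k)
     "rainy": {"dry": 0.3, "damp": 0.7}}

def brute_force_decode(observations):
    """Enumerate every hidden sequence and keep the most probable one."""
    best_seq, best_p = None, -1.0
    for seq in product(states, repeat=len(observations)):
        p = pi[seq[0]] * b[seq[0]][observations[0]]
        for t in range(1, len(seq)):
            p *= a[seq[t - 1]][seq[t]] * b[seq[t]][observations[t]]
        if p > best_p:
            best_seq, best_p = seq, p
    return best_seq, best_p
```

For n states and T observations this enumerates n^T sequences, which is exactly the explosion the recursive method avoids.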

2. Reducing complexity using recursion

We now consider a recursive method for finding the most probable hidden sequence. First we define the partial probability of reaching an intermediate state (hereafter written Y). This partial probability differs from the one calculated in the forward algorithm: here Y represents the probability of the single most probable path to a state at time t, not the sum over all paths.

2a. Partial probabilities (Y's) and partial best paths

For the trellis above, each intermediate and terminating state has one most probable path that reaches it. For example, at t = 3 the most probable paths to the three states might look like this:

These paths are called partial best paths. Each partial best path has an associated probability, the partial probability Y: Y(i, t) is the maximum, over all sequences reaching state i at time t, of the sequence probability, and the partial best path is the sequence that achieves it. In particular, at t = T (the final time step) every state has a partial probability and a partial best path. The globally best path is then found by choosing the final state with the largest partial probability and taking its partial best path.

2b. Calculating Y's at time t = 1

At t = 1 there are no paths leading into a state, so we use the probability that the system starts in that state, combined with the probability of observing the first symbol k1 from it:

Y(i, 1) = pi(i) * b(i, k1)

where pi(i) is the initial probability of state i and b(i, k1) is the probability of observing k1 in state i.

This is the same as the corresponding calculation in the forward algorithm.
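In code, the t = 1 step is a single pass over the states. The numbers below are hypothetical, chosen only to make the arithmetic visible:

```python
# Hypothetical two-state HMM (all numbers invented for illustration).
pi = {"sunny": 0.6, "rainy": 0.4}            # initial state probabilities pi(i)
b = {"sunny": {"dry": 0.8, "damp": 0.2},     # observation probabilities b(i, k)
     "rainy": {"dry": 0.3, "damp": 0.7}}

# Y(i, 1) = pi(i) * b(i, k1): probability of starting in state i
# and emitting the first observed symbol k1.
k1 = "dry"
Y1 = {i: pi[i] * b[i][k1] for i in pi}
# Y1["sunny"] is about 0.48, Y1["rainy"] about 0.12
```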

2c. Calculating Y's at time t > 1

To compute the most probable path to some state X at time t, note that this path must pass through one of the states A, B, or C at time t - 1. The most probable path to X is therefore the most probable of the sequences ending in A to X, B to X, or C to X. By the first-order Markov assumption, the probability of the current state depends only on the previous state, so the probability of the most probable path ending with the transition A to X is:

Pr(most probable path to A) * Pr(X | A) * Pr(observation at t | X)

Generalising, the partial probability Y(i, t) of the best path reaching state i at time t, with observed symbol kt, is:

Y(i, t) = max over j of [ Y(j, t - 1) * a(j, i) * b(i, kt) ]

where a(j, i) is the probability of a transition from state j to state i. The first factor is given by the Y's at time t - 1, the second by the transition probabilities, and the third by the observation probabilities.
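A single recursion step can be sketched as follows. The transition and observation numbers are hypothetical, and Y_prev stands for the column of Y values already computed for time t - 1:

```python
# Hypothetical two-state HMM (all numbers invented for illustration).
a = {"sunny": {"sunny": 0.7, "rainy": 0.3},   # transition probabilities a(j, i)
     "rainy": {"sunny": 0.4, "rainy": 0.6}}
b = {"sunny": {"dry": 0.8, "damp": 0.2},      # observation probabilities b(i, k)
     "rainy": {"dry": 0.3, "damp": 0.7}}

Y_prev = {"sunny": 0.48, "rainy": 0.12}       # Y(., t-1), already computed
k_t = "damp"                                  # symbol observed at time t

# Y(i, t) = max_j [ Y(j, t-1) * a(j, i) ] * b(i, k_t)
Y_t = {i: max(Y_prev[j] * a[j][i] for j in Y_prev) * b[i][k_t]
       for i in Y_prev}
# Y_t["sunny"] is about 0.0672, Y_t["rainy"] about 0.1008
```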

2d. Back pointers, F's

At each intermediate and terminating state we can compute the partial probability Y(i, t). However, our goal is to find the most probable path for the given observation sequence, so we also need some way of recording the partial best paths themselves.

When calculating the partial probability Y at time t, we only need the values of Y at time t - 1. Having calculated each partial probability, we can also record which state at t - 1 produced it. This is done by keeping a back pointer F for each state: F(i, t) points to the state at the previous time step from which the current Y(i, t) was generated:

F(i, t) = argmax over j of [ Y(j, t - 1) * a(j, i) ]

Note that this expression is obtained from the Y values of the previous step and the transition probabilities only; it does not include the observation probability (unlike the calculation of Y itself). This is because we want F to answer the question "which state most probably led to the state I am in now?", a question about the hidden states alone. The observation term is the same for every candidate predecessor, so it cannot affect which one is chosen, and the confusion introduced by the observations can safely be ignored.
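The back pointer is the argmax of the same product, minus the observation term. A sketch with the same kind of hypothetical numbers; multiplying every candidate by the same b(i, k_t) could not change which predecessor wins:

```python
# Hypothetical two-state HMM (numbers invented for illustration).
a = {"sunny": {"sunny": 0.7, "rainy": 0.3},   # transition probabilities a(j, i)
     "rainy": {"sunny": 0.4, "rainy": 0.6}}
Y_prev = {"sunny": 0.48, "rainy": 0.12}       # Y(., t-1) from the previous step

# F(i, t) = argmax_j [ Y(j, t-1) * a(j, i) ] -- no observation probability:
# b(i, k_t) is a constant factor for fixed i, so it cannot affect the argmax.
F_t = {i: max(Y_prev, key=lambda j: Y_prev[j] * a[j][i]) for i in Y_prev}
# Here both states at time t are best reached from "sunny".
```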

2e. Advantages of the approach

Using the Viterbi algorithm to decode an observation sequence has two advantages:

1. The computational complexity is reduced.
2. The Viterbi algorithm bases its answer on the entire observation sequence, giving the best interpretation of the whole.

An alternative way of deriving the state sequence would be to pick, at each time step, the individually most likely state. Because such a method decides each state locally rather than in the context of the whole sequence, it may deviate from the correct answer.

The Viterbi algorithm, by contrast, considers the whole observation sequence before backtracking through the F pointers to find the most probable path.

3. Section summary

The Viterbi algorithm provides a computationally efficient way of analysing an HMM observation sequence to find the most probable hidden sequence. It computes a partial probability for each node of the trellis, together with a back pointer F indicating how that node was reached. When the computation is complete, the whole best path is recovered by following the back pointers.

**Viterbi algorithm definition**

1. Formal definition of the algorithm

For i = 1, ..., n:

Y(i, 1) = pi(i) * b(i, k1)

The partial probabilities at t = 1 are calculated from the initial hidden state probabilities and the observation probabilities.

For t = 2, ..., T and i = 1, ..., n:

Y(i, t) = max over j of [ Y(j, t - 1) * a(j, i) ] * b(i, kt)

F(i, t) = argmax over j of [ Y(j, t - 1) * a(j, i) ]

This gives the most probable path to each state at time t and records how that state is reached.

i(T) = argmax over i of Y(i, T)

This records which state is most likely at time t = T.

For t = T - 1, ..., 1:

i(t) = F(i(t + 1), t + 1)

Backtracking through the pointers in this way yields the entire most probable path.
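The three steps of the formal definition translate directly into code. This is a minimal sketch, not the article's own implementation, and the two-state HMM used to exercise it is hypothetical:

```python
def viterbi(observations, states, pi, a, b):
    """Return the most probable hidden sequence and its probability."""
    # Step 1: initialise Y at t = 1 from pi and the first observation.
    Y = [{i: pi[i] * b[i][observations[0]] for i in states}]
    F = [{}]                                   # no back pointers at t = 1
    # Step 2: for t = 2, ..., T compute Y(i, t) and record F(i, t).
    for k in observations[1:]:
        prev, Y_t, F_t = Y[-1], {}, {}
        for i in states:
            j_best = max(states, key=lambda j: prev[j] * a[j][i])
            F_t[i] = j_best
            Y_t[i] = prev[j_best] * a[j_best][i] * b[i][k]
        Y.append(Y_t)
        F.append(F_t)
    # Step 3: pick the most likely final state, then backtrack through F.
    last = max(states, key=lambda i: Y[-1][i])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(F[t][path[-1]])
    return list(reversed(path)), Y[-1][last]

# Hypothetical two-state HMM (all numbers invented for illustration).
states = ["sunny", "rainy"]
pi = {"sunny": 0.6, "rainy": 0.4}
a = {"sunny": {"sunny": 0.7, "rainy": 0.3},
     "rainy": {"sunny": 0.4, "rainy": 0.6}}
b = {"sunny": {"dry": 0.8, "damp": 0.2},
     "rainy": {"dry": 0.3, "damp": 0.7}}

path, p = viterbi(["dry", "damp", "damp"], states, pi, a, b)
```

Exhaustively enumerating all 2^3 hidden sequences for the same inputs gives the same answer, which is a useful sanity check for toy models.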

2. Calculating the individual Y's and F's

The calculation here is similar to that in the forward algorithm, except that the summation used in the forward algorithm is replaced by a max operation. This is because the forward algorithm computes the total probability of reaching a state, whereas here we compute the maximum probability.
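This kinship can be made literal by parameterising the trellis sweep on the combining operation: sum gives the forward algorithm's total probability, max gives the Viterbi partial probability (back pointers omitted here). The HMM numbers are again hypothetical:

```python
# Hypothetical two-state HMM (all numbers invented for illustration).
pi = {"sunny": 0.6, "rainy": 0.4}
a = {"sunny": {"sunny": 0.7, "rainy": 0.3},
     "rainy": {"sunny": 0.4, "rainy": 0.6}}
b = {"sunny": {"dry": 0.8, "damp": 0.2},
     "rainy": {"dry": 0.3, "damp": 0.7}}

def trellis(observations, combine):
    """One trellis sweep; combine is sum (forward) or max (Viterbi)."""
    col = {i: pi[i] * b[i][observations[0]] for i in pi}
    for k in observations[1:]:
        col = {i: combine(col[j] * a[j][i] for j in pi) * b[i][k]
               for i in pi}
    return col

obs = ["dry", "damp"]
total_prob = sum(trellis(obs, sum).values())  # forward: Pr(observations)
best_prob = max(trellis(obs, max).values())   # Viterbi: Pr(best single path)
```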

The third of the HMM series: Viterbi algorithm