HMM Learning Best Practices (Five): The Forward Algorithm


1. Exhaustive search
Given a hidden Markov model whose parameters (pi, A, B) are known, we want to find the probability of an observation sequence. Consider the weather example again: we have an HMM that describes the weather and the closely related humidity state of seaweed, and we also have a sequence of seaweed humidity observations. Suppose the seaweed humidity observed over three consecutive days is (dry, damp, soggy), and each of these three days may be sunny, cloudy or rainy. The observation sequence and the hidden states can be pictured as a trellis: each column of the trellis lists the possible weather states, and every state in a column is connected to every state in the adjacent column; each transition between states carries a probability given by the state transition matrix. Below each column is the observation at that point in time, and the probability of that observation given any hidden state comes from the confusion matrix.
One way to calculate the probability of the observation sequence is to enumerate every possible hidden state sequence and sum the probabilities of the observation sequence under each of them. For the weather example above there are 3^3 = 27 possible weather sequences, so the probability of the observation sequence is:
Pr(dry,damp,soggy | HMM) = Pr(dry,damp,soggy | sunny,sunny,sunny) + Pr(dry,damp,soggy | sunny,sunny,cloudy) + Pr(dry,damp,soggy | sunny,sunny,rainy) + ... + Pr(dry,damp,soggy | rainy,rainy,rainy)
Computing the observation probability this way is extremely expensive, especially for large models or long sequences, so we exploit the time invariance of these probabilities to reduce the complexity of the problem.

2. Using recursion to reduce the complexity of the problem
Given a hidden Markov model (HMM), we consider calculating the probability of an observation sequence recursively. We first define the partial probability (local probability), which is the probability of reaching an intermediate state in the trellis. We then describe how these partial probabilities are computed at t=1 and at t=n (n>1).
Suppose an observation sequence of length T is:

    o_1, o_2, ..., o_T
2a. Partial probabilities (α's)
Consider the trellis below, which shows the weather states and the first-order state transitions for the observation sequence dry, damp, soggy:

[figure: trellis of weather states over the three observation days]

We can calculate the probability of reaching an intermediate state in the trellis as the sum of the probabilities of all possible paths to that state.
For example, the partial probability of being in the "cloudy" state at t=2 is computed over the following paths:

[figure: the paths into the cloudy state at t=2]

We write the partial probability of being in state j at time t as α_t(j); this partial probability is calculated as:

α_t(j) = Pr(observation at t | hidden state is j) × Pr(all paths to state j at time t)

For the last observation, the partial probabilities include the probability of reaching those states through all possible paths; for example, for the trellis above the final partial probabilities are computed over the following paths:

[figure: the paths into each state at the final time step]

It can be seen that the sum of these final partial probabilities equals the sum of the probabilities of all possible paths through the trellis, and hence is the probability of the observation sequence given the HMM.

2b. Calculating the partial probabilities at t=1
We calculate partial probabilities according to the formula:

α_t(j) = Pr(observation at t | hidden state is j) × Pr(all paths to state j at time t)

In the special case t=1 there are no paths into the current state, so the probability of being in a state at t=1 is simply that state's initial probability, Pr(state j | t=1) = π(j). Thus the partial probability at t=1 equals the initial probability of the state multiplied by the associated observation probability:

α_1(j) = π(j) · b_j(o_1)

So the partial probability of state j at the initial time depends on that state's initial probability and the probability of the observation we see at that moment.

2c. Calculating the partial probabilities at t>1
We revisit the formula for calculating partial probabilities:

α_t(j) = Pr(observation at t | hidden state is j) × Pr(all paths to state j at time t)

We can assume (recursively) that the left-hand factor, Pr(observation at t | hidden state j), is already available — it is just the confusion matrix entry b_j(o_t) — and consider the right-hand factor, Pr(all paths to state j at time t).
To calculate the probability of all paths into a state, we can calculate the probability of each path to that state and sum them, for example:

Pr(all paths to state j at time t+1) = Σ_i α_t(i) · a_ij

The number of paths grows exponentially with the length of the observation sequence, but the α's at time t already account for all paths up to that point, so we can define the α's at time t+1 in terms of the α's at time t:

α_{t+1}(j) = b_j(o_{t+1}) · Σ_{i=1}^{n} α_t(i) · a_ij

That is, the probability we calculate equals the corresponding observation probability (the probability that state j emits the symbol observed at time t+1) multiplied by the total probability of arriving in that state at that moment — the sum, over the previous step's states, of each partial probability times the corresponding state transition probability.
Note that we now have an expression that calculates the partial probabilities at time t+1 using only the partial probabilities at time t.
We can therefore recursively compute the probability of an observation sequence given a hidden Markov model: use the α's at t=1 to compute the α's at t=2, use those to compute the α's at t=3, and so on up to t=T. The probability of the observation sequence given the HMM equals the sum of the partial probabilities at t=T.

2d. Reducing computational complexity
We can compare the cost of computing the observation sequence probability by exhaustive search (evaluation) and by the recursive forward algorithm.
We have an observation sequence O of length T and a hidden Markov model λ = (π, A, B) with n hidden states.
Exhaustive search evaluates every possible hidden state sequence:

Pr(O | λ) = Σ_{all state sequences q_1...q_T} π(q_1) · b_{q_1}(o_1) · a_{q_1 q_2} · b_{q_2}(o_2) ··· a_{q_{T-1} q_T} · b_{q_T}(o_T)

and sums the resulting probabilities — note that its cost grows exponentially with T. By contrast, the forward algorithm reuses the information computed in the previous step, so its cost is linear in T.
Note: the time complexity of exhaustive search is 2T·n^T, while the time complexity of the forward algorithm is n²·T, where T is the length of the observation sequence and n is the number of hidden states.

3. Summary
Our goal is to compute the probability Pr(observations | λ) of an observation sequence given a hidden Markov model.
We reduce the complexity of computing this probability by first computing partial probabilities (α's), where α_t(j) represents the probability of being in state j at time t having seen the observations up to that time.
At t=1 the partial probabilities are computed from the initial probabilities (the π vector) and the observation probabilities Pr(observation | state) (from the confusion matrix), while the partial probabilities at t>1 are computed from the partial probabilities at time t-1.
The problem is thus defined recursively: the probability of the observation sequence is found by computing the partial probabilities for t = 1, 2, ..., T and summing all the α's at t = T.
Note that computing the observation sequence probability this way costs far less than computing the probability of every state sequence and summing them (exhaustive search).

Forward algorithm definition

We use the forward algorithm to compute the probability of an observation sequence of length T:

    o_1, o_2, ..., o_T

where each o_t is one of the symbols in the observation set. The partial (intermediate) probabilities α are computed recursively: first calculate the partial probabilities of all states at t=1:

    α_1(j) = π(j) · b_j(o_1)

Then, at each point in time t = 2, ..., T, the partial probability of each state is computed by:

    α_t(j) = b_j(o_t) · Σ_{i=1}^{n} α_{t-1}(i) · a_ij

That is, the observation probability of the current state times the sum of all path probabilities arriving in that state, recursively reusing the values already computed at the previous point in time.
Finally, given the HMM, the probability of the observation sequence equals the sum of all partial probabilities at time T:

    Pr(O | λ) = Σ_{j=1}^{n} α_T(j)

Again, each partial probability (for t > 1) is calculated from the results of the previous time step.
For the weather example, the chart below shows the computation of the partial probability of the cloudy state at t = 2: it is the product of the corresponding observation probability b and the sum, over the previous moment's states, of each partial probability times the state transition probability:

[figure: computing α_2(cloudy) in the weather trellis]

(Note: on the similarity of this diagram to the one in part 4, the Viterbi algorithm, see the comments following the article; thanks to reader Yaseenta.)

Summary

We use the forward algorithm to compute the probability of an observation sequence given a hidden Markov model (HMM). The recursion lets us avoid exhaustively evaluating every path through the trellis.
Given this algorithm, it can be used directly to determine which of several HMMs best describes a known observation sequence: run the forward algorithm to evaluate the sequence under each HMM, then select the one with the highest probability.

The first thing to note is that this part is not a translation from the original series, but a supplement to the forward algorithm chapter, in the hope of illustrating the forward algorithm from a practical perspective: reading an implementation of it in code, and also working through the example problem from the original text.
The program presented here comes from UMDHMM, a C-language HMM toolkit (see "HMM implementations in several programming languages"). To describe the basic setup of the UMDHMM package: on Linux, enter the umdhmm-v1.02 directory and run "make all"; this produces four executables:
genseq: generates a symbol sequence using the specified model
testfor: computes log Prob(observation | model) using the forward algorithm
testvit: generates the most likely hidden state sequence for a given symbol sequence and HMM, using the Viterbi algorithm
esthmm: estimates the HMM from a given symbol sequence using the Baum-Welch algorithm
These executables read an HMM file and an observation symbol sequence file in fixed formats; the format requirements and examples are as follows:
HMM file format:
——————————————————————–
M= number of symbols
N= number of states
A:
a11 a12 ... a1N
a21 a22 ... a2N
 .   .  ...  .
 .   .  ...  .
 .   .  ...  .
aN1 aN2 ... aNN
B:
b11 b12 ... b1M
b21 b22 ... b2M
 .   .  ...  .
 .   .  ...  .
 .   .  ...  .
bN1 bN2 ... bNM
Pi:
pi1 pi2 ... piN
——————————————————————–

Hmm file Example:
——————————————————————–
M= 2
N= 3
A:
0.333 0.333 0.333
0.333 0.333 0.333
0.333 0.333 0.333
B:
0.5 0.5
0.75 0.25
0.25 0.75
Pi:
0.333 0.333 0.333
——————————————————————–

To observe the sequence file format:
——————————————————————–
T= sequence length
O1 O2 O3 ... OT
——————————————————————–

Example of an observation sequence file:
——————————————————————–
T= 10
1 1 1 1 2 1 2 2 2 2
——————————————————————–

For the forward algorithm test program testfor, run:
testfor model.hmm obs.seq
(model.hmm is the HMM file, obs.seq the observation sequence file.) This prints the log-probability of the observation sequence. Here we add one line of output after the logarithmic result, around line 58 of testfor.c:
fprintf(stdout, "prob(O| model) = %f\n", proba);
so that the probability value computed by the forward algorithm is printed as well. At this point all the preparation is complete; next we step into the code itself.
First we need the data structure for an HMM — its five basic elements — as defined in UMDHMM (in hmm.h):

typedef struct
{
    int N;       /* number of hidden states; Q={1,2,...,N} */
    int M;       /* number of observation symbols; V={1,2,...,M} */
    double **A;  /* state transition matrix A[1..N][1..N]; a[i][j] is the
                    probability of moving from state i at time t to state j
                    at time t+1 */
    double **B;  /* confusion (emission) matrix B[1..N][1..M]; b[j][k] is the
                    probability of observing symbol k in state j */
    double *pi;  /* initial state distribution pi[1..N]; pi[i] is the
                    probability of starting in state i */
} HMM;

The forward algorithm program is as follows (in forward.c):
/*
 * Function parameters:
 * *phmm:  the known HMM model;       T: length of the observation sequence;
 * *O:     the observation sequence;  **alpha: partial probabilities;
 * *pprob: the final observation probability
 */
void Forward(HMM *phmm, int T, int *O, double **alpha, double *pprob)
{
    int i, j;    /* state indices */
    int t;       /* time index */
    double sum;  /* intermediate sum for the partial probability */

    /* 1. Initialization: partial probabilities of all states at t=1 */
    for (i = 1; i <= phmm->N; i++)
        alpha[1][i] = phmm->pi[i] * phmm->B[i][O[1]];

    /* 2. Induction: recursively compute the partial probabilities at each
       time step t = 2, ..., T */
    for (t = 1; t < T; t++)
    {
        for (j = 1; j <= phmm->N; j++)
        {
            sum = 0.0;
            for (i = 1; i <= phmm->N; i++)
                sum += alpha[t][i] * phmm->A[i][j];
            alpha[t+1][j] = sum * phmm->B[j][O[t+1]];
        }
    }

    /* 3. Termination: the observation probability is the sum of all
       partial probabilities at time T */
    *pprob = 0.0;
    for (i = 1; i <= phmm->N; i++)
        *pprob += alpha[T][i];
}

In the next section I will use this program to verify the forward algorithm demo example from the original English text.

In the translated HMM series, the author provides an interactive example (applet) for the forward algorithm, which is one of the nice touches of the series; however, when it comes to actually running this example, there appears to be a problem.
First, how to use the interactive example: it requires a browser with Java applet support (I used Firefox). Enter an observation sequence, such as "dry,damp,soggy" or "dry damp soggy" (symbols separated by commas or spaces), in the box in front of the Set button, then click Set; this initializes the observation matrix. If you only want the overall result, Pr(observation sequence | HMM), click the Run button next to it; if you want to follow the calculation one step at a time — that is, see the partial probability at each node — step through it with the adjacent button instead.
The hidden Markov model defined in the original interactive example (the weather example) is as follows:
1. Hidden states (weather): sunny, cloudy, rainy;
2. Observed states (seaweed humidity): dry, dryish, damp, soggy;
3. Initial state probabilities: sunny (0.63), cloudy (0.17), rainy (0.20);
4. State transition matrix:

                        weather today
                        sunny   cloudy  rainy
  weather     sunny     0.500   0.375   0.125
  yesterday   cloudy    0.250   0.125   0.625
              rainy     0.250   0.375   0.375

5. Confusion matrix:

                        observed states
                        dry     dryish  damp    soggy
  hidden      sunny     0.60    0.20    0.15    0.05
  states      cloudy    0.25    0.25    0.25    0.25
              rainy     0.05    0.10    0.35    0.50

To run this example with UMDHMM, we convert the hidden Markov model of the weather example above into the following UMDHMM-format HMM file, weather.hmm:
——————————————————————–
M= 4
N= 3
A:
0.500 0.375 0.125
0.250 0.125 0.625
0.250 0.375 0.375
B:
0.60 0.20 0.15 0.05
0.25 0.25 0.25 0.25
0.05 0.10 0.35 0.50
Pi:
0.63 0.17 0.20
——————————————————————–
Before running the example, readers who want to observe the result of each step can replace the void Forward(...) function in forward.c in the umdhmm-v1.02 directory with the following:
——————————————————————–
void Forward(HMM *phmm, int T, int *O, double **alpha, double *pprob)
{
    int i, j;    /* state indices */
    int t;       /* time index */
    double sum;  /* partial sum */

    /* 1. Initialization */
    for (i = 1; i <= phmm->N; i++)
    {
        alpha[1][i] = phmm->pi[i] * phmm->B[i][O[1]];
        printf("alpha[1][%d] = pi[%d] * b[%d][%d] = %f * %f = %f\n",
               i, i, i, O[1], phmm->pi[i], phmm->B[i][O[1]], alpha[1][i]);
    }

    /* 2. Induction */
    for (t = 1; t < T; t++)
    {
        for (j = 1; j <= phmm->N; j++)
        {
            sum = 0.0;
            for (i = 1; i <= phmm->N; i++)
            {
                sum += alpha[t][i] * phmm->A[i][j];
                printf("alpha[%d][%d] * a[%d][%d] = %f * %f = %f\n",
                       t, i, i, j, alpha[t][i], phmm->A[i][j],
                       alpha[t][i] * phmm->A[i][j]);
                printf("sum = %f\n", sum);
            }
            alpha[t+1][j] = sum * phmm->B[j][O[t+1]];
            printf("alpha[%d][%d] = sum * b[%d][%d] = %f * %f = %f\n",
                   t+1, j, j, O[t+1], sum, phmm->B[j][O[t+1]], alpha[t+1][j]);
        }
    }

    /* 3. Termination */
    *pprob = 0.0;
    for (i = 1; i <= phmm->N; i++)
    {
        *pprob += alpha[T][i];
        printf("alpha[%d][%d] = %f\n", T, i, alpha[T][i]);
        printf("pprob = %f\n", *pprob);
    }
}
——————————————————————–
After the replacement, run "make clean" and then "make all" again, so that the new testfor executable prints the result of each step of the forward algorithm.
Now we use testfor to run the original text's default observation sequence "dry,damp,soggy", which corresponds to the UMDHMM-readable sequence file test1.seq:
——————————————————————–
T=3
1 3 4
——————————————————————–
With everything prepared, enter the following command:
testfor weather.hmm test1.seq > RESULT1
RESULT1 contains all the result details:
——————————————————————–
Forward without scaling
alpha[1][1] = pi[1] * b[1][1] = 0.630000 * 0.600000 = 0.378000
alpha[1][2] = pi[2] * b[2][1] = 0.170000 * 0.250000 = 0.042500
alpha[1][3] = pi[3] * b[3][1] = 0.200000 * 0.050000 = 0.010000
...
pprob = 0.026901
log prob(O| model) = -3.615577E+00
prob(O| model) = 0.026901
...
——————————————————————–
The key line is the final observation sequence probability: in this example, Pr(observation sequence | HMM) = 0.026901.
However, clicking the Run button in the original interactive example gives: probability of this model = 0.027386915.
Where exactly does the difference lie? Let's look closely at the intermediate steps.
The initialization and the partial probabilities at t=1 agree; no problem there. But at t=2, the partial probability of the hidden state "sunny" differs. The example in the original English text gives:

alpha = ((0.37800002*0.5) + (0.0425*0.375) + (0.010000001*0.125)) * 0.15 = 0.03092813

The result given by UMDHMM is:
——————————————————————–
alpha[1][1] * a[1][1] = 0.378000 * 0.500000 = 0.189000
sum = 0.189000
alpha[1][2] * a[2][1] = 0.042500 * 0.250000 = 0.010625
sum = 0.199625
alpha[1][3] * a[3][1] = 0.010000 * 0.250000 = 0.002500
sum = 0.202125
alpha[2][1] = sum * b[1][3] = 0.202125 * 0.150000 = 0.030319
——————————————————————–
The difference lies in the choice of state transition probabilities: the original applet uses the first row of the state transition matrix, while UMDHMM uses the first column. Looking at the state transition matrix given in the original text, the first row holds the probabilities of moving from yesterday's state "sunny" to today's states "sunny", "cloudy" and "rainy", while the first column holds the probabilities of moving from yesterday's states "sunny", "cloudy" and "rainy" to today's state "sunny". Since we are summing over the previous day's states into today's "sunny", the column is the right choice, and the original applet's calculation appears to be wrong; the reader may wish to try a few more examples. That concludes the forward algorithm chapter.

The original English tutorial series: http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html

Note: original article; when reprinting, please credit the source "I Love Natural Language Processing": www.52nlp.cn

This article link address: http://www.52nlp.cn/hmm-learn-best-practices-five-forward-algorithm-5
