HMM (2): The forward-backward algorithm of the hidden Markov model for evaluating the probability of an observation sequence


HMM (1): The hidden Markov model

HMM (2): The forward-backward algorithm for evaluating the probability of an observation sequence

HMM (3): The Baum-Welch algorithm for estimating HMM parameters (TODO)

HMM (4): The Viterbi algorithm for decoding the hidden state sequence (TODO)

In HMM (1): The hidden Markov model, we covered the basics of the HMM and its three fundamental problems. Here we focus on the solution to the first problem: given the model and an observation sequence, find the probability that the observation sequence occurs.

1. Review of HMM problem one: finding the probability of an observation sequence

First, let us restate the problem. We know the HMM model parameters $\lambda = (A, B, \Pi)$, where $A$ is the hidden state transition probability matrix, $B$ is the observation state generation probability matrix, and $\Pi$ is the initial hidden state probability distribution. We have also obtained an observation sequence $O = \{o_1, o_2, ..., o_T\}$, and we now want the conditional probability $P(O|\lambda)$ of the observation sequence $O$ under the model $\lambda$.

At first glance the problem looks simple. Since we know all the transition probabilities between hidden states and all the probabilities of generating each observation from each hidden state, we can solve it by brute force.

We can enumerate all possible hidden state sequences of length $T$, $I = \{i_1, i_2, ..., i_T\}$, compute for each the joint probability $P(O, I|\lambda)$ of that hidden sequence together with the observation sequence $O = \{o_1, o_2, ..., o_T\}$, and then easily obtain the marginal distribution $P(O|\lambda)$.

The brute-force solution works as follows. First, the probability that any particular hidden state sequence $I = \{i_1, i_2, ..., i_T\}$ occurs is: $$P(I|\lambda) = \pi_{i_1} a_{i_1 i_2} a_{i_2 i_3} \ldots a_{i_{T-1}\; i_T}$$

For a fixed hidden state sequence $I = \{i_1, i_2, ..., i_T\}$, the probability of the required observation sequence $O = \{o_1, o_2, ..., o_T\}$ is: $$P(O|I, \lambda) = b_{i_1}(o_1) b_{i_2}(o_2) \ldots b_{i_T}(o_T)$$

Then the probability that $O$ and $I$ occur together is: $$P(O, I|\lambda) = P(I|\lambda) P(O|I, \lambda) = \pi_{i_1} b_{i_1}(o_1) a_{i_1 i_2} b_{i_2}(o_2) \ldots a_{i_{T-1}\; i_T} b_{i_T}(o_T)$$

Finally, marginalizing over the hidden sequences gives the conditional probability $P(O|\lambda)$ of the observation sequence $O$ under the model $\lambda$: $$P(O|\lambda) = \sum\limits_{I} P(O, I|\lambda) = \sum\limits_{i_1, i_2, \ldots, i_T} \pi_{i_1} b_{i_1}(o_1) a_{i_1 i_2} b_{i_2}(o_2) \ldots a_{i_{T-1}\; i_T} b_{i_T}(o_T)$$

Although the above method works, it becomes impractical when the number of hidden states $N$ is large: there are $N^T$ possible hidden state sequences, so the time complexity is of order $O(TN^T)$. For models with only a few hidden states we can still use the brute-force solution to get the probability of the observation sequence, but with many hidden states this algorithm is far too slow, and we need a more efficient one.
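As an illustration only, here is a minimal NumPy sketch of the brute-force enumeration (the function name `brute_force_prob` and the encoding of $A$, $B$, $\Pi$ as arrays with observations as integer column indices into $B$ are assumptions made for this example):

```python
import itertools
import numpy as np

def brute_force_prob(A, B, pi, obs):
    """Sum P(O, I | lambda) over all N^T hidden state sequences I."""
    N = A.shape[0]          # number of hidden states
    T = len(obs)            # length of the observation sequence
    total = 0.0
    for I in itertools.product(range(N), repeat=T):   # all N^T hidden sequences
        # P(O, I | lambda) = pi_{i1} b_{i1}(o_1) a_{i1 i2} b_{i2}(o_2) ...
        p = pi[I[0]] * B[I[0], obs[0]]
        for t in range(1, T):
            p *= A[I[t - 1], I[t]] * B[I[t], obs[t]]
        total += p
    return total
```

The loop over `itertools.product` makes the $O(TN^T)$ cost explicit: every extra time step multiplies the number of enumerated sequences by $N$.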

The forward-backward algorithm is a way to solve this problem with low time complexity.

2. Finding the probability of an HMM observation sequence with the forward algorithm

The forward-backward algorithm is the collective name for the forward algorithm and the backward algorithm, either of which can be used to find the probability of an HMM observation sequence. Let's first look at how the forward algorithm solves this problem.

The forward algorithm is essentially a dynamic programming algorithm: we need to find a recursion over a local state, so that the solutions of the sub-problems extend to a solution of the whole problem.

In the forward algorithm, the local state of the dynamic program is the "forward probability". What is the forward probability? The definition is simple: it is the probability that, at time $t$, the hidden state is $q_i$ and the observed sequence so far is $o_1, o_2, ..., o_t$. It is written as: $$\alpha_t(i) = P(o_1, o_2, ..., o_t, i_t = q_i | \lambda)$$

Since this is dynamic programming, we need a recursion. Suppose we have already found the forward probabilities of every hidden state at time $t$; we now need to derive the forward probabilities of every hidden state at time $t+1$.

Starting from the forward probability of each hidden state at time $t$ and multiplying by the corresponding transition probability, the quantity $\alpha_t(j) a_{ji}$ is the probability of observing $o_1, o_2, ..., o_t$ up to time $t$, being in hidden state $q_j$ at time $t$, and being in hidden state $q_i$ at time $t+1$. Summing over all hidden states $j$, $\sum\limits_{j=1}^N \alpha_t(j) a_{ji}$ is the probability of observing $o_1, o_2, ..., o_t$ up to time $t$ and being in hidden state $q_i$ at time $t+1$. Further, since the observation $o_{t+1}$ depends only on the hidden state at time $t+1$, $\big[\sum\limits_{j=1}^N \alpha_t(j) a_{ji}\big] b_i(o_{t+1})$ is the probability of observing $o_1, o_2, ..., o_t, o_{t+1}$ up to time $t+1$ and being in hidden state $q_i$ at time $t+1$. But this is precisely the forward probability of hidden state $q_i$ at time $t+1$, so we obtain the recursion: $$\alpha_{t+1}(i) = \big[\sum\limits_{j=1}^N \alpha_t(j) a_{ji}\big] b_i(o_{t+1})$$

Our dynamic program starts at time 1 and ends at time $T$. Since $\alpha_T(i)$ is the probability that the observation sequence is $o_1, o_2, ..., o_T$ and the hidden state at time $T$ is $q_i$, summing over all hidden states, $\sum\limits_{i=1}^N \alpha_T(i)$, gives the probability that the observation sequence is $o_1, o_2, ..., o_T$.

The following summarizes the forward algorithm.

Input: HMM model $\lambda = (A, B, \Pi)$, observation sequence $O = (o_1, o_2, ..., o_T)$

Output: the observation sequence probability $P(O|\lambda)$

1) Compute the forward probability of each hidden state at time 1: $$\alpha_1(i) = \pi_i b_i(o_1), \; i = 1, 2, ..., N$$

2) Recursively compute the forward probabilities at times $2, 3, ..., T$: $$\alpha_{t+1}(i) = \big[\sum\limits_{j=1}^N \alpha_t(j) a_{ji}\big] b_i(o_{t+1}), \; i = 1, 2, ..., N$$

3) Compute the final result: $$P(O|\lambda) = \sum\limits_{i=1}^N \alpha_T(i)$$

From the recursion it can be seen that the time complexity of the algorithm is $O(TN^2)$, many orders of magnitude less than the $O(TN^T)$ of the brute-force solution.
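A minimal NumPy sketch of the three steps above might look as follows (the function name `forward` and the convention that observations are integer indices into the columns of $B$ are illustrative assumptions, not part of the original text):

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward algorithm: returns the T x N matrix of forward probabilities
    alpha_t(i) and the observation sequence probability P(O | lambda)."""
    N = A.shape[0]
    T = len(obs)
    alpha = np.zeros((T, N))
    # Step 1: initialization, alpha_1(i) = pi_i * b_i(o_1)
    alpha[0] = pi * B[:, obs[0]]
    # Step 2: recursion, alpha_{t+1}(i) = [sum_j alpha_t(j) a_{ji}] * b_i(o_{t+1})
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    # Step 3: termination, P(O | lambda) = sum_i alpha_T(i)
    return alpha, alpha[-1].sum()
```

The single matrix-vector product per time step is where the $O(TN^2)$ complexity comes from.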

3. A worked example of the HMM forward algorithm

Here we use the box-and-ball example from HMM (1): The hidden Markov model to show how the forward probabilities are calculated.

Our observation set is: $$V = \{\text{red}, \text{white}\}, \quad M = 2$$

Our state set is: $$Q = \{\text{box 1}, \text{box 2}, \text{box 3}\}, \quad N = 3$$

The length of the observation sequence and the state sequence is 3.

The initial state distribution is: $$\Pi = (0.2, 0.4, 0.4)^T$$

The state transition probability distribution matrix is:

$$A = \left(\begin{array}{ccc} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.2 & 0.3 & 0.5 \end{array}\right)$$

The observed state probability matrix is:

$$B = \left(\begin{array}{cc} 0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3 \end{array}\right)$$

The observation sequence of ball colors is: $$O = \{\text{red}, \text{white}, \text{red}\}$$

Following the forward algorithm from the previous section, we first calculate the forward probabilities of the three hidden states at time 1.

At time 1 the observation is a red ball. The probability that the hidden state is box 1 is: $$\alpha_1(1) = \pi_1 b_1(o_1) = 0.2 \times 0.5 = 0.1$$

The probability that the hidden state is box 2 is: $$\alpha_1(2) = \pi_2 b_2(o_1) = 0.4 \times 0.4 = 0.16$$

The probability that the hidden state is box 3 is: $$\alpha_1(3) = \pi_3 b_3(o_1) = 0.4 \times 0.7 = 0.28$$

Now we can start the recursion, computing the forward probabilities of the three hidden states at time 2.

At time 2 the observation is a white ball. The probability that the hidden state is box 1 is: $$\alpha_2(1) = \big[\sum\limits_{i=1}^3 \alpha_1(i) a_{i1}\big] b_1(o_2) = [0.1 \times 0.5 + 0.16 \times 0.3 + 0.28 \times 0.2] \times 0.5 = 0.077$$

The probability that the hidden state is box 2 is: $$\alpha_2(2) = \big[\sum\limits_{i=1}^3 \alpha_1(i) a_{i2}\big] b_2(o_2) = [0.1 \times 0.2 + 0.16 \times 0.5 + 0.28 \times 0.3] \times 0.6 = 0.1104$$

The probability that the hidden state is box 3 is: $$\alpha_2(3) = \big[\sum\limits_{i=1}^3 \alpha_1(i) a_{i3}\big] b_3(o_2) = [0.1 \times 0.3 + 0.16 \times 0.2 + 0.28 \times 0.5] \times 0.3 = 0.0606$$

Continuing the recursion, we compute the forward probabilities of the three hidden states at time 3.

At time 3 the observation is a red ball. The probability that the hidden state is box 1 is: $$\alpha_3(1) = \big[\sum\limits_{i=1}^3 \alpha_2(i) a_{i1}\big] b_1(o_3) = [0.077 \times 0.5 + 0.1104 \times 0.3 + 0.0606 \times 0.2] \times 0.5 = 0.04187$$

The probability that the hidden state is box 2 is: $$\alpha_3(2) = \big[\sum\limits_{i=1}^3 \alpha_2(i) a_{i2}\big] b_2(o_3) = [0.077 \times 0.2 + 0.1104 \times 0.5 + 0.0606 \times 0.3] \times 0.4 = 0.03551$$

The probability that the hidden state is box 3 is: $$\alpha_3(3) = \big[\sum\limits_{i=1}^3 \alpha_2(i) a_{i3}\big] b_3(o_3) = [0.077 \times 0.3 + 0.1104 \times 0.2 + 0.0606 \times 0.5] \times 0.7 = 0.05284$$

Finally, the probability of the observation sequence $O = \{\text{red}, \text{white}, \text{red}\}$ is: $$P(O|\lambda) = \sum\limits_{i=1}^3 \alpha_3(i) = 0.13022$$
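Assuming the `forward` sketch from the previous section, this worked example can be reproduced as follows (the variable names and the encoding red = 0, white = 1 are illustrative choices):

```python
import numpy as np

# Box-and-ball example parameters from this section (red = 0, white = 1)
A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5],
              [0.4, 0.6],
              [0.7, 0.3]])
pi = np.array([0.2, 0.4, 0.4])
obs = [0, 1, 0]                 # red, white, red

alpha, prob = forward(A, B, pi, obs)
print(alpha)   # rows should match alpha_1, alpha_2, alpha_3 computed above
print(prob)    # approximately 0.13022
```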

4. Finding the probability of an HMM observation sequence with the backward algorithm

Now that we are familiar with using the forward algorithm to find the probability of an HMM observation sequence, let's see how the backward algorithm does the same job.

The backward algorithm is very similar to the forward algorithm: both are dynamic programming, and the only difference is the choice of local state. The backward algorithm uses the "backward probability". So how is the backward probability defined?

Define the backward probability as the probability that, given the hidden state at time $t$ is $q_i$, the observation sequence from time $t+1$ to the final time $T$ is $o_{t+1}, o_{t+2}, ..., o_T$. It is written as: $$\beta_t(i) = P(o_{t+1}, o_{t+2}, ..., o_T | i_t = q_i, \lambda)$$

The dynamic programming recursion for the backward probability runs in the opposite direction from that of the forward probability. Suppose we have already found the backward probability $\beta_{t+1}(j)$ of each hidden state at time $t+1$, and we now need the backward probabilities of each hidden state at time $t$. Given that the hidden state at time $t$ is $q_i$, the quantity $a_{ij}\beta_{t+1}(j)$ is the probability that the hidden state at time $t+1$ is $q_j$ and that the observations from time $t+2$ onward are $o_{t+2}, o_{t+3}, ..., o_T$. Multiplying in the emission probability of $o_{t+1}$, $a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)$ is the probability that the hidden state at time $t+1$ is $q_j$ and that the observations from time $t+1$ onward are $o_{t+1}, o_{t+2}, ..., o_T$. Summing over all hidden states $j$, $\sum\limits_{j=1}^{N} a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)$ is the probability that, given the hidden state at time $t$ is $q_i$, the observations from time $t+1$ onward are $o_{t+1}, o_{t+2}, ..., o_T$. This is exactly the backward probability at time $t$.

So we obtain the recursion for the backward probability: $$\beta_{t}(i) = \sum\limits_{j=1}^{N} a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)$$

Now we summarize the flow of the backward algorithm, noting the similarities and differences with the forward algorithm:

Input: HMM model $\lambda = (A, B, \Pi)$, observation sequence $O = (o_1, o_2, ..., o_T)$

Output: the observation sequence probability $P(O|\lambda)$

1) Initialize the backward probability of each hidden state at the final time $T$: $$\beta_T(i) = 1, \; i = 1, 2, ..., N$$

2) Recursively compute the backward probabilities at times $T-1, T-2, ..., 1$: $$\beta_{t}(i) = \sum\limits_{j=1}^{N} a_{ij} b_j(o_{t+1}) \beta_{t+1}(j), \; i = 1, 2, ..., N$$

3) Compute the final result: $$P(O|\lambda) = \sum\limits_{i=1}^N \pi_i b_i(o_1) \beta_1(i)$$

The time complexity of this algorithm is still $O(TN^2)$.
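A minimal NumPy sketch of the backward algorithm, mirroring the `forward` sketch above and using the same illustrative array conventions, might look like this:

```python
import numpy as np

def backward(A, B, pi, obs):
    """Backward algorithm: returns the T x N matrix of backward probabilities
    beta_t(i) and the observation sequence probability P(O | lambda)."""
    N = A.shape[0]
    T = len(obs)
    beta = np.zeros((T, N))
    # Step 1: initialization, beta_T(i) = 1
    beta[-1] = 1.0
    # Step 2: recursion, beta_t(i) = sum_j a_{ij} b_j(o_{t+1}) beta_{t+1}(j)
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    # Step 3: termination, P(O | lambda) = sum_i pi_i b_i(o_1) beta_1(i)
    return beta, (pi * B[:, obs[0]] * beta[0]).sum()
```

On the box-and-ball example above it should return approximately 0.13022, the same value obtained with the forward algorithm.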

5. Common HMM probability calculations

Using the forward and backward probabilities, we can derive the formulas for single-state and two-state probabilities in an HMM.

1) Given the model $\lambda$ and the observation sequence $O$, the probability of being in state $q_i$ at time $t$ is written as: $$\gamma_t(i) = P(i_t = q_i | O, \lambda) = \frac{P(i_t = q_i, O|\lambda)}{P(O|\lambda)}$$

Using the definitions of the forward and backward probabilities: $$P(i_t = q_i, O|\lambda) = \alpha_t(i) \beta_t(i)$$

So we get: $$\gamma_t(i) = \frac{\alpha_t(i) \beta_t(i)}{\sum\limits_{j=1}^N \alpha_t(j) \beta_t(j)}$$

2) Given the model $\lambda$ and the observation sequence $O$, the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$ is written as: $$\xi_t(i,j) = P(i_t = q_i, i_{t+1} = q_j | O, \lambda) = \frac{P(i_t = q_i, i_{t+1} = q_j, O|\lambda)}{P(O|\lambda)}$$

And $P(i_t = q_i, i_{t+1} = q_j, O|\lambda)$ can be expressed using the forward and backward probabilities: $$P(i_t = q_i, i_{t+1} = q_j, O|\lambda) = \alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)$$

Thus we finally obtain the expression for $\xi_t(i,j)$: $$\xi_t(i,j) = \frac{\alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)}{\sum\limits_{r=1}^N \sum\limits_{s=1}^N \alpha_t(r) a_{rs} b_s(o_{t+1}) \beta_{t+1}(s)}$$
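Putting the `forward` and `backward` sketches together, $\gamma_t(i)$ and $\xi_t(i,j)$ can be computed as below (a sketch under the same assumptions; `gamma_xi` is an illustrative name, and the denominator uses the fact that both normalizing sums equal $P(O|\lambda)$):

```python
import numpy as np

def gamma_xi(A, B, alpha, beta, obs):
    """Single-state and two-state posteriors gamma_t(i) and xi_t(i, j)."""
    T, N = alpha.shape
    prob = alpha[-1].sum()                 # P(O | lambda)
    # gamma_t(i) = alpha_t(i) beta_t(i) / P(O | lambda)
    gamma = alpha * beta / prob
    # xi_t(i, j) = alpha_t(i) a_{ij} b_j(o_{t+1}) beta_{t+1}(j) / P(O | lambda)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = (alpha[t][:, None] * A
                 * (B[:, obs[t + 1]] * beta[t + 1])[None, :]) / prob
    return gamma, xi
```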

3) Summing $\gamma_t(i)$ and $\xi_t(i,j)$ over the time index $t$, we obtain:

The expected number of times state $i$ appears under the observation sequence $O$: $\sum\limits_{t=1}^T \gamma_t(i)$

The expected number of transitions out of state $i$ under the observation sequence $O$: $\sum\limits_{t=1}^{T-1} \gamma_t(i)$

The expected number of transitions from state $i$ to state $j$ under the observation sequence $O$: $\sum\limits_{t=1}^{T-1} \xi_t(i,j)$

The probabilities above are used when solving HMM problem two, that is, estimating the HMM model parameters. We will discuss that problem and its solution in the third article of this series.

(Reprinting is welcome; please indicate the source. Comments and discussion are welcome: [email protected])
