The Principles of the Hidden Markov Model (HMM)

Source: Internet
Author: User

This article covers the three elements of the hidden Markov model, its three assumptions, and the three basic problems it is used to solve.

1. Introduction

The hidden Markov model (HMM) is a probabilistic model of time series. It describes a process in which a hidden Markov chain generates an unobservable state sequence, and each state in turn generates an observation, yielding an observation sequence. Two probabilistic relationships drive this process: the transitions between states, and the dependence of each observation on the current state. To simplify the later discussion, we first define some notation:

Suppose there are N hidden states in total. The state set is:

$$Q = \{q_1, q_2, \dots, q_N\}$$

Suppose there are M possible observations. The observation set is:

$$V = \{v_1, v_2, \dots, v_M\}$$

Note: The number of hidden states and the number of observed states are not necessarily the same.

2. Model Overview

The hidden Markov model has three elements: the initial state vector $\pi$, the state transition probability matrix A, and the observation probability matrix B. Together these three elements fully determine a model, which can be written as:

$$\lambda = (A, B, \pi)$$

1) Initial state vector. In which state does the model start? The initial state vector answers this by giving the probability of each state at time t = 1. With N states, the vector has length N:

$$\pi = (\pi_1, \pi_2, \dots, \pi_N)$$

where $\pi_i = P(i_1 = q_i)$ is the probability that state $q_i$ is the initial state, and $\sum_{i=1}^{N} \pi_i = 1$.

2) State transition probability matrix A. This matrix gives the transition probabilities between states. Because there are N states, its size is N×N:

$$A = [a_{ij}]_{N \times N}$$

where $a_{ij} = P(i_{t+1} = q_j \mid i_t = q_i)$ is the probability of moving directly from state $q_i$ to state $q_j$. Each row of A sums to 1.

3) Observation probability matrix B. This matrix gives the probability that each observation is generated by each state. Its size is N×M:

$$B = [b_j(k)]_{N \times M}$$

where $b_j(k) = P(o_t = v_k \mid i_t = q_j)$ is the probability that state $q_j$ generates observation $v_k$. As with A, each row of B sums to 1.

Note: These three elements do not depend on the time step t; the same initial state vector, transition matrix A, and observation matrix B apply at every position in the sequence.
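The generating process described by the three elements can be sketched in a few lines of Python. This is only an illustration: the function name `sample_hmm`, the 0-based state and observation indices, and the toy parameter values are assumptions that do not appear in the original text.

```python
import random

def sample_hmm(pi, A, B, T, seed=0):
    """Draw a state sequence and an observation sequence of length T."""
    rng = random.Random(seed)
    n_states, n_obs = len(pi), len(B[0])
    states, observations = [], []
    # The first state is drawn from the initial state vector pi.
    state = rng.choices(range(n_states), weights=pi)[0]
    for _ in range(T):
        states.append(state)
        # The observation depends only on the current state (a row of B).
        observations.append(rng.choices(range(n_obs), weights=B[state])[0])
        # The next state depends only on the current state (a row of A).
        state = rng.choices(range(n_states), weights=A[state])[0]
    return states, observations

# Toy model (illustrative values): 3 hidden states, 2 observation symbols.
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
B = [[0.5, 0.5],
     [0.4, 0.6],
     [0.7, 0.3]]

states, observations = sample_hmm(pi, A, B, T=5)
print(states, observations)
```

In a real application only `observations` would be visible; `states` is the hidden part of the process.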

In addition to the three elements above, the hidden Markov model rests on three assumptions.

1) Homogeneous Markov assumption. Also called the first-order Markov assumption: the state at any time depends only on the state at the previous time, and on no other states or observations. In symbols:

$$P(i_t \mid i_{t-1}, o_{t-1}, \dots, i_1, o_1) = P(i_t \mid i_{t-1})$$

Generalization: in an n-th order Markov model, the state at any time depends only on the states at the previous n times, and on no others.

2) Observation independence assumption. The observation at any time depends only on the state at that time, and on no other states or observations:

$$P(o_t \mid i_T, o_T, \dots, i_1, o_1) = P(o_t \mid i_t)$$

3) Parameter invariance assumption. The three elements described above do not change over time: the same parameters govern every time step of the sequence.

Having introduced the three elements and the three assumptions, we now turn to the three basic problems: probability calculation, prediction, and learning. Each is described in detail below.

3. Probability calculation problem

Problem description: given the model parameters $\lambda = (A, B, \pi)$, compute the probability $P(O \mid \lambda)$ that a given observation sequence occurs. Let the state sequence be $I = (i_1, i_2, \dots, i_T)$ and the corresponding observation sequence be $O = (o_1, o_2, \dots, o_T)$.

1) Brute-force calculation -- time complexity $O(T N^T)$

We start with the most direct method. For a fixed state sequence I, its probability is:

$$P(I \mid \lambda) = \pi_{i_1} a_{i_1 i_2} a_{i_2 i_3} \cdots a_{i_{T-1} i_T}$$

Given that state sequence, the probability of the observation sequence O is:

$$P(O \mid I, \lambda) = b_{i_1}(o_1) b_{i_2}(o_2) \cdots b_{i_T}(o_T)$$

Multiplying the two gives the joint probability that the state sequence I and the observation sequence O occur together:

$$P(O, I \mid \lambda) = P(O \mid I, \lambda) P(I \mid \lambda) = \pi_{i_1} b_{i_1}(o_1) a_{i_1 i_2} b_{i_2}(o_2) \cdots a_{i_{T-1} i_T} b_{i_T}(o_T)$$

Finally, summing over all possible state sequences I gives the probability of the observation sequence O:

$$P(O \mid \lambda) = \sum_{I} P(O \mid I, \lambda) P(I \mid \lambda)$$

The state sequence I has length T and each state takes one of N values, so there are $N^T$ possible state sequences. Evaluating the joint probability for each of them costs O(T), so the total time complexity is $O(T N^T)$. This is far too high for the method to be practical.
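For very short sequences the brute-force sum can still be written out directly, which is useful as a reference when checking faster algorithms. The sketch below assumes 0-based indices for states and observations; the function name `brute_force_likelihood` and the toy parameters are illustrative and not part of the original text.

```python
from itertools import product

def brute_force_likelihood(pi, A, B, obs):
    """Sum P(O, I | lambda) over every possible state sequence I."""
    n_states = len(pi)
    total = 0.0
    # Enumerate all N**T state sequences of the same length as obs.
    for path in product(range(n_states), repeat=len(obs)):
        # Joint probability of this path and the observations.
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total

# Toy model (illustrative values): 3 hidden states, 2 observation symbols.
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
B = [[0.5, 0.5],
     [0.4, 0.6],
     [0.7, 0.3]]
obs = [0, 1, 0]

print(brute_force_likelihood(pi, A, B, obs))  # approximately 0.130218
```

With N = 3 and T = 3 the loop visits only 27 paths, but the count grows as $N^T$, which is why this approach does not scale.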

2) Forward algorithm -- time complexity $O(T N^2)$

First, define the forward probability: the probability that the state at time t is $q_i$ and that the observations up to time t are $o_1, o_2, \dots, o_t$:

$$\alpha_t(i) = P(o_1, o_2, \dots, o_t, i_t = q_i \mid \lambda)$$

By the homogeneous Markov assumption and the observation independence assumption, each observation depends only on the state at the same time, and each state depends only on the state at the previous time. The forward probability at time t+1 is therefore:

$$\alpha_{t+1}(i) = \left[ \sum_{j=1}^{N} \alpha_t(j) a_{ji} \right] b_i(o_{t+1})$$

The observation at time t+1 depends only on the state $q_i$ at time t+1, which contributes the factor $b_i(o_{t+1})$. To reach state $q_i$, we sum over every state $q_j$ at the previous time t, multiplying its forward probability $\alpha_t(j)$ by the transition probability $a_{ji}$ from $q_j$ to $q_i$. This gives the recursive formula for the forward probability.

Figure 3.1 Forward probability recursion

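The forward algorithm proceeds as follows, where $\alpha_t(i)$ denotes the forward probability of state $q_i$ at time t:

A) Initialization:

$$\alpha_1(i) = \pi_i b_i(o_1), \quad i = 1, 2, \dots, N$$

B) Recursion from front to back, for t = 1, 2, ..., T-1:

$$\alpha_{t+1}(i) = \left[ \sum_{j=1}^{N} \alpha_t(j) a_{ji} \right] b_i(o_{t+1}), \quad i = 1, 2, \dots, N$$

C) Termination: sum all the forward probabilities at time T:

$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$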

The forward algorithm iterates over the times t = 1 to T. At each time it visits each of the N possible states, and for each state it sums over the transitions from all N states at the previous time. The overall time complexity is therefore $O(T N^2)$.
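The three steps of the forward algorithm translate almost line by line into code. The following is a minimal sketch under the assumption of 0-based indices; the function name `forward` and the toy parameters are illustrative, and no scaling is applied, so very long sequences would underflow.

```python
def forward(pi, A, B, obs):
    """Return the table of forward probabilities and P(O | lambda)."""
    n = len(pi)
    # A) Initialization: alpha_1(i) = pi_i * b_i(o_1).
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    # B) Recursion: sum over all previous states j, then emit o_{t+1}.
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([
            sum(prev[j] * A[j][i] for j in range(n)) * B[i][obs[t]]
            for i in range(n)
        ])
    # C) Termination: sum the forward probabilities at the last time.
    return alpha, sum(alpha[-1])

# Toy model (illustrative values): 3 hidden states, 2 observation symbols.
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
B = [[0.5, 0.5],
     [0.4, 0.6],
     [0.7, 0.3]]
obs = [0, 1, 0]

alpha, likelihood = forward(pi, A, B, obs)
print(likelihood)  # approximately 0.130218
```

The two nested loops over states inside the loop over time are where the $O(T N^2)$ cost comes from.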

3) Backward algorithm -- time complexity $O(T N^2)$

The backward algorithm works from time T back toward time 1, relating each time t to the next time t+1. Given that the state at time t is $q_i$, let the observations from time t+1 to time T be $o_{t+1}, o_{t+2}, \dots, o_T$. The backward probability of state $q_i$ at time t is:

$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \dots, o_T \mid i_t = q_i, \lambda)$$

By convention, set:

$$\beta_T(i) = 1, \quad i = 1, 2, \dots, N$$

Now consider how the backward probability changes from time t+1 to time t. The backward probability at time t depends only on the states at time t+1. We therefore need only consider, for each of the N states $q_j$ at time t+1, the probability $a_{ij}$ of moving from state $q_i$ at time t to $q_j$, the probability $b_j(o_{t+1})$ that $q_j$ generates the observation $o_{t+1}$, and the backward probability $\beta_{t+1}(j)$ of $q_j$.

The recursive formula is:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)$$

The recursion for the backward probability is shown in Figure 3.2 (the diagram looks somewhat like a layer of a neural network).

Figure 3.2 Backward probability recursion

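The backward algorithm proceeds as follows, where $\beta_t(i)$ denotes the backward probability of state $q_i$ at time t:

A) Initialization:

$$\beta_T(i) = 1, \quad i = 1, 2, \dots, N$$

B) Recursion from back to front, for t = T-1, T-2, ..., 1:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij} b_j(o_{t+1}) \beta_{t+1}(j), \quad i = 1, 2, \dots, N$$

C) Termination, on reaching the initial time:

$$P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i b_i(o_1) \beta_1(i)$$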
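The backward recursion can be sketched in the same style as the forward one. As before, the function name `backward`, the 0-based indices, and the toy parameters are assumptions made for illustration only.

```python
def backward(pi, A, B, obs):
    """Return the table of backward probabilities and P(O | lambda)."""
    n, T = len(pi), len(obs)
    # A) Initialization: beta_T(i) = 1 for every state.
    beta = [[1.0] * n for _ in range(T)]
    # B) Recursion from the second-to-last time back to the first.
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n)
            )
    # C) Termination: combine beta_1 with the initial state and first emission.
    likelihood = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(n))
    return beta, likelihood

# Toy model (illustrative values): 3 hidden states, 2 observation symbols.
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
B = [[0.5, 0.5],
     [0.4, 0.6],
     [0.7, 0.3]]
obs = [0, 1, 0]

beta, likelihood = backward(pi, A, B, obs)
print(likelihood)  # approximately 0.130218
```

On the same model and observations, this value should agree with the one produced by the forward recursion, which makes the two algorithms a convenient check on each other.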

4. Prediction problem

Problem description: given the model parameters and an observation sequence, find the most probable state sequence corresponding to that observation sequence.

There are two main methods for computing the most probable state sequence.

1) Approximate algorithm -- greedy approach -- locally optimal

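The idea of this algorithm is to choose, at each time t, the state that is individually most probable given the observation sequence O. In terms of the forward probability $\alpha_t(i)$ and the backward probability $\beta_t(i)$, the probability of being in state $q_i$ at time t is:

$$\gamma_t(i) = P(i_t = q_i \mid O, \lambda) = \frac{\alpha_t(i) \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j) \beta_t(j)}$$

Then select the most probable state at each time t:

$$i_t' = \arg\max_{1 \le i \le N} \gamma_t(i), \quad t = 1, 2, \dots, T$$

Finally, combining the most probable state at each time gives the state sequence $I' = (i_1', i_2', \dots, i_T')$. This sequence is optimal at each individual time, but it is not guaranteed to be the optimal sequence as a whole; it may even contain adjacent states whose transition probability is zero.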
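The greedy selection can be sketched by combining a forward pass and a backward pass and taking the per-time maximum. The helper names (`forward_table`, `backward_table`, `approximate_decode`), the 0-based indices, and the toy parameters are assumptions for illustration, not part of the original text.

```python
def forward_table(pi, A, B, obs):
    """Forward probabilities alpha[t][i]."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([sum(prev[j] * A[j][i] for j in range(n)) * B[i][obs[t]]
                      for i in range(n)])
    return alpha

def backward_table(A, B, obs):
    """Backward probabilities beta[t][i]."""
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta

def approximate_decode(pi, A, B, obs):
    """Pick, independently at each time, the state with the largest gamma."""
    alpha = forward_table(pi, A, B, obs)
    beta = backward_table(A, B, obs)
    path = []
    for a_row, b_row in zip(alpha, beta):
        weights = [a * b for a, b in zip(a_row, b_row)]
        total = sum(weights)
        gamma = [w / total for w in weights]  # gamma_t(i), sums to 1
        path.append(max(range(len(gamma)), key=gamma.__getitem__))
    return path

# Toy model (illustrative values): 3 hidden states, 2 observation symbols.
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
B = [[0.5, 0.5],
     [0.4, 0.6],
     [0.7, 0.3]]
obs = [0, 1, 0]

print(approximate_decode(pi, A, B, obs))  # [2, 1, 2]
```

Because each time step is decided on its own, nothing in this procedure checks that consecutive choices form a likely (or even possible) path.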

2) Viterbi algorithm -- dynamic programming

To obtain a globally optimal state sequence, we use dynamic programming. Each state sequence is treated as a path, and each state as a node on that path.

The key property is optimal substructure. Suppose the globally optimal path passes through state $i_{t'}$ at time t'. Then its first part, $i_1, i_2, \dots, i_{t'}$, must be the best of all partial paths that end in state $i_{t'}$ at time t'. Otherwise, some other partial path ending in the same state would have a higher probability, and substituting it would give a better overall path, which contradicts the optimality of the original path. We can therefore compute the best partial path ending in each state time by time, until time T.

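Define $\delta_t(i)$ as the maximum probability over all single paths $(i_1, i_2, \dots, i_t)$ that end in state i at time t:

$$\delta_t(i) = \max_{i_1, i_2, \dots, i_{t-1}} P(i_t = i, i_{t-1}, \dots, i_1, o_t, \dots, o_1 \mid \lambda), \quad i = 1, 2, \dots, N$$

At time t+1:

$$\delta_{t+1}(i) = \max_{1 \le j \le N} \left[ \delta_t(j) a_{ji} \right] b_i(o_{t+1})$$

The recursion above yields the probability of the best path but not the path itself, so at each step we also record which previous state achieved the maximum:

$$\psi_t(i) = \arg\max_{1 \le j \le N} \left[ \delta_{t-1}(j) a_{ji} \right]$$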

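The Viterbi algorithm proceeds as follows, where $\delta_t(i)$ is the maximum probability of any path ending in state i at time t and $\psi_t(i)$ is the previous state on that path:

1) Initialization. At t = 1:

$$\delta_1(i) = \pi_i b_i(o_1), \quad \psi_1(i) = 0, \quad i = 1, 2, \dots, N$$

2) Recursion. For t = 2, 3, ..., T:

$$\delta_t(i) = \max_{1 \le j \le N} \left[ \delta_{t-1}(j) a_{ji} \right] b_i(o_t), \quad \psi_t(i) = \arg\max_{1 \le j \le N} \left[ \delta_{t-1}(j) a_{ji} \right]$$

3) Termination. The maximum path probability and the last state of the best path are:

$$P' = \max_{1 \le i \le N} \delta_T(i), \quad i_T' = \arg\max_{1 \le i \le N} \delta_T(i)$$

4) Backtracking. For t = T-1, T-2, ..., 1:

$$i_t' = \psi_{t+1}(i_{t+1}')$$

The optimal path is $I' = (i_1', i_2', \dots, i_T')$.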
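The four steps above can be sketched as follows. The function name `viterbi`, the 0-based indices, and the toy parameters are illustrative assumptions; probabilities are multiplied directly, so a practical version for long sequences would typically work with logarithms instead.

```python
def viterbi(pi, A, B, obs):
    """Return the most probable state path and its probability."""
    n, T = len(pi), len(obs)
    # 1) Initialization.
    delta = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    psi = [[0] * n]
    # 2) Recursion: keep the best predecessor for every state.
    for t in range(1, T):
        prev = delta[-1]
        d_row, p_row = [], []
        for i in range(n):
            best_j = max(range(n), key=lambda j: prev[j] * A[j][i])
            p_row.append(best_j)
            d_row.append(prev[best_j] * A[best_j][i] * B[i][obs[t]])
        delta.append(d_row)
        psi.append(p_row)
    # 3) Termination: best final state.
    last = max(range(n), key=lambda i: delta[-1][i])
    # 4) Backtracking through the recorded predecessors.
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return path, delta[-1][last]

# Toy model (illustrative values): 3 hidden states, 2 observation symbols.
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
B = [[0.5, 0.5],
     [0.4, 0.6],
     [0.7, 0.3]]
obs = [0, 1, 0]

path, prob = viterbi(pi, A, B, obs)
print(path, prob)  # [2, 2, 2] with probability approximately 0.0147
```

On this toy model the result differs from choosing the individually most probable state at each time, which illustrates the gap between a locally optimal and a globally optimal sequence.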

5. Learning problem

Problem description: given observation sequences, estimate the model parameters.

There are two cases. In the first, both the observation sequences and the corresponding state sequences are given, which calls for supervised learning. In the second, only the observation sequences are given, with no state sequences, which calls for unsupervised learning. The two cases are discussed separately below:

1) Supervised learning -- observation sequences with corresponding state sequences -- maximum likelihood estimation

Suppose the training set consists of S observation sequences with their corresponding state sequences: $\{(O_1, I_1), (O_2, I_2), \dots, (O_S, I_S)\}$. In this case the model parameters can be estimated directly by maximum likelihood.

A) State transition probability. Let $A_{ij}$ be the number of times the samples contain a direct transition from state i to state j. The transition probability from state i to state j is estimated as:

$$\hat{a}_{ij} = \frac{A_{ij}}{\sum_{j=1}^{N} A_{ij}}$$

B) Observation probability. Let $B_{jk}$ be the number of times the samples contain observation k while the state is j. The probability that state j generates observation k is estimated as:

$$\hat{b}_j(k) = \frac{B_{jk}}{\sum_{k=1}^{M} B_{jk}}$$

C) Initial state probability. The estimate $\hat{\pi}_i$ is the fraction of the S training sequences whose first state is $q_i$. With a single training sequence, this gives probability 1 to its first state and 0 to all others.
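These counting estimates can be sketched as below. The function name `estimate_supervised`, the data layout (a list of `(observations, states)` pairs with 0-based indices), and the tiny training set are assumptions made for illustration. Rows with no counts are left as zeros here; smoothing is a common refinement that the text does not cover.

```python
def estimate_supervised(samples, n_states, n_obs):
    """Maximum likelihood estimates of pi, A and B from labelled sequences."""
    init = [0] * n_states
    trans = [[0] * n_states for _ in range(n_states)]
    emit = [[0] * n_obs for _ in range(n_states)]
    for obs, states in samples:
        init[states[0]] += 1                 # first state of each sequence
        for s, o in zip(states, obs):
            emit[s][o] += 1                  # state s generated observation o
        for s, s_next in zip(states, states[1:]):
            trans[s][s_next] += 1            # direct transition s -> s_next

    def normalize(row):
        total = sum(row)
        return [c / total if total else 0.0 for c in row]

    pi = normalize(init)
    A = [normalize(row) for row in trans]
    B = [normalize(row) for row in emit]
    return pi, A, B

# Two short labelled sequences (illustrative data): (observations, states).
samples = [
    ([0, 1, 0], [0, 0, 1]),
    ([1, 1], [1, 0]),
]
pi, A, B = estimate_supervised(samples, n_states=2, n_obs=2)
print(pi)  # [0.5, 0.5]
print(A)   # [[0.5, 0.5], [1.0, 0.0]]
```

Each estimate is simply a count divided by its row total, which is what the three formulas above express.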

2) Unsupervised learning -- observation sequence only, no corresponding state sequence -- forward-backward algorithm

The forward algorithm and the backward algorithm were introduced above. Combining the forward probability $\alpha_t(i)$ and the backward probability $\beta_t(i)$, the probability that the observation sequence is O and the state at time t is $q_i$ is:

$$P(i_t = q_i, O \mid \lambda) = \alpha_t(i) \beta_t(i)$$

This formula follows from combining the forward part (up to time t) with the backward part (after time t), as illustrated in Figure 5.1.

Figure 5.1 State diagram at time t

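Given the observation sequence O, the probability that the state at time t is $q_i$ is:

$$\gamma_t(i) = P(i_t = q_i \mid O, \lambda) = \frac{\alpha_t(i) \beta_t(i)}{P(O \mid \lambda)} = \frac{\alpha_t(i) \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j) \beta_t(j)}$$

Therefore, given the observation sequence O, the expected number of times state $q_i$ occurs is:

$$\sum_{t=1}^{T} \gamma_t(i)$$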

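Given the observation sequence O, the probability that the state is $q_i$ at time t and $q_j$ at time t+1 is:

$$\xi_t(i, j) = P(i_t = q_i, i_{t+1} = q_j \mid O, \lambda) = \frac{P(i_t = q_i, i_{t+1} = q_j, O \mid \lambda)}{P(O \mid \lambda)}$$

Here the numerator, the joint probability of observing O with state $q_i$ at time t and state $q_j$ at time t+1, is:

$$P(i_t = q_i, i_{t+1} = q_j, O \mid \lambda) = \alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)$$

Since $b_i(o_t)$ is already included in $\alpha_t(i)$, only $b_j(o_{t+1})$ appears explicitly in this formula.

Figure 5.2 Transition from time t to time t+1

The probability of the observation sequence O can then also be written as:

$$P(O \mid \lambda) = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)$$

The expected number of transitions from state i to state j is then:

$$\sum_{t=1}^{T-1} \xi_t(i, j)$$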

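Now we can re-estimate the model parameters. Here $\gamma_t(i)$ is the probability of being in state $q_i$ at time t given O, and $\xi_t(i, j)$ is the probability of being in state $q_i$ at time t and in state $q_j$ at time t+1 given O:

A) Initial state probability vector: $\pi = (\pi_1, \pi_2, \dots, \pi_N)$, where

$$\pi_i = \gamma_1(i)$$

B) State transition probability matrix:

$$a_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$

C) Observation probability matrix: $B = [b_j(k)]$, where

$$b_j(k) = \frac{\sum_{t=1,\, o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$

To estimate the probability that state j generates observation $v_k$, the denominator accumulates the expected number of times the model is in state j over all times, and the numerator accumulates it only over those times at which $v_k$ is observed.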
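One round of these re-estimation formulas can be sketched as follows for a single observation sequence. The function name `baum_welch_step`, the 0-based indices, and the starting parameters are assumptions for illustration; the sketch applies no scaling and assumes every state has non-zero expected occupancy, so it is only suitable for short sequences.

```python
def baum_welch_step(pi, A, B, obs):
    """One re-estimation round; also returns P(O | lambda) for the input model."""
    n, m, T = len(pi), len(B[0]), len(obs)
    # Forward probabilities alpha[t][i].
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][j] * A[j][i] for j in range(n)) * B[i][obs[t]]
                      for i in range(n)])
    # Backward probabilities beta[t][i].
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    likelihood = sum(alpha[-1])
    # gamma[t][i]: probability of state i at time t given the observations.
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    # xi[t][i][j]: probability of state i at t and state j at t+1.
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    # Re-estimated parameters.
    new_pi = list(gamma[0])
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(m)] for j in range(n)]
    return new_pi, new_A, new_B, likelihood

# Starting model (illustrative values) and a short observation sequence.
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
B = [[0.5, 0.5],
     [0.4, 0.6],
     [0.7, 0.3]]
obs = [0, 1, 0]

new_pi, new_A, new_B, old_likelihood = baum_welch_step(pi, A, B, obs)
print(old_likelihood)  # approximately 0.130218
```

In practice the step is repeated, feeding the re-estimated parameters back in, until the likelihood stops improving.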

6. Summary

The hidden Markov model is defined by three elements and rests on three assumptions, and it is used to solve three basic problems. The first is the probability calculation problem, which is solved with the forward algorithm or the backward algorithm. Combining these two algorithms solves the learning problem, in which the model parameters are estimated from observation sequences alone. The last is the prediction problem for the state sequence, which is solved mainly with the Viterbi algorithm; it uses dynamic programming so that the resulting state sequence is globally optimal.

In addition, this article only states the formulas of the forward-backward algorithm without deriving them. In fact, these formulas can be derived using the EM algorithm.

