Statistical Learning Methods, Hang Li: Chapter 10, Hidden Markov Models


Hidden Markov models (hidden Markov model, HMM) are statistical learning models that can be used for tagging problems. They describe the process of randomly generating an observation sequence from a hidden Markov chain, and belong to the class of generative models.

10.1 Basic concepts of hidden Markov models

Definition 10.1 (hidden Markov model) A hidden Markov model is a probabilistic model of time sequences. It describes the process by which a hidden Markov chain randomly generates an unobservable random sequence of states, and each state in turn generates an observation, yielding a random sequence of observations. The sequence of states randomly generated by the hidden Markov chain is called the state sequence; each state generates an observation, and the resulting random sequence of observations is called the observation sequence. Each position in the sequence can be viewed as a moment in time.

A hidden Markov model is determined by its initial probability distribution, state transition probability distribution, and observation probability distribution. The formal definition of the hidden Markov model is as follows.

Let $Q$ be the set of all possible states and $V$ the set of all possible observations:

$$Q = \{q_1, q_2, \dots, q_N\}, \qquad V = \{v_1, v_2, \dots, v_M\}$$

where $N$ is the number of possible states and $M$ is the number of possible observations.

$I = (i_1, i_2, \dots, i_T)$ is a state sequence of length $T$, and $O = (o_1, o_2, \dots, o_T)$ is the corresponding observation sequence.

$A$ is the state transition probability matrix:

$$A = [a_{ij}]_{N \times N}, \qquad a_{ij} = P(i_{t+1} = q_j \mid i_t = q_i), \quad i = 1, 2, \dots, N; \; j = 1, 2, \dots, N$$

where $a_{ij}$ is the probability of transferring to state $q_j$ at time $t+1$ given that the chain is in state $q_i$ at time $t$.

$B$ is the observation probability matrix:

$$B = [b_j(k)]_{N \times M}, \qquad b_j(k) = P(o_t = v_k \mid i_t = q_j), \quad k = 1, 2, \dots, M; \; j = 1, 2, \dots, N$$

where $b_j(k)$ is the probability of generating observation $v_k$ given that the chain is in state $q_j$ at time $t$.

$\pi$ is the initial state probability vector:

$$\pi = (\pi_i), \qquad \pi_i = P(i_1 = q_i), \quad i = 1, 2, \dots, N$$

where $\pi_i$ is the probability of being in state $q_i$ at time $t = 1$.

A hidden Markov model $\lambda$ can be expressed in ternary notation, i.e. $\lambda = (A, B, \pi)$.

The hidden Markov model is determined by the initial state probability vector $\pi$, the state transition probability matrix $A$, and the observation probability matrix $B$. $A$, $B$, and $\pi$ are called the three elements of the hidden Markov model.

The state transition probability matrix $A$ and the initial state probability vector $\pi$ determine the hidden Markov chain and generate the unobservable state sequence.

The observation probability matrix $B$ determines how observations are generated from states; together with the state sequence, it produces the observation sequence.
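As a concrete illustration, the three elements can be stored as plain arrays. Below is a minimal sketch in Python; the numbers describe a hypothetical three-state, two-symbol model and are made up purely for illustration, not taken from the text:

```python
import numpy as np

# Hypothetical HMM with N = 3 states and M = 2 observation symbols.
# All numbers are illustrative only.
A = np.array([[0.5, 0.2, 0.3],   # transition matrix, A[i, j] = P(i_{t+1}=q_j | i_t=q_i)
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5],        # observation matrix, B[j, k] = P(o_t=v_k | i_t=q_j)
              [0.4, 0.6],
              [0.7, 0.3]])
pi = np.array([0.2, 0.4, 0.4])   # initial state distribution, pi[i] = P(i_1=q_i)
```

Each row of A and B is a probability distribution and must sum to 1, as must pi.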

The hidden Markov model makes two basic assumptions:

(1) Homogeneous Markov assumption: the state of the hidden Markov chain at any time $t$ depends only on its state at the previous time; it is independent of the states and observations at other times and of the time $t$ itself:

$$P(i_t \mid i_{t-1}, o_{t-1}, \dots, i_1, o_1) = P(i_t \mid i_{t-1}), \quad t = 1, 2, \dots, T$$

(2) Observation independence assumption: the observation at any time depends only on the state of the Markov chain at that time, and is independent of other observations and states:

$$P(o_t \mid i_T, o_T, \dots, i_{t+1}, o_{t+1}, i_t, i_{t-1}, o_{t-1}, \dots, i_1, o_1) = P(o_t \mid i_t)$$

Hidden Markov models can be used for tagging, in which case the states correspond to tags. The tagging problem is to predict the tag sequence corresponding to a given observation sequence, under the assumption that the tagging data are generated by a hidden Markov model. We can then use the learning and prediction algorithms of the hidden Markov model for tagging.

According to the definition of the hidden Markov model, an observation sequence of length $T$ can be generated as follows:

(1) Generate the initial state $i_1$ according to the initial state distribution $\pi$.
(2) Set $t = 1$.
(3) Generate the observation $o_t$ according to the observation probability distribution $b_{i_t}(k)$ of state $i_t$.
(4) Generate the next state $i_{t+1}$ according to the state transition probability distribution $\{a_{i_t i_{t+1}}\}$ of state $i_t$.
(5) Set $t = t + 1$; if $t < T$, return to step (3); otherwise terminate.
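A minimal sketch of this generating process, assuming observations are encoded as integer indices into $V$ and reusing the illustrative arrays A, B, pi defined earlier:

```python
import numpy as np

def generate(A, B, pi, T, seed=None):
    """Sample a state sequence and an observation sequence of length T from an HMM."""
    rng = np.random.default_rng(seed)
    N, M = B.shape
    states, obs = [], []
    state = rng.choice(N, p=pi)          # draw i_1 from the initial distribution pi
    for t in range(T):
        states.append(state)
        obs.append(rng.choice(M, p=B[state]))   # draw o_t from b_{i_t}(.)
        state = rng.choice(N, p=A[state])       # draw i_{t+1} from a_{i_t, .}
    return states, obs

states, obs = generate(A, B, pi, T=5, seed=0)   # uses the example arrays above
```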

Three basic problems of hidden Markov models

(1) Probability calculation problem. Given the model $\lambda = (A, B, \pi)$ and an observation sequence $O = (o_1, o_2, \dots, o_T)$, calculate the probability $P(O \mid \lambda)$ that the observation sequence $O$ appears under the model $\lambda$.

(2) Learning problem. Given an observation sequence $O = (o_1, o_2, \dots, o_T)$, estimate the model parameters $\lambda = (A, B, \pi)$ so as to maximize the probability $P(O \mid \lambda)$ of the observation sequence under the model. The parameters are estimated by the method of maximum likelihood estimation.

(3) Prediction problem, also known as the decoding problem. Given the model $\lambda = (A, B, \pi)$ and an observation sequence $O = (o_1, o_2, \dots, o_T)$, find the state sequence $I = (i_1, i_2, \dots, i_T)$ that maximizes the conditional probability $P(I \mid O)$, that is, the most probable state sequence for the given observation sequence.

10.2 Probability calculation algorithms

Direct calculation method

The probability $P(O \mid \lambda)$ can be calculated directly by enumerating all possible state sequences $I = (i_1, i_2, \dots, i_T)$ of length $T$, computing the joint probability of each state sequence $I$ with the observation sequence $O = (o_1, o_2, \dots, o_T)$, and then summing over all possible state sequences:

$$P(O \mid \lambda) = \sum_I P(O \mid I, \lambda) P(I \mid \lambda)$$

However, the amount of computation is very large, of order $O(T N^T)$, so this algorithm is not feasible.
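For reference, the direct method can be written down in a few lines. The sketch below (same array conventions as above) enumerates all $N^T$ state sequences, which makes the exponential cost plain:

```python
from itertools import product
import numpy as np

def prob_direct(A, B, pi, obs):
    """P(O | lambda) by enumerating all N**T state sequences (intractable for real T)."""
    N, T = A.shape[0], len(obs)
    total = 0.0
    for seq in product(range(N), repeat=T):      # all N**T state sequences
        p = pi[seq[0]] * B[seq[0], obs[0]]       # P(i_1) * P(o_1 | i_1)
        for t in range(1, T):
            p *= A[seq[t-1], seq[t]] * B[seq[t], obs[t]]
        total += p
    return total
```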

Forward-backward algorithm

Forward algorithm

Definition 10.2 (forward probability) Given a hidden Markov model $\lambda$, the probability of the partial observation sequence $o_1, o_2, \dots, o_t$ up to time $t$ and of being in state $q_i$ at time $t$ is called the forward probability, written

$$\alpha_t(i) = P(o_1, o_2, \dots, o_t, i_t = q_i \mid \lambda)$$

The forward probabilities $\alpha_t(i)$ and the observation sequence probability $P(O \mid \lambda)$ can be obtained by recursion.

Step (1) initializes the forward probability, which is the joint probability of the state $i_1 = q_i$ and the observation $o_1$ at the initial time:

$$\alpha_1(i) = \pi_i b_i(o_1), \quad i = 1, 2, \dots, N$$

Step (2) is the recursion formula of the forward probability, which computes the forward probability of the partial observation sequence $o_1, o_2, \dots, o_t, o_{t+1}$ up to time $t+1$ and of being in state $q_i$ at time $t+1$ (see Figure 10.1):

$$\alpha_{t+1}(i) = \Big[ \sum_{j=1}^{N} \alpha_t(j) a_{ji} \Big] b_i(o_{t+1}), \quad i = 1, 2, \dots, N$$

Since $\alpha_t(j)$ is the forward probability of observing $o_1, o_2, \dots, o_t$ and being in state $q_j$ at time $t$, the product $\alpha_t(j) a_{ji}$ is the joint probability of observing $o_1, o_2, \dots, o_t$, being in state $q_j$ at time $t$, and reaching state $q_i$ at time $t+1$. Summing this product over all $N$ possible states $q_j$ at time $t$ gives the joint probability of observing $o_1, o_2, \dots, o_t$ and being in state $q_i$ at time $t+1$. The product of the value in square brackets and the observation probability $b_i(o_{t+1})$ is exactly the forward probability $\alpha_{t+1}(i)$ of observing $o_1, o_2, \dots, o_t, o_{t+1}$ and being in state $q_i$ at time $t+1$.

Step (3) gives $P(O \mid \lambda)$. Because

$$\alpha_T(i) = P(o_1, o_2, \dots, o_T, i_T = q_i \mid \lambda)$$

we have

$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$

The forward algorithm is in fact a recursion based on the path structure of state sequences. The key to its efficiency is that forward probabilities are computed locally and then "recursed" to the global level along the path structure. Specifically, at time $t = 1$ the $N$ values $\alpha_1(i)$ ($i = 1, 2, \dots, N$) are computed; then at each time $t = 1, 2, \dots, T-1$, the $N$ values $\alpha_{t+1}(i)$ ($i = 1, 2, \dots, N$) are computed, and each computation of $\alpha_{t+1}(i)$ uses the previous values $\alpha_t(j)$.

The reason the computation is reduced is that each step directly uses the results of the previous time, avoiding repeated calculation. The computational cost of the forward algorithm is of order $O(N^2 T)$, rather than the $O(T N^T)$ of direct calculation.
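A sketch of the forward algorithm under the same array conventions (in 0-based Python indexing, alpha[t, i] corresponds to $\alpha_{t+1}(i)$):

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward algorithm: returns the full alpha table and P(O | lambda)."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                  # step (1): alpha_1(i) = pi_i b_i(o_1)
    for t in range(T - 1):
        # step (2): alpha_{t+1}(i) = [sum_j alpha_t(j) a_{ji}] * b_i(o_{t+1})
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
    return alpha, alpha[-1].sum()                 # step (3): sum_i alpha_T(i)
```

On the same observation sequence, forward and prob_direct should agree up to floating-point error, while forward scales to realistic sequence lengths.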

Backward algorithm

Definition 10.3 (backward probability) Given a hidden Markov model $\lambda$, the probability of the partial observation sequence $o_{t+1}, o_{t+2}, \dots, o_T$ from time $t+1$ to $T$, conditional on being in state $q_i$ at time $t$, is called the backward probability, written

$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \dots, o_T \mid i_t = q_i, \lambda)$$

The backward probabilities $\beta_t(i)$ and the observation sequence probability $P(O \mid \lambda)$ can be obtained by recursion.

Step (1) initializes the backward probability to 1 for all states $q_i$ at the final time:

$$\beta_T(i) = 1, \quad i = 1, 2, \dots, N$$

Step (2) is the recursion formula of the backward probability, as shown in Figure 10.3. To compute the backward probability of the partial observation sequence $o_{t+1}, o_{t+2}, \dots, o_T$ given that the state at time $t$ is $q_i$, one only needs to consider the transition probabilities to all $N$ possible states $q_j$ at time $t+1$ (the $a_{ij}$ term), the observation probability of $o_{t+1}$ in that state (the $b_j(o_{t+1})$ term), and the backward probability of the observation sequence after state $q_j$ (the $\beta_{t+1}(j)$ term):

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij} b_j(o_{t+1}) \beta_{t+1}(j), \quad t = T-1, T-2, \dots, 1; \; i = 1, 2, \dots, N$$

Step (3) follows the same idea as step (2), with the initial probability $\pi_i$ in place of the transition probability:

$$P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i b_i(o_1) \beta_1(i)$$
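A matching sketch of the backward algorithm, with the same conventions as the forward sketch:

```python
import numpy as np

def backward(A, B, pi, obs):
    """Backward algorithm: returns the full beta table and P(O | lambda)."""
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                # step (1): beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # step (2): beta_t(i) = sum_j a_{ij} b_j(o_{t+1}) beta_{t+1}(j)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta, (pi * B[:, obs[0]] * beta[0]).sum()   # step (3)
```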

Using the definitions of the forward and backward probabilities, the probability of the observation sequence can be written in the unified form

$$P(O \mid \lambda) = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j), \quad t = 1, 2, \dots, T-1$$

Since the inner sum $\sum_j a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)$ equals $\beta_t(i)$, this formula reduces to formula (10.17) when $t = T-1$ and to formula (10.21) when $t = 1$.

Computation of some probabilities and expected values

Given the hidden Markov model $\lambda$ and the observation sequence $O$, the forward and backward probabilities yield formulas for the probability of a single state and of two consecutive states.

(1) The probability of being in state $q_i$ at time $t$. Write

$$\gamma_t(i) = P(i_t = q_i \mid O, \lambda)$$

From the definitions of the forward and backward probabilities,

$$\alpha_t(i) \beta_t(i) = P(i_t = q_i, O \mid \lambda)$$

so

$$\gamma_t(i) = \frac{P(i_t = q_i, O \mid \lambda)}{P(O \mid \lambda)} = \frac{\alpha_t(i) \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j) \beta_t(j)}$$

(2) The probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$. Write

$$\xi_t(i, j) = P(i_t = q_i, i_{t+1} = q_j \mid O, \lambda)$$

It can be computed from the forward and backward probabilities:

$$\xi_t(i, j) = \frac{P(i_t = q_i, i_{t+1} = q_j, O \mid \lambda)}{P(O \mid \lambda)} = \frac{\alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)}$$
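Both quantities follow directly from the $\alpha$ and $\beta$ tables; a sketch that reuses the forward and backward functions above (note that $\sum_j \alpha_t(j)\beta_t(j) = P(O \mid \lambda)$ for every $t$, so both denominators equal the sequence probability):

```python
import numpy as np

def gamma_xi(A, B, pi, obs):
    """gamma[t, i] = P(i_t=q_i | O, lambda); xi[t, i, j] = P(i_t=q_i, i_{t+1}=q_j | O, lambda)."""
    alpha, prob = forward(A, B, pi, obs)    # forward() sketch above
    beta, _ = backward(A, B, pi, obs)       # backward() sketch above
    gamma = alpha * beta / prob             # alpha_t(i) beta_t(i) / P(O | lambda)
    T, N = len(obs), A.shape[0]
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        # xi_t(i, j) = alpha_t(i) a_{ij} b_j(o_{t+1}) beta_{t+1}(j) / P(O | lambda)
        xi[t] = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1] / prob
    return gamma, xi
```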
