Statistical Learning Methods, Li Hang --- Chapter 10: Hidden Markov Models


Chapter 10: Hidden Markov Models

A hidden Markov model (HMM) is a statistical learning model that can be used for tagging problems. It describes the process by which a hidden Markov chain randomly generates an observation sequence, and it belongs to the class of generative models.

10.1 Basic concepts of hidden Markov models

Definition 10.1 (hidden Markov model). A hidden Markov model is a probabilistic model of time series. It describes the process in which a hidden Markov chain randomly generates an unobservable sequence of states, and each state then generates an observation. The sequence of states randomly generated by the hidden Markov chain is called the state sequence; each state generates an observation, and the resulting random sequence of observations is called the observation sequence. Each position in the sequences can be viewed as a time step.

A hidden Markov model is determined by its initial probability distribution, state transition probability distribution, and observation probability distribution. The formal definition is as follows.

Let Q be the set of all possible states and V the set of all possible observations:

Q = {q_1, q_2, ..., q_N},  V = {v_1, v_2, ..., v_M}

where N is the number of possible states and M is the number of possible observations.

I is a state sequence of length T, and O is the corresponding observation sequence:

I = (i_1, i_2, ..., i_T),  O = (o_1, o_2, ..., o_T)

A is the state transition probability matrix:

A = [a_ij]_{N×N},  a_ij = P(i_{t+1} = q_j | i_t = q_i)

that is, a_ij is the probability of moving to state q_j at time t+1, given that the chain is in state q_i at time t.

B is the observation probability matrix:

B = [b_j(k)]_{N×M},  b_j(k) = P(o_t = v_k | i_t = q_j)

that is, b_j(k) is the probability of generating observation v_k at time t, given that the chain is in state q_j.

π is the initial state probability vector:

π = (π_i),  π_i = P(i_1 = q_i)

that is, π_i is the probability of being in state q_i at time t = 1.

A hidden Markov model λ can therefore be written in the triplet notation

λ = (A, B, π)

The hidden Markov model is determined by the initial state probability vector π, the state transition probability matrix A, and the observation probability matrix B. A, B, and π are called the three elements of the hidden Markov model.

The state transition probability matrix A and the initial state probability vector π determine the hidden Markov chain and generate the unobservable state sequence.

The observation probability matrix B determines how observations are generated from states; together with the state sequence, it determines how the observation sequence is produced.

The hidden Markov model makes two basic assumptions:

(1) The homogeneous Markov assumption: the state of the hidden Markov chain at any time t depends only on its state at the previous time; it is independent of the states and observations at other times and of the time t itself:

P(i_t | i_{t-1}, o_{t-1}, ..., i_1, o_1) = P(i_t | i_{t-1}),  t = 1, 2, ..., T

(2) The observation independence assumption: the observation at any time depends only on the state of the Markov chain at that time, and is independent of the other observations and states:

P(o_t | i_T, o_T, ..., i_{t+1}, o_{t+1}, i_t, i_{t-1}, o_{t-1}, ..., i_1, o_1) = P(o_t | i_t)

Hidden Markov models can be used for tagging, in which case the states correspond to tags. The tagging problem is: given an observation sequence, predict its corresponding tag sequence. We can assume that the data of the tagging problem are generated by a hidden Markov model, and then use the hidden Markov model's learning and prediction algorithms to perform tagging.

According to the definition of the hidden Markov model, an observation sequence of length T is generated as follows: first draw the initial state i_1 according to the initial state distribution π; then, for t = 1, 2, ..., T, generate observation o_t according to the observation distribution b_{i_t}(k) of state i_t, and draw the next state i_{t+1} according to the transition distribution a_{i_t, i_{t+1}}.
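As a concrete illustration, the generation procedure can be sketched in Python with NumPy. The two-state, two-observation model below is a made-up example, not one from the book:

```python
import numpy as np

def hmm_sample(A, B, pi, T, seed=0):
    """Randomly generate a state sequence and observation sequence of
    length T from an HMM (A, B, pi): draw i_1 from pi, then alternately
    emit o_t from b_{i_t}(k) and transition according to row i_t of A."""
    rng = np.random.default_rng(seed)
    M = B.shape[1]
    states, obs = [], []
    s = rng.choice(len(pi), p=pi)           # initial state i_1 ~ pi
    for _ in range(T):
        states.append(int(s))
        obs.append(int(rng.choice(M, p=B[s])))   # o_t ~ b_{i_t}(k)
        s = rng.choice(A.shape[1], p=A[s])       # i_{t+1} ~ a_{i_t, j}
    return states, obs

# Hypothetical 2-state, 2-observation model (values are made up).
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
states, obs = hmm_sample(A, B, pi, T=5)
```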

Three basic problems of hidden Markov models

(1) Probability calculation problem. Given the model λ = (A, B, π) and an observation sequence O = (o_1, o_2, ..., o_T), compute the probability P(O | λ) that the observation sequence O appears under the model λ.

(2) Learning problem. Given an observation sequence O = (o_1, o_2, ..., o_T), estimate the model parameters λ = (A, B, π) so that the probability P(O | λ) of the observation sequence under the model is maximized. The parameters are estimated by the method of maximum likelihood.

(3) Prediction problem, also known as the decoding problem. Given the model λ and an observation sequence O = (o_1, o_2, ..., o_T), find the state sequence I = (i_1, i_2, ..., i_T) that maximizes the conditional probability P(I | O), that is, the most probable state sequence for the given observation sequence.

10.2 Probability calculation algorithm

Direct calculation method

In principle, P(O | λ) can be computed directly from the probability formula: enumerate all possible state sequences I = (i_1, i_2, ..., i_T) of length T, compute the joint probability of each state sequence I with the observation sequence O = (o_1, o_2, ..., o_T), and then sum over all possible state sequences.

However, the amount of computation is enormous, of order O(T N^T), so this algorithm is not feasible.

Forward-backward algorithm

Forward Algorithm

Definition 10.2 (forward probability). Given a hidden Markov model λ, the forward probability is defined as the probability of the partial observation sequence o_1, o_2, ..., o_t up to time t and of being in state q_i at time t:

α_t(i) = P(o_1, o_2, ..., o_t, i_t = q_i | λ)

The forward probabilities and the observation sequence probability P(O | λ) can be obtained recursively.

Step (1) initializes the forward probability:

α_1(i) = π_i b_i(o_1),  i = 1, 2, ..., N

which is the joint probability of the state i_1 = q_i and the observation o_1 at the initial time.

Step (2) is the recursion for the forward probability: it computes the forward probability of seeing the partial observation sequence o_1, o_2, ..., o_t, o_{t+1} up to time t+1 and being in state q_i at time t+1:

α_{t+1}(i) = [ Σ_{j=1}^{N} α_t(j) a_ji ] b_i(o_{t+1}),  t = 1, 2, ..., T-1

Since α_t(j) is the forward probability of observing o_1, o_2, ..., o_t up to time t and being in state q_j at time t, the product α_t(j) a_ji is the joint probability of observing o_1, o_2, ..., o_t, being in state q_j at time t, and reaching state q_i at time t+1. Summing this product over all N possible states q_j at time t gives the joint probability of observing o_1, o_2, ..., o_t and being in state q_i at time t+1. Multiplying the value in brackets by the observation probability b_i(o_{t+1}) gives exactly the forward probability of observing o_1, o_2, ..., o_t, o_{t+1} and being in state q_i at time t+1.

Step (3) gives the termination:

P(O | λ) = Σ_{i=1}^{N} α_T(i)

Because α_T(i) = P(o_1, o_2, ..., o_T, i_T = q_i | λ), summing over all final states yields P(O | λ).

The forward algorithm is essentially a recursion over the path structure of state sequences. The key to its efficiency is that forward probabilities are computed locally and then "recursed" to the global result along this path structure. Specifically, at time t = 1 the N values α_1(i) (i = 1, 2, ..., N) are computed; at each time t = 1, 2, ..., T-1, the N values α_{t+1}(i) (i = 1, 2, ..., N) are computed, and the computation of each α_{t+1}(i) uses the N values α_t(j) from the previous time.

The reduction in computation comes from the fact that each step directly reuses the results of the previous time step, avoiding repeated work. The amount of computation using forward probabilities is of order O(N^2 T), rather than the O(T N^T) of direct calculation.
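The three steps of the forward algorithm can be sketched as follows; the model and observation sequence are made-up values for illustration:

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward algorithm: returns the T x N table of forward
    probabilities alpha and the observation probability P(O | lambda)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]        # step (1): alpha_1(i) = pi_i b_i(o_1)
    for t in range(T - 1):
        # step (2): alpha_{t+1}(i) = [sum_j alpha_t(j) a_ji] * b_i(o_{t+1})
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
    return alpha, alpha[-1].sum()       # step (3): P(O) = sum_i alpha_T(i)

# Hypothetical model and observation sequence (made-up values).
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
obs = [0, 1, 0]
alpha, prob = forward(A, B, pi, obs)
```

For a sequence this short, the result can be checked against direct enumeration of all N^T state sequences.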

Backward algorithm

Definition 10.3 (backward probability). Given a hidden Markov model λ, the backward probability is defined as the probability of the partial observation sequence o_{t+1}, o_{t+2}, ..., o_T from time t+1 to T, given that the chain is in state q_i at time t:

β_t(i) = P(o_{t+1}, o_{t+2}, ..., o_T | i_t = q_i, λ)

Backward probability and observation sequence probability can be obtained by recursion

Step (1) initializes the backward probability at the final time, setting it to 1 for all states q_i:

β_T(i) = 1,  i = 1, 2, ..., N

Step (2) is the recursion for the backward probability:

β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j),  t = T-1, T-2, ..., 1

To compute the backward probability of the partial observation sequence o_{t+1}, o_{t+2}, ..., o_T given state q_i at time t, one only needs to consider the transition probabilities to all N possible states q_j at time t+1 (the a_ij term), the probability of observing o_{t+1} in each such state (the b_j(o_{t+1}) term), and the backward probability of the remaining observation sequence given state q_j (the β_{t+1}(j) term).

Step (3) follows the same idea as step (2), with the initial state probability π_i taking the place of the transition probability:

P(O | λ) = Σ_{i=1}^{N} π_i b_i(o_1) β_1(i)

Using the definitions of the forward and backward probabilities, the observation sequence probability can be written uniformly as

P(O | λ) = Σ_{i=1}^{N} Σ_{j=1}^{N} α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j),  t = 1, 2, ..., T-1

When t = 1 and t = T-1, this formula reduces to formulas (10.17) and (10.21), respectively.
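A sketch of the backward recursion, again with made-up model values; the final lines check that the forward termination, the backward termination, and the unified formula all give the same P(O | λ):

```python
import numpy as np

def backward(A, B, obs):
    """Backward algorithm: returns the T x N table of backward probabilities."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))                  # step (1): beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # step (2): beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

# Hypothetical model and observation sequence (made-up values).
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
obs = [0, 1, 0]

beta = backward(A, B, obs)
# Step (3): backward termination P(O) = sum_i pi_i b_i(o_1) beta_1(i).
p_backward = (pi * B[:, obs[0]] * beta[0]).sum()

# Forward probabilities, for comparison.
alpha = np.zeros_like(beta)
alpha[0] = pi * B[:, obs[0]]
for t in range(len(obs) - 1):
    alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
p_forward = alpha[-1].sum()

# Unified formula at t = 1: sum_ij alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j).
p_unified = (alpha[0][:, None] * A * B[:, obs[1]] * beta[1]).sum()
```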

Computation of some probabilities and expected values

Given the hidden Markov model λ and the observation sequence O, the forward and backward probabilities yield formulas for the probability of being in a single state and in a pair of states.

(1) The probability of being in state q_i at time t, given the model λ and observation O. Write

γ_t(i) = P(i_t = q_i | O, λ)

By the definitions of the forward and backward probabilities,

α_t(i) β_t(i) = P(i_t = q_i, O | λ)

so

γ_t(i) = α_t(i) β_t(i) / P(O | λ) = α_t(i) β_t(i) / Σ_{j=1}^{N} α_t(j) β_t(j)

(2) The probability of being in state q_i at time t and in state q_j at time t+1, given the model λ and observation O. Write

ξ_t(i, j) = P(i_t = q_i, i_{t+1} = q_j | O, λ)

It can be computed from the forward and backward probabilities:

ξ_t(i, j) = P(i_t = q_i, i_{t+1} = q_j, O | λ) / P(O | λ)

so

ξ_t(i, j) = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / Σ_{i=1}^{N} Σ_{j=1}^{N} α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j)

(3) Summing γ_t(i) and ξ_t(i, j) over the time index t gives some useful expected values.

The expected number of times state i is visited under observation O:

Σ_{t=1}^{T} γ_t(i)

The expected number of transitions out of state i under observation O:

Σ_{t=1}^{T-1} γ_t(i)

The expected number of transitions from state i to state j under observation O:

Σ_{t=1}^{T-1} ξ_t(i, j)
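The quantities γ_t(i) and ξ_t(i, j) and the expectations above can be computed from the forward and backward tables as below; the model values are made up for illustration:

```python
import numpy as np

# Hypothetical model and observation sequence (made-up values).
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
obs = [0, 1, 0]
T, N = len(obs), len(pi)

# Forward and backward tables.
alpha = np.zeros((T, N)); beta = np.ones((T, N))
alpha[0] = pi * B[:, obs[0]]
for t in range(T - 1):
    alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
P = alpha[-1].sum()

# gamma_t(i) = alpha_t(i) beta_t(i) / P(O | lambda)
gamma = alpha * beta / P

# xi_t(i, j) = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O | lambda)
xi = np.zeros((T - 1, N, N))
for t in range(T - 1):
    xi[t] = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1] / P

# Expected visits to each state, and expected transitions i -> j.
visits = gamma.sum(axis=0)
transitions = xi.sum(axis=0)
```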

10.3 Learning Algorithms

Learning of the hidden Markov model can be carried out by supervised learning or unsupervised learning, depending on whether the training data consist of observation sequences together with the corresponding state sequences, or of observation sequences only.

Supervised learning algorithm

Assume that the training data consist of S observation sequences and the corresponding state sequences of the same length, {(O_1, I_1), (O_2, I_2), ..., (O_S, I_S)}. Then the maximum likelihood estimation method can be used to estimate the parameters of the hidden Markov model.

(1) Estimation of the transition probability a_ij.

Let A_ij be the number of times in the samples that the chain is in state i at time t and moves to state j at time t+1. Then the estimate of the state transition probability is

â_ij = A_ij / Σ_{j=1}^{N} A_ij,  i = 1, 2, ..., N; j = 1, 2, ..., N

(2) Estimation of the observation probability b_j(k).

Let B_jk be the number of times in the samples that the state is j and the observation is k. Then the estimate of the probability of observing k in state j is

b̂_j(k) = B_jk / Σ_{k=1}^{M} B_jk,  j = 1, 2, ..., N; k = 1, 2, ..., M

(3) The initial state probability π_i is estimated as the relative frequency of initial state q_i among the S samples.
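The three counting estimates can be sketched as follows; the toy labeled sequences are invented for illustration:

```python
import numpy as np

def supervised_estimate(data, N, M):
    """Maximum likelihood estimates of (A, B, pi) from labeled data.
    `data` is a list of (observations, states) pairs, integer-coded."""
    A = np.zeros((N, N))
    Bm = np.zeros((N, M))
    pi = np.zeros(N)
    for obs, states in data:
        pi[states[0]] += 1                     # count initial states
        for t in range(len(states) - 1):
            A[states[t], states[t + 1]] += 1   # count transitions i -> j
        for o, s in zip(obs, states):
            Bm[s, o] += 1                      # count (state, observation) pairs
    # Turn counts into relative frequencies (assumes every state occurs).
    return (A / A.sum(axis=1, keepdims=True),
            Bm / Bm.sum(axis=1, keepdims=True),
            pi / pi.sum())

# Toy labeled data (made up): two (observation, state) sequence pairs.
data = [([0, 1, 0], [0, 0, 1]), ([1, 0], [1, 1])]
A_hat, B_hat, pi_hat = supervised_estimate(data, N=2, M=2)
```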

Unsupervised learning algorithm: the Baum-Welch algorithm (EM algorithm)

Assume that the given training data contain only S observation sequences of length T, {O_1, O_2, ..., O_S}, without the corresponding state sequences; the goal is to learn the parameters of the hidden Markov model λ = (A, B, π). Treating the observation sequence data as the observed data O and the state sequence data as the unobservable hidden data I, the hidden Markov model is in fact a probabilistic model with hidden variables:

P(O | λ) = Σ_I P(O | I, λ) P(I | λ)

Its parameter learning can be realized by the EM algorithm.

The complete data consist of the observed data and the hidden data, (O, I) = (o_1, o_2, ..., o_T, i_1, i_2, ..., i_T), and the complete-data log-likelihood function is log P(O, I | λ).

E-step of the EM algorithm: compute the Q function

Q(λ, λ̄) = Σ_I log P(O, I | λ) P(O, I | λ̄)

where λ̄ is the current estimate of the model parameters and λ is the model parameter to be maximized.

M-step of the EM algorithm: maximize the Q function with respect to the model parameters λ to obtain the re-estimation formulas for A, B, and π.

Expressing the re-estimation formulas with the probabilities γ_t(i) and ξ_t(i, j) obtained earlier gives the Baum-Welch algorithm:

a_ij = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i)

b_j(k) = Σ_{t: o_t = v_k} γ_t(j) / Σ_{t=1}^{T} γ_t(j)

π_i = γ_1(i)
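Under the single-sequence simplification (S = 1), the Baum-Welch iteration can be sketched as follows; the observation sequence and random initialization are illustrative only:

```python
import numpy as np

def baum_welch(obs, N, M, n_iter=50, seed=1):
    """Baum-Welch for a single observation sequence: the E-step computes
    gamma and xi under the current parameters; the M-step applies the
    re-estimation formulas for (A, B, pi)."""
    rng = np.random.default_rng(seed)
    T = len(obs)
    # Random row-stochastic initialization (illustrative, not from the book).
    A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)
    Bm = rng.random((N, M)); Bm /= Bm.sum(axis=1, keepdims=True)
    pi = np.full(N, 1.0 / N)
    for _ in range(n_iter):
        # E-step: forward/backward tables, then gamma and xi.
        alpha = np.zeros((T, N)); beta = np.ones((T, N))
        alpha[0] = pi * Bm[:, obs[0]]
        for t in range(T - 1):
            alpha[t + 1] = (alpha[t] @ A) * Bm[:, obs[t + 1]]
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (Bm[:, obs[t + 1]] * beta[t + 1])
        P = alpha[-1].sum()
        gamma = alpha * beta / P
        xi = np.array([alpha[t][:, None] * A * Bm[:, obs[t + 1]] * beta[t + 1] / P
                       for t in range(T - 1)])
        # M-step: re-estimation formulas.
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(M):
            Bm[:, k] = gamma[np.array(obs) == k].sum(axis=0)
        Bm /= gamma.sum(axis=0)[:, None]
        pi = gamma[0]
    return A, Bm, pi

# Illustrative observation sequence (made up).
obs = [0, 1, 0, 0, 1, 1, 0]
A_hat, B_hat, pi_hat = baum_welch(obs, N=2, M=2)
```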

10.4 Predictive Algorithms

Hidden Markov model prediction has two algorithms: the approximate algorithm and the Viterbi algorithm.

Approximate algorithm

The idea of the approximate algorithm is to choose, at each time t, the state i_t* that is individually most likely at that time, and to take the resulting state sequence I* = (i_1*, i_2*, ..., i_T*) as the prediction.

At each time t, the most probable state i_t* is given by

i_t* = arg max_{1≤i≤N} γ_t(i),  t = 1, 2, ..., T

The advantage of the approximate algorithm is that its computation is simple. Its disadvantage is that the predicted state sequence is not guaranteed to be the globally most probable state sequence, because the predicted sequence may contain parts that cannot actually occur: the resulting sequence may contain adjacent states whose transition probability is 0.
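A sketch of the approximate algorithm, with made-up model values: compute γ_t(i) from the forward and backward tables and take the per-time argmax:

```python
import numpy as np

# Hypothetical model and observation sequence (made-up values).
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
obs = [0, 1, 0]
T, N = len(obs), len(pi)

# Forward and backward tables.
alpha = np.zeros((T, N)); beta = np.ones((T, N))
alpha[0] = pi * B[:, obs[0]]
for t in range(T - 1):
    alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

gamma = alpha * beta / (alpha * beta).sum(axis=1, keepdims=True)
approx_path = gamma.argmax(axis=1)   # i_t* = argmax_i gamma_t(i), per time step
```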

Viterbi algorithm

The Viterbi algorithm solves the prediction problem of the hidden Markov model with dynamic programming: the path of maximum probability (the optimal path) is found by dynamic programming, where one path corresponds to one state sequence.

According to the principle of dynamic programming, the optimal path has the following property: if the optimal path passes through node i_t* at time t, then the portion of this path from i_t* to the terminal node i_T* must be optimal among all possible partial paths from i_t* to i_T*. Based on this principle, we only need to recursively compute, from t = 1 onward, the maximum probability over all partial paths ending in each state i at time t, until we obtain the maximum probability over the paths ending in each state at time T. The largest of these is the probability P* of the optimal path, and its terminal node i_T* is obtained at the same time. Then, starting from the end point i_T* and tracing backward, the nodes i_{T-1}*, ..., i_1* are recovered, giving the optimal path I* = (i_1*, i_2*, ..., i_T*).
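The recursion and backtracking can be sketched as follows; the model and observation sequence are made-up values for illustration:

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Viterbi algorithm: returns the optimal state sequence and its
    probability P*, via dynamic programming plus backtracking."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))            # delta_t(i): best prob of a path ending in i at t
    psi = np.zeros((T, N), dtype=int)   # psi_t(i): best predecessor of i at time t
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        trans = delta[t - 1][:, None] * A        # entry (j, i) = delta_{t-1}(j) * a_ji
        psi[t] = trans.argmax(axis=0)
        delta[t] = trans.max(axis=0) * B[:, obs[t]]
    # Terminal node of the optimal path, then backtrack through psi.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], delta[-1].max()

# Hypothetical model and observation sequence (made-up values).
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
obs = [0, 1, 0]
best_path, best_prob = viterbi(A, B, pi, obs)
```

For a sequence this short, the result can be checked against brute-force maximization over all state sequences.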

