Hidden Markov Model

Last year I wrote an HMM-based word segmenter, and at @jnduan's request I have written up these notes on it.

This article uses fairly formal notation, because I want to approach the algorithm from the model's perspective; it assumes the reader already has some background in the underlying mathematics. (I recommend the word segmentation chapter of Dr. Wu Jun's "The Beauty of Mathematics" as preparatory reading for this article.)

 

1. Markov process - a sequence of random variables that are not mutually independent, each depending on the previous one

Given N finite states S = {s_1, s_2, ..., s_N} and a random variable sequence Q = (q_1, q_2, ..., q_T), where each variable takes its value from the set S, the probability that the system is in state s_j at time t is:

P(q_t = s_j \mid q_{t-1} = s_i, q_{t-2} = s_k, \ldots)    (1)

If the state of the system at time t depends only on its state at time t-1, formula (1) simplifies to formula (2), and we obtain a discrete first-order Markov chain:

P(q_t = s_j \mid q_{t-1} = s_i, q_{t-2} = s_k, \ldots) = P(q_t = s_j \mid q_{t-1} = s_i)    (2)

a_{ij} is the state transition probability from s_i to s_j, where:

a_{ij} = P(q_t = s_j \mid q_{t-1} = s_i), \quad a_{ij} \ge 0, \quad \sum_{j=1}^{N} a_{ij} = 1

The number of state transition probabilities depends on the size of the state set and the order of the Markov process. For example, a first-order process with N states has N^2 state transition probabilities.
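To make this concrete, here is a minimal sketch of a first-order Markov chain in Python; the three weather states and the transition matrix are invented for illustration and are not from the original post.

```python
# A minimal sketch of a first-order Markov chain (illustrative example):
# N = 3 states and an N x N transition matrix whose rows sum to 1.
import random

states = ["sunny", "cloudy", "rainy"]   # S = {s1, s2, s3}, N = 3
A = [                                   # a_ij = P(q_t = s_j | q_{t-1} = s_i)
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
]

def sample_chain(start, T):
    """Sample a length-T state sequence; each step depends only on the last state."""
    q = [start]
    for _ in range(T - 1):
        i = q[-1]
        # draw the next state j with probability a_ij
        q.append(random.choices(range(len(states)), weights=A[i])[0])
    return [states[i] for i in q]

print(sample_chain(start=0, T=5))       # e.g. ['sunny', 'sunny', 'cloudy', ...]
```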

 

2. Hidden Markov Model

In a Markov model, the state s at time t is itself an observable event q: the value of the event q_t depends only on the previous event q_{t-1}.

In a hidden Markov model, the stochastic process behind the events O is not directly observed. A random function b_j(k) maps a hidden state (s) to an observation symbol (v): the value of the observed event at time t (o_t = v) is determined by the hidden event (q_t = s), and the value of the hidden event q_t is in turn determined by the hidden event q_{t-1}. This is equivalent to a doubly stochastic Markov process.

In a Markov model, the model is μ = (S, A, π), where:

S = state set
A = state transition probability matrix
π = initial state probability distribution

In a hidden Markov model, the model is μ = (S, V, A, B, π), where:

S = set of hidden states. The set size is N.

V = set of observation symbols. The set size is M.

A = state transition probability matrix, whose entry a_{ij} is the probability of transitioning from state s_i to state s_j. The size is N × N.

B = observation probability matrix, whose entry b_j(k) is the probability that hidden state s_j emits observation symbol v_k. The size is N × M.

π = initial state probability distribution, of size N.

Q and O are, respectively, the hidden state sequence (over S) and the observation sequence (over V) produced along the time steps t = 1, ..., T.

State transition probability matrix:

A = [a_{ij}]_{N \times N}, \quad a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i)

Observation probability matrix:

B = [b_j(k)]_{N \times M}, \quad b_j(k) = P(o_t = v_k \mid q_t = s_j)

 

Any model implicitly assumes the data obey a set of assumptions. The hidden Markov model makes two basic ones.

Hypothesis 1: the homogeneous Markov assumption. The hidden state q_t of the chain at any time t depends only on the state q_{t-1} at the previous time; it is independent of the hidden states and observations at all other times, and also independent of the time t itself. Abbreviated:

P(q_t \mid q_{t-1}, o_{t-1}, \ldots, q_1, o_1) = P(q_t \mid q_{t-1})

Hypothesis 2: the observation independence assumption. The observation o_t at any time depends only on the hidden state q_t of the chain at that time; it is independent of all other observations and hidden states. Abbreviated:

P(o_t \mid q_T, o_T, \ldots, q_t, \ldots, q_1, o_1) = P(o_t \mid q_t)
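Before moving on, here is a minimal sketch of how the five-tuple μ = (S, V, A, B, π) might look in code. The BMES tag set is the usual hidden-state choice for HMM word segmentation, but every probability value below is an illustrative assumption rather than a trained parameter; the later sketches reuse these names.

```python
# A minimal sketch of the five-tuple mu = (S, V, A, B, pi).
# BMES (Begin / Middle / End / Single) is the conventional hidden-state set
# for HMM word segmentation; all numbers are made up for illustration.
import numpy as np

S = ["B", "M", "E", "S"]          # hidden states, N = 4
V = ["我", "们", "是"]             # observation symbols (characters), M = 3

A = np.array([                    # N x N transition matrix, each row sums to 1
    [0.0, 0.4, 0.6, 0.0],         # B -> M or E (a word, once begun, continues)
    [0.0, 0.5, 0.5, 0.0],         # M -> M or E
    [0.5, 0.0, 0.0, 0.5],         # E -> B or S (a new word starts)
    [0.5, 0.0, 0.0, 0.5],         # S -> B or S
])
B = np.array([                    # N x M observation matrix, b_j(k) = P(o_t = v_k | q_t = s_j)
    [0.5, 0.2, 0.3],
    [0.3, 0.4, 0.3],
    [0.2, 0.5, 0.3],
    [0.4, 0.2, 0.4],
])
pi = np.array([0.6, 0.0, 0.0, 0.4])   # a sequence can only start with B or S
```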

 

3. The Three Basic Problems of the Hidden Markov Model

The hidden Markov model is used to solve three basic problems. In the word segmentation setting, the pipeline runs from model learning through to predicting the segmentation result:

A. Probability calculation: given the model μ = (A, B, π) and an observation sequence O = (o_1, o_2, ..., o_T), compute the likelihood of O under the model, i.e. compute P(O | μ).

B. Model learning: given an observation sequence O = (o_1, o_2, ..., o_T), estimate the parameters of the model μ = (A, B, π) that maximize the likelihood P(O | μ), i.e. fit the model by maximum likelihood.

C. Result prediction, also known as the decoding problem: given the model μ = (A, B, π) and an observation sequence O = (o_1, o_2, ..., o_T), find the hidden state sequence Q that maximizes the conditional probability P(Q | O, μ).

 

3.1 The Probability Calculation Problem

Direct Calculation Method

The probability of the state sequence Q = (q_1, q_2, ..., q_T) is

P(Q \mid \mu) = \pi_{q_1} a_{q_1 q_2} a_{q_2 q_3} \cdots a_{q_{T-1} q_T}

The probability of the observation sequence O given Q is

P(O \mid Q, \mu) = b_{q_1}(o_1)\, b_{q_2}(o_2) \cdots b_{q_T}(o_T)

The joint probability of O and Q is

P(O, Q \mid \mu) = P(Q \mid \mu)\, P(O \mid Q, \mu)

Expanding and summing over all possible state sequences:

P(O \mid \mu) = \sum_{Q} P(Q \mid \mu)\, P(O \mid Q, \mu) = \sum_{q_1, \ldots, q_T} \pi_{q_1} b_{q_1}(o_1)\, a_{q_1 q_2} b_{q_2}(o_2) \cdots a_{q_{T-1} q_T} b_{q_T}(o_T)

This direct enumeration costs O(T N^T) and quickly becomes infeasible, which is what motivates the forward algorithm below.
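As a sanity check on the formulas, here is a brute-force sketch of the direct method, reusing the illustrative A, B, pi from the sketch above:

```python
# Direct method: enumerate every hidden path Q of length T and sum
# P(O, Q | mu). With N^T paths this is hopeless beyond tiny T.
from itertools import product

def direct_prob(obs, A, B, pi):
    N, T = len(pi), len(obs)
    total = 0.0
    for q in product(range(N), repeat=T):            # all N^T state sequences
        p = pi[q[0]] * B[q[0]][obs[0]]               # pi_{q1} * b_{q1}(o1)
        for t in range(1, T):
            p *= A[q[t-1]][q[t]] * B[q[t]][obs[t]]   # a_{q_{t-1} q_t} * b_{q_t}(o_t)
        total += p
    return total

obs = [0, 1, 2]                                      # indices into V, i.e. "我们是"
print(direct_prob(obs, A, B, pi))
```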

 

Forward Algorithm

Define the forward probability \alpha_t(i) = P(o_1, o_2, \ldots, o_t, q_t = s_i \mid \mu): the probability of having observed the partial sequence o_1 \ldots o_t and being in hidden state s_i at time t, where a_{ji} is the transition probability from s_j to s_i and b_i(o_t) is the probability that s_i emits the observed symbol. Both the forward probabilities and P(O \mid \mu) are then obtained recursively:

Initialization:

\alpha_1(i) = \pi_i\, b_i(o_1), \quad i = 1, \ldots, N

Recursion, for t = 1, 2, \ldots, T-1:

\alpha_{t+1}(i) = \Big[ \sum_{j=1}^{N} \alpha_t(j)\, a_{ji} \Big]\, b_i(o_{t+1}), \quad i = 1, \ldots, N

Termination:

P(O \mid \mu) = \sum_{i=1}^{N} \alpha_T(i)
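A sketch of the forward recursion under the same assumed parameters; it should return the same value as direct_prob() above, at O(N^2 T) cost instead of O(T N^T):

```python
# Forward algorithm: alpha[t, i] is the probability of the first t+1
# observations together with being in state i at that time.
import numpy as np

def forward(obs, A, B, pi):
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                  # initialization: pi_i * b_i(o1)
    for t in range(1, T):
        # recursion: alpha_t(i) = (sum_j alpha_{t-1}(j) * a_ji) * b_i(o_t)
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]
    return alpha[-1].sum()                        # termination: sum_i alpha_T(i)

print(forward([0, 1, 2], A, B, pi))               # matches direct_prob([0, 1, 2], ...)
```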

 

3.2 The Learning Problem

Supervised Learning Method

Suppose the training data consists of S samples, each containing a state sequence and its corresponding observation sequence of the same length. We estimate the model μ = {A, B, π} by maximum likelihood.

The estimated transition probability \hat{a}_{ij} is

\hat{a}_{ij} = \frac{A_{ij}}{\sum_{j=1}^{N} A_{ij}}

where A_{ij} is the number of times state s_i at some time t is followed by state s_j at time t+1 across the samples.

The estimated observation probability \hat{b}_j(k) is

\hat{b}_j(k) = \frac{B_{jk}}{\sum_{k=1}^{M} B_{jk}}

where B_{jk} is the number of times state s_j emits observation symbol v_k across the samples.

The initial state probability \hat{\pi}_i is estimated as the relative frequency with which s_i appears as the initial state among the S samples.
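A counting sketch of this supervised estimate, assuming the training data arrives as (characters, BMES tags) pairs; the tiny toy corpus and the absence of smoothing are both simplifications:

```python
# Supervised estimation by counting. A real segmenter would add smoothing
# for transitions and characters never seen in the corpus.
import numpy as np

def estimate(samples, states, vocab):
    s_idx = {s: i for i, s in enumerate(states)}
    v_idx = {v: k for k, v in enumerate(vocab)}
    N, M = len(states), len(vocab)
    A_cnt = np.zeros((N, N))    # A_ij: times s_i is immediately followed by s_j
    B_cnt = np.zeros((N, M))    # B_jk: times s_j emits symbol v_k
    pi_cnt = np.zeros(N)        # times s_i starts a sequence
    for chars, tags in samples:
        pi_cnt[s_idx[tags[0]]] += 1
        for t in range(len(tags)):
            B_cnt[s_idx[tags[t]], v_idx[chars[t]]] += 1
            if t > 0:
                A_cnt[s_idx[tags[t-1]], s_idx[tags[t]]] += 1
    # normalize rows into probabilities; states never seen keep all-zero rows
    A = A_cnt / np.maximum(A_cnt.sum(axis=1, keepdims=True), 1)
    B = B_cnt / np.maximum(B_cnt.sum(axis=1, keepdims=True), 1)
    pi = pi_cnt / pi_cnt.sum()
    return A, B, pi

# toy corpus: "我们/是" tagged B E S, and "我/是" tagged S S
samples = [("我们是", "BES"), ("我是", "SS")]
A_hat, B_hat, pi_hat = estimate(samples, list("BMES"), list("我们是"))
print(pi_hat)    # [0.5, 0, 0, 0.5]: half the samples start with B, half with S
```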

Unsupervised learning method (EM algorithm omitted)

 

3.3 The Prediction Problem

Viterbi Algorithm

The Viterbi algorithm uses dynamic programming to find, for a given observation sequence O = (o_1, o_2, ..., o_T) and model μ, the optimal hidden path Q* = (q_1*, q_2*, ..., q_T*): at each time t it records the best path ending in each state together with its predecessor at time t-1, obtains the optimal end node i* at the final time T, and then backtracks to recover the full path.

1. Initialization

At time t = 1, the probability of each candidate hidden state is

\delta_1(i) = \pi_i\, b_i(o_1), \quad i = 1, \ldots, N

Initialize the optimal-path predecessor variable:

\psi_1(i) = 0, \quad i = 1, \ldots, N

2. Recursion, t = 2, 3, ..., T

The probability \delta_t(i) of candidate hidden state s_i at time t is obtained by multiplying each previous value \delta_{t-1}(j) by the corresponding transition probability a_{ji}, taking the maximum over all hidden nodes j at time t-1, and then multiplying by the observation probability b_i(o_t):

\delta_t(i) = \max_{1 \le j \le N} \big[ \delta_{t-1}(j)\, a_{ji} \big]\, b_i(o_t)

The node j at time t-1 that attains this maximum is recorded as the predecessor on the optimal path to state i:

\psi_t(i) = \arg\max_{1 \le j \le N} \big[ \delta_{t-1}(j)\, a_{ji} \big]

3. Termination

Obtain the maximum probability at the final time T and the end node i* of the optimal path:

P^* = \max_{1 \le i \le N} \delta_T(i), \quad q_T^* = \arg\max_{1 \le i \le N} \delta_T(i)

The full path is then recovered by backtracking: q_t^* = \psi_{t+1}(q_{t+1}^*) for t = T-1, T-2, \ldots, 1.
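Putting the three steps together, a sketch of the Viterbi decoder under the same assumed parameters:

```python
# Viterbi decoding: delta[t, i] is the best path probability ending in
# state i at time t, psi[t, i] the argmax predecessor used for backtracking.
import numpy as np

def viterbi(obs, A, B, pi):
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                       # delta_1(i) = pi_i * b_i(o1)
    for t in range(1, T):
        scores = delta[t-1][:, None] * A               # scores[j, i] = delta_{t-1}(j) * a_ji
        psi[t] = scores.argmax(axis=0)                 # best predecessor j for each i
        delta[t] = scores.max(axis=0) * B[:, obs[t]]   # ... times b_i(o_t)
    path = [int(delta[-1].argmax())]                   # end node i* at time T
    for t in range(T - 1, 0, -1):                      # backtrack through psi
        path.append(psi[t][path[-1]])
    return delta[-1].max(), path[::-1]

prob, path = viterbi([0, 1, 2], A, B, pi)              # decode "我们是"
print(prob, [S[i] for i in path])                      # ['B', 'E', 'S'], i.e. 我们/是
```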

(To be continued)
