Hidden Markov model, three basic problems and three training algorithms


This post draws on an article from a machine-learning community and Zong Qing's "Statistical Natural Language Processing" to review the HMM model in preparation for interviews.

This article covers the hidden Markov model (HMM), a particularly common model that sees very wide use in natural language processing. Typical sequence labeling problems such as *word segmentation, part-of-speech tagging, and named entity recognition* can all be handled with hidden Markov models. Below I explain the basic HMM model and its three basic problems according to my own understanding; I hope it helps.
Hidden Markov model definition
A hidden Markov model is a probabilistic model of time series. It describes a process in which a hidden Markov chain randomly generates an unobservable sequence of states, and each state then generates an observation, producing a random sequence of observations. The sequence of states generated by the hidden Markov chain is called the state sequence; the random sequence of observations generated from those states is called the observation sequence. Each position in the sequences can be viewed as a moment in time. We introduce some notation. Let Q be the set of all possible states and V the set of all possible observations:

Q = {q1, q2, ..., qN},  V = {v1, v2, ..., vM}

where N is the number of possible states and M is the number of possible observations. The states in Q are hidden (not directly visible), while the observations in V are visible. In part-of-speech tagging, V stands for the words, which can be observed, and Q for the parts of speech we want to predict (a word may correspond to several parts of speech), which are the hidden states. In word segmentation, V again stands for the observable characters, and Q for segmentation labels (tags such as B and E, marking the beginning of a word, the middle, and so on). In named entity recognition, V stands for the words and Q for the entity labels (place words, time words, and so on). Interested readers can look up the corresponding material on each of these tasks. Let I be a state sequence of length T and O the corresponding observation sequence:

I = (i1, i2, ..., iT),  O = (o1, o2, ..., oT)

A training set can then be seen as pairs of word sequences (O) and part-of-speech sequences (I), or word sequences (O) and segmentation-label sequences (I), and so on. With such training data, the training algorithms below can estimate the model, and many problems can be solved; we will go through them one by one.
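When such labeled (O, I) pairs are available, the simplest estimator is supervised maximum likelihood: count initial states, transitions, and emissions, then normalize. A minimal sketch, where the tiny corpus and the B/E tag names are made up for illustration:

```python
from collections import Counter, defaultdict

def train_hmm(corpus):
    """Estimate HMM parameters by counting over (observation, state) sequence pairs.

    corpus: list of sequences, each a list of (observation, state) tuples.
    Returns (pi, A, B) as nested dicts of probabilities.
    """
    init = Counter()               # counts of initial states
    trans = defaultdict(Counter)   # trans[s1][s2] = count of transitions s1 -> s2
    emit = defaultdict(Counter)    # emit[s][o]   = count of state s emitting o

    for seq in corpus:
        init[seq[0][1]] += 1
        for obs, state in seq:
            emit[state][obs] += 1
        for (_, s1), (_, s2) in zip(seq, seq[1:]):
            trans[s1][s2] += 1

    normalize = lambda c: {k: v / sum(c.values()) for k, v in c.items()}
    pi = normalize(init)
    A = {s: normalize(c) for s, c in trans.items()}
    B = {s: normalize(c) for s, c in emit.items()}
    return pi, A, B

# Toy word-segmentation-style corpus: B = beginning of a word, E = end of a word.
corpus = [
    [('我', 'B'), ('们', 'E'), ('学', 'B'), ('习', 'E')],
    [('学', 'B'), ('习', 'E')],
]
pi, A, B = train_hmm(corpus)
print(pi)      # {'B': 1.0}  -- every training sequence starts with tag B
print(A['B'])  # {'E': 1.0}  -- B is always followed by E in this toy data
```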
Next, define A as the state transition probability matrix:

A = [a_ij] (N x N)

where

a_ij = P(i_{t+1} = q_j | i_t = q_i)

is the probability of moving to state q_j at time t+1, given state q_i at time t.
B is the observation probability matrix (emission probabilities):

B = [b_j(k)] (N x M), where b_j(k) = P(o_t = v_k | i_t = q_j)

is the probability that state q_j generates observation v_k at time t.

π is the initial state probability vector:

π = (π_i), where π_i = P(i_1 = q_i)

is the probability of being in state q_i at time t = 1.

A hidden Markov model is determined by the initial state probability vector π, the state transition probability matrix A, and the observation probability matrix B. π and A determine the state sequence; B determines the observation sequence. The hidden Markov model can therefore be written in ternary notation as

λ = (A, B, π)

which is called the three elements of the hidden Markov model. Adding the concrete state set Q and observation set V gives the five-tuple of an HMM; together these make up the whole hidden Markov model.

Three basic problems

(1) The evaluation problem: given the model λ = (A, B, π) and an observation sequence O, compute the probability P(O | λ) that the model generates this observation sequence.

Let's start with an example (from Wikipedia):
Consider a village where every villager is either healthy or has a fever, and only the village doctor can determine which. The doctor diagnoses fever by asking patients how they feel; villagers can only answer that they feel normal, dizzy, or cold. (Normal, dizzy, and cold form the observation sequence mentioned earlier.) The doctor models his patients' health as a discrete Markov chain. There are two states, "Healthy" and "Fever", but the doctor cannot observe them directly: they are hidden. (Healthy and Fever are the hidden states mentioned earlier.) Each day the patient tells the doctor whether he or she feels "normal", "cold", or "dizzy", depending on his or her health. The observations (normal, cold, dizzy) together with the hidden states (Healthy, Fever) form a hidden Markov model (HMM), which can be expressed in the Python programming language as follows:
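A minimal Python representation of this model, following the Wikipedia article this example comes from (the 0.6/0.4 start distribution is the one used there):

```python
states = ('Healthy', 'Fever')
observations = ('normal', 'cold', 'dizzy')

# pi: the doctor's belief about the patient's state on the first visit
start_probability = {'Healthy': 0.6, 'Fever': 0.4}

# A: state transition probabilities of the underlying Markov chain
transition_probability = {
    'Healthy': {'Healthy': 0.7, 'Fever': 0.3},
    'Fever':   {'Healthy': 0.4, 'Fever': 0.6},
}

# B: emission (observation) probabilities for each hidden state
emission_probability = {
    'Healthy': {'normal': 0.5, 'cold': 0.4, 'dizzy': 0.1},
    'Fever':   {'normal': 0.1, 'cold': 0.3, 'dizzy': 0.6},
}
```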

In this code, start_probability represents the doctor's belief about the HMM's state when the patient first visits (all he knows is that the patient tends to be healthy). The particular probability distribution used here is not the equilibrium one, which is (given the transition probabilities) approximately {'Healthy': 0.57, 'Fever': 0.43}. (This is the initial state probability π we mentioned earlier.) transition_probability represents how health changes in the underlying Markov chain: if the patient is healthy today, there is only a 30% chance he will have a fever tomorrow. emission_probability represents how the patient is likely to feel each day: if healthy, there is a 50% chance of feeling normal; if feverish, a 60% chance of feeling dizzy.
The example above can also be drawn as a state-transition diagram (the original figure is not reproduced here).

The first problem is to compute, given the model, the probability of an observation sequence occurring. For example, with the HMM parameters above known, what is the probability that the three days' observations are (dizzy, cold, normal)? "Parameters known" means that the A, B, and π matrices are all given.
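For a sequence this short we can compute that probability by brute force, summing the joint probability over all 2^3 = 8 hidden state paths (the parameter values repeat the village example; the forward algorithm in the last section does the same computation efficiently):

```python
from itertools import product

states = ('Healthy', 'Fever')
start_p = {'Healthy': 0.6, 'Fever': 0.4}
trans_p = {'Healthy': {'Healthy': 0.7, 'Fever': 0.3},
           'Fever':   {'Healthy': 0.4, 'Fever': 0.6}}
emit_p  = {'Healthy': {'normal': 0.5, 'cold': 0.4, 'dizzy': 0.1},
           'Fever':   {'normal': 0.1, 'cold': 0.3, 'dizzy': 0.6}}

def brute_force_likelihood(obs):
    """P(O | lambda): sum the joint probability over every hidden state path."""
    total = 0.0
    for path in product(states, repeat=len(obs)):  # all 2^T state sequences
        p = start_p[path[0]] * emit_p[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans_p[path[t - 1]][path[t]] * emit_p[path[t]][obs[t]]
        total += p
    return total

print(brute_force_likelihood(('dizzy', 'cold', 'normal')))  # ≈ 0.033612
```

This enumeration is exponential in the sequence length, which is exactly why the forward algorithm is needed.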
(2) The model learning problem

Here we know the observation sequence, e.g. (dizzy, cold, normal), and need to find the HMM parameters that make this observation sequence most probable. That is, we estimate the three matrices A, B, and π we discussed.
(3) Decoding problem/prediction problem

Following the example above, the third problem: we know the observation sequence (dizzy, cold, normal) and we know the HMM parameters, and we want the state sequence most likely to have produced this observation sequence, for example (Healthy, Healthy, Fever) or (Healthy, Healthy, Healthy); with 2 states over 3 days there are 2^3 = 8 possibilities.

Three training algorithms

(1) The forward and backward algorithms
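A minimal sketch of the forward algorithm on the village example: alpha[t][s] is the probability of seeing the first t observations and ending in state s, built up left to right, so the likelihood is the sum over the final column. The backward algorithm runs the analogous recursion right to left and yields the same likelihood.

```python
states = ('Healthy', 'Fever')
start_p = {'Healthy': 0.6, 'Fever': 0.4}
trans_p = {'Healthy': {'Healthy': 0.7, 'Fever': 0.3},
           'Fever':   {'Healthy': 0.4, 'Fever': 0.6}}
emit_p  = {'Healthy': {'normal': 0.5, 'cold': 0.4, 'dizzy': 0.1},
           'Fever':   {'normal': 0.1, 'cold': 0.3, 'dizzy': 0.6}}

def forward(obs):
    """Forward algorithm: returns P(O | lambda) in O(T * N^2) time."""
    # Initialization: alpha_1(s) = pi(s) * b_s(o_1)
    alpha = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    # Recursion: alpha_{t+1}(s) = (sum_r alpha_t(r) * a_{rs}) * b_s(o_{t+1})
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({s: sum(prev[r] * trans_p[r][s] for r in states) * emit_p[s][o]
                      for s in states})
    # Termination: P(O | lambda) = sum_s alpha_T(s)
    return sum(alpha[-1].values())

print(forward(('dizzy', 'cold', 'normal')))  # ≈ 0.033612
```

The answer agrees with the brute-force enumeration over all 8 paths, but the cost grows linearly rather than exponentially with the sequence length.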


(2) Viterbi algorithm
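The Viterbi algorithm solves the decoding problem: it has the same shape as the forward algorithm but takes a max over predecessor states instead of a sum, keeping the best path ending in each state. A sketch on the same village example:

```python
states = ('Healthy', 'Fever')
start_p = {'Healthy': 0.6, 'Fever': 0.4}
trans_p = {'Healthy': {'Healthy': 0.7, 'Fever': 0.3},
           'Fever':   {'Healthy': 0.4, 'Fever': 0.6}}
emit_p  = {'Healthy': {'normal': 0.5, 'cold': 0.4, 'dizzy': 0.1},
           'Fever':   {'normal': 0.1, 'cold': 0.3, 'dizzy': 0.6}}

def viterbi(obs):
    """Return the most likely hidden state sequence for obs, and its probability."""
    # delta[s] = probability of the best path ending in state s;
    # paths[s] = that path itself.
    delta = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    paths = {s: [s] for s in states}
    for o in obs[1:]:
        new_delta, new_paths = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            best_prev = max(states, key=lambda r: delta[r] * trans_p[r][s])
            new_delta[s] = delta[best_prev] * trans_p[best_prev][s] * emit_p[s][o]
            new_paths[s] = paths[best_prev] + [s]
        delta, paths = new_delta, new_paths
    best = max(states, key=lambda s: delta[s])
    return paths[best], delta[best]

path, prob = viterbi(('dizzy', 'cold', 'normal'))
print(path, prob)  # ['Fever', 'Healthy', 'Healthy'] 0.01344
```

So for the observations (dizzy, cold, normal), the single most likely state sequence is (Fever, Healthy, Healthy), picked out of the 8 candidates without enumerating them all.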

