HMM (Hidden Markov model)


A Hidden Markov Model (HMM) is a statistical model used to describe a Markov process with hidden, unknown parameters. The difficulty is to determine the hidden parameters of the process from the observable parameters; these parameters are then used for further analysis, such as pattern recognition.

In other words, it is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states.

Here is a simple example to illustrate:

Let's say I have three different dice in my hand. The first is an ordinary six-sided die (call it D6): each face (1, 2, 3, 4, 5, 6) appears with probability 1/6. The second is a tetrahedron (call it D4): each face (1, 2, 3, 4) appears with probability 1/4. The third has eight faces (call it D8): each face (1, 2, 3, 4, 5, 6, 7, 8) appears with probability 1/8.

Suppose we start rolling. We first pick one of the three dice, each with probability 1/3. Then we roll it and get a number, one of 1 through 8. Repeating this process, we get a string of numbers, each of which is one of 1 through 8. For example, rolling the dice ten times, we might get: 1 6 3 5 2 7 3 5 2 4

This string of numbers is called the chain of visible states. But in a hidden Markov model we have not only this chain of visible states, but also a chain of hidden states. In this example, the chain of hidden states is the sequence of dice used. For instance, the hidden state chain might be: D6 D8 D8 D6 D4 D8 D6 D6 D4 D8

In general, the Markov chain referred to in an HMM is this chain of hidden states, because there are transition probabilities between the hidden states (the dice). In our example, the state following D6 is D4, D6, or D8, each with probability 1/3, and the state following D4 or D8 is likewise D4, D6, or D8 with probability 1/3 each. We set it up this way to make the example easy to state at first, but in fact the transition probabilities can be set arbitrarily. For example, we could define that D6 is never followed by D4, that D6 is followed by D6 with probability 0.9, and by D8 with probability 0.1. That would be a new, different HMM.

Similarly, although there are no transition probabilities between visible states, there is a probability between each hidden state and each visible state, called the output probability (emission probability). In our example, the six-sided die (D6) produces a 1 with emission probability 1/6, and produces each of 2, 3, 4, 5, 6 with probability 1/6 as well. We can define the emission probabilities differently, too. For example, suppose a casino tampers with the six-sided die so that the probability of rolling a 1 is larger, say 1/2, and the probability of rolling each of 2, 3, 4, 5, 6 is 1/10.

In fact, for an HMM, simulation is fairly easy if you know in advance all the transition probabilities between hidden states and all the emission probabilities from hidden states to visible states. But when applying an HMM, part of this information is often missing: sometimes you know how many dice there are and what each die is, but not the sequence of dice that was rolled; sometimes you only see the results of many rolls and know nothing else. Using algorithms to estimate the missing information then becomes a very important problem. I'll discuss these algorithms in more detail below.
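To make this concrete, here is a minimal simulation sketch in Python. The tables follow the example above (uniform 1/3 pick and transition probabilities, fair dice); the function names sample and simulate_hmm are this sketch's own, not from the original.

import random

# Hidden states: the three dice. Visible states: the faces they produce.
states = ['D4', 'D6', 'D8']
emission = {
    'D4': {face: 1/4 for face in range(1, 5)},
    'D6': {face: 1/6 for face in range(1, 7)},
    'D8': {face: 1/8 for face in range(1, 9)},
}
# Each die is picked first with probability 1/3, and whichever die we just
# used, the next one is D4, D6, or D8 with probability 1/3 each (as above).
start = {s: 1/3 for s in states}
transition = {s: {t: 1/3 for t in states} for s in states}

def sample(dist):
    # Draw one key from a {value: probability} dict.
    r, acc = random.random(), 0.0
    for value, p in dist.items():
        acc += p
        if r < acc:
            return value
    return value  # guard against floating-point rounding

def simulate_hmm(n_rolls):
    # Return (hidden chain of dice, visible chain of numbers).
    dice, rolls = [], []
    state = sample(start)
    for _ in range(n_rolls):
        dice.append(state)
        rolls.append(sample(emission[state]))
        state = sample(transition[state])
    return dice, rolls

dice, rolls = simulate_hmm(10)
print(dice)   # e.g. ['D6', 'D8', 'D8', ...]  -- like the hidden chain above
print(rolls)  # e.g. [1, 6, 3, ...]           -- like the visible chain above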

(If you just wanted an easy-to-understand example, you don't need to read any further.)
A couple of asides first. I believe that to understand an algorithm you must do two things: grasp its meaning and know its form. What I cover here is mainly the first point, yet it is precisely the most important one, and the one many books leave out. It's like chasing a girl: she says to you, "You've done nothing wrong!" If you only look at the surface of her words and conclude that you've done nothing wrong, you are clearly mistaken. You have to grasp what she actually means, "Hurry up and apologize to me!", so that when you see that form of expression, you rush to admit your fault and beg for mercy. Mathematics is the same: if you don't grasp the meaning and only stare at the formulas, you will often be confused. Fortunately, mathematical expressions are at worst obscure, whereas a girl's expressions can be completely contrary to her intent; so I have always thought that understanding a girl is much harder than understanding math.

Back to the point. The algorithms related to the HMM fall into three categories, solving three kinds of problems:
1) Knowing how many kinds of dice there are (the number of hidden states) and what each die is (the transition probabilities), and given the observed results of the rolls (the visible state chain), I want to know which die was used for each roll (the hidden state chain).
In the field of speech recognition, this is called the decoding problem. There are actually two ways to pose it, which give two different answers. Each answer is correct, but they mean different things. The first asks for the maximum-likelihood state path: put plainly, I want the single sequence of dice that is most likely to have produced the observed results. The second does not ask for one sequence of dice, but for the probability that each individual roll was made with each kind of die. For example, after seeing the results, I might find that the first roll used D4 with probability 0.5, D6 with probability 0.3, and D8 with probability 0.2. I'll explain the first approach below; the second I won't write up here, but if people are interested, we can continue it under another question.

2) Still knowing how many kinds of dice there are (the number of hidden states) and what each die is (the transition probabilities), and given the observed results of the rolls (the visible state chain), I want to know the probability of throwing this particular result.
This problem may seem pointless, because the result you actually throw usually corresponds to a reasonably large probability. The purpose of the question is really to check whether the observed results are consistent with a known model. If many observed results correspond to very small probabilities, it suggests that the model we believe in is probably wrong: someone has secretly swapped our dice.

3) Knowing how many kinds of dice there are (the number of hidden states) but not what each die is (the transition probabilities), and having observed the results of many rolls (the visible state chain), I want to work backwards to what each die is (the transition probabilities).
This problem is important because it is the most common situation. Often we have only the visible results and do not know the parameters of the HMM, so we need to estimate those parameters from the visible results; this is a necessary step in building a model.

The problems have been laid out; the solutions follow. (Problem 0 below was not mentioned above; it serves only as an aid to solving the problems above.)
0. A simple problem
Actually, this problem has little practical value on its own. I mention it first because it helps with the harder problems below.
Knowing how many kinds of dice there are, what each die is, and which die was used for each roll, compute the probability of producing the observed results.

The solution is simply to multiply the probabilities together:
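In symbols (the notation $\pi$ for the initial pick, $a$ for transitions, and $e$ for emissions is this sketch's, not the original's): if the dice used, $s_1, \dots, s_T$, and the rolls, $o_1, \dots, o_T$, are both known, then

$$P(o_{1:T}, s_{1:T}) = \pi(s_1)\, e_{s_1}(o_1) \prod_{t=2}^{T} a_{s_{t-1} s_t}\, e_{s_t}(o_t)$$

In our example, $\pi(s_1) = 1/3$, every $a$ is $1/3$, and each $e$ is the face probability of the corresponding die.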


1. Seeing the invisible: cracking the dice sequence
Here I'll discuss the first approach: solving for the maximum-likelihood path.
For example, I know I have three dice: a six-sided die, a four-sided die, and an eight-sided die. I also know the results of ten rolls (1 6 3 5 2 7 3 5 2 4), but I don't know which die was used each time. I want to know the most likely sequence of dice.

In fact, the simplest, most brute-force method is to enumerate all possible dice sequences, compute the probability of each according to the solution of problem 0, and then pick out the sequence with the maximum probability. If the Markov chain is short, this works, of course. If it is long, the number of sequences to enumerate is far too large, and the computation becomes infeasible.
Another famous algorithm, called the Viterbi algorithm, does better. To understand it, let's look at a few simple examples first.
First, suppose we roll the die only once and see the result 1. The maximum-probability dice sequence is simply D4, because D4 produces a 1 with probability 1/4, higher than 1/6 and 1/8.
Extending the situation, suppose we roll twice and see the results 1, 6. Now the problem becomes more complicated: we have to compute three values, namely the maximum probability that the second die is D6, D4, or D8 respectively. Obviously, to make the probability maximal, the first die must be D4. So the maximum probability of the second die being D6 is
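(assuming, as in the setup, that the first die is picked with probability 1/3)

$$P = \underbrace{\tfrac{1}{3} \cdot \tfrac{1}{4}}_{\text{pick D4, roll 1}} \times \underbrace{\tfrac{1}{3}}_{\text{D4} \to \text{D6}} \times \underbrace{\tfrac{1}{6}}_{\text{D6 rolls 6}} = \tfrac{1}{216} \approx 0.0046$$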


Similarly, we can compute the maximum probability when the second die is D4 or D8 instead. We find that the second die is most likely D6, and that to make this probability maximal, the first die must be D4. So the maximum-probability dice sequence is D4 D6.
Continuing to extend, suppose we roll three times and see the results 1, 6, 3. Again, we compute the maximum probability that the third die is D6, D4, or D8 respectively. We find once more that, to obtain the maximum probability, the second die must be D6. The maximum probability of the third die being D4 is then
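(continuing under the same assumption, with the value $\tfrac{1}{216}$ just computed for D6 at position two)

$$P = \tfrac{1}{216} \times \underbrace{\tfrac{1}{3}}_{\text{D6} \to \text{D4}} \times \underbrace{\tfrac{1}{4}}_{\text{D4 rolls 3}} = \tfrac{1}{2592} \approx 0.00039$$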

As above, we can compute the maximum probability when the third die is D6 or D8 instead. We find that the third die is most likely D4; to make this probability maximal, the second die must be D6 and the first die D4. So the maximum-probability dice sequence is D4 D6 D4.

By this point you should see the pattern: since we can handle one, two, or three rolls, we can handle any number of rolls the same way. Notice that when computing the maximum-probability dice sequence, we keep doing the same few things. First, regardless of how long the sequence is, we start from length one and compute, for each die, the maximum probability of it appearing at the first position. Then we gradually increase the length: each time the length grows by one, we recompute, for each die at the new last position, the maximum probability of a sequence ending with that die. Because we already have the maximum probabilities for each die at the previous length, this recomputation is not hard. When we reach the last position, we know which die there has the greatest probability. Finally, we backtrack from the end to recover the sequence that achieves this maximum probability.
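That recipe is the Viterbi algorithm. Here is a minimal sketch under the same assumptions as the simulation above (uniform 1/3 start and transition probabilities, fair dice); the function and table names are this sketch's own.

def viterbi(rolls, states, start, transition, emission):
    # V[s] = max probability of any dice sequence ending in die s;
    # path[s] = the dice sequence achieving it.
    V = {s: start[s] * emission[s].get(rolls[0], 0.0) for s in states}
    path = {s: [s] for s in states}
    for roll in rolls[1:]:
        V_new, path_new = {}, {}
        for s in states:
            # Best die at the previous position to transition from.
            prev = max(states, key=lambda p: V[p] * transition[p][s])
            V_new[s] = V[prev] * transition[prev][s] * emission[s].get(roll, 0.0)
            path_new[s] = path[prev] + [s]
        V, path = V_new, path_new
    best = max(states, key=lambda s: V[s])
    return path[best], V[best]

states = ['D4', 'D6', 'D8']
start = {s: 1/3 for s in states}
transition = {s: {t: 1/3 for t in states} for s in states}
emission = {'D4': {f: 1/4 for f in range(1, 5)},
            'D6': {f: 1/6 for f in range(1, 7)},
            'D8': {f: 1/8 for f in range(1, 9)}}

dice_seq, prob = viterbi([1, 6, 3, 5, 2, 7, 3, 5, 2, 4],
                         states, start, transition, emission)
print(dice_seq, prob)  # the most likely dice sequence and its probability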
2. Who moved my dice?
Suppose you suspect that your six-sided die has been tampered with by the casino: it may have been replaced by another six-sided die that rolls a 1 with probability 1/2 and rolls each of 2, 3, 4, 5, 6 with probability 1/10. What do you do? The answer is simple: compute the probability of throwing the observed sequence with three normal dice, then compute the probability of throwing the same sequence with the abnormal six-sided die and the other two normal dice. If the former is smaller than the latter, you had better be careful.
For example, suppose the results of the rolls are again: 1 6 3 5 2 7 3 5 2 4
To compute the probability of producing this result with three normal dice, we add up the probabilities of all possible cases. As before, the simple brute-force method is to enumerate all possible dice sequences and compute the probability of each; but this time, instead of picking the maximum, we sum all the computed probabilities, and this total probability is the answer we want. This method still cannot handle dice sequences (Markov chains) that are too long.
We apply a solution similar to the previous one, except that the previous problem cared about the maximum probability, while this one cares about the sum of probabilities. The algorithm that solves this problem is called the forward algorithm.
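In symbols (same notation as in problem 0 above), the forward algorithm keeps a running total $\alpha$ where Viterbi kept a running maximum:

$$\alpha_1(s) = \pi(s)\, e_s(o_1), \qquad \alpha_t(s) = e_s(o_t) \sum_{s'} \alpha_{t-1}(s')\, a_{s' s}, \qquad P(o_{1:T}) = \sum_{s} \alpha_T(s)$$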
First, suppose we roll the die only once and see the result 1. The total probability of producing this result works out to 0.18:
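(with the uniform 1/3 pick probability from the setup)

$$P(1) = \tfrac{1}{3}\cdot\tfrac{1}{6} + \tfrac{1}{3}\cdot\tfrac{1}{4} + \tfrac{1}{3}\cdot\tfrac{1}{8} = \tfrac{13}{72} \approx 0.18$$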

Extending the situation, suppose we roll twice and see the results 1, 6. The total probability of producing this result, computed with the same step-by-step method, is 0.05.

Continuing to extend, suppose we roll three times and see the results 1, 6, 3. The total probability of producing this result, computed the same way, is 0.03.

In the same step-by-step way, however long the chain is, we can always compute the total probability. Likewise, you can compute the probability that the abnormal six-sided die together with the other two normal dice produces this sequence, and then compare the two probabilities to learn whether your dice have been swapped.
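Here is a minimal sketch of this comparison in Python, under the same assumptions as the earlier sketches (uniform 1/3 start and transition probabilities; the loaded die is the one from the story, 1/2 for a 1 and 1/10 for each of 2 through 6; function and table names are this sketch's own).

def forward(rolls, states, start, transition, emission):
    # alpha[s] = total probability of the rolls so far, ending in die s.
    alpha = {s: start[s] * emission[s].get(rolls[0], 0.0) for s in states}
    for roll in rolls[1:]:
        alpha = {s: emission[s].get(roll, 0.0) *
                    sum(alpha[p] * transition[p][s] for p in states)
                 for s in states}
    return sum(alpha.values())  # total over all possible last dice

states = ['D4', 'D6', 'D8']
start = {s: 1/3 for s in states}
transition = {s: {t: 1/3 for t in states} for s in states}
fair = {'D4': {f: 1/4 for f in range(1, 5)},
        'D6': {f: 1/6 for f in range(1, 7)},
        'D8': {f: 1/8 for f in range(1, 9)}}
loaded = dict(fair)  # same D4 and D8, but a tampered D6
loaded['D6'] = {1: 1/2, 2: 1/10, 3: 1/10, 4: 1/10, 5: 1/10, 6: 1/10}

rolls = [1, 6, 3, 5, 2, 7, 3, 5, 2, 4]
p_fair = forward(rolls, states, start, transition, fair)
p_loaded = forward(rolls, states, start, transition, loaded)
print(p_fair, p_loaded)  # if p_fair is much smaller, be suspicious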

Viterbi algorithm

An HMM (Hidden Markov Model) is a statistical model used to describe a process with hidden, unknown parameters. A classic example: a friend in Tokyo decides her activity for the day {walk in the park, shopping, cleaning the room} according to the weather {rainy, sunny}. All I can see are her daily tweets, such as "Ah, I took a walk in the park the day before yesterday, went shopping yesterday, and cleaned the room today!" From her tweets I can then infer the weather in Tokyo over those three days. In this example, the activities are the visible states and the weather is the hidden state.

Any HMM can be described by the following five-tuple:

:param obs: the observation sequence
:param states: the hidden states
:param start_p: the initial probabilities (of the hidden states)
:param trans_p: the transition probabilities (between hidden states)
:param emit_p: the emission probabilities (from hidden states to visible states)

The pseudo code is as follows:

states = ('Rainy', 'Sunny')

observations = ('walk', 'shop', 'clean')

start_probability = {'Rainy': 0.6, 'Sunny': 0.4}

transition_probability = {
    'Rainy': {'Rainy': 0.7, 'Sunny': 0.3},
    'Sunny': {'Rainy': 0.4, 'Sunny': 0.6},
}

emission_probability = {
    'Rainy': {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
    'Sunny': {'walk': 0.6, 'shop': 0.3, 'clean': 0.1},
}
Solving for the most probable weather

Solving for the most probable hidden-state sequence is one of the three typical HMM problems, and it is usually handled with the Viterbi algorithm. The Viterbi algorithm is a shortest-path algorithm (over -log(prob); equivalently, maximum probability) for the HMM.

A brief sketch of the idea; clearly, the first day's weather can be computed directly:

1. Define V[time][today's weather] = probability. Here "today's weather" means: given that the weather on the preceding days has already been determined (to be the most probable), the probability that today's weather is X; the probability is cumulative, i.e. a product.

2. Because my friend went for a walk on the first day, the probability that the first day was rainy is V[day 1][Rainy] = start_probability[Rainy] * emission_probability[Rainy][walk] = 0.6 * 0.1 = 0.06; similarly, V[day 1][Sunny] = 0.4 * 0.6 = 0.24. Intuitively, since my friend went out on the first day and she generally likes to walk when the weather is fine, a sunny first day is more probable; the numbers agree with intuition.

3. From the second day on, for each weather y, the candidate probability is: (the probability that the previous day's weather was x) * (the probability that x transitions to y) * (the probability that a friend does today's activity in weather y). Since the previous day's weather x has two possibilities, there are two candidates for y; take the larger as V[day 2][y], and record the chosen predecessor so the result sequence can be reconstructed.

4. Compare V[last day][Rainy] with V[last day][Sunny]; the sequence corresponding to the larger of the two is the final result. (A sketch of these steps in code follows this list.)
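Putting the four steps above into code, here is a minimal sketch reusing the tables from the pseudocode above; the author's actual implementation is at the GitHub link below, and this version's variable names are the sketch's own.

def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s] = probability of the best weather sequence ending in s on day t.
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    path = {s: [s] for s in states}
    for t in range(1, len(obs)):
        V.append({})
        new_path = {}
        for s in states:
            # Pick the better of the two possible previous-day weathers.
            prob, prev = max((V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                             for p in states)
            V[t][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    prob, best = max((V[-1][s], s) for s in states)
    return prob, path[best]

prob, weather = viterbi(observations, states, start_probability,
                        transition_probability, emission_probability)
print(' '.join(weather))  # Sunny Rainy Rainy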

The code for the algorithm can be seen on GitHub at:

https://github.com/hankcs/Viterbi

After the run completes, Viterbi gives the result:

Sunny Rainy Rainy

The Viterbi algorithm is widely used in word segmentation, part-of-speech tagging, and other application scenarios.
