[Math] Hidden Markov Model

Source: Internet
Author: User

Links: https://www.zhihu.com/question/20962240/answer/33438846

Hawking once said that every additional formula in a book cuts its readership in half.

Let's start with the most classic example: rolling dice. Suppose I have three dice of different shapes: a four-sided die (D4), a six-sided die (D6), and an eight-sided die (D8).
<img src= "https:// Pic4.zhimg.com/435fb8d2d675dc0be95aedf27feb6b67_b.jpg "data-rawwidth=" 1351 "data-rawheight=" 825 "class=" Origin_ Image Zh-lightbox-thumb "width=" 1351 "data-original=" https://pic4.zhimg.com/435fb8d2d675dc0be95aedf27feb6b67_ R.jpg ">
Suppose we start rolling. First we pick one of the three dice, each with probability 1/3. Then we roll it and get a number, one of 1 through 8 (which values are possible depends on the die). Repeating this process gives us a string of numbers, each between 1 and 8. For example, rolling ten times we might get: 1 6 3 5 2 7 3 5 2 4

This sequence of numbers is called the visible state chain. But in a hidden Markov model we have not only this chain of visible states, but also a chain of hidden states. Here, the hidden state chain is the sequence of dice we used; for example, it might be: D6 D8 D8 D6 D4 D8 D6 D6 D4 D8. Between hidden states (the dice) there is a transition probability. In our example, the state after D6 is D4, D6, or D8, each with probability 1/3; likewise, the state after D4 or D8 is D4, D6, or D8 with probability 1/3 each. This setup keeps the explanation simple, but we could set the transition probabilities however we like. For example, we could define that D6 is never followed by D4, that D6 follows D6 with probability 0.9, and that D8 follows D6 with probability 0.1; that would be a new, different HMM.

Similarly, although there are no direct transition probabilities between visible states, there is a probability between each hidden state and each visible state, called the emission probability (output probability). In our example, the six-sided die (D6) emits a 1 with probability 1/6, and emits each of 2, 3, 4, 5, 6 with probability 1/6 as well. We can define other emission probabilities too. For example, a casino might tamper with its six-sided die so that it rolls a 1 with probability 1/2 and each of 2, 3, 4, 5, 6 with probability 1/10.
<img src= "https:// Pic1.zhimg.com/95b60935725125a126e02e370c595000_b.jpg "data-rawwidth=" 1508 "data-rawheight=" 781 "class=" Origin_ Image Zh-lightbox-thumb "width=" 1508 "data-original=" https://pic1.zhimg.com/95b60935725125a126e02e370c595000_ R.jpg ">
<img src= "https:// Pic2.zhimg.com/53193f484ae89279da5a717a9d756089_b.jpg "data-rawwidth=" 1384 "data-rawheight=" 731 "class=" Origin_ Image Zh-lightbox-thumb "width=" 1384 "data-original=" https://pic2.zhimg.com/53193f484ae89279da5a717a9d756089_ R.jpg ">
In fact, for an HMM, simulation is fairly easy if you know in advance all the transition probabilities between hidden states and all the emission probabilities from hidden states to visible states. But when applying an HMM, part of this information is often missing: sometimes you know how many dice there are and what each die is, but not which dice were actually rolled; sometimes you only see a long sequence of results and know nothing else. Using algorithms to estimate the missing information then becomes a very important problem. I will describe these algorithms in more detail below.
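To make this concrete, here is a minimal simulation sketch of the dice HMM described above, assuming the uniform 1/3 initial and transition probabilities and the fair emission probabilities; the function and variable names are just illustrative.

```python
import random

# Hidden states and their faces (fair dice).
dice = {
    "D4": [1, 2, 3, 4],
    "D6": [1, 2, 3, 4, 5, 6],
    "D8": [1, 2, 3, 4, 5, 6, 7, 8],
}
states = list(dice)

def simulate(n_rolls):
    """Sample a hidden chain (which die was used) and a visible chain (the numbers rolled)."""
    hidden, visible = [], []
    state = random.choice(states)                    # initial die: probability 1/3 each
    for _ in range(n_rolls):
        hidden.append(state)
        visible.append(random.choice(dice[state]))   # emission: uniform over the die's faces
        state = random.choice(states)                # transition: 1/3 to each die
    return hidden, visible

hidden, visible = simulate(10)
print(hidden)
print(visible)
```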



Back to the point. The algorithms related to the HMM model mainly fall into three classes, solving three kinds of problems:

1) I know how many dice there are (the number of hidden states) and what each die is (the transition and emission probabilities), and from the sequence of rolled results (the visible state chain) I want to infer which die was used for each roll (the hidden state chain).
In the field of speech recognition, this is called the decoding problem. There are actually two ways to pose it, and they give two different answers. Both answers are right, but they mean different things.
    • The first asks for the maximum-likelihood state path: in plain terms, I want the single sequence of dice that is most likely to have produced the observed results. (That is the one covered here.)
    • The second does not ask for one sequence of dice, but instead asks, for each roll, the probability that a particular die was used. For example, after seeing the results, I might find that the first roll came from D4 with probability 0.5, from D6 with probability 0.3, and from D8 with probability 0.2. I will describe the first approach below; the second is not written here, but if people are interested we can continue it under another question.

2) Again I know how many dice there are (the number of hidden states) and what each die is (the transition and emission probabilities), and from the sequence of rolled results (the visible state chain) I want to know the probability of getting exactly this result.
This may seem pointless, since the result you actually rolled usually corresponds to a reasonably large probability. The real purpose of the question is to test whether the observed results are consistent with the assumed model. If result after result corresponds to a very small probability, it suggests that the model we think we know is probably wrong: someone may have secretly swapped our dice.

3) I know how many dice there are (the number of hidden states), but I do not know what each die is (the transition and emission probabilities). I have observed many roll results (the visible state chain), and I want to work backwards to figure out what each die is.
This problem matters because it is the most common situation. Often we only have the visible results and do not know the parameters of the HMM; we need to estimate those parameters from the visible results, which is a necessary step in building the model.
The problems have been stated; the solutions are explained below. (Problem 0 was not mentioned above; it is just an aid for solving the other problems.)

0. A simple question
The practical value of this problem is not high by itself, but it helps with the harder questions below, so I mention it first.
"Know the dice there are several, what each kind of dice, each throw is what dice, according to the result of throwing dice, to produce the probability of this result." ”
<img src= "https:// Pic1.zhimg.com/2ca5e20b49d2ad17963b477a5691a9e0_b.jpg "data-rawwidth=" 364 "data-rawheight=" 237 "class=" Content_ Image "Width=" 364 "> solution is nothing more than probability multiplication: The solution is simply the multiplication of probabilities:


1. Seeing the invisible: cracking the dice sequence
Here I describe the first approach: solving the maximum-likelihood path problem.
For example, I know I have three dice: a six-sided die, a four-sided die, and an eight-sided die. I also know the results of ten rolls (1 6 3 5 2 7 3 5 2 4), but I do not know which die was used each time. I want to find the most likely sequence of dice.

The simplest, most brute-force approach is to enumerate all possible dice sequences and, using the solution to problem 0, compute the probability of each one. Then we pick the sequence with the maximum probability. If the Markov chain is short, this works; if it is long, the number of sequences to enumerate explodes and the computation becomes infeasible.

A better-known algorithm for this is the Viterbi algorithm. To understand it, let's first look at a few simple cases.

(1) First, if we roll the dice only once:
<img src= "https:// Pic4.zhimg.com/cd4ede10233a8b9c33cd3921ac64bfeb_b.jpg "data-rawwidth=" 1477 "data-rawheight=" 275 "class=" Origin_ Image Zh-lightbox-thumb "width=" 1477 "data-original=" https://pic4.zhimg.com/cd4ede10233a8b9c33cd3921ac64bfeb_ R.jpg ">   
The result is 1. Because the probability that D4 produces a 1 is 1/4, higher than 1/6 and 1/8, the maximum-probability dice sequence is simply D4.

(2) Expanding the situation, suppose we roll the dice twice:
<img src= "https:// Pic1.zhimg.com/6790ea73a601549e1f2a8dae1abcde44_b.jpg "data-rawwidth=" 1477 "data-rawheight=" 275 "class=" Origin_ Image Zh-lightbox-thumb "width=" 1477 "data-original=" https://pic1.zhimg.com/6790ea73a601549e1f2a8dae1abcde44_ R.jpg ">   
The results are 1, 6. Now the problem gets more complicated: we have to compute three values, namely the maximum probability that the second die is D6, D4, or D8, respectively. Clearly, to maximize the probability when the second die is D6, the first die must be D4, and the value is P(D4) × P(1 | D4) × P(D4 → D6) × P(6 | D6) = 1/3 × 1/4 × 1/3 × 1/6 = 1/216.
Similarly, we can compute the maximum probability when the second die is D4 or D8. We find that the second die is most likely D6.
(3) Expanding further, suppose we roll the dice three times:
<img src= "https:// Pic4.zhimg.com/82093763ebb5f0b84784206bca544063_b.jpg "data-rawwidth=" 1477 "data-rawheight=" 275 "class=" Origin_ Image Zh-lightbox-thumb "width=" 1477 "data-original=" https://pic4.zhimg.com/82093763ebb5f0b84784206bca544063_ R.jpg "> Again, we calculate the maximum probability that the third dice are d6,d4,d8 respectively. Again, we find that the second dice must be D6 to get the maximum probability. At this point, the maximum probability of the third dice taking to D4 is:
As above, we can calculate the maximum probability of a third die when it is D6 or D8. We found that the third dice had the greatest probability of taking the D4. To make this probability maximum, the second dice is D6, the first dice is D4. So the maximum probability of the dice sequence is D4 D6 D4.

By now you should see the pattern. Since we can handle rolling the dice one, two, or three times, we can handle any number of rolls the same way. Finding the maximum-probability dice sequence only requires a few things. First, regardless of how long the sequence is, start with length 1 and compute, for each die, the maximum probability of ending at that die. Then gradually increase the length: each time the length grows by one, recompute, for each die, the maximum probability of the sequence ending at that die at the new position. Because these values are computed from the maxima at the previous length, the recomputation is not hard. When we reach the final position, we know which die is most likely at the last step; then we trace the corresponding maximum-probability sequence backwards from the end.
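Below is a minimal Viterbi sketch for this dice example, assuming the uniform 1/3 initial and transition probabilities and fair dice used above; names are illustrative.

```python
states = ["D4", "D6", "D8"]
sides = {"D4": 4, "D6": 6, "D8": 8}

def emit(die, roll):
    """Emission probability: uniform over the die's faces, 0 if the face doesn't exist."""
    return 1 / sides[die] if roll <= sides[die] else 0.0

def viterbi(rolls, init=1/3, trans=1/3):
    # best[t][s]: max probability of any dice sequence ending in die s at step t
    best = [{s: init * emit(s, rolls[0]) for s in states}]
    back = [{}]
    for roll in rolls[1:]:
        prev = best[-1]
        best.append({})
        back.append({})
        for s in states:
            # with uniform transitions the best predecessor is simply the max of the previous step
            p_prev, s_prev = max((p, q) for q, p in prev.items())
            best[-1][s] = p_prev * trans * emit(s, roll)
            back[-1][s] = s_prev
    # backtrack from the most likely final die
    path = [max(best[-1], key=best[-1].get)]
    for t in range(len(rolls) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

print(viterbi([1, 6, 3]))  # expected: ['D4', 'D6', 'D4']
```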
2. Who Moved my dice?
For example, suppose you suspect that your six-sided die has been tampered with by the casino and may have been replaced by a loaded six-sided die that rolls a 1 with probability 1/2 and each of 2, 3, 4, 5, 6 with probability 1/10. What do you do? The answer is simple: compute the probability of the observed sequence under the three normal dice, then compute the probability of the same sequence under the loaded six-sided die plus the other two normal dice. If the former is smaller than the latter, you need to be careful.

For example, suppose the rolled results are 1, 6, 3.
To compute the probability of rolling this result with the three normal dice, we add up the probabilities of all the possible cases. As before, the simple brute-force method is to enumerate every dice sequence and compute the probability of each; but this time, instead of picking the maximum, we sum all of the probabilities, and that total is the answer we want. This method still cannot handle dice sequences (Markov chains) that are too long.

We apply an approach similar to the previous one, except that the previous question cared about the maximum probability, while this question cares about the sum of probabilities. The algorithm that solves it is called the forward algorithm.

First, if we roll the dice only once:
<img src= "https:// Pic4.zhimg.com/cd4ede10233a8b9c33cd3921ac64bfeb_b.jpg "data-rawwidth=" 1477 "data-rawheight=" 275 "class=" Origin_ Image Zh-lightbox-thumb "width=" 1477 "data-original=" https://pic4.zhimg.com/cd4ede10233a8b9c33cd3921ac64bfeb_ R.jpg ">
We see the result 1. The total probability of producing this result is the sum over the three dice: 1/3 × 1/4 + 1/3 × 1/6 + 1/3 × 1/8 ≈ 0.18.
Expanding the situation, suppose we roll the dice twice:
<img src= "https:// Pic1.zhimg.com/6790ea73a601549e1f2a8dae1abcde44_b.jpg "data-rawwidth=" 1477 "data-rawheight=" 275 "class=" Origin_ Image Zh-lightbox-thumb "width=" 1477 "data-original=" https://pic1.zhimg.com/6790ea73a601549e1f2a8dae1abcde44_ R.jpg ">
We see the results 1, 6. Summing over all dice sequences in the same way gives a total probability of about 0.05.

Expanding further, suppose we roll the dice three times:
<img src= "https:// Pic4.zhimg.com/82093763ebb5f0b84784206bca544063_b.jpg "data-rawwidth=" 1477 "data-rawheight=" 275 "class=" Origin_ Image Zh-lightbox-thumb "width=" 1477 "data-original=" https://pic4.zhimg.com/82093763ebb5f0b84784206bca544063_ R.jpg "> See the result as 1,6,3. The total probability of generating this result can be calculated as follows, with a total probability of 0.03:

Similarly, we can keep extending this step-by-step calculation for as long as we need; however long the Markov chain is, the total probability can always be computed. In the same way, you can also compute the probability that the loaded six-sided die together with the two normal dice produced this sequence, compare the two probabilities, and then tell whether your dice have been swapped. A sketch of this forward computation follows.
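The sketch below assumes the uniform 1/3 initial and transition probabilities used throughout the dice example, and the loaded six-sided die that shows 1 with probability 1/2 and each of 2 through 6 with probability 1/10; all names are illustrative.

```python
states = ["D4", "D6", "D8"]
sides = {"D4": 4, "D6": 6, "D8": 8}

def emit_fair(die, roll):
    """Fair dice: uniform emission over the die's faces."""
    return 1 / sides[die] if roll <= sides[die] else 0.0

def emit_loaded(die, roll):
    """Same dice, except the six-sided die is loaded: 1 with prob 1/2, 2-6 with 1/10 each."""
    if die == "D6":
        return 0.5 if roll == 1 else (0.1 if 2 <= roll <= 6 else 0.0)
    return emit_fair(die, roll)

def forward(rolls, emit, init=1/3, trans=1/3):
    """Total probability of the observed rolls, summed over all possible dice sequences."""
    alpha = {s: init * emit(s, rolls[0]) for s in states}
    for roll in rolls[1:]:
        total_prev = sum(alpha.values())   # uniform transitions, so every state sees the same sum
        alpha = {s: total_prev * trans * emit(s, roll) for s in states}
    return sum(alpha.values())

rolls = [1, 6, 3, 5, 2, 7, 3, 5, 2, 4]
p_fair = forward(rolls, emit_fair)
p_loaded = forward(rolls, emit_loaded)
print(p_fair, p_loaded)   # if p_fair is much smaller, the dice may have been swapped
```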
3. Throw a bunch of dice and let me guess who you are
(The original author was lazy and never wrote this part; it would have covered the algorithm called EM.)
Links: https://www.zhihu.com/question/20962240/answer/33561657

Here is another example, this time describing vividly the possibility that the dealer (the "uncle" in the story) switches dice.

<img src= "Https://pic3.zhimg.com/de1e09aa9c09b1a0928f5b91ba45d352_b.jpg" Data-rawwidth= "data-rawheight=" class= "Origin_image zh-lightbox-thumb" width= "data-original=" https:// Pic3.zhimg.com/de1e09aa9c09b1a0928f5b91ba45d352_r.jpg ">

Transition probabilities: the probability transfer graph, where a_{ij} represents the probability of moving from state i to state j.

<img src= "Https://pic1.zhimg.com/0a28bf2d267ba6aa16dde74771796bbc_b.jpg" Data-rawwidth= "183" data-rawheight= "class=" Content_image "width=" 183 ">

Emission probabilities of the hidden states: that is, the probability distribution over the faces of each die (for example, cheat die 1 has a 90% chance of rolling a six, and cheat die 2 has an 85% chance of rolling "small").

<img src= "Https://pic2.zhimg.com/2f6e49ac23b9f99cc21cf9016435ae39_b.jpg" Data-rawwidth= "data-rawheight=" 245 "class=" Origin_image zh-lightbox-thumb "width=" data-original= "https://" Pic2.zhimg.com/2f6e49ac23b9f99cc21cf9016435ae39_r.jpg ">

These emission probabilities can also be written as a matrix.

<img src= "Https://pic3.zhimg.com/9974153a9ac8963c329c4195878c5312_b.jpg" Data-rawwidth= "325" data-rawheight= "class=" Content_image "width=" 325 ">

Putting these two things together gives the whole HMM model.
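Purely as an illustration of the shape of the model, the two pieces could be written as matrices like this; apart from the 90% and 85% figures quoted above, the numbers are hypothetical placeholders, since the values shown in the original figures are not reproduced here.

```python
import numpy as np

# Hidden states: 0 = fair die, 1 = cheat die 1, 2 = cheat die 2.
# A[i, j] = probability of switching from die i to die j (hypothetical values).
A = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.2, 0.1, 0.7],
])

# B[i, k] = probability that die i shows face k+1 (faces 1..6).
# Cheat die 1: 90% chance of a six; cheat die 2: 85% chance of a "small" face (1-3).
B = np.array([
    [1/6] * 6,                                   # fair die
    [0.02, 0.02, 0.02, 0.02, 0.02, 0.90],        # cheat die 1 (hypothetical split of the remaining 10%)
    [0.85/3, 0.85/3, 0.85/3, 0.05, 0.05, 0.05],  # cheat die 2 (hypothetical split of the remaining 15%)
])

# Each row of a stochastic matrix must sum to 1.
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
```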


For example, suppose we want to know the probability that this uncle throws ten 6s in a row in the next game.

<img src= "Https://pic3.zhimg.com/f893ed100b894616bffef0bae9f461da_b.jpg" Data-rawwidth= "421" data-rawheight= "class=" Origin_image zh-lightbox-thumb "width=" 421 "data-original=" https:// Pic3.zhimg.com/f893ed100b894616bffef0bae9f461da_r.jpg ">

First, assume a hidden state sequence: say the uncle used the normal die for the first 5 throws and cheat die 1 for the last 5.

<img src= "Https://pic3.zhimg.com/468da543a29861571622d107d845bc42_b.jpg" Data-rawwidth= "data-rawheight=" class= "Content_image" width= ">"

Then, under this assumed hidden sequence, we can compute the probability of throwing ten 6s.

<img src= "Https://pic1.zhimg.com/4f15b94f31eb9fbe8370a1ad6b90e730_b.jpg" Data-rawwidth= "236" data-rawheight= "WU" class= "Content_image" width= "236" >
This probability is actually, the product of the probability B of the recessive state representation .
<img src= "Https://pic4.zhimg.com/c1616bab1b90e3daa3b01c93a27f5c07_b.jpg" Data-rawwidth= "339" data-rawheight= "WU" class= "Content_image" width= "339" >

But the problem appears again: the hidden state sequence was only my assumption, and I do not know the actual sequence. What to do? Well, try every possible combination of hidden sequences, as sketched below.
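A brute-force sketch of that enumeration, reusing the hypothetical parameters from the earlier sketch (the matrices and the uniform initial distribution are illustrative assumptions):

```python
from itertools import product
import numpy as np

# Hypothetical model from the sketch above: fair die, cheat die 1, cheat die 2.
pi = np.array([1/3, 1/3, 1/3])                      # initial distribution (assumed uniform)
A = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.2, 0.1, 0.7]])
B = np.array([[1/6] * 6,
              [0.02] * 5 + [0.90],
              [0.85/3] * 3 + [0.05] * 3])

def prob_brute_force(obs):
    """Sum P(hidden sequence) * P(obs | hidden sequence) over every hidden sequence."""
    total = 0.0
    for hidden in product(range(len(pi)), repeat=len(obs)):
        p = pi[hidden[0]] * B[hidden[0], obs[0] - 1]
        for t in range(1, len(obs)):
            p *= A[hidden[t - 1], hidden[t]] * B[hidden[t], obs[t] - 1]
        total += p
    return total

print(prob_brute_force([6] * 10))   # ten 6s; 3**10 = 59049 hidden sequences enumerated
```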

Links: https://www.zhihu.com/question/20962240/answer/64187492

This answer uses the classic weather example: the hidden states are the weather in Beijing (rainy or sunny), and the visible observations are what my girlfriend does after work.

Question 1: the whole model is known, and my girlfriend tells me that for three days in a row, the things she did after work were: going for a walk, shopping, and cleaning up. From the model, compute the probability of generating this sequence of activities.

Question 2: I know the same model and the same three activities, and my girlfriend asks me to guess what the weather was like in Beijing on each of those three days after work, i.e. which three days of weather are most likely to have made her do those things.

Question 3: the hardest of all. My girlfriend only tells me that over these three days she did these three things, and I have no other information. She wants me to build the model from that alone: the transition probabilities between rainy and sunny weather, the probability distribution of the weather on the first day, and the probability distribution over what she does given the weather. Horrific.

To solve these problems, the masters worked out the corresponding algorithms:
    • Problem one: the forward algorithm or the backward algorithm.
    • Problem two: the Viterbi algorithm.
    • Problem three: the Baum-Welch algorithm.



"Sol 1" in question 1: Traversal algorithm (exhaustive).

To compute the probability of generating this series of activities, we list every possible weather sequence and work out the probability of the activities under each one. With two possible weather conditions on each of the three days, there are 2^3 = 8 cases in total.
One example: P(rain, rain, rain, walk, shop, clean) = P(rain on day 1) × P(walk | rain) × P(rain on day 2 | rain on day 1) × P(shop | rain) × P(rain on day 3 | rain on day 2) × P(clean | rain) = 0.6 × 0.1 × 0.7 × 0.4 × 0.7 × 0.5 = 0.00588. This is the probability that this particular weather sequence and this series of activities occur together.
Here P(rain on day 2 | rain on day 1) is of course the probability of rain on the second day given that the first day was rainy, which is 0.7.
Adding up all eight cases, the probability that the three days of activities are {walk, shop, clean} comes to 0.033612. This looks simple enough to compute, but once the observation sequence gets long the amount of computation explodes: exhaustive enumeration costs on the order of T × N^T operations, where N is the number of hidden states and T is the length of the observed sequence.

"Sol2" in question 1: forward algorithm.

The probability of a "walk" behavior is calculated first, and if it rains, it is: t=1 P (Stroll, rain) =p (First World Rain) X P (Stroll | rain) =0.6x0.1=0.06; a sunny day. P (Stroll, sunny Day) =0.4x0.6=0.24

At the moment t=2 the activity is "shopping", and its probability can of course be computed from the t=1 values, as follows.

If it rains at t=2, then

P(walk on day 1, shop on day 2, rain on day 2)

= [P(walk on day 1, rain on day 1) × P(rain on day 2 | rain on day 1) + P(walk on day 1, sunny on day 1) × P(rain on day 2 | sunny on day 1)] × P(shop on day 2 | rain on day 2)

= [0.06 × 0.7 + 0.24 × 0.4] × 0.4

= 0.0552

If it is sunny at t=2,

P(walk on day 1, shop on day 2, sunny on day 2)

= [0.06 × 0.3 + 0.24 × 0.6] × 0.3 = 0.0486 (worked out the same way)

If it rains at t=3, then

P(walk on day 1, shop on day 2, clean on day 3, rain on day 3)

= [P(walk on day 1, shop on day 2, rain on day 2) × P(rain on day 3 | rain on day 2) + P(walk on day 1, shop on day 2, sunny on day 2) × P(rain on day 3 | sunny on day 2)] × P(clean on day 3 | rain on day 3)

= [0.0552 × 0.7 + 0.0486 × 0.4] × 0.5

= 0.02904

If it is sunny at t=3, then

P(walk on day 1, shop on day 2, clean on day 3, sunny on day 3)

= [0.0552 × 0.3 + 0.0486 × 0.6] × 0.1 = 0.004572

So P(walk on day 1, shop on day 2, clean on day 3) is the sum of the rainy and sunny cases on the third day:

0.02904 + 0.004572 = 0.033612.

This example shows that the forward algorithm computes, at each time point, the probability of the observations so far ending in each state. It looks more involved, but once T grows large the computation is reduced enormously compared with enumeration (roughly T × N^2 operations instead of T × N^T).
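A minimal sketch of this forward pass, using the parameters implied by the worked numbers above (dictionary and function names are just illustrative):

```python
states = ["Rainy", "Sunny"]
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def forward(obs):
    """P(observation sequence), summed over all possible weather sequences."""
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][o]
                 for s in states}
    return sum(alpha.values())

print(forward(["walk", "shop", "clean"]))  # ≈ 0.033612
```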

"Sol3" in question 1: Backward algorithm

As the name suggests, the forward algorithm starts at t=1 and computes forward step by step. The backward algorithm is the reverse: it starts from the last time step and works backwards.

Initialize: the backward variables at the last time step are set to 1, i.e. β_3(rain) = β_3(sunny) = 1.

Recursion: for the second day,

β_2(rain) = 0.7 × 0.5 × 1 + 0.3 × 0.1 × 1 = 0.38

In each term, the first factor is the transition probability (rain on day 2 to rain on day 3 is 0.7), the second is the emission probability (cleaning at home when it rains on day 3 is 0.5), and the third is the backward variable we just defined.

In the same vein,

β_2(sunny) = 0.4 × 0.5 × 1 + 0.6 × 0.1 × 1 = 0.26
β_1(rain) = 0.7 × 0.4 × 0.38 + 0.3 × 0.3 × 0.26 = 0.1298
β_1(sunny) = 0.4 × 0.4 × 0.38 + 0.6 × 0.3 × 0.26 = 0.1076

End: P(walk, shop, clean) = 0.6 × 0.1 × 0.1298 + 0.4 × 0.6 × 0.1076

= 0.033612
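A matching backward-pass sketch under the same assumed parameters:

```python
states = ["Rainy", "Sunny"]
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def backward(obs):
    """Same total probability as the forward pass, computed from the end backwards."""
    beta = {s: 1.0 for s in states}                       # initialization at the last step
    for o in reversed(obs[1:]):
        beta = {s: sum(trans_p[s][r] * emit_p[r][o] * beta[r] for r in states)
                for s in states}
    return sum(start_p[s] * emit_p[s][obs[0]] * beta[s] for s in states)

print(backward(["walk", "shop", "clean"]))  # ≈ 0.033612, matching the forward algorithm
```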

<img src= "Https://pic3.zhimg.com/1a89bf925b4c1af2cc17416764d1d60e_b.png" data-rawwidth= " 340 "data-rawheight=" 295 "class=" Content_image "width=" 340 ">

The three algorithms give consistent answers.

Solution to Issue 2: Viterbi algorithm

The Viterbi algorithm is dedicated to finding an optimal path, the single state sequence that best explains the observed sequence.

Initialize: δ_1(rain) = 0.6 × 0.1 = 0.06, δ_1(sunny) = 0.4 × 0.6 = 0.24.

Initial paths: each state starts as its own path of length one.

Recursion: at each step, for each state, keep the predecessor that gives the larger probability.

The best way to reach "rain" on day 2 is:

δ_2(rain) = max(0.06 × 0.7, 0.24 × 0.4) × 0.4 = 0.096 × 0.4 = 0.0384, with the maximum coming from the sunny branch.

In other words, if day 2 is rainy, day 1 was more likely sunny.

In the same way we can push forward:

δ_2(sunny) = max(0.06 × 0.3, 0.24 × 0.6) × 0.3 = 0.144 × 0.3 = 0.0432 (best predecessor: sunny)
δ_3(rain) = max(0.0384 × 0.7, 0.0432 × 0.4) × 0.5 = 0.02688 × 0.5 = 0.01344 (best predecessor: rain)
δ_3(sunny) = max(0.0384 × 0.3, 0.0432 × 0.6) × 0.1 = 0.02592 × 0.1 = 0.002592 (best predecessor: sunny)

End: comparing δ_3(rain) = 0.01344 with δ_3(sunny) = 0.002592, the former is larger, so the last day is most likely rainy.

Backtracking: knowing day 3 ended in rain, the most likely day 2 state is also rain; and knowing day 2 was rain, the most likely day 1 state is sunny.

As a result, we get the best path, i.e. the most likely weather sequence: sunny, rainy, rainy.
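A minimal Viterbi sketch for the same model and parameters, which recovers the sunny, rainy, rainy path:

```python
states = ["Rainy", "Sunny"]
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def viterbi(obs):
    """Most likely weather sequence for the observed activities, plus its probability."""
    delta = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    paths = {s: [s] for s in states}
    for o in obs[1:]:
        new_delta, new_paths = {}, {}
        for s in states:
            # pick the predecessor state that maximizes the path probability
            p, prev = max((delta[r] * trans_p[r][s], r) for r in states)
            new_delta[s] = p * emit_p[s][o]
            new_paths[s] = paths[prev] + [s]
        delta, paths = new_delta, new_paths
    best = max(delta, key=delta.get)
    return paths[best], delta[best]

print(viterbi(["walk", "shop", "clean"]))  # (['Sunny', 'Rainy', 'Rainy'], ≈ 0.01344)
```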

Resolution of question 3: the Baum-Welch algorithm.

This problem is considerably harder than the previous two and cannot be explained clearly in a few sentences. If you are interested, you can send me a private message.

I very much agree with old Hawking's remark that every extra formula costs half the readers, but after writing this out I found that the formulas actually read more concisely and clearly than the surrounding prose, which gets rather long-winded.

I still write this with a grateful heart; thanks again to this question and to the enthusiastic people who answered it for the help they gave me.


