An Accessible Explanation of the Hidden Markov Model (HMM) (reprint)

Last Update:2016-12-23 Source: Internet

Author: User


Original author: Yang Eninala
Link: https://www.zhihu.com/question/20962240/answer/33438846
Copyright belongs to the author. To reproduce this answer, please contact the author for authorization.

The hidden Markov model (HMM) is easy to explain, but hard to explain simply. I don't think there is anything wrong with the other answers, but I would like to offer a more accessible example. I hope my reader is not an expert but a beginner who is interested in this question, so I will explain more of the mathematical thinking and use fewer formulas. Hawking once said that every formula you add to a book halves its readership. That is why A Brief History of Time, a book about physics, sold as well as Madonna's book about sex. I will follow this approach and try to write the most understandable answer.
Let's stay with the most classic example: rolling dice. Suppose I have three different dice in my hand. The first is the ordinary die (call it D6), with six faces; each face (1, 2, 3, 4, 5, 6) comes up with probability 1/6. The second is a tetrahedron (call it D4); each face (1, 2, 3, 4) comes up with probability 1/4. The third has eight faces (call it D8); each face (1, 2, 3, 4, 5, 6, 7, 8) comes up with probability 1/8.
Suppose we start rolling. We first pick one of the three dice, each with probability 1/3. Then we roll it and get a number, one of 1, 2, 3, 4, 5, 6, 7, 8. Repeating this process, we get a string of numbers, each of which is one of 1, 2, 3, 4, 5, 6, 7, 8. For example, rolling 10 times we might get: 1 6 3 5 2 7 3 5 2 4
This string of numbers is called the visible state chain. In a hidden Markov model, however, we have not only this chain of visible states but also a chain of hidden states. In this example, the hidden state chain is the sequence of dice that were used. For example, the hidden state chain might be: D6 D8 D8 D6 D4 D8 D6 D6 D4 D8
In general, the Markov chain that an HMM refers to is the hidden state chain, because there are transition probabilities between the hidden states (the dice). In our example, the state after D6 is D4, D6, or D8, each with probability 1/3. The state after D4 or D8 is likewise D4, D6, or D8, each with probability 1/3. I set it up this way to keep the explanation simple at first, but we can set the transition probabilities however we like. For example, we can define that D6 cannot be followed by D4: after D6, the next die is D6 with probability 0.9 and D8 with probability 0.1. That gives a new HMM.
Similarly, although there are no transition probabilities between visible states, there is a probability between each hidden state and each visible state, called the output probability (emission probability). In our example, the six-sided die (D6) produces a 1 with output probability 1/6, and the probability of producing each of 2, 3, 4, 5, 6 is also 1/6. We can define the output probabilities differently as well. For example, suppose I have a six-sided die that a casino has tampered with: it rolls 1 with the larger probability 1/2, and each of 2, 3, 4, 5, 6 with probability 1/10.
In fact, for an HMM, simulation is fairly easy if you know in advance the transition probabilities between all hidden states and the output probabilities from every hidden state to every visible state. When an HMM is applied, however, part of this information is often missing. Sometimes you know how many kinds of dice there are and what each kind is, but not the sequence of dice that was rolled; sometimes you have only seen the results of many rolls and know nothing else. How to estimate the missing information with an algorithm then becomes a very important problem. I will discuss these algorithms in more detail below.
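The three ingredients described so far (initial, transition, and output probabilities) can be written down as plain lookup tables. The following is a minimal sketch in Python; the variable names are illustrative assumptions, not part of the original answer.

```python
from fractions import Fraction

# Hidden states (the dice) and how many faces each one has.
DICE = ["D6", "D4", "D8"]
FACES = {"D6": 6, "D4": 4, "D8": 8}

# Initial probability: each die is picked first with probability 1/3.
initial = {d: Fraction(1, 3) for d in DICE}

# Transition probability: whichever die was used, the next one is uniform.
transition = {prev: {nxt: Fraction(1, 3) for nxt in DICE} for prev in DICE}

# Output (emission) probability: a fair die shows each of its faces equally often.
emission = {
    d: {face: Fraction(1, FACES[d]) for face in range(1, FACES[d] + 1)}
    for d in DICE
}

# The tampered six-sided die from the text: 1 with probability 1/2,
# and each of 2..6 with probability 1/10.
loaded_d6 = {1: Fraction(1, 2), **{face: Fraction(1, 10) for face in range(2, 7)}}

# Every row of probabilities must sum to 1.
assert sum(initial.values()) == 1
assert all(sum(row.values()) == 1 for row in transition.values())
assert all(sum(row.values()) == 1 for row in emission.values())
assert sum(loaded_d6.values()) == 1
```

Swapping `emission["D6"]` for `loaded_d6`, or changing a row of `transition`, defines a different HMM over the same dice.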
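To illustrate why simulation is easy once all probabilities are known, here is a short sketch that samples a hidden chain and a visible chain for the fair-dice example. The function name `simulate` and the `seed` parameter are assumptions made for this illustration.

```python
import random

DICE = {"D6": 6, "D4": 4, "D8": 8}  # die name -> number of faces

def simulate(n_rolls, seed=None):
    """Sample a hidden chain of dice and the visible chain of numbers."""
    rng = random.Random(seed)
    hidden, visible = [], []
    for _ in range(n_rolls):
        # With uniform transitions every die is equally likely at every step,
        # so the initial pick and each later transition are the same draw.
        die = rng.choice(list(DICE))
        hidden.append(die)
        # A fair die shows each of its faces 1..n with equal probability.
        visible.append(rng.randint(1, DICE[die]))
    return hidden, visible

hidden, visible = simulate(10, seed=0)
print(hidden)   # the hidden state chain (which die was used)
print(visible)  # the visible state chain (what was rolled)
```

With non-uniform transitions, the die for each step would instead be drawn from the row of the transition table that belongs to the previous die.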
(If you only wanted an easy-to-understand example, you do not need to read any further.)

A couple of idle words first. In my view, understanding an algorithm takes two things: grasping its meaning and knowing its form. What I cover here is mainly the first. But this is precisely the most important part, and many books leave it out. It is like courting a girl who tells you, "You have done nothing wrong!" If you take her words at face value and conclude that you have done nothing wrong, you are clearly mistaken. You have to grasp what she means: "Hurry up and apologize to me!" So when you see that form of expression, quickly admit your mistake and beg for mercy. Mathematics is the same: if you do not understand the meaning and only look at the formulas, you will often be confused. Still, mathematical expression is at worst a little obscure, whereas what a girl says is sometimes the exact opposite of what she means. That is why I have always thought that understanding girls is much harder than understanding mathematics.
Back to the point. The algorithms associated with HMMs fall into three categories, each solving a different problem:
1) We know how many kinds of dice there are (the number of hidden states) and what each kind of die is (the transition probabilities). From the results of the rolls (the visible state chain), we want to know which die was used for each roll (the hidden state chain). In speech recognition this is called the decoding problem. There are actually two ways to solve it, and they give two different answers. Both answers are correct, but they mean different things. The first finds the maximum likelihood state path: put simply, I look for the one dice sequence that is most likely to have produced the observed results. The second does not look for a single dice sequence but asks, for each roll, how probable each kind of die is. For example, after seeing the results, I can work out that the first roll was D4 with probability 0.5, D6 with probability 0.3, and D8 with probability 0.2. I will explain the first solution below. I will not write up the second here; if there is interest, we can continue in another question.
2) We again know how many kinds of dice there are (the number of hidden states) and what each kind of die is (the transition probabilities). From the results of the rolls (the visible state chain), we want to know the probability of rolling these results. This problem may seem to have little point, because the results you roll usually correspond to a fairly high probability. Its real purpose is to test whether the observed results are consistent with the known model. If the results repeatedly correspond to a fairly low probability, then the model we think we know is probably wrong: someone has secretly swapped our dice.
3) We know how many kinds of dice there are (the number of hidden states) but not what each kind of die is (the transition probabilities). We observe the results of many rolls (the visible state chain) and want to infer what each kind of die is (the transition probabilities). This problem is important because it is the most common situation. Very often we have only the visible results and do not know the parameters of the HMM. We need to estimate those parameters from the visible results, which is a necessary step in building the model.
The problems have now been stated; the solutions follow. (Problem 0 is not mentioned above. It serves only as an aid to solving the problems above.)
0. A simple problem
In fact, this problem has little practical value. I mention it first because it helps with the harder problems below.
Suppose we know how many kinds of dice there are, what each kind of die is, which die was used for each roll, and the results of the rolls; we want the probability of producing these results. The solution is simply to multiply the probabilities: for each roll, the probability of choosing that die times the probability of that die producing the observed number.
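As a worked instance of problem 0, the sketch below multiplies the probabilities for one fully known case. The dice sequence D6 D8 D8 with results 1 6 3 is an illustrative choice, and the helper names are hypothetical; the model is the fair-dice one with all initial and transition probabilities equal to 1/3.

```python
from fractions import Fraction

FACES = {"D6": 6, "D4": 4, "D8": 8}
INITIAL = Fraction(1, 3)     # probability of picking each die first
TRANSITION = Fraction(1, 3)  # probability of moving from any die to any die

def emit(die, number):
    """Output probability of a fair die; 0 if the die has no such face."""
    return Fraction(1, FACES[die]) if 1 <= number <= FACES[die] else Fraction(0)

def sequence_probability(dice, numbers):
    """Probability of using exactly these dice AND rolling exactly these numbers."""
    prob = INITIAL * emit(dice[0], numbers[0])
    for die, number in zip(dice[1:], numbers[1:]):
        prob *= TRANSITION * emit(die, number)
    return prob

p = sequence_probability(["D6", "D8", "D8"], [1, 6, 3])
print(p)  # 1/3 * 1/6 * 1/3 * 1/8 * 1/3 * 1/8 = 1/10368
```

An impossible pairing, such as D4 producing a 6, makes the whole product zero.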
1. Seeing the invisible: cracking the dice sequence
Here I describe the first solution, the maximum likelihood path problem. For example, I know that I have three dice: a six-sided die, a four-sided die, and an eight-sided die. I also know the results of ten rolls (1 6 3 5 2 7 3 5 2 4). I do not know which die was used for each roll, and I want to find the most likely dice sequence.
The simplest, brute-force method is to enumerate all possible dice sequences and compute the probability of each one using the solution to problem 0. Then we pick the sequence with the highest probability. If the Markov chain is short, this certainly works. If it is long, the number of sequences to enumerate becomes too large and the computation is hard to finish.
Another well-known method is the Viterbi algorithm. To understand it, let's first look at a few simple cases.
First, suppose we roll only once and see the result 1. The most probable dice sequence is D4, because D4 produces a 1 with probability 1/4, which is higher than 1/6 and 1/8.
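The brute-force approach can be sketched directly: enumerate every dice sequence, score each with the product from problem 0, and keep the best. This is a minimal sketch for the fair-dice model with all initial and transition probabilities equal to 1/3; the function names are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product

FACES = {"D6": 6, "D4": 4, "D8": 8}
THIRD = Fraction(1, 3)  # every initial and transition probability is 1/3

def emit(die, number):
    """Output probability of a fair die; 0 if the die has no such face."""
    return Fraction(1, FACES[die]) if 1 <= number <= FACES[die] else Fraction(0)

def joint(dice, numbers):
    """Probability of this dice sequence together with these results."""
    prob = Fraction(1)
    for die, number in zip(dice, numbers):
        prob *= THIRD * emit(die, number)
    return prob

def brute_force_best(numbers):
    # Try all 3 ** len(numbers) dice sequences and keep the most probable one.
    candidates = product(FACES, repeat=len(numbers))
    return max(candidates, key=lambda dice: joint(dice, numbers))

rolls = [1, 6, 3, 5, 2, 7, 3, 5, 2, 4]
best = brute_force_best(rolls)
print(best)             # the most probable dice sequence
print(3 ** len(rolls))  # 59049 candidates had to be scored
```

Ten rolls already require 59049 candidates, and each additional roll triples that number, which is why this method does not scale to long chains.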
Now extend the situation: we roll twice and the results are 1, 6. The problem becomes more complicated. We need to compute three values, namely the maximum probability that the second die is D6, D4, or D8. Clearly, to reach the maximum, the first die must be D4. In that case, the maximum probability that the second die is D6 is the product of four factors (pick D4, D4 rolls 1, move from D4 to D6, D6 rolls 6): 1/3 × 1/4 × 1/3 × 1/6 = 1/216. In the same way, we can compute the maximum probability that the second die is D4 or D8. We find that the probability is greatest when the second die is D6, and to reach that maximum the first die is D4. So the most probable dice sequence is D4 D6.
Extending further, we roll three times and the results are 1, 6, 3. As before, we compute the maximum probability that the third die is D6, D4, or D8. Again we find that the second die must be D6 to reach the maximum. In that case, the maximum probability that the third die is D4 is 1/3 × 1/4 × 1/3 × 1/6 × 1/3 × 1/4 = 1/2592. In the same way, we can compute the maximum probability that the third die is D6 or D8. We find that the probability is greatest when the third die is D4. To reach that maximum, the second die is D6 and the first die is D4. So the most probable dice sequence is D4 D6 D4.
By now you should see the pattern. Since we can handle one, two, and three rolls, any number of rolls works the same way. Finding the most probable dice sequence comes down to a few steps. First, however long the sequence is, start from length 1 and compute the maximum probability of each die at that length. Then increase the length step by step; each time the length grows by one, recompute the maximum probability of each die at the new last position. Because the maximum probability of each die at the previous length has already been computed, the recomputation is not hard. When we reach the last position, we know which die is most probable there. Finally, we trace backward from that last die to recover the sequence that corresponds to this maximum probability.
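The procedure just described (grow the sequence one roll at a time, keep the best probability per die, then trace back) can be sketched as follows. This is a minimal sketch of the Viterbi algorithm for the fair-dice model with uniform initial and transition probabilities; the table layout and names are assumptions made for illustration.

```python
from fractions import Fraction

DICE = ["D6", "D4", "D8"]
FACES = {"D6": 6, "D4": 4, "D8": 8}
INITIAL = {d: Fraction(1, 3) for d in DICE}
TRANSITION = {p: {d: Fraction(1, 3) for d in DICE} for p in DICE}

def emit(die, number):
    """Output probability of a fair die; 0 if the die has no such face."""
    return Fraction(1, FACES[die]) if 1 <= number <= FACES[die] else Fraction(0)

def viterbi(numbers):
    """Return the most probable dice sequence and its probability."""
    # best[d]: highest probability of any dice sequence so far that ends in die d.
    best = {d: INITIAL[d] * emit(d, numbers[0]) for d in DICE}
    back = []  # one dict per later roll: die -> best previous die
    for number in numbers[1:]:
        pointers, new_best = {}, {}
        for d in DICE:
            # Which previous die gives the best way of arriving at d?
            prev = max(DICE, key=lambda p: best[p] * TRANSITION[p][d])
            pointers[d] = prev
            new_best[d] = best[prev] * TRANSITION[prev][d] * emit(d, number)
        back.append(pointers)
        best = new_best
    # Pick the most probable last die, then walk the pointers backward.
    last = max(DICE, key=lambda d: best[d])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]

print(viterbi([1]))        # D4, probability 1/12
print(viterbi([1, 6]))     # D4 D6, probability 1/216
print(viterbi([1, 6, 3]))  # D4 D6 D4, probability 1/2592
```

Each roll costs a fixed amount of work (here 3 × 3 comparisons), so the running time grows linearly with the number of rolls rather than tripling with each one.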
2. Who moved my dice?
Suppose you suspect that the casino has tampered with your six-sided die. It may have been replaced by another six-sided die that rolls 1 with the larger probability 1/2 and each of 2, 3, 4, 5, 6 with probability 1/10. What do you do? The answer is simple: compute the probability that three normal dice produce the observed sequence, then compute the probability that the abnormal six-sided die together with the other two normal dice produce the same sequence. If the former is smaller than the latter, you should be careful.
For example, take a given sequence of results. To compute the probability that three normal dice produce it, we add up the probabilities of all the possible cases. As before, the brute-force method is to enumerate all dice sequences and compute the probability of each one. This time, however, we do not pick the maximum; we add up all the computed probabilities, and the total is the result we want. This method still cannot be applied to dice sequences (Markov chains) that are too long.
We will use a solution similar to the previous one, except that the previous problem was concerned with the maximum probability and this one is concerned with the sum of probabilities. The algorithm that solves this problem is called the forward algorithm.
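First, suppose we roll only once and see the result 1. The total probability of this result sums over the three dice that could have produced it: 1/3 × 1/6 + 1/3 × 1/4 + 1/3 × 1/8 = 13/72, about 0.18.
Extending the situation, we roll twice and see the results 1, 6. For each die that could have made the second roll, we take the totals from the previous step, multiply by the transition probability and by that die's probability of rolling 6 (1/6 for D6, 0 for D4, 1/8 for D8), and add everything up. Because every transition probability here is 1/3, the per-die totals from the previous step can be added together first: 13/72 × 1/3 × (1/6 + 0 + 1/8) = 91/5184, about 0.018.
Extending again, we roll three times and see the results 1, 6, 3. The same step gives 91/5184 × 1/3 × (1/6 + 1/4 + 1/8) = 1183/373248, about 0.003.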
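The step-by-step summation can be written as a short loop. The following is a minimal sketch of the forward algorithm for the fair-dice model in which every initial and transition probability is 1/3; the names `forward` and `alpha` are conventional but are assumptions here, not taken from the original answer.

```python
from fractions import Fraction

DICE = ["D6", "D4", "D8"]
FACES = {"D6": 6, "D4": 4, "D8": 8}
INITIAL = {d: Fraction(1, 3) for d in DICE}
TRANSITION = {p: {d: Fraction(1, 3) for d in DICE} for p in DICE}

def emit(die, number):
    """Output probability of a fair die; 0 if the die has no such face."""
    return Fraction(1, FACES[die]) if 1 <= number <= FACES[die] else Fraction(0)

def forward(numbers):
    """Total probability of the observed rolls, summed over all dice sequences."""
    # alpha[d]: total probability of the rolls so far, with the latest roll made by die d.
    alpha = {d: INITIAL[d] * emit(d, numbers[0]) for d in DICE}
    for number in numbers[1:]:
        alpha = {
            d: sum(alpha[p] * TRANSITION[p][d] for p in DICE) * emit(d, number)
            for d in DICE
        }
    return sum(alpha.values())

for rolls in ([1], [1, 6], [1, 6, 3]):
    total = forward(rolls)
    print(rolls, total, float(total))
# [1]        13/72        (about 0.18)
# [1, 6]     91/5184      (about 0.018)
# [1, 6, 3]  1183/373248  (about 0.003)
```

The only difference from the maximum-probability procedure is that each step sums over the previous dice instead of taking the maximum, so no back-pointers are needed.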
In the same way, we compute one step at a time, and however long the Markov chain is, the total can always be worked out. We can likewise compute the probability that the abnormal six-sided die and the other two normal dice produce this sequence. Comparing the two probabilities then tells us whether someone has swapped your die.
3. Roll a string of dice and let me guess who you are
(The author is lazy and has not written this part yet; it will cover the method known as the EM algorithm.)
The algorithms above actually rely on recursion, backward derivation, and loops; I have simply described them in very plain language. If you consult a professional textbook, you will find more rigorous and professional descriptions. After all, I have only covered grasping the meaning; to know the form, you still need to read the books.
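The comparison between the normal and the suspect dice can be sketched by running the same forward computation under two sets of output probabilities. This is a sketch under stated assumptions: all initial and transition probabilities are 1/3, the suspect model differs only in the six-sided die, and the second test sequence (`many_ones`) is made up purely for illustration.

```python
from fractions import Fraction

DICE = ["D6", "D4", "D8"]
THIRD = Fraction(1, 3)  # every initial and transition probability

def fair(n_faces):
    """Output probabilities of a fair die with n_faces faces."""
    return {face: Fraction(1, n_faces) for face in range(1, n_faces + 1)}

NORMAL = {"D6": fair(6), "D4": fair(4), "D8": fair(8)}
# The tampered die from the text: 1 with probability 1/2, 2..6 with 1/10 each.
LOADED_D6 = {1: Fraction(1, 2), **{face: Fraction(1, 10) for face in range(2, 7)}}
SUSPECT = {**NORMAL, "D6": LOADED_D6}

def likelihood(numbers, emission):
    """Forward algorithm: total probability of the rolls under one model."""
    alpha = {d: THIRD * emission[d].get(numbers[0], 0) for d in DICE}
    for number in numbers[1:]:
        alpha = {
            d: sum(alpha[p] * THIRD for p in DICE) * emission[d].get(number, 0)
            for d in DICE
        }
    return sum(alpha.values())

article_rolls = [1, 6, 3, 5, 2, 7, 3, 5, 2, 4]  # the ten rolls from the text
many_ones = [1, 1, 6, 1, 3, 1]                  # hypothetical suspicious sequence

for rolls in (article_rolls, many_ones):
    normal = likelihood(rolls, NORMAL)
    suspect = likelihood(rolls, SUSPECT)
    verdict = "be careful" if normal < suspect else "looks normal"
    print(rolls, float(normal), float(suspect), verdict)
```

Under these assumptions the ten rolls from the text are more probable with normal dice, while the made-up sequence with many 1s is more probable with the tampered die.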

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
