1. Definitions of the Three Typical HMM Problems
All three problems start from a partially known model; they differ in what is given and what must be computed.
Evaluation: the model parameters are known; compute the probability of a specific output sequence. The forward algorithm is generally used.
Decoding: the model parameters are known; find the hidden state sequence most likely to have generated a specific output sequence. The Viterbi algorithm is generally used.
Learning: the output sequence is known; find the most likely state transition and emission probabilities. The Baum-Welch algorithm and the Reversed Viterbi algorithm are generally used.
2. Part-of-Speech Tagging and the Three Typical HMM Problems
In part-of-speech tagging, the task is: given a word sequence, produce the corresponding part-of-speech sequence. The words are what we can observe, so words serve as the observed states and parts of speech as the hidden states.
Evaluation: compute the probability of a given word sequence under a given model. This has no real use in part-of-speech tagging.
Decoding: given the model and a word sequence, find the most probable part-of-speech sequence. In part-of-speech tagging, this is the prediction stage.
Learning: find the most likely HMM parameters (A, B) from word sequences and their part-of-speech sequences. In part-of-speech tagging, this is the learning (training) stage.
3. Evaluation - Forward Algorithm - Example
Assume there are M kinds of weather (hidden states) and N kinds of activities (observations). The HMM parameters are the transition matrix A[M][M], the emission matrix B[M][N], and the initial probabilities P[M]. The observation sequence is int O[LEN];
3.1 Exhaustive Enumeration
For each possible hidden sequence, compute the probability that it generates the specified observed sequence, then sum these values. There are M ^ LEN different hidden sequences, so this is prohibitively time-consuming.
3.2 Forward Algorithm
In fact, the exhaustive method repeats many computations. The forward algorithm reuses intermediate results to eliminate this repetition.
Definition: F[LEN][M], where F[i][j] is the sum, over all hidden sequences for days 1 through i whose day-i hidden state is Sj, of the probability of generating the first i observations. (The table is named F here so it does not clash with the initial-probability array P[M].) The final result is F[LEN-1][0] + ... + F[LEN-1][M-1], that is, the total probability of the observed sequence over all possible last-day hidden states.
F[0][j] = P[j] * B[j][O[0]];
// j = 0, 1, ..., M-1. First-day initialization: the initial probability of state j, multiplied by the probability that state j emits observation O[0].
F[i][j] = (F[i-1][0] * A[0][j] + F[i-1][1] * A[1][j] + ... + F[i-1][M-1] * A[M-1][j]) * B[j][O[i]];
// i >= 1, j = 0, 1, ..., M-1. For later days: sum, over every state k of the previous day, the probability of reaching state j from state k, then multiply by the probability that state j emits observation O[i].
Complexity: the first day costs M multiplications; each subsequent day costs on the order of M ^ 2 multiplications, so T = O(LEN * M ^ 2).
4. Decoding - Viterbi Algorithm - Example
Assume there are M kinds of weather and N kinds of activities. The HMM parameters are A[M][M], B[M][N], and P[M]. The observation sequence is int O[LEN];
4.1 Exhaustive Enumeration
Recall that in the evaluation problem above, we computed the sum of the probabilities with which the M ^ LEN different hidden sequences generate the observed sequence. Here we must find, among those M ^ LEN hidden sequences, the one that generates the observed sequence with maximum probability. Similarity: the factors being multiplied are the same. Difference: evaluation computes the sum, while decoding computes the maximum.
4.2 Viterbi Algorithm
Definition: Max[LEN][M], where Max[i][j] is the maximum, over all hidden sequences for days 1 through i whose day-i hidden state is Sj, of the probability of generating the first i observations. Path[LEN][M], where Path[i][j] = prev records the previous day's state that achieves this maximum, i.e., Max[i][j] = Max[i-1][prev] * A[prev][j] * B[j][O[i]];
Max[0][j] = P[j] * B[j][O[0]];
// j = 0, 1, ..., M-1. First-day initialization: the initial probability of state j, multiplied by the probability that state j emits observation O[0].
Path[0][j] = -1;
// The first day has no predecessor, represented by -1.
Max[i][j] = max over k of { Max[i-1][k] * A[k][j] } * B[j][O[i]];
// i >= 1, j = 0, 1, ..., M-1, k = 0, 1, ..., M-1. For later days: take the maximum, over every state k of the previous day, of the probability of reaching state j from state k, then multiply by the probability that state j emits observation O[i].
Path[i][j] = prev;
// prev is the k value that attains max over k of { Max[i-1][k] * A[k][j] }.
Time complexity: the same as the forward algorithm, with the sum replaced by a maximum; maintaining the Path array adds only constant work per cell, so it is also O(LEN * M ^ 2).
5. Learning
Learning is the process of estimating the HMM parameters from a number of observed sequences and their corresponding hidden sequences. The common algorithm is the Baum-Welch algorithm, also known as the forward-backward algorithm, which is a special case of the EM algorithm, so the EM algorithm must be understood first. Since I do not yet understand the EM algorithm well, I will stop here for now and cover the EM algorithm and the forward-backward algorithm in separate posts later.
6. Highlights
Forward algorithm: given a model, compute the probability of a specified observed sequence. This requires summing the probabilities over all hidden sequences.
Viterbi algorithm: given a model, find the hidden sequence that generates the specified observed sequence with maximum probability. This requires picking one sequence out of all hidden sequences.
The computation processes are the same, except that one computes a sum while the other computes a maximum plus a path.
7. References
Forward Algorithm in HMM
http://www.suzker.cn/computervision/forward-agorithm-for-hmm.html
Viterbi Algorithm in HMM
http://www.suzker.cn/computervision/viterbi-algorithm-for-hmm.html
Hidden Markov Model posts on I Love Natural Language Processing (52nlp)
http://www.52nlp.cn/category/hidden-markov-model