July Algorithm - December Machine Learning Online Class - 17th Lesson Notes: Hidden Markov Model (HMM)
July Algorithm (julyedu.com), December Machine Learning Online Class study notes, http://www.julyedu.com
Hidden Markov model
Three parts: Probability calculation, parameter estimation, model prediction
1.1 HMM Definition
An HMM is determined by the initial state distribution π, the state transition probability matrix A, and the observation probability matrix B.
Eg: Using Chinese word segmentation as an example
There are 2 hidden states: whether or not the current character is the end of a word (yes/no).
A matrix: for example, its first entry is the probability that, when the current character ends a word, the next character also ends a word; the remaining entries cover the other state transitions.
B matrix: given the current hidden state (end of a word or not), the probability of emitting each particular character.
Parameter summary: the hidden states are discrete; with N hidden states and M possible characters, A is an N×N matrix and B is an N×M matrix.
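To make the parameter shapes concrete, here is a minimal Python sketch of the two-state segmentation HMM described above. The characters and all probability values are hypothetical, chosen only to illustrate the shapes of π (N,), A (N×N), and B (N×M).

```python
import numpy as np

states = ["end_of_word", "not_end_of_word"]   # N = 2 hidden states (assumed)
vocab = ["我", "们", "学", "习"]               # M = 4 example characters (assumed)

pi = np.array([0.5, 0.5])                      # initial state distribution, shape (N,)
A = np.array([[0.4, 0.6],                      # P(next state | current state), shape (N, N)
              [0.7, 0.3]])
B = np.array([[0.30, 0.30, 0.20, 0.20],        # P(observed char | hidden state), shape (N, M)
              [0.25, 0.25, 0.25, 0.25]])

# Each row of A and B must sum to 1 for a valid HMM.
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
```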
1.2 Two basic assumptions of HMM
Homogeneous Markov assumption: the hidden state at time t depends only on the hidden state at time t-1, i.e. P(i_t | i_{t-1}, o_{t-1}, ..., i_1, o_1) = P(i_t | i_{t-1}).
Observation independence assumption: the observation at time t depends only on the hidden state at time t, i.e. P(o_t | i_T, o_T, ..., i_1, o_1) = P(o_t | i_t).
If these two assumptions roughly hold for a practical problem, it is worth considering an HMM for it.
Eg: the box-and-ball example: pick a box, draw one ball from it, record its color, and put it back.
The parameters π, A, and B of this example.
1.3 Three basic problems of HMM
1. Probability calculation problem: forward-backward algorithm (dynamic programming)
2. Learning problem: Baum-Welch algorithm (hidden states unknown), an EM algorithm
3. Prediction problem: Viterbi algorithm (dynamic programming)
1.3.1 Probability calculation problem
Three methods: the direct method, the forward algorithm, and the backward algorithm.
1.3.2 Direct algorithm
The probability of the observation sequence O is obtained by summing P(O, I | λ) over all possible hidden state sequences I.
The direct method's time complexity, O(T·N^T), is too high, so it is only of theoretical interest.
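A minimal sketch of the direct method, assuming observations are given as integer indices into B's columns; it enumerates every hidden state sequence, which is exactly why its cost explodes with T.

```python
import itertools
import numpy as np

def direct_prob(pi, A, B, obs):
    """Brute-force P(O | lambda): sum P(O, I | lambda) over all state sequences I."""
    N = len(pi)
    total = 0.0
    for seq in itertools.product(range(N), repeat=len(obs)):
        p = pi[seq[0]] * B[seq[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[seq[t - 1], seq[t]] * B[seq[t], obs[t]]
        total += p
    return total
```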
1.3.3 Forward algorithm
Definition of forward probability: α_t(i) = P(o_1, o_2, ..., o_t, i_t = q_i | λ).
- Forward algorithm
1. Initialization: α_1(i) = π_i b_i(o_1)
2. Recursion: for t = 1, 2, ..., T-1: α_{t+1}(i) = [ Σ_{j=1}^{N} α_t(j) a_{ji} ] b_i(o_{t+1})
3. Termination: P(O | λ) = Σ_{i=1}^{N} α_T(i)
The time complexity of the forward algorithm is O(N²T).
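A minimal NumPy sketch of the forward recursion above (observations again as integer indices into B's columns):

```python
import numpy as np

def forward(pi, A, B, obs):
    """alpha[t, i] = P(o_1..o_t, state_t = q_i | lambda); cost O(N^2 * T)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # step 1: initialization
    for t in range(1, T):                             # step 2: recursion
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha, alpha[-1].sum()                     # step 3: P(O | lambda)
```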
1.3.4 Backward algorithm
The backward probability is defined analogously: β_t(i) = P(o_{t+1}, o_{t+2}, ..., o_T | i_t = q_i, λ), computed by a recursion that runs backward from t = T-1 down to t = 1.
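The note does not spell out the backward recursion, so the following is a sketch of the standard backward algorithm under the same conventions as the forward sketch:

```python
import numpy as np

def backward(pi, A, B, obs):
    """beta[t, i] = P(o_{t+1}..o_T | state_t = q_i, lambda)."""
    T, N = len(obs), len(pi)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                    # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):                    # recursion, from T-1 down to 1
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    prob = np.sum(pi * B[:, obs[0]] * beta[0])        # termination: P(O | lambda)
    return beta, prob
```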
1.3.5 Relationship between the forward and backward probabilities
At any time t, P(O | λ) = Σ_{i=1}^{N} α_t(i) β_t(i), so the forward and backward computations yield the same observation probability.
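A quick consistency check of this relation, reusing the hypothetical π, A, B and the forward/backward sketches from the sections above:

```python
obs = [0, 2, 1, 3]                                    # hypothetical observation indices
alpha, p_fwd = forward(pi, A, B, obs)
beta, p_bwd = backward(pi, A, B, obs)
assert np.isclose(p_fwd, p_bwd)                       # both recursions give P(O | lambda)
for t in range(len(obs)):
    assert np.isclose(np.sum(alpha[t] * beta[t]), p_fwd)   # holds at every time t
```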
1.3.6 Probability of a single state
Given the model λ and the observation O, the probability of being in state q_i at time t is γ_t(i) = P(i_t = q_i | O, λ) = α_t(i) β_t(i) / Σ_{j=1}^{N} α_t(j) β_t(j).
Taking the state with the largest γ_t(i) at each time t gives the individually most likely hidden state at that time.
The meaning of γ: summing γ_t(i) over t gives the expected number of times the chain visits state q_i under the observation O.
Note: this per-time argmax is not the Viterbi algorithm; it optimizes each time step separately rather than the whole path.
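A sketch of computing γ from the forward and backward matrices of the earlier sketches:

```python
import numpy as np

def single_state_prob(alpha, beta):
    """gamma[t, i] = P(state_t = q_i | O, lambda), from alpha and beta."""
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)         # normalize over states at each t
    return gamma

# gamma.argmax(axis=1) gives the individually most likely state per time step
# (the approximate algorithm), which is not the Viterbi path.
```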
1.3.7 Joint probability of two states
Given the model λ and the observation O, the probability of being in state q_i at time t and in state q_j at time t+1 is ξ_t(i, j) = P(i_t = q_i, i_{t+1} = q_j | O, λ).
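A sketch of computing ξ from the same α, β, A, B (observations as integer indices, as in the earlier sketches):

```python
import numpy as np

def two_state_prob(alpha, beta, A, B, obs):
    """xi[t, i, j] = P(state_t = q_i, state_{t+1} = q_j | O, lambda)."""
    T, N = alpha.shape
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        # xi_t(i, j) ∝ alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        xi[t] = alpha[t, :, None] * A * B[:, obs[t + 1]] * beta[t + 1]
        xi[t] /= xi[t].sum()                           # divide by P(O | lambda)
    return xi
```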
1.4 Learning of HMM
1. If the training data includes both observation sequences and state sequences, HMM learning is supervised (the parameters can be estimated by frequency counting);
2. If the training data contains only observation sequences, HMM learning needs the EM algorithm and is unsupervised.
1.4.1 Baum-Welch algorithm
It is essentially the EM algorithm applied to the HMM.
- Suppose λ̄ is the current estimate of the HMM parameters and λ is the parameter to be solved for.
- EM process
A. E-step of EM: first write the joint probability P(O, I | λ) and form the expected complete-data log-likelihood Q(λ, λ̄) over the hidden state sequences.
The expression above splits into three parts: one term involving π, one involving A, and one involving B.
B. M-step of EM: maximize each of the three parts separately, which gives the re-estimation formulas for π, A, and B.
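Putting the pieces together, here is a sketch of one Baum-Welch iteration. It reuses the forward, backward, γ, and ξ sketches above, and the updates are the standard re-estimation formulas rather than anything specific to this note.

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One EM iteration: E-step via alpha/beta/gamma/xi, M-step re-estimates pi, A, B."""
    alpha, _ = forward(pi, A, B, obs)
    beta, _ = backward(pi, A, B, obs)
    gamma = single_state_prob(alpha, beta)
    xi = two_state_prob(alpha, beta, A, B, obs)

    new_pi = gamma[0]                                          # pi_i = gamma_1(i)
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]   # a_ij = sum_t xi / sum_t gamma
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):                                # b_i(k): time steps where o_t = k
        mask = (np.array(obs) == k)
        new_B[:, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B
```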
1.5 HMM prediction algorithm
Two algorithms: the approximate algorithm and the Viterbi algorithm.
1.5.1 Viterbi algorithm
The Viterbi algorithm solves the HMM prediction problem by dynamic programming: DP finds the maximum-probability path (the optimal path), and this path corresponds to a hidden state sequence.
Define the variable δ_t(i): the maximum probability over all paths that end in state q_i at time t.
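A minimal sketch of the Viterbi recursion with δ and a backpointer table ψ, under the same conventions as the earlier sketches:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most probable hidden state path and its probability."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                   # delta_1(i) = pi_i b_i(o_1)
    for t in range(1, T):
        trans = delta[t - 1][:, None] * A          # trans[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = trans.argmax(axis=0)              # best previous state for each j
        delta[t] = trans.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                  # backtrack the optimal path
        path.append(int(psi[t][path[-1]]))
    return path[::-1], delta[-1].max()
```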
Summary: in practice the word-segmentation example can use more than two hidden states, e.g. beginning of a word, middle of a word, end of a word, etc.