Python implementation of HMM (hidden Markov model)

Source: Internet
Author: User

1. Preface

The hidden Markov model (HMM) is an important machine learning method, mainly used for analyzing sequence data, and widely applied in speech recognition, text translation, sequence prediction, Chinese word segmentation, and other fields. Although the HMM has become less popular in recent years with the rise of deep learning methods such as RNNs, it has not completely left the application area and remains useful in some lightweight tasks. This series of blog posts analyzes the hidden Markov model in detail. Compared with most existing online tutorials, this series goes deeper: it covers not only the Viterbi algorithm for estimating the hidden state sequence (the HMM decoding problem) and the forward-backward algorithm, but also focuses on the EM training process of the HMM, with every step derived by mathematical formulas.

Because the hidden Markov model is a rather complex model involving a lot of probability and statistics, most blog posts online introduce HMM in a popular, example-driven way. Beginners quickly grasp the rough idea of HMM, but a lot of detail is lost, leaving them in a vague state. This article does not try to describe HMM in very colloquial language; instead, it uses detailed mathematical formulas to guide beginners toward mastering the ideas behind HMM. In addition, this article focuses on the EM training process of HMM, which is rarely covered in other online tutorials. In my opinion, the EM training process of HMM is more complicated than the Viterbi algorithm and the forward-backward algorithm, but once it is mastered, it is very helpful for deriving general chain-structured graphical models, for understanding the EM algorithm, and for understanding network training. Moreover, once the mathematical principles of HMM are summarized, it is also very convenient to turn the formulas into code.

Finally, this article provides a simple Python implementation of the hidden Markov model, covering both the Gaussian HMM and the discrete HMM. The code includes the whole process of HMM training, prediction, and decoding; the core code is only about 200-300 lines in total, very simple! My own coding level is mediocre =_=||, so by following this tutorial you should all be able to write more robust and efficient code. GitHub address: https://github.com/tostq/Easy_HMM

If you find it useful, please give it a star!

If you find it useful, please give it a star!

If you find it useful, please give it a star!


Important things should be said three times!!!

To facilitate learning, I have organized the entire HMM code into a complete learning project, which includes:

hmm.py: the core HMM code, containing an HMM base class, a Gaussian HMM model, and a discrete HMM model.

discretehmm_test.py and gaussianhmm_test.py: use unittest to test our HMM, with the classical HMM library hmmlearn introduced as a control group (a usage sketch follows this list).

dice_01.py: an example of using the discrete HMM to solve the dice problem

wordseg_02.py: an example of solving the Chinese word segmentation problem

stock_03.py: an example of solving a stock forecasting problem
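As a quick illustration of the control group mentioned above, here is a minimal sketch of how hmmlearn can be used to fit and decode a Gaussian HMM. This is only my illustration of hmmlearn's standard usage; the actual test files in the project may be organized differently, and the data here is made up.

import numpy as np
from hmmlearn import hmm

# Toy observation sequence (made-up data), shape (n_samples, n_features)
X = np.array([[0.1], [0.3], [2.9], [3.1], [0.2], [3.0], [2.8], [0.1]])

model = hmm.GaussianHMM(n_components=2, covariance_type="diag", n_iter=100)
model.fit(X)                       # Baum-Welch (EM) training
hidden_states = model.predict(X)   # Viterbi decoding
log_prob = model.score(X)          # forward algorithm (log-likelihood)
print(hidden_states, log_prob)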

2. Background of the hidden Markov model (HMM)

First, suppose a sequence of data is known: X = (x_1, x_2, ..., x_T).

From this sequence, we want to infer the function that generated it. Assuming the function parameters are θ, this is expressed as the probability p(X | θ).

That is, we look for the function parameters that make the sequence X most probable: θ* = argmax_θ p(X | θ). To solve this, the simplest idea is to treat each data point in the sequence as independent, for example by building a neural network on it; but the number of parameters then grows as the sequence grows, leading to a parameter explosion. So we can instead assume that the current data point depends only on the data point immediately before it, which is known as a first-order Markov chain: p(x_t | x_1, ..., x_{t-1}) = p(x_t | x_{t-1}).

Besides the first-order Markov chain, there is also the second-order Markov chain (in which the current data value depends on its previous two data values).

This article does not analyze the second-order Markov chain in depth and focuses on the first-order Markov chain. Under the first-order Markov assumption, we have: p(X | θ) = p(x_1) ∏_{t=2}^{T} p(x_t | x_{t-1}).

Therefore, to solve a first-order Markov chain, the key is to find the transition function between data points (the observed values). If the transition function is assumed to be independent of the position (time) in the sequence, that is, p(x_t = j | x_{t-1} = i) = A_ij for every t, then the probability of the whole sequence can be computed from this single transition function.
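As a concrete illustration of this (a minimal sketch with made-up numbers, not code from the Easy_HMM project), the probability of an observed sequence under a first-order Markov chain with a stationary transition matrix can be computed like this:

import numpy as np

# Made-up example: 3 observable states, stationary transition matrix A and initial distribution pi
A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])   # A[i, j] = p(x_t = j | x_{t-1} = i)
pi = np.array([0.2, 0.4, 0.4])    # pi[i]   = p(x_1 = i)

def markov_chain_prob(seq, A, pi):
    """Probability of an observed state sequence (states indexed from 0) under a first-order Markov chain."""
    p = pi[seq[0]]
    for prev, cur in zip(seq[:-1], seq[1:]):
        p *= A[prev, cur]
    return p

print(markov_chain_prob([0, 1, 0], A, pi))  # p(x1=0) * p(x2=1 | x1=0) * p(x3=0 | x2=1)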

However, if the number of possible states of the observation X is very large (especially for continuous data), the transition function becomes a very big matrix: if X can take K states, the transition function has K*(K-1) free parameters (for example, K = 100 already means 9,900 parameters), and for continuous observation values it is hard to handle at all.

To reduce the number of parameters of the Markov chain's transition function, we introduce a hidden state with far fewer possible values, and move the Markov chain from the observations to the hidden states; this gives the hidden Markov model (HMM).

It contains an important assumption: the current observation is determined only by the current hidden state. An important benefit is that the number of possible hidden states is much smaller than the number of possible observed values, so the transition function over hidden states has far fewer parameters.

The problem we now need to solve is:

θ* = argmax_θ p(X | θ) = argmax_θ Σ_Z p(X, Z | θ)

That is, summing over all possible hidden state sequences Z, we look for the function parameters that maximize the probability of the observed sequence X.

Here we summarize again:

Three important assumptions of the hidden Markov model (HMM):

1. The current observation value is determined only by the current hidden state, independent of other hidden states and observed values (hidden state assumption)

2. The current hidden state is determined only by the hidden state immediately before it (first-order Markov assumption)

3. The transition function between hidden states does not change over time (transition function stationarity assumption)

The problem to be solved by the hidden Markov model (HMM):

Summing over all possible hidden state sequences, find the function parameters θ that maximize the probability of producing the current sequence X.
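Under the three assumptions above, the joint probability of an observation sequence X and a hidden state sequence Z factorizes as p(X, Z | θ) = p(z_1) p(x_1 | z_1) ∏_{t=2}^{T} p(z_t | z_{t-1}) p(x_t | z_t). A minimal sketch (with made-up matrices, not the project's code) that evaluates this factorization:

import numpy as np

# Made-up HMM parameters: 2 hidden states, 3 observation symbols
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])        # A[i, j] = p(z_t = j | z_{t-1} = i)
B  = np.array([[0.5, 0.4, 0.1],
               [0.1, 0.3, 0.6]])   # B[i, k] = p(x_t = k | z_t = i)
pi = np.array([0.6, 0.4])          # pi[i]   = p(z_1 = i)

def joint_prob(obs, hidden, A, B, pi):
    """p(X, Z | theta) under the three HMM assumptions (states and symbols indexed from 0)."""
    p = pi[hidden[0]] * B[hidden[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[hidden[t - 1], hidden[t]] * B[hidden[t], obs[t]]
    return p

# p(X | theta) is the sum of joint_prob over all hidden sequences Z,
# which the forward algorithm in the code below computes efficiently.
print(joint_prob(obs=[0, 2, 1], hidden=[0, 1, 1], A=A, B=B, pi=pi))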

Code

http://blog.csdn.net/sinat_36005594/article/details/69568538

A few days ago I implemented the HMM code in MATLAB; this time I wrote it again in Python, following Dr. Hangyuan Li's "Statistical Learning Methods".

Because this is my first time using Python, the code may have many flaws, but all of it has been tested against the examples in the book and the results are correct.

Here I would like to say something about Python. While writing the HMM code I compared it with the MATLAB program I wrote earlier and found they are very similar once the NumPy library is used. What troubled me most in the Python code was array indexing: unlike MATLAB, whose matrix indices start from 1, NumPy arrays start from 0, and the number of dimensions of an array needs careful attention, otherwise the error "IndexError: too many indices for array" appears by accident. At the end of the program, in the Viterbi algorithm, the warning "__main__:190: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future" appeared during the run. I have not removed it yet; from what I found it does not affect the result, and I will fix the problem later.
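For reference, this warning typically comes from using a floating-point value (for example, an element of a float ndarray) as an array index. Here is a minimal sketch of the cause and the usual fix, casting the index to int; this is my own illustration, not the original author's patch:

import numpy as np

Psi = np.zeros((3, 4))
I = np.zeros((1, 4))                  # float array, so I[0, 3] is a float

# Psi[I[0, 3] - 1, 3]                 # float index -> VisibleDeprecationWarning (an error in newer NumPy)
value = Psi[int(I[0, 3]) - 1, 3]      # casting the index to int removes the warning
print(value)

My code is posted below.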

# -*- coding: utf-8 -*-
"""
Created on Thu Feb 19:28:39 2017
2017-4-2 ForwardBackwardAlg function: implements the forward algorithm.
         Theoretical basis: Hangyuan Li, "Statistical Learning Methods"
2017-4-5 renamed ForwardBackwardAlg to ForwardAlgo and made it output the alpha array;
         completed the BackwardAlgo function (backward algorithm) and the function
         FBAlgoAppli, which computes, for a given observation sequence and fixed model
         parameters, the probability that a given hidden state occurs at a given time
2017-4-6 completed one iteration of the BaumWelchAlgo function
2017-4-7 implemented the Viterbi algorithm
@author: sgp
"""
import numpy as np

# Input format:
# A = np.array([[.5, .2, .3], [.3, .5, .2], [.2, .3, .5]])
# B = np.array([[.5, .5], [.4, .6], [.7, .3]])
# Pi = np.array([[.2, .4, .4]])
# O = np.array([[1, 2, 1]])

# When ndarrays operate on each other, make sure they have the same number of dimensions!
# For example:
# In [93]: m = np.array([1, 2, 3, 4])
# In [94]: m
# Out[94]: array([1, 2, 3, 4])
# In [95]: m.shape
# Out[95]: (4,)          <- a one-dimensional array
# In [96]: m = np.array([[1, 2, 3, 4]])
# In [97]: m
# Out[97]: array([[1, 2, 3, 4]])
# In [98]: m.shape
# Out[98]: (1, 4)        <- a two-dimensional array
# Note the difference between In[93] and In[96]: one more pair of brackets!

# N = A.shape[0] is the number of rows of A; H = O.shape[1] is the number of columns of O.
# In all functions below, alpha and beta are N*H two-dimensional arrays:
# rows correspond to hidden states, columns to time steps.


def ForwardAlgo(A, B, Pi, O):
    """Forward algorithm: returns the alpha matrix and the observation probability P(O|lambda)."""
    N = A.shape[0]  # number of rows of A (number of hidden states)
    M = A.shape[1]  # number of columns of A
    H = O.shape[1]  # number of columns of O (length of the observation sequence)
    sum_alpha_1 = np.zeros((M, N))
    alpha = np.zeros((N, H))
    alpha_1 = np.multiply(Pi[0, :], B[:, O[0, 0] - 1])
    # alpha_1 has shape (N,); pay attention to array dimensions here,
    # otherwise "IndexError: too many indices for array" appears
    alpha[:, 0] = alpha_1
    for h in range(1, H):
        for i in range(N):
            for j in range(M):
                sum_alpha_1[i, j] = alpha[j, h - 1] * A[j, i]
            r = sum_alpha_1.sum(1).reshape(1, N)  # keep the intermediate result two-dimensional
            alpha[i, h] = r[0, i] * B[i, O[0, h] - 1]
    # print("alpha matrix:\n%r" % alpha)
    p = alpha.sum(0).reshape(1, H)
    P = p[0, H - 1]
    # print("observation probability:\n%r" % P)
    return alpha, P


def BackwardAlgo(A, B, Pi, O):
    """Backward algorithm: returns the beta matrix and the observation probability P(O|lambda)."""
    N = A.shape[0]  # number of rows of A
    M = A.shape[1]  # number of columns of A
    H = O.shape[1]  # number of columns of O
    sum_beta = np.zeros((1, N))
    beta = np.zeros((N, H))
    beta[:, H - 1] = 1
    p_beta = np.zeros((1, N))
    for h in range(H - 1, 0, -1):
        for i in range(N):
            for j in range(M):
                sum_beta[0, j] = A[i, j] * B[j, O[0, h] - 1] * beta[j, h]
            beta[i, h - 1] = sum_beta.sum()
    # print("beta matrix:\n%r" % beta)
    for i in range(N):
        p_beta[0, i] = Pi[0, i] * B[i, O[0, 0] - 1] * beta[i, 0]
    p = p_beta.sum(1).reshape(1, 1)
    # print("observation probability:\n%r" % p[0, 0])
    return beta, p[0, 0]


def FBAlgoAppli(A, B, Pi, O, I):
    """Given the observation sequence and fixed model parameters, compute the probability
    that a given hidden state occurs at a given time.
    Example: Hangyuan Li, "Statistical Learning Methods", P189, exercise 10.2.
    Input format: I is a two-dimensional array holding the indices t and i of the probability
    P(i_t = q_i | O, lambda), i.e. I = [[t, i]]."""
    alpha, p1 = ForwardAlgo(A, B, Pi, O)
    beta, p2 = BackwardAlgo(A, B, Pi, O)
    p = alpha[I[0, 1] - 1, I[0, 0] - 1] * beta[I[0, 1] - 1, I[0, 0] - 1] / p1
    return p


def GetGamma(A, B, Pi, O):
    """Compute the gamma matrix: Gamma[i, t] = P(z_t = i | O, lambda)."""
    N = A.shape[0]  # number of rows of A
    H = O.shape[1]  # number of columns of O
    Gamma = np.zeros((N, H))
    alpha, p1 = ForwardAlgo(A, B, Pi, O)
    beta, p2 = BackwardAlgo(A, B, Pi, O)
    for h in range(H):
        for i in range(N):
            Gamma[i, h] = alpha[i, h] * beta[i, h] / p1
    return Gamma


def GetXi(A, B, Pi, O):
    """Compute the xi tensor: Xi[t, i, j] = P(z_t = i, z_{t+1} = j | O, lambda)."""
    N = A.shape[0]  # number of rows of A
    M = A.shape[1]  # number of columns of A
    H = O.shape[1]  # number of columns of O
    Xi = np.zeros((H - 1, N, M))
    alpha, p1 = ForwardAlgo(A, B, Pi, O)
    beta, p2 = BackwardAlgo(A, B, Pi, O)
    for h in range(H - 1):
        for i in range(N):
            for j in range(M):
                Xi[h, i, j] = alpha[i, h] * A[i, j] * B[j, O[0, h + 1] - 1] * beta[j, h + 1] / p1
    # print("Xi matrix:\n%r" % Xi)
    return Xi


def BaumWelchAlgo(A, B, Pi, O):
    """One iteration of the Baum-Welch (EM) re-estimation of A, B and Pi."""
    N = A.shape[0]  # number of rows of A
    M = A.shape[1]  # number of columns of A
    Y = B.shape[1]  # number of columns of B (number of observation symbols)
    H = O.shape[1]  # number of columns of O
    c = 0
    Gamma = GetGamma(A, B, Pi, O)
    Xi = GetXi(A, B, Pi, O)
    Xi_1 = Xi.sum(0)
    a = np.zeros((N, M))
    b = np.zeros((M, Y))
    pi = np.zeros((1, N))
    a_1 = np.subtract(Gamma.sum(1), Gamma[:, H - 1]).reshape(1, N)
    gamma_sum = Gamma.sum(1).reshape(1, N)  # use a separate name so Gamma itself is not overwritten
    for i in range(N):
        for j in range(M):
            a[i, j] = Xi_1[i, j] / a_1[0, i]
    # print(a)
    for y in range(Y):
        for j in range(M):
            for h in range(H):
                if O[0, h] - 1 == y:
                    c = c + Gamma[j, h]
            b[j, y] = c / gamma_sum[0, j]
            c = 0
    # print(b)
    for i in range(N):
        pi[0, i] = Gamma[i, 0]
    # print(pi)
    return a, b, pi


def BaumWelchAlgo_n(A, B, Pi, O, n):
    """Run n iterations of the Baum-Welch algorithm."""
    for i in range(n):
        A, B, Pi = BaumWelchAlgo(A, B, Pi, O)
    return A, B, Pi


def Viterbi(A, B, Pi, O):
    """Viterbi algorithm: prints the probability of the optimal path and the optimal hidden path."""
    N = A.shape[0]  # number of rows of A
    M = A.shape[1]  # number of columns of A
    H = O.shape[1]  # number of columns of O
    Delta = np.zeros((M, H))
    Psi = np.zeros((M, H))
    Delta_1 = np.zeros((N, 1))
    I = np.zeros((1, H))
    for i in range(N):
        Delta[i, 0] = Pi[0, i] * B[i, O[0, 0] - 1]
    for h in range(1, H):
        for j in range(M):
            for i in range(N):
                Delta_1[i, 0] = Delta[i, h - 1] * A[i, j] * B[j, O[0, h] - 1]
            Delta[j, h] = np.amax(Delta_1)
            Psi[j, h] = np.argmax(Delta_1) + 1
    print("Delta matrix:\n%r" % Delta)
    print("Psi matrix:\n%r" % Psi)
    P_best = np.amax(Delta[:, H - 1])
    psi = np.argmax(Delta[:, H - 1])
    I[0, H - 1] = psi + 1
    for h in range(H - 1, 0, -1):
        # cast the index to int; this also removes the VisibleDeprecationWarning mentioned above
        I[0, h - 1] = Psi[int(I[0, h]) - 1, h]
    print("optimal path probability:\n%r" % P_best)
    print("optimal path:\n%r" % I)
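To check the functions above, they can be run on the small model given in the input-format comments (the same example as in "Statistical Learning Methods"). The following driver snippet is my own addition for illustration, not part of the original post:

if __name__ == "__main__":
    A = np.array([[.5, .2, .3], [.3, .5, .2], [.2, .3, .5]])
    B = np.array([[.5, .5], [.4, .6], [.7, .3]])
    Pi = np.array([[.2, .4, .4]])
    O = np.array([[1, 2, 1]])

    alpha, P = ForwardAlgo(A, B, Pi, O)
    print("Forward probability of O:", P)    # about 0.130218 for this example
    beta, Pb = BackwardAlgo(A, B, Pi, O)
    print("Backward probability of O:", Pb)  # should match the forward result
    new_A, new_B, new_Pi = BaumWelchAlgo_n(A, B, Pi, O, 10)  # 10 Baum-Welch iterations
    Viterbi(A, B, Pi, O)  # the book's example gives optimal path (3, 3, 3) with probability 0.0147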
