Markov process
In probability theory and statistics, a Markov process is a stochastic process with the Markov property, named after the Russian mathematician Andrey Markov. A Markov process is memoryless: the conditional probability distribution of its future states depends only on the current state of the system, independent of its past history.
More generally, a process in which the transition between states depends only on the preceding n states is called an n-th order Markov model, where n is the number of preceding states that influence the choice of the next state. The simplest Markov process is a first-order model, in which the choice of the next state depends only on the current state. Note that this is still not a deterministic system: the next state is selected according to the corresponding probabilities, not deterministically.
A Markov chain describes a sequence of states, each of which depends on a finite number of previous states. A Markov chain is a sequence of random variables X_1, X_2, X_3, … with the Markov property. The range of these variables, that is, the set of all their possible values, is called the state space, and the value of X_n is the state at time n. If X_n = i, the process is said to be in state i at time n. Assume that whenever the process is in state i, the probability a_ij that it is in state j at the next step is a fixed value. That is, for any n ≥ 1, the conditional probability distribution of X_{n+1} given the past states is a function of X_n alone. Such a stochastic process is called a Markov chain.
State Transition Matrix
For a first-order Markov model with N states, there are N×N possible state transitions, since any state may be the successor of any state. Each transition carries a probability value, called the state transition probability: the probability of moving from one state to another. All N×N probabilities can be collected in a state transition matrix.
The state transition probability matrix is defined as A = (a_ij), an N×N matrix whose entry a_ij is the probability of moving from state i to state j. Its entries satisfy the conditions:
a_ij ≥ 0 and Σ_{j=1}^{N} a_ij = 1, for 1 ≤ i, j ≤ N.
Note that these probabilities are assumed not to vary over time, a very important (but often unrealistic) assumption.
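As a concrete illustration (the example and all numbers are assumptions added here, not from the original text), the following Python sketch encodes a hypothetical two-state transition matrix and samples a chain in which each next state depends only on the current one:

```python
import numpy as np

# Hypothetical two-state weather chain; the numbers are illustrative only.
states = ["sunny", "rainy"]
A = np.array([[0.8, 0.2],   # P(next state | current = sunny)
              [0.4, 0.6]])  # P(next state | current = rainy)

assert np.allclose(A.sum(axis=1), 1.0)  # each row must sum to 1

rng = np.random.default_rng(0)

def simulate(start, steps):
    """Sample a state sequence: the next state depends only on the current one."""
    seq = [start]
    for _ in range(steps):
        seq.append(rng.choice(len(states), p=A[seq[-1]]))
    return [states[s] for s in seq]

print(simulate(start=0, steps=10))
```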
Hidden Markov Model (HMM)
The hidden Markov model (HMM) is a statistical model used to describe a Markov process with hidden, unknown parameters. The difficulty is to determine the hidden parameters of the process from the observable parameters. These parameters are then used for further analysis, such as pattern recognition.
In an ordinary Markov model, the state is directly visible to the observer, so the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but some variables influenced by the state are visible. Each state has a probability distribution over the symbols it may output, so the sequence of output symbols reveals some information about the state sequence.
An HMM is a probabilistic model that uses parameters to describe the statistical characteristics of a stochastic process; it is a doubly stochastic process consisting of two parts: a Markov chain and a general stochastic process. The Markov chain describes the transitions between states and is characterized by the transition probabilities. The general stochastic process describes the relationship between the states and the observation sequence and is characterized by the observation probabilities. Because the state transition process is not observable, the model is called a "hidden" Markov model.
An Example of an HMM
There are N jars, each containing many colored balls, where the distribution of ball colors in a jar is described by a probability distribution vector. The experiment proceeds as follows: according to some initial probability distribution, one of the N jars is selected at random, say jar 1, and a ball is drawn at random from it according to the color distribution of that jar; its color is recorded as o_1 and the ball is put back. Then, according to a transition probability distribution over jars, the next jar is selected at random, say jar j; a ball is again drawn at random from it and its color is recorded as o_2. Continuing in this way yields a sequence of ball colors o_1, o_2, …, which, being a sequence of observed events, is called the observation sequence. The transitions between jars and which jar is selected at each step, however, are hidden and cannot be observed directly. Moreover, the color of the ball drawn does not identify the jar one-to-one, but is determined randomly by the color distribution of the balls in that jar; and each selection of a jar is governed by the set of transition probabilities.
An HMM can be described by the following parameters:
1. N: the number of states of the Markov chain in the model. Denote the N states by θ_1, θ_2, …, θ_N, and the state of the Markov chain at time t by q_t; clearly q_t ∈ {θ_1, θ_2, …, θ_N}. In the jar-and-ball experiment, the jars correspond to the states of the HMM.
2. M: the number of possible observation values for each state. Denote the M observation values by v_1, v_2, …, v_M, and the observation at time t by o_t, where o_t ∈ {v_1, v_2, …, v_M}. In the jar-and-ball experiment, the color of the ball drawn is the observation value of the HMM.
3. π: the initial state probability distribution vector π = (π_1, π_2, …, π_N), where
π_i = P(q_1 = θ_i), 1 ≤ i ≤ N.
In the jar-and-ball experiment, π_i is the probability that jar i is selected at the start of the experiment.
4. A: the state transition probability matrix, A = (a_ij), of size N×N, where
a_ij = P(q_{t+1} = θ_j | q_t = θ_i), 1 ≤ i, j ≤ N.
In the jar-and-ball experiment, a_ij is the probability of selecting the next jar given the jar currently selected.
5. B: the observation probability matrix, B = (b_jk), of size N×M, where
b_jk = P(o_t = v_k | q_t = θ_j), 1 ≤ j ≤ N, 1 ≤ k ≤ M.
In the jar-and-ball experiment, b_jk is the probability that a ball of color k is drawn from jar j.
In this way, an HMM can be written as:
λ = (N, M, π, A, B)
or abbreviated as:
λ = (π, A, B)
An HMM can thus be divided into two parts: one is the Markov chain, described by π and A, whose output is the state sequence; the other is a general stochastic process, described by B, whose output is the observation sequence.
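To make the five parameters concrete, here is a minimal sketch of a hypothetical two-jar, two-color instance of the jar-and-ball model in Python; the probability values are invented for illustration only:

```python
import numpy as np

# Hypothetical jar-and-ball HMM with N = 2 states (jars) and M = 2
# observation values (ball colors); all numbers are illustrative only.
N, M = 2, 2
pi = np.array([0.6, 0.4])          # pi_i  = P(q_1 = jar_i)
A  = np.array([[0.7, 0.3],         # a_ij  = P(q_{t+1} = jar_j | q_t = jar_i)
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],         # b_jk  = P(o_t = color_k | q_t = jar_j)
               [0.2, 0.8]])

# lambda = (pi, A, B); pi and the rows of A and B are probability distributions.
assert np.allclose(pi.sum(), 1)
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
```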
HMM Basic Algorithms
Given an HMM, three basic problems must be solved; around these three problems, three basic algorithms have been developed. The three problems are:
Problem 1: probability computation for an HMM. Given the observation sequence O = {o_1, o_2, …, o_T} and the model λ, how can the probability P(O|λ) of the observation sequence under the given model be computed efficiently?
Problem 2: the optimal state sequence of an HMM. Given the observation sequence O = {o_1, o_2, …, o_T} and the model λ, how should a corresponding state sequence Q = {q_1, q_2, …, q_T} be chosen so that it is optimal in some sense (for example, best explains the observations)?
Problem 3: the HMM training problem (parameter estimation). Given the observation sequence O = {o_1, o_2, …, o_T} and initial conditions, how should the model parameters λ = (π, A, B) be adjusted so that P(O|λ) is maximized?
Forward-Backward Algorithm
This algorithm solves the first problem above.
Forward Algorithm
Define the forward variable as:
α_t(i) = P(o_1, o_2, …, o_t, q_t = θ_i | λ), 1 ≤ t ≤ T
Then the computation proceeds as follows:
a) Initialization:
α_1(i) = π_i b_i(o_1), 1 ≤ i ≤ N
b) Recursion:
α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1}), 1 ≤ t ≤ T−1, 1 ≤ j ≤ N
c) Termination:
P(O|λ) = Σ_{i=1}^{N} α_T(i)
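The three steps translate directly into code. The following is a minimal sketch, assuming the numpy arrays pi, A, B from the earlier hypothetical example and an observation sequence obs given as integer indices into the columns of B:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Compute the forward variables alpha and P(O | lambda)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # a) initialization
    for t in range(T - 1):                            # b) recursion
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
    return alpha, alpha[-1].sum()                     # c) termination

# Example with the hypothetical jar-and-ball model defined earlier:
# alpha, prob = forward(pi, A, B, obs=[0, 1, 0])
```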
Backward Algorithm
Define the backward variable as:
β_t(i) = P(o_{t+1}, o_{t+2}, …, o_T | q_t = θ_i, λ), 1 ≤ t ≤ T−1
The backward algorithm is initialized by defining β_T(i) = 1 for all states i; its computation proceeds as follows:
a) Initialization:
β_T(i) = 1, 1 ≤ i ≤ N
b) Recursion:
β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j), t = T−1, T−2, …, 1, 1 ≤ i ≤ N
c) Termination:
P(O|λ) = Σ_{i=1}^{N} π_i b_i(o_1) β_1(i)
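A matching sketch of the backward recursion, under the same assumptions as the forward sketch; both routines should return the same value of P(O|λ):

```python
import numpy as np

def backward(pi, A, B, obs):
    """Compute the backward variables beta and P(O | lambda)."""
    T, N = len(obs), len(pi)
    beta = np.ones((T, N))                            # a) beta_T(i) = 1
    for t in range(T - 2, -1, -1):                    # b) recursion, t = T-1, ..., 1
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta, (pi * B[:, obs[0]] * beta[0]).sum()  # c) termination

# Same model as before; both routes give the same P(O | lambda):
# beta, prob = backward(pi, A, B, obs=[0, 1, 0])
```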
Viterbi Algorithm
Problem 2 asks for the "optimal" state sequence. "Optimal" can be given many different meanings, and different definitions lead to different conclusions. The optimal state sequence q* discussed here is the state sequence that maximizes P(Q|O, λ). It can be found with the Viterbi algorithm, which is described as follows:
Define δ_t(i) as the maximum probability, at time t, over all paths q_1, q_2, …, q_t with q_t = θ_i, of producing o_1, o_2, …, o_t; that is:
δ_t(i) = max_{q_1, …, q_{t−1}} P(q_1, …, q_{t−1}, q_t = θ_i, o_1, …, o_t | λ)
Then the procedure for finding the optimal state sequence q* is:
a) Initialization:
δ_1(i) = π_i b_i(o_1), ψ_1(i) = 0, 1 ≤ i ≤ N
b) Recursion:
δ_t(j) = max_{1≤i≤N} [δ_{t−1}(i) a_ij] b_j(o_t), ψ_t(j) = argmax_{1≤i≤N} [δ_{t−1}(i) a_ij], 2 ≤ t ≤ T, 1 ≤ j ≤ N
c) Termination:
P* = max_{1≤i≤N} δ_T(i), q_T* = argmax_{1≤i≤N} δ_T(i)
where the symbol argmax is defined as follows: if f(i) attains its maximum at i = I, then argmax_i f(i) = I.
d) Backtracking the optimal state sequence:
q_t* = ψ_{t+1}(q_{t+1}*), t = T−1, T−2, …, 1
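A minimal sketch of the Viterbi recursion under the same assumptions as the earlier sketches, returning the optimal state sequence q* and its probability:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Find the state sequence q* maximizing P(Q | O, lambda)."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                 # a) initialization
    for t in range(1, T):                        # b) recursion
        trans = delta[t - 1][:, None] * A        # trans[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = trans.argmax(axis=0)
        delta[t] = trans.max(axis=0) * B[:, obs[t]]
    q = np.zeros(T, dtype=int)
    q[-1] = delta[-1].argmax()                   # c) termination
    for t in range(T - 2, -1, -1):               # d) backtracking
        q[t] = psi[t + 1, q[t + 1]]
    return q, delta[-1].max()

# q_star, p_star = viterbi(pi, A, B, obs=[0, 1, 0])
```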
Baum-Welch Algorithm
This algorithm solves problem 3 above, the HMM training or parameter estimation problem: given an observation sequence O = {o_1, o_2, …, o_T}, determine a model λ = (π, A, B) such that P(O|λ) is maximized. This is a functional extremum problem, so there is no analytical solution for the best estimate of λ. Instead, the Baum-Welch algorithm uses an iterative idea to make P(O|λ) locally maximal and thereby obtain the model parameters.
Define ξ_t(i,j) as the probability that, given the training sequence O and the model λ, the Markov chain is in state θ_i at time t and in state θ_j at time t+1, i.e.:
ξ_t(i,j) = P(q_t = θ_i, q_{t+1} = θ_j | O, λ)
From the definitions of the forward and backward variables, one can derive:
ξ_t(i,j) = [α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j)] / P(O|λ)
Then the probability that the Markov chain is in state θ_i at time t is:
γ_t(i) = Σ_{j=1}^{N} ξ_t(i,j)
Thus Σ_{t=1}^{T−1} γ_t(i) represents the expected number of transitions out of state θ_i, and Σ_{t=1}^{T−1} ξ_t(i,j) the expected number of transitions from state θ_i to state θ_j. From these quantities, the well-known re-estimation formulas of the Baum-Welch algorithm follow:
π̂_i = γ_1(i)
â_ij = Σ_{t=1}^{T−1} ξ_t(i,j) / Σ_{t=1}^{T−1} γ_t(i)
b̂_jk = Σ_{t: o_t = v_k} γ_t(j) / Σ_{t=1}^{T} γ_t(j)
The procedure for estimating the HMM parameters λ = (π, A, B) is then: starting from the observation sequence O and an initial model λ_0 = (π, A, B), a new set of parameters λ̂ is obtained from the re-estimation formulas. It can be shown that P(O|λ̂) ≥ P(O|λ_0), i.e. the re-estimated model λ̂ explains the observation sequence O at least as well as λ_0. Repeating this process gradually improves the model parameters until some convergence condition is met, i.e. P(O|λ) no longer increases significantly; the model at that point is the one sought.
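The re-estimation formulas can be transcribed as follows. This sketch performs a single Baum-Welch update for one observation sequence, reusing the forward and backward functions sketched earlier; it omits the scaling (or log-space arithmetic) a practical implementation needs for long sequences:

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One re-estimation step: returns updated (pi, A, B) for a single sequence."""
    T, N, M = len(obs), len(pi), B.shape[1]
    alpha, prob = forward(pi, A, B, obs)
    beta, _ = backward(pi, A, B, obs)
    # xi[t, i, j] = P(q_t = theta_i, q_{t+1} = theta_j | O, lambda)
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :]) / prob
    gamma = alpha * beta / prob              # gamma[t, i] = P(q_t = theta_i | O, lambda)
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros((N, M))
    for k in range(M):                       # sum gamma over times where o_t = v_k
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B
```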
HMM for Pattern Recognition
The HMM is a tool for dynamic pattern recognition: it can model and classify information over a time span in a statistical way.
In speech recognition, a correspondence must first be established, for example one word corresponding to one HMM, where the states are all the possible phonemes contained in the pronunciation (or their subdivisions, or their combinations). In an observation sample of the word, these phonemes appear in a certain order, forming the state sequence of the HMM, which is not observable in reality. The corresponding observable process is the amplitude of the sound signal corresponding to each phoneme. To establish this correspondence, the parameters of the HMM must first be learned from a set of observation samples of the word (several sound recordings of the word); that is, the HMM parameters are estimated while the corresponding state sequence is missing. In the language of mathematical statistics, this is parameter estimation from incomplete data.
After the parameters for each word have been learned, they can be used for recognition: given a set of observation samples (the sound signal of a word), find the model most likely to have produced the observation sample, and take the word it represents as the result.
The basic idea of an HMM-based isolated-word recognition system is as follows. In the training stage, an HMM training algorithm (such as the Baum-Welch algorithm) is used to build an HMM λ_i for each word in the system vocabulary. In the recognition stage, the probability P(O|λ_i) is computed for each model using the forward-backward or Viterbi algorithm, where O is the observation sequence of the word to be recognized. Finally, the largest P(O|λ_i) is selected, and the corresponding word is taken as the recognition result for O.
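In code, the recognition stage reduces to an argmax over per-word scores. In the sketch below, word_models is a hypothetical dict mapping each vocabulary word to its trained (pi, A, B) triple, and forward is the scoring routine sketched earlier:

```python
def recognize(obs, word_models):
    """Return the vocabulary word whose HMM gives the highest P(O | lambda_i)."""
    scores = {word: forward(pi, A, B, obs)[1]
              for word, (pi, A, B) in word_models.items()}
    return max(scores, key=scores.get)
```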