Acoustic modeling of Speech Recognition Systems: Hidden Markov Model (HMM)

Source: Internet
Author: User

From: http://blog.1688.com/article/i25547966.html

[Guidance] The model of a speech recognition system generally consists of two parts, the acoustic model and the language model, which correspond to computing the speech-to-syllable probability and the syllable-to-word probability, respectively. This article describes in detail acoustic modeling for speech recognition systems based on the first-order Hidden Markov Model (HMM).

A Hidden Markov Model (HMM) is a statistical model built on a Markov chain. Its states cannot be observed directly; they can only be inferred from a sequence of observation vectors. Each observation vector is generated by the current state according to that state's probability density distribution, so the observation sequence is produced by a state sequence with corresponding probability density distributions. The hidden Markov model is therefore a doubly stochastic process: a hidden Markov chain with a certain number of states, together with a set of random functions that generate the observations.
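The two layers of randomness can be illustrated with a minimal sampling sketch. The parameter values and the names `start`, `trans`, `emit`, and `sample_hmm` are made up for illustration and do not come from the original article; `emit` is indexed state-first here purely for convenience.

```python
import random

# Toy parameters (assumed for illustration): 2 hidden states, 3 observable symbols.
start = [0.6, 0.4]                          # P(first state)
trans = [[0.7, 0.3], [0.4, 0.6]]            # trans[i][j] = P(next state j | state i)
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]   # emit[i][k] = P(symbol k | state i)

def sample_hmm(start, trans, emit, length, seed=0):
    """Draw a hidden state path and the observation sequence it generates."""
    rng = random.Random(seed)
    n_states = len(start)
    n_symbols = len(emit[0])
    states, observations = [], []
    # First random process: the hidden Markov chain.
    state = rng.choices(range(n_states), weights=start)[0]
    for _ in range(length):
        states.append(state)
        # Second random process: the current state emits an observation.
        observations.append(rng.choices(range(n_symbols), weights=emit[state])[0])
        state = rng.choices(range(n_states), weights=trans[state])[0]
    return states, observations

hidden_path, observed = sample_hmm(start, trans, emit, length=10, seed=1)
print("hidden:  ", hidden_path)
print("observed:", observed)
```

Only the `observed` sequence would be available to a recognizer; the `hidden` path is what it has to infer.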

The HMM was established in the 1970s. It was popularized and developed in the 1980s and became an important direction in signal processing. It has been used successfully in speech recognition, behavior recognition, text recognition, fault diagnosis, and other fields.

In a speech recognition system, the output value is generally an acoustic feature computed from each frame. Using an HMM to describe speech signals requires two assumptions: first, a state transition depends only on the previous state; second, the output value depends only on the current state (or the current state transition). These two assumptions greatly reduce the complexity of the model.
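Taken together, the two assumptions let the joint probability of a state sequence and an output sequence factor into simple local terms. The following is a sketch for the variant in which the output depends on the current state rather than on the transition; the symbols s_t (state at frame t), o_t (output at frame t), and T (number of frames) are notation introduced here for illustration.

```latex
P(s_1, \dots, s_T, o_1, \dots, o_T)
  = P(s_1)\, P(o_1 \mid s_1) \prod_{t=2}^{T} P(s_t \mid s_{t-1})\, P(o_t \mid s_t)
```

Without the assumptions, each factor would have to be conditioned on the entire history of states and outputs.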

Acoustic Modeling

In speech recognition systems, HMMs usually model recognition units with a one-way, left-to-right topology that allows self-loops and skips. A phoneme is modeled by an HMM with three to five states, a word is modeled by concatenating the HMMs of its phonemes, and the complete model for continuous speech recognition is an HMM that combines words and silence.
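As a rough illustration of this topology, the sketch below builds the transition matrix of a left-to-right HMM in which each state may loop on itself, advance to the next state, or skip one state. The function name and the default probabilities are assumptions chosen for the example; in a real system these values are estimated from training data, and the final state normally connects to an exit or to the next model rather than looping forever.

```python
def left_to_right_transitions(n_states, p_self=0.5, p_next=0.4, p_skip=0.1):
    """Build an n_states x n_states transition matrix for a left-to-right HMM.

    Each state may stay in place (self-loop), move to the next state,
    or skip over one state. Backward transitions get probability 0.
    Rows are renormalized near the end of the chain, where some moves
    are not available.
    """
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        weights = {i: p_self}
        if i + 1 < n_states:
            weights[i + 1] = p_next
        if i + 2 < n_states:
            weights[i + 2] = p_skip
        total = sum(weights.values())
        for j, w in weights.items():
            A[i][j] = w / total
    return A

# A three-state phoneme model, the smallest size mentioned in the text.
for row in left_to_right_transitions(3):
    print([round(p, 3) for p in row])
```

The matrix is upper triangular, which is what makes the model one-way: once a state has been left, it cannot be re-entered.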

Context-dependent modeling: Coarticulation is the change in a sound caused by the influence of adjacent sounds. In terms of the articulatory mechanism, a speaker's vocal organs can only change their configuration gradually when moving from one sound to the next, so the spectrum of the following sound differs from its spectrum in other contexts. Context-dependent modeling takes this effect into account, so that the model describes speech more accurately. A model that considers only the influence of the preceding sound is called a bi-phone; a model that considers the influence of both the preceding and the following sound is called a tri-phone.
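A small sketch of how a phoneme sequence might be expanded into tri-phone labels is given below. The `left-center+right` label format follows a common convention; the function name, the `sil` boundary symbol, and the example phoneme sequence are assumptions made for illustration.

```python
def to_triphones(phones, boundary="sil"):
    """Expand a phoneme sequence into context-dependent tri-phone labels.

    Each label has the form left-center+right. The boundary symbol
    stands in for the missing context at the start and end.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]

print(to_triphones(["k", "ae", "t"]))
# ['sil-k+ae', 'k-ae+t', 'ae-t+sil']
```

The same center phoneme in a different context gets a different label, and therefore its own model, which is how the spectral differences caused by neighboring sounds are captured.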

Context-dependent modeling in English is usually based on phonemes. Because some phonemes affect their neighbors in similar ways, model parameters can be shared by clustering the states of the context-dependent phoneme models. Each resulting cluster is called a senone. A decision tree provides an efficient mapping from triphones to senones: by answering a series of questions about the categories of the preceding and following sounds (vowel/consonant, unvoiced/voiced, and so on), it determines which senone each HMM state should use. A classification and regression tree (CART) model is used to map words to their phoneme-level pronunciations.
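The idea of asking questions about the neighboring sounds can be sketched with a tiny hand-written tree. Everything here is hypothetical: the phoneme sets, the questions, the leaf names, and the function `senone_for` are invented for illustration, whereas a real system learns the tree and its questions from training data.

```python
# Hypothetical phoneme categories used as decision-tree questions.
VOWELS = {"aa", "ae", "iy", "uw"}
VOICED = VOWELS | {"b", "d", "g", "m", "n", "z"}

def senone_for(left, center, right, state_index):
    """Map one HMM state of a tri-phone to a shared senone label.

    The tree asks about the category of the preceding sound first,
    then about the following sound. Tri-phones that give the same
    answers end up in the same leaf and share parameters.
    """
    if left in VOWELS:                      # Is the preceding sound a vowel?
        leaf = "A" if right in VOICED else "B"
    else:
        leaf = "C" if right in VOWELS else "D"
    return f"{center}_s{state_index}_{leaf}"

# k-ae+t and p-ae+t give the same answers, so they share a senone.
print(senone_for("k", "ae", "t", 1))
print(senone_for("p", "ae", "t", 1))
print(senone_for("iy", "ae", "d", 1))
```

Because the tree only asks about categories, it can also assign a senone to a tri-phone that never appeared in the training data.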

HMM Representation

A Hidden Markov Model (HMM) can be described by five elements: two state sets and three probability matrices.

1. Hidden states S

These states satisfy the Markov property and are the states actually hidden in the Markov model. They usually cannot be obtained by direct observation. (For example, S1, S2, S3, and so on.)

2. Observable states O

These are associated with the hidden states in the model and can be obtained by direct observation. (For example, O1, O2, O3, and so on. The number of observable states need not equal the number of hidden states.)

3. Initial state probability matrix π

The probability distribution over the hidden states at the initial time t = 1. (For example, if at t = 1, P(S1) = p1, P(S2) = p2, and P(S3) = p3, then the initial state probability matrix is π = [p1 p2 p3].)

4. Hidden state transition probability matrix A

Describes the transition probabilities between the states of the HMM.

a_ij = P(S_j | S_i), 1 ≤ i, j ≤ n.

This is the probability that the state at time t + 1 is S_j, given that the state at time t is S_i.

5. Observation probability matrix B

(Also known as the confusion matrix.)

If n is the number of hidden states and m is the number of observable states, then:

b_ij = P(O_i | S_j), 1 ≤ i ≤ m, 1 ≤ j ≤ n.

This is the probability that the observed state at time t is O_i, given that the hidden state at time t is S_j.

[Conclusion] A hidden Markov model can be written concisely as the triplet λ = (A, B, π). The hidden Markov model is in effect an extension of the standard Markov model: it adds a set of observable states and the probabilistic relationship between those states and the hidden states.
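To show how the three matrices work together, here is a minimal sketch that computes the probability of an observation sequence under a toy model using the standard forward recursion. The numeric values are made up for illustration, and `forward_likelihood` is a name chosen for this example. `B` is stored with rows indexing observations and columns indexing hidden states, matching the definition of B above.

```python
# Toy model (assumed values): n = 2 hidden states, m = 3 observable states.
pi = [0.6, 0.4]                      # pi[i]   = P(state i at t = 1)
A = [[0.7, 0.3],                     # A[i][j] = P(state j at t + 1 | state i at t)
     [0.4, 0.6]]
B = [[0.5, 0.1],                     # B[k][j] = P(observation k | state j)
     [0.4, 0.3],
     [0.1, 0.6]]

def forward_likelihood(pi, A, B, observations):
    """Return P(observations | lambda) for lambda = (A, B, pi).

    alpha[j] holds the probability of the observations seen so far
    together with being in hidden state j at the current time step.
    """
    n = len(pi)
    alpha = [pi[j] * B[observations[0]][j] for j in range(n)]
    for o in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[o][j]
            for j in range(n)
        ]
    return sum(alpha)

print(forward_likelihood(pi, A, B, [0, 2, 1]))
```

With this layout each row of `A` and each column of `B` sums to 1, as does `pi`. The recursion sums over all hidden state paths without enumerating them, so its cost grows linearly with the length of the observation sequence.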
