Acoustic modeling of Speech Recognition Systems: Hidden Markov Model (HMM)

Last Update: 2018-12-04 Source: Internet

Author: User


From: http://blog.1688.com/article/i25547966.html

[Guidance] The model of a speech recognition system generally consists of two parts, the acoustic model and the language model, which correspond to computing the speech-to-syllable probability and the syllable-to-word probability, respectively. This article describes the acoustic modeling of a speech recognition system based on the first-order Hidden Markov Model (HMM).

A Hidden Markov Model (HMM) is a statistical model built on a Markov chain. Its states cannot be observed directly; they can only be inferred from a sequence of observation vectors. Each observation vector is generated by a state according to that state's probability density distribution, so the observation sequence is produced by a state sequence with the corresponding distributions. The HMM is therefore a doubly stochastic process: a hidden Markov chain with a finite number of states, plus a set of random functions that map states to observations.
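The two layers of randomness can be illustrated by sampling from a toy model. The following is a minimal sketch, not taken from the original article: the state names, output symbols, and all probability values are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy parameters, chosen only for illustration.
STATES = ["S1", "S2"]
SYMBOLS = ["O1", "O2", "O3"]
INITIAL = {"S1": 0.6, "S2": 0.4}
TRANSITION = {
    "S1": {"S1": 0.7, "S2": 0.3},
    "S2": {"S1": 0.4, "S2": 0.6},
}
EMISSION = {
    "S1": {"O1": 0.5, "O2": 0.4, "O3": 0.1},
    "S2": {"O1": 0.1, "O2": 0.3, "O3": 0.6},
}


def draw(distribution, rng):
    """Draw one outcome from a {outcome: probability} mapping."""
    outcomes = list(distribution)
    weights = [distribution[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def sample_hmm(length, seed=0):
    """Return (hidden_states, observations) for a sequence of the given length."""
    rng = random.Random(seed)
    hidden, observed = [], []
    state = draw(INITIAL, rng)  # first random process: the hidden chain
    for _ in range(length):
        hidden.append(state)
        # Second random process: the output drawn from the current state.
        observed.append(draw(EMISSION[state], rng))
        state = draw(TRANSITION[state], rng)
    return hidden, observed


hidden, observed = sample_hmm(5)
print("hidden:  ", hidden)    # not visible to a recognizer
print("observed:", observed)  # the only sequence a recognizer sees
```

A recognizer only ever receives the second list and has to reason about the first.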

The HMM was established in the 1970s, then popularized and further developed in the 1980s, and has become an important direction in signal processing. It has been used successfully in speech recognition, behavior recognition, text recognition, fault diagnosis, and other fields.

For a speech recognition system, the output value is generally an acoustic feature vector computed from each frame. Describing a speech signal with an HMM rests on two assumptions. First, a state transition depends only on the previous state. Second, the output value depends only on the current state (or on the current state transition). These two assumptions greatly reduce the complexity of the model.
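One way to see the reduction in complexity is to write out the joint probability of a state sequence and an output sequence. The symbols here are introduced only for illustration: s_t is the state at frame t, o_t is the output at frame t, and T is the number of frames. Without the two assumptions, every factor would have to be conditioned on the entire history; with them, the joint probability should factorize as:

```latex
P(o_1, \dots, o_T, s_1, \dots, s_T)
  = P(s_1)\, P(o_1 \mid s_1) \prod_{t=2}^{T} P(s_t \mid s_{t-1})\, P(o_t \mid s_t)
```

Each factor involves at most two variables, so the model only needs an initial distribution, one table of state-to-state probabilities, and one output distribution per state.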

Acoustic Modeling

In speech recognition systems, HMMs usually model recognition units with a unidirectional, left-to-right topology that has self-loops and skip transitions. A phoneme is an HMM with three to five states, a word is an HMM formed by connecting the HMMs of its phonemes in series, and the full model for continuous speech recognition is an HMM that combines words and silence.
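The topology described above can be sketched as a transition matrix in which each state may only stay where it is, advance by one state, or skip one state. This is a rough illustration under stated assumptions: the function names, the probability values 0.6/0.3/0.1, and the phoneme sequence "k ae t" are hypothetical and do not come from the original article.

```python
def left_to_right_transitions(num_states, p_loop=0.6, p_next=0.3, p_skip=0.1):
    """Build a left-to-right transition matrix with self-loops and skips.

    Row i holds the probabilities of moving from state i to every state.
    """
    matrix = [[0.0] * num_states for _ in range(num_states)]
    for i in range(num_states):
        # Allowed moves from state i: stay, advance by one, or skip one state.
        moves = {i: p_loop, i + 1: p_next, i + 2: p_skip}
        allowed = {j: p for j, p in moves.items() if j < num_states}
        total = sum(allowed.values())
        for j, p in allowed.items():
            matrix[i][j] = p / total  # renormalize near the final state
    return matrix


def word_state_names(phonemes, states_per_phoneme=3):
    """Name the states of a word model built from phoneme models in series."""
    return [
        f"{ph}_{k}"
        for ph in phonemes
        for k in range(1, states_per_phoneme + 1)
    ]


A = left_to_right_transitions(3)
for row in A:
    print([round(p, 2) for p in row])

# A hypothetical three-phoneme word, three states per phoneme.
print(word_state_names(["k", "ae", "t"]))
```

Every entry below the diagonal is zero, which is what makes the model one-way: once a state has been left, it cannot be re-entered.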

Context-dependent modeling: Coarticulation is the change in a sound caused by the influence of adjacent sounds. In terms of the speech production mechanism, the vocal organs can only change their configuration gradually when moving from one sound to the next, so the spectrum of the following sound differs from its spectrum in other contexts. Context-dependent modeling takes this effect into account so that the model describes speech more accurately. A model that considers only the influence of the preceding sound is called a bi-phone; a model that considers the influence of both the preceding and the following sound is called a tri-phone.
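As a rough illustration, a phoneme sequence can be expanded into context-dependent units by attaching each phoneme's neighbors to its label. The label formats "left-center" and "left-center+right", the "sil" boundary symbol, and the example phonemes are assumptions made for this sketch, not something specified in the original article.

```python
def to_left_biphones(phonemes, boundary="sil"):
    """Label each phoneme with its preceding sound only, as 'left-center'."""
    padded = [boundary] + list(phonemes)
    return [f"{padded[i - 1]}-{padded[i]}" for i in range(1, len(padded))]


def to_triphones(phonemes, boundary="sil"):
    """Label each phoneme with both neighbors, as 'left-center+right'."""
    padded = [boundary] + list(phonemes) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


phonemes = ["k", "ae", "t"]  # hypothetical example sequence
print(to_left_biphones(phonemes))  # ['sil-k', 'k-ae', 'ae-t']
print(to_triphones(phonemes))      # ['sil-k+ae', 'k-ae+t', 'ae-t+sil']
```

Each distinct label would get its own HMM, which is why the number of models grows quickly once context is taken into account.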

Context-dependent modeling for English is usually based on phonemes. Because some phonemes affect their neighbors in similar ways, model parameters can be shared by clustering the states of the context-dependent phoneme models. Each resulting cluster is called a senone. A decision tree provides an efficient mapping from tri-phones to senones: by answering a series of questions about the categories of the preceding and following sounds (vowel/consonant, unvoiced/voiced, and so on), it determines which senone each HMM state should use. A classification and regression tree (CART) model is used to map the pronunciation of words to phonemes.
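The idea of answering questions about the neighboring sounds to pick a shared senone can be sketched with a tiny hand-written tree. Everything in this example is hypothetical: real trees are learned from training data, and the phonetic classes, the questions, the "left-center+right" label format, and the senone naming scheme are assumptions made purely for illustration.

```python
# Hypothetical phonetic classes used to phrase the questions.
VOWELS = {"ae", "iy", "uw"}
VOICED = {"b", "d", "g", "m", "n", "ae", "iy", "uw"}


def parse_triphone(label):
    """Split a 'left-center+right' label into its three parts."""
    left, rest = label.split("-", 1)
    center, right = rest.split("+", 1)
    return left, center, right


def senone_for(label, state_index):
    """Walk a toy decision tree and return a made-up senone identifier."""
    left, center, right = parse_triphone(label)
    if left in VOWELS:                          # Q1: is the left sound a vowel?
        leaf = "A" if right in VOICED else "B"  # Q2: is the right sound voiced?
    else:
        leaf = "C" if right in VOWELS else "D"  # Q3: is the right sound a vowel?
    return f"{center}_s{state_index}_{leaf}"


# Two different tri-phones whose contexts answer every question the same way
# end up sharing one senone for the same HMM state.
print(senone_for("k-ae+t", 2))
print(senone_for("g-ae+t", 2))
print(senone_for("iy-t+ae", 1))
```

Because many tri-phones fall into the same leaf, the number of distinct output distributions to train is far smaller than the number of tri-phones.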

HMM representation

A Hidden Markov Model (HMM) can be described by five elements: two state sets and three probability matrices.

1. Hidden states S

These states satisfy the Markov property and are the states actually hidden in the Markov model. They usually cannot be obtained by direct observation (for example, S1, S2, S3).

2. Observable states O

These are associated with the hidden states in the model and can be obtained by direct observation (for example, O1, O2, O3). The number of observable states need not equal the number of hidden states.

3. Initial state probability matrix π

Gives the probabilities of the hidden states at the initial time t = 1. For example, if P(S1) = p1, P(S2) = p2, and P(S3) = p3 at t = 1, then the initial state probability matrix is π = [p1 p2 p3].

4. Hidden state transition probability matrix A

Describes the transition probabilities between the states of the HMM.

Here a_ij = P(S_j | S_i), 1 ≤ i, j ≤ n, where n is the number of hidden states.

This is the probability that the state at time t + 1 is S_j, given that the state at time t is S_i.

5. Observation probability matrix B

(Also known as the confusion matrix.)

If n is the number of hidden states and m is the number of observable states, then:

b_ij = P(O_i | S_j), 1 ≤ i ≤ m, 1 ≤ j ≤ n.

This is the probability that the observed state at time t is O_i, given that the hidden state is S_j.

[Conclusion] A hidden Markov model can be written concisely as the triplet λ = (A, B, π). The HMM is an extension of the standard Markov model: it adds a set of observable states and the probabilistic relationship between those states and the hidden states.
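The triplet can be sketched as a small data structure. The example below is an illustration under stated assumptions, not the article's own implementation: the class name, method names, and all numbers are hypothetical. It follows the indexing given above, where B has one row per observable state and one column per hidden state, so it is the columns of B (not the rows) that must sum to 1.

```python
from dataclasses import dataclass
from math import isclose


@dataclass
class HMM:
    A: list   # n x n, A[i][j] = P(S_j | S_i)
    B: list   # m x n, B[i][j] = P(O_i | S_j), following the indexing above
    pi: list  # length n, pi[i] = P(S_i) at t = 1

    def validate(self):
        """Raise ValueError if a shape or normalization constraint fails."""
        n, m = len(self.pi), len(self.B)
        if not isclose(sum(self.pi), 1.0):
            raise ValueError("pi must sum to 1")
        if len(self.A) != n or any(len(row) != n for row in self.A):
            raise ValueError("A must be n x n")
        if any(not isclose(sum(row), 1.0) for row in self.A):
            raise ValueError("each row of A must sum to 1")
        if any(len(row) != n for row in self.B):
            raise ValueError("B must be m x n")
        for j in range(n):
            if not isclose(sum(self.B[i][j] for i in range(m)), 1.0):
                raise ValueError("each column of B must sum to 1")

    def joint_probability(self, states, observations):
        """P(states, observations | model) for equal-length index sequences."""
        prob = self.pi[states[0]] * self.B[observations[0]][states[0]]
        for t in range(1, len(states)):
            prob *= self.A[states[t - 1]][states[t]]
            prob *= self.B[observations[t]][states[t]]
        return prob


# Hypothetical model with n = 2 hidden states and m = 3 observable states.
model = HMM(
    A=[[0.7, 0.3], [0.4, 0.6]],
    B=[[0.5, 0.1], [0.4, 0.3], [0.1, 0.6]],
    pi=[0.6, 0.4],
)
model.validate()
print(model.joint_probability(states=[0, 0, 1], observations=[0, 1, 2]))
```

Sequences are passed as zero-based indices, so state index 0 stands for S1 and observation index 2 stands for O3.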

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
