This article mainly introduces the traditional speech recognition system based on GMM/HMMs.
Outline:
1. Recognition principle
2. Statistical models
3. System framework
First, it should be made clear that the object discussed in this article is continuous speech recognition (Continuous Speech Recognition, CSR); isolated word recognition based on DTW (Dynamic Time Warping) is out of date and not within the scope of the discussion. The article also focuses on the decoding process of automatic speech recognition (the recognition process).

1. Principle of Recognition
First, understand that our voice is a sound wave, i.e., an analog signal. It is generally stored on a computer as a WAV file (an uncompressed format), or it can be acquired directly from a microphone (online).
Preprocessing and digitization are required first: filtering and noise reduction, pre-emphasis (boosting high frequencies), endpoint detection, and windowed framing, which decomposes the speech signal into many short segments (voice frames). Generally, each frame is 25 ms long and adjacent frames are shifted by 10 ms (so they overlap by 15 ms); this is the often-quoted frame length of 25 ms with a frame shift of 10 ms.
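The framing step above can be sketched as follows. This is a minimal illustration assuming a 16 kHz mono signal held in a NumPy array; the function name `frame_signal` and the Hamming window choice are illustrative, not part of any particular toolkit.

```python
import numpy as np

def frame_signal(signal, sample_rate=16000, frame_ms=25, shift_ms=10):
    frame_len = int(sample_rate * frame_ms / 1000)    # 25 ms -> 400 samples
    frame_shift = int(sample_rate * shift_ms / 1000)  # 10 ms -> 160 samples
    num_frames = 1 + max(0, (len(signal) - frame_len) // frame_shift)
    frames = np.stack([
        signal[i * frame_shift : i * frame_shift + frame_len]
        for i in range(num_frames)
    ])
    # Apply a Hamming window to each frame to reduce spectral leakage.
    return frames * np.hamming(frame_len)

# One second of audio yields 1 + (16000 - 400) // 160 = 98 overlapping frames.
frames = frame_signal(np.random.randn(16000))
print(frames.shape)  # (98, 400)
```

Note that consecutive frames share 240 samples (15 ms), which is what keeps the short-time analysis smooth across frame boundaries.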
Then, signal analysis is performed on each frame to further compress the data; this is known as feature extraction. Common feature parameters are MFCC and PLP. After feature extraction, each frame is compressed from the original several hundred sample points into a 39-dimensional MFCC feature vector (which makes subsequent processing much easier).
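A simplified MFCC computation for a single frame can be sketched with NumPy alone. This is an assumption-laden sketch (filter count, FFT size, and function names are illustrative, and real toolkits add pre-emphasis, liftering, and energy terms): power spectrum, then a triangular mel filterbank, then log, then a DCT-II to decorrelate. Appending first- and second-order deltas to the 13 coefficients gives the 39 dimensions mentioned above.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fbank

def mfcc(frame, n_fft=512, n_ceps=13):
    spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2          # power spectrum
    feats = np.log(mel_filterbank(n_fft=n_fft) @ spec + 1e-10)
    # DCT-II basis decorrelates the log filterbank energies.
    n = feats.shape[0]
    basis = np.cos(np.pi / n * (np.arange(n) + 0.5)[None, :]
                   * np.arange(n_ceps)[:, None])
    return basis @ feats

coeffs = mfcc(np.random.randn(400))   # one 25 ms frame at 16 kHz
print(coeffs.shape)  # (13,)
```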
Next comes the question of how to transform this sequence of feature vectors into a passage of text. This is where the acoustic model (GMM-HMMs) and the language model come into play. First, we need to know that a sentence consists of a sequence of words, and each word consists of a string of phonemes (for example, ball: /b/ /ɔː/ /l/). In English we usually build a hidden Markov model per phoneme (for Chinese the modeling unit is usually the initial/final), that is, one phoneme corresponds to one HMM, and an HMM usually consists of three states.

Now work backwards: we have a sequence of feature vectors, and the recognition process solves how each feature vector is mapped to a state, then from states to phonemes, phonemes to words, and words to a word sequence (a sentence). The mapping from feature vectors to states is solved by GMMs (Gaussian mixture models); from three states to one phoneme, by the HMM; from phonemes to words, by the pronunciation dictionary; and from words to a word sequence, by the language model. Throughout the process, everything takes place in a state network (time-state), all based on HMMs. This is also why it is said that HMMs solved the problem of speech recognition.
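The frame-to-state step above can be illustrated with a toy Viterbi decoder over a 3-state left-to-right HMM (one phoneme), assuming the per-frame state likelihoods have already been produced by the GMMs. All the probabilities here are made up for illustration.

```python
import numpy as np

def viterbi(log_like, log_trans):
    """Best state path for per-frame log-likelihoods (T x S) and an
    S x S log transition matrix, starting in state 0."""
    T, S = log_like.shape
    delta = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    delta[0, 0] = log_like[0, 0]            # left-to-right: start in state 0
    for t in range(1, T):
        for s in range(S):
            scores = delta[t - 1] + log_trans[:, s]
            back[t, s] = int(np.argmax(scores))
            delta[t, s] = scores[back[t, s]] + log_like[t, s]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):           # backtrace
        path.append(back[t, path[-1]])
    return path[::-1]

# Left-to-right transitions: self-loop or advance to the next state.
log_trans = np.log(np.array([[0.6, 0.4, 0.0],
                             [0.0, 0.6, 0.4],
                             [0.0, 0.0, 1.0]]) + 1e-12)
# Six frames whose GMM likelihoods favor states 0, 0, 1, 1, 2, 2 in turn.
log_like = np.log(np.array([[0.8, 0.1, 0.1],
                            [0.7, 0.2, 0.1],
                            [0.2, 0.7, 0.1],
                            [0.1, 0.8, 0.1],
                            [0.1, 0.2, 0.7],
                            [0.1, 0.1, 0.8]]))
print(viterbi(log_like, log_trans))  # [0, 0, 1, 1, 2, 2]
```

In a real decoder the same dynamic program runs over the full state network spanning phonemes, words, and the language model, not over a single phoneme's three states.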
2. Statistical Models
The task of automatic speech recognition (Automatic Speech Recognition, ASR) is to map a segment of acoustic signal to a string of text. Modeling is the first step in actually solving the problem:
$$W^* = \mathop{\arg\max}_{W} P(W|X) \tag{1}$$
where $X = x_1^T = x_1 x_2 \cdots x_t \cdots x_T$ represents an acoustic signal (sequence of voice frames) of length $T$, $W = w_1^N = w_1 w_2 \cdots w_N$ represents a word sequence of length $N$, and $W^*$ is the most likely word sequence of all, i.e., our recognition result.
However, formula (1) is difficult to compute directly, so we turn to a generative formulation by applying Bayes' rule:
$$P(W|X) = \frac{P(X|W)\,P(W)}{P(X)} \propto P(X|W)\,P(W) \tag{2}$$
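Since $P(X)$ does not depend on $W$, the decoder can rank hypotheses by $P(X|W)\,P(W)$ alone, the product of the acoustic model score and the language model score. A toy illustration (the candidate words and all probabilities are invented for this sketch; real systems work in log space over vast hypothesis networks):

```python
import math

# word: (acoustic model score P(X|W), language model score P(W))
candidates = {
    "ball": (0.020, 0.30),
    "bowl": (0.025, 0.10),
    "bell": (0.001, 0.60),
}

def decode(cands):
    # argmax_W  log P(X|W) + log P(W)  -- P(X) is the same for every W.
    return max(cands, key=lambda w: math.log(cands[w][0]) + math.log(cands[w][1]))

print(decode(candidates))  # ball
```

Note how "bell" loses despite the highest language model score: the combined score, not either model alone, picks the winner.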